The Strand Symmetric Model
This chapter is devoted to the study of strand symmetric Markov models on trees from the standpoint of algebraic statistics. A strand symmetric Markov model is one whose mutation probabilities reflect the symmetry induced by the double-stranded structure of DNA see Chapter 4 . In particular, a strand symmetric model for DNA must have the following equalities of probabilities in the root distribution nA nT and nc nc and the following equalities of probabilities in the transition matrices AA 6TT...
Info Vjs
Table 15.2. Trees on 3-taxa with MC. Table 15.2. Trees on 3-taxa with MC. Here we describe the Jukes-Cantor model on the quartet tree see Figure 15.2 in complete detail. Dimension D 5 note that there are only 5 independent parameters, one for each transition matrix. Degree d 34. Number of Probability Coordinates np 15 and the classes are Class 1 pAAAA Class 5 Paacg Paact Paagc Paagt Paatc Paatg Class 8 Pacag Pacat Pagac Pagat Patac Patag Class 11 Paccg Pacct Paggc Paggt Pattc Pattg Class 12...
The sumproduct algorithm for HMMs
The hidden Markov model is one of the simplest and most popular models used in computational biology. In this subsection we will describe the sum-product algorithm for the HMM. We use the notation of Section 1.4, to which we also refer the reader unfamiliar with the model. Suppose that we have an HMM of length n, with hidden states a , i n , taking values in an alphabet with l letters, and observed variables Tj, i n , taking values in the alphabet ' of size l'. The model parameters are the...
Mutagenetic Tree Models
Mutagenetic trees are a class of graphical models designed for accumulative evolutionary processes. The state spaces of these models form finite distributive lattices. Using this combinatorial structure, we determine the algebraic invariants of mutagenetic trees. We further discuss the geometry of mixture models. In particular, models resulting from mixing a single tree with an error model are shown to be identifiable. 14.1 Accumulative evolutionary processes Some evolutionary processes can be...
Multiz
Table 4.2. Sites for downloading genome alignments. Table 4.2. Sites for downloading genome alignments. Alignments of genomes are available for download from a number of websites Table 4.2 . These sites maintain up-to-date versions of genomes, perform regular updates of the alignments, and provide tools for visualizing and retrieving alignments. Popular alignment programs include pairwise aligners such as AVID Bray et al., 2003 and BLASTZ Schwartz et al., 2003 , as well as tools for multiple...
Markov chains
The Markov chain model is a submodel of the toric Markov chain model. Let 01 denote the subset of all matrices 0 R gt 0 whose rows sum to one. The Markov chain model is the image of 01 under the map By a Markov chain we mean any point p in the model f1,n 01 . This definition agrees with the familiar description of Markov chains in Durbin et al., 1998, Chapter 3 , except that here we require the initial distribution at the first state to be uniform. This assumption is made to keep the exposition...
Info Hlm
The entries in D 3 are the lengths of the shortest paths in the graph G. The tropical computation above can be related to the following matrix computation in ordinary arithmetic. Let e denote an indeterminate, and let AG e be the n x n matrix whose entries are the monomials edij. In our example, Now compute the third power of this matrix in ordinary arithmetic 1 3e3 3e e4 3e2 3e3 e3 6e4 3e2 4e5 1 3e3 3e e3 3e2 3e3 3e4 2e6 3e4 6e5 1 3e2 3e e3 6e5 3e6 3e3 e5 3e e3 1 3e2 The entry of A e 3 in row...
Tropical Pfaffian
linear subspace Lq C R6. Each of the 15 defining linear forms has four terms 1J0 1J2 3 qjojlJ2 Xj3 qjojlJ3 Xj2 q QJ2J3 Xjl q lJ2J3 Xjo If we work in tropical projective 5-space, i.e. modulo the equivalence relation Xi,X2, X3,X4, X5, X6 A 0 X2, X3, X4, X5, X6 , then Lq is a union of planar polygons. We call Lq a phylogenetic surface. A phylogenetic surface is a two-dimensional geometric representation of the dissimilarities among triples of taxa, just like a phylogenetic tree is a...
Bcde Ggggggg
Chapter 8 raised the question of whether all the lattice points inside the projected alignment polygon Pt come from alignments of the pair of sequences t. The construction in the proof of Proposition 9.4 gives instances for which some lattice points inside Pt do not come from any alignment. For example, take Pt to be the projection of the alignment polygon corresponding to the pair of sequences in Figure 9.2. The points 1,1 , 2,1 and 2,2 lie inside Pt. However, there is no alignment of these...
Polytope Propagation on Graphs
Polytope propagation associated with hidden Markov models or arbitrary tree models, as introduced in Pachter and Sturmfels, 2004a , can be carried to a further level of abstraction. Generalizing polytope propagation allows for a clearer view of the algorithmic complexity issues involved. We begin with the simple observation that a graphical model associated with a model graph G, which may or may not be a tree, defines another directed graph r G which can roughly be seen as a product of G with...
Info Pgb
We are interested in the number of optimality regions, or equivalently, the number of vertices of Pxy. The parameters are only biologically meaningful for a, ft gt 0, the case in which spaces and mismatches are penalized. Thus, we will only consider the optimality regions intersecting this quadrant. Equivalently, we are only concerned with the vertices of Pxy along the lower-left border i.e., those for which there is no other vertex below and to the left . Note that for a ft -1, the score of...
The EM Algorithm for Hidden Markov Models 1
Ingileif B. Hallgrimsdottir R. Alexander Milowski Josephine Yu As discussed in Chapter 1, the EM algorithm is an iterative procedure used to obtain maximum likelihood estimates MLEs for the parameters of statistical models which are induced by a hidden variable construct, such as the hidden Markov model HMM . The tree structure underlying the HMM allows us to organize the required computations efficiently, which leads to an efficient implementation of the EM algorithm for HMMs known as the...
A small alignment example
To illustrate our algorithm, we give below a small example of parametric sequence alignment under a highly simplified version of the scoring scheme of Section 2.2. The parameters of our model are a reward assigned to all matches and a penalty assigned to all mismatches and gaps. We disregard completely the scores assigned to transitions between hidden nodes. In the language of Section 2.2, this is equivalent to w'XY 0 for all X, Y H, I, D , wa,a x for all a A, C, G, T and wa, b y for all a, b...
pi0 pi4 pi5 pi0
This expression is the likelihood function of DiaNA's model for the data 1.1 . To stress the fact that the parameters 0i and 02 are unknowns we write L 01,02 pa 01,02 10 pc 01,02 14 pg 01,02 15 pt 01,02 10. This likelihood function is a real-valued function on the triangle 6 01, 02 e R2 01 gt 0 and 02 gt 0 and 01 02 lt 1 . In the maximum likelihood framework we estimate the parameter values that DiaNA used by those values which make the likelihood of observing the data as large as possible....

