ProbCons

In bioinformatics and proteomics, ProbCons is open-source software for probabilistic consistency-based multiple alignment of amino acid sequences. It is one of the most accurate protein multiple sequence alignment programs, having repeatedly demonstrated a statistically significant advantage in accuracy over similar tools, including Clustal and MAFFT.[1][2]

Algorithm

The following describes the basic outline of the ProbCons algorithm.[3]

Step 1: Reliability of an alignment edge

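For every pair of sequences $x$ and $y$, compute the probability that letters $x_i$ and $y_j$ are paired in $a^*$, an alignment generated by the model.

$$P(x_i \sim y_j \mid x, y) \;\overset{\mathrm{def}}{=}\; \Pr[x_i \sim y_j \text{ in some } a \mid x, y] = \sum_{\text{alignment } a \text{ with } x_i \sim y_j} \Pr[a \mid x, y] = \sum_{\text{alignment } a} \mathbf{1}\{x_i \sim y_j \in a\} \Pr[a \mid x, y]$$

(Here $\mathbf{1}\{x_i \sim y_j \in a\}$ is equal to 1 if $x_i$ and $y_j$ are paired in the alignment $a$ and 0 otherwise.)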
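The sum over all alignments can be evaluated without enumerating them by a forward-backward recursion. ProbCons uses a pair hidden Markov model for this. The sketch below swaps in a much simpler single-state weighting scheme, so it is an illustration of the idea rather than the ProbCons model. The function name `posterior_match_probs` and the weights `match`, `mismatch` and `gap` are hypothetical values chosen for the example.

```python
def posterior_match_probs(x, y, match=4.0, mismatch=1.0, gap=0.5):
    """Return P[i][j], the posterior probability that x[i] pairs with y[j].

    Assumption: an alignment's weight is the product of a weight for every
    aligned pair and a weight for every gapped letter (not a pair-HMM).
    """
    n, m = len(x), len(y)

    def w(i, j):
        # Weight of pairing x[i] with y[j] (0-based indices).
        return match if x[i] == y[j] else mismatch

    # fwd[i][j]: total weight of all alignments of x[:i] with y[:j].
    fwd = [[0.0] * (m + 1) for _ in range(n + 1)]
    fwd[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:
                fwd[i][j] += fwd[i - 1][j] * gap
            if j > 0:
                fwd[i][j] += fwd[i][j - 1] * gap
            if i > 0 and j > 0:
                fwd[i][j] += fwd[i - 1][j - 1] * w(i - 1, j - 1)

    # bwd[i][j]: total weight of all alignments of x[i:] with y[j:].
    bwd = [[0.0] * (m + 1) for _ in range(n + 1)]
    bwd[n][m] = 1.0
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i < n:
                bwd[i][j] += gap * bwd[i + 1][j]
            if j < m:
                bwd[i][j] += gap * bwd[i][j + 1]
            if i < n and j < m:
                bwd[i][j] += w(i, j) * bwd[i + 1][j + 1]

    total = fwd[n][m]  # summed weight of every alignment of x with y
    return [
        [fwd[i][j] * w(i, j) * bwd[i + 1][j + 1] / total for j in range(m)]
        for i in range(n)
    ]
```

Each entry combines the weight of everything before the pair, the pair itself, and everything after it, then normalises by the total weight. Because any single alignment pairs a letter with at most one partner, each row of the result sums to at most 1.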

Step 2: Maximum expected accuracy

The accuracy of an alignment $a^*$ with respect to another alignment $a$ is defined as the number of common aligned pairs divided by the length of the shorter sequence.

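Calculate the expected accuracy of a candidate alignment $a^*$, taken over all alignments $a$ weighted by their probability:

$$E_{\Pr[a \mid x, y]}\big(\operatorname{acc}(a^*, a)\big) = \sum_{a} \Pr[a \mid x, y]\, \operatorname{acc}(a^*, a) = \frac{1}{\min(|x|, |y|)} \sum_{x_i \sim y_j \in a^*} \sum_{a} \mathbf{1}\{x_i \sim y_j \in a\} \Pr[a \mid x, y] = \frac{1}{\min(|x|, |y|)} \sum_{x_i \sim y_j \in a^*} P(x_i \sim y_j \mid x, y)$$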

This yields a maximum expected accuracy (MEA) alignment:

$$E(x, y) = \arg\max_{a^*} E_{\Pr[a \mid x, y]}\big(\operatorname{acc}(a^*, a)\big)$$
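Since the expected accuracy is a sum of pairwise posterior probabilities over the pairs in the candidate alignment, the maximisation can be sketched as a Needleman-Wunsch style dynamic programme with no gap penalties. The function name `mea_alignment` and the input layout (a list of rows, one per letter of the first sequence) are assumptions made for this illustration, and both sequences are assumed to be non-empty.

```python
def mea_alignment(post):
    """Return (pairs, expected_accuracy) for a posterior matrix post[i][j].

    pairs is a list of 0-based (i, j) index pairs in increasing order.
    Assumes both sequences are non-empty.
    """
    n, m = len(post), len(post[0])

    # best[i][j]: largest summed posterior over alignments of the
    # first i letters of x with the first j letters of y.
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j],                           # x[i-1] unpaired
                best[i][j - 1],                           # y[j-1] unpaired
                best[i - 1][j - 1] + post[i - 1][j - 1],  # pair them
            )

    # Trace back from the bottom-right corner to recover the pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if best[i][j] == best[i - 1][j - 1] + post[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif best[i][j] == best[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()

    return pairs, best[n][m] / min(n, m)
```

The traceback recomputes exactly the expressions used to fill the table, so the floating-point equality tests are safe here.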

Step 3: Probabilistic Consistency Transformation

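The pairwise probabilities for all pairs of sequences $x, y$ from the set of all sequences $\mathcal{S}$ are now re-estimated using every intermediate sequence $z$:

$$P'(x_i \sim y_j \mid x, y) = \frac{1}{|\mathcal{S}|} \sum_{z \in \mathcal{S}} \sum_{1 \le k \le |z|} P(x_i \sim z_k \mid x, z) \cdot P(z_k \sim y_j \mid z, y)$$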

This step can be iterated.
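Viewed as matrices, the inner sum over positions of z is a matrix product of the x-to-z and z-to-y posterior matrices, averaged over all z. The following is a minimal pure-Python sketch of one round. The function name `consistency_transform`, the dictionary layout keyed by sequence-index pairs, and the choice to treat a sequence's posterior with itself as the identity matrix when z equals x or y are all assumptions of this example.

```python
def _matmul(a, b):
    # Plain matrix product of two list-of-lists matrices.
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def _transpose(a):
    return [list(row) for row in zip(*a)]


def _identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def consistency_transform(post, lengths):
    """Apply one round of the consistency transformation.

    post[(x, y)] with x < y is the len(x)-by-len(y) posterior matrix for
    sequences number x and y; lengths[s] is the length of sequence s.
    """
    count = len(lengths)

    def get(a, b):
        if a == b:
            return _identity(lengths[a])  # assumption: self-posterior is I
        return post[(a, b)] if (a, b) in post else _transpose(post[(b, a)])

    updated = {}
    for (x, y) in post:
        total = [[0.0] * lengths[y] for _ in range(lengths[x])]
        for z in range(count):
            product = _matmul(get(x, z), get(z, y))
            for i in range(lengths[x]):
                for j in range(lengths[y]):
                    total[i][j] += product[i][j]
        updated[(x, y)] = [[v / count for v in row] for row in total]
    return updated
```

Iterating the step corresponds to feeding the returned dictionary back into the function. In the test case used for this sketch, a weakly supported pairing between two sequences is strengthened when both letters pair confidently with the same letter of a third sequence.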

Step 4: Computation of guide tree

Construct a guide tree by hierarchical clustering, using the MEA score as the sequence similarity score. The similarity between clusters is defined as a weighted average of the pairwise sequence similarities.
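A greedy agglomerative clustering of this kind can be sketched as follows. The weighting used here is a cluster-size-weighted average, as in UPGMA. That is an assumption of the example, and the exact weighting used by ProbCons may differ. The function name `build_guide_tree` and the nested-tuple tree representation are likewise hypothetical.

```python
def build_guide_tree(sim):
    """Build a guide tree from a symmetric similarity matrix.

    sim[i][j] is the similarity of sequences i and j (for example an MEA
    score). Returns a nested tuple of sequence indices, e.g. (2, (0, 1)).
    """
    n = len(sim)
    clusters = {i: (i, 1) for i in range(n)}  # id -> (subtree, size)
    # Similarities between live clusters, keyed by (smaller id, larger id).
    score = {(i, j): sim[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n

    while len(clusters) > 1:
        a, b = max(score, key=score.get)  # most similar pair of clusters
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        del score[(a, b)]

        # Similarity of the merged cluster to every remaining cluster:
        # average of the two old similarities, weighted by cluster size.
        for c in clusters:
            s_ac = score.pop((min(a, c), max(a, c)))
            s_bc = score.pop((min(b, c), max(b, c)))
            score[(c, next_id)] = (size_a * s_ac + size_b * s_bc) / (size_a + size_b)

        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1

    (tree, _size), = clusters.values()
    return tree
```

New cluster ids are always larger than existing ones, which keeps the `(smaller id, larger id)` key convention intact after each merge.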

Step 5: Compute MSA

Finally, compute the multiple sequence alignment (MSA) using progressive alignment or iterative alignment.
