Genome Res. 17:127-135, 2007. ©2007 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/07 $5.00

Review

Multiple sequence alignment: In pursuit of homologous DNA positions

Center for Evolutionary Functional Genomics, Biodesign Institute and School of Life Sciences, Arizona State University, Tempe, Arizona 85287-5301, USA
DNA sequence alignment is a prerequisite to virtually all comparative genomic analyses, including the identification of conserved sequence motifs, estimation of evolutionary divergence between sequences, and inference of historical relationships among genes and species. While it is mere common sense that inaccuracies in multiple sequence alignments can have detrimental effects on downstream analyses, it is important to know the extent to which the inferences drawn from these alignments are robust to errors and biases inherent in all sequence alignments. A survey of investigations into strengths and weaknesses of sequence alignments reveals, as expected, that alignment quality is generally poor for two distantly related sequences and can often be improved by adding additional sequences as stepping stones between distantly related species. Errors in sequence alignment are also found to have a significant negative effect on subsequent inference of sequence divergence, phylogenetic trees, and conserved motifs. However, our understanding of alignment biases remains rudimentary, and sequence alignment procedures continue to be used somewhat like benign formatting operations to make sequences equal in length. Because of the central role these alignments now play in our endeavors to establish the tree of life and to identify important parts of genomes through evolutionary functional genomics, we see a need for increased community effort to investigate influences of alignment bias on the accuracy of large-scale comparative genomics.
The relative positions of nucleotides within the same gene in different species and in duplicated genomic regions are disturbed by insertion and deletion of stretches of DNA over evolutionary time. This leads to differences in the length of the homologous regions in the genome, with more distant relatives having a higher likelihood of sequence length difference. A comparison of lengths of genome segments spanning protein-coding genes in human and mouse shows the extent of the effect of evolution by insertions and deletions (Fig. 1). The lengths of noncoding orthologous sequences have also evolved substantially after divergence over 90 million years ago. A grand challenge in comparative genomics is to line up these bases by inserting gaps in sequences, because genomic analyses must be based on comparisons between bases at positions (sites) that coincided in a common ancestor. The task is to re-establish (estimate) the ancestral site-wise homology obfuscated by the insertion-deletion and substitution processes. Naturally, this operation has come to be known as "alignment," and the resulting set of sequences, all of which are the same length (taking gaps into account), is also called an alignment (Fig. 2). We can distinguish between "pairwise" alignments, in which sequences, even if they are part of a larger set, are aligned only in pairs, and "multiple" alignments, in which more than two sequences are aligned simultaneously.
Alignment procedures may also be classified as either "global" or "local." In the simplest form, sequences are aligned beginning to end to produce global alignments. This is appropriate for sequences of protein-coding genes and for short stretches of the genomic sequence. For longer genomic DNA, it is necessary to account for medium and large-scale rearrangements in addition to large sequence insertions and deletions, which necessitates the building of local alignments. Local alignments differ from global alignments in that the former focus on shared regions of high similarity while ignoring regions that do not show high sequence homology between sequences. Unlike traditional global and local alignment methods that assume colinearity of homologous segments among sequences, "glocal" alignment methods model the rearrangement process explicitly during the alignment procedure itself and result in a nonlinear mapping between homologous regions of different sequences (Brudno et al. 2003b
For most applications in the areas of molecular phylogenetics and evolution, we are interested in properties and relationships of "rows" of the alignment, which represent species, genes, or genomic regions (Fig. 2). Examples of these include inference of multigene family and species phylogenies, determination of evolutionary rates in different lineages, and identification of bouts of selection and patterns of DNA sequence change over time (Nei and Kumar 2000
The amount of nucleotide sequence data in GenBank and other public databases has expanded exponentially since the inception of these electronic warehouses; the data now consist of over 125 billion base pairs of sequence data from over 200,000 organisms. Understanding the functional significance of these data has become the central problem in comparative genomics. Naturally, because of this great volume of data, scientists would like to establish DNA homologies by applying one or more of the highly innovative alignment methods available today in an automated high-throughput fashion (Thompson et al. 1994
The process of alignment involves insertion of gaps into sequences to make them the same length. These gaps are hypotheses about the site homologies resulting from historical insertion-deletion events. Since mutations can cause two homologous sites to differ from each other (substitutions), the complexities of the alignment process transcend traditional string-matching problems in computer science. The interplay of insertion-deletion events and substitutions over thousands and millions of years produces sequences that may lead to many different alignments, with some containing more gaps than others. One may prefer a specific alignment over another if some "optimality" score is better (Needleman and Wunsch 1970
The scoring functions incorporate differences in the likelihood of change from one base to another and of inserting gaps of various lengths. It is well established, for example, that transitional mutations are much more common than transversional mutations (Vogel and Kopun 1977
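To make the idea of a scoring function concrete, the following Python sketch scores an already-aligned pair of DNA rows. It rewards matches, penalizes transitions less than transversions, and charges more for opening a gap than for extending one. All numerical values and the function names `substitution_score` and `alignment_score` are hypothetical choices made for illustration; they are not taken from any published scoring scheme.

```python
# Hypothetical scoring parameters, chosen only to illustrate the idea.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_score(a, b, match=1.0, transition=-0.5, transversion=-1.0):
    """Score one aligned pair of bases; transitions cost less than transversions."""
    if a == b:
        return match
    # A transition stays within purines (A<->G) or within pyrimidines (C<->T).
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return transition if same_class else transversion


def alignment_score(row1, row2, gap_open=-2.0, gap_extend=-0.5):
    """Score two aligned rows; a gap of length k costs gap_open + (k - 1) * gap_extend."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    score = 0.0
    previous_gap_row = None  # which row carried a gap in the previous column
    for a, b in zip(row1, row2):
        if a == "-" and b == "-":
            raise ValueError("a pairwise alignment column cannot be all gaps")
        if a == "-" or b == "-":
            gap_row = 1 if a == "-" else 2
            score += gap_extend if previous_gap_row == gap_row else gap_open
            previous_gap_row = gap_row
        else:
            score += substitution_score(a, b)
            previous_gap_row = None
    return score
```

Under these assumed values, two alternative alignments of the same pair of sequences can be compared simply by computing `alignment_score` for each and preferring the higher value.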
Insurmountable computational demands limit our ability to generate optimal alignments for more than a few sequences. Finding an optimal alignment for a single pair of sequences, even very long genomic ones, has become practical using dynamic programming methods that now have running times and memory requirements proportional to the total lengths of the two sequences (Needleman and Wunsch 1970
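For readers unfamiliar with dynamic programming alignment, a minimal Python sketch of the basic global (Needleman-Wunsch style) recursion is given here. It assumes a simple linear gap penalty and hypothetical match, mismatch, and gap values, and it stores the full score table rather than using any of the space-saving refinements; it is meant only to show the principle.

```python
def needleman_wunsch(s, t, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment with a linear gap penalty (illustrative values)."""
    n, m = len(s), len(t)
    # score[i][j] holds the best score for aligning s[:i] with t[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if s[i - 1] == t[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align s[i-1] with t[j-1]
                score[i - 1][j] + gap,       # gap in t
                score[i][j - 1] + gap,       # gap in s
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    row_s, row_t = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and s[i - 1] == t[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            row_s.append(s[i - 1])
            row_t.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_s.append(s[i - 1])
            row_t.append("-")
            i -= 1
        else:
            row_s.append("-")
            row_t.append(t[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(row_s)), "".join(reversed(row_t))
```

With these assumed parameters, `needleman_wunsch("ACGT", "AGT")` places a single gap opposite the unmatched C. When several alignments share the optimal score, the traceback returns only one of them.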
In heuristic approaches, no overall optimal alignment is sought. Instead, it is hoped that optimizing pairwise alignments will lead to a "good" solution. The time efficiency of this approach has led to its becoming the standard, as reflected in its implementation in a variety of well-used software packages (Thompson et al. 1994
Up to this point, we have primarily focused on DNA sequence alignments, but the discussion applies in principle to the alignment of amino acid sequences. In the latter, the probability of substitution from one amino acid to another is incorporated by using 20 × 20 scoring matrices (e.g., PAM and BLOSUM) to accommodate differences among types of amino acid substitutions (Dayhoff et al. 1978
A major current focus of study in comparative genomics is the identification of short motifs important for gene regulation. Many motif discovery tools have been developed around a common approach: construct a multiple alignment of homologous sequences and identify short stretches of DNA positions that are more conserved across disparate genomes than would be expected by chance (see Stojanovic et al. 1999
The accuracy of alignment algorithms has been assessed directly by measuring the fraction of positions that are aligned correctly between sequence pairs in multiple sequence alignments in computer-simulated data (Altschul and Gish 1996
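One way such a measure could be computed for a single sequence pair is sketched below in Python. The sketch assumes that the true alignment is known (as in simulation), represents each alignment as two gapped rows, and reports the fraction of truly homologous site pairs that the test alignment recovers. The function names `homologous_pairs` and `homology_accuracy` are hypothetical, and published studies may define the measure differently in detail.

```python
def homologous_pairs(row1, row2):
    """Return the set of (position in sequence 1, position in sequence 2) pairs
    that an alignment places in the same column; positions are 0-based and
    count only residues, not gaps."""
    pairs = set()
    i = j = 0
    for a, b in zip(row1, row2):
        if a != "-" and b != "-":
            pairs.add((i, j))
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return pairs


def homology_accuracy(true_rows, test_rows):
    """Fraction of truly homologous site pairs that the test alignment recovers."""
    true_pairs = homologous_pairs(*true_rows)
    test_pairs = homologous_pairs(*test_rows)
    if not true_pairs:
        raise ValueError("the true alignment contains no homologous pairs")
    return len(true_pairs & test_pairs) / len(true_pairs)
```

For example, if the true alignment is `ACGT` over `A-GT` and a program returns `ACGT` over `AG-T`, two of the three truly homologous pairs are recovered, so the accuracy is 2/3.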
Limits on the homology accuracy in Figure 3A are for DNA segments in which all positions are evolving strictly neutrally (i.e., without any natural selection). However, a more realistic scenario is to consider situations in which genomic segments contain highly conserved blocks, because natural selection acts to keep important motifs intact to maintain function. As expected, the existence of conserved blocks enhances the homology accuracy and makes the performance of different methods more similar. Differences do exist, however. Global alignment programs (e.g., ClustalW) perform worse than the procedures that are essentially local in nature (e.g., Lagan, DiAlign), especially for highly divergent sequences (Fig. 3B). The local alignment methods work better because they look for regions of very high sequence similarity and, thus, evolutionary conservation (Smith and Waterman 1981
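The local alignment principle can be illustrated with a short Python sketch of the Smith-Waterman recursion. It is the same dynamic programming scheme as for global alignment, except that a cell score is never allowed to fall below zero, so an alignment can start and end anywhere. The scoring values are hypothetical, and the sketch returns only the best local score, not the aligned segment. It is not a description of how Lagan or DiAlign work internally.

```python
def smith_waterman_score(s, t, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between s and t (illustrative parameters)."""
    best = 0
    previous = [0] * (len(t) + 1)
    for i in range(1, len(s) + 1):
        current = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            pair = match if s[i - 1] == t[j - 1] else mismatch
            current[j] = max(
                0,                       # start a fresh local alignment here
                previous[j - 1] + pair,  # extend by an aligned pair
                previous[j] + gap,       # gap in t
                current[j - 1] + gap,    # gap in s
            )
            best = max(best, current[j])
        previous = current
    return best
```

Two sequences that share only the block `ACGT`, embedded in otherwise unrelated flanks, score 8 under these values (four matches), whereas sequences with no shared residues score 0.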
The probability of aligning multiple adjacent sites correctly in a set of sequences is a direct function of the homology accuracy at each position, so it is evident that identifying short genomic segments involved in gene regulation via comparative sequence analysis is likely to produce many false-negatives if the sequences involved are highly divergent. This is clearly seen in a dramatic decline in finding short motifs after the addition of a nonmammalian species (chicken) to a data set containing placental mammals (human, chimp, mouse, and rat) (Prakash and Tompa 2005
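A rough feel for this effect can be obtained from the short Python sketch below. It makes the simplifying assumption, which is ours and not a result from the studies cited, that alignment errors at adjacent sites are independent, so that the chance of aligning every site of a motif correctly is the per-site accuracy raised to the motif length. The function name `motif_alignment_probability` is hypothetical.

```python
def motif_alignment_probability(site_accuracy, motif_length):
    """Probability that all sites of a motif are aligned correctly,
    assuming (as a simplification) independent errors across sites."""
    if not 0.0 <= site_accuracy <= 1.0:
        raise ValueError("site_accuracy must lie between 0 and 1")
    if motif_length < 0:
        raise ValueError("motif_length must be non-negative")
    return site_accuracy ** motif_length
```

Under this assumption, a per-site accuracy of 0.9 leaves only about a 35% chance (0.9 to the tenth power, roughly 0.349) that a 10-bp motif is aligned correctly throughout. In real alignments errors tend to be clustered, so the value should be read as an illustration rather than a prediction.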
The use of multiple sequences and the choice of species critically impact motif discovery. Certainly, pairwise sequence analysis can be carried out in the absence of a robust phylogeny (Elnitski et al. 2003
Many DNA segments will be found to be conserved in closely related species, but such false positives can be detected by using many sequences and by estimating false-positive rates (Cooper et al. 2005
Knowledge of correct evolutionary relationships is important in motif discovery. Efforts have been made to determine how the use of shared ancestry (specified using phylogenetic relationships) enhances the accuracy of motif detection over simple treatment of sequences as an aligned set (Dubchak and Frazer 2003
Evolutionary distances are routinely estimated from pairs of aligned sequences and are used for inferring phylogenies, divergence times, and rates of evolution (Nei and Kumar 2000
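As a simple illustration of how a distance is obtained from an aligned pair, the Python sketch below computes the proportion of differing sites (the p-distance) over columns without gaps and then applies the Jukes-Cantor correction for multiple substitutions at the same site. The choice of the Jukes-Cantor model and the exclusion of gapped columns are assumptions made for this example; other substitution models and gap treatments are in common use.

```python
import math


def p_distance(row1, row2):
    """Proportion of differing sites among columns where neither row has a gap."""
    compared = differences = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            continue  # gapped columns are skipped in this sketch
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def jukes_cantor_distance(p):
    """Jukes-Cantor estimate of substitutions per site from a p-distance."""
    if p >= 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)
```

Because the distance is computed column by column, any misplaced gap changes which bases are compared. For instance, shifting one row by a single position, as in `ACGT-` over `-ACGT`, yields a p-distance of 1.0 for two identical sequences.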
Multiple sequence alignments are considered better than pairwise alignments because more similar sequences will act as intermediates between highly dissimilar ones (Lesk 2005
Another measure of homology accuracy useful for motif detection is the fraction of sites successfully aligned when sequences are added one-by-one to the data set. This is used when the true sequence alignment is not known, as in empirical data analysis. Indeed, the number of sites aligned between two distantly related sequences increases as the data set is expanded by adding more sequences intermediate to the two distantly related sequences in question (Margulies et al. 2006
The common modus operandi for building phylogenetic trees is to align a data set using some program, inspect the result for obvious error, perhaps realign, and then infer a phylogeny from the result. Current computer simulations aimed at deciphering the effect of sequence homology accuracy on phylogenetic reconstruction for various shapes of trees and phylogenetic methods show that, on average, homology accuracy correlates with the accuracy of the inferred phylogeny (Ogden and Rosenberg 2006
The inevitable need for the use of heuristic procedures in sequence alignment contributes significantly toward phylogenetic error when the sequences being aligned have undergone a large number of substitutions and many insertion-deletion events. A key component of any progressive (heuristic) alignment procedure is the guide tree that sets up the order in which sequences (and sequence profiles) are aligned. Guide trees can be created in different ways. ClustalW, for example, creates a matrix of pairwise distances by aligning each sequence pair separately and computing a dissimilarity score from each pairwise alignment. This dissimilarity matrix is then subjected to the neighbor-joining algorithm to generate a directional hierarchy of sequence relationships (Thompson et al. 1994
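The way a dissimilarity matrix is turned into a joining order can be sketched in Python with a bare-bones neighbor-joining routine. The sketch takes a precomputed distance matrix, repeatedly joins the pair of clusters that minimizes the standard neighbor-joining criterion, and records the joins in order. It omits branch lengths and everything else a real program does, and it is not ClustalW's actual implementation. The function name `neighbor_joining_order` and the example distances are hypothetical.

```python
def neighbor_joining_order(names, dist):
    """Return the clusters joined by neighbor joining, in order.
    Each cluster is a nested tuple of the input names."""
    labels = list(names)
    d = {(a, b): dist[i][j]
         for i, a in enumerate(labels) for j, b in enumerate(labels)}
    joins = []
    while len(labels) > 2:
        n = len(labels)
        total = {a: sum(d[a, b] for b in labels) for a in labels}
        candidates = [(a, b) for i, a in enumerate(labels) for b in labels[i + 1:]]
        # Neighbor-joining criterion: Q(a, b) = (n - 2) d(a, b) - total(a) - total(b).
        a, b = min(candidates,
                   key=lambda p: (n - 2) * d[p] - total[p[0]] - total[p[1]])
        new = (a, b)
        # Distance from the new internal node to every remaining cluster.
        for c in labels:
            if c != a and c != b:
                d[new, c] = d[c, new] = 0.5 * (d[a, c] + d[b, c] - d[a, b])
        d[new, new] = 0.0
        labels = [c for c in labels if c != a and c != b] + [new]
        joins.append(new)
    joins.append((labels[0], labels[1]))
    return joins


# Hypothetical distances consistent with the tree ((A,B),(C,D)).
example = [
    [0, 2, 5, 6],
    [2, 0, 5, 6],
    [5, 5, 0, 5],
    [6, 6, 5, 0],
]
order = neighbor_joining_order(["A", "B", "C", "D"], example)
```

In a progressive aligner, the sequences (or profiles) in each joined pair would be aligned in this order, which is why an error in the distance matrix can propagate into the final multiple alignment.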
Guide-tree errors are known to have serious effects on downstream phylogenetic inference, as the increase in the phylogenetic error rate is found to be associated with errors in the guide trees (Lake 1991
Disregarding the effect of guide-tree errors on evolutionary and phylogenetic inference is no longer tenable, because these errors are amplified in today's large data sets. The genomic revolution in building the Tree of Life has now taken root and very long sequences are being used to establish key species relationships (Hedges 2002
When is it appropriate to use genomic multiple sequence alignments available in various database resources, which use a specific phylogeny as a guide tree? The answer depends on the purpose of the analysis. To begin with, the use of such multiple sequence alignments for inferring species phylogenies will be circular and will often (but not always) produce outcomes that merely reflect bias introduced by the alignment procedure (Fig. 7). This will be particularly problematic for traditionally hard-to-resolve phylogenetic relationships, because they are often associated with short internal branches (i.e., small amount of evolutionary change) in the phylogenetic trees. Errors introduced by alignment bias are expected to affect these parts of the phylogeny most severely, as the phylogenetic signal may be overwhelmed by the bias in the sequence alignment. For example, resolving the phylogenetic relationship of major groups of mammals using the sequence alignment of ENCODE data is not desirable, because these alignments are constructed using a guide tree that best reflects our current understanding of mammalian and vertebrate species relationships (http://www.genome.gov/10005). On the other hand, these alignments are appropriate for inferring ancestral genomes and times of species divergence events, because those procedures and all the results obtained are explicitly conditional on the evolutionary tree used. Obviously, one would generally use the same evolutionary tree for aligning sequences and for conducting evolutionary analyses, as long as it is substantiated. Otherwise, the use of an incorrect phylogeny will not only produce ancestral sequences and speciation times for ancestors that never existed, but it will also bias results for those that indeed existed. 
On the other hand, errors in the guide tree are expected to have a significantly lesser impact on the estimation of evolutionary rates at individual positions (or in sliding windows), and thus on motif finding, as such summary statistics are derived using a large amount of data for each position when many species are used (see, e.g., Yang and Kumar 1996
One unexplored wrinkle in the guide-tree conundrum is the observation that closely related species can show very different base compositions in homologous genomic segments. For instance, equality of substitution pattern can be rejected in
The multiple sequence alignment procedure forms the backbone of comparative and evolutionary genomics. Results from some recent studies involving computer simulations and large-scale genomic data have begun to clarify and quantify sources of bias and the effects of alignment on subsequent downstream processing. Because of the rapid growth of large-scale data sets and increasing applications of multiple sequence alignments to understand patterns and processes that govern gene, genome, and species evolution, it would be prudent to further intensify these investigations and make their conclusions more accessible to practicing biologists.
We thank Vinod Swarna for his assistance with data analysis (Fig. 7), Drs. Sonja Prohaska and Michael Rosenberg for comments on an earlier version of this manuscript, and Ms. Kristi Garboushian for editorial support. We also thank three anonymous referees for many insightful suggestions. This work was supported in part by a research grant from the National Institutes of Health to S.K.
1 Corresponding author.
E-mail s.kumar{at}asu.edu; fax (480) 727-6947. Article is online at http://www.genome.org/cgi/doi/10.1101/gr.5232407
Altschul, S.F. 1991. Amino acid substitution matrices from an information theoretic perspective. J. Mol. Biol. 219: 555-565.
Altschul, S.F. and Gish, W. 1996. Local alignment statistics. Methods Enzymol. 266: 460-480.
Barton, G.J. and Sternberg, M.J. 1987. A strategy for the rapid multiple alignment of protein sequences. Confidence levels from tertiary structure comparisons. J. Mol. Biol. 198: 327-337.
Bashir, A., Ye, C., Price, A.L., and Bafna, V. 2005. Orthologous repeats and mammalian phylogenetic inference. Genome Res. 15: 998-1006.
Bergman, C.M. and Kreitman, M. 2001. Analysis of conserved noncoding DNA in Drosophila reveals similar constraints in intergenic and intronic sequences. Genome Res. 11: 1335-1345.
Blanchette, M. and Tompa, M. 2003. FootPrinter: A program designed for phylogenetic footprinting. Nucleic Acids Res. 31: 3840-3842.
Blanchette, M., Kent, W.J., Riemer, C., Elnitski, L., Smit, A.F., Roskin, K.M., Baertsch, R., Rosenbloom, K., Clawson, H., and Green, E.D., et al. 2004. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 14: 708-715.
Boffelli, D., McAuliffe, J., Ovcharenko, D., Lewis, K.D., Ovcharenko, I., Pachter, L., and Rubin, E.M. 2003. Phylogenetic shadowing of primate sequences to find functional regions of the human genome. Science 299: 1391-1394.
Bray, N. and Pachter, L. 2004. MAVID: Constrained ancestral alignment of multiple sequences. Genome Res. 14: 693-699.
Brudno, M., Do, C.B., Cooper, G.M., Kim, M.F., Davydov, E., Green, E.D., Sidow, A., and Batzoglou, S. 2003a. LAGAN and Multi-LAGAN: Efficient tools for large-scale multiple alignment of genomic DNA. Genome Res. 13: 721-731.
Brudno, M., Malde, S., Poliakov, A., Do, C.B., Couronne, O., Dubchak, I., and Batzoglou, S. 2003b. Glocal alignment: Finding rearrangements during alignment. Bioinformatics 19: i54-i62.
Bulyk, M.L. 2003. Computational prediction of transcription-factor binding site locations. Genome Biol. 5: 201.
Cammarano, P., Creti, R., Sanangelantoni, A.M., and Palm, P. 1999. The archaea monophyly issue: A phylogeny of translational elongation factor G(2) sequences inferred from an optimized selection of alignment positions. J. Mol. Evol. 49: 524-537.
Cerchio, S. and Tucker, P. 1998. Influence of alignment on the mtDNA phylogeny of Cetacea: Questionable support for a Mysticeti/Physeteroidea clade. Syst. Biol. 47: 336-344.
Clark, A.G., Glanowski, S., Nielsen, R., Thomas, P.D., Kejariwal, A., Todd, M.A., Tanenbaum, D.M., Civello, D., Lu, F., and Murphy, B., et al. 2003. Inferring nonneutral evolution from human-chimp-mouse orthologous gene trios. Science 302: 1960-1963.
Cooper, G.M., Stone, E.A., Asimenos, G., Green, E.D., Batzoglou, S., and Sidow, A. 2005. Distribution and intensity of constraint in mammalian genomic sequence. Genome Res. 15: 901-913.
Covert, M.W., Knight, E.M., Reed, J.L., Herrgard, M.J., and Palsson, B.Ø. 2004. Integrating high-throughput and computational data elucidates bacterial networks. Nature 429: 92-96.
Dayhoff, M.O., Schwartz, R.M., and Orcutt, B.C. 1978. A model of evolutionary change in proteins. In Atlas of protein sequence and structure (ed. M.O. Dayhoff), pp. 345-352. National Biomedical Research Foundation, Washington, D.C.
Delcher, A.L., Phillippy, A., Carlton, J., and Salzberg, S.L. 2002. Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Res. 30: 2478-2483.
Dubchak, I. and Frazer, K. 2003. Multi-species sequence comparison: The next frontier in genome annotation. Genome Biol. 4: 122.
Eddy, S.R. 1995. Multiple alignment using hidden Markov models. Proceedings of the Third International Conference on Intelligent Systems for Molecular Biology, pp. 114-120. AAAI Press, Menlo Park, CA.
Elango, N., Thomas, J.W., and Yi, S.V. 2006. Variable molecular clocks in hominoids. Proc. Natl. Acad. Sci. 103: 1370-1375.
Elnitski, L., Hardison, R.C., Li, J., Yang, S., Kolbe, D., Eswara, P., O'Connor, M.J., Schwartz, S., Miller, W., and Chiaromonte, F. 2003. Distinguishing regulatory DNA from neutral sites. Genome Res. 13: 64-72.
ENCODE Project Consortium 2004. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science 306: 636-640.
Felsenstein, J. 2002. Inferring Phylogenies. Sinauer Associates, Sunderland, MA.
Feng, D.F. and Doolittle, R.F. 1987. Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J. Mol. Evol. 25: 351-360.
Fitch, W.M. and Smith, T.F. 1983. Optimal sequence alignments. Proc. Natl. Acad. Sci. 80: 1382-1386.
Fleissner, R. 2003. "Sequence Alignment and Phylogenetic Inference." Ph.D. thesis. Heinrich-Heine-Universität, Düsseldorf.
Fleissner, R., Metzler, D., and von Haeseler, A. 2000. Can one estimate distances from pairwise sequence alignments? In German Conference on Bioinformatics (eds. E. Bornberg-Bauer, et al.), pp. 89-96. Logos Verlag, Heidelberg.
Fleissner, R., Metzler, D., and von Haeseler, A. 2005. Simultaneous statistical multiple alignment and phylogeny reconstruction. Syst. Biol. 54: 548-561.
Gadagkar, S.R. and Kumar, S. 2005. Maximum likelihood outperforms maximum parsimony even when evolutionary rates are heterotachous. Mol. Biol. Evol. 22: 2139-2141.
Gaucher, E.A., Gu, X., Miyamoto, M.M., and Benner, S.A. 2002. Predicting functional divergence in protein evolution by site-specific rate shifts. Trends Biochem. Sci. 27: 315-321.
Gertz, J., Fay, J.C., and Cohen, B.A. 2006. Phylogeny based discovery of regulatory elements. BMC Bioinformatics 7: 266.
Gottgens, B., Barton, L.M., Chapman, M.A., Sinclair, A.M., Knudsen, B., Grafham, D., Gilbert, J.G., Rogers, J., Bentley, D.R., and Green, A.R. 2002. Transcriptional regulation of the stem cell leukemia gene (SCL)-comparative analysis of five vertebrate SCL loci. Genome Res. 12: 749-759.
Gu, X. and Li, W.H. 1995. The size distribution of insertions and deletions in human and rodent pseudogenes suggests the logarithmic gap penalty for sequence alignment. J. Mol. Evol. 40: 464-473.
Hardison, R.C. 2000. Conserved noncoding sequences are reliable guides to regulatory elements. Trends Genet. 16: 369-372.
Hedges, S.B. 2002. The origin and evolution of model organisms. Nat. Rev. Genet. 3: 838-849.
Hedges, S.B. and Kumar, S. 2003. Genomic clocks and evolutionary timescales. Trends Genet. 19: 200-206.
Hein, J. 1990. Unified approach to alignment and phylogenies. Methods Enzymol. 183: 626-645.
Henikoff, S. and Henikoff, J.G. 1992. Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. 89: 10915-10919.
Hertwig, S., de Sá, R.O., and Haas, A. 2004. Phylogenetic signal and the utility of 12S and 16S mtDNA in frog phylogeny. J. Zoological Syst. Evol. Res. 42: 2-18.
Holmes, I. and Durbin, R. 1998. Dynamic programming alignment accuracy. J. Comput. Biol. 5: 493-504.
Jermiin, L., Ho, S.Y., Ababneh, F., Robinson, J., and Larkum, A.W. 2004. The biasing effect of compositional heterogeneity on phylogenetic estimates may be underestimated. Syst. Biol. 53: 638-643.
Keich, U. and Pevzner, P.A. 2002. Subtle motifs: Defining the limits of motif finding algorithms. Bioinformatics 18: 1382-1390.
Kolaczkowski, B. and Thornton, J.W. 2004. Performance of maximum parsimony and likelihood phylogenetics when evolution is heterogeneous. Nature 431: 980-984.
Koonin, E.V. and Galperin, M.Y. 2003. Sequence-Evolution-Function: Computational Approaches in Comparative Genomics. Kluwer Academic, Boston.
Kumar, S. and Gadagkar, S.R. 2001. Disparity index: A simple statistic to measure and test the homogeneity of substitution patterns between molecular sequences. Genetics 158: 1321-1327.
Kumar, S. and Subramanian, S. 2002. Mutation rates in mammalian genomes. Proc. Natl. Acad. Sci. 99: 803-808.
Kumar, S., Tamura, K., and Nei, M. 2004. MEGA3: Integrated software for Molecular Evolutionary Genetics Analysis and sequence alignment. Brief. Bioinform. 5: 150-163.
Kurtz, S., Phillippy, A., Delcher, A.L., Smoot, M., Shumway, M., Antonescu, C., and Salzberg, S.L. 2004. Versatile and open software for comparing large genomes. Genome Biol. 5: R12.
Lake, J.A. 1991. The order of sequence alignment can bias the selection of tree topology. Mol. Biol. Evol. 8: 378-385.
Landan, G. 2005. Multiple sequence alignment errors and phylogenetic reconstruction. In Zoology, p. 93. Tel Aviv University, Tel Aviv.
Lebrun, E., Santini, J.M., Brugna, M., Ducluzeau, A.L., Ouchane, S., Schoepp-Cothenet, B., Baymann, F., and Nitschke, W. 2006. The Rieske protein: A case study on the pitfalls of multiple sequence alignments and phylogenetic reconstruction. Mol. Biol. Evol. 23: 1180-1191.
Lesk, A.M. 2005. Introduction to bioinformatics. Oxford University Press, Oxford, New York.
Lipman, D.J., Altschul, S.F., and Kececioglu, J.D. 1989. A tool for multiple sequence alignment. Proc. Natl. Acad. Sci. 86: 4412-4415.
Lunter, G., Miklos, I., Drummond, A., Jensen, J.L., and Hein, J. 2005. Bayesian coestimation of phylogeny and sequence alignment. BMC Bioinformatics 6: 83.
Margulies, E.H., Vinson, J.P., Miller, W., Jaffe, D.B., Lindblad-Toh, K., Chang, J.L., Green, E.D., Lander, E.S., Mullikin, J.C., and Clamp, M. 2005. An initial strategy for the systematic identification of functional elements in the human genome by low-redundancy comparative sequencing. Proc. Natl. Acad. Sci. 102: 4795-4800.
Margulies, E.H., Chen, C.W., and Green, E.D. 2006. Differences between pair-wise and multi-sequence alignment methods affect vertebrate genome comparisons. Trends Genet. 22: 187-193.
McCue, L.A., Thompson, W., Carmack, C.S., and Lawrence, C.E. 2002. Factors influencing the identification of transcription factor binding sites by cross-species comparison. Genome Res. 12: 1523-1532.
Miklos, I., Lunter, G.A., and Holmes, I. 2004. A "Long Indel" model for evolutionary sequence alignment. Mol. Biol. Evol. 21: 529-540.
Morgenstern, B. 1999. DIALIGN 2: Improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics 15: 211-218.
Morgenstern, B., Frech, K., Dress, A., and Werner, T. 1998. DIALIGN: Finding local similarities by multiple sequence alignment. Bioinformatics 14: 290-294.
Morgenstern, B., Prohaska, S.J., Pohler, D., and Stadler, P.F. 2006. Multiple sequence alignment with user-defined anchor points. Algorithms Mol. Biol. 1: 6.
Morrison, D.A. and Ellis, J.T. 1997. Effects of nucleotide sequence alignment on phylogeny estimation: A case study of 18S rDNAs of apicomplexa. Mol. Biol. Evol. 14: 428-441.
Myers, E.W. and Miller, W. 1988. Optimal alignments in linear space. Comput. Appl. Biosci. 4: 11-17.