Genome Res. 14:2412-2423, 2004
©2004 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/04 $5.00
Letter
Reconstructing large regions of an ancestral mammalian genome in silico
1 Howard Hughes Medical Institute, University of California, Santa Cruz, California 95064, USA
2 National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
3 Department of Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA
It is believed that most modern mammalian lineages arose from a series of rapid speciation events near the Cretaceous-Tertiary boundary. We show that such a phylogeny makes the common ancestral genome sequence an ideal target for reconstruction. Simulations suggest that with methods currently available, we can expect to get 98% of the bases correct in reconstructing megabase-scale euchromatic regions of a eutherian ancestral genome from the genomes of 20 optimally chosen modern mammals. Using actual genomic sequences from 19 extant mammals, we reconstruct 1.1 Mb of ancient genome sequence around the CFTR locus. Detailed examination suggests the reconstruction is accurate and that it allows us to identify features in modern species, such as remnants of ancient transposon insertions, that were not identified by direct analysis. By tracing the predicted evolutionary history of the bases in the reconstructed region, we estimate the amount of DNA turnover due to insertion, deletion, and substitution in the different placental mammalian lineages since the common eutherian ancestor, and find considerable variation between lineages. In coming years, such reconstructions may help in identifying and understanding the genetic features common to eutherian mammals and may shed light on the evolution of human- or primate-specific traits.
Following completion of the human genome sequence, there is now considerable interest in obtaining a more comprehensive understanding of its evolution (International Human Genome Sequencing Consortium [IHGSC] 2001
The hope of learning about long extinct species by recovering and cloning their DNA has engaged the popular as well as the scientific imagination, but the reality of such endeavors falls short of expectations on two grounds. The first is lack of information; there is not enough intact DNA in the modern remains of species that have been extinct for many millions of years to infer ancestral genome sequences (Austin et al. 1997
Maximum likelihood algorithms for the reconstruction of ancestral amino acids or DNA bases have been developed and used by several groups (Yang et al. 1995
We argue that a good target species for a genomic reconstruction is one that has generated a large number of independent, successful descendant lineages through a rapid series of ancestral speciation events. In this case, the problem can be viewed as attempting to reconstruct an original from many independent noisy copies. In the limit of an instantaneous radiation, the accuracy of the reconstruction approaches 100% exponentially fast as the number of copies increases (see Discussion). From the Cretaceous period, a good choice for reconstruction would be the genome of the eutherian ancestor, as this species is believed to have spawned the relatively rapid radiation of the different lineages of modern placental mammals (see Eizirik et al. 2001
Simulations
To assess the reconstructability of ancestral mammalian genomic sequences, we performed a series of computational simulations of the neutral evolution of a hypothetical 50 Kb ancestral genomic region into orthologous regions in 20 modern mammals (Fig. 1). Simulation parameters for substitution, deletion, and insertion were based on the analysis of 1.8 Mb of data from nine mammals in the regions orthologous to the human CFTR locus (Margulies et al. 2003
A crucial first step toward reconstructing ancestral sequences is to build an accurate multiple alignment of the extant sequences, thus establishing orthology relationships among the nucleotides of each sequence. To this end, we used a multiple-sequence alignment tool called TBA (Blanchette et al. 2004
We compared the actual ancestral sequence used in our simulations with the predicted ancestral sequence by aligning them and counting the number of missing bases (those present in the actual ancestor, but not in the reconstruction), added bases (present in the reconstruction, but not in the actual ancestor), and mismatch errors (positions in the reconstruction assigned the incorrect nucleotide). The sum of the rates of all three types of errors was calculated separately at each ancestral node in the phylogenetic tree (Fig. 1). The results showed that under this phylogenetic tree with a relatively rapid placental mammalian radiation, the neutral nonrepetitive regions of the Boreoeutherian ancestral genome that have evolved like those in our simulations can be reconstructed with about 99% base-by-base accuracy from the genomes of 20 present-day mammals. Repetitive regions are not reconstructed as accurately, because they are more often involved in misalignments, which can result in incorrect predictions. Nonetheless, even counting errors in repetitive regions, the total accuracy is >98%. If a reconstructed base is chosen at random, chances are that it lies within an error-free stretch of at least 343 bp, showing that reconstruction errors are often clustered together, leaving large error-free regions. The simulated and reconstructed sequences, as well as statistics validating the simulation process, are available at http://genome.ucsc.edu/ancestors.

Looking at the reconstructability in other ancestral species in the tree, a strong "local tree topology effect" is seen, whereby ancestral sequences at the center of rapid radiations are much more reconstructable than those with longer incident branches. This effect is so strong that sequences of early eutherians living in times of rapid radiation can be reconstructed more accurately than those of most of the more recent ancestors.
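The three error types counted in the comparison above (missing, added, and mismatch) can be tallied column by column from a pairwise alignment of the true and reconstructed ancestors. The following is a minimal sketch, not the authors' implementation: the function names, the `-` gap character, and the choice to normalize by the length of the true ancestor are all assumptions made for illustration.

```python
def count_reconstruction_errors(actual: str, predicted: str, gap: str = "-") -> dict:
    """Count error types between two rows of a pairwise alignment.

    `actual` is the true (simulated) ancestor, `predicted` the reconstruction;
    both are gapped strings of equal length.
    """
    if len(actual) != len(predicted):
        raise ValueError("aligned sequences must have equal length")
    counts = {"missing": 0, "added": 0, "mismatch": 0}
    for a, p in zip(actual.upper(), predicted.upper()):
        if a == gap and p == gap:
            continue                      # empty column, nothing to score
        if p == gap:
            counts["missing"] += 1        # base in the ancestor, absent from the reconstruction
        elif a == gap:
            counts["added"] += 1          # base in the reconstruction, absent from the ancestor
        elif a != p:
            counts["mismatch"] += 1       # both present, wrong nucleotide
    return counts


def total_error_rate(actual: str, predicted: str, gap: str = "-") -> float:
    """Sum of the three error counts per true ancestral base (assumed normalization)."""
    counts = count_reconstruction_errors(actual, predicted, gap)
    ancestral_len = sum(1 for a in actual if a != gap)
    if ancestral_len == 0:
        raise ValueError("actual sequence contains no bases")
    return sum(counts.values()) / ancestral_len


if __name__ == "__main__":
    true_anc = "ACGT-ACGTA"
    recon    = "ACCTTAC-TA"
    print(count_reconstruction_errors(true_anc, recon))
    print(round(total_error_rate(true_anc, recon), 3))
```

In this toy alignment there is one error of each type over nine true ancestral bases. Applied at each internal node of the tree, the same tally would give per-node error rates of the kind summarized in Fig. 1.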
Examining reconstructions made using smaller subsets of this set of 20 species, it was found that, including repetitive regions, an accuracy of about 97% can be achieved using only 10 species chosen to sample most major mammalian lineages (Fig. 2). Sampling only five of the most slowly evolving lineages yields an accuracy of about 94%. Little is gained with our current reconstruction procedures by adding more than 10 species, because the risk of misalignment increases, while the unavoidable loss of information in the early branches persists (dashed box, Fig. 1; also see Discussion). However, further improvements to the multiple alignment methodology might change this.
The accuracy of the reconstruction depends crucially on the length of the early branches. Additional simulations (Supplemental Fig. S1) revealed that if the major placental lineages had diverged instantaneously (early branches of length zero, see Fig. 1), we would be able to reconstruct the simulated Boreoeutherian ancestral sequence, including repetitive regions, with <1% error. In contrast, if the early branch lengths inferred by Eizirik et al. (2001
The accuracy of the reconstruction is less dependent on the overall branch length, within reasonable limits. If the neutral substitution and indel rates used in the model are increased by 25%, which is considerably more than the typical 10% regional neutral rate fluctuations observed in different genomic regions in human-mouse genome comparisons (Hardison et al. 2003
An important assumption in our reconstruction procedure is that the topology of the phylogenetic tree is known in advance. Since the early branches of the eutherian tree are very short, there remains some uncertainty about the precise branching order of the main mammalian lineages. Moreover, in situations of rapid speciation, different regions of the genome may actually have different phylogenetic trees because of incomplete lineage sorting due to different recombination histories (Shedlock et al. 2000
Finally, in addition to estimates of the overall accuracy of the reconstruction, the simulations also suggest how we may estimate the confidence in the reconstruction of the ancestral base at a given site based on properties of the local alignment containing that site. In a situation where the phylogenetic tree and sequence alignment are known to be correct and there are no insertions or deletions, the posterior probabilities of each of the four possible ancestral nucleotides can be explicitly computed using standard substitution models (Yang et al. 1995 Here, we take a heuristic approach to estimating the confidence of the reconstructed base at a given site. The probability that an individual reconstructed base is a mismatch error or an added base is empirically estimated based on local properties of the alignment at and around that position (see Methods). Testing this approach in our simulations, we find that about 98.5% of the nucleotides of our simulated Boreoeutherian ancestral sequence can be reconstructed with at least 90% confidence that they are not mismatches or added bases, and about 95% with at least 99% confidence. An additional 1% of the bases of the ancestral sequence are missing from the reconstructed sequence, but the locations of these omissions cannot be accurately predicted.
Reconstruction of an ancestral region in the CFTR locus
We confirmed that the 96% accuracy estimate is reasonable by analysis of transposable elements whose insertion predated the Boreoeutherian ancestor ("ancestral repeats") (Fig. 3A). For each family of ancestral repeats, a consensus sequence is available, obtained from the many copies of these elements scattered in the genome. The consensus sequence is thought to represent the transposon sequence at the time of its insertion into this and other regions of the ancestral genome (Jurka 2000 d(C, A) + d(A, H), where d(C, A) and d(A, H) are the expected substitutions per site between C and A, and between A and H, respectively. Reconstruction errors in A* would be expected to take this sequence away from the true evolutionary path, resulting in d(C, H) < d(C, A*) + d(A*, H). Figure 3A shows the average distances observed for ancestral repeats of the CFTR region. It indicates that d(C,A*) + d(A*,H) exceeds d(C,H) by 0.04 substitutions per site, which can be verified to correspond to a mismatch error rate in the reconstructed sequence A* of about 2.6%. This roughly confirms our estimate of 96% overall accuracy, since mismatch errors are expected to account for about half of the base-by-base errors made by our method in this case and errors are concentrated in repetitive regions.
Figure 4 illustrates the reconstruction in a noncoding region of the CFTR locus that exhibits a typical level of sequence conservation. This region is located in a 32-Kb intron of the CAV1 gene, about 13 Kb from the 5' exon. The bases in this region are relics left over from the insertion of a MER20 transposon sometime prior to the mammalian radiation, and are thus unlikely to be under selective pressure.
Notice that despite the fact that the alignment of certain species (in particular, mouse, rat, and hedgehog) appears somewhat unreliable, the inference of the presence or absence of a Boreoeutherian ancestral base at a given position is quite straightforward given the alignment, and to a lesser extent, so is the prediction of the actual ancestral base itself. The MER20 consensus is shown for comparison. Most positions where the reconstructed Boreoeutherian ancestral base disagrees with the MER20 consensus are likely due to substitutions in this MER20 relic that predated the Boreoeutherian ancestor, since the support of the reconstructed base is very strong in the extant species. If the MER20 consensus sequence is used as an outgroup in the reconstruction procedure, only two bases (indicated by a longer arrow) are reconstructed differently, indicating that the reconstructed ancestral sequence is very stable and most of it is likely to be correct.

Because the reconstructed Boreoeutherian ancestral sequence is evolutionarily closer to the older mammalian ancestral genomes that existed at the time of the insertions of ancestral transposons, it is superior to the human genome sequence for the recognition of these elements. In essence, it acts as an observatory that allows us to see even farther back in time. When RepeatMasker is run on the inferred Boreoeutherian ancestor, ancient repeat families such as L2 LINEs and MIRs are detected in a significantly larger fraction than when RepeatMasker is run on the human sequence, because they are much less decayed [Table 1, column (b)]. This improved ability to detect very old repeats results in an increase of 2.7% in the estimated total fraction of the human CFTR region that derives from a transposon insertion (from 37.7% to 40.4%).
More importantly, reconstructed ancestral genome sequences allow us to make inferences about the specific evolutionary path of functional elements such as protein-coding regions (Jermann et al. 1995
The accuracy of the inferred ancestral CFTR protein sequence was verified by comparing it to outgroups like chicken and the marsupial Didelphis virginiana (opossum). Of the 1481 amino acids of the ancestral CFTR protein, 1276 are most likely correct by virtue of a quasi-unanimity within eutherian mammals. Of the remaining 205 amino acids where the reconstruction is not completely obvious, 137 amino acids are strongly confirmed by a match in either chicken or opossum, and 29 others could only be weakly confirmed by a match in either frog or Fugu. On the other hand, 15 amino acids could be incorrectly reconstructed as indicated by the failure of the two tests above and by a match between one of the eutherian amino acids and either Didelphis or chicken. Overall, this gives an estimated accuracy of
Sensible reconstruction of hypothesized structural RNAs was also obtained. Two regions of the CFTR locus in introns of the ST7 gene that appear to form stable RNA secondary structures (Margulies et al. 2003
The reconstructed ancestral sequence can also be used to gather statistics on the rates of gain and loss of DNA in different eutherian lineages, and the shifts in substitution spectra. After reconstruction of the Boreoeutherian ancestral sequence from the 19 present-day genomic sequences, we compared it with those sequences to derive these statistics (Table 2). The reconstructed ancestral sequence had a size (1124 Kb) about 10% smaller than those of extant old-world monkeys (1260 Kb on average, with most of the difference due to Alu insertions) and also smaller than those of most other species, with the exception of the two lemurs. The number of inserted and deleted bases in primates is low compared with many other mammals (Thomas et al. 2003
It is predicted that the human sequence differs from that of the Boreoeutherian ancestor in 30.3% of its bases, 21.7% resulting from insertions, and thus not present in the ancestor, and 8.6% resulting from substitutions. In addition, the human sequence has lost about 11.3% of the ancestral bases. Most differences between the human and ancestral sequences derive from primate lineage insertions of transposons, in agreement with other recent studies (IMGSC 2002 The set of 19 species we used is not a uniform sampling of the eutherian phylogenetic tree, but rather is biased toward close human relatives, containing seven old-world monkeys. To ensure that the number of closely related species does not unduly affect the reconstructed ancestor by biasing it toward the human sequence, we repeated the reconstruction procedure, removing all primates but human and lemur. The new reconstructed ancestor was not significantly farther from the human sequence, with 0.113 expected substitutions per site (compared with 0.111 previously), 10.8% deletions (compared with 11.3% previously), and 23.4% insertions (compared with 21.7% previously).
The availability of predicted ancestral sequences at every internal node of the tree offers a unique perspective on the deletion and insertion processes at work along each branch of the tree. Focusing on a 280-kb region where sequences from all 19 mammals were available, the number of microdeletions and microinsertions (of length at most 10 bp) along each branch of the tree was estimated (Fig. 5). We did not attempt to estimate the indel rates along the four deepest branches of the tree because (1) for the two deepest branches of the tree, deletions cannot be distinguished from insertions, and (2) for the two branches incident upon the Boreoeutherian ancestor, deletions and insertions are crucially determined by the presence or absence of aligned bases in armadillo, which is often unreliably aligned. Among the branches where indels can be accurately counted, the rate of deletions is consistently two to three times higher than the rate of insertions, with the lowest deletion/insertion ratios found in the dog and the prosimian lineages, and the highest ones found in the pre-mouse-rat-split rodents, horse, and cow lineages. Deletion and insertion rates clearly do not follow a molecular clock, with rates in primates
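Counting microindels along a single branch can be illustrated with a pairwise comparison of the aligned sequences at the two ends of that branch. The sketch below is only an illustration under stated assumptions: it takes two gapped rows (`parent` for the reconstructed sequence at the upper node, `child` for the lower node), treats each maximal run of gaps as one event, and keeps runs of at most `max_len` columns. The function name, the `-` gap character, and the event definition are hypothetical choices, not the procedure used to produce Fig. 5.

```python
from itertools import groupby


def _column_kind(p: str, c: str, gap: str):
    """Classify one alignment column relative to the parent sequence."""
    if p != gap and c == gap:
        return "deletion"     # base present in parent, lost in child
    if p == gap and c != gap:
        return "insertion"    # base absent in parent, gained in child
    return None               # aligned bases (match or substitution)


def count_microindels(parent: str, child: str, max_len: int = 10, gap: str = "-") -> dict:
    """Count gap runs of length <= max_len along one branch."""
    if len(parent) != len(child):
        raise ValueError("aligned sequences must have equal length")
    # Columns that are gaps in both rows carry no information for this branch.
    kinds = [
        _column_kind(p, c, gap)
        for p, c in zip(parent, child)
        if not (p == gap and c == gap)
    ]
    counts = {"deletion": 0, "insertion": 0}
    for kind, run in groupby(kinds):
        if kind is not None and len(list(run)) <= max_len:
            counts[kind] += 1
    return counts


if __name__ == "__main__":
    parent = "ACGTAC" + "--" + "GT" + "ACGT" * 3 + "AAC"
    child = "AC" + "--" + "AC" + "TT" + "GT" + "-" * 12 + "AAC"
    print(count_microindels(parent, child))
```

In the toy input, the 2-bp deletion and 2-bp insertion are counted, while the 12-bp deletion exceeds the 10-bp cutoff and is ignored. A deletion/insertion ratio for a branch would then be the quotient of the two counts.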
One of the nonintuitive results of this study is the observation that more ancient ancestral genomes can often be reconstructed more accurately than those of their more recent descendants. Why exactly is this so? For simplicity, consider the case of reconstructing a single binary ancestral character state in the root species (e.g., purine vs. pyrimidine at a given site) under a simple model in which the prior probability distribution on the ancestral character is uniform, substitution rates are known, symmetric, homogeneous, and not too high, and the total branch length in the phylogenetic tree from the root ancestor to each of the modern species is the same (i.e., assume a molecular clock). Here, each of n modern species has a state that differs from the ancestral one with the same probability p <1/2. If the tree exhibits a star topology (Fig. 3B), in which each of the modern species derives directly from the ancestor on an independent branch, then it is clear that the maximum likelihood and Bayesian maximum a posteriori reconstructions of the ancestral character agree, and the reconstructed state is the one that is most often observed in the n modern species. The probability of an error in reconstruction is:
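Under the star-topology model just described, the majority-vote reconstruction is wrong whenever more than half of the n leaves differ from the ancestor, so the error probability is a binomial tail. The sketch below computes that tail exactly and compares it with a Hoeffding-type bound, exp(-2n(1/2 - p)^2), which decays exponentially in n. The function names and the fair-coin rule for ties (possible only for even n) are assumptions made for illustration; the expression may differ in form from the one originally printed here.

```python
from math import comb, exp


def star_majority_error(n: int, p: float) -> float:
    """P(majority vote over n independent leaves picks the wrong binary state).

    Each leaf differs from the ancestor independently with probability p < 1/2.
    Ties are assumed to be broken by a fair coin.
    """
    if n < 1 or not (0.0 <= p < 0.5):
        raise ValueError("need n >= 1 and 0 <= p < 1/2")
    err = 0.0
    for k in range(n + 1):                      # k = number of leaves that differ
        prob_k = comb(n, k) * p**k * (1 - p) ** (n - k)
        if 2 * k > n:
            err += prob_k                       # wrong state is in the majority
        elif 2 * k == n:
            err += 0.5 * prob_k                 # tie, coin flip
    return err


def hoeffding_bound(n: int, p: float) -> float:
    """Upper bound on P(at least n/2 leaves differ), from Hoeffding's inequality."""
    return exp(-2.0 * n * (0.5 - p) ** 2)


if __name__ == "__main__":
    for n in (1, 5, 11, 21, 41):
        print(n, star_majority_error(n, 0.2), hoeffding_bound(n, 0.2))
```

For example, with p = 0.1 and n = 3 the exact error is 3(0.1)^2(0.9) + (0.1)^3 = 0.028, and the error keeps shrinking as further independent leaves are added.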
In contrast, a non-star topology (Fig. 3C) such as a binary tree that has the same total root-to-leaf branch length and the same number n of modern species at the leaves has two nonzero length branches from the root ancestor R leading to intermediate ancestors A and B, and information is irrevocably lost along these two branches. No matter how large the number n of modern descendant species derived from A and B, one can do no better at reconstructing the state at R than if one knew for certain the state in its immediate descendants A and B. Even with this knowledge, the accuracy of reconstruction of R from A and B will be strictly <100% for all reasonable models and nonzero branch lengths. The reconstruction gets poorer the longer the branch lengths are to A and B. This extends to the case where the ancestor R being reconstructed has a bounded number of independent immediate descendants and to the case where descendants of an earlier ancestor of R (outgroups) are also available. The long branches connecting them to the rest of the tree are why some more recent ancestral sequences in the tree of Figure 1 are less reconstructable than the Boreoeutherian ancestor, which acts almost like the root of a star topology.
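The information loss on the two branches leaving the root can be made concrete in the same binary-character model. Suppose the states at A and B are known exactly and each differs from the state at R independently with probability q (a hypothetical symbol for the per-branch change probability, introduced only for this sketch). Enumerating all outcomes gives the error of the best possible estimate of R, which is a floor that no number of leaves below A and B can lower.

```python
from itertools import product


def best_error_given_children(q: float, n_children: int = 2) -> float:
    """Bayes-optimal error for a binary root state given its children's exact states.

    Assumes a uniform prior on the root and an independent flip with
    probability q on each root-to-child branch.
    """
    if not (0.0 <= q <= 0.5):
        raise ValueError("need 0 <= q <= 1/2")

    def joint(root: int, states: tuple) -> float:
        prob = 0.5                               # uniform prior on the root state
        for s in states:
            prob *= q if s != root else 1.0 - q  # flip or no flip on this branch
        return prob

    # For each observed configuration, the optimal guess is the more probable
    # root state; the error mass is therefore the smaller of the two joints.
    return sum(
        min(joint(0, states), joint(1, states))
        for states in product((0, 1), repeat=n_children)
    )


if __name__ == "__main__":
    for q in (0.01, 0.05, 0.1):
        print(q, best_error_given_children(q))
```

With two children the floor works out to exactly q: both branches flip with probability q^2 (always an error), and exactly one flips with probability 2q(1 - q) (a tie, wrong half the time), giving q^2 + q(1 - q) = q. Longer branches to A and B therefore translate directly into a higher irreducible error at R.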
The above analysis shows that the star tree is always the best topology for reconstruction in the limit as the number n of observed species becomes large, while the time to the common ancestor remains fixed. A stronger claim is that for every n and every time to the common ancestor, the star tree with n leaves is always more favorable for ancestral reconstruction than any branching tree that has internal "shared" nodes (but the same time to the common ancestor), because the star topology maximizes the mutual information between the residues at the leaves and at the root (Schultz et al. 1996
While suggestive that reconstruction of a reasonable approximation to a eutherian ancestral euchromatic genome may be within our reach, our simulation results have a number of important limitations as follows: (1) The rates of substitutions, deletions, and small insertions are assumed to be constant across sequence position and homogeneous across branches, with branch lengths proportional to those in a particular tree (Eizirik et al. 2001 Despite these shortcomings, our validation of the reconstruction by both simulation and ancestral repeat and codon analysis on actual data suggests that for regions like CFTR, which are likely to be typical, the above issues are not severe enough to prevent a reasonably accurate reconstruction.
More significant technical challenges remain if we wish to conduct in vivo functional tests of reconstructed ancestral genomic regions, either in cell lines or in mouse models. Multikilobase sequences of transgenic DNA can be inserted into mouse embryonic stem cells via homologous recombination ("knockin") methods (Prosser and Rastan 2003
Extant eutherian species are variations on a common "mammalian theme." Accurate reconstruction of large genomic regions of a eutherian ancestor may help us identify and understand the common functional elements of that theme, as well as the lineage-specific evolutionary innovations that led to the modern variations on it. Because distances are reduced and the direction of change can be resolved, much can be learned by comparing mammalian genomes to their common ancestor rather than pairwise among themselves. Because the number of substitutions per site leading from the placental ancestral genome to the human genome is only one third of that from the ancestor to mouse (Cooper et al. 2003
Simulation procedure
We built a simulation procedure, based on the Rose program (Stoye et al. 1997

We use the above methods to simulate evolution from an ancestral mammalian sequence forward to modern versions of that sequence, simulating speciation events at the branch points of the tree, and substitutions, insertions, and deletions along each branch. To initiate such a simulation, we first need to generate a hypothetical ancestral mammalian sequence to go at the root of the tree. This is the sequence that we will later try to reconstruct from the sequences at the leaves of the tree. This hypothetical ancestral mammalian sequence is generated by another simulation, i.e., starting with a repeat-free random sequence with 40% GC content, we simulate its evolution for a time and at a rate similar to those between human and mouse, using the same set of mutational operations as previously described, but inserting transposons that are believed to predate the mammalian radiation. This simulated ancestral sequence thus has a repeat content and age distribution that should approximate that of the actual ancestral mammalian genome.
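The forward simulation described above (a root sequence copied at each speciation event and mutated along each branch) can be sketched in a few lines. This toy version is not the Rose-based procedure: it uses single-base events only, treats rate times branch length as a per-site probability (reasonable only for small values), and omits transposon insertion. All names, the nested-tuple tree encoding, and the numeric rates are hypothetical.

```python
import random

BASES = "ACGT"


def random_sequence(n: int, gc: float, rng: random.Random) -> str:
    """Repeat-free random sequence with the given GC fraction."""
    weights = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]   # A, C, G, T
    return "".join(rng.choices(BASES, weights=weights, k=n))


def evolve_branch(seq, length, sub_rate, del_rate, ins_rate, rng):
    """Apply single-base deletions, substitutions, and insertions along one branch."""
    out = []
    for base in seq:
        if rng.random() < del_rate * length:
            continue                                          # base deleted
        if rng.random() < sub_rate * length:
            base = rng.choice([b for b in BASES if b != base])
        out.append(base)
        if rng.random() < ins_rate * length:
            out.append(rng.choice(BASES))                     # 1-bp insertion after this base
    return "".join(out)


def simulate_tree(tree, parent_seq, rates, rng):
    """tree = (name, branch_length, [children]); returns {leaf_name: sequence}."""
    name, length, children = tree
    sub_rate, del_rate, ins_rate = rates
    seq = evolve_branch(parent_seq, length, sub_rate, del_rate, ins_rate, rng)
    if not children:
        return {name: seq}
    leaves = {}
    for child in children:                                    # speciation: each child inherits seq
        leaves.update(simulate_tree(child, seq, rates, rng))
    return leaves


if __name__ == "__main__":
    rng = random.Random(0)
    root = random_sequence(60, 0.4, rng)
    toy_tree = ("root", 0.0, [("A", 0.1, []),
                              ("AB", 0.02, [("B", 0.1, []), ("C", 0.3, [])])])
    for leaf, seq in simulate_tree(toy_tree, root, (1.0, 0.1, 0.05), rng).items():
        print(leaf, seq)
```

Because the true root sequence is retained, the leaf sequences produced this way can be aligned and fed to a reconstruction method, and the result scored against the known ancestor.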
Alignment and reconstruction
In experiments using actual sequence data from present-day mammals, the simulation steps are omitted, and the same alignment and reconstruction procedure is followed.
Base-by-base confidence estimates
We thank Jim Kent, Arian Smit, Adam Siepel, Gill Bejerano, Elliot Margulies, Brian Lucena, Leonid Chindelevitch, and Ron Davis for helpful discussions and suggestions. W.M. was supported by grant HG-02238 from the National Human Genome Research Institute, E.G. was supported by NHGRI, D.H. and M.B. were supported by NHGRI Grant 1P41HG02371 and the Howard Hughes Medical Institute. Finally, we thank the NISC Comparative Sequencing Program for providing multispecies comparative sequence data.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.2800104.
4 Present address: McGill University, Montreal, Quebec H3A 2B4 Canada.
5 Corresponding authors.
[Supplemental material is available online at www.genome.org and http://genome.ucsc.edu/ancestors.]
Adey, N.B., Tollefsbol, T.O., Sparks, A.B., Edgell, M.H., and Hutchison III, C.A. 1994. Molecular resurrection of an extinct ancestral promoter for mouse L1. Proc. Natl. Acad. Sci. 91: 1569-1573.
Auriche, C., Carpani, D., Conese, M., Caci, E., Zegarra-Moran, O., Donini, P., and Ascenzioni, F. 2002. Functional human CFTR produced by a stable minichromosome. EMBO Rep. 3: 862-868.
Austin, J.J., Ross, A.J., Smith, A.B., Fortey, R.A., and Thomas, R.H. 1997. Problems of reproducibility: Does geologically ancient DNA survive in amber-preserved insects? Proc. R. Soc. Lond. B Biol. Sci. 264: 467-474.
Bejerano, G., Pheasant, M., Makunin, I., Stephen, S., Kent, W.J., Mattick, J.S., and Haussler, D. 2004. Ultraconserved elements in the human genome. Science 304: 1321-1325.
Bienvenu, T., Petitpretz, P., Beldjord, C., and Kaplan, J.C. 1994. A missense mutation (F87L) in exon 3 of the cystic fibrosis transmembrane conductance regulator gene. Hum. Mutat. 3: 395-396.
Birnbaum, D., Coulier, F., Pebusque, M.J., and Pontarotti, P. 2000. Paleogenomics: Looking in the past to the future. J. Exp. Zool. 288: 21-22.
Blanchette, M., Kunisawa, T., and Sankoff, D. 1999. Gene order breakpoint evidence in animal mitochondrial phylogeny. J. Mol. Evol. 49: 193-203.
Blanchette, M., Kent, W.J., Riemer, C., Elnitski, L., Smit, A.F., Roskin, K.M., Baertsch, R., Rosenbloom, K., Clawson, H., Green, E.D., et al. 2004. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 14: 708-715.
Boffelli, D., McAuliffe, J., Ovcharenko, D., Lewis, K.D., Ovcharenko, I., Pachter, L., and Rubin, E.M. 2003. Phylogenetic shadowing of primate sequences to find functional regions of the human genome. Science 299: 1391-1394.
Bourque, G., Pevzner, P.A., and Tesler, G. 2004. Reconstructing the genomic architecture of ancestral mammals: Lessons from human, mouse, and rat genomes. Genome Res. 14: 507-516.
Challem, J.J. 1997. Did the loss of endogenous ascorbate propel the evolution of Anthropoidea and Homo sapiens? Med. Hypotheses 48: 387-392.
Chang, B.S. and Donoghue, M.J. 2000. Recreating ancestral proteins. Trends Ecol. Evol. 15: 109-114.
Chang, B.S., Jonsson, K., Kazmi, M.A., Donoghue, M.J., and Sakmar, T.P. 2002. Recreating a functional ancestral archosaur visual pigment. Mol. Biol. Evol. 19: 1483-1489.
Cooper, G.M., Brudno, M., Green, E.D., Batzoglou, S., and Sidow, A. 2003. Quantitative estimates of sequence divergence for comparative analyses of mammalian genomes. Genome Res. 13: 813-820.
Cunningham, C.W., Omland, K.E., and Oakley, T.H. 1998. Reconstructing ancestral states, a critical reappraisal. Trends Ecol. Evol. 13: 361-368.
Eizirik, E., Murphy, W.J., and O'Brien, S.J. 2001. Molecular dating and biogeography of the early placental mammal radiation. J. Hered. 92: 212-219.
El-Mabrouk, N. and Sankoff, D. 1999. On the reconstruction of ancient doubled circular genomes using minimum reversals. Genome Inform. Ser. Workshop, Genome Inform. 10: 83-93.
Enard, W., Przeworski, M., Fisher, S.E., Lai, C.S., Wiebe, V., Kitano, T., Monaco, A.P., and Paabo, S. 2002. Molecular evolution of FOXP2, a gene involved in speech and language. Nature 418: 869-872.
Evans, W., Kenyon, C., Peres, Y., and Schulman, L. 2000. Broadcasting on trees and the Ising model. Ann. Appl. Probab. 10: 410-433.
Fredslund, J., Hein, J., and Scharling, T. 2003. A large version of the small parsimony problem. Lecture Notes in Bioinformatics, Proc. WABI'03. 2812: 417-432.
Gaucher, E.A., Thomson, J.M., Burgan, M.F., and Benner, S.A. 2003. Inferring the palaeoenvironment of ancient bacteria on the basis of resurrected proteins. Nature 425: 285-288.
Goodman, M., Barnabas, J., Matsuda, G., and Moore, G.W. 1971. Molecular evolution in the descent of man. Nature 233: 604-613.
Graphodatsky, A.S., Yang, F., Perelman, P.L., O'Brien, P.C., Serdukova, N.A., Milne, B.S., Biltueva, L.S., Fu, B., Vorobieva, N.V., Kawada, S.I., et al. 2002. Comparative molecular cytogenetic studies in the order Carnivora: Mapping chromosomal rearrangements onto the phylogenetic tree. Cytogenet Genome Res. 96: 137-145.
Guindon, S. and Gascuel, O. 2003. PHYML: A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52: 696-704.
Hardison, R.C. 2000. Conserved noncoding sequences are reliable guides to regulatory elements. Trends Genet. 16: 369-372.
Hardison, R.C., Roskin, K.M., Yang, S., Diekhans, M., Kent, W.J., Weber, R., Elnitski, L., Li, J., O'Connor, M., Kolbe, D., et al. 2003. Covariation in frequencies of substitution, deletion, transposition, and recombination during eutherian evolution. Genome Res. 13: 13-26.
Hasegawa, M., Kishino, H., and Yano, T. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22: 160-174.
Hein, J. 1989. A new method that simultaneously aligns and reconstructs ancestral sequences for any number of homologous sequences, when the phylogeny is given. Mol. Biol. Evol. 6: 649-668.
Hein, J., Wiuf, C., Knudsen, B., Moller, M.B., and Wibling, G. 2000. Statistical alignment: Computational properties, homology testing and goodness-of-fit. J. Mol. Biol. 302: 265-279.
Hillis, D.M., Huelsenbeck, J.P., and Cunningham, C.W. 1994. Application and accuracy of molecular phylogenies. Science 264: 671-677.
Hoeffding, W. 1963. Probability inequalities for sums of bounded random variables. J. Amer Statist. Assoc. 58: 13-27.
Huelsenbeck, J.P. and Bollback, J. 2001. Empirical and hierarchical Bayesian estimation of ancestral states. Syst. Biol. 50: 351-366.
International Human Genome Sequencing Consortium (IHGSC). 2001. Initial sequencing and analysis of the human genome. Nature 409: 860-921.