Published online before print
December 14, 2005, 10.1101/gr.4305906 Genome Res. 16:215-222, 2006 ©2006 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/06 $5.00
Letter
Mutation hot spots in mammalian mitochondrial DNA
Centre National de la Recherche Scientifique, Unité Mixte de Recherche (CNRS UMR) 5171-"Génome, Populations, Interactions, Adaptation," Université Montpellier 2, 34095 Montpellier, France
Animal mitochondrial DNA is characterized by a remarkably high level of within-species homoplasy, that is, phylogenetic incongruence between sites of the molecule. Several investigators have invoked recombination to explain it, challenging the dogma of maternal, clonal mitochondrial inheritance in animals. Alternatively, a high level of homoplasy could be explained by the existence of mutation hot spots. By using an exhaustive mammalian data set, we test the hot spot hypothesis by comparing patterns of site-specific polymorphism and divergence in several groups of closely related species, including hominids. We detect significant co-occurrence of synonymous polymorphisms among closely related species in various mammalian groups, and a correlation between the site-specific levels of variability within humans on the one hand and between Hominoidea species on the other, indicating that mutation hot spots actually exist in mammalian mitochondrial coding regions. The data as a whole, however, cannot be explained by a simple mutation hot spots model. Rather, we show that the site-specific mutation rate varies quickly in time, so that the same sites are not hypermutable in distinct lineages. This study provides a plausible mutation model that potentially accounts for the peculiar distribution of mitochondrial sequence variation in mammals without the need for invoking recombination. It also gives hints about the proximal causes of mitochondrial site-specific hypermutability in humans.
Mitochondrial DNA sequence variation in animals is notoriously characterized by a high amount of homoplasy, i.e., phylogenetic/genealogic conflict between sites. This is true between species, decreasing the efficiency of mitochondrial markers for phylogenetic analyses (see Springer et al. 2001
There are two major mechanisms potentially explaining the occurrence of homoplasy within species. The first one is recombination. When partial genetic exchanges occur between distantly related individuals, the various segments of the recombined molecules are phylogenetically incongruent, because they actually have distinct genealogical histories. Alternatively, homoplasy can be generated by convergence due to multiple mutations. If two distantly related individuals independently receive the same mutation at site i, then site i will wrongly support their grouping, in conflict with other sites in the data set. The high amount of homoplasy in mitochondrial DNA could therefore be due to the phylogenetic noise introduced by mutation hot spots. In principle, an obvious difference between the two models is the expected distribution of the number of distinct states taken by polymorphic sites: Mutation hot spots, not recombination, should generate three- or four-state polymorphisms. Mitochondrial DNA, however, undergoes more transitions (C
A number of studies have attempted to demonstrate the occurrence of recombination in animal mitochondria. Recombination first appeared supported in humans by linkage disequilibrium (Awadalla et al. 1999
Curiously, despite the importance of the debate, the alternative mutation hot spots hypothesis has not been examined in depth. In their seminal article, Eyre-Walker et al. (1999
In this article, we try to detect mutation hot spots from mitochondrial DNA sequence variation and ask whether they can explain the high level of homoplasy observed within species. Mutation hot spots, if any, should result in the co-occurrence of polymorphisms between closely related species, assuming that a hot spot in species 1 is still hot in species 2. They should also imply a correlation across sites of within-species and between-species variability: a hot spot should contribute both to polymorphism and divergence and be variable both within and between species. By using mammals as a model taxon, we take a phylogenetic approach to check these predictions of the mutation hot spots hypothesis and to try to elucidate the causes of the high level of homoplasy in mitochondrial DNA.
Data sets
Two data sets were built from public databases. We first extracted from Polymorphix (Bazin et al. 2005 T and A↔G changes) at third-codon positions, following the method of Eyre-Walker et al. (1999 90% of observed polymorphisms. Data sets included 368-380 third-codon positions, and the percentage of polymorphic third-codon positions varied from 1.05% (Lemur catta) to 51.05% (Phyllotis xanthopygus).
In addition to this mammalian cytochrome b data set (see Supplemental material), a Hominoidea full-genome data set was built by gathering 560 human mitochondrial sequences (Herrnstadt et al. 2002
Mammalian cytochrome b
Results are given in Figure 1. Many but not all data sets showed significant homoplasy. The amount of homoplasy is correlated with the proportion of polymorphic sites, as expected: little variation implies little conflict. The human data set showed significant homoplasy (observed: eight, expected: 1.74, P < 10^-3), consistent with previous studies (Eyre-Walker et al. 1999
Co-occurrence analysis
Mutation hot spots, if any, should tend to generate polymorphisms in several species, resulting in the co-occurrence of polymorphic sites among closely related species. We first checked this prediction at the genus level; this concerns only polyspecific genera, i.e., genera represented by more than one species. For a given polyspecific genus, we call co-occurrence a site polymorphic in strictly more than half the number of species represented. This (arbitrary) threshold is two for genera represented by two or three species, three for genera represented by four or five species, etc. Note that such co-occurrences actually correspond to several mutations having appeared independently in distinct species. Shared alleles due to ancestral polymorphism or secondary introgression were removed before the analysis by making each species monophyletic (see Data sets section).
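The co-occurrence definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the toy presence/absence table are hypothetical, and polymorphism is assumed to have already been scored per site and per species.

```python
def cooccurrence_threshold(n_species):
    """Smallest species count that is strictly more than half of n_species."""
    return n_species // 2 + 1


def count_cooccurrences(polymorphic):
    """Count sites polymorphic in strictly more than half of the species.

    `polymorphic` holds one list of booleans per species (True means the
    site is polymorphic in that species); all lists have the same length.
    """
    threshold = cooccurrence_threshold(len(polymorphic))
    count = 0
    for site_flags in zip(*polymorphic):
        if sum(site_flags) >= threshold:
            count += 1
    return count


# Hypothetical toy genus: three species, six third-codon positions.
toy_genus = [
    [True, False, True, False, False, True],
    [True, False, False, False, True, True],
    [False, False, True, False, False, True],
]
n_cooccurrences = count_cooccurrences(toy_genus)
```

With three species the threshold is two, so only sites polymorphic in at least two of the three toy species are counted.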
For every genus, the observed number of co-occurrences was compared to the expectation under the hypothesis of independence of mutation events between species. Sites in each species were randomly permuted 1000 times, and the amount of co-occurrence of polymorphisms was recomputed from shuffled data sets. Randomizing the location of polymorphic sites within species removes the effect of potential mutation hot spots on polymorphism co-occurrence. The P-value was defined as the proportion of randomized data sets for which co-occurrence was higher than in the real data. Purines and pyrimidines were randomized separately in this procedure; i.e., a purine site in the real data set was kept a purine in randomized data sets. We did so because the purine transition rate is generally higher than the pyrimidine transition rate (Tamura and Nei 1993
An excess of co-occurrence of polymorphisms was detected in 22 polyspecific genera out of 27, and it was significant in 11 cases. These proportions reach 15 out of 18 and 10 out of 18, respectively, if one considers only genera in which at least one species shows significant homoplasy. This analysis supports the existence of mutation hot spots in mammalian cytochrome b third-codon positions. The effect is strong in genera Clethrionomys (Rodentia, Arvicolinae), Neotoma (Rodentia, Sigmodontinae), and Sorex (Insectivora, Soricidae), for instance (Table 1). In other groups, however, no significant co-occurrence was detected, although homoplasy is strong (e.g., Apodemus, Sigmodon). For these groups, and more generally, we asked whether the observed amounts of co-occurrence are compatible with a "pure" hot spots model.
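The permutation procedure described above might look as follows in Python. This is a minimal sketch, not the authors' program: the function names are hypothetical, and it assumes for simplicity that the purine or pyrimidine status of each site is the same in every species of the genus.

```python
import random


def count_cooccurrences(polymorphic):
    """Number of sites polymorphic in strictly more than half of the species."""
    threshold = len(polymorphic) // 2 + 1
    return sum(1 for flags in zip(*polymorphic) if sum(flags) >= threshold)


def shuffle_within_class(flags, is_purine, rng):
    """Permute polymorphism flags among purine sites and among pyrimidine
    sites separately, so a purine site always receives a purine site's flag."""
    shuffled = list(flags)
    for wanted in (True, False):
        idx = [i for i, p in enumerate(is_purine) if p == wanted]
        values = [flags[i] for i in idx]
        rng.shuffle(values)
        for i, value in zip(idx, values):
            shuffled[i] = value
    return shuffled


def cooccurrence_p_value(polymorphic, is_purine, n_perm=1000, seed=0):
    """Return (observed co-occurrence, proportion of randomized data sets
    whose co-occurrence is strictly higher than the observed one)."""
    rng = random.Random(seed)
    observed = count_cooccurrences(polymorphic)
    n_higher = 0
    for _ in range(n_perm):
        randomized = [shuffle_within_class(sp, is_purine, rng)
                      for sp in polymorphic]
        if count_cooccurrences(randomized) > observed:
            n_higher += 1
    return observed, n_higher / n_perm
```

Shuffling each species independently keeps its number of polymorphic purine and pyrimidine sites fixed while destroying any tendency of the same sites to be polymorphic in several species.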
Hot spots model
To achieve this aim, we simulated data sets at the genus level under the hot spots hypothesis. For each polyspecific genus, sequences from all species were gathered in a single file, and a maximum-likelihood phylogenetic analysis was performed. Then 100 data sets were simulated by using the inferred tree, branch lengths, and rate matrix, and a gamma distribution of rates across sites, thus mimicking mutation hot spots and assuming shared hot spots between species. We call this model the constant hot spots model. The shape parameter of the assumed distribution was tuned so that simulated data sets resemble the observed one with respect to the number of polymorphic sites and amount of homoplasy within species. Co-occurrence of polymorphisms was then computed for each simulated data set and compared with the actual one. For 20 genera out of 27, the observed level of co-occurrence of polymorphisms between species was lower than expected under the constant hot spots hypothesis, and this trend was significant in eight genera. For these genera, the constant hot spots model cannot explain both the observed level of homoplasy within species and the observed level of polymorphism co-occurrence between species: When more hot spots were introduced in the simulations by increasing the variance of the assumed distribution in order to equate the expected and observed amounts of co-occurrence, the simulated data sets showed significantly more homoplasy within species than did the actual ones (data not shown). Something in the constant hot spots model must therefore be wrong, at least for eight genera. This model assumes a common genealogy for all the sites, i.e., no recombination. A departure from this assumption could of course explain the observed pattern: recombination generates some homoplasy but no co-occurrence (since it does not affect the location of polymorphic sites).
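The intuition behind the constant hot spots model can be illustrated with a toy simulation. It is a deliberately simplified sketch, not the tree-based maximum-likelihood simulation used in the study: two species are simulated without any genealogy, a site is called polymorphic when at least one mutation occurs, and all names and parameter values (shape parameter, number of sites, per-site diversity) are assumptions chosen for illustration.

```python
import math
import random


def simulate_polymorphic_sites(rates, diversity, rng):
    """A site is polymorphic if at least one mutation occurs, the number of
    mutations being Poisson with mean rate * diversity."""
    return [rng.random() < 1.0 - math.exp(-r * diversity) for r in rates]


def draw_rates(alpha, n_sites, rng):
    """Gamma-distributed site rates with mean 1 (shape alpha, scale 1/alpha)."""
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_sites)]


def mean_cooccurrence(alpha, shared, n_sites=375, diversity=0.2,
                      n_rep=200, seed=0):
    """Average number of sites polymorphic in both of two simulated species.

    shared=True  : both species use the same site rates (constant hot spots).
    shared=False : each species draws its own rates (hot spots not conserved).
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(n_rep):
        rates_1 = draw_rates(alpha, n_sites, rng)
        rates_2 = rates_1 if shared else draw_rates(alpha, n_sites, rng)
        sp1 = simulate_polymorphic_sites(rates_1, diversity, rng)
        sp2 = simulate_polymorphic_sites(rates_2, diversity, rng)
        total += sum(1 for a, b in zip(sp1, sp2) if a and b)
    return total / n_rep


shared_hot_spots = mean_cooccurrence(alpha=0.2, shared=True)
independent_hot_spots = mean_cooccurrence(alpha=0.2, shared=False)
```

With a small shape parameter a few sites carry most of the mutations, so sharing those sites between species raises co-occurrence well above what is obtained when each species has its own hot spots.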
Another assumption of the constant hot spots model, however, is constancy in time of the mutation rate of every site, which implies that hot spots are shared between species. A departure from this assumption, i.e., site-specific mutation rate variation, would also decrease the observed co-occurrence. One prediction of this hypothesis is that polymorphism co-occurrence should decrease as species diverge; closely related species should tend to share more hot spots than do distantly related species. We calculated for each polyspecific genus the difference between the expected amount of co-occurrence under the constant hot spots model (averaged over simulations) and the observed one. This co-occurrence shortage was plotted against the average nucleotide divergence between species. Figure 2 shows that the constant hot spots model correctly fits the data when species are closely related, whereas genera including distantly related species tend to show a lower level of co-occurrence than expected. This pattern is consistent with the hypothesis of site-specific mutation rate variation in time, and between species. Recombination applies independently of species divergence, so that recombination alone cannot generate such a relationship.
Higher taxonomic levels
Primate full genome
Human is by far the most thoroughly studied species as far as mitochondrial diversity is concerned. H. sapiens, however, had to be excluded from the co-occurrence analysis because no complete cytochrome b polymorphism data set is available from its close relatives. What we have, however, are hundreds of complete human mitochondrial genomes, plus eight complete sequences from other Hominoidea species. Given the extensive sampling available, we tried to detect potential mutational hot spots through a phylogenetic analysis, first within H. sapiens and then between species. Under the constant hot spots model, sites showing a strong level of variability within species should also be variable between species; note that this rationale requires the additional assumption of selective neutrality of mutations, as we discuss below.
Human hypervariable sites
These sites are obvious outliers of the site-specific rate distribution, i.e., hypervariable sites. The total tree length for the human data set is 0.13 substitution per site. This number becomes 0.53 per site if one conservatively assumes that observable mutations occur only at third-codon positions of protein coding genes, the other sites being invariant due to strong selection. Even with this unrealistic assumption, the probability that one site or more undergoes 10 changes is of the order of 10^-6 under the hypothesis of evenly distributed mutations and no recombination, while three such sites are observed in the real data set. Similarly, the expected number of sites showing five mutations or more is 1.13 under the hypothesis of equal mutation rate for every site, while we observe 26 such sites. We checked that these results were robust to uncertainties in the phylogenetic reconstruction. We generated 100 alternative tree topologies by using the bootstrap procedure, and reperformed the analysis. The 26 putative hot spots listed in Table 2 essentially remained hot when the tree topology varied: The minimal parsimony score over the 26 × 100 trials was three, and the minimal average (over trees) parsimony score was 4.8.
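The kind of calculation behind these expectations can be sketched with a Poisson approximation, in which every site receives a Poisson number of mutations with mean equal to the tree length per site. This is an assumption made here for illustration and may differ from the authors' exact computation. The number of third-codon positions (`N_SITES`) is likewise an illustrative guess, so the output is not expected to reproduce the reported figures exactly.

```python
import math


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    below = sum(math.exp(-lam) * lam ** i / math.factorial(i)
                for i in range(k))
    return 1.0 - below


def expected_sites_at_least(k, lam, n_sites):
    """Expected number of sites with k or more mutations under equal rates."""
    return n_sites * poisson_tail(k, lam)


def prob_any_site_at_least(k, lam, n_sites):
    """Probability that at least one site shows k or more mutations."""
    return 1.0 - (1.0 - poisson_tail(k, lam)) ** n_sites


TREE_LENGTH = 0.53  # substitutions per third-codon position (from the text)
N_SITES = 3800      # assumed number of third-codon positions (illustrative)

expected_five_plus = expected_sites_at_least(5, TREE_LENGTH, N_SITES)
prob_ten_plus = prob_any_site_at_least(10, TREE_LENGTH, N_SITES)
```

Under equal rates, sites with five or more changes are expected to be rare and sites with ten changes essentially absent, which is why the 26 and three sites observed stand out as outliers.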
The detected hypervariable sites are located in eight distinct protein-coding genes, the two ribosomal RNA genes, and one transfer RNA gene (Table 2). The density of hypervariable sites appears higher around the D-loop and lower opposite to the D-loop (no hot spots were detected between positions 6300 and 10300). Three of the detected sites showed three distinct nucleotide states. Most mutations in hypervariable sites are transitions (Table 2), which is typical of mammalian mitochondrial DNA evolution (Reyes et al. 1998
Among the 26 hypervariable sites listed in Table 2, 22 involve an A We asked whether apparent mutation hot spots in H. sapiens were also divergence hot spots between Hominoidea. We built a data set including six nonhuman Hominoidea species and reproduced the above described analysis by using the well-supported (H. lar, ((P. pygmaeus, P. abelii), (G. gorilla, (P. paniscus, P. troglodytes)))) model tree. The site-specific parsimony score varied from zero to four (total tree length: 0.723 substitution per site). Among the 26 sites showing five or more mutations in humans, 11 showed no substitution between nonhuman Hominoidea species. One site showed 11 mutations in humans but no change between species. These results appear in contradiction with the constant hot spots hypothesis. The discrepancy, however, could be caused by natural selection. Slightly deleterious mutations can segregate as polymorphic but have a low fixation probability, so that they rarely contribute to divergence between species. Sites detected as hot spots in humans but invariant between species might be so because they involve deleterious mutations. To approach mutational effects only, we focused on the third-codon positions of protein-coding sequences.
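A site-specific parsimony score on the model tree quoted above can be computed with Fitch's small-parsimony algorithm for unordered states. The sketch below is illustrative only: the choice of Fitch's algorithm, the nested-tuple tree encoding, and the example site pattern are assumptions, not details taken from the study.

```python
# Model tree from the text, written as nested pairs (rooted, binary).
HOMINOIDEA_TREE = (
    "H. lar",
    (
        ("P. pygmaeus", "P. abelii"),
        ("G. gorilla", ("P. paniscus", "P. troglodytes")),
    ),
)


def fitch_score(tree, states):
    """Minimum number of changes needed to explain one site on the tree.

    `states` maps each species name to its nucleotide at the site.
    """
    def visit(node):
        if isinstance(node, str):          # leaf: observed state, no cost
            return {states[node]}, 0
        left, right = node
        set_l, cost_l = visit(left)
        set_r, cost_r = visit(right)
        common = set_l & set_r
        if common:                         # children agree: no extra change
            return common, cost_l + cost_r
        return set_l | set_r, cost_l + cost_r + 1  # disagreement: one change

    return visit(tree)[1]


# Hypothetical site pattern, for illustration only.
example_site = {
    "H. lar": "A",
    "P. pygmaeus": "G",
    "P. abelii": "G",
    "G. gorilla": "A",
    "P. paniscus": "A",
    "P. troglodytes": "G",
}
example_score = fitch_score(HOMINOIDEA_TREE, example_site)
```

Applying the same function to every alignment column gives one score per site, which can then be compared with the corresponding within-human scores.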
Third-codon positions
We performed simulations to further check the hot spots hypothesis. Data sets were simulated under the TN93 +
We made use of two mammalian polymorphism DNA sequence data sets to try to detect the existence of mutation hot spots in the mitochondrial genome. The cytochrome b analysis revealed a significant amount of third-codon-position polymorphism co-occurrence among related species, rejecting the hypothesis of equal mutation rates across synonymous sites. Simulations showed that a pure hot spots model can account for the observed within-species homoplasy and between-species polymorphism co-occurrence in genera including species that have diverged little. When species reach 10%-12% sequence divergence or more, the constant hot spots model predicts too much co-occurrence (Fig. 2), suggesting that site-specific mutation rates vary in time. The Hominoidea full genome analysis confirmed these findings and generalized them to non-cytochrome b data: An A↔G-biased set of hypervariable sites was detected in humans, and the within-human and between-primates site-specific parsimony scores were weakly but significantly correlated, implying the existence of mutation hot spots. This correlation, however, was lower than expected under a constant hot spots model.
Eyre-Walker et al. (1999 One surprising finding of this study is the high apparent rate of evolution of site-specific mutation rates. As little as 10%-12% sequence divergence between congeneric species is enough to generate a detectable shortage of polymorphism co-occurrence. At the family level, virtually all the co-occurrence signal vanishes: Knowing that site i is polymorphic in species 1 does not increase the probability that it is found polymorphic in species 2.
The process of site-specific variation of evolutionary rate, known as covarion or heterotachy (Fitch 1971
These results also raise the question of what causes the site-specific mutation rate to vary in time: What makes the mutation rate of a site increase or decrease? We examined the 11 human hypervariable sites in Table 2 showing a common state in all nonhuman primates. Nine of these sites are G
Three of the 26 human hypervariable sites listed in Table 2 are located in the 16S ribosomal RNA, a gene whose within-species diversity has been investigated in five additional Hominoidea species, i.e., P. paniscus, P. troglodytes, G. gorilla, P. pygmaeus, and Hylobates syndactylus (between 10 and 35 individuals surveyed per species) (Noda et al. 2001
None of the reported hot spots correspond to known disease-associated mutations, as we checked from the MITOMAP database (Brandon et al. 2005
Now back to the mutation hot spots versus recombination debate. This study aimed at testing whether mutation hot spots actually occurred in the coding region of the mitochondrial genome. The answer is obviously positive: Significant co-occurrence among species of synonymous cytochrome b polymorphism was detected in many mammalian genera, and a correlation between within-species and between-species synonymous site-specific variability, together with an A
It should be noted that our results do not exclude the occurrence of recombination. Mutation hot spots might actually make the detection of recombination more difficult by generating patterns of linkage disequilibrium independent of physical distance, so that this work could paradoxically satisfy supporters of the recombination hypothesis as well. What we have, however, is a mutation model potentially accounting for the distribution of mitochondrial DNA sequence variation within and between species. Whether recombination also plays a significant evolutionary role is still an open question, but we are now entitled to demand strong evidence to believe in it.
Cytochrome b alignments were performed by using MABIOS (Abdeddaim 1997 (five classes) distribution of rates across sites. The TN93 model accounts for unbalanced base composition and allows three distinct rates for transversions, A↔G transitions, and C↔T transitions. Constant rates across sites mean no hot spots, while a gamma distribution is intended to reflect the existence of hypervariable sites. The other analyses are described in the Results section. They were achieved by using homemade C, PERL, and R programs.
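The structure of the TN93 model described above (base frequencies plus three exchange rates) can be written down as an instantaneous rate matrix. The sketch below uses the standard TN93 parameterization; the function name, the base frequencies, and the rate values are hypothetical and were not estimated from the data sets of this study.

```python
BASES = "ACGT"
PURINES = {"A", "G"}


def tn93_rate_matrix(freqs, rate_ag, rate_ct, rate_tv):
    """Build the TN93 instantaneous rate matrix as a dict of dicts.

    Off-diagonal entry q[i][j] is (exchange rate for the pair) * freqs[j],
    where the exchange rate is rate_ag for A<->G, rate_ct for C<->T, and
    rate_tv for transversions. Diagonal entries make each row sum to zero.
    """
    q = {}
    for i in BASES:
        row = {}
        for j in BASES:
            if i == j:
                continue
            if i in PURINES and j in PURINES:
                rate = rate_ag            # purine transition
            elif i not in PURINES and j not in PURINES:
                rate = rate_ct            # pyrimidine transition
            else:
                rate = rate_tv            # transversion
            row[j] = rate * freqs[j]
        row[i] = -sum(row.values())
        q[i] = row
    return q


# Hypothetical values, for illustration only.
example_freqs = {"A": 0.40, "C": 0.30, "G": 0.05, "T": 0.25}
example_q = tn93_rate_matrix(example_freqs, rate_ag=20.0, rate_ct=10.0,
                             rate_tv=1.0)
```

Multiplying each exchange rate by the frequency of the target base is what lets the model accommodate unbalanced base composition while keeping the process reversible.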
This work was supported by French Ministère de la Recherche ACI IMPBio and CNRS-INRA Equipe Projet Multi Laboratoire "Méthodes informatiques pour la phylogénie moléculaire."
Article published online ahead of print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.4305906.
1 Corresponding author. [Supplemental material is available online at www.genome.org.]
Abdeddaim, S. 1997. Fast and sound two-step algorithms for multiple alignment of nucleic sequences. Int. J. Artif. Intell. Tools 6: 179-192.
Awadalla, P., Eyre-Walker, A., and Maynard-Smith, J. 1999. Linkage disequilibrium and recombination in hominid mitochondrial DNA. Science 286: 2524-2525.
Bandelt, H.J., Quintana-Murci, L., Salas, A., and Macaulay, V. 2002. The fingerprint of phantom mutations in mitochondrial DNA data. Am. J. Hum. Genet. 71: 1150-1160.
Bazin, E., Duret, L., Penel, S., and Galtier, N. 2005. Polymorphix, a polymorphism sequence database. Nucleic Acids Res. 33: 481-484.
Birky, C.W. 1995. Uniparental inheritance of mitochondrial and chloroplast genes: Mechanisms and evolution. Proc. Natl. Acad. Sci. 92: 11331-11338.
Brandon, M.C., Lott, M.T., Nguyen, K.C., Spolim, S., Navathe, S.B., Baldi, P., and Wallace, D.C. 2005. MITOMAP, a human mitochondrial genome database: 2004 update. Nucleic Acids Res. 33: 611-613.
Crochet, P.A. and Desmarais, E. 2000. Slow rate of evolution in the mitochondrial control region of gulls (Aves: Laridae). Mol. Biol. Evol. 17: 1797-1806.
Delsuc, F., Stanhope, M.J., and Douzery, E.J. 2003. Molecular systematics of armadillos (Xenarthra, Dasypodidae): Contribution of maximum likelihood and Bayesian analyses of mitochondrial and nuclear genes. Mol. Phyl. Evol. 28: 261-275.
Edgar, R.C. 2004. MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32: 1792-1797.
Eyre-Walker, A., Smith, N.G.C., and Maynard-Smith, J. 1999. How clonal are human mitochondria? Proc. Biol. Sci. 266: 477-483.
Fitch, W.M. 1971. Rate of change of concomitantly variable codons. J. Mol. Evol. 1: 84-96.
Galtier, N. 2001. Maximum likelihood phylogenetic analysis under a covarion-like model. Mol. Biol. Evol. 18: 866-873.
Gantenbein, B., Fet, V., Gantenbein-Ritter, I.A., and Balloux, F. 2005. Evidence for recombination in scorpion mitochondrial DNA (Scorpiones: Buthidae). Proc. Biol. Sci. 272: 697-704.
Gu, X. 1999. Statistical methods for testing functional divergence after gene duplication. Mol. Biol. Evol. 16: 1664-1674.
Guindon, S. and Gascuel, O. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52: 696-704.
Hagelberg, E. 2003. Recombination or mutation rate heterogeneity? Implications for Mitochondrial Eve. Trends Genet. 19: 84-90.
Hagelberg, E., Goldman, N., Lio, P., Whelan, S., Schiefenhovel, W., Clegg, J.B., and Bowden, D.K. 1999. Evidence for mitochondrial DNA recombination in a human population of island Melanesia. Proc. Biol. Sci. 266: 485-492.
Herrnstadt, C., Elson, J.L., Fahy, E., Preston, G., Turnbull, D.M., Anderson, C., Ghosh, S.S., Olefsky, J.M., Beal, M.F., Davis, R.E. et al. 2002. Reduced-median-network analysis of complete mitochondrial DNA coding-region sequences for the major African, Asian, and European haplogroups. Am. J. Hum. Genet. 70: 1152-1171.
Herrnstadt, C., Preston, G., and Howell, N. 2003. Errors, phantoms and otherwise, in human mtDNA sequences. Am. J. Hum. Genet. 72: 1585-1586.
Hey, J. 2000. Human mitochondrial DNA recombination: Can it be true? Trends Ecol. Evol. 15: 181-182.
Ho, S.Y., Philips, M.J., Cooper, A., and Drummond, A.J. 2005. Time dependency of molecular rate estimates and systematic overestimation of recent divergence times. Mol. Biol. Evol. 22: 1561-1568.
Ingman, M. and Gyllensten, U. 2003. Mitochondrial genome variation and evolutionary history of Australian and New Guinean aborigines. Genome Res. 13: 1600-1606.
Ingman, M., Kaessmann, H., Pääbo, S., and Gyllensten, U. 2000. Mitochondrial genome variation and the origin of modern humans. Nature 408: 708-713.
Innan, H. and Nordborg, M. 2002. Recombination or mutational hot spots in human mtDNA? Mol. Biol. Evol. 19: 1122-1127.
Kivisild, T. and Villems, R. 2000. Questioning evidence for recombination in human mitochondrial DNA. Science 288: 1931.
Kluge, A.G. and Farris, J.S. 1969. Quantitative phyletics and the evolution of anurans. Syst. Zool. 18: 1-32.
Kraytsberg, Y., Schwartz, M., Brown, T.A., Ebralidse, K., Kunz, W.S., Clayton, D.A., Vissing, J., and Khrapko, K. 2004. Recombination of human mitochondrial DNA. Science 304: 981.
Lopez, P., Casane, D., and Philippe, H. 2002. Heterotachy, an important process of protein evolution. Mol. Biol. Evol. 19: 1-7.
Noda, R., Kim, C.G., Takenaka, O., Ferrell, R.E., Tanoue, T., Hayasaka, I., Ueda, S., Ishida, T., and Saitou, N. 2001. Mitochondrial 16S rRNA sequence diversity of hominoids. J. Hered. 92: 490-496.
Pesole, G. and Saccone, C. 2001. A novel method for estimating substitution rate variation among sites in a large data set of homologous sequences. Genetics 157: 859-867.
Piganeau, G., Gardner, M., and Eyre-Walker, A. 2004. A broad survey of recombination in animal mitochondria. Mol. Biol. Evol. 21: 2319-2325.
Pupko, T. and Galtier, N. 2002. A covarion-based method for detecting molecular adaptation: Application to the evolution of primate mitochondrial genomes. Proc. Biol. Sci. 269: 1313-1316.
Raina, S.Z., Faith, J.J., Disotell, T.R., Seligmann, H., Stewart, G.B., and Pollock, D.D. 2005. Evolution of base-substitution gradient in primate mitochondrial genomes. Genome Res. 15: 665-673.
Reyes, A., Gissi, C., Pesole, G., and Saccone, C. 1998. Asymmetrical directional mutation pressure in the mitochondrial genome of mammals. Mol. Biol. Evol. 15: 957-966.
Schwartz, M. and Vissing, J. 2002. Paternal inheritance of mitochondrial DNA. N. Engl. J. Med. 347: 576-580.
Springer, M.S., DeBry, R.W., Douady, C., Amrine, H.M., Madsen, O., de Jong, W.W., and Stanhope, M.J. 2001. Mitochondrial versus nuclear gene sequences in deep-level mammalian phylogeny reconstruction. Mol. Biol. Evol. 18: 132-143.
Stoneking, M. 2000. Hypervariable sites in the mtDNA control region are mutational hotspots. Am. J. Hum. Genet. 67: 1029-1032.
Tamura, K. and Nei, M. 1993. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol. Biol. Evol. 10: 512-526.
Tsaousis, A.D., Martin, D.P., Ladoukakis, E.D., Posada, D., and Zouros, E. 2005. Widespread recombination in published animal mtDNA sequences. Mol. Biol. Evol. 22: 925-933.
Vandewoestijne, S., Baguette, M., Brakefield, P.M., and Saccheri, I.J. 2004. Phylogeography of Aglais urticae (Lepidoptera) based on DNA sequences of the mitochondrial COI gene and control region. Mol. Phyl. Evol. 31: 630-646.
Vigilant, L., Stoneking, M., Harpending, H., Hawkes, K., and Wilson, A.C. 1991. African populations and the evolution of human mitochondrial DNA. Science 253: 1503-1507.
Yang, Z. 1997. PAML: A program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13: 555-556.
Yoon, K.L., Aprille, J.R., and Ernst, S.G. 1991. Mitochondrial tRNA Thr mutation in fatal infantile respiratory enzyme deficiency. Biochem. Biophys. Res. Commun. 176: 1112-1115.
Received June 17, 2005; accepted in revised format September 28, 2005.