Published online before print April 10, 2006, doi: 10.1101/gr.4949406
Genome Res. 16:576-583, 2006 ©2006 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/06 $5.00
Letter

A preliminary comparative analysis of primate segmental duplications shows elevated substitution rates and a great-ape expansion of intrachromosomal duplications

1 Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
2 Department of Genetics, Case Western Reserve University, Cleveland, Ohio 44106, USA
3 Bovine Functional Genomics Laboratory, US Department of Agriculture, Beltsville, Maryland 20705, USA
4 Department of Genetics and Microbiology, University of Bari, 70126 Bari, Italy
5 The Institute for Genomic Research, Rockville, Maryland 20850, USA
6 Genome Technology Branch and NIH Intramural Sequencing Center, National Human Genome Research Institute, Bethesda, Maryland 20892, USA
7 Howard Hughes Medical Institute, Seattle, Washington 98195, USA
Compared with other sequenced animal genomes, human segmental duplications appear larger, more interspersed, and disproportionately represented as high-sequence-identity alignments. Global sequence divergence estimates of human duplications have suggested an expansion relatively recently during hominoid evolution. Based on primate comparative sequence analysis of 37 unique duplication-transition regions, we establish a molecular clock for their divergence that shows a significant increase in their effective substitution rate when compared with unique genomic sequence. Fluorescent in situ hybridization (FISH) analyses of 1053 random nonhuman primate BACs indicate that great-ape species have been enriched for interspersed segmental duplications compared with representative Old World and New World monkeys. These findings support computational analyses that show a 12-fold excess of recent (>98% sequence identity) intrachromosomal duplications when compared with duplications between nonhomologous chromosomes. These architectural shifts in genomic structure and elevated substitution rates have important implications for the emergence of new genes, gene-expression differences, and structural variation among humans and great apes.
Duplications play a pivotal role in disease processes, gene evolution, and genome rearrangement. Structurally, these sequences have been linked to an increasing number of human genomic disorders (Stankiewicz et al. 2004
Previous analyses confirm that … In this study, we attempt to provide a preliminary, unbiased assessment of substitution rates and of changes in duplication architecture based on genomic comparisons with nonhuman primates. We begin by summarizing the apparently unique properties of human segmental duplications compared with other sequenced vertebrate genomes. We then establish a molecular clock for single-base divergence by analyzing orthologous primate sequence located at the transition regions between unique and duplicated sequence, and we directly estimate the frequency of segmental duplication in other species by FISH analysis of random genomic clones. Our data demonstrate a proclivity toward expansion of interspersed duplications during the emergence of humans and the great apes.
Properties of human segmental duplications

Segmental duplications are distributed nonrandomly across the human genome. We identified 378 regions in excess of 100 kb in length where duplications have accumulated; this includes 98 regions within 2 Mb of centromere and telomere positions (She et al. 2004a
Interestingly, duplicated regions are particularly rich in transcripts. Overall, when the best placement of spliced transcripts was considered, we found a higher exon density (62%) in duplicated regions than in unique regions of the human genome (Table 2), consistent with earlier findings from the draft genome sequence (Bailey et al. 2002
Compared with other sequenced vertebrate genomes, three properties of human segmental duplications emerge (Table 3; Methods): they are larger, more interspersed, and show a high degree of sequence identity. Based on the analysis of 25,318 pairwise alignments, we determined that 86.5% of all duplicated bases are part of alignments that exceed 10 kb in length. A total of 55% of human segmental duplications are distributed in an interspersed fashion, where the paralogous pairs are separated by more than 1 Mb or map to nonhomologous chromosomes (Table 3). More than 77% (119/154 Mb) of duplicated bases are part of alignments with >95% sequence identity. These properties contrast sharply with those of other sequenced vertebrate genomes (Table 3). One caveat to this analysis is that the quality of the various genome sequences differs substantially. Two observations suggest that the observed differences are biological and not an artifact of assembly. First, assembly of the human genome based strictly on whole-genome shotgun sequence (Istrail et al. 2004 … 20% of segmental duplications are polymorphic within the human and chimpanzee populations (Cheng et al. 2005
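The three properties compared in Table 3 reduce to simple thresholds on each pairwise alignment. The sketch below is a minimal illustration assuming a hypothetical `PairwiseAlignment` record: it classifies an alignment as interspersed when its paralogs map to different chromosomes or lie more than 1 Mb apart, and it reports length-weighted fractions for the >10 kb and >95% identity criteria. It weights whole alignments by length rather than merging overlapping duplicated bases, so it only approximates the base-pair accounting behind Table 3.

```python
from dataclasses import dataclass


@dataclass
class PairwiseAlignment:
    """One paralogous alignment; this record layout is hypothetical."""
    chrom_a: str
    start_a: int
    end_a: int
    chrom_b: str
    start_b: int
    end_b: int
    identity: float  # fractional sequence identity, e.g. 0.97

    @property
    def length(self) -> int:
        # Aligned length, taken here as the shorter of the two intervals
        return min(self.end_a - self.start_a, self.end_b - self.start_b)


def is_interspersed(aln: PairwiseAlignment, min_separation: int = 1_000_000) -> bool:
    """True if the paralogs lie on different chromosomes or are > 1 Mb apart."""
    if aln.chrom_a != aln.chrom_b:
        return True
    # Distance between the two intervals on the same chromosome
    # (negative if they overlap, which counts as local)
    gap = max(aln.start_a, aln.start_b) - min(aln.end_a, aln.end_b)
    return gap > min_separation


def summarize(alignments, min_length=10_000, min_identity=0.95):
    """Length-weighted fractions for the three properties (an approximation)."""
    total = sum(a.length for a in alignments)
    if total == 0:
        raise ValueError("no aligned bases")
    long_bp = sum(a.length for a in alignments if a.length > min_length)
    interspersed_bp = sum(a.length for a in alignments if is_interspersed(a))
    high_id_bp = sum(a.length for a in alignments if a.identity > min_identity)
    return {
        "long": long_bp / total,
        "interspersed": interspersed_bp / total,
        "high_identity": high_id_bp / total,
    }


# Toy alignments for illustration only (not data from this study)
toy = [
    PairwiseAlignment("chr1", 0, 20_000, "chr1", 30_000, 50_000, 0.99),
    PairwiseAlignment("chr1", 0, 5_000, "chr7", 100_000, 105_000, 0.93),
    PairwiseAlignment("chr2", 0, 15_000, "chr2", 5_000_000, 5_015_000, 0.96),
]
print(summarize(toy))  # {'long': 0.875, 'interspersed': 0.5, 'high_identity': 0.875}
```

The first toy pair sits 10 kb apart on one chromosome and counts as local; the other two are interspersed, one across chromosomes and one nearly 5 Mb apart.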
We analyzed the divergence of interchromosomal and intrachromosomal alignments and plotted the fraction of duplicated bases as a function of the total number of aligned bases (Fig. 1). Several trends emerge. First, there is a mode at 0.05 substitutions per site for interchromosomal duplications, and their abundance decreases dramatically, by count and by base-pair representation, at lower divergences (Supplemental Fig. S1). Most of the increase in higher-identity duplications is due to an expansion of intrachromosomal duplications. The majority of intrachromosomally duplicated bases show <0.03 substitutions per site. These high-identity duplications significantly outnumber interchromosomal duplications by count and by total base pairs (4:1 and 12:1, respectively) at comparable levels of divergence (Supplemental Fig. S1; Fig. 1A). The expansion continues down to 0.005 substitutions per site, at which point the number of intrachromosomal alignments declines. This intrachromosomal expansion is nonuniformly distributed among human chromosomes and is largely restricted to nine autosomes and the sex chromosomes (Fig. 1B). In some cases, as much as 9.4% or 32.1% of a chromosome's total base pairs (chromosomes 9 and Y, respectively) arose as a consequence of either recent intrachromosomal duplication or gene conversion events.
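The count and base-pair ratios above come from binning alignments by divergence and comparing the two classes within a divergence window. A minimal sketch follows, assuming each alignment is a hypothetical `(kind, substitutions_per_site, length_bp)` tuple and an arbitrary 0.005 bin width; the toy values are illustrative and are not data from Fig. 1 or Supplemental Fig. S1.

```python
from collections import defaultdict


def divergence_histogram(alignments, bin_width=0.005):
    """Aligned bp per divergence bin, kept separately for 'intra' and 'inter'.

    alignments: iterable of (kind, substitutions_per_site, length_bp),
    where kind is 'intra' or 'inter' (a hypothetical tuple layout).
    """
    hist = {"intra": defaultdict(int), "inter": defaultdict(int)}
    for kind, k, length in alignments:
        # Bin index = floor(divergence / bin width)
        hist[kind][int(k // bin_width)] += length
    return hist


def intra_inter_ratio(alignments, max_divergence):
    """(count ratio, bp ratio) of intra- to interchromosomal alignments below a cutoff."""
    n = {"intra": 0, "inter": 0}
    bp = {"intra": 0, "inter": 0}
    for kind, k, length in alignments:
        if k < max_divergence:
            n[kind] += 1
            bp[kind] += length
    if n["inter"] == 0:
        raise ValueError("no interchromosomal alignments below the cutoff")
    return n["intra"] / n["inter"], bp["intra"] / bp["inter"]


# Toy alignments for illustration only
toy = [
    ("intra", 0.004, 30_000),
    ("intra", 0.012, 20_000),
    ("intra", 0.021, 10_000),
    ("inter", 0.015, 10_000),
    ("inter", 0.052, 40_000),
]
print(intra_inter_ratio(toy, max_divergence=0.03))  # (3.0, 6.0)
```

Reporting both ratios matters because, as the text notes, the excess of intrachromosomal duplications is larger by base pairs than by count.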
A molecular clock for primate segmental duplications

We sought to establish a molecular clock to determine the evolutionary age of segmental duplications within the human genome. We first aligned 16.78 Mb of unique noncoding genomic DNA between human and nonhuman primates. Four nonhuman primate species (chimpanzee, macaque, marmoset, and lemur) were selected, each representing a different point of divergence from the human lineage. We limited our analysis to high-quality (i.e., finished) sequences derived from bacterial artificial chromosome (BAC) clones, which have a known error rate. Using these sequences, we calculated the genetic distance (substitutions per base pair) from the human sequence (Fig. 2A; Supplemental Table S1a). Based on estimated divergence times of each primate (Goodman 1999
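Calibrating the clock amounts to dividing a corrected genetic distance by twice the divergence time, because both lineages accumulate substitutions after they split. The sketch below assumes the Kimura two-parameter correction (Kimura 1980, in the reference list) for the distance; the text here does not state which model was used, and the 6-million-year time in the usage check is a placeholder, not a Goodman (1999) estimate.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p(seq1: str, seq2: str) -> float:
    """Kimura (1980) two-parameter distance between two aligned sequences.

    Columns where either base is not A/C/G/T (gaps, N) are skipped.
    """
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        same_class = (a in PURINES and b in PURINES) or (
            a in PYRIMIDINES and b in PYRIMIDINES
        )
        if same_class:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def rate_per_lineage(k: float, divergence_time_years: float) -> float:
    """Per-lineage substitution rate r from K = 2 * r * T."""
    return k / (2 * divergence_time_years)


# Usage with a placeholder divergence time (assumption, not from the paper)
k = kimura_2p("ACGTACGTAC", "ACGTACGTAT")
print(k, rate_per_lineage(k, 6e6))
```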
Estimating the age of duplication events, however, is confounded by the propensity for these sequences to undergo gene conversion (Hurles 2001 … ek et al. 2005
To quantify the substitution rate for duplicated sequence more precisely, we specifically compared 37 duplicated regions between human and nonhuman primates (chimpanzee and baboon). We selected BACs containing duplications that were completely anchored within unique regions of the genome, allowing for unambiguous determination of orthologous relationships (Methods). Compared with strictly unique genome sequences, duplicated regions are significantly more diverged (Fig. 2B; Supplemental Table S1b). Between chimpanzee and human, we estimated a 10% increase in the rate of mutation, while an
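One hypothetical way to express the duplication excess, and to see why it matters for dating, is sketched below. It converts the excess divergence of duplicated over unique sequence into a percentage, then dates a duplication from the divergence between its two paralogs, assuming a uniform multiplicative rate excess. The function names and numbers are illustrative rather than the authors' procedure; the point is that, with an elevated effective rate, the same paralog divergence implies a younger duplication.

```python
def percent_rate_increase(k_duplicated: float, k_unique: float) -> float:
    """Excess divergence of duplicated over unique sequence on the same branch (%)."""
    return 100.0 * (k_duplicated / k_unique - 1.0)


def duplication_age_years(
    k_paralogs: float, unique_rate: float, excess_percent: float = 0.0
) -> float:
    """Age of a duplication from the divergence between its two copies.

    Both copies accumulate substitutions after the event, so K = 2 * r_eff * T,
    where r_eff is the unique-sequence rate scaled by an assumed uniform excess.
    """
    r_eff = unique_rate * (1.0 + excess_percent / 100.0)
    return k_paralogs / (2.0 * r_eff)


# Placeholder numbers, not values from this study
naive = duplication_age_years(0.02, 1e-9)
corrected = duplication_age_years(0.02, 1e-9, excess_percent=10.0)
print(f"{naive:.3g} vs {corrected:.3g} years")
```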
Several possible explanations might account for the increased substitution rate of duplicated DNA, including CpG bias, gene conversion, and/or relaxed selective constraint (Chen and Li 2001
The human great-ape expansion of segmental duplications
The single most important caveat of this model is that our predictions are based on the human reference sequence and infer history solely from that evolutionary trajectory. From the sequence alone, we cannot, for example, exclude the possibility that other nonhuman primate species have independently undergone similar intrachromosomal expansions. Our model predicts that such expansions should be observed among great-ape species but should be less common among Old World and New World monkey species. To estimate the frequency of segmental duplications more directly, we performed FISH analyses with three nonhuman primates (chimpanzee, macaque, and marmoset). We randomly selected 384 BACs from each species and counted the number of clones displaying a multi-site distribution pattern, thereby providing an indirect estimate of segmental duplication content (Table 4; Supplemental Table S3). The map position of each locus was determined by matching the end-sequences of each BAC to positions along the human reference sequence. Previous comparisons of cytogenetic and in silico estimates of segmental duplication in humans revealed that in situ estimates are a remarkably accurate indicator of recent duplication content (Cheung et al. 2001
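Each species contributes a binomial count (multi-site clones out of clones scored), so an interval around the fraction helps when comparing species. The sketch below uses the Wilson score interval, a standard choice that the text does not specify; the function name and the counts in the usage line are hypothetical.

```python
import math


def multisite_fraction(multi_site: int, scored: int, z: float = 1.96):
    """Fraction of FISH-scored clones with a multi-site pattern, plus a Wilson score interval.

    Returns (fraction, lower, upper); z = 1.96 gives an approximate 95% interval.
    """
    if scored <= 0 or not 0 <= multi_site <= scored:
        raise ValueError("need 0 <= multi_site <= scored and scored > 0")
    p = multi_site / scored
    denom = 1 + z * z / scored
    center = (p + z * z / (2 * scored)) / denom
    half = z * math.sqrt(p * (1 - p) / scored + z * z / (4 * scored * scored)) / denom
    return p, max(0.0, center - half), min(1.0, center + half)


# Hypothetical counts for illustration only
print(multisite_fraction(12, 350))
```

Unlike the simple normal approximation, the Wilson interval stays inside [0, 1] and behaves sensibly when few clones are multi-site, as expected for species with low duplication content.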
These FISH analyses support two important aspects of our model. First, we observed an increase in the number of segmental duplications among chimpanzees compared with either baboon (P = 0.0679, Fisher exact test) or marmoset (P = 0.0002) (Table 4). In fact, the marmoset estimate for segmental duplications ( 2%) is similar to experimental and computational predictions for other mammals, such as the rat and mouse (Table 3) (Cheung et al. 2003b … 30% (10/26) of the duplicated BACs map to corresponding unique regions in the human genome. This suggests that both great apes and humans have been predisposed to expansions of interspersed segmental duplications, and that a significant number of these will have occurred within different regions of the genome. These findings of extensive de novo duplication in each lineage are consistent with the recent analysis of the chimpanzee genome (Cheng et al. 2005
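The species comparison above is a 2 x 2 Fisher exact test (multi-site vs. single-site clones in two species). A dependency-free sketch using `math.comb` follows; it computes the two-sided P value by summing the probabilities of all tables with the same margins that are no more probable than the observed one, a common convention. The counts in the usage line are hypothetical, so this does not reproduce the P values reported above.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact P for the 2 x 2 table [[a, b], [c, d]].

    Rows: species; columns: multi-site vs. single-site clones (hypothetical layout).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    tol = 1e-7 * p_obs  # relative tolerance for floating-point ties
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs + tol)
    return min(1.0, p)


# Hypothetical counts: 20/350 multi-site clones in one species, 5/350 in another
print(fisher_exact_two_sided(20, 330, 5, 345))
```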
Our studies establish a baseline for estimating the age of segmental duplications and predict an elevated primate substitution rate for duplicated DNA compared with unique noncoding sequence. Surprisingly, our analyses also show that unique sequence flanking segmental duplications experienced comparable increases in substitution rate. Evolutionary variability in the boundaries between unique and duplicated DNA (i.e., duplication shadowing) may account for this property. We tested for this effect based on our knowledge of the duplication map of the chimpanzee genome (Cheng et al. 2005
Our assessment of segmental duplication among three nonhuman primates provides the first experimental evidence that humans and great apes are enriched for interspersed segmental duplications compared with other primate lineages. The evolutionary basis for this predilection is unknown, but may be related to smaller effective population size, adaptation, a slowdown in the molecular clock, and/or relaxed selective constraints (Li and Tanimura 1987
Segmental duplication analysis

We used a BLAST-based detection scheme (WGAC) (Bailey et al. 2001
Substitution rates
FISH analysis
We thank Devin Locke and Matthew Johnson for technical assistance. This work was supported, in part, by NIH grants GM58815 to E.E.E. and by funds provided through the NHGRI Intramural Program of the NIH to E.D.G. E.E.E. is an investigator of the Howard Hughes Medical Institute. In addition, the authors gratefully acknowledge CEGBA (Centro di Eccellenza Geni in campo Biosanitario e Agroalimentare), MIUR (Ministero Italiano della Università e della Ricerca; Cluster C03, Prog. L.488/92), the European Commission (INPRIMAT, QLRI-CT-2002-01325), and the BMBF (Bundesministerium für Bildung und Forschung) for financial support.
8 These two authors contributed equally to this work.
E-mail eee{at}gs.washington.edu; fax (206) 685-7301. [Supplemental material is available online at www.genome.org.] Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.4949406
Armengol L., Pujana M.A., Cheung J., Scherer S.W., Estivill X. 2003. Enrichment of segmental duplications in regions of breaks of synteny between the human and mouse genomes suggest their involvement in evolutionary rearrangements. Hum. Mol. Genet. 12: 2201-2208.
Bailey J.A., Yavor A.M., Massa H.F., Trask B.J., Eichler E.E. 2001. Segmental duplications: Organization and impact within the current human genome project assembly. Genome Res. 11: 1005-1017.
Bailey J.A., Gu Z., Clark R.A., Reinert K., Samonte R.V., Schwartz S., Adams M.D., Myers E.W., Li P.W., Eichler E.E. 2002. Recent segmental duplications in the human genome. Science 297: 1003-1007.
Bailey J.A., Liu G., Eichler E.E. 2003. An Alu transposition model for the origin and expansion of human segmental duplications. Am. J. Hum. Genet. 73: 823-834.
Bailey J.A., Baertsch R., Kent W.J., Haussler D., Eichler E.E. 2004a. Hotspots of mammalian chromosomal evolution. Genome Biol. 5: R23.
Bailey J.A., Church D.M., Ventura M., Rocchi M., Eichler E.E. 2004b. Analysis of segmental duplications and genome assembly in the mouse. Genome Res. 14: 789-801.
Chen F.C. and Li W.H. 2001. Genomic divergences between humans and other hominoids and the effective population size of the common ancestor of humans and chimpanzees. Am. J. Hum. Genet. 68: 444-456.
Cheng Z., Ventura M., She X., Khaitovich P., Graves T., Osoegawa K., Church D., DeJong P., Wilson R.K., Pääbo S., et al. 2005. A genome-wide comparison of recent chimpanzee and human segmental duplications. Nature 437: 88-93.
Cheung V.G., Nowak N., Jang W., Kirsch I.R., Zhao S., Chen X.N., Furey T.S., Kim U.J., Kuo W.L., Olivier M., et al. 2001. Integration of cytogenetic landmarks into the draft sequence of the human genome. The BAC Resource Consortium. Nature 409: 953-958.
Cheung J., Estivill X., Khaja R., MacDonald J.R., Lau K., Tsui L.C., Scherer S.W. 2003a. Genome-wide detection of segmental duplications and potential assembly errors in the human genome sequence. Genome Biol. 4: R25.
Cheung J., Wilson M.D., Zhang J., Khaja R., MacDonald J.R., Heng H.H., Koop B.F., Scherer S.W. 2003b. Recent segmental and gene duplications in the mouse genome. Genome Biol. 4: R47.
Courseaux A., Richard F., Grosgeorge J., Ortola C., Viale A., Turc-Carel C., Dutrillaux B., Gaudray P., Nahon J.L. 2003. Segmental duplications in euchromatic regions of human chromosome 5: A source of evolutionary instability and transcriptional innovation. Genome Res. 13: 369-381.
DeSilva U., Massa H., Trask B.J., Green E.D. 1999. Comparative mapping of the region of human chromosome 7 deleted in Williams syndrome. Genome Res. 9: 428-436.
Goodman M. 1999. The genomic record of Humankind's evolutionary roots. Am. J. Hum. Genet. 64: 31-39.
Hurles M.E. 2001. Gene conversion homogenizes the CMT1A paralogous repeats. BMC Genomics 2: 11.
Iafrate A.J., Feuk L., Rivera M.N., Listewnik M.L., Donahoe P.K., Qi Y., Scherer S.W., Lee C. 2004. Detection of large-scale variation in the human genome. Nat. Genet. 36: 949-951.
International Human Genome Sequencing Consortium (IHGSC). 2004. Finishing the euchromatic sequence of the human genome. Nature 431: 931-945.
Istrail S., Sutton G.G., Florea L., Halpern A.L., Mobarry C.M., Lippert R., Walenz B., Shatkay H., Dew I., Miller J.R., et al. 2004. Whole-genome shotgun assembly and comparison of human genome assemblies. Proc. Natl. Acad. Sci. 101: 1916-1921.
Jackson M.S., Oliver K., Loveland J., Humphray S., Dunham I., Rocchi M., Viggiano L., Park J.P., Hurles M.E., Santibanez-Koref M. 2005. Evidence for widespread reticulate evolution within human duplicons. Am. J. Hum. Genet. 77: 824-840.
Johnson M.E., Viggiano L., Bailey J.A., Abdul-Rauf M., Goodwin G., Rocchi M., Eichler E.E. 2001. Positive selection of a gene family during the emergence of humans and African apes. Nature 413: 514-519.
Jurka J. 2004. Evolutionary impact of human Alu repetitive elements. Curr. Opin. Genet. Dev. 14: 603-608.
Keightley P.D., Kryukov G.V., Sunyaev S., Halligan D.L., Gaffney D.J. 2005. Evolutionary constraints in conserved nongenic sequences of mammals. Genome Res. 15: 1373-1378.
Kimura M. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16: 111-120.
Li W.H. and Tanimura M. 1987. The molecular clock runs more slowly in man than in apes and monkeys. Nature 326: 93-96.
Linardopoulou E.V., Williams E.M., Fan Y., Friedman C., Young J.M., Trask B.J. 2005. Human subtelomeres are hot spots of interchromosomal recombination and segmental duplication. Nature 437: 94-100.
Liu G., Zhao S., Bailey J.A., Sahinalp S.C., Alkan C., Tuzun E., Green E.D., Eichler E.E. 2003. Analysis of primate genomic variation reveals a repeat-driven expansion of the human genome. Genome Res. 13: 358-368.
Murphy W.J., Larkin D.M., Everts-van der Wind A., Bourque G., Tesler G., Auvil L., Beever J.E., Chowdhary B.P., Galibert F., Gatzke L., et al. 2005. Dynamics of mammalian chromosome evolution inferred from multispecies comparative maps. Science 309: 613-617.
Orti R., Potier M.C., Maunoury C., Prieur M., Creau N., Delabar J.M. 1998. Conservation of pericentromeric duplications of a 200-kb part of the human 21q22.1 region in primates. Cytogenet. Cell Genet. 83: 262-265.
Paulding C.A., Ruvolo M., Haber D.A. 2003. The Tre2 (USP6) oncogene is a hominoid-specific gene. Proc. Natl. Acad. Sci. 100: 2507-2511.
Pavli …
Samonte R.V. and Eichler E.E. 2002. Segmental duplications and the evolution of the primate genome. Nat. Rev. Genet. 3: 65-72.
Sebat J., Lakshmi B., Troge J., Alexander J., Young J., Lundin P., Maner S., Massa H., Walker M., Chi M., et al. 2004. Large-scale copy number polymorphism in the human genome. Science 305: 525-528.
Shaikh T.H., Kurahashi H., Emanuel B.S. 2001. Evolutionarily conserved low copy repeats (LCRs) in 22q11 mediate deletions, duplications, translocations, and genomic instability: An update and literature review. Genet. Med. 3: 6-13.
Sharp A.J., Locke D.P., McGrath S.D., Cheng Z., Bailey J.A., Vallente R.U., Pertz L.M., Clark R.A., Schwartz S., Segraves R., et al. 2005. Segmental duplications and copy number variation in the human genome. Am. J. Hum. Genet. 77: 78-88.
She X., Horvath J.E., Jiang Z., Liu G., Furey T.S., Christ L., Clark R., Graves T., Gulden C.L., Alkan C., et al. 2004a. The structure and evolution of centromeric transition regions within the human genome. Nature 430: 857-864.
She X., Jiang Z., Clark R.A., Liu G., Cheng Z., Tuzun E., Church D.M., Sutton G., Halpern A.L., Eichler E.E. 2004b. Shotgun sequence assembly and recent segmental duplications within the human genome. Nature 431: 927-930.
Skaletsky H., Kuroda-Kawaguchi T., Minx P.J., Cordum H.S., Hillier L.W., Brown L.G., Repping S., Pyntikova T., Ali J., Bieri T., et al. 2003. The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes. Nature 423: 825-837.
Stankiewicz P., Shaw C.J., Withers M., Inoue K., Lupski J.R. 2004. Serial segmental duplications during primate evolution result in complex human genome architecture. Genome Res. 14: 2209-2220.
Tuzun E., Bailey J.A., Eichler E.E. 2004. Recent segmental duplications in the working draft assembly of the brown Norway rat. Genome Res. 14: 493-506.
Tuzun E., Sharp A.J., Bailey J.A., Kaul R., Morrison V.A., Pertz L.M., Haugen E., Hayden H., Albertson D., Pinkel D., et al. 2005. Fine-scale structural variation of the human genome. Nat. Genet. 37: 727-732.
Wall J.D., Andolfatto P., Przeworski M. 2002. Testing models of selection and demography in Drosophila simulans. Genetics 162: 203-216.
Zhang L., Lu H.H., Chung W.Y., Yang J., Li W.H. 2005. Patterns of segmental duplication in the human genome. Mol. Biol. Evol. 22: 135-141.
Received November 22, 2005; accepted in revised form February 14, 2006.