Published online before print
December 19, 2005, 10.1101/gr.4456006 Genome Res. 16:173-181, 2006 ©2006 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/06 $5.00
Letter
Identification by full-coverage array CGH of human DNA copy number increases relative to chimpanzee and gorilla
1 Canada's Michael Smith Genome Sciences Centre, Vancouver, BC, Canada V5Z 4S6
2 Cornell University, Ithaca, New York 14853, USA
Duplication of chromosomal segments and associated genes is thought to be a primary mechanism for generating evolutionary novelty. By comparative genome hybridization using a full-coverage (tiling) human BAC array with 79-kb resolution, we have identified 63 chromosomal segments, ranging in size from 0.65 to 1.3 Mb, that have inferred copy number increases in human relative to chimpanzee. These segments span 192 Ensembl genes, including 82 gene duplicates (41 reciprocal best BLAST matches). Synonymous and nonsynonymous substitution rates across these pairs provide evidence for general conservation of the amino acid sequence, consistent with the maintenance of function of both copies, and one case of putative positive selection for an uncharacterized gene. Surprisingly, the core histone genes H2A, H2B, H3, and H4 have been duplicated in the human lineage since our split with chimpanzee. The observation of increased copy number of a human cluster of core histone genes suggests that altered dosage, even of highly constrained genes, may be an important evolutionary mechanism.
Gene duplication has long been considered a primary mechanism of adaptive evolution (Ohno 1970).
Using full-coverage BAC array CGH, we executed a three-phase approach to identify segments of human genomic DNA that have likely been acquired since divergence from the common ancestor we share with chimpanzee. First, two samples of human genomic DNA (gDNA), one pooled from seven unrelated males and the other pooled from four unrelated females, were cohybridized to identify and exclude nodes on the array that gave aberrant ratios in a human-only comparison. Pooled DNAs were used to minimize the number of hybridization experiments and to favor the detection of fixed rather than polymorphic copy number differences. Of the 31,842 mapped autosomal clones on the array, 212 showed aberrant ratios (>1.5 H-spread; see Methods) in the human-human comparison and were excluded from further analysis. Next, we hybridized the human test DNA sample pooled from seven human males to a reference DNA sample composed of DNA pooled from three unrelated male chimpanzees (Coriell Institute, Repository numbers NAO3448, NAO3450, NAO3452) (Fig. 1). These hybridizations were repeated under dye reversal, and a total of 1319 clones (855 increases, 464 decreases) were identified that consistently showed ratios that exceeded the threshold in both dye orientations. As an added measure of stringency, we retained clones only if (1) they were confirmed by an equivalent copy number aberration in at least one additional overlapping clone, or (2) their location in the human reference genome sequence (NCBI_34) is supported by both their restriction digest pattern and BAC end sequence placement (Krzywinski et al. 2004).
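The dye-reversal filter described above can be illustrated with a short sketch. This is not the authors' implementation: it assumes each array reports a log2 ratio per clone, that the ratio changes sign when the dyes are swapped, and that per-array thresholds have already been computed. The function name `concordant_calls` and all input values are hypothetical.

```python
from typing import Dict, Tuple


def concordant_calls(
    forward: Dict[str, float],
    reverse: Dict[str, float],
    fwd_bounds: Tuple[float, float],
    rev_bounds: Tuple[float, float],
) -> Dict[str, str]:
    """Keep clones whose log2 ratios exceed threshold in both dye orientations.

    forward: clone -> log2 ratio with human in the numerator channel (assumed)
    reverse: clone -> log2 ratio from the dye-swapped array, where a human
             gain is expected to appear as a low ratio (assumed convention)
    *_bounds: (lower, upper) thresholds for each array, computed elsewhere
    """
    calls: Dict[str, str] = {}
    for clone, fwd in forward.items():
        if clone not in reverse:
            continue  # clone not measured on both arrays
        rev = reverse[clone]
        # Human gain: high in the forward array and low in the swap.
        if fwd > fwd_bounds[1] and rev < rev_bounds[0]:
            calls[clone] = "increase"
        # Human loss: low in the forward array and high in the swap.
        elif fwd < fwd_bounds[0] and rev > rev_bounds[1]:
            calls[clone] = "decrease"
    return calls
```

A clone that crosses threshold in only one orientation is dropped, which is the point of repeating the hybridization under dye reversal: dye-specific artifacts rarely flip sign with the swap.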
We used gorilla as an outgroup to determine the most likely ancestral copy number state. By parsimony, human chromosomal segments showing an increased copy number ratio relative to both chimpanzee and gorilla most likely represent insertions specific to the human lineage. This is true regardless of whether the human genomic region containing the given segment is more similar to chimpanzee or gorilla. Note, however, that there are further caveats to the parsimony approach that must be considered. While the widely accepted species tree of hominoids places human and chimpanzee as a clade, with gorilla as an outgroup, there are regions of the genome that are incongruent with the species tree. For regions of the genome consistent with a human-gorilla clade, the assignment of a copy number increase to human is unaffected, that is, parsimony still favors a single event in the human branch over two independent events in the chimp and gorilla branches. However, a study by Chen and Li (2001) [...] 20% of the genome. In some regards, orangutan may be a more suitable outgroup, since the ratio of unresolved ancestral polymorphism to divergence is much lower because of the longer divergence time. However, a potential drawback in using orangutan as an outgroup in these experiments is that the arrays are spotted with human genomic clones, and hybridization becomes less reliable when more distant species are evaluated. Thus, we proceeded with hybridization of the pooled human male test DNA sample to reference DNA from a single female gorilla (Coriell Institute, Repository number NGO5251). We decided to use human male test DNA rather than female test DNA for consistency with previous experiments. Because we had fewer chimpanzee and gorilla samples than human samples, there is some possibility that sites that are polymorphic in chimpanzee and gorilla have impacted our analysis. 
The fact that we restrict analysis to genomic segments where chimpanzee and gorilla copy number agree, relative to human, minimizes this impact. Of the 585 clones that had an elevated ratio in human relative to chimpanzee, 235 also gave elevated ratios relative to gorilla and therefore likely represent human-specific copy number increases. Presumably, the subset of clones that did not show elevated ratios relative to gorilla represent copy number decreases in chimpanzee relative to the ancestral state. Again, as an added measure of stringency, clones have been retained in the set of 235 only if they are confirmed by an equivalent copy number aberration in at least one additional overlapping clone, or their location in the human reference genome sequence (NCBI_34) is supported by both their restriction digest pattern and BAC-end sequence placement. These 231 clones collapse into 55 contiguous chromosomal segments (43 with multiple clones, and 12 singletons) with minimum, maximum, and average segment lengths of 65,252 bp, 1,133,633 bp, and 308,959 bp, respectively, and a cumulative genome footprint of 16,992,728 bp. Separately, we evaluated ratios of clones located on the X and Y chromosomes. We identified a total of eight X chromosome (ChrX) and 28 Y chromosome (ChrY) clones that met the criteria of concordant dye-flip ratios and an equivalent copy number difference in at least one overlapping clone. Sex chromosome ratios from the female gorilla sample are not directly comparable to those from the male chimpanzee and human reference samples, thus for sex chromosome differences we are unable to infer human increase rather than chimp decrease. However, evaluation of duplicate segments within the human reference genome sequence (below) supports the notion that these are copy number increases in the human lineage. 
These eight ChrX clones and 28 ChrY clones collapse into two ChrX contigs and six ChrY contigs covering 415,787 bp and 1,190,263 bp, respectively, bringing the cumulative genomic footprint of all segments (autosomal plus sex chromosome) to 18,598,778 bp (Table 1). These segments are the basis of further analysis. While loss of genetic material on the human lineage is of considerable interest, here we consider only observed copy number increases. This is because copy number increases, as opposed to losses, can be readily validated through design of quantitative PCR experiments and through evaluation of signatures of duplication events in the reference human genome sequence as we describe below.
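Collapsing retained clones into contiguous segments and summing their genomic footprint can be sketched as an interval merge. The sketch assumes half-open (chromosome, start, end) coordinates and treats clones as contiguous when their coordinates overlap or touch; the clone coordinates used with it would be hypothetical, and the study's own contig-building procedure may differ.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open


def collapse_clones(clones: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching clones on the same chromosome."""
    segments: List[Interval] = []
    for chrom, start, end in sorted(clones):
        if segments and segments[-1][0] == chrom and start <= segments[-1][2]:
            # Extend the current segment to cover this clone.
            last_chrom, last_start, last_end = segments[-1]
            segments[-1] = (last_chrom, last_start, max(last_end, end))
        else:
            segments.append((chrom, start, end))
    return segments


def footprint(segments: List[Interval]) -> int:
    """Total number of bases covered by the (non-overlapping) segments."""
    return sum(end - start for _chrom, start, end in segments)
```

The footprints reported in the text are additive in the same way: 16,992,728 bp (autosomal) + 415,787 bp (ChrX) + 1,190,263 bp (ChrY) = 18,598,778 bp, and 55 + 2 + 6 segments give the 63 segments analyzed.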
Since it is expected that genomic segments recently gained in the human lineage have originated through duplication of existing sequence, we evaluated the degree of overlap between segments identified by CGH and segments identified by in silico analysis (BLAST matches >1 kb long with >90% identity, as described in Krzywinski et al. 2004) [...] 79 kb) will not have been detected by CGH.
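The overlap evaluation between CGH-defined segments and in silico duplications can be illustrated with a minimal sketch. It assumes both sets are given as half-open (start, end) intervals on the same chromosome; the function name `overlap_bp` is hypothetical and the approach is a generic interval computation, not the procedure used in the study.

```python
from typing import Iterable, Tuple

Span = Tuple[int, int]  # (start, end), half-open, same chromosome assumed


def overlap_bp(segment: Span, duplications: Iterable[Span]) -> int:
    """Bases of `segment` covered by at least one duplication interval."""
    start, end = segment
    covered = 0
    cursor = start  # everything left of the cursor has already been counted
    for d_start, d_end in sorted(duplications):
        lo = max(d_start, cursor)
        hi = min(d_end, end)
        if hi > lo:
            covered += hi - lo
            cursor = hi
    return covered
```

Dividing the result by the segment length gives the fraction of a CGH segment that coincides with annotated duplications; overlapping duplication intervals are counted once because the cursor only moves forward.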
A total of 192 non-pseudogene Ensembl genes were detected on the 63 duplicated segments. If these segments arose through segmental duplication, we would expect representation from paralogous genes within this set. The coding sequences of these genes were compared by reciprocal BLAST analysis (expect-value cutoff = 10^-10), which identified 41 strict paralogous gene pairs (82 genes total) (Table 2). For these genes, pairwise synonymous and nonsynonymous substitution rates were estimated for the aligned sequence using the codon substitution models of Yang and Nielsen (2000).
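The reciprocal best-match criterion can be sketched as follows. The sketch assumes BLAST hits have already been parsed into (query, subject, e-value, bit score) tuples and ranks hits by bit score; the tuple layout, the function name, and the tie-breaking rule are assumptions for illustration, not details given in the text.

```python
from typing import Dict, Iterable, Set, Tuple

Hit = Tuple[str, str, float, float]  # (query, subject, e-value, bit score)


def reciprocal_best_pairs(
    hits: Iterable[Hit], max_evalue: float = 1e-10
) -> Set[Tuple[str, str]]:
    """Return gene pairs that are each other's best non-self hit."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue, bits in hits:
        if query == subject or evalue > max_evalue:
            continue  # ignore self hits and weak matches
        if query not in best or bits > best[query][1]:
            best[query] = (subject, bits)
    pairs: Set[Tuple[str, str]] = set()
    for query, (subject, _bits) in best.items():
        if subject in best and best[subject][0] == query:
            a, b = sorted((query, subject))
            pairs.add((a, b))  # store each pair once, in sorted order
    return pairs
```

Requiring the match in both directions is what makes the pairs "strict": a gene whose best hit prefers a third gene is left unpaired.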
We selected two duplicated loci (AMY1A and CNTNAP3) for evaluation by an independent method (real-time quantitative PCR; Taqman). These loci were sequenced in our human, chimpanzee, and gorilla samples, and then primer/probe sets were designed to regions of sequence that were perfectly conserved between duplicates and among species. Of note, a third human amylase family member (AMY2B) was present on the duplicated segment that contained the two copies of AMY1A; thus the amylase primer probe sets were designed to a region of exact sequence identity among all three amylases. Results from these PCR assays (Fig. 2) verify increased copy number of these loci in human versus chimpanzee and gorilla.
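The text does not state how the Taqman data were quantified. Purely as an illustration, the sketch below uses the widely used comparative Ct calculation, assuming a control locus with the same copy number in both species and perfect doubling of product per cycle. The function name, parameters, and Ct values are hypothetical.

```python
def relative_copy_number(
    ct_target_test: float,
    ct_control_test: float,
    ct_target_ref: float,
    ct_control_ref: float,
    efficiency: float = 2.0,
) -> float:
    """Copy number of the target in the test sample relative to the reference.

    Each Ct is normalized to a control locus in the same sample, then the
    two normalized values are compared (comparative Ct, assumed method).
    efficiency=2.0 assumes the product doubles every cycle.
    """
    delta_test = ct_target_test - ct_control_test
    delta_ref = ct_target_ref - ct_control_ref
    return efficiency ** -(delta_test - delta_ref)
```

Under these assumptions, a target that reaches threshold one cycle earlier in human than in chimpanzee, with equal control Ct values, corresponds to a twofold higher copy number in human.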
Using full-coverage BAC array CGH, we have identified 63 genomic segments with an increased hybridization ratio in human versus chimpanzee. Because these segments also show an increased hybridization ratio in human versus gorilla, the most parsimonious explanation is that these CGH-defined segments have been duplicated very recently in human evolutionary history, subsequent to our divergence from chimpanzee. This interpretation is supported by the high representation within these segments of in silico defined human segmental duplications, and the verification by real-time quantitative PCR of copy number differences at selected loci. However, the formal possibility remains that some subset of these CGH-defined segments has been independently lost in both chimpanzee and gorilla, rather than gained in humans. Owing to high sequence similarity among these three closely related primates and the substantial length of the CGH BAC probes (~200 kb), it is exceedingly unlikely that sequence divergence is responsible for any observed differences. It must be considered that a portion of the genome does not represent the species tree but, rather, supports a chimp-gorilla clade over a chimp-human clade. For copy number differences in this portion of the genome, which remains to be accurately mapped, parsimony is not effective in assigning the ancestral copy number state. However, the fact that we have relied on gorilla as an outgroup should not have a significant impact on the results of the present study because we invoke parsimony only in the first data-filtering step of our analysis. Subsequent analysis is strictly focused on paralogous gene pairs within genomic segments with copy number alteration. Tandem duplication is a signature of DNA copy number increase, and provides a level of internal validation to our analysis. Furthermore, where we have done quantitative gene dosage analysis for further verification of human copy number gains (Fig. 2), the data have supported this interpretation.
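The parsimony argument can be made concrete by counting state changes on the species tree. The sketch below is a simplification: it encodes copy number as a binary state (0 = lower, 1 = higher), assumes the tree ((human, chimpanzee), gorilla), and enumerates both ancestral nodes by brute force. The function name is hypothetical.

```python
from itertools import product
from typing import Set, Tuple


def min_changes(human: int, chimp: int, gorilla: int) -> Tuple[int, Set[int]]:
    """Fewest copy-number changes on ((human, chimp), gorilla).

    Returns the minimum number of changes and the set of root (ancestral)
    states that achieve it.
    """
    best = None
    best_roots: Set[int] = set()
    for root, hc_ancestor in product((0, 1), repeat=2):
        changes = (
            int(root != gorilla)         # root -> gorilla branch
            + int(root != hc_ancestor)   # root -> human/chimp ancestor
            + int(hc_ancestor != human)  # ancestor -> human branch
            + int(hc_ancestor != chimp)  # ancestor -> chimp branch
        )
        if best is None or changes < best:
            best, best_roots = changes, {root}
        elif changes == best:
            best_roots.add(root)
    return best, best_roots
```

When human is high and both apes are low, the single most parsimonious scenario has a low ancestral state and one gain on the human branch; the alternative of a high ancestral state requires two independent losses. When human and gorilla are both high and chimpanzee is low, the same count favors a high ancestral state and one loss in chimpanzee.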
Interestingly, we observe a substantial DNA copy number increase at chromosome 2q13 in human. This is the site of the telomeric fusion between chimpanzee chromosomes 12 and 13 in the human/chimp common ancestor that resulted in human chromosome 2 (Yunis et al. 1980).
We analyzed the gene content of the 63 chromosomal segments with increased copy number in human. Genes within these regions were subjected to reciprocal BLAST analysis to find duplicated copies. Among the 41 high-confidence paralogous gene pairs we detected, the most highly represented gene family is immunoglobulin (IGK) genes, with five paralogous pairs. This is consistent with earlier whole-genome comparative analysis of, for example, fly and mosquito (Christophides et al. 2002).
Under neutral evolution, coding mutations will be fixed at the same rate as silent mutations, giving a dN/dS ratio of 1. The median dN/dS ratio observed in our gene set was 0.388, which is consistent with net purifying selection acting on these recently duplicated genes. This observation is consistent with previous reports of reduced dN/dS ratios between paralogous genes in Drosophila (Thornton and Long 2002) [...] The dN/dS ratios reported here are average ratios for the aligned length of each protein pair. Identification and sequencing of the strict orthologs of these genes in chimpanzee and additional primates will allow evaluation of synonymous and nonsynonymous substitution rates in a site-specific and lineage-specific manner and will likely yield further insight into human adaptive evolution. Further exploration of genes and noncoding functional sequences within the boundaries of these variable segments will be helpful for elucidating the genetic basis of human-specific traits.
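The interpretation of dN/dS ratios can be illustrated with a small summary routine. It assumes dN and dS have already been estimated for each paralogous pair (for example, by a codon-model program) and only computes the ratio, the median, and the number of pairs with a ratio above 1. The function names and any input values are hypothetical; no values from Table 2 are reproduced.

```python
import statistics
from typing import List, Optional, Sequence, Tuple


def omega(dn: float, ds: float) -> Optional[float]:
    """dN/dS for one gene pair; undefined (None) when dS is zero."""
    if ds == 0:
        return None
    return dn / ds


def summarize(pairs: Sequence[Tuple[float, float]]) -> Tuple[float, int]:
    """Median dN/dS and the count of pairs with dN/dS > 1."""
    ratios: List[float] = []
    for dn, ds in pairs:
        w = omega(dn, ds)
        if w is not None:
            ratios.append(w)
    above_one = sum(1 for w in ratios if w > 1)
    return statistics.median(ratios), above_one
```

A median well below 1 indicates that most pairs are under purifying selection, while individual pairs above 1 are candidates for positive selection. Because these are averages over the aligned length, a gene with a few positively selected sites can still show a ratio below 1.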
Comparative genomic hybridization
Hybridizations were done using the whole-genome SMRT array (Ishkanian et al. 2004).
Microarray analysis
We selected individual thresholds for each array using the same type of calculations used to compute box-and-whisker plots (Tukey 1977).
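A box-and-whisker style threshold can be sketched as follows. The sketch assumes that "1.5 H-spread" means values lying more than 1.5 times the interquartile distance beyond the lower or upper quartile of an array's log ratios, and it uses the quartile definition of Python's `statistics.quantiles`, which may differ slightly from Tukey's hinges. The function names are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def tukey_fences(log_ratios: Sequence[float], k: float = 1.5) -> Tuple[float, float]:
    """Lower and upper thresholds at k H-spreads beyond the quartiles."""
    q1, _median, q3 = statistics.quantiles(log_ratios, n=4, method="inclusive")
    h_spread = q3 - q1  # distance between the quartiles
    return q1 - k * h_spread, q3 + k * h_spread


def flag_outliers(log_ratios: Sequence[float], k: float = 1.5) -> List[int]:
    """Indices of clones whose ratio falls outside the array's fences."""
    lower, upper = tukey_fences(log_ratios, k)
    return [i for i, x in enumerate(log_ratios) if x < lower or x > upper]
```

Because the fences are derived from each array's own distribution, an array with noisier ratios gets wider thresholds, which is the motivation for selecting thresholds per array rather than using one fixed cutoff.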
Quantitative PCR
Identification of paralogs
We thank Wan Lam for fabrication of arrays, and Evan Eichler for in silico predictions of segmental duplications in the human reference genome sequence. We thank the Canadian Institutes of Health Research and the BC Cancer Agency for funding this study. R.A.H. is a Michael Smith Foundation for Health Research scholar. A.G.C. is supported by NIH grant HG003229.
Article published online ahead of print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.4456006.
3 Corresponding author. [Supplemental material is available online at www.genome.org.]
Adams, R. and Bischof, L. 1994. Seeded region growing. IEEE Trans. Pattern Anal. Machine Intell. 16: 641-647.
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389-3402.
Bailey, J.A., Gu, Z., Clark, R.A., Reinert, K., Samonte, R.V., Schwartz, S., Adams, M.D., Myers, E.W., Li, P.W., and Eichler, E.E. 2002. Recent segmental duplications in the human genome. Science 297: 1003-1007.
Chen, F.-C. and Li, W.-H. 2001. Genomic differences between humans and other hominoids and the effective population size of the common ancestor of humans and chimpanzees. Am. J. Hum. Genet. 68: 444-456.
Chimpanzee Sequencing and Analysis Consortium. 2005. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437: 69-87.
Christophides, G.K., Zdobnov, E., Barillas-Mury, C., Birney, E., Blandin, S., Blass, C., Brey, P.T., Collins, F.H., Danielli, A., Dimopoulos, G., et al. 2002. Immunity-related genes and gene families in Anopheles gambiae. Science 298: 159-165.
Fortna, A., Kim, Y., MacLaren, E., Marshall, K., Hahn, G., Meltesen, L., Brenton, M., Hink, R., Burgers, S., Hernandez-Boussard, T., et al. 2004. Lineage-specific gene duplication and loss in human and great ape evolution. PLoS Biol. 2: e207.
Fujiyama, A., Watanabe, H., Toyoda, A., Taylor, T.D., Itoh, T., Tsai, S.F., Park, H.S., Yaspo, M.L., Lehrach, H., Chen, Z., et al. 2002. Construction and analysis of a human-chimpanzee comparative clone map. Science 295: 131-134.
Gibbs, R.A., Weinstock, G.M., Metzker, M.L., Muzny, D.M., Sodergren, E.J., Scherer, S., Scott, G., Steffen, D., Worley, K.C., Burch, P.E., et al. 2004. Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature 428: 493-521.
Grunstein, M., Schedl, P., and Kedes, L. 1976. Isolation and sequence analysis of sea urchin (Lytechinus pictus) histone H4 messenger RNA. J. Mol. Biol. 104: 351-369.
Hill, C.A., Fox, A.N., Pitts, R.J., Kent, L.B., Tan, P.L., Chrystal, M.A., Cravchik, A., Collins, F.H., Robertson, H.M., and Zwiebel, L.J. 2002. G protein-coupled receptors in Anopheles gambiae. Science 298: 176-178.
Ishkanian, A.S., Malloff, C.A., Watson, S.K., DeLeeuw, R.J., Chi, B., Coe, B.P., Snijders, A., Albertson, D.G., Pinkel, D., Marra, M.A., et al. 2004. A tiling resolution DNA microarray with complete coverage of the human genome. Nat. Genet. 36: 299-303.
Krzywinski, M., Bosdet, I., Smailus, D., Chiu, R., Mathewson, C., Wye, N., Barber, S., Brown-John, M., Chan, S., Chand, S., et al. 2004. A set of BAC clones spanning the human genome. Nucleic Acids Res. 32: 3651-3660.
Locke, D.P., Segraves, R., Carbone, L., Archidiacono, N., Albertson, D.G., Pinkel, D., and Eichler, E.E. 2003. Large-scale variation among human and great ape genomes determined by array comparative genomic hybridization. Genome Res. 13: 347-357.
Newman, T.L., Tuzun, E., Morrison, V.A., Hayden, K.E., Ventura, M., McGrath, S.D., Rocchi, M., and Eichler, E.E. 2005. A genome-wide survey of structural variation between human and chimpanzee. Genome Res. 15: 1344-1356.
Ohno, S. 1970. Evolution by gene duplication. Springer-Verlag, Berlin.
She, X., Jiang, Z., Clark, R.A., Liu, G., Cheng, Z., Tuzun, E., Church, D.M., Sutton, G., Halpern, A.L., and Eichler, E.E. 2004. Shotgun sequence assembly and recent segmental duplications within the human genome. Nature 431: 927-930.
Soille, P. 2003. Morphological image analysis: Principles and applications, 2d ed. Springer-Verlag, Berlin, Heidelberg.
Spiegel, I., Salomon, D., Erne, B., Schaeren-Wiemers, N., and Peles, E. 2002. Caspr3 and caspr4, two novel members of the caspr family are expressed in the nervous system and interact with PDZ domains. Mol. Cell. Neurosci. 20: 283-297.
Stajich, J.E., Block, D., Boulez, K., Brenner, S.E., Chervitz, S.A., Dagdigian, C., Fuellen, G., Gilbert, J.G., Korf, I., Lapp, H., et al. 2002. The Bioperl toolkit: Perl modules for the life sciences. Genome Res. 12: 1611-1618.
Thornton, K. and Long, M. 2002. Rapid divergence of gene duplicates on the Drosophila melanogaster X chromosome. Mol. Biol. Evol. 19: 918-925.
Tukey, J.W. 1977. Exploratory data analysis. Addison-Wesley, Reading, MA.
Watanabe, H., Fujiyama, A., Hattori, M., Taylor, T.D., Toyoda, A., Kuroki, Y., Noguchi, H., BenKahla, A., Lehrach, H., Sudbrak, R., et al. 2004. DNA sequence and comparative analysis of chimpanzee chromosome 22. Nature 429: 382-388.
Yang, Z. 1997. PAML: A program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13: 555-556.
Yang, Z. and Nielsen, R. 2000. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol. Biol. Evol. 17: 32-43.
Yang, Y.H., Buckley, M.J., and Speed, T.P. 2001. Analysis of cDNA microarray images. Brief. Bioinformatics 2: 341-349.
Yang, Y.H., Dudoit, S., Luu, P., Li, D.M., Peng, V., Ngai, J., and Speed, T.P. 2002. Normalization for cDNA microarray data: A robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res. 30: e15.
Yunis, J.J., Sawyer, J.R., and Dunham, K. 1980. The origin of man: A chromosomal pictorial legacy. Science 208: 1145-1148.
Zdobnov, E.M., von Mering, C., Letunic, I., Torrents, D., Suyama, M., Copley, R.R., Christophides, G.K., Thomasova, D., Holt, R.A., Subramanian, G.M., et al. 2002. Comparative genome and proteome analysis of Anopheles gambiae and Drosophila melanogaster. Science 298: 149-159.
Zhang, L., Vision, T.J., and Gaut, B.S. 2002. Patterns of nucleotide substitution among simultaneously duplicated gene pairs in Arabidopsis thaliana. Mol. Biol. Evol. 19: 1464-1473.
Received July 19, 2005; accepted in revised form November 9, 2005.