Genome Res. 15: 1487-1495, 2005. ©2005 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/05; $5.00

Traffic of genetic information between segmental duplications flanking the typical 22q11.2 deletion in velo-cardio-facial syndrome/DiGeorge syndrome

1 Genetic Information Research Institute, Mountain View, California 94043, USA
2 Department of Biochemistry, Albert Einstein College of Medicine, Bronx, New York 10461, USA
3 Department of Molecular Genetics, Albert Einstein College of Medicine, Bronx, New York 10461, USA
Velo-cardio-facial syndrome/DiGeorge syndrome results from unequal crossing-over between two 240-kb low-copy repeats on Chromosome 22q11.2, termed LCR22-2 and LCR22-4, which are composed of modules that are >99% identical in sequence. To delineate regions in the LCR22s that might contain hotspots for 22q11.2 rearrangements, we scanned the interval for increased rates of recombination, on the hypothesis that such regions might be more prone to breakage. We developed an algorithm that detects sites of altered recombination by searching for polymorphic single-nucleotide positions in BAC clones, from different libraries, that map to LCR22-2 and LCR22-4. This method distinguishes single nucleotide polymorphisms from paralogous sequence variants and complex polymorphic positions. Sites of shared polymorphism are considered potential sites of gene conversion or double cross-over between the two LCR22s. We found an inverse correlation between regions of paralogous sequence variants that are unique to a given position within one LCR22 and clusters of shared polymorphic sites, suggesting that these clusters reflect altered recombination and not remnants of ancestral single nucleotide polymorphisms. We postulate that most shared polymorphic sites are products of past transfers of DNA information between the LCR22s, suggesting that frequent traffic of genetic material may induce genomic instability in the two LCR22s. We also found that gaps (insertions/deletions) up to 1.5 kb long can be transferred between LCR22s.
Diseases involving chromosome rearrangements of >1 Mb are referred to as genomic disorders, and most are mediated by region-specific low-copy repeats (LCRs) (Lupski 1998
The 22q11.2 region is particularly susceptible to meiotic chromosome rearrangements associated with genomic disorders, including velo-cardio-facial syndrome/DiGeorge syndrome (VCFS/DGS; MIM 192430/MIM 188400) (DiGeorge 1965
Recently, positional recombination hotspots have been found to underlie the CMT1A/HNPP rearrangements on Chromosome 17p12 (Reiter et al. 1998
To determine whether levels of gene conversion or recombination vary along the two LCR22s on Chromosome 22q11, we compared the sequences of clones spanning each. Using single nucleotide variants from alignments of multiple BAC clones from different libraries, we detected signatures, or clusters, of frequent gene conversion or recombination between LCR22-2 and LCR22-4, both computationally and experimentally.
Polymorphisms in LCR22-2 and LCR22-4
The LCR22s comprise 11% of the 22q11.2 region and contain genes and unprocessed pseudogene copies (Bailey et al. 2002
To detect potential recombination/gene conversion events between LCR22-2 and LCR22-4, we searched for polymorphic positions in the two LCR22s. We first created a global alignment of all BAC clones that harbor LCR22 segments and can be anchored to a specific LCR22 because of the asymmetric pattern of blocks in the two LCR22s (Supplemental Fig. 1S) and/or the presence of flanking unique sequences (except for AP000551) (Edelmann et al. 1999a
The clone alignment revealed many polymorphic positions (Fig. 2C). Each type of single nucleotide variant was defined as shown in Figure 3. Paralogous sequence variants (PSVs) are positions that are conserved within each LCR22 but differ between them, such as an A in LCR22-2 and a T in LCR22-4. LCR-specific single nucleotide polymorphisms (SNPs) are positions that vary in one LCR22 but not in the other. Positions that vary in both LCR22s are classed as either shared or nonshared: if the variants in one LCR22 are equal to, or included within, the variants of the other LCR22, the position is termed a shared polymorphic site (SPS); all other such positions are nonshared polymorphic sites (NPSs). SPSs are sites of potential recombination/gene conversion. Positions in all categories were further divided into those in nonrepetitive (unique) DNA, those in interspersed repeats (copies of transposable elements), and those in simple repeats such as micro- and minisatellites, satellites, or low-complexity regions (Fig. 2C). We detected a total of 2492 gap-free polymorphic positions in the 176,245-bp alignment. We then excluded all positions that mapped to simple repeats. Of the 2308 remaining positions, 1058 were SNPs in LCR22-2, 443 were SNPs in LCR22-4, 688 were PSVs, 114 were SPSs, and five were NPSs.
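The classification rules above can be made concrete with a short sketch. It assumes that gap columns and simple-repeat positions have already been removed, and that the nucleotides observed at one alignment column have been grouped by the LCR22 to which each BAC clone is anchored. The function name and the category labels (including "monomorphic" for columns with no variation) are hypothetical and are not taken from the authors' software.

```python
def classify_position(lcr2_alleles, lcr4_alleles):
    """Classify one gap-free alignment column (hypothetical helper).

    lcr2_alleles / lcr4_alleles: nucleotides seen at this column in the BAC
    clones anchored to LCR22-2 and LCR22-4 (each assumed non-empty).
    """
    a, b = set(lcr2_alleles), set(lcr4_alleles)
    if len(a) == 1 and len(b) == 1:
        # Conserved within each LCR22: identical means no variant, different means PSV
        return "monomorphic" if a == b else "PSV"
    if len(b) == 1:
        return "SNP-LCR22-2"  # variable in LCR22-2 only
    if len(a) == 1:
        return "SNP-LCR22-4"  # variable in LCR22-4 only
    # Variable in both: shared if one allele set equals or is contained in the other
    return "SPS" if a <= b or b <= a else "NPS"


# Each tuple: (alleles in LCR22-2 clones, alleles in LCR22-4 clones)
examples = [("AAA", "TT"), ("AG", "AA"), ("CC", "CT"),
            ("AG", "GA"), ("AG", "AGT"), ("AG", "CT")]
for lcr2, lcr4 in examples:
    print(lcr2, lcr4, classify_position(lcr2, lcr4))
```

On these toy columns the function returns PSV, SNP-LCR22-2, SNP-LCR22-4, SPS, SPS, and NPS, in that order. Representing each LCR22's variation as a set makes the "equal or included" rule a pair of subset tests.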
The density of single nucleotide variants was quite high compared to the genome average of 1 SNP/kb (Li and Sadler 1991
We found that the distribution of each group of polymorphic sites is highly nonrandom (Fig. 2C,D). The most centromeric 20 kb of LCR22-4 is almost identical between the LCR22-4 clones AC008018 and AC000550; as a consequence, PSVs, SPSs, and LCR22-4 SNPs are absent from the first 20 kb. The SPSs form several clusters, suggesting high levels of recombination/gene conversion (see below). Using a probabilistic model (see Methods), we defined clusters with a highly nonrandom concentration of SPSs (Fig. 2E). One such cluster is located at positions 35–40 kb, corresponding to the pseudogene GGTLA. A large region of high SPS density is located at positions 65–165 kb. Particularly SPS-rich regions are found at positions 75–83 kb and within the pseudogenes DKFZp434P211 and BCR. No obvious correlation was detected between SPS hotspots and unstable motifs such as palindromes (Fig. 2G) or repetitive DNA (Fig. 2H). Furthermore, analysis of various recognition motifs of endonucleases and recombinases failed to reveal any association (data not shown). Similar negative results have been reported for gene conversion in the AZFa region on Yq (Bosch et al. 2004
Signature of DNA transfer between LCR22-2 and LCR22-4
Shared polymorphic positions can be considered potential sites of information transfer by recombination (gene conversion or double cross-over) between LCR22-2 and LCR22-4. However, shared polymorphic sites could also have been created by independent mutations in both LCR22-2 and LCR22-4. The probability of such events can be estimated from nonshared polymorphic sites (NPSs), since random mutation should create both shared and nonshared polymorphisms. For simplicity, we consider only shared and nonshared sites with the most common, dinucleotide (not tri- or tetranucleotide) polymorphism in both LCR22s, after excluding all simple-repeat positions because of possible alignment artifacts. The expected ratio of shared to nonshared polymorphism is 0.4: of the 21 possible combinations of dinucleotide polymorphisms in the two LCR22s, six are shared and 15 are nonshared. Having found five different NPSs in the entire clone alignment, we would expect to find two shared polymorphic dinucleotide sites, compared with the 114 observed. This discrepancy is highly statistically significant ($p < 10^{-8}$, binomial test). Consequently, most if not all SPSs are not independent; that is, they were not created by independent mutations in LCR22-2 and LCR22-4.
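The expected shared/nonshared ratio and the binomial test can be reproduced under stated assumptions. The sketch enumerates the six possible two-allele polymorphisms, counts unordered pairings between the two LCR22s as in the text (21 combinations), and then asks how likely it would be to see at least 114 shared sites among the 119 sites polymorphic in both LCR22s if each site were shared with probability 6/21. Treating 114 + 5 sites as the binomial sample and 6/21 as the null probability is our assumption about how the test was set up; the variable and function names are hypothetical.

```python
from itertools import combinations, combinations_with_replacement
from math import comb

# The six possible two-allele ("dinucleotide") polymorphisms at a single position
PAIRS = [frozenset(p) for p in combinations("ACGT", 2)]

# Unordered pairings of an LCR22-2 polymorphism with an LCR22-4 polymorphism
COMBOS = list(combinations_with_replacement(PAIRS, 2))
SHARED = sum(1 for x, y in COMBOS if x == y)  # same allele pair in both LCR22s
NONSHARED = len(COMBOS) - SHARED


def binom_upper_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


observed_sps, observed_nps = 114, 5
ratio = SHARED / NONSHARED              # expected shared : nonshared
expected_sps = observed_nps * ratio     # shared sites expected from five NPSs
p_shared = SHARED / len(COMBOS)         # assumed null probability that a site is shared
p_value = binom_upper_tail(observed_sps + observed_nps, observed_sps, p_shared)

print(len(COMBOS), SHARED, NONSHARED, ratio, expected_sps)  # -> 21 6 15 0.4 2.0
print(f"p = {p_value:.1e}")
```

Under these assumptions the computed tail probability lies far below $10^{-8}$, consistent with the reported result. Counting ordered pairings instead would give 36 combinations and a different expected ratio, so the unordered count follows the text.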
The apparent interdependence of SPSs between the LCRs can be explained by two different mechanisms: (1) preservation of ancestral, pre-duplication polymorphism, or (2) transfer of genetic information between the LCRs by recombination/gene conversion (concerted evolution). If the second scenario is correct, regions with a high concentration of shared polymorphic sites should contain almost no PSVs, because recombination/gene conversion would homogenize PSVs between the LCRs. In contrast, if shared polymorphic sites are merely remnants of ancient, pre-duplication polymorphism, no correlation between PSVs and SPSs is expected (Fig. 4). Figure 2D shows that PSVs are underrepresented in regions with frequent SPSs. This was confirmed by a statistical analysis of nonoverlapping 10-kb segments after removal of the first 20 kb: the number of PSVs and the number of shared polymorphic sites were negatively correlated (Spearman's rank correlation coefficient = −0.55; p < 0.05). This strongly indicates that many shared polymorphic sites result from true recombination rather than from ancestral polymorphism. In conclusion, we postulate that LCR22-2 and LCR22-4 SPSs are not independent and that most of them are products of past transfers of DNA information between the LCRs.
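A minimal sketch of the segment-level test follows. It bins PSV and SPS coordinates into nonoverlapping 10-kb windows starting after the first 20 kb and computes Spearman's coefficient as the Pearson correlation of tie-averaged ranks. The function names are hypothetical, the 176,245-bp alignment length comes from the text, and dropping the final partial window is our assumption. In practice, `scipy.stats.spearmanr` returns the same coefficient together with a p-value.

```python
from math import sqrt


def bin_counts(positions, start, end, width=10_000):
    """Count sites in consecutive nonoverlapping windows of `width` bp.

    Windows are [start, start + width), [start + width, start + 2 * width), ...;
    a trailing window shorter than `width` is dropped (an assumption).
    """
    n_bins = (end - start) // width
    counts = [0] * n_bins
    for x in positions:
        i = (x - start) // width  # negative for positions before `start`
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)  # undefined if either series is constant


# Hypothetical usage with coordinate lists psv_pos and sps_pos (not shown):
# rho = spearman_rho(bin_counts(psv_pos, 20_000, 176_245),
#                    bin_counts(sps_pos, 20_000, 176_245))
```

Averaging the ranks of ties matters here because many 10-kb segments are likely to share the same small counts of PSVs or SPSs.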
Representative PSVs and SPSs in LCR22-2 and LCR22-4
Indel shuffling by inter-LCR recombination
Notably, both indels are found in the large region of high SPS concentration at positions 65–165 kb. The short duplication is located at positions 77,473–77,635, within a particularly SPS-rich region (75–83 kb). Given the high concentration of shared polymorphic sites and the low concentration of PSVs, both indel regions appear to be products of concerted evolution rather than remnants of ancient pre-duplication polymorphism.
Sequence comparison of BAC clones covering the 240-kb repeats on Chromosome 22q11.2 that flank the region frequently deleted in patients with velo-cardio-facial syndrome/DiGeorge syndrome revealed a complex pattern of polymorphic sites. In addition to paralogous sequence variants (PSVs) and LCR-specific SNPs, we detected positions that are polymorphic in both LCRs. Based on equality/inclusion of the variants, these were classified as shared and nonshared polymorphic sites (SPSs and NPSs, respectively). The SPSs are equivalent to the previously reported multisite variation type 2 (MSV2) (Fredman et al. 2004
The SNP density along LCR22-2/LCR22-4 is relatively high (6.4 and 3.0 SNPs/kb for LCR22-2 and LCR22-4, respectively), even though our method maps polymorphic sites precisely to each LCR and avoids the frequent misidentification of PSVs as ambiguous SNPs in segmental duplications (Estivill et al. 2002
The overwhelming majority of positions polymorphic in both LCR22s represent shared polymorphism, indicating interdependence of polymorphism between the LCR22-2/LCR22-4 segmental duplications. Taking into account the presence of several LCR22-specific insertions/deletions and 1% divergence along the homologous segments, the potential contribution of ancestral polymorphism seems limited because of the relatively ancient origin of the duplications (Shaikh et al. 2000
Our BAC clone-based method cannot formally distinguish between crossovers and gene conversion. Several recent studies have addressed this difficulty using different approaches, including sperm typing (Jeffreys and May 2004
The situation is more complicated for the indels shared between LCR22-2 and LCR22-4, since one of them is 1470 bp long. Typical interallelic gene conversion tracts detected in the human genome are relatively short, with lengths estimated to range from tens to several hundred base pairs (Bosch et al. 2004
One of our major goals was to predict potential hotspots of deleterious 22q11.2 rearrangements. Both gene conversion and crossover hotspots tend to colocalize in the human genome (Jeffreys and May 2004
Current evidence indicates that recent segmental duplications may exchange genetic information, preferentially via gene conversion (Rozen et al. 2003
Sequence analysis
DNA and protein sequences were aligned by BLAT (Kent 2002
Detection of SPS clusters
… incomplete beta function $I_p(r, n - r + 1)$, such that $P\{S_n \ge r\} = I_p(r, n - r + 1)$. Hence, for any window, we can count the number of events and evaluate the probability that at least this many would have occurred in that interval by random chance. The lower the probability, the higher the significance of the window. Probabilities were calculated in this way for all possible windows spanning SPSs, and the windows were ordered from most to least significant (lowest to highest probability). Each position in the BAC alignment was then assigned the lowest probability of any SPS window within which it falls.
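The window scan described above can be sketched as follows. Because the beginning of this paragraph is truncated, the model parameters are assumptions: we take n as the total number of SPSs in the alignment and p as the fraction of the alignment covered by a window, so that $S_n$ counts how many of the n SPSs would fall in the window by chance. The tail probability is summed directly, which equals $I_p(r, n - r + 1)$; `scipy.stats.binom.sf(r - 1, n, p)` would give the same value. All function names are hypothetical.

```python
import math


def binom_tail(n: int, r: int, p: float) -> float:
    """P(S_n >= r) for S_n ~ Binomial(n, p); equal to I_p(r, n - r + 1)."""
    if r <= 0:
        return 1.0
    if r > n:
        return 0.0
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r, n + 1))


def score_windows(sps_positions, alignment_length):
    """Score every window that starts and ends at an SPS.

    Returns (start, end, probability) tuples, most significant first.
    Assumed model: n = all SPSs, p = window width / alignment length.
    """
    pos = sorted(sps_positions)
    n = len(pos)
    windows = []
    for i in range(n):
        for j in range(i, n):
            width = pos[j] - pos[i] + 1
            r = j - i + 1  # SPSs observed inside the window
            windows.append((pos[i], pos[j], binom_tail(n, r, width / alignment_length)))
    windows.sort(key=lambda w: w[2])
    return windows


def position_scores(windows, alignment_length):
    """Lowest probability of any window covering each 1-based position (1.0 if none)."""
    scores = [1.0] * (alignment_length + 1)  # index 0 unused
    for start, end, prob in windows:
        for x in range(start, end + 1):
            scores[x] = min(scores[x], prob)
    return scores


# Toy example: five closely spaced SPSs plus two scattered ones in a 1-kb alignment
toy = [100, 102, 104, 106, 108, 500, 900]
wins = score_windows(toy, 1_000)
scores = position_scores(wins, 1_000)
print(wins[0][:2])  # -> (100, 108)
```

The per-position loop is written for clarity and is slow on a full 176,245-bp alignment; since the windows are already sorted by probability, a faster variant could fill each position only once, from the most significant window down.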
PCR analysis of BAC clones
We thank Melanie Babcock for helpful discussions. This work was supported by the NIH, P01 HD039420-04S2 (B.E.M.).
[Supplemental material is available online at www.genome.org.] Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.4281205. Freely available online through the Genome Research Immediate Open Access option.
4 Corresponding authors.
Babcock, M., Pavlicek, A., Spiteri, E., Kashork, C.D., Ioshikhes, I., Shaffer, L.G., Jurka, J., and Morrow, B.E. 2003. Shuffling of genes within low-copy repeats on 22q11 (LCR22) by Alu-mediated recombination events during evolution. Genome Res. 13: 2519–2532.
Bailey, J.A., Gu, Z., Clark, R.A., Reinert, K., Samonte, R.V., Schwartz, S., Adams, M.D., Myers, E.W., Li, P.W., and Eichler, E.E. 2002. Recent segmental duplications in the human genome. Science 297: 1003–1007.
Baumer, A., Dutly, F., Balmer, D., Riegel, M., Tukel, T., Krajewska-Walasek, M., and Schinzel, A.A. 1998. High level of unequal meiotic crossovers at the origin of the 22q11.2 and 7q11.23 deletions. Hum. Mol. Genet. 7: 887–894.
Benson, G. 1999. Tandem repeats finder: A program to analyze DNA sequences. Nucleic Acids Res. 27: 573–580.
Bergman, A. and Blennow, E. 2000. Inv dup(22), del(22)(q11) and r(22) in the father of a child with DiGeorge syndrome. Eur. J. Hum. Genet. 8: 801–804.
Bi, W., Park, S.S., Shaw, C.J., Withers, M.A., Patel, P.I., and Lupski, J.R. 2003. Reciprocal crossovers and a positional preference for strand exchange in recombination events resulting in deletion or duplication of chromosome 17p11.2. Am. J. Hum. Genet. 73: 1302–1315.
Bosch, E., Hurles, M.E., Navarro, A., and Jobling, M.A. 2004. Dynamics of a human interparalog gene conversion hotspot. Genome Res. 14: 835–844.
Bray, N. and Pachter, L. 2004. MAVID: Constrained ancestral alignment of multiple sequences. Genome Res. 14: 693–699.
Burn, J. and Goodship, J. 1996. Congenital heart disease. In Emery and Rimoin's principles and practice of medical genetics, 3rd ed. (eds. D.L. Rimoin et al.), Vol. 1, pp. 767–828. Churchill Livingstone, New York.
Cargill, M., Altshuler, D., Ireland, J., Sklar, P., Ardlie, K., Patil, N., Shaw, N., Lane, C.R., Lim, E.P., Kalyanaraman, N., et al. 1999. Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nat. Genet. 22: 231–238.
DiGeorge, A. 1965. A new concept of the cellular basis of immunity. J. Pediatr. 67: 907.
Edelmann, L., Pandita, R.K., Spiteri, E., Funke, B., Goldberg, R., Palanisamy, N., Chaganti, R.S., Magenis, E., Shprintzen, R.J., and Morrow, B.E. 1999a. A common molecular basis for rearrangement disorders on chromosome 22q11.2. Hum. Mol. Genet. 8: 1157–1167.
Edelmann, L., Pandita, R.K., and Morrow, B.E. 1999b. Low-copy repeats mediate the common 3-Mb deletion in patients with velo-cardio-facial syndrome. Am. J. Hum. Genet. 64: 1076–1086.
Edelmann, L., Stankiewicz, P., Spiteri, E., Pandita, R.K., Shaffer, L., Lupski, J.R., and Morrow, B.E. 2001. Two functional copies of the DGCR6 gene are present on human chromosome 22q11 due to a duplication of an ancestral locus. Genome Res. 11: 208–217.
Ensenauer, R.E., Adeyinka, A., Flynn, H.C., Michels, V.V., Lindor, N.M., Dawson, D.B., Thorland, E.C., Lorentz, C.P., Goldstein, J.L., McDonald, M.T., et al. 2003. Microduplication 22q11.2, an emerging syndrome: Clinical, cytogenetic, and molecular analysis of thirteen patients. Am. J. Hum. Genet. 73: 1027–1040.
Estivill, X., Cheung, J., Pujana, M.A., Nakabayashi, K., Scherer, S.W., and Tsui, L.C. 2002. Chromosomal regions containing high-density and ambiguously mapped putative single nucleotide polymorphisms (SNPs) correlate with segmental duplications in the human genome. Hum. Mol. Genet. 11: 1987–1995.
Fredman, D., White, S.J., Potter, S., Eichler, E.E., Den Dunnen, J.T., and Brookes, A.J. 2004. Complex SNP-related sequence variation in segmental genome duplications. Nat. Genet. 36: 861–866.
Galtier, N., Gouy, M., and Gautier, C. 1996. SEAVIEW and PHYLO_WIN: Two graphic tools for sequence alignment and molecular phylogeny. Comput. Appl. Biosci. 12: 543–548.
Giordano, M., Marchetti, C., Chiorboli, E., Bona, G., and Momigliano Richiardi, P. 1997. Evidence for gene conversion in the generation of extensive polymorphism in the promoter of the growth hormone gene. Hum. Genet. 100: 249–255.
Guanti, G. 1981. The aetiology of the cat eye syndrome reconsidered. J. Med. Genet. 18: 108–118.
Hurles, M. 2002. Are 100,000 "SNPs" useless? Science 298: 1509.
Hurles, M.E., Willey, D., Matthews, L., and Hussain, S.S. 2004. Origins of chromosomal rearrangement hotspots in the human genome: Evidence from the AZFa deletion hotspots. Genome Biol. 5: R55.
Jeffreys, A.J. and May, C.A. 2004. Intense and highly localized gene conversion activity in human meiotic crossover hot spots. Nat. Genet. 36: 151–156.
Johnson, R.D. and Jasin, M. 2000. Sister chromatid gene conversion is a prominent double-strand break repair pathway in mammalian cells. EMBO J. 19: 3398–3407.
Jurka, J. 2000. Repbase update: A database and an electronic journal of repetitive elements. Trends Genet. 16: 418–420.
Jurka, J., Klonowski, P., Dagman, V., and Pelton, P. 1996. CENSOR: A program for identification and elimination of repetitive elements from DNA sequences. Comput. Chem. 20: 119–121.
Kapitonov, V.V., Pavlicek, A., and Jurka, J. 2004. Anthology of human repetitive DNA. In Encyclopedia of molecular cell biology and molecular medicine (ed. R.A. Meyers), Vol. 1, pp. 251–305. Wiley-VCH, New York.
Katoh, K., Misawa, K., Kuma, K., and Miyata, T. 2002. MAFFT: A novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30: 3059–3066.
Kent, W.J. 2002. BLAT: The BLAST-like alignment tool. Genome Res. 12: 656–664.
Krzywinski, M., Bosdet, I., Smailus, D., Chiu, R., Mathewson, C., Wye, N., Barber, S., Brown-John, M., Chan, S., Chand, S., et al. 2004. A set of BAC clones spanning the human genome. Nucleic Acids Res. 32: 3651–3660.
Li, W.H. and Sadler, L.A. 1991. Low nucleotide diversity in man. Genetics 129: 513–523.
Lindsay, E.A., Goldberg, R., Jurecic, V., Morrow, B., Carlson, C., Kucherlapati, R.S., Shprintzen, R.J., and Baldini, A. 1995. Velo-cardio-facial syndrome: Frequency and extent of 22q11 deletions. Am. J. Med. Genet. 57: 514–522.
Lopez-Correa, C., Brems, H., Lazaro, C., Marynen, P., and Legius, E. 2000. Unequal meiotic crossover: A frequent cause of NF1 microdeletions. Am. J. Hum. Genet. 66: 1969–1974.
Lupski, J.R. 1998. Genomic disorders: Structural features of the genome can lead to DNA rearrangements and human disease traits. Trends Genet. 14: 417–422.
Lupski, J.R. 2003. 2002 Curt Stern Award Address. Genomic disorders: Recombination-based disease resulting from genomic architecture. Am. J. Hum. Genet. 72: 246–252.
Morrow, B., Goldberg, R., Carlson, C., Das Gupta, R., Sirotkin, H., Collins, J., Dunham, I., O'Donnell, H., Scambler, P., Shprintzen, R., et al. 1995. Molecular definition of the 22q11 deletions in velo-cardio-facial syndrome. Am. J. Hum. Genet. 56: 1391–1403.
Reiter, L.T., Hastings, P.J., Nelis, E., De Jonghe, P., Van Broeckhoven, C., and Lupski, J.R. 1998. Human meiotic recombination products revealed by sequencing a hotspot for homologous strand exchange in multiple HNPP deletion patients. Am. J. Hum. Genet. 62: 1023–1033.
Richardson, C. and Jasin, M. 2000. Coupled homologous and nonhomologous repair of a double-strand break preserves genomic integrity in mammalian cells. Mol. Cell. Biol. 20: 9068–9075.
Richardson, C., Moynahan, M.E., and Jasin, M. 1998. Double-strand break repair by interchromosomal recombination: Suppression of chromosomal translocations. Genes & Dev. 12: 3831–3842.
Rozen, S., Skaletsky, H., Marszalek, J.D., Minx, P.J., Cordum, H.S., Waterston, R.H., Wilson, R.K., and Page, D.C. 2003. Abundant gene conversion between arms of palindromes in human and ape Y chromosomes. Nature 423: 873–876.
Saitta, S.C., Harris, S.E., Gaeth, A.P., Driscoll, D.A., McDonald-McGinn, D.M., Maisenbacher, M.K., Yersak, J.M., Chakraborty, P.K., Hacker, A.M., Zackai, E.H., et al. 2004. Aberrant interchromosomal exchanges are the predominant cause of the 22q11.2 deletion. Hum. Mol. Genet. 13: 417–428.
Schmollinger, M., Nieselt, K., Kaufmann, M., and Morgenstern, B. 2004. DIALIGN P: Fast pair-wise and multiple sequence alignment using parallel processors. BMC Bioinformatics 5: 128.
Shaikh, T.H., Kurahashi, H., Saitta, S.C., O'Hare, A.M., Hu, P., Roe, B.A., Driscoll, D.A., McDonald-McGinn, D.M., Zackai, E.H., Budarf, M.L., et al. 2000. Chromosome 22-specific low copy repeats and the 22q11.2 deletion syndrome: Genomic organization and deletion endpoint analysis. Hum. Mol. Genet. 9: 489–501.
Shprintzen, R.J., Goldberg, R.B., Lewin, M.L., Sidoti, E.J., Berkman, M.D., Argamaso, R.V., and Young, D. 1978. A new syndrome involving cleft palate, cardiac anomalies, typical facies, and learning disabilities: Velo-cardio-facial syndrome. Cleft Palate J. 15: 56–62.
Stankiewicz, P. and Lupski, J.R. 2002a. Molecular-evolutionary mechanisms for genomic disorders. Curr. Opin. Genet. Dev. 12: 312–329.
Stankiewicz, P. and Lupski, J.R. 2002b. Genome architecture, rearrangements and genomic disorders. Trends Genet. 18: 74–82.
Stankiewicz, P., Shaw, C.J., Withers, M., Inoue, K., and Lupski, J.R. 2004. Serial segmental duplications during primate evolution result in complex human genome architecture. Genome Res. 14: 2209–2220.
Visser, R., Shimokawa, O., Harada, N., Kinoshita, A., Ohta, T., Niikawa, N., and Matsumoto, N. 2005. Identification of a 3.0-kb major recombination hotspot in patients with Sotos syndrome who carry a common 1.9-Mb microdeletion. Am. J. Hum. Genet. 76: 52–67.
Vowles, E.J. and Amos, W. 2004. Evidence for widespread convergent evolution around human microsatellites. PLoS Biol. 2: E199.
Wang, D.G., Fan, J.B., Siao, C.J., Berno, A., Young, P., Sapolsky, R., Ghandour, G., Perkins, N., Winchester, E., Spencer, J., et al. 1998. Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome. Science 280: 1077–1082.
http://baboon.math.berkeley.edu/mavid/; MAVID.
http://bibiserv.techfak.uni-bielefeld.de/dialign/; Dialign2.2.
http://bioinformatics.uams.edu/mafft/; MAFFT.
http://genome.ucsc.edu/; UCSC browser.
http://pbil.univ-lyon1.fr/software/seaview.html; Seaview.
http://tandem.bu.edu/trf/trf.html; Tandem Repeat Finder.
http://www.girinst.org/Censor_Server.html; Censor.
http://www.girinst.org/Repbase_Update.html; Repbase Update.
http://www.repeatmasker.org; RepeatMasker.
Received June 14, 2005; accepted in revised format August 10, 2005.