Published online before print January 12, 2004, doi: 10.1101/gr.1929904
Genome Res. 14:239-246, 2004. ©2004 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/04 $5.00
Letter

Closing the Gaps on Human Chromosome 19 Revealed Genes With a High Density of Repetitive Tandemly Arrayed Elements

1 Laboratory of Biosystems and Cancer, Center for Cancer Research, National Cancer Institute (NCI, NIH), Bethesda, Maryland, 20892, USA
2 Department of Biology, Dong-A University, Busan 604-714, Korea
3 Department of Genetics, Stanford University School of Medicine, Stanford, California, 94305, USA
4 U.S. Department of Energy Joint Genome Institute, Walnut Creek, California, 94598, USA
The reported human genome sequence includes about 400 gaps of unknown sequence that were not found in the bacterial artificial chromosome (BAC) and cosmid libraries used to sequence the genome. These missing sequences correspond to 1% of the euchromatic portion of the human genome. Gap filling is laborious because it relies on analyzing random clones from numerous genomic BAC or cosmid libraries. In this work we demonstrate that gap closure can be accelerated by selective recombinational capture of the missing chromosomal segments in yeast. Combining this approach with screening of additional libraries allowed us to close the four remaining gaps on human chromosome 19. Analysis of the gap sequences revealed several features that could make them unstable in microbial hosts, including large blocks of micro- and minisatellites and a high density of Alu repeats. Sequencing the gap regions in both BAC and YAC form allowed us to obtain the complete sequence of four genes, including the neuronal cell signaling gene SCK1/SLI. The SCK1/SLI gene contains a record number of minisatellites, most of which are polymorphic and are transmitted through meiosis in a Mendelian fashion. We conclude that this alternative recombinational cloning system in yeast may greatly accelerate closure of the remaining gaps in the human genome (as well as in other complex genomes) and help achieve the goal of annotating all human genes.
The International Human Genome Sequencing Consortium recently reported that
A traditional method of filling gaps is to screen additional BAC and cosmid libraries. However, this approach is time-consuming and may not be applicable to some gap regions with unusual DNA structures. For example, it is well documented that long inverted repeats, AT-rich sequences, and sequences that form structures such as Z-DNA are extremely unstable in Escherichia coli (Hagan and Warren 1982
The introduction of alternative cloning systems and hosts that allow isolation of genomic segments poorly clonable in E. coli may assist the effort to close the gaps. One such system is yeast artificial chromosome (YAC) cloning in yeast. Several recent reports demonstrate that genomic segments that are unstable in E. coli vectors can be accurately recovered as YACs in yeast (Bigger et al. 2000). For the purpose of gap closure, radial TAR (transformation-associated recombination) cloning is the most suitable variant, because the sequences of the flanking clones may be deleted or rearranged, making it difficult to design two specific targeting hooks. In the present study, TAR cloning and screening of additional genomic libraries were used to close the gaps on human chromosome 19. Subsequent analysis of the gap sequences allowed us to annotate four human genes and shed light on the nature of poorly clonable chromosome segments.
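Why a single unique hook plus a common repeat is enough can be illustrated with a toy model of radial capture: recombination at the unique hook fixes one end of the insert, and any Alu copy on the chosen side, within the same genomic fragment, can close the circle. This is a sketch under stated assumptions, not the authors' protocol: it treats every Alu copy as an equally good second hook and ignores Alu orientation, sequence divergence, and recombination efficiency. The function name `radial_tar_inserts` and all coordinates are hypothetical.

```python
from typing import List, Tuple


def radial_tar_inserts(hook_pos: int, alu_positions: List[int],
                       fragment: Tuple[int, int],
                       direction: int = 1) -> List[Tuple[int, int]]:
    """Toy model of the inserts a radial TAR vector could capture.

    hook_pos:      coordinate of the unique targeting hook (hypothetical).
    alu_positions: coordinates of Alu copies (hypothetical).
    fragment:      (start, end) of one sheared genomic DNA fragment.
    direction:     +1 captures sequence to the right of the hook, -1 to the left.
    """
    start, end = fragment
    if not start <= hook_pos <= end:
        # Without the unique hook site the fragment cannot be captured at all.
        return []
    inserts = []
    for alu in alu_positions:
        inside = start <= alu <= end
        correct_side = (alu - hook_pos) * direction > 0
        if inside and correct_side:
            inserts.append((min(hook_pos, alu), max(hook_pos, alu)))
    # Shortest candidate inserts first.
    return sorted(inserts, key=lambda iv: iv[1] - iv[0])


# Hypothetical coordinates (bp): hook at 10 kb, fragment spanning 5-100 kb.
print(radial_tar_inserts(10_000, [2_000, 15_000, 40_000, 90_000, 130_000],
                         (5_000, 100_000)))
# [(10000, 15000), (10000, 40000), (10000, 90000)]
```

In this model each genomic fragment can yield inserts of different lengths from the same hook, which is one way to picture why several independent YACs of different sizes can be recovered for a single gap.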
Closing the Gaps on Human Chromosome 19
The first phase of the chromosome 19 mapping and sequencing was based on chromosome 19-specific cosmid libraries constructed from flow-sorted chromosomes isolated from human-hamster hybrid cell lines containing chromosome 19 as the only human chromosome (Carrano et al. 1989
Radial TAR cloning was used successfully to isolate genomic fragments containing the GAP1, GAP2, GAP3, and GAP6 sequences in yeast. Figure 1 illustrates the scheme for closing GAP6 between its two flanking clones. All four gap regions were selectively cloned as circular YACs using vectors carrying a gap-specific targeting hook and an Alu repeat as the second targeting sequence (see Methods). Transformation experiments were carried out with freshly prepared yeast spheroplasts and a linearized gap-specific TAR vector (see Methods). For each gap, one to five clones positive for one or both flanking clones were identified, and the sizes of the positive YACs were determined (see Methods); the results are summarized in Table 1. Two approaches were used to verify the integrity of the YACs and their stability during propagation in yeast. In the first, YAC DNA from four subclones of each original GAP1, GAP2, GAP3, and GAP6 clone was prepared in agarose plugs, digested with NotI, separated by contour-clamped homogeneous electric field (CHEF) gel electrophoresis, and hybridized with an Alu probe. All subclones of a given gap carried YACs of the same size, indicating that these clones had no detectable deletions in yeast. In the second, the Alu profiles of four subclones of each clone were determined and found to be identical, indicating no detectable rearrangements during propagation in yeast cells (Fig. 2A,B). Thus, the YAC clones are relatively stable during propagation in yeast. To evaluate the size of each gap, the Alu ends of the YAC clones positive for both flanking clones were rescued in E. coli and sequenced (see Methods). The end sequences were compared by BLAST with the draft sequence of chromosome 19 at NCBI and UCSC (build 29, April 2002). The positions of the YAC ends for the GAP1, GAP2, GAP3, and GAP6 TAR clones are shown in Table 1.
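Once a YAC spanning a gap has been sized and its ends placed on the draft sequence, the gap size follows by subtraction. A minimal sketch, assuming the insert runs from the unique hook in one flanking contig, across the gap, to the rescued Alu end in the other flanking contig; the function name `estimate_gap_kb` and all numbers are hypothetical and are not values from Table 1.

```python
def estimate_gap_kb(insert_kb: float, hook_to_gap_kb: float,
                    gap_to_alu_end_kb: float) -> float:
    """Estimate gap size from a YAC that spans it (all values in kb).

    insert_kb:          YAC insert size (e.g., from CHEF sizing).
    hook_to_gap_kb:     distance from the unique hook to the gap edge
                        of the hook-side flanking contig.
    gap_to_alu_end_kb:  distance from the other gap edge to the rescued
                        Alu end, as placed on the draft sequence by BLAST.
    """
    gap_kb = insert_kb - hook_to_gap_kb - gap_to_alu_end_kb
    if gap_kb < 0:
        raise ValueError("flank overlaps exceed the insert; recheck end placement")
    return gap_kb


# Hypothetical example: a 120-kb YAC with 35 kb and 50 kb of known flanking sequence.
print(estimate_gap_kb(120, 35, 50))  # 35
```

The estimate inherits the sizing error of the YAC itself, so it is best read as an approximate size to be confirmed by sequencing.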
From the sizes of the YACs, the positions of the YAC end sequences, and the positions of the hooks relative to the gaps, we estimated the gap sizes: GAP1,
For further analysis, the circular YACs were retrofitted into BACs by homologous recombination in yeast and transformed into E. coli. Retrofitted YAC/BACs usually transform E. coli efficiently: a 1-µL sample of a melted agarose plug typically yields 100-500 transformants. In contrast, the YAC/BACs carrying gap inserts transformed E. coli with 10-fold lower efficiency. Most of the BACs rescued in E. coli had undergone deletions, suggesting that the inserts are intrinsically unstable in bacterial cells. This observation is consistent with the absence of these sequences from the genomic libraries available when this work began. After screening the E. coli transformants, we found BAC clones with no detectable rearrangements for two gaps, GAP1 and GAP6, when transformation and subsequent growth of E. coli cells were performed at 30°C (Fig. 3). The absence of rearrangements in these BAC inserts was confirmed by comparing the Alu profiles of the original YAC isolates, the retrofitted YAC/BACs, and the BACs whose insert size was unchanged (data not shown). The GAP1 and GAP6 BACs were chosen for sequencing. For GAP2 and GAP3, circular YAC DNA was isolated from yeast cells and used for sequencing. In addition, the deleted GAP2 and GAP3 BACs were sequenced to determine whether the deletions occurred in the same region, which might point to the cause of the instability of these regions in bacterial cells.
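Comparing Alu profiles of a YAC, its retrofitted YAC/BAC, and a rescued BAC amounts to asking whether the band lists agree within gel-sizing error. A minimal sketch of that comparison, assuming band sizes in kb and a 5% relative tolerance (both assumptions); it ignores the expected shift of vector-containing end bands after retrofitting, which a real comparison must allow for. Function names and band sizes are hypothetical.

```python
from typing import Sequence


def profiles_match(a: Sequence[float], b: Sequence[float],
                   rel_tol: float = 0.05) -> bool:
    """True if two Alu-hybridization band profiles agree band for band.

    Band sizes are in kb; rel_tol is an assumed 5% gel-sizing error.
    """
    if len(a) != len(b):
        return False  # a band was gained or lost: deletion or rearrangement
    return all(abs(x - y) <= rel_tol * max(x, y)
               for x, y in zip(sorted(a), sorted(b)))


def all_identical(profiles: Sequence[Sequence[float]], rel_tol: float = 0.05) -> bool:
    """Compare every profile with the first one (e.g., YAC vs. YAC/BAC vs. BAC)."""
    reference = profiles[0]
    return all(profiles_match(reference, p, rel_tol) for p in profiles[1:])


# Hypothetical band sizes (kb) for a YAC, its retrofitted YAC/BAC, and a rescued BAC.
yac, yac_bac, bac = [2.1, 4.5, 9.8], [2.1, 4.6, 9.7], [2.05, 4.5, 9.8]
print(all_identical([yac, yac_bac, bac]))  # True
print(all_identical([yac, [2.1, 9.8]]))    # False: a band is missing
```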
In addition to the TAR cloning strategy, clones for three gaps (GAP1, GAP2, and GAP6) were identified by screening two new libraries (the BAC library RP13 and the LLNL fosmid library XXfos) and by additional screening of RP11. No bacterial clones linking the contigs that flank GAP3 were identified.
Analysis of GAP1, GAP2, GAP3, and GAP6 Sequences
For GAP6, the sequence obtained from the bacterial clone found in the BAC library (AC135592) was confirmed by the sequence obtained from the TAR isolate (AY207046). Analysis of the sequences revealed two blocks of telomeric repeats and minisatellites, sequences known to be unstable in E. coli. These sequences presumably account for the inefficient recovery of this gap in E. coli.
For the other gaps, discrepancies were observed between inserts propagated in E. coli and in yeast. For GAP1, a fosmid clone and the TAR isolate in BAC form were sequenced. Comparison of the two sequences revealed a large difference in one of the minisatellite regions. This minisatellite is located in intron 8 of the SCK1/SLI gene, which spans GAP1. The size of the minisatellite in the fosmid clone is

For GAP2, two clones were also sequenced: one was identified in a BAC library, and the other was isolated by TAR and retrofitted into a BAC. Because transfer of the retrofitted YAC/BAC into E. coli resulted in deletions, three subclones of the BAC were shotgun-sequenced. Comparison of the subclone sequences revealed that each carries overlapping deletions in the same region (Fig. 4) and that this region is highly enriched in Alu repeats (33 Alu copies in 11 kb of sequence). The gap sequence was reconstructed from the sequences of the deleted BACs, and it matched the sequence of BAC clone AC136469, identified by screening the BAC library.
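The figure of 33 Alu copies in 11 kb can be put in perspective with a short calculation. A minimal sketch, assuming full-length Alu elements of about 300 bp; because many copies are truncated, the resulting coverage is a ceiling rather than a measurement. The function name `repeat_density` is hypothetical.

```python
def repeat_density(copies: int, region_bp: int, element_bp: int = 300):
    """Mean spacing between repeat copies and an upper bound on repeat coverage.

    element_bp=300 assumes full-length Alu elements; truncated copies make the
    true coverage lower, so the fraction is a ceiling, not a measurement.
    """
    spacing_bp = region_bp / copies
    max_fraction = min(1.0, copies * element_bp / region_bp)
    return spacing_bp, max_fraction


spacing, fraction = repeat_density(33, 11_000)
print(f"one Alu every ~{spacing:.0f} bp; at most {fraction:.0%} of the segment")
# one Alu every ~333 bp; at most 90% of the segment
```

At this density, almost every few hundred base pairs offers a homologous partner for recombination, which is consistent with the overlapping deletions seen in the bacterial subclones.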
Because no clones containing GAP3 sequence were identified by screening the additional BAC libraries, the only clones available for closing this gap were the YACs obtained by TAR cloning (AC140008). Like the GAP1 and GAP2 clones, the GAP3 TAR isolates were unstable upon transfer to E. coli cells. For this reason, three BAC subclones of the GAP3 isolate were shotgun-sequenced. Sequencing showed that the GAP3 BAC clones had rearranged into multiple configurations during growth, which prevented assembly of a sequence contig. Most of these rearrangements are presumably caused by a large block of TGG repeats, which is known to be unstable in E. coli cells (Pan and Leach 2000
All of the Gap Sequences Are a Part of Gene-Encoding Regions
It is noteworthy that all of the gap sequences analyzed span gene-encoding regions. Analysis of the GAP2 region revealed an EST (BG705726) encoding the hypothetical protein HSPC240, which is expressed in CD34+ hematopoietic stem/progenitor cells. The EMR3 gene, encoding the human EGF-like module-containing mucin-like receptor (Stacey et al. 2001
Sequence analysis of the entire SCK1/SLI gene identified 12 blocks of tandem repeats (Fig. 5). The degree of polymorphism of these minisatellites was examined with diagnostic PCR primers (Suppl. Table A1) in DNA samples from 103 unrelated individuals, as well as in the TAR YAC clone and in DNA from the hybrid cell line containing a single human chromosome 19. Ten blocks were variable minisatellites (VNTRs), and two blocks (TR6 and TR7) contained nonpolymorphic minisatellites (Table 3; Suppl. Fig. A1). For the VNTR1 minisatellite in intron 10 of SCK1/SLI, nine alleles were recovered, ranging from 130 bp to 615 bp (two to 11 copies of the repeat), with a heterozygosity of 0.746; the most common allele had 10 repeats. VNTR2, VNTR3, and VNTR4 are located within intron 9 of SCK1/SLI. For VNTR2, seven alleles were recovered, ranging from 400 bp to 680 bp (10-18 copies of the repeat), with a heterozygosity of 0.717; the most common allele had 16 repeats. The five alleles of VNTR3 ranged from 13 to 19 repeats; the most common allele had 19 repeats, and the heterozygosity was 0.671. Four alleles of VNTR4 were found, ranging from 50 to 63 repeats, with 51 repeats in the most common allele and a heterozygosity of 0.186. SCK1/SLI contains six additional variable minisatellites: VNTR5, located in intron 8, and VNTR8, VNTR9, VNTR10, VNTR11, and VNTR12, located in intron 1. VNTR5 has 27 alleles, the most common being a 3.0-kb allele with 51 repeats, and a heterozygosity of 0.94. VNTR8 has two alleles (17 repeats most common), VNTR9 has three alleles (8 repeats most common), VNTR10 has four alleles (12 repeats most common), VNTR11 has four alleles (45 repeats most common), and VNTR12 has three alleles (17 repeats most common) (Table 3; Suppl. Fig. A1). The heterozygosity of VNTR8, VNTR9, VNTR10, VNTR11, and VNTR12 was 0.1, 0.279, 0.351, 0.076, and 0.039, respectively (Suppl. Table A2). The repeats within one VNTR diverge by
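The allele and heterozygosity figures above rest on two small calculations: converting a PCR product length to a repeat copy number, and summarizing genotypes as a heterozygosity value. A minimal sketch of both, with two assumptions flagged: the VNTR2 geometry (a 35-bp unit plus 50 bp of flanking primer sequence) is inferred only because it reproduces the reported 400 bp = 10 copies and 680 bp = 18 copies, and since the text does not say whether "degree of heterozygosity" is observed or expected, both are shown. Function names and genotypes are hypothetical.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple


def copies_from_length(product_bp: int, flank_bp: int, unit_bp: int) -> int:
    """Convert a PCR product length to a repeat copy number."""
    return round((product_bp - flank_bp) / unit_bp)


def expected_heterozygosity(alleles: Sequence[int]) -> float:
    """Gene diversity, 1 - sum(p_i^2), from a list of sampled alleles."""
    n = len(alleles)
    return 1.0 - sum((c / n) ** 2 for c in Counter(alleles).values())


def observed_heterozygosity(genotypes: Iterable[Tuple[int, int]]) -> float:
    """Fraction of individuals carrying two different alleles."""
    genotypes = list(genotypes)
    return sum(a != b for a, b in genotypes) / len(genotypes)


# Assumed VNTR2 geometry: 35-bp unit + 50 bp of flanks reproduces the
# reported endpoints (400 bp = 10 copies, 680 bp = 18 copies).
print(copies_from_length(400, 50, 35), copies_from_length(680, 50, 35))  # 10 18

# Hypothetical genotypes (repeat counts) for four individuals.
sample = [(16, 14), (16, 16), (12, 18), (16, 10)]
alleles = [a for pair in sample for a in pair]
print(observed_heterozygosity(sample), expected_heterozygosity(alleles))
```

With 103 individuals, as in this study, the observed and expected values are usually close for loci in Hardy-Weinberg equilibrium; a large discrepancy would point to genotyping problems such as allele dropout of long VNTR alleles.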
Eleven families were selected for segregation analysis of the VNTRs in the SCK1/SLI gene. Blood was collected from the grandparents, parents, and one to three children of each family. Segregation of the 10 VNTRs was traced over two generations in nine families and over three generations in two families. In most cases, the alleles of VNTR1, VNTR2, VNTR3, VNTR4, VNTR5, VNTR8, VNTR9, VNTR10, VNTR11, and VNTR12 could be identified and their transmission traced from parent to child. The results showed that these VNTRs follow Mendelian inheritance (i.e., children carried one VNTR allele from each parent). No new VNTR alleles were observed in this analysis (Suppl. Table A2). Thus, these 10 VNTRs in the SCK1/SLI gene are meiotically stable and could potentially serve as DNA typing markers to follow meiotic segregation of SCK1/SLI alleles. Individual differences in the lengths of the SCK1/SLI minisatellites may result in differences in expression pattern. Sequence analysis of the SCK1/SLI gene revealed that the minisatellites contain cis-regulatory elements that may interact with transcription factors such as HRE1, ZF87, XPB1, GATA3, KE1, REX, NF-κB, uE4, and ETS, which are involved in region-specific expression. It is also possible that changes in DNA conformation due to the repetitive nature of the minisatellites influence gene transcription. It should also be noted that, although the most striking individual differences are in minisatellite length, there may also be differences in sequence, such as single-base mutations, that could contribute to variability in expression.
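The Mendelian criterion used here (each child carries one allele from each parent, and no allele absent from both parents) can be checked mechanically for every trio and every VNTR. A minimal sketch, assuming each genotype is recorded as a pair of allele sizes or repeat counts; the function names and the example trio are hypothetical.

```python
from typing import List, Tuple

Genotype = Tuple[int, int]  # two allele sizes (repeat counts) at one VNTR


def is_mendelian(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """True if the child can have received one allele from each parent."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def novel_alleles(child: Genotype, mother: Genotype, father: Genotype) -> List[int]:
    """Child alleles absent from both parents (candidate new mutations)."""
    parental = set(mother) | set(father)
    return [allele for allele in child if allele not in parental]


# Hypothetical trio at one VNTR, genotypes given as repeat counts.
mother, father = (12, 18), (16, 16)
print(is_mendelian((16, 12), mother, father))   # True
print(is_mendelian((12, 18), mother, father))   # False: nothing from the father
print(novel_alleles((13, 16), mother, father))  # [13]
```

A trio that fails `is_mendelian` while `novel_alleles` is empty points to non-paternity or a genotyping error rather than a new mutation, which is why the two checks are kept separate.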
Chromosome 19 is among the smallest and most gene-dense human chromosomes, spanning 64 Mb and estimated to contain 1760 genes. Sequencing and assessment of the chromosome sequence were performed by the Joint Genome Institute (JGI) and the Stanford Human Genome Center and relied almost exclusively on cosmid and BAC libraries. In general, this approach was extremely successful. However, as the Human Genome Project drew to a close, four regions of chromosome 19 were still not spanned by sequenced BAC clones. Because these regions were not represented in five different BAC and cosmid libraries, they were classified as type 3 gaps.
In this work, we demonstrate that the gaps can be closed by combining two strategies: screening new BAC and fosmid libraries, and selective TAR cloning in yeast. Comparing clones isolated in different hosts allowed us to determine the structure of the missing genomic segments. Sequence analysis of the chromosome 19 gap isolates revealed at least three types of sequences that could destabilize the corresponding inserts during cloning in microbial hosts. Two gap regions contained large blocks of micro- and/or minisatellite repeats, another was highly enriched in Alu repeats, and the fourth contained a large block of TGG trinucleotide repeats. We showed previously that regions containing AT-rich blocks are also unstable in BAC vectors (Kouprina et al. 2003
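Two of the destabilizing features listed here, long tandem blocks such as TGG repeats and AT-rich stretches, can be flagged in a candidate sequence with a simple scan. A minimal sketch under stated assumptions: it finds only perfect, uninterrupted repeats in the given phase and strand (scan "CCA" as well for the complementary strand), and the window, step, and 80% AT threshold are arbitrary illustrative values. Dedicated tools such as Tandem Repeats Finder (Benson 1999) are needed for approximate repeats. Function names are hypothetical.

```python
import re
from typing import List, Tuple


def tandem_blocks(seq: str, unit: str, min_copies: int = 5) -> List[Tuple[int, int]]:
    """(start, copies) for each perfect, uninterrupted tandem run of `unit`."""
    unit = unit.upper()
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_copies},}}")
    return [(m.start(), len(m.group()) // len(unit))
            for m in pattern.finditer(seq.upper())]


def at_rich_windows(seq: str, window: int = 100, step: int = 50,
                    threshold: float = 0.8) -> List[Tuple[int, float]]:
    """(start, AT fraction) for sliding windows at or above `threshold`."""
    seq = seq.upper()
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        win = seq[start:start + window]
        at = (win.count("A") + win.count("T")) / window
        if at >= threshold:
            hits.append((start, at))
    return hits


demo = "ACGT" + "TGG" * 10 + "ACGT"
print(tandem_blocks(demo, "TGG"))              # [(4, 10)]
print(at_rich_windows("A" * 100 + "G" * 100))  # [(0, 1.0)]
```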
The fact that some human DNA sequences, including unique genes, are unstable and even unclonable in E. coli (Kouprina et al. 2003
All four chromosome 19 gaps map to gene-encoding regions. The GAP1 region corresponds to the neuronally expressed Shc adaptor homolog SCK1/SLI (Kojima et al. 2001
The fact that gaps in the human genome may correspond to chromosomal regions encoding functional genes emphasizes the importance of the final step of genome sequencing. There are still
Construction of the TAR Vectors and Cloning by In Vivo Recombination in Yeast
The TAR circularizing vectors, pVC-GAP1, pVC-GAP2, pVC-GAP3, and pVC-GAP6, containing one unique targeting sequence and an Alu repeat as the second targeting sequence were constructed as follows. Either a 187-bp Alu XbaI-BamHI (for pVC-GAP3) or ApaI-XhoI (for pVC-GAP1, pVC-GAP2, and pVC-GAP6) fragment was inserted into a polylinker site of pVC604 (CEN6-HIS3; Kouprina and Larionov 1999
2 µg of genomic DNA isolated from a human/hamster monochromosomal somatic cell hybrid UV5HL9-5B, 1 µg of vector, and 8 × 10^8 spheroplasts were used (Leem et al. 2003
Yeast and Mammalian Cell Culture
Isolation and Physical Analysis of YAC and BAC Clones
Genomic Libraries Used to Recover Gap Sequences
Sequencing
Analysis of Polymorphism in the SCK1/SLI Gene Minisatellites
This research was partially supported by the Biological and Environmental Research Program (BER), U.S. Department of Energy, Interagency Agreement No. DE-AI02-01ER63079. The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.1929904. Article published online before print in January 2004.
5 Corresponding author. [Supplemental material is available online at www.genome.org. The sequence data from this study have been submitted to DDBJ under accession nos. AC140008, AY207046, and AY345879.]
Benson, G. 1999. Tandem repeats finder: A program to analyze DNA sequences. Nucl. Acids Res. 27: 573-580.
Bigger, B., Tolmachov, O., Collombet, J.M., and Coutelle, C. 2000. Introduction of chloramphenicol resistance into the modified mouse mitochondrial genome: Cloning of unstable sequences by passage through yeast. Anal. Biochem. 277: 236-242.
Carrano, A.V., de Jong, P.J., Branscomb, E., Slezak, T., and Watkins, B.W. 1989. Constructing chromosome- and region-specific cosmid maps of the human genome. Genome 31: 1059-1065.
Devenish, R.J. and Newlon, C.S. 1982. Isolation and characterization of yeast ring chromosome III by a method applicable to other circular DNAs. Gene 3: 277-288.
Gardner, M.J., Shallom, S.J., Carlton, J.M., Salzberg, S.L., Nene, V., Shoaibi, A., Ciecko, A., Lynn, J., Rizzo, M., Weaver, B., et al. 2002. Sequence of Plasmodium falciparum chromosomes 2, 10, 11, and 14. Nature 419: 531-534.
Glockner, G., Eichinger, L., Szafranski, K., Pachebat, J.A., Bankier, A.T., Dear, P.H., Lehmann, R., Baumgart, C., Parra, G., Abril, J.F., et al. 2002. Sequence and analysis of chromosome 2 of Dictyostelium discoideum. Nature 418: 79-85.
Grimwood, J. and Schmutz, J. 2003. Genomics: Six is seventh. Nature 425: 775-776.
Hagan, C.E. and Warren, G.J. 1982. Lethality of palindromic DNA and its use in selection of recombinant plasmids. Gene 19: 147-151.
Kang, H.K. and Cox, D.W. 1996. Tandem repeats 3' of the IGHA genes in the human immunoglobulin heavy chain gene cluster. Genomics 35: 189-195.
Kim, J.H., Leem, S.H., Sunwoo, Y., and Kouprina, N. 2003. Separation of long-range human TERT gene haplotypes by transformation-associated recombination cloning in yeast. Oncogene 22: 2452-2456.
Kojima, T., Yoshikawa, Y., Takada, S., Sato, M., Nakamura, T., Takahashi, N., Copeland, N.G., Gilbert, D.J., Jenkins, N.A., and Mori, N. 2001. Genomic organization of the Shc-related phosphotyrosine adapters and characterization of the full-length Sck/ShcB: Specific association of p68-Sck/ShcB with pp135. Biochem. Biophys. Res. Commun. 284: 1039-1047.
Kouprina, N. and Larionov, V. 1999. Selective isolation of mammalian genes by TAR cloning. In Current protocols in human genetics, 1, pp. 1, 5.17.1-5.17.21. Wiley, New York.
Kouprina, N. and Larionov, V. 2003. Exploiting the yeast Saccharomyces cerevisiae for the study of the organization of complex genomes. FEMS Microbiol. Rev. 27: 629-649.
Kouprina, N., Annab, L., Graves, J., Afshari, C., Barrett, J.C., and Larionov, V. 1998a. Functional copies of a human gene can be directly isolated by transformation-associated recombination cloning with a small 3' end target sequence. Proc. Natl. Acad. Sci. 95: 4469-4474.
Kouprina, N., Campbell, M., Graves, J., Campbell, E., Meincke, L., Tesmer, J., Grady, D.L., Doggett, N.A., Moyzis, R.K., Deaven, L.L., et al. 1998b. Construction of human chromosome 16 and 5-specific circular YAC/BAC libraries by in vivo recombination in yeast (TAR cloning). Genomics 53: 21-28.
Kouprina, N., Leem, S.-H., Solomon, G., Ly, A., Koriabine, M., Otstot, J., Pak, E., Dutra, A., Zhao, S., Barrett, J.C., et al. 2003. Segments missing from the draft human genome sequence can be isolated by TAR cloning in yeast. EMBO Rep. 4: 257-262.
Larionov, V., Kouprina, N., Solomon, G., Barrett, J.C., and Resnick, M.A. 1997. Direct isolation of human BRCA2 gene by transformation-associated recombination in yeast. Proc. Natl. Acad. Sci. 94: 7384-7387.
Leem, S.H., Londono-Vallejo, J.A., Kim, J.H., Bui, H., Tubacher, E., Solomon, G., Park, J.E., Horikawa, I., Kouprina, N., Barrett, J.C., et al. 2002. The human telomerase gene: Complete genomic sequence and analysis of tandem repeat polymorphisms in intronic regions. Oncogene 21: 769-777.
Leem, S.H., Noskov, V.N., Park, J.E., Kim, S.I., Larionov, V., and Kouprina, N. 2003. Optimum conditions for selective isolation of genes from complex genomes by transformation-associated recombination cloning. Nucleic Acids Res. 31: e29.
Lovejoy, E.A., Scott, A.C., Fiskerstrand, C.E., Bubb, V.J., and Quinn, J.P. 2003. The serotonin transporter intronic VNTR enhancer correlated with a predisposition to affective disorders has distinct regulatory elements within the domain based on the primary DNA sequence of the repeat unit. Eur. J. Neurosci. 17: 417-420.
Osoegawa, K., Mammoser, A.G., Wu, C., Frengen, E., Zeng, C., Catanese, J.J., and de Jong, P.J. 2001. A bacterial artificial chromosome library for sequencing the complete human genome. Genome Res. 11: 493-496.
Pan, X. and Leach, D.R. 2000. The roles of mutS, sbcCD and recA in the propagation of TGG repeats in Escherichia coli. Nucleic Acids Res. 28: 3178-3184.
Polushin, N., Malykh, A., Malykh, O., Zenkova, M., Chumakova, N., Vlassov, V., and Kozyavkin, S. 2001. 2'-modified oligonucleotides from methoxyoxalamido and succinimido precursors: Synthesis, properties, and applications. Nucleosides Nucleotides Nucl. Acids 20: 507-511.
Razin, S.V., Ioudinkova, E.S., Trifonov, E., and Scherrer, K. 2001. Non-clonability correlates with genome instability: A case of unique DNA region. J. Mol. Biol. 307: 481-486.
Schroth, G.P. and Ho, P.S. 1995. Occurrence of potential cruciform and H-DNA forming sequences in genomic DNA. Nucl. Acids Res. 23: 1977-1983.
Stacey, M., Lin, H.H., Hilyard, K.L., Gordon, S., and McKnight, A.J. 2001. Human epidermal growth factor (EGF) module-containing mucin-like hormone receptor 3 is a new member of the EGF-TM7 family that recognizes a ligand on human macrophages and activated neutrophils. J. Biol. Chem. 276: 18863-18870.
Takamatsu, K., Maekawa, K., Togashi, T., Choi, D.K., Suzuki, Y., Taylor, T.D., Toyoda, A., Sugano, S., Fujiyama, A., Hattori, M., et al. 2002. Identification of two novel primate-specific genes in DSCR. DNA Res. 9: 89-97.
http://genome.ucsc.edu/goldenPath/apr2001Traks.html; UCSC.
http://www.ncbi.nlm.nih.gov/BLAST/blast_databases.html; BLAST genome analysis software.
http://www.ncbi.nlm.nih.gov/genome/guide/; NCBI.
Received September 1, 2003; accepted in revised form November 24, 2003.