Vol. 9, Issue 10, 994-1001, October 1999
RESOURCE
ABSTRACT
Human genome sequencing is accelerating rapidly. Multiple genome maps link this sequence to problems in biology and clinical medicine. Because each map represents a different aspect of the structure, content, and behavior of human chromosomes, these fundamental properties must be integrated with the genome to understand disease genes, cancer instability, and human evolution. Cytogenetic maps use 400-850 visible band landmarks and are the primary means for defining prenatal defects and novel cancer breakpoints, thereby providing simultaneous examination of the entire genome. Recent genetic, physical, and transcript maps use PCR-based landmarks called sequence-tagged sites (STSs). We have integrated these genome maps by anchoring the human cytogenetic map to the STS-based genetic and physical maps with 1021 STS-BAC pairs at an average spacing of ~1 per 3 Mb. These integration points are represented by 872 unique STSs, including 642 polymorphic markers, and by 957 bacterial artificial chromosomes (BACs), each of which was localized on high-resolution fluorescent banded chromosomes. These BACs constitute a resource that bridges map levels and provides the tools to seamlessly translate questions raised by genomic change seen at the chromosomal level into answers based at the molecular level. We show how the BACs provide molecular links for understanding human genomic duplications, meiosis, and evolution, as well as reagents for conducting genome-wide prenatal diagnosis at the molecular level and for detecting gene candidates associated with novel cancer breakpoints.
INTRODUCTION
Translating problems of human disease into the language of the human genome requires a unified resource that bridges DNA sequence through chromosome bands. Such a resource must link the three types of linear arrays that represent the human genome: database arrays (genetic and physical maps and ultimately DNA sequence), chromosome bands visible in single cells, and ordered clone arrays. Genome maps have previously been either STS-based, with marker order obtained using a combination of STS content of large-insert yeast artificial chromosome (YAC) clones, radiation hybrid (RH) mapping, and genetic mapping (Hudson et al. 1995; Deloukas et al. 1998), or bacterial artificial chromosome (BAC)-based, with order obtained at 2-6 Mb through high-resolution mapping by fluorescence in situ hybridization (FISH) with respect to human chromosome landmarks (Korenberg et al. 1992). During the course of these efforts, a strategy to integrate these maps was established. BACs are well suited for a permanent FISH-mapped and integrated clone resource in that they represent a stable and easily manipulated form of cloned DNA and produce bright, well-defined signals on metaphase and interphase chromosome preparations (Korenberg and Chen 1995). We now report the construction of a genome-wide array of BACs that is integrated with the cytogenetic, genetic, and STS maps and characterized for homology to the remainder of the human genome by FISH.
RESULTS
Integrated BAC Framework of the Human Genome
One or more BACs were identified for 872 STSs (Table 1) or ~1 STS marker BAC per 3 Mb throughout the genome. The majority of these STSs (642) are simple sequence length polymorphisms (SSLPs) from the human genetic map generated at Genethon (Table 1). The term STS-BAC pair describes the result of an independent experiment in which a BAC is tested and found to be positive with a single STS marker. Because some STSs identified multiple BACs, and in some cases, single BACs contained multiple STSs, the 1021 STS-BAC pairs represent 872 unique STSs and 957 BACs. BACs were not included in the resource when the STS failed to map to a unique position on the Whitehead (MIT) maps or when the STS result and the FISH data were discordant for chromosome assignment. Some discordant pairs were the result of multiple or independent clones occurring in different replicates of the BAC library. Apparent "true" discordance between the genetic and cytogenetic maps was observed for several reasons, including data entry errors, genome duplications, and putative chimeric clones. The 1021 STS-BAC pairs were used to generate integration points between the STSs and the cytogenetic maps (Fig. 1). Of the 957 BACs used to build the integrated map, 804 BACs yielded a FISH signal at only one chromosome band, sub-band, or band border and 153 (16%) BACs gave signals at more than one band. Data tables containing the identity of the BAC clones and the STSs, their respective FISH, genetic, or RH map positions, and subsets located near telomeres, are available at http://www.csmc.edu/genetics/korenberg/korenberg.html and http://www-genome.wi.mit.edu. Clones are available from Research Genetics (Huntsville, AL). Images stored on the optical drive are available by arrangement with the.
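The bookkeeping behind these counts can be sketched in a few lines of Python. This is an illustration only, not the software used in this work: the function name `summarize_pairs` and the toy marker and clone names are hypothetical, and the genome size of 3000 Mb is taken as a round figure. The sketch shows how a list of STS-BAC pairs collapses to fewer unique STSs and BACs when one STS identifies several BACs or one BAC carries several STSs, and it rechecks the arithmetic reported above.

```python
def summarize_pairs(pairs):
    """Return (n_pairs, n_unique_sts, n_unique_bacs) for (sts, bac) tuples."""
    unique_pairs = set(pairs)
    sts = {s for s, _ in unique_pairs}
    bacs = {b for _, b in unique_pairs}
    return len(unique_pairs), len(sts), len(bacs)


# Toy data with hypothetical names: STS_A hits two BACs, BAC_3 carries two STSs.
toy_pairs = [
    ("STS_A", "BAC_1"),
    ("STS_A", "BAC_2"),
    ("STS_B", "BAC_3"),
    ("STS_C", "BAC_3"),
]
toy_summary = summarize_pairs(toy_pairs)

# Figures quoted in the text.
single_site_bacs = 804
multi_site_bacs = 153
total_bacs = single_site_bacs + multi_site_bacs
multi_site_percent = round(100 * multi_site_bacs / total_bacs)

# Average spacing, assuming a genome of roughly 3000 Mb (an assumed round value).
mb_per_pair = 3000 / 1021
```

In the toy example, four pairs reduce to three unique STSs and three unique BACs, which is the same effect that reduces 1021 pairs to 872 STSs and 957 BACs.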
Table 2 displays the number of integration points per chromosome between the STS map and the cytogenetic map for all autosomes, the X chromosome, and the pseudoautosomal region of the X and Y chromosomes. Each point represents a FISH-mapped BAC containing at least one STS. The Y chromosome is underrepresented largely because we did not screen with markers that had been assigned previously on the Y (Foote et al. 1992). Chromosome 19 and the X may be underrepresented due to a relative lack of SSLPs, to multisite STSs, or to lower BAC density on these chromosomes. The overrepresentation of chromosome 21 is due to the enrichment of BACs mapped previously on chromosome 21 in the DNA pools screened (Hubert et al. 1997; J.R. Korenberg, X-N. Chen, Z. Sun, D. Noya, H. Shizuya, M. Simon, in prep.). Some large Giemsa-positive bands have only one or no BAC-STS pairs (i.e., 3p12, 4q13, 4q28, and 9q31), and some R bands have none (i.e., 7p22 and 4p16), in both cases perhaps because of a bias in the distribution of the SSLPs or BACs used and the genomic organization of these regions. The repeat-containing regions at 1qh, 9qh, and 16qh are also devoid of STS-BAC pairs, possibly due to the lack of SSLPs.
Reproducibility and Resolution
We have assessed the accuracy and resolution of the integrated STS-BAC maps. For the subset of 28 BACs that were mapped twice because they carry at least two different markers, all pairs mapped to the same band or sub-band. This is a stringent test of reproducibility. Also, the positions of BACs containing 43 previously mapped genes (Table 1) agree with the predicted locations (Fig. 1 and web site table), even with the expected bias in map position that is found at band borders due to the spread of stain or signal. Furthermore, the best estimate of the STS order (generated by the integrated genetic, STS-content, and RH map) correlates well with the cytogenetic positions of the FISH-mapped BACs.
True FISH resolution varies with chromosome length, band size, and structure and was therefore represented independently for each BAC (Fig. 1). Higher resolution BAC order is provided by the associated STSs and their links to the emerging genomic sequence. Therefore, regardless of standard cytogenetic resolution, these STS-linked BACs now provide resolution approaching that of the STSs. Visual inspection shows a small number of markers that map to different subchromosomal regions in the STS versus the FISH maps, some of which may be the result of errors in the genetic, RH, or YAC maps (Hudson et al. 1995). It is also possible that some STSs are low-copy repeats that identified nonoverlapping BACs or that some of these BACs are chimeric. Alternatively, in the pericentromeric regions of chromosomes 2 and 9, the inverted order of STSs versus BACs may reflect variation in the human population that may now be tested.
The reproducibility of map position within the resource was also examined by using the 102 STSs that identify two or more BACs. In 100 cases (98%), the FISH signals from the different BACs were localized to either the same (92), overlapping (7), or closely adjacent bands (1). Only the BACs from two STSs (D5S477 on 5p and IB665 on 11p) mapped to different locations separated by two bands on the same chromosome. For 13 of the 102 STSs, the different BACs shared the band containing the confirmed STS site, and a second signal was noted at a site on another chromosome (web site table). This points to limitations in precisely localizing some STS markers, possibly due to repeated regions. Repeats notwithstanding, the vast majority of the BACs are mapped to the correct subchromosomal position, and these multisite BAC-STS pairs therefore provide links to cryptic human duplications.
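The concordance categories used above can be made explicit with a small Python sketch. It assumes a hypothetical encoding in which each FISH position is an interval of integer band indices along one chromosome (the resource itself records ISCN band names), and the function name `classify_pair` is illustrative. The tally at the end uses only the counts reported in the text.

```python
def classify_pair(a, b):
    """Classify two band intervals (start, end), given as integer band indices."""
    if a == b:
        return "same"
    if a[0] <= b[1] and b[0] <= a[1]:
        return "overlapping"
    # Number of band steps between the end of one interval and the start of the other.
    gap = max(a[0], b[0]) - min(a[1], b[1])
    return "adjacent" if gap == 1 else "discordant"


# Counts reported in the text for the 102 STSs that identify two or more BACs.
reported = {"same": 92, "overlapping": 7, "adjacent": 1}
total_sts = 102
concordant = sum(reported.values())
percent_concordant = round(100 * concordant / total_sts)
discordant = total_sts - concordant
```

Under this encoding, two BACs mapped to bands three steps apart, for example `classify_pair((3, 3), (6, 6))`, fall into the discordant class, which in the data above contains only D5S477 and IB665.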
The accuracy of the STS and genetic maps (Hudson et al. 1995) is also reflected in the subset of BACs containing two or more STSs. BAC 1004A13, located on 1p22.1-21, carries three STSs at the same RH map position, but the genetic map position of one differs by ~4 cM from that of the other two (129.3 vs. 133.0), suggesting a possible recombinational hot spot. Nonetheless, most STS pairs on a single BAC were located within 1.0 cM, providing independent physical validation of the genetic map.
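The check described here, comparing the genetic positions of STSs that share a BAC, can be sketched as follows. The helper names and the 1.0 cM threshold parameter are illustrative, the second clone is a made-up example, and the assignment of the two quoted positions to the three STSs on BAC 1004A13 is an assumption, because the text does not say which marker is the outlier.

```python
def genetic_spread(positions_cm):
    """Largest difference in genetic map position (cM) among STSs on one BAC."""
    return max(positions_cm) - min(positions_cm)


def flag_wide_bacs(bac_positions, threshold_cm=1.0):
    """Return BACs whose STSs span more than threshold_cm on the genetic map."""
    return {
        bac: round(genetic_spread(positions), 1)
        for bac, positions in bac_positions.items()
        if genetic_spread(positions) > threshold_cm
    }


bac_positions = {
    # Positions quoted in the text; which STS sits at which value is assumed.
    "1004A13": [129.3, 133.0, 133.0],
    # Hypothetical clone whose two STSs agree to within 1.0 cM.
    "EXAMPLE_BAC": [57.2, 57.9],
}
flagged = flag_wide_bacs(bac_positions)
```

Only the first clone is flagged, with a spread of 3.7 cM, which is the ~4 cM discrepancy noted above.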
DISCUSSION
This work yields tools for the investigation of human cancers, as well as for chromosome structure and behavior. New probes have been produced for regions corresponding to the ends of the genetic maps (1000:1 likelihood of order), for example, for 5q, 9p, and 10q; for regions beyond the genetic ends as determined by comparison of genetic and RH maps; for some markers located on neither the RH nor the genetic maps (Hudson et al. 1995); and for the telomeric bands in the regions of 39 telomeres (Fig. 1; Table 2 and web site table). Distances of the most distal STS-linked BACs from the ends of the genetic maps are given in Table 2. The integrated BAC map also places boundaries on the location of centromeres in the genetic framework maps (Fig. 1). We note preliminary evidence that suggests genetic interference (observation of less crossing over than expected from the physical distance) across the large blocks of pericentromeric and highly repetitive regions of chromosomes 1 and 9 (Table 2) and across the 2q14.1-21.3 region of human chromosome 2. This corresponds to the now inactive centromere of the ancestral chimpanzee fusion chromosome. In contrast, there is no evidence as yet to support interference across the active centromeres, including that of chromosome 2 and others (i.e., 5, 7, and 11) for which closely flanking pericentromeric BACs were identified.
Single BAC probes that light up multiple sites in the genome provide clues to duplicated DNA segments, human variation, and ultimately evolution. Some of these regions are suspected but not defined. For example, multisite BACs identify the pseudoautosomal regions of Xp22.3 and Yp11.3; BACs mapping to both 16p13.3 and 21q22.2 may represent the region of the olfactory receptors that map only to these two bands (Trask et al. 1998). Multiple novel regions with duplications are defined (web site table; J.R. Korenberg, X-N. Chen, Z. Sun, D. Noya, H. Shizuya, and M. Simon, in prep.). BACs for D7S2323 in 7q11.2 and D7S2561 in 7p12-13 also generate signals in chromosome 7q22 and 7q11.2, respectively. These regions contain known homologous sequences that are members of the PMS2 pseudogene family (Lengauer et al. 1998; Bellugi et al. 1999). Therefore, the independent STS-linked BACs that reveal either three or more sites, or second sites defined by independent clones, lead us to believe that these are regions of homology apart from previously characterized duplications (Trask et al. 1998; Eichler 1999). The STS linkage in these regions permits the definition of duplicated sequences and possibly genes as candidates for human genetic and acquired diseases (Lupski 1998). The power of our approach derives from the ability to simultaneously assess the entire genome for homology directly linked to related genes and sequences.
This STS-linked resource allows a suspected cancer or prenatal breakpoint to be pinpointed with flanking STS markers to within an average of ~1.5 Mb (Chen et al. 1998). Once BAC-STS pairs are verified by independent techniques as reported here, the molecular cytogenetic approach provides sensitivity and visibility, and the linkage to the higher resolution techniques (STS, RH, and sequence order) provides marker resolution for these signals, obviating the need for fiber or interphase analyses except to resolve difficult regions and relationships to chromosome structure.
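Locating the STS-linked BACs that flank a breakpoint amounts to a search in an ordered marker list. The Python sketch below illustrates the idea under simplifying assumptions: marker positions are given in Mb along one chromosome, the markers are evenly spaced at 3 Mb (real spacing is uneven), and the marker names and the function `flanking_markers` are hypothetical.

```python
from bisect import bisect_left


def flanking_markers(markers, breakpoint_mb):
    """Return the (left, right) markers flanking a breakpoint.

    markers: list of (position_mb, name) tuples sorted by position.
    Either element of the result is None if the breakpoint lies beyond the map.
    """
    positions = [pos for pos, _ in markers]
    i = bisect_left(positions, breakpoint_mb)
    left = markers[i - 1] if i > 0 else None
    right = markers[i] if i < len(markers) else None
    return left, right


# Idealized map: one hypothetical STS-linked BAC every 3 Mb.
toy_markers = [(0.0, "STS_1"), (3.0, "STS_2"), (6.0, "STS_3"), (9.0, "STS_4")]
left, right = flanking_markers(toy_markers, 4.2)
nearest_distance_mb = min(4.2 - left[0], right[0] - 4.2)
```

With evenly spaced markers every 3 Mb, no breakpoint inside the map is more than 1.5 Mb from its nearest flanking marker; this is an idealized bound, not the derivation of the average quoted above.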
The STS order of each chromosome can now be made visible in a single experiment, as illustrated for chromosome 11 in Figure 2. Thirty STS-linked BACs were combined in 16 groups, each containing 1-3 BACs mapped to single bands, and were labeled and hybridized using standard direct methods. The resulting chromosomal image clearly represents the genetic map of chromosome 11. Using this four-color scheme, rearrangements can be defined among the 27 STS markers by changes in color patterns, both between and within groups. This analysis is independent of the quality of the metaphase preparation and extends below the limit of cytogenetic detection. Similarly, flanking markers of new genes can be mapped by using FISH and STS-linked subsets in this resource (Chen et al. 1998). In addition to providing order, the clones themselves are a source for additional markers as well as for the detailed analysis of the genomic regions containing these markers.
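How a color pattern can expose a rearrangement is sketched below in Python. The cyclic assignment of four dyes to 16 ordered groups is an assumption made for illustration; the actual color order used for Figure 2 is not specified here, and the function names are hypothetical. The sketch simulates an inversion of a block of groups and reports the first position at which the observed color sequence departs from the expected one.

```python
DYES = ["rhodamine", "coumarin", "Cy5", "fluorescein"]


def assign_colors(n_groups, dyes=DYES):
    """Assign dyes cyclically so that neighboring groups never share a color."""
    return [dyes[i % len(dyes)] for i in range(n_groups)]


def first_difference(expected, observed):
    """Index of the first group whose color differs, or None if the patterns match."""
    for i, (exp, obs) in enumerate(zip(expected, observed)):
        if exp != obs:
            return i
    if len(expected) != len(observed):
        return min(len(expected), len(observed))
    return None


expected = assign_colors(16)

# Simulated inversion of groups 5 through 9 (0-based, inclusive).
observed = expected[:5] + expected[5:10][::-1] + expected[10:]
change_at = first_difference(expected, observed)
```

In this example the end groups of the inverted block happen to carry the same dye, so the pattern first changes one group inside the breakpoint; a real design would choose the color order to minimize such blind spots.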
Future efforts will focus on generating overlapping and interdigitated Visible Human Genome sets (including about one color-linked BAC marker per 500 kb), employing and integrating these sets with "chip" arrays, filling the gaps in the map, and increasing the density to one per 300-1000 kb.
Beyond Sequence
Within the next 2 years, most of the human genome sequence will be available to supplement the traditional tools of genetic analysis. The integrated BAC resource provides the entry points and physical resources needed for studies of human genetics and human variation, mutation analysis and DNA sequencing, chromosome structure, and gene discovery and diagnosis associated with cancer. To understand the chromosomal rearrangements that underlie many cancers, genomic diseases, and evolution, even the knowledge of all 3,000,000,000 nucleotides of human sequence will not diminish the need for a fully integrated molecular cytogenetic array. In mapping these BACs on chromosomes we have provided a bridge from biomedical problems to the molecular world.
METHODS
BAC Clone and STS Analysis
A subset of 17,700 clones was identified and pooled from the Caltech human BAC libraries A (Shizuya et al. 1992) and B (Kim et al. 1996), after sampling analyses by pulsed-field gel electrophoresis (PFGE) to determine transformation subsets with the larger inserts (data not shown). DNA representing pooled BAC clones was screened by PCR using automated systems at the Whitehead Institute/MIT Center for Genome Research. Initially, 2600 FISH-mapped clones were screened for STS content. Pools representing an additional 14,000 random human clones were later added to the screening process. At first, only addresses of positive clones were transmitted between the laboratories, but the high rate of multiple independent clones present in the same well of the library plate led to ambiguous or discrepant STS versus FISH data. Therefore, the process was modified such that only bacterial stocks or DNAs derived from single colonies, for which PCR had confirmed the STS content, were transferred between laboratories.
Several schemes were used to generate BAC DNA pools for STS content screening. Initially, a simple two-level pooling scheme similar to one used for YAC screening at the Genome Center was utilized (Bell et al. 1995; Hudson et al. 1995). Subsequently, DNAs were combined using a five-dimensional pooling scheme that provides redundant PCR information to increase the yield of addresses from the screens (B. Birren, unpubl.). The STSs used to screen the BAC DNA pools were selected from the larger set of STSs mapped to the Whitehead Institute physical map or the Genethon genetic map (Dib et al. 1996).
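The logic of decoding pooled PCR screens can be illustrated with a generic grid-pooling sketch in Python. This is not the two-level or five-dimensional scheme used here, whose details are not given in this text; the three dimensions (plate, row, column), the clone addresses, and the function names are all hypothetical. Each clone contributes DNA to one pool per dimension, and a candidate address is any combination of coordinates whose pools are all PCR-positive.

```python
from itertools import product


def positive_pools(positive_clones, n_dims):
    """For each dimension, the set of coordinates whose pool would be PCR-positive."""
    return [{address[d] for address in positive_clones} for d in range(n_dims)]


def candidate_addresses(pools):
    """All addresses consistent with the positive pools in every dimension."""
    return sorted(product(*[sorted(pool) for pool in pools]))


# One positive clone: the pools identify its address unambiguously.
single = candidate_addresses(positive_pools([(1, "A", 3)], 3))

# Two positive clones for the same STS: the pools are consistent with 2 x 2 x 2
# addresses, only two of which are real.
true_clones = [(1, "A", 3), (2, "C", 7)]
double = candidate_addresses(positive_pools(true_clones, 3))
```

The second case shows why additional pooling dimensions with redundant information can help: extra independent pools rule out some of the false candidate addresses without the need to test each clone individually.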
BAC DNA Preparation and FISH Analysis
BAC DNA for FISH was prepared manually or by using an Autogen 740 automated DNA preparation device (Integrated Separation Systems, Natick, MA). Initially, BAC DNAs were prepared from cultures inoculated directly from the wells of the multiwell plates identified by the library screen. BAC DNA prepared manually was treated with RNase A and extracted with phenol/chloroform prior to labeling (Hubert et al. 1997), whereas DNA prepared by the Autogen was used directly. DNAs were labeled by nick translation in the presence of biotin-14-dATP or digoxigenin-11-dUTP using a nick translation labeling kit (GIBCO BRL). DNA samples were mapped in pairs on high-resolution human metaphase chromosomes by multicolor FISH and reverse banding (Korenberg and Chen 1995). For each BAC, 20-30 cells were examined, an initial map position was assigned, and two to three chromosomal images were captured using a Photometrics cooled-CCD camera (CH250) and BDS image analysis software (ONCOR Imaging, Inc., Gaithersburg, MD) and stored on an optical disc drive. All images were reviewed independently by a second cytogeneticist to determine final band assignment (400-850 bands) in accordance with ISCN (1995). Data were recorded on spreadsheets and transferred to a four-dimensional relational database of chromosome bands, constructed according to the levels of resolution assigned. The resolution is indicated on Figure 1. The FISH image shown in Figure 2 was produced using a different method. Four different labeled nucleotides (rhodamine-4-dUTP, coumarin-4-dUTP, Cy5-dUTP, and fluorescein-11-dUTP) (Amersham Life Science) were used to directly label a total of 30 DNAs according to the manufacturer's instructions. One to three DNAs per band were grouped, each group was labeled with one of four dyes, and all were hybridized simultaneously (20-40 ng of each) with the order of colors designed to allow clear definition of overlaps. After an overnight hybridization, samples were washed briefly. The first wash was in buffer containing 2× SSC and 50% formamide at 44°C for 5 min, followed by a wash in 2× SSC at room temperature for 2 min. Chromomycin A3 and distamycin were used for chromosome counterstaining to simultaneously detect signal and fluorescent banding (Korenberg and Chen 1995).
ACKNOWLEDGMENTS
This work was supported in part by Department of Energy grants 92ER61402 and 96ER62294 (J.R.K.), National Institutes of Health (NIH) grants HD17449 and HD33113 (J.R.K.), and NIH award HG00098 (E.S.L.). J.R.K. holds the Geri and Richard Brawerman Chair in Molecular Genetics. T.J.H. is a recipient of a clinician-scientist award from the Medical Research Council of Canada. We thank Eric S. Lander for support and encouragement.
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
FOOTNOTES
6 Corresponding author.
E-MAIL julie.korenberg{at}cshs.org; FAX (310) 652-8010.
Received May 26, 1999; accepted in revised form August 3, 1999.