Vol. 9, Issue 1, 91-98, January 1999
RESOURCE
"KARIBIN," an Information Resource for Obtaining Genomic Information in a Cytogenetic Band
Jinghui Zhang,1,2,4 Grace Shen-Ong,3 and Jim Ostell1
1 National Center for Biotechnology Information, National
Library of Medicine, National Institutes of Health, Bethesda, Maryland
20894 USA;
2 Glaxo Wellcome, Research Triangle Park, North
Carolina 27709-3398 USA;
3 National Cancer Institute,
Rockville, Maryland 20892 USA
ABSTRACT
KARIBIN (http://sgiweb.ncbi.nlm.nih.gov:80/Zjing/yac.html)
is a karyotypic
region-based integrated
information resource that provides a comprehensive view of the
integrated mapping and sequencing data for the human genome. A
cytogenetic band is linked to a genetic or physical location using
fluorescence in situ hybridization (FISH) mapping data. The genetic and
physical mapping data and the sequencing data are integrated using STS
markers positioned on multiple maps. For each cytogenetic band, the
user can obtain the most up-to-date information that includes genetic
and physical maps, human transcript gene map, YAC and PAC/BAC clone
coverage, disease gene phenotype, and high throughput genomic sequences from the major human genome sequencing centers. This information provides a framework for future experiments and may accelerate the
process of disease gene hunting. It is envisioned that other cytogenetic-based information such as chromosome aberrations can be
linked to this framework.
INTRODUCTION
Disease gene hunting, so far an expensive and time-consuming
effort, requires extensive genetic and physical mapping and sequencing
of the target region. For positional cloning of
disease genes and cancer genetics, clinical observations of recurring, nonrandom chromosome aberrations may provide the first indication of
the candidate gene location. One example is cytogenetic studies by in
situ hybridization to partially deleted chromosomes with known
breakpoints in Wolf-Hirschhorn Syndrome (Tranebjaerg et al. 1984; Park
et al. 1987), which played an important role in the discovery of the
cystic fibrosis transmembrane conductance regulator
(CFTR) gene (Zielenski et al. 1991). Many cancer genes have been
cloned as genes located at breakpoints involved in recurring, nonrandom
chromosome aberrations associated with specific cancer types (Mitelman
et al. 1997). With the completion of the high-resolution genetic maps
(Murray et al. 1994; Dib et al. 1996), the closure of the physical maps
(Hudson et al. 1995; Bouffard et al. 1997; Stewart et al. 1997), the
ongoing effort of transcript mapping of human genes (Schuler et al.
1996), and the onset of high-throughput genomic sequencing (Boguski et
al. 1996; Marshall and Pennisi 1996), a large amount of comprehensive
genome information pertinent to a candidate region is available.
Effective use of genome mapping and sequencing data can greatly reduce
de novo mapping and sequencing effort, thereby accelerating the process
of identifying the target disease gene.
The wealth of genome mapping and sequencing data is a mixed blessing
for disease gene hunting. The high density of the markers and the
redundancy of the DNA clone coverage provide a high level of resolution
and accuracy in genetic and physical maps. However, finding the
information for a candidate region is a challenging data mining
process. Genetic, physical, and cytogenetic maps were constructed with
different methods, and are most useful when integrated. At present,
there is no uniform, linear relationship that can directly associate a
region on one map to that on another. Furthermore, there are
discrepancies between genetic distance and physical proximity because
of variations in recombination frequency in various chromosomal regions.
Sequence-tagged sites (STSs) have become the most widely used type of
markers in construction of the genetic and physical maps, and a number
of STS markers have been localized on both types of maps. These markers
have been used to establish the correlation among the genetic and physical
mapping data and the sequencing data (J. Zhang, K.M. Chao, and J. Ostell, in prep.). A comprehensive, integrated human genome database
has been constructed, which includes the radiation hybrid (RH) maps
from the Stanford Human Genome Center (SHGC) and the Whitehead/MIT
Center for Genome Research (WI/MIT), the genetic linkage maps from
Généthon and the Cooperative Human Linkage Center (CHLC),
and the yeast artificial chromosome (YAC)-based STS-content maps from
the National Human Genome Research Institute (NHGRI) and WI/MIT.
To make full use of this integrated genome data for disease gene
identification, we developed KARIBIN (karyotypic
region-based integrated), a World Wide Web (WWW) application that
provides the integrated genome mapping and sequencing data associated
with a karyotypic region. With this tool, researchers can easily
explore the most up-to-date and comprehensive genome data about a
cytogenetic location, which will greatly facilitate target
identification in disease gene hunting. The information presented
includes the genetic, physical, and transcript mapping data, the clone
coverage of YACs, P1 artificial chromosomes (PACs), and bacterial
artificial chromosomes (BACs), known disease gene phenotypes, and
sequencing status of the region.
Genome Data Integration
Genetic, physical, and cytogenetic maps are integrated with an
STS-mediated approach (Fig. 1). Alignments of the
various genetic and physical maps are established using consistent STS
markers positioned on multiple maps. To evaluate the consistency of STS markers, only their orders are taken into account. Distances between the markers are ignored because of the variations in defining marker
distances on different maps. For example, on chromosome 10, D10S578 and
D10S604 are positioned with consistent order on WI/MIT, SHGC,
Généthon, and CHLC maps. However, their distance on the
four maps corresponds to 12%, 4.0%, 1.0%, and 0.6% of the total
length of the chromosome. A framework map was constructed from the
consistent markers, which provides the integration points for
interpolating a region on one map to that on the other. Figure 2 shows
an overview of the integrated genome maps for chromosome 7. The genetic and physical maps are presented in
parallel with their alignments to the consistent map.
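The order-only consistency test can be illustrated with a short sketch. It is not the authors' implementation (their method is described in a manuscript in preparation). It assumes that one map is taken as a reference, that for every other map the longest set of shared markers appearing in the same relative order is kept (a longest-increasing-subsequence computation), and that the remaining markers are treated as inconsistent. The marker names and the function names `longest_increasing_subsequence` and `consistent_markers` are hypothetical.

```python
from bisect import bisect_left


def longest_increasing_subsequence(values):
    """Return the indices of one longest strictly increasing subsequence."""
    tail_idx = []   # tail_idx[k]: index of the smallest tail of a run of length k + 1
    tail_val = []   # the values at those indices, kept sorted for bisect
    prev = [None] * len(values)
    for i, v in enumerate(values):
        k = bisect_left(tail_val, v)
        if k == len(tail_val):
            tail_idx.append(i)
            tail_val.append(v)
        else:
            tail_idx[k] = i
            tail_val[k] = v
        prev[i] = tail_idx[k - 1] if k > 0 else None
    picked = []
    i = tail_idx[-1] if tail_idx else None
    while i is not None:
        picked.append(i)
        i = prev[i]
    return picked[::-1]


def consistent_markers(reference, other_maps):
    """Keep reference markers whose relative order is preserved on every other map.

    Only marker order is used; map distances are ignored (assumption: the
    reference order is trusted, and ties between equally long runs are
    broken arbitrarily by the subsequence routine).
    """
    rank = {marker: i for i, marker in enumerate(reference)}
    keep = set(reference)
    for order in other_maps:
        shared = [m for m in order if m in rank]
        ranks = [rank[m] for m in shared]
        in_order = {shared[i] for i in longest_increasing_subsequence(ranks)}
        keep -= set(shared) - in_order   # drop markers that break the order
    return [m for m in reference if m in keep]


# Hypothetical marker names; "C" is out of order on the first comparison map.
reference = ["A", "B", "C", "D", "E"]
others = [["A", "C", "B", "D", "E"], ["A", "B", "D", "E"]]
print(consistent_markers(reference, others))   # ['A', 'B', 'D', 'E']
```

The surviving markers would play the role of the framework map, serving as integration points between maps.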

Figure 1
Linking a cytogenetic band to the rest of the genome data using an
STS-mediated approach. The input data are shown in light rectangles,
which include the cytogenetic band and the FISH-anchored STS markers.
The output data are displayed in dark rectangles, which include the
OMIM gene map, the sequencing status, the genetic, physical, and
transcript maps and YAC and BAC/PAC coverage.

Figure 2
Graphic overview of the integrated genetic, physical, and cytogenetic
maps for chromosome 7. Seven maps are displayed from top to
bottom in the following order: the RH map from SHGC, the
integrated RH and YAC contig map from WI/MIT, the map with consistent
markers, the YAC contig map from NHGRI, the genetic map from
Généthon, the genetic map from CHLC, and the cytogenetic
map. Green lines link the consistent STS markers mapped by at least
three groups, and their relative orders are preserved on the consistent
map. The black lines linking a cytogenetic band to STS markers on the
genetic and physical maps represent the whole-genome FISH mapping data
(Green et al. 1994). Red lines indicate the inconsistent STS markers.
The yellow polygon highlights the regions on various maps that
correspond to 7p21. The graphic view is a screen dump of the
Chromoscope program (J. Zhang and J. Ostell, unpubl.).
For chromosomes 1-18, 20, and X, the correlation between the
cytogenetic map and the genetic/physical maps is established using two
sets of genome-wide FISH mapping data (Green et al. 1994; Bray-Ward et
al. 1996). A cytogenetic band can be linked directly to a genetic or
physical location if it has a FISH-mapped YAC clone that contains a
genetically or physically anchored STS marker. For example, in Figure
2, 7p21 is linked to STS marker AFM200we7 on the Généthon
map as well as the NHGRI map (the NHGRI map uses the alias sWSS1164),
because the marker is present on yWSS1498, a YAC clone mapped to 7p21
by FISH. For maps that cannot be directly linked to a band, such as the
WI/MIT and CHLC maps in Figure 2, their locations are interpolated from
the consistent markers flanking the FISH-anchored STS marker. The size
of a band is estimated from the fractional length based on the
measurements of chromosome idiograms (Report of the Standing Committee
on Human Cytogenetic Nomenclature 1985). If FISH-mapping data is not
available for a band, such as 1q12 in Figure 3, the
map locations are interpolated from the flanking bands with
FISH-anchored YAC clones (1p13 and 1q22-q23 in Fig. 3). Currently, FISH
mapping data for chromosomes 19, 21, and 22 have not been incorporated.
For these chromosomes, the fractional length of a band is projected to
the framework map, and the consistent markers in the region are used to
compute the corresponding locations on the other maps.
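As a rough illustration of how a location might be carried from one map to another through flanking consistent markers, the sketch below uses piecewise-linear interpolation. The text states only that locations are "interpolated", so the linear form, the clamping at the map ends, and the names `interpolate_position` and `interpolate_interval` are assumptions, and the coordinates are invented for the example.

```python
from bisect import bisect_right


def interpolate_position(x, anchors):
    """Map position x on a source map to the corresponding target-map position.

    anchors: (source_pos, target_pos) pairs for consistent markers, sorted by
    source_pos with strictly increasing source positions.
    """
    xs = [src for src, _ in anchors]
    i = bisect_right(xs, x)
    if i == 0:                      # left of the first anchor: clamp
        return anchors[0][1]
    if i == len(anchors):           # at or right of the last anchor: clamp
        return anchors[-1][1]
    (x0, y0), (x1, y1) = anchors[i - 1], anchors[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def interpolate_interval(start, end, anchors):
    """Map a band's (start, end) interval by interpolating both endpoints."""
    return interpolate_position(start, anchors), interpolate_position(end, anchors)


# Invented coordinates: source map in arbitrary units, target map in other units.
anchors = [(10.0, 100.0), (20.0, 300.0), (40.0, 340.0)]
print(interpolate_interval(15.0, 30.0, anchors))   # (200.0, 320.0)
```

For chromosomes without FISH data, the same function could be applied with the band's fractional start and end on the framework map as the source positions.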

Figure 3
Mapping a cytogenetic band by interpolation. 1q12 is a band with no
FISH mapping data. The yellow polygon highlights the regions that
correspond to 1q12, which were interpolated from the flanking
FISH-mapped bands 1p13 and 1q22-q23.
By incorporating the human transcript gene map with the integrated
genetic, physical, and FISH mapping resource, it is possible to anchor
genes on the cytogenetic map even though many of them are not
FISH-mapped. On the transcript gene map, an mRNA or expressed sequence
tag (EST) representing a unique gene cluster (UniGene) was mapped to a
genetic interval defined by two Généthon markers (Boguski
and Schuler 1995; Schuler et al. 1996). As the connection between a
cytogenetic band and the Généthon map is established, the
genes localized between a pair of Généthon markers can be inferred to map to the cytogenetic region containing those markers. We
use "electronic FISH mapping" to refer to such a process.
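A minimal sketch of this inference is given below. It assumes that the band has already been tied to a pair of Généthon markers and that each transcript-map entry is stored as a pair of flanking markers; a gene is assigned to the band when its interval lies inside the band's interval. The function name `electronic_fish`, the marker names, and the positions are hypothetical.

```python
def electronic_fish(band_markers, marker_pos, gene_intervals):
    """Return genes whose flanking-marker interval lies inside the band's interval.

    band_markers:   (left_marker, right_marker) delimiting the band on the genetic map
    marker_pos:     marker name -> position on the genetic map (cM)
    gene_intervals: gene name -> (marker_a, marker_b) from the transcript map
    """
    lo, hi = sorted(marker_pos[m] for m in band_markers)
    hits = []
    for gene, (a, b) in gene_intervals.items():
        g_lo, g_hi = sorted((marker_pos[a], marker_pos[b]))
        if lo <= g_lo and g_hi <= hi:
            hits.append(gene)
    return sorted(hits)


# Placeholder markers and positions, not real map data.
marker_pos = {"M1": 100.0, "M2": 110.0, "M3": 119.0, "M4": 130.0, "M5": 150.0}
genes = {"geneA": ("M2", "M3"), "geneB": ("M4", "M5"), "geneC": ("M1", "M2")}
print(electronic_fish(("M1", "M4"), marker_pos, genes))   # ['geneA', 'geneC']
```

Requiring full containment is a conservative choice made for the sketch; an interval that straddles a band boundary is left unassigned.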
DNA Clone Coverage for a Cytogenetic Band
On the physical map, YAC contigs, which were constructed from
overlapping YAC clones ordered by STS-content mapping, represent contiguous DNA coverage of the chromosome. If any of the clones in a
contig was mapped by FISH, the cytogenetic locations of the other
clones can be inferred from their locations in relation to the
FISH-mapped clone. The YAC clone coverage for a cytogenetic band
provides the researchers with a set of candidate DNA clones for disease
gene mapping and isolation. In KARIBIN, we include YAC-based
STS-content maps from WI/MIT (for chromosomes 1-22 and X) and NHGRI
(for chromosome 7) for YAC clone coverage on the chromosome. Because of
variations in their size, there is no one-to-one mapping between a YAC
contig and a cytogenetic band; a large contig may extend over several
cytogenetic bands and a band may contain several contigs. YAC clone
coverage for a query band is derived from three layers of mappings.
First, a cytogenetic band is linked to a location on the physical map,
based on FISH mapping data. The YAC contigs mapped to this physical
location are then identified and their STS-content mapping is examined
to gather the clones that were mapped to the region of interest. YAC
clones with large deletions are excluded if they do not contain any STS
markers in the region of interest.
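The three layers of mapping can be sketched as follows. The data layout (a dictionary of contigs, each with STS positions and per-clone STS content) and the function name `yac_coverage` are assumptions made for illustration, and the band's physical interval is taken as already derived from the FISH data.

```python
def yac_coverage(band_interval, contigs):
    """Collect YAC clones carrying at least one STS inside the band's interval.

    band_interval: (start, end) of the band on the physical map (layer 1,
                   assumed to be already derived from FISH data)
    contigs:       contig name -> {"sts": {sts_name: position},
                                   "clones": {clone_name: [sts_name, ...]}}
    """
    start, end = band_interval
    coverage = {}
    for name, contig in contigs.items():
        # Layer 2: find the STS markers of this contig that fall in the band.
        in_band = {s for s, pos in contig["sts"].items() if start <= pos <= end}
        if not in_band:
            continue
        # Layer 3: keep clones that contain at least one in-band STS; clones
        # whose hits all lie outside the region (e.g. after a deletion) drop out.
        clones = sorted(c for c, sts in contig["clones"].items() if in_band & set(sts))
        if clones:
            coverage[name] = clones
    return coverage


# Invented contigs, markers, clones, and coordinates.
contigs = {
    "ctgA": {"sts": {"s1": 5.0, "s2": 12.0, "s3": 18.0},
             "clones": {"y1": ["s1"], "y2": ["s1", "s2"], "y3": ["s3"]}},
    "ctgB": {"sts": {"s4": 40.0}, "clones": {"y4": ["s4"]}},
}
print(yac_coverage((10.0, 20.0), contigs))   # {'ctgA': ['y2', 'y3']}
```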
A YAC contig is normally constructed with a high level of redundancy to
ensure its accuracy and contiguity. For example, on the NHGRI map,
there is an average of 9.9 positive YAC clones per STS marker (Bouffard
et al. 1997). For a biologist who is interested in selecting clones for
future experiments, it is important to distinguish high-quality YAC
clones from those that are either chimeric or have deletions. With the
default setting in KARIBIN, only the high-quality clones with
unambiguous mapping results are selected. To determine the mapping
quality of YAC clones, the NHGRI map and the WI/MIT maps were processed
differently because they were constructed from different libraries. The
NHGRI's chromosome 7 map was constructed from YAC clones highly
enriched for chromosome 7 DNA, which are less chimeric than clones
derived from the whole genome library. The original mapping data from
NHGRI is well organized, and the YAC clones with inconsistent mapping
results, such as deletion and chimerism, are grouped separately from
those with consistent mapping results. In KARIBIN, inconsistent YAC
clones on the NHGRI map are excluded by the default option as
low-quality data, because consistent YAC clones alone are sufficient to
represent contiguous DNA coverage. On the other hand, the WI/MIT maps
were constructed from the whole genome CEPH mega-YAC library, which has
a high rate of chimerism (~50%). In addition, the information about
YAC deletions as well as cross mapping to multiple contigs cannot be
accessed directly. To evaluate the mapping quality of the YAC clones in
a contig, we combined the data from two files on the
Whitehead ftp site
(ftp://ftp.genome.wi.mit.edu/pub/human_STS_releases/july97/), CONTIG2STS and YAC2STS. The former describes the order of the STS markers on the contigs, and the latter describes the mapping quality of each STS marker on a YAC clone. Based on their STS mapping quality as well as the occurrence of cross-mapping to multiple contigs, YAC clones in each contig are grouped into three categories: consistent, inconsistent, and ambiguous. Low-quality YAC
clones with no STS markers mapped unambiguously to the region of
interest are excluded with the default option in KARIBIN because WI/MIT
has cautioned that ambiguous mapping results be used with extreme care.
If a clone has one or more STS markers mapped unambiguously to multiple
contigs, it is classified as an inconsistent clone because of high
probability of chimerism. However, inconsistent YAC clones are included
in the results because consistent clones alone usually are not
sufficient to provide contiguous DNA coverage.
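One possible reading of these rules for the WI/MIT clones is sketched below. The representation of a clone as a list of (contig, unambiguous) STS hits is hypothetical and does not reflect the actual layout of the CONTIG2STS and YAC2STS files; the function names `classify_clone` and `select_clones` are likewise assumptions.

```python
def classify_clone(hits, query_contig):
    """Classify one YAC clone relative to the contig covering the query region.

    hits: (contig_name, is_unambiguous) for each STS hit of the clone.
    """
    unambiguous_contigs = {contig for contig, ok in hits if ok}
    if query_contig not in unambiguous_contigs:
        return "ambiguous"       # no unambiguous STS hit in the region of interest
    if len(unambiguous_contigs) > 1:
        return "inconsistent"    # unambiguous hits on several contigs: likely chimeric
    return "consistent"


def select_clones(clone_hits, query_contig, include_ambiguous=False):
    """Apply the default filter: drop ambiguous clones, keep inconsistent ones."""
    selected = {}
    for clone, hits in clone_hits.items():
        label = classify_clone(hits, query_contig)
        if label == "ambiguous" and not include_ambiguous:
            continue
        selected[clone] = label
    return selected


# Invented clones and contigs.
clone_hits = {
    "yA": [("ctg1", True), ("ctg1", True)],
    "yB": [("ctg1", True), ("ctg9", True)],
    "yC": [("ctg1", False), ("ctg9", True)],
}
print(select_clones(clone_hits, "ctg1"))   # {'yA': 'consistent', 'yB': 'inconsistent'}
```

Under this reading, a clone whose only unambiguous hits lie on another contig is dropped from the query region by default, while inconsistent clones are kept to preserve contiguous coverage.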
BAC or PAC clone coverage is calculated using STS markers that were
mapped on a YAC clone as well as a PAC or BAC contig. Currently, the
data are only available for chromosome 7. However, the database is
designed to incorporate BAC/PAC clone information for any of the human chromosomes.
WWW Query Interface
We use 7q31, the cytogenetic band that harbors the gene for cystic
fibrosis, as an example to illustrate how to use KARIBIN to explore the
comprehensive genome mapping and sequencing data. The user can either
type a query band in the dialog box "Search for CytoGenetic
Band," or select any band in one of the 23 human chromosomes from
the pull-down lists (Fig. 4). The checkbox "Anchor Genetic/Physical Maps with FISH," selected by default, allows the
user to choose whether to use the whole genome FISH-mapping data to
link a cytogenetic band to the genetic and physical maps. If not
selected, the fractional length of the query band will be used to
interpolate the corresponding map locations, which is less reliable and
less accurate because of the variations in marker distances on different
maps. The check boxes in "Retrieve Data" and "Display
Option" allow the user to select the data content as well as the
display format. For YAC clone coverage of a band, the default option
excludes low-quality YAC clones that are either chimeric or have
deletions.
The search results for 7q31, which include genetic, physical, and
transcript mapping data, disease phenotype, and sequencing effort, are
summarized in "Table of Contents" (Fig. 5), a
road map for exploring the vast array of genome information. Clicking on each item shows the details graphically and/or in tabular
form with links to the related web sites. The URLs supported in KARIBIN
are summarized in Table 1. Transcript mapping data, OMIM, and sequencing effort in the query band are shown in italics because they are linked to external web sites that will be launched with a separate browser. For 7q31, the link to OMIM shows that cystic
fibrosis is one of the 25 human diseases mapped in this region. The
link to the human transcript map is especially important because it
provides the information about the genes located in a cytogenetic band.
In this example, 7q31 correlates to a region between
Généthon markers D7S2480 and D7S635, and the genes
localized to this region on the transcript map are therefore
electronically FISH-mapped to 7q31. In this case, the CFTR
gene, along with the other 25 UniGene clusters, was mapped to a 9-cM
subregion between D7S692 and D7S648 within 7q31. The result from
electronic FISH mapping is consistent with the experimentally
FISH-mapped cytogenetic loci for the disease gene.

Figure 5
Summary of the genome information related to band 7q31. The chromosomal
locations of the YAC contigs from WI/MIT are based on their RH mapping
coordinates.
The genetic and physical mapping data, which include STS markers and
YAC clones, are displayed in the same browser. The graphic overview of
the YAC contig coverage displays a region centered on the query band (7q31)
and extending half a band's length to the left (7q22) and the
right (7q32-q33) so the user is able to browse information in the
neighboring bands as well. In Figure 5, there are four contigs on the
Whitehead map and nine contigs on the NHGRI map located between 7q22
and 7q33. The picture is a live image map. When clicked, the detailed
graphic view of the STS-content map of the selected YAC contig will be
displayed on a separate browser. Figure 6 shows the graphic view of the
CFTR region on YAC contig Ctg[sWSS26]. A
total of 9 STS markers and 13 YAC clones were mapped for the
CFTR gene on the NHGRI map. If a large contig extends over
more than one band, the graphic view will only display the subregion
corresponding to the query band. In the tabular view of the physical
mapping data (Fig. 7), the YAC clones are ordered by
their locations on the contig. If a YAC clone is experimentally FISH
mapped to a band, such as yWSS1744 on the NHGRI map, it is highlighted
in red for distinction from the computed results. The number of STS
markers on each YAC clone is also listed to facilitate the selection of
the best YAC clones for future experiments. For convenience, the
section "Related Database" lists the web sites that supply the
YAC clones, such as CEPH YAC library, Research Genetics, and NHGRI.
DISCUSSION
The integrity of the genetic, physical, and FISH mapping data is
crucial to the accuracy of the computational method that selects the
comprehensive genome information associated with a cytogenetic band.
Though we have not analyzed the variations among the different mapping
results systematically, several chromosomes have been reviewed manually
to evaluate data consistency. On chromosome 1, FISH mapping data of 14 CEPH YAC clones containing STS markers on the Généthon map
or the WI/MIT map are used to establish the link between the
cytogenetic map and the other maps with the STS-mediated approach. All
except 725_e_9, a chimeric YAC clone, have
consistent locations on the WI/MIT YAC contig map. The FISH-mapping results show that 725_e_9 is mapped to band
1p36. However, the STS mapping results from the Whitehead Institute
show the clone is mapped unambiguously to chromosome 16 though it has
two STS markers mapped ambiguously on chromosome 1 (http://www.genome.wi.mit.edu:80/cgi-bin/contig/yac_info?yac=725_e_9). As the default option in KARIBIN excludes clones with
ambiguous mapping data for YAC coverage, it is no surprise that this
clone does not show up when the query band is 1p36. On chromosome 7, discrepancies of the STS mapping results have been found between the
YAC-content map from NHGRI and the PAC/BAC contig maps from Washington
University. For example, both STS markers sWSS241 and sWSS2538 are
mapped on a 500-kb PAC contig
(http://genome.wustl.edu/gsc/human/Chrom7/Images/ctg1105.gif) constructed by the Genome Sequencing Center at Washington University. However, on the NHGRI map, the two markers are mapped on
two different YAC contigs (sWSS241 on contig sWSS173 and sWSS2538 on
contig sWSS9) that are estimated to be 28 Mb apart, which is about the
size of three cytogenetic bands. It is not clear whether this is caused
by chromosome duplication or errors in the mapping or data entry in one
of the groups.
Currently, for chromosomes 19, 21, and 22, a cytogenetic band is linked
to the genetic and physical maps using the fractional length approach,
which is a good approximation but not as accurate as the results
derived from the FISH-mapping data. For chromosome 22, a number of BAC
or PAC clones have been FISH-mapped by the Sanger Centre, and we are in
the process of incorporating this data into KARIBIN to establish a more
accurate relationship between the cytogenetic map and the rest of the
genome mapping and sequencing data.
KARIBIN is the first whole-genome integrated data resource that links a
wide range of the genome data to a cytogenetic band. It provides the
crucial connection between clinical studies of chromosomal
abnormalities and biological studies of molecular genetics. Its utility
will be greatly enhanced with the progress of the Human Genome Project.
ACKNOWLEDGMENTS
We thank W. Yan and E. Green for helpful suggestions to improve the
design and interface of KARIBIN. Special thanks go to Michael Wagner
and Ray Grimaila for critical review of the manuscript.
The publication costs of this article were defrayed in part by payment
of page charges. This article must therefore be hereby marked
"advertisement" in accordance with 18 USC section 1734 solely to
indicate this fact.
FOOTNOTES
4
Corresponding author.
E-MAIL jz70309{at}glaxowellcome.com; FAX (919) 483-0659.
REFERENCES
-
Boguski, M.S. and
G.D. Schuler.
1995.
ESTablishing a human transcript map.
Nat. Genet.
10:
369-371.
-
Boguski, M.,
A. Chakravarti,
R. Gibbs,
E. Green, and
R.M. Myers.
1996.
The end of the beginning: The race to begin human genome sequencing.
Genome Res.
6:
771-772.
-
Bouffard, G.G.,
J.R. Idol,
V.V. Braden,
L.M. Iyer,
A.F. Cunningham,
L.A. Weintraub,
J.W. Touchman,
R.M. Mohr-Tidwell,
D.C. Peluso,
R.S. Fulton
1997.
A physical map of human chromosome 7: An integrated YAC contig map with average STS spacing of 79 kb.
Genome Res.
7:
673-692.
-
Bray-Ward, P.,
J. Menninger,
J. Lieman,
T. Desai,
N. Mokady,
A. Banks, and
D.C. Ward.
1996.
Integration of the cytogenetic, genetic, and physical maps of the human genome by FISH mapping of CEPH YAC clones.
Genomics
32:
1-14.
-
Dib, C.,
S. Faure,
C. Fizames,
D. Samson,
N. Drouot,
A. Vignal,
P. Millasseau,
S. Marc,
J. Hazan,
E. Seboun
1996.
A comprehensive genetic map of the human genome based on 5,264 microsatellites.
Nature
380:
152-154.
-
Green, E.D.,
J.R. Idol,
R.M. Mohr-Tidwell,
V.V. Braden,
D.C. Peluso,
R.S. Fulton,
H.F. Massa,
C.L. Magness,
A.M. Wilson,
J. Kimura
1994.
Integration of physical, genetic and cytogenetic maps of human chromosome 7: Isolation and analysis of yeast artificial chromosome clones for 117 mapped genetic markers.
Hum. Mol. Genet.
3:
489-501.
-
Hudson, T.,
L. Stein,
S. Gerety,
J. Ma,
A. Castle,
J. Silva,
D. Slonim,
R. Baptista,
L. Kruglyak,
S. Xu
1995.
An STS-based map of the human genome.
Science
270:
1945-1954.
-
Marshall, E. and
E. Pennisi.
1996.
NIH launches the final push to sequence the genome.
Science
272:
188-189.
-
Mitelman, F.,
F. Mertens, and
B. Johansson.
1997.
A breakpoint map of recurrent chromosomal rearrangements in human neoplasia.
Nat. Genet.
15:
417-474.
-
Murray, J.C.,
K.H. Buetow,
J.L. Weber,
S. Ludwigsen,
T. Scherpbier-Heddema,
F. Manion,
J. Quillen,
V.C. Sheffield,
S. Sunden,
G.M. Duyk
1994.
A comprehensive human linkage map with centimorgan density.
Science
265:
2049-2054.
-
Park, M.,
J.R. Testa,
D.G. Blair,
M. Dean,
N.Z. Parsa, and
G.F. Vande Woude.
1987.
The CF locus is distal to and upstream from the met protooncogene transcription unit which is located at 7q31-32.
Cytogenet. Cell Genet.
46:
674-675.
-
Report of the Standing Committee on Human Cytogenetic Nomenclature.
1985.
An international system for human cytogenetic nomenclature (1985) ISCN 1985.
Birth Defects Orig. Artic. Ser.
21:
1-117.
-
Schuler, G.D.,
M.S. Boguski,
E.A. Stewart,
L.D. Stein,
G. Gyapay,
K. Rice,
R.E. White,
P. Rodriguez-Tome,
A. Aggarwal,
E. Bajorek
1996.
A gene map of the human genome.
Science
274:
540-546.
-
Stewart, E.A.,
K.B. McKusick,
A. Aggarwal,
E. Bajorek,
S. Brady,
A. Chu,
N. Fang,
D. Hadley,
M. Harris,
S. Hussain
1997.
An STS-based radiation hybrid map of the human genome.
Genome Res.
7:
422-433.
-
Tranebjaerg, L.,
A. Petersen,
K. Hove,
H. Rehder, and
M. Mikkelsen.
1984.
Clinical and cytogenetic studies in a large (4;8) translocation family with pre- and postnatal Wolf syndrome.
Ann. Genet.
27:
224-229.
-
Zielenski, J.,
R. Rozmahel,
D. Bozon,
B. Kerem,
Z. Grzelczak,
J.R. Riordan,
J. Rommens, and
L.C. Tsui.
1991.
Genomic DNA sequence of the cystic fibrosis transmembrane conductance regulator (CFTR) gene.
Genomics
10:
214-228.
Received August 19, 1998; accepted in revised form November 13, 1998.
9:91-98 ©1999 by Cold Spring Harbor Laboratory Press ISSN 1088-9051/99 $5.00
