Vol. 10, Issue 10, 1561-1567, October 2000
LETTER
| |
ABSTRACT |
|---|
For those searching for human disease-causing genes, information on the position of genes with respect to genetic markers is essential. The physical map composed of ESTs and genetic markers provides the positional information of these markers as well as the starting point of gene identification in the form of genomic clones containing exons. To facilitate the effort of identification of genes in the region spanning D12S1629 and D12S312, we constructed a high-resolution transcript map with PAC/BAC/cosmid clones. The strategy for the construction of such a map involved utilization of STSs for the screening of the large insert bacterial chromosome libraries and a chromosome 12-specific cosmid library by hybridization. The contig was constructed based on the STS contents of the clones. The resulting high-resolution transcript map of the region between P273P14/SP6 and D12S312 spans 4.4 cM from 66.8 to 71.2 cM of the Généthon genetic map and represents ~2.4 Mb. It was composed of 81 BAC, 45 PAC, and 91 cosmid clones with a minimal tiling path consisting of 16 BAC and 4 PAC clones. These clones are being used to sequence this part of chromosome 12. We determined the order of 135 STSs including 74 genes and ESTs in the map. Among these, 115 STSs were unambiguously ordered, resulting in one ordered marker per 21 kb. The order of keratin type II locus genes was determined. This map would greatly enhance the positional cloning effort of the responsible genes for those diseases that are linked to this region, including male germ cell tumor as well as palmoplantar keratoderma, Bothnian-type, and triple A syndrome. This transcript map was localized at human chromosome 12q13.
| |
INTRODUCTION |
|---|
The chromosome 12q13 (ch12q13) region of the human genome has been implicated in many diseases as well as various chromosomal anomalies. The region is linked to numerous inherited genetic diseases including keratin-associated diseases, triple A syndrome (Weber et al. 1996), persistent Mullerian duct syndrome, type II (Imbeaud et al. 1995), and hereditary hemorrhagic telangiectasia 2 (HHT2) (Johnson et al. 1996). In addition, chromosomal aberrations such as deletion and translocation of this region are associated with male germ cell tumor (Murty et al. 1992) and B-cell lymphoma (R. Chaganti, pers. comm.), respectively. Although the molecular etiologies of a few of these diseases have been characterized, those of others such as triple A syndrome, male germ cell tumor, and palmoplantar hyperkeratoderma are poorly understood. The identification and isolation of the genes responsible for these diseases is an essential step toward understanding their molecular defects.
Positional cloning has been used efficiently to isolate disease-causing genes. Transcript maps have been instrumental in the positional cloning of numerous genes responsible for various genetic diseases such as cystic fibrosis (Rommens et al. 1989), ataxia telangiectasia (Savitsky et al. 1995), and Holt-Oram syndrome (Li et al. 1997). A human transcript map (Schuler et al. 1996; Deloukas et al. 1998) that provided the location of 30,000 expressed sequence tags (ESTs) along with a number of genetic markers using a radiation hybrid cell panel has been described. However, its usefulness is somewhat limited in that the relative order of ESTs and genetic markers is not clearly known and radiation hybrids cannot readily provide DNA inserts to isolate genes. The way to overcome this limitation is to develop a physical map based on overlapping cloned fragments of human DNA. Although a yeast artificial chromosome (YAC)-based clone contig map is available for most of the human genome, such a map has its own limitations that hinder its use in the gene identification effort: >50% of YAC clones are chimeric, inserts are occasionally rearranged, and it is difficult to isolate YAC DNAs from yeast genomic DNA (Nemani et al. 1994; Slim et al. 1994). The most useful map that circumvents these problems is based on large insert bacterial chromosomes; therefore, this is the map of choice for sequence-ready maps currently built and the one that most positional cloning efforts require. A high-resolution transcript map based on large insert bacterial clones can provide the relative order of all the available transcripts to genetic markers as well as supply the clones necessary for gene identification methods such as direct sequencing or exon trapping (Buckler et al. 1991).
Here we describe the sequence-tagged site (STS) clone contig map of 2.4 Mb, composed of transcripts for the region spanning D12S1629 and D12S312, which represents 4.4 cM in the Généthon genetic map (Dib et al. 1996).
| |
RESULTS |
|---|
Construction of STS Content Map
The starting point of the construction of a high-resolution transcript map for the region spanning D12S1629 and D12S312 was the YAC-based STS content map we previously reported (Krauter et al. 1995). Since that report, we had continuously extended and refined the YAC map, which consisted of 34 clones encompassing the region in which 58 STS markers were ordered (Fig. 1A). These previously mapped STSs served as a framework in generating the bacterial clone contig map for the region. Twenty of the 58 STSs on the YAC map were used for screening of the large insert bacterial chromosome libraries, 12X BAC (6X RPCI 11 BAC library and 6X CIT-HSP library) and 6X PAC (RPCI 1 PAC library) libraries, as well as a 6-8X chromosome 12-specific cosmid library (LL12NCO1). In addition, some ESTs that were localized to the ch12q13 region from human transcript maps (Schuler et al. 1996; Deloukas et al. 1998) and the genome database were also used for the screening. ESTs longer than 150 bp were used in screening for efficient hybridization. Once the clones with positive signals for labeled probes were identified from screening, true positive clones were confirmed by PCR-based analysis for the markers used for screening. Bacterial artificial chromosome (BAC) and P1-derived artificial chromosome (PAC) DNAs were prepared and pooled into individual pools of 12 clones, which in turn were pooled into a superpool. The superpool was tested by PCR for the presence of clones containing markers, and at this point any markers that failed to recognize clones were set aside for a subsequent hybridization. Testing individual pools followed by individual clones of the positive pools identified the clones containing markers used for screening.
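The superpool, pool, and individual-clone testing described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the clone names, marker names, and the `content` dictionary are hypothetical, and a "PCR" is modeled as a simple membership test rather than a real assay.

```python
from typing import Dict, List, Set, Tuple


def make_pools(clones: List[str], pool_size: int = 12) -> List[List[str]]:
    """Split clones into pools of `pool_size` (12 clones per pool in the text)."""
    return [clones[i:i + pool_size] for i in range(0, len(clones), pool_size)]


def screen_marker(marker: str,
                  pools: List[List[str]],
                  content: Dict[str, Set[str]]) -> Tuple[List[str], int]:
    """Return (positive clones, number of simulated PCR reactions) for one marker."""

    def pcr(group: List[str]) -> bool:
        # A pooled reaction is positive if any clone in the group carries the marker.
        return any(marker in content.get(clone, set()) for clone in group)

    reactions = 1  # the superpool reaction
    superpool = [clone for pool in pools for clone in pool]
    if not pcr(superpool):
        # Marker recognizes no clone: set aside for a later hybridization round.
        return [], reactions

    positives: List[str] = []
    for pool in pools:
        reactions += 1  # one reaction per pool
        if pcr(pool):
            for clone in pool:
                reactions += 1  # one reaction per clone of a positive pool
                if pcr([clone]):
                    positives.append(clone)
    return positives, reactions


# Hypothetical example: 36 clones, two of which carry the marker "STS_A".
clones = [f"clone{i}" for i in range(36)]
content = {"clone5": {"STS_A"}, "clone30": {"STS_A", "STS_B"}}
pools = make_pools(clones)
hits, n_reactions = screen_marker("STS_A", pools, content)
```

In this toy case the hierarchy needs 28 reactions instead of 36 individual tests, and a marker absent from the superpool is rejected after a single reaction.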
The STS content of each clone was determined by hierarchical screening of DNAs with all the available STSs including genetic markers. Initially, DNAs were pooled randomly and screened with a number of STSs including the genetic markers and ESTs that were not used in hybridization. Screening of the individual clones in the positive pools followed identification of positive pools with STSs. Any clones that were negative for STSs by PCR were discarded. Initial screening resulted in several minicontigs. When the minicontigs were formed, DNAs were repooled based on their relationships to each other to minimize the number of futile PCR reactions and the linking process began.
The initial step to link minicontigs involved the generation of new STSs from clone ends. To generate new STSs, the ends of clones located at the ends of minicontigs were specifically amplified by the vectorette PCR method and sequenced (Riley et al. 1990). In some cases, direct sequencing of the clone ends was performed. The sequences were analyzed for the presence of repetitive sequences or any homologous sequences in public databases. New STSs, defined by a pair of primers, were generated from the unique sequences and used for further mapping. First, the clone-end STSs were used to screen adjacent contigs. When these STSs failed to identify any positive clones from the adjacent contigs, they were used to isolate new sets of clones. This process was repeated until the contig was completed.
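The way shared STS content groups clones into minicontigs, and the way a new clone-end STS can join two minicontigs, can be sketched as a connected-components problem. The clone and STS names below are hypothetical, and this is an illustration of the idea rather than the procedure actually used in the study.

```python
from typing import Dict, List, Set


def build_contigs(content: Dict[str, Set[str]]) -> List[Set[str]]:
    """Group clones into contigs; two clones are linked if they share an STS."""
    # Index: STS -> clones that are positive for it.
    clones_by_sts: Dict[str, Set[str]] = {}
    for clone, stss in content.items():
        for sts in stss:
            clones_by_sts.setdefault(sts, set()).add(clone)

    seen: Set[str] = set()
    contigs: List[Set[str]] = []
    for start in content:
        if start in seen:
            continue
        # Depth-first walk over clones connected through shared STSs.
        group: Set[str] = set()
        stack = [start]
        seen.add(start)
        while stack:
            clone = stack.pop()
            group.add(clone)
            for sts in content[clone]:
                for neighbor in clones_by_sts[sts]:
                    if neighbor not in seen:
                        seen.add(neighbor)
                        stack.append(neighbor)
        contigs.append(group)
    return contigs


# Two hypothetical minicontigs with no marker in common.
content = {
    "A": {"s1", "s2"},
    "B": {"s2", "s3"},
    "C": {"s5"},
    "D": {"s5", "s6"},
}
before = build_contigs(content)

# A clone-end STS generated from clone B (hypothetical name "B/T7") also
# amplifies from clone C, which links the two minicontigs into one.
content["B"].add("B/T7")
content["C"].add("B/T7")
after = build_contigs(content)
```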
We used the cosmid library mainly for the regions where finer mapping was required, for example, the keratin type II locus. Although we initially used the PAC library for screening, it was discontinued in favor of BACs. We used a total of 46 probes for library screening, and the overall screening efficiency was ~40% for the PAC and BAC libraries. To complete the map, we generated and placed 24 new clone-end STSs and also generated numerous gene-specific ESTs (sequence data submitted to dbSTS; accession nos. G64226 through G64265).
Characteristics of the STS Content Map for the Region Spanning D12S1629 and D12S312
The clone contig transcript map we report represents the region encompassed by P273P14/SP6 proximally and D12S312 distally (Fig. 1, provided as insert). This region of 4.4 cM, from 66.8 to 71.2 cM of the Généthon genetic map, was represented by 217 clones in a single contig. The map consists of 81 BACs, 45 PACs, and 91 cosmids. A total of 135 STSs have been ordered in the contig, including 22 genetic markers, 74 genes/ESTs, and 39 random genomic markers. Of the 39 random genomic markers, all but two are clone-end STSs and 22 were newly developed in the present study. In addition, two clone-end STSs, 800J22/T7 and P216N19/SP6, turned out to be gene-specific STSs for KIAA0165 and ATP5G2, respectively.
This map contains 22 genetic markers, including 14 derived by Généthon, 3 by us, two by Utah, two by the Cooperative Human Linkage Center (CHLC), and one by Marshfield. All these markers were mapped in separate genetic maps, and we were able to map all of them onto a single physical map, resulting in complete integration of all of the markers. On the basis of the most recent Généthon genetic map (Dib et al. 1996), the genetic positions of the markers D12S359 and D12S1618 at 69.9 cM are not concordant with their physical positions, which are placed between D12S96 and D12S1651, both at 70.6 cM. The order of these markers in our map is comparable to not only our YAC map but also the YAC-based physical map generated by the Whitehead Institute, Massachusetts Institute of Technology (WI) genome center. Thus, it is likely that D12S359 and D12S1618 were misplaced in the genetic map. The Généthon genetic map places two more genetic markers, D12S1633 and D12S1635, at 66.8 cM and three, D12S1707, D12S1622, and D12S1724, at 71.2 cM. We investigated the presence of these markers in the contig and found that they were not present, suggesting that they are outside the physical limits of the map. This result was consistent with the YAC-based physical map (Fig. 1).
The ch12q13 region has been known to be one of the most gene-rich regions (GeneMap 1999). In addition to the gene-specific STSs we developed, we also placed a number of ESTs we gathered from public databases, mainly from the National Center for Biotechnology Information (NCBI) Unigene cluster and human transcript maps. The ESTs were first tested on the superpool to determine their presence in this contig and positive ESTs were mapped by hierarchical screening of the clones. Almost all of the ESTs placed in the contig are members of EST clusters. The mapped 74 genes/ESTs correspond to 66 unique clusters and 39 of them correspond to known genes. Thus, on average, there is a gene every 28 kb in this region. With the exception of the keratin type II locus and HOXC locus, where homologous genes were clustered, the most gene-rich region in this contig is the region between D12S96 and D12S1651 where 34 independent genes/ESTs were mapped on ~575 kb suggesting a density of 1 gene per 17 kb.
Recently, several keratin type II genes (Takahashi et al. 1995) and hair cortex keratin genes (Rogers et al. 1997; Bowden et al. 1998) were cloned and sequenced. We determined their relative positions in this contig. The order of these genes was determined unequivocally using cosmids; for the hair keratin genes, it is HB1-HB6-HB3-HB5. The first three genes are placed in a single cosmid insert, indicating that they are very closely located. The order of KRT6 genes is KRT6B-KRT6C/6E-KRT6D-KRT6A, and the first four of these genes are located in a single cosmid. Considering the average insert size of cosmid clones in this library (40 kb), this region must be one of the most gene-rich regions in the human genome (1 gene per 10 kb).
In two cases, we were able to determine the transcriptional orientation of the gene with respect to the centromere by mapping 5'- and 3' end-specific ESTs of genes to different clones. Mitochondrial ATP synthase c subunit (ATP5G2) mapped between D12S1618 and D12S1651, and is transcribed toward the centromere. ALK4 mapped between D12S1677 and D12S1712, and is transcribed toward the telomere.
A few genes mapped in this contig, such as elastase IIA (ELA1; Davis et al. 1995), keratin type II locus (Yoon et al. 1994), AMHR2 (Visser et al. 1995), and HOX C cluster (Cannizzaro et al. 1987), have been cytogenetically mapped to ch12q13 by fluorescence in situ hybridization (FISH) analysis. Therefore, this contig is localized at ch12q13.
The minimal tiling path based on the STS content for the region consists of 16 BAC and 4 PAC clones (boxed clones in Fig. 1). We determined the size of these clone inserts by pulsed field gel electrophoresis as shown in Figure 2 and the total size was measured as 3195 kb. The number of markers present in the minimal tiling path is 135 with 45 of them overlapping, suggesting an overlap of 25%. Therefore, the size of the contig at 75% is estimated to be 2396 kb. On the basis of this estimate, the marker resolution of this contig is assessed to be one ordered marker per 21 kb (2396 kb per 115 ordered markers). In the latest version of the Généthon genetic map, the genetic distance of this region is determined as 4.4 cM. Thus, 545 kb represents ~1 cM in this region.
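The size and resolution estimates in this paragraph follow from a few lines of arithmetic, reproduced here as a check. All input values are taken from the text; the variable names are ours, and the comment on how 25% might be derived from the marker counts is one possible reading, not something the text states.

```python
# Values reported in the text.
total_insert_kb = 3195       # summed insert sizes of the minimal tiling path (PFGE)
overlap_fraction = 0.25      # overlap stated in the text
ordered_markers = 115        # unambiguously ordered STSs
genetic_distance_cm = 4.4    # Généthon genetic distance of the region

# Assumption: the stated 25% may correspond to 45 shared markers out of
# 135 + 45 = 180 marker occurrences along the tiling path (45 / 180 = 0.25).
# The text does not spell out this calculation.

contig_kb = total_insert_kb * (1 - overlap_fraction)   # estimated contig size
kb_per_marker = contig_kb / ordered_markers            # marker resolution
kb_per_cm = contig_kb / genetic_distance_cm            # physical/genetic ratio

print(f"contig size  ~ {contig_kb:.0f} kb")
print(f"resolution   ~ 1 ordered marker per {kb_per_marker:.0f} kb")
print(f"kb per cM    ~ {kb_per_cm:.0f}")
```

The rounded results match the figures quoted above: about 2396 kb, one ordered marker per 21 kb, and about 545 kb per cM.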
The clone coverage obtained for the markers in the map ranged from 2 (B800J22/SP6) to 24 (262B3/T7). The average number of positive clones per marker was determined by dividing the total number of positive PCR results by the total number of STSs in the map, giving an average of 8.8 clones per marker. In contrast to the region between C172F9/T7 and D12S1712, we could identify only a few BAC clones representing the region between C115A8/T3 and B847N12/SP6 despite repeated efforts to identify more. This result may reflect regional differences in the representation of the clones in the BAC library.
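The depth calculation described here (total positive PCR results divided by the number of STSs) can be written out as a short sketch. The clone-to-STS table below is hypothetical toy data; only the formula comes from the text.

```python
from typing import Dict, Set


def clone_depth(content: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count, for each STS, how many clones are positive for it."""
    depth: Dict[str, int] = {}
    for stss in content.values():
        for sts in stss:
            depth[sts] = depth.get(sts, 0) + 1
    return depth


def average_depth(content: Dict[str, Set[str]]) -> float:
    """Total positive results divided by the number of STSs in the map."""
    depth = clone_depth(content)
    return sum(depth.values()) / len(depth)


# Hypothetical toy data: three clones, three markers.
toy = {"c1": {"a", "b"}, "c2": {"b", "c"}, "c3": {"b"}}
depths = clone_depth(toy)      # per-marker coverage
mean_depth = average_depth(toy)
```

Applied the other way round, an average of 8.8 over 135 STSs would correspond to roughly 1,190 positive PCR results in the full data set.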
The accuracy of the map can be assessed in different ways. The clone depth/marker provides an indication of the quality of the map. The fact that in our map the average depth is 8.8 provides high confidence of its accuracy.
| |
DISCUSSION |
|---|
There are two ways of determining the overlap of sequences in constructing a clone-contig map. One is based on restriction enzyme maps or fingerprints and the other on the STS content of clones. Although maps using relatively small insert clones such as cosmids and phages have been generated based on fingerprints (Stallings et al. 1990; Soeda et al. 1995), most of the clone-based human genome maps are STS content maps. In constructing physical maps, the advantages of an STS content map lie in the rapid analysis of the STS content of the clone by PCR and the simplicity of data analysis. In addition, it allows the use of previously generated information from low-resolution maps.
In the present work, STS content mapping was important in recognizing false-positive clones. With the fingerprinting method, it would have been difficult to assemble the contig due to high numbers of false-positive clones in screening. The false-positive clones could have resulted from the probes containing sequences homologous to gene families or functional domains that can be recognized by other genes. The availability of a large number of ordered STSs should facilitate the assembly of genome sequences in this region.
During the construction of the high-resolution map, we encountered two types of gaps. One class was not a true gap because two sets of clones were overlapping but we did not have a marker to connect them. This was the case for the regions between D12S347 and C172F9/T7, RARG (G06677) and PFDN5, and D12S886 and HOXC9. Three clone-end STSs were necessary to establish the sequence overlap. Three of the four gaps were of this type. This was likely the result of the abundance of probes (46) that were used for screening. The other type of gap was a real gap without any clones representing the region. Rescreening of the library with the clone-end STS P288H2/T7 and STS content mapping of the new clones were required to fill the gap between P375J22/T7 and D12S297. These clone-end-specific STSs proved extremely useful in the identification of new sets of clones as well as in making connections to the other contigs.
Although the order of the markers in our contig map was comparable to that in the YAC map generated by the WI genome center, it is not identical. The discrepancy could have originated from the differences in the clones used for mapping (YAC vs. bacterial clones) and in the resolution of the markers. Considering the lower integrity of YAC inserts in comparison to BAC and PAC inserts, it is highly likely that the order of the markers in our high-resolution contig map is more accurate than that of the YAC map generated by the WI genome center.
The minimal tiling path of our high-resolution contig was solely determined by STS content analysis. For sequencing purposes, it is important to establish that the clones represent genomic DNA before sequencing. High redundancy of the clones representing most of the regions indicates that the clones accurately represent genomic DNA. In addition, fingerprinting analysis will be used to ascertain that the clones represent the genomic DNA as well as to establish the minimal overlap between clones. The clones covering a major part of this region were already fingerprinted by the genome sequencing center at Washington University (http://genome.wustl.edu/gsc/) and accurately represent genomic DNA. The fingerprinting of the clones for the rest of the region is under way by the Einstein genome center at Albert Einstein College of Medicine (http://sequence.aecom.yu.edu/chr12/).
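How a minimal tiling path might be selected from STS content alone, as mentioned at the start of the paragraph above, can be sketched with a greedy interval cover. This is an illustration, not the selection procedure used in the study: it assumes the marker order is already known, that each clone contains a contiguous run of markers, and that consecutive clones in the path must share at least one marker. Clone names and intervals are hypothetical.

```python
from typing import Dict, List, Tuple


def minimal_tiling_path(intervals: Dict[str, Tuple[int, int]],
                        n_markers: int) -> List[str]:
    """Greedy cover of markers 0..n_markers-1.

    `intervals` maps each clone to the (first, last) index of the ordered
    markers it contains. Each chosen clone must contain a marker already
    covered by the previous one, so overlaps are supported by a shared STS.
    """
    path: List[str] = []
    covered_to = -1  # index of the last marker covered so far
    anchor = 0       # the next clone must start at or before this index
    while covered_to < n_markers - 1:
        candidates = [
            (last, name)
            for name, (first, last) in intervals.items()
            if first <= anchor and last > covered_to
        ]
        if not candidates:
            raise ValueError(f"gap after marker index {covered_to}")
        last, name = max(candidates)  # clone reaching the farthest marker
        path.append(name)
        covered_to = last
        anchor = last
    return path


# Hypothetical clones over ten ordered markers (indices 0..9).
clones = {
    "B1": (0, 3), "B2": (1, 2), "B3": (2, 5),
    "P1": (3, 6), "C1": (4, 7), "B4": (6, 9),
}
path = minimal_tiling_path(clones, n_markers=10)
```

A `ValueError` from this sketch corresponds to the situation described in the text where no existing clone bridges two sets of clones and a new clone-end STS or a rescreening of the library is needed.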
The ch12q13 region is of interest because genes for numerous human diseases have been placed or genetically mapped to this region (Table 1). Among the genes placed in the contig, ALK1 was shown to be responsible for hereditary hemorrhagic telangiectasia type 2 (Johnson et al. 1996), AMHR2 for persistent Mullerian duct syndrome, type II (Imbeaud et al. 1995), and keratin type II genes for various genetic diseases. These keratin-associated diseases include ichthyosis bullosa of Siemens (Rothnagel et al. 1994), monilethrix (Healy et al. 1995; Winter et al. 1997), epidermolysis bullosa simplex (Lane et al. 1992), epidermolytic hyperkeratosis (Letai et al. 1993), Meesmann corneal dystrophy (Irvine et al. 1997), pachyonychia congenita, Jadassohn-Lewandowsky type (Bowden et al. 1995), palmoplantar keratoderma, Bothnia type (Lind et al. 1994; Kelsell et al. 1995), and white sponge nevus (Rugg et al. 1995). Among these disorders, the genes for all but one, palmoplantar keratoderma, Bothnia type, have been identified. Many keratin genes placed in this region are good candidates for palmoplantar keratoderma, Bothnia type. In addition, triple A syndrome is linked to this region, yet its gene has not been found.
Triple A syndrome, or Allgrove's syndrome, is an autosomal recessive disorder that includes adrenocorticotropic hormone (ACTH)-resistant adrenal insufficiency, achalasia, and alacrima and is associated with progressive neurological and autonomic dysfunction (for review, see Huebner et al. 1999). Weber et al. (1996) determined linkage of triple A syndrome to the region between D12S1629 and D12S312 with the highest lod score of 10.8 at D12S368. The current transcript map completely encompasses this region, and the ESTs in the map can be considered candidate genes for the syndrome. The involvement of the central, peripheral, and autonomic nervous systems, as well as the endocrine system, and the progressive course of the disease in some patients are suggestive of a defective neuromodulator or neurotrophic factor showing developmental and tissue-specific patterns of expression. The alpha subunit of the voltage-gated sodium channel, type VIII (SCN8A), is the homolog of the murine Scn8a gene, which is mutated in motor end-plate disease (Kohrman et al. 1996), and is thus a strong candidate for triple A syndrome.
In the contig, 74 ESTs including 39 genes are interspersed among clone-end STSs and genetic markers. In addition, 27 independent ESTs, whose functions are unknown, are currently placed in this contig. The precise order of these genes and ESTs was provided by the STS content map in the present report. The absence or presence of STSs in the clones composing the contig determined the order of these markers. The order of most STSs was determined unequivocally, although for a few STSs it was not (depicted by brackets). The precise order of these genes and ESTs relative to the genetic markers will greatly facilitate identification of the genes responsible for triple A syndrome and palmoplantar keratoderma, Bothnia type.
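The principle that presence or absence of STSs in clones determines marker order can be illustrated with a consistency check: in a correct order, the markers found in any one clone should form a single contiguous run. The sketch below also reports markers that share an identical clone pattern and therefore cannot be ordered relative to each other, which is one plausible reason for a bracketed group. Function names and data are hypothetical, and the check ignores false-negative PCR results and deleted clones.

```python
from typing import Dict, List, Set


def inconsistent_clones(order: List[str],
                        content: Dict[str, Set[str]]) -> List[str]:
    """Clones whose markers are not one contiguous run in the proposed order."""
    position = {marker: i for i, marker in enumerate(order)}
    bad: List[str] = []
    for clone, stss in content.items():
        idx = sorted(position[s] for s in stss if s in position)
        if idx and idx[-1] - idx[0] + 1 != len(idx):
            bad.append(clone)
    return bad


def unresolved_groups(content: Dict[str, Set[str]]) -> List[List[str]]:
    """Markers present in exactly the same set of clones (order ambiguous)."""
    clones_by_marker: Dict[str, Set[str]] = {}
    for clone, stss in content.items():
        for sts in stss:
            clones_by_marker.setdefault(sts, set()).add(clone)
    groups: Dict[frozenset, List[str]] = {}
    for sts, clones in clones_by_marker.items():
        groups.setdefault(frozenset(clones), []).append(sts)
    return [sorted(g) for g in groups.values() if len(g) > 1]


# Hypothetical STS content of three clones.
content = {
    "c1": {"m1", "m2"},
    "c2": {"m2", "m3", "m4"},
    "c3": {"m3", "m4"},
}
ok = inconsistent_clones(["m1", "m2", "m3", "m4"], content)
swapped = inconsistent_clones(["m2", "m1", "m3", "m4"], content)
ambiguous = unresolved_groups(content)
```

Here the order m1-m2-m3-m4 is consistent with every clone, swapping m1 and m2 breaks clone c2, and m3 and m4 remain unresolved because no clone separates them.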
| |
METHODS |
|---|
Identification of Clones Containing DNA Markers and Construction of STS Content Map
Most primers were purchased from Research Genetics, Inc. or from various commercial sources. STSs from a YAC-based map (Krauter et al. 1995) and available STSs from databases in the public domain were collected for the mapping. STSs were PCR amplified using human DNA or YAC DNAs as templates, and PCR products were purified using a PCR product purification kit (Qiagen, Germany). DNAs were labeled with [32P]dCTP by the random labeling method and used to screen 6X genome equivalents of the RPCI 11 BAC library (Osoegawa et al. 1998), the 6X Caltech BAC library, the 6X RPCI 1 PAC library (Ioannou et al. 1994), and the 6-8X LL12NCO1 cosmid library (Montgomery et al. 1993) by the Southern blot method, as described previously (Church and Gilbert 1984). RPCI libraries were purchased from Roswell Park Cancer Institute (Buffalo, NY). BAC filters and clones from the Caltech library were a gift from Dr. S. Choi (Caltech, CA). BAC, PAC, and cosmid DNAs were prepared by the alkaline lysis method, and the positive clones were identified based on the presence of PCR products for the STSs used for screening. In addition to publicly available STSs, gene-specific STSs were also developed based on sequences from GenBank using the PRIMER program (version 3, Whitehead Institute, Massachusetts Institute of Technology Center for Genome Research). The STS content of each clone was determined by PCR. PCR reactions were performed in 20-µl reaction mixtures containing 1-10 ng of template DNA, 1x Taq buffer, 1.5 mM of MgCl2, 200 µM of dNTPs, 0.2 µM of each primer, and 0.75 U of Taq polymerase (Perkin-Elmer, Inc.). Initial denaturation was at 94°C for 2 min, followed by 30 cycles of 94°C denaturation, annealing for 25 sec, and extension at 72°C. Sequence overlap among the clones was determined by the shared STS contents of the clones.
Generation of STS From Clone End
Clone ends were specifically amplified by the vectorette PCR method described previously (Riley et al. 1990). Briefly, BAC or PAC DNAs were digested with RsaI and ligated with annealed vectorette. Using a vector-specific primer and the vectorette primer, an end-specific PCR product was amplified and subcloned into a vector (Invitrogen Co.). The sequences of the vector-end primers were 5'-ATCTGCCGTTTCGATCCTCCCGAA-3' for the SP6 side and 5'-TCGGTCGAGCTTGACATTGTAGGA-3' for the T7 side. The vectorette primer was 5'-CCGCAAATCGATCTCGAGTCTAGAGTCGAC-3'. The PCR reaction was performed in 30-µl reaction mixtures containing digested and vectorette-ligated DNA, 1x Taq buffer, 1.5 mM of MgCl2, 200 µM of dNTPs, 0.2 µM of each primer, and 1 U of Taq polymerase (Perkin-Elmer, Inc.). Initial denaturation was at 94°C for 2 min, followed by 35 cycles of 94°C denaturation, annealing at 68°C for 30 sec, and extension at 72°C for 1 min. Plasmids containing the clone-end fragment were subsequently sequenced using an automated sequencer (ALFexpress, Amersham Pharmacia Biotech, UK). A unique primer pair for each end sequence was developed and used for mapping.
Determination of Clone Insert Size by Pulsed-Field Gel Electrophoresis
BAC DNAs were prepared from 4-ml cultures of single colonies by a standard alkaline lysis procedure. DNAs were digested with NotI, and the digested DNA was separated in a 1% agarose gel in 0.5x TBE buffer (45 mM Tris/45 mM boric acid/1 mM EDTA) at 14°C. Electrophoresis was carried out on a BioRad CHEF Mapper apparatus for 20 hr at a field strength of 6 V/cm with a linear pulse time ranging from 5 to 15 sec. DNAs were visualized by ethidium bromide staining.
| |
ACKNOWLEDGMENTS |
|---|
The authors acknowledge the financial support by the academic research funds from the Ministry of Education, Republic of Korea (F-015709195) for the program year 1997, the Korea Research Foundation (1998-019-F00025) of 1998, and the Korea Science and Engineering Foundation (981-0703-018-2) for the program year 1998.
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
| |
FOOTNOTES |
|---|
5 These authors contributed equally to this work.
6 Corresponding author.
E-MAIL sjkyoon{at}cmc.cuk.ac.kr; FAX (02) 532-0575.
Article and publication are at www.genome.org/cgi/doi/10.1101/gr.142100.
| |
REFERENCES |
|---|
Received March 27, 2000; accepted in revised form August 11, 2000.