Genome Research
| |
ABSTRACT |
|---|
|
|
|---|
Human genes containing triplet repeats have been demonstrated to
be involved in several neurodegenerative diseases by expansion of the
repeat in succeeding generations. To identify novel genes involved in
such pathologies, we have isolated transcripts containing (CAG/CTG)n repeats using two approaches. First, we
screened 4 × 10^6 clones, representing 10 copies of a
human testis cDNA library, using a (CAG)14 oligonucleotide
probe. Among the 910 clones identified, the 243 clones with the
strongest hybridization signal were partially sequenced from their 3′
or 5′ ends. This provided us with 251 partial sequences that grouped
into clusters corresponding to 39 genes, of which 19 represent unknown
species. Second, we selected 203 additional ESTs containing
(CAG/CTG)n repeats representing 121 clusters from
the IMAGE consortium infant brain cDNA library. From these two series
of sequences, we have localized 95 genes on human chromosomes using a
whole-genome radiation hybrid panel (Genebridge 4). These genes are
located on all of the chromosomes except for chromosome X, the highest
density being observed on chromosome 19.
[The sequence data described in this paper have been submitted to GenBank under accession nos. AA065241-AA065346.]
| |
INTRODUCTION |
|---|
|
|
|---|
The human genome contains a large number of short tandem repeats
(also known as microsatellites), including trinucleotide repeats in
stretches of five or more, that have been detected in at least 50 genes
(Riggins et al. 1992). Expansions of various types of
these trinucleotide repeats have been implicated in genetic diseases.
Although CGG and GAA repeats are expanded in different fragile X
syndromes (Kremer et al. 1991; Verkerk et al. 1991; Yu et al. 1991;
Knight et al. 1993; Jones et al. 1994; Nancarrow et al. 1994; Parrish
et al. 1994) and Friedreich's ataxia, respectively (Campuzano et al.
1996), CTG and CAG repeats are involved in a larger series of
pathologies. Amplification of CTG repeats has been described in
myotonic dystrophy (MD) (Aslanidis et al. 1992; Brook et al. 1992; Fu
et al. 1992), whereas amplifications of CAG repeats have been observed
in spinal and bulbar muscular atrophy (SBMA) (La Spada et al. 1991),
spinocerebellar ataxia type 1 (SCA1) (Orr et al. 1993), Huntington's
disease (HD) (The Huntington's Disease Collaborative Research Group et
al. 1993), dentatorubral-pallidoluysian atrophy (DRPLA) (Koide et al. 1994;
Nagafuchi et al. 1994), Machado-Joseph disease or spinocerebellar
ataxia type 3 (MJD or SCA3) (Kawaguchi et al. 1994), spinocerebellar
ataxia type 2 (SCA2) (Imbert et al. 1996; Pulst et al. 1996; Sanpei et
al. 1996), and spinocerebellar ataxia type 6 (SCA6), the most recently
identified SCA associated with a CAG expansion (Zhuchenko et al. 1997).
All of these diseases exhibit instability in transmission of the
expanded repeat from parent to offspring. In some of them, the increase
in repeat size correlates with an increase in disease severity and a
decrease in age of onset or penetrance, as established for HD (Duyao et al. 1993)
and MJD (Kawaguchi et al. 1994). In addition, the expansion of the repeats occurs in the transcribed part of the gene; GAA repeats
are located in the intron, CGG and CTG in the untranslated exons, and
CAG in the coding exons of the related gene.
The study of CAG repeats is of special interest for at least three
reasons: (1) As mentioned previously, this type of repeat is involved
in at least six neurodegenerative diseases; (2) CAG repeats are
translated into polyglutamine stretches, domains that are often present
in transcription factors and may function as a polar zipper interacting
with other proteins (Perutz et al. 1994); and (3) experiments in
Escherichia coli have shown that CAG/CTG tracts are expanded
at least eight times more frequently than any of the other nine
triplets (Ohshima et al. 1996).
Therefore, the identification and mapping of genes containing CAG
repeats are of importance, as such genes represent potential candidates
for diseases that exhibit genetic anticipation (La Spada et al. 1994),
such as unipolar and bipolar disorders (McInnis et al. 1993; Engstrom
et al. 1995; O'Donovan et al. 1995), autosomal dominant cerebellar
ataxia (ADCA) type I (Durr et al. 1996) and type II (Benomar et al.
1995), familial nonspecific dementia (Brown et al. 1995), and
schizophrenia (Ross et al. 1993; Bassett and Honer 1994; Morris et al.
1995; O'Donovan et al. 1995; Bowen et al. 1996), although anticipation
is still questionable for this last disease (Petronis et al. 1996;
Sasaki et al. 1996).
To aid in the identification of new genes containing CAG/CTG repeats,
we decided to look for transcripts containing such repeats in human
testis. This tissue expresses a large number of mRNAs, many of which
are shared exclusively with nervous tissue, such as neuropeptide
precursors, proenkephalin, or pro-opiomelanocortin (Wolgemuth and
Watrin 1991), or belong to neurotransmitter biosynthesis (glutamate
decarboxylase) (Persson et al. 1990). In the present work, the
screening of a human testis cDNA library with a CAG-specific probe
resulted in the identification of 39 CAG-containing genes, 19 of them
corresponding to new genes. In parallel, we analyzed expressed sequence
tags (ESTs) from the IMAGE consortium obtained from infant brain cDNA
clones positive for CAG hybridization. From this analysis, we
collected 121 CAG-containing clusters. From these two pools of
transcripts, we have mapped 95 CAG/CTG-containing genes using radiation
hybrid mapping.
| |
RESULTS |
|---|
|
|
|---|
The strategy for the analysis of sequences containing CAG repeats is outlined in Figure 1.
|
Isolation of Human Testis mRNAs Containing CAG Repeats (Group A)
The screening of 4 × 10^6 clones from a
human testis cDNA library, using (CAG)14 as a probe,
produced 910 positive clones. We selected the 243 clones eliciting the
strongest signal and sequenced them from their 3′ end, as this
region is likely to correspond to a single exon (Hawkins et al. 1988)
and, therefore, is more suitable for deriving sequence tag sites (STSs)
for mapping purposes (Hayes et al. 1996). After the analysis of these
sequences, we rejected all clones lacking a poly(A) tail and we
sequenced the 5′ end of the insert when no repeat was detected in
the 3′ end sequence or when the sequence in the 3′ end was not
informative enough. From the 243 cDNA clones, we obtained 251 sequences from
the 3′ or 5′ ends that ranged between 200 and 400 bp with an
average size of 380 bp. These sequences have been deposited in dbEST
under accession numbers AA065241-AA065346.
These sequences were submitted to two types of analysis. First, they
were compared among themselves to eliminate exact duplicates. This
analysis led us to discard 146 sequences (58%). We kept overlapping sequences as well as identical sequences containing repeats of various
sizes. The remaining 105 sequences (42%) were assembled into 39 independent clusters (group A) (Table 1). Sequences
were incorporated in the same cluster when they exhibited at least 98%
identity in nucleic acid sequence. Second, the sequences corresponding to these 39 clusters were then compared with sequences present in
nucleic acid and protein sequence databases using BLAST programs (Altschul
et al. 1990): 12 clusters corresponded to genes already known in human,
8 clusters were found to be homologous to known genes in human or in
other species, 16 clusters only matched with anonymous ESTs, and 3 clusters did not give any match at all.
|
Analysis of the repeats present in the sequences revealed that 21 clusters exhibit a CAG or CTG repeat located either in the 3′ or in
the 5′ region of the cDNA. One-third of these repeats contained three
to nine triplets, the remaining two-thirds had between 10 and 20 triplets, and three cases had >20 repeats. In 15 clusters, we
observed insertions of CAA or TTG triplet in the CAG or CTG repeats,
respectively, thus extending the stretch of glutamine. In six clusters,
we observed small insertions of nine bases or less in the CAG repeat.
Two variations in the size of the repeat were observed: 13, 15, and 17 CAG repeats are present in the different cDNAs of cluster 14, as
well as 10 and 11 CAG repeats in cluster 25. In 18 clusters, we did not
detect any repeat in the partial sequences that were obtained.
Nevertheless, it remains likely that these clones contain CAG repeats.
This statement is based on the fact that 4 clusters among 18, corresponding to already known human genes, identify genes containing
CAG repeats (e.g., monocyte differentiation antigen precursor, myotonic
dystrophy kinase, nucleolar phosphoprotein p130, human 54-kD protein
mRNA). Only one cluster matches a human gene that contains a
CAG-rich region with no perfect successive repeats (human XRCC4
mRNA). Only the complete sequence of these cDNAs will establish definitively whether the CAG repeat is present.
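As a rough illustration of the repeat analysis described above, the following Python sketch scans a sequence for runs of CAG or CTG units and counts CAA or TTG interruptions within each run. It is not the procedure used in this study: the function name `find_repeats`, the three-unit minimum, and the layout of the returned tuples are assumptions chosen for illustration, and the small non-triplet insertions mentioned above are not handled.

```python
import re

def find_repeats(seq, min_units=3):
    """Return (start, motif, n_units, n_interruptions) for each repeat run.

    A run is a string of consecutive CAG units, optionally interrupted by
    CAA (or CTG units interrupted by TTG on the opposite strand).
    """
    seq = seq.upper()
    hits = []
    for motif, alt in (("CAG", "CAA"), ("CTG", "TTG")):
        # e.g. "(?:CAG|CAA){3,}" : at least min_units consecutive units
        pattern = re.compile("(?:%s|%s){%d,}" % (motif, alt, min_units))
        for m in pattern.finditer(seq):
            run = m.group(0)
            units = [run[i:i + 3] for i in range(0, len(run), 3)]
            n_alt = units.count(alt)
            if n_alt < len(units):  # skip runs made only of CAA or TTG
                hits.append((m.start(), motif, len(units), n_alt))
    return sorted(hits)

# Hypothetical test sequence: 5 CAG, one CAA interruption, 4 more CAG.
example = "TT" + "CAG" * 5 + "CAA" + "CAG" * 4 + "GGT"
print(find_repeats(example))
```

For the hypothetical sequence above, the sketch reports a single run of 10 units starting at offset 2 with one CAA interruption. Counting CAA inside the run mirrors the observation that CAA extends the glutamine stretch.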
Selection of Human ESTs Containing CAG Repeats (Group B)
The IMAGE consortium screened a subset of 40,000 clones from the
normalized infant brain cDNA library 1 (NIB1) of B. Soares, using CAG
oligonucleotides. One hundred eighty-six positive clones likely to
contain CAG were obtained and listed in the IMAGE web page
(http://www.bio.llnl.gov/bbrp/image/itri.htlm). From this series of
clones, 203 sequences (350-500 bp), from either the 3′ or 5′
region of the insert (mean 1820 bp), were recovered from dbEST. These
sequences were assembled into 121 clusters according to the same
strategy as the one described for the human testis cDNA clones (Fig. 1).
Chromosomal Localization of cDNA Clones and ESTs
The human testis CAG/CTG-containing clones and the human ESTs were
localized using a radiation hybrid panel. In this technique, segments
of human chromosomes obtained by X-ray irradiation are rescued in
rodent recipient cells. A linkage distance can be established on the
basis of the number of breaks between two loci, estimated by
measuring the frequency of coretention (Cox et al. 1990). In our case,
the CAG/CTG-containing clones were mapped by using 90 hybrids from the
Genebridge 4 radiation hybrid panel (Gyapay et al. 1996). In general,
for each cluster only one sequence, located preferentially at the 3′
end, was retained for primer design. For some infant brain EST clusters,
the localization was achieved by using oligonucleotides derived from
the same cluster of ESTs of the CAG-positive clone present in Unigene
(Schuler et al. 1996).
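To make the coretention idea concrete, the sketch below computes a commonly quoted two-point estimate of the breakage frequency θ between two markers from their retention patterns, in the spirit of Cox et al. (1990), and converts it to centiRays. This is an illustration only: the mapping in this study was done with the multipoint RHMAP package, and the function names and the retention vectors used here are hypothetical.

```python
import math

def breakage_frequency(a, b):
    """Two-point estimate of theta from two retention vectors.

    a, b: sequences of 1 (marker retained) or 0 (marker absent), one entry
    per hybrid, in the same hybrid order.
    theta = discordant hybrids / (T * (R_A + R_B - 2 * R_A * R_B)),
    where T is the number of hybrids and R_A, R_B the retention fractions.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("retention vectors must be non-empty and equal in length")
    t = len(a)
    r_a = sum(a) / t
    r_b = sum(b) / t
    discordant = sum(1 for x, y in zip(a, b) if x != y)
    expected = t * (r_a + r_b - 2 * r_a * r_b)
    if expected == 0:
        raise ValueError("markers are uninformative (retained in all or no hybrids)")
    return min(discordant / expected, 1.0)

def centirays(theta):
    """Map distance in centiRays, D = -100 * ln(1 - theta)."""
    if theta >= 1.0:
        return math.inf
    return -100.0 * math.log(1.0 - theta)

# Hypothetical scores for two markers over four hybrids.
theta = breakage_frequency([1, 1, 0, 0], [1, 0, 0, 0])
print(theta, centirays(theta))
```

Two markers that are always retained or lost together give θ = 0, whereas markers retained independently give θ close to 1; a real panel such as the 90 hybrids used here simply supplies longer vectors.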
Of the 39 genes expressed in human testis that we analyzed, 27 (69%) were localized (Table 2), and of 121 EST clusters derived from the brain cDNA library, 68 (57%) were mapped successfully (Table 3). Several clusters could not be localized for different reasons: (1) primers failed to amplify (majority of cases); (2) presence of background from hamster DNA; (3) human bands had a different size from that expected; and (4) sequences were too short for primer determination.
|
|
| |
DISCUSSION |
|---|
|
|
|---|
In the present study we used cDNA sequences obtained by two different approaches to identify and map new human genes containing CAG repeats.
The screening of 4 × 10^6 clones, equivalent to 10 copies
of a human testis cDNA library, allowed us to identify 910 clones containing CAG repeats. The 243 clones that exhibited the strongest hybridization signals were further characterized. When duplicates were
eliminated, these clones appeared to correspond to 39 genes containing
CAG repeats. With respect to the number of plated clones, this
represents roughly 1 in 10,000 (39 in 400,000 for one copy of the
library). This number is lower than the ratio of 37 in 10,000 reported
by Néri et al. (1996) from a human fetal brain cDNA library,
28 in 10,000 from a human cerebral cortex cDNA library (Li et al.
1993), and 7 in 10,000 from another fetal brain library (Riggins et al.
1992). Such differences might result from the differential expression
of transcripts between testis and brain, but also from differing
experimental conditions. In the series that we analyzed, we selected
clones with intense hybridization signals. In addition, preliminary
tests that we performed on 26 of the remaining clones (667) that gave
low hybridization signals, allowed us to identify new transcripts that
contain 6-11 CAGs. Thus, the population of transcripts that contain
CAG repeats in normal human testis is certainly larger than our initial
observations indicate. In addition, at least in our study, the
intensity of the hybridization signal correlates more or less with the
number of repetitions.
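The frequency comparison above is simple arithmetic, which the following sketch reproduces. The helper name `per_10k` is an assumption; the counts are those quoted in the text.

```python
def per_10k(n_genes, n_clones):
    """Number of CAG-containing genes per 10,000 plated clones."""
    return 10_000 * n_genes / n_clones

# This study: 39 genes for one copy of the library (about 400,000 clones).
testis = per_10k(39, 400_000)
print(f"testis library: {testis:.3f} per 10,000 clones")

# Ratios quoted from earlier reports, already expressed per 10,000 clones.
reported = {
    "Néri et al. 1996 (fetal brain)": 37,
    "Li et al. 1993 (cerebral cortex)": 28,
    "Riggins et al. 1992 (fetal brain)": 7,
}
for source, rate in reported.items():
    print(f"{source}: {rate / testis:.1f}-fold higher")
```

The testis figure works out to just under 1 per 10,000, consistent with the "roughly 1 in 10,000" stated above.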
The average size of the CAG/CTG repeats in human testis cDNA analyzed
in this study (strong hybridization signal) was ~13, with at least
30% of the 39 clusters above this value. The different lengths that we
observed are within the same range of repeat numbers usually observed
in normal alleles of disease genes (5-54 trinucleotide repeats). Other
reports, analyzing either genomic DNA or cDNA, mentioned lower numbers
in CAG repeats. Gastier et al. (1996), analyzing the
(CAG/CTG)n repeat lengths in 479 unique genomic
clones, observed 30% of the repeats with six triplets, whereas the
repeats with 13 copies represented only 2%. In human fetal brain cDNA,
Néri et al. (1996) observed only 13 of 88 (15%) clones that
exhibited repeats of size above nine. The larger size repeats that we
observed result from our selection of the clones with the highest
hybridization signal as mentioned above.
In most of the clusters, the CAG repeats were not perfect. First, in 15 clusters, we observed the presence of CAA triplets in the CAG repeat.
This triplet also encodes glutamine and is also present in genes for
HD, DRPLA, SBMA, SCA2, and MJD1. Second, for six clusters, we detected
small insertions with sizes between 3 and 9 nucleotides. For five of
those clusters, the insertion did not change the possible open reading
frame (ORF). Similar insertions were found in the gene of SCA1, SBMA,
and MJD1. As already described for the SCA1 gene, this might contribute
to the stabilization of the repeat length (Chung et al. 1993). For the
remaining cluster, the sequence was not accurate enough to determine
whether the insertion induces a frameshift in the ORF.
Until now genes with CAG expansions that are likely involved in genetic
diseases have two specific features: (1) the stretch of CAG is
translated into polyglutamine; and (2) in general, this locus is highly
polymorphic. With partial sequencing of cDNA, it is impossible to
predict with accuracy the ORF in which the CAG repetition is inserted.
Therefore, such stretches could be translated as poly(Gln), poly(Ser),
or poly(Ala). As an example, in our clones there is one that
corresponds to the nucleolar phosphoprotein p130 and contains a CAG
repeat coding for a poly(Ser), which is not known to be involved in a
neurodegenerative disease. Nevertheless, one cannot exclude implication
of a CAG amplification in a poly(Ser) or poly(Ala) in genetic diseases.
For example, an expansion of a polyalanine stretch in the
amino-terminal region of HOXD13 is associated with
synpolydactyly (Muragaki et al. 1996). For two genes, we detected some polymorphisms
in the length of the CAG/CTG repeat. We observed a variation of 13 to
17 CAGs for gene 14, and 10 or 11 CAGs for gene 25. This could reflect
the expression of allelic mosaicism observed previously for this kind of gene in human testis (Zühlke et al. 1993; Telenius et al. 1995; Zhang et al. 1995), which may occur in meiosis during spermatogenesis.
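The reading-frame ambiguity noted above, where a CAG stretch may encode poly(Gln), poly(Ser), or poly(Ala), can be checked with a short sketch. It translates a perfect repeat in each of the three forward frames using only the handful of standard-code codons that can occur; the function name `translate_frames` is an assumption, and any other codon is shown as "X".

```python
# Standard genetic code entries relevant to a CAG repeat (CAA included
# because it is a frequent interruption and also encodes glutamine).
CODON_TO_AA = {"CAG": "Q", "CAA": "Q", "AGC": "S", "GCA": "A"}

def translate_frames(unit="CAG", n=5):
    """Translate (unit)n in the three forward reading frames."""
    seq = unit * n
    result = {}
    for frame in range(3):
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        result[frame] = "".join(CODON_TO_AA.get(c, "X") for c in codons)
    return result

for frame, peptide in translate_frames("CAG", 5).items():
    print(frame, peptide)
```

Frame 0 reads CAG (Gln), frame 1 reads AGC (Ser), and frame 2 reads GCA (Ala), which is why a partial cDNA sequence without a defined ORF cannot tell which homopolymer is produced.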
From the 39 clusters that we identified, 19 correspond to unknown genes and 20 to already characterized genes. Among this population, we retrieved two of the seven genes already described to be involved in CAG diseases, the MJD1 protein and the myotonic dystrophy kinase, which are the products of genes involved in Machado-Joseph disease and myotonic dystrophy, respectively. This is the first observation that indicates an active transcription of these genes in human testis.
The cloning of cDNAs containing CAG repeats by hybridization is an
efficient method for the detection of new candidate genes, but this
technique is very time-consuming and it is difficult to eliminate
redundancy. As a complementary approach, we screened databases for ESTs
containing CAG. The IMAGE consortium identified by hybridization with a
CAG specific probe a large series of transcripts that are likely to
contain such repeats. The corresponding cDNAs, once characterized by
3′ partial sequencing, did not reveal long stretches of CAG, but
larger repeats can be present upstream in the transcripts. This
approach, although less reliable, allows us to screen rapidly a larger
pool of cDNA from various tissues.
Until now, the various studies that have been done to identify new
genes containing CAG repetitions have reported ~100 genes (Riggins
et al. 1992; Li et al. 1993; Jiang et al. 1995; Aoki et al. 1996;
Néri et al. 1996). Of these genes, only 17 have been assigned to
chromosomes and 7 of them sublocalized. In the present study, we have
localized the largest group of genes containing CAG repeats. We have
mapped 27 testis cDNAs and 68 EST sequences containing CAG/CTG repeats
using a radiation hybrid panel. All the present localizations agree
with the previous assignments when available.
In our study, CAG-containing genes were found on all chromosomes except for chromosome X (chromosome Y was not included in the whole-genome radiation hybrid panel). The distribution of the CAG-containing genes is not even, the largest number (10) being present on chromosome 19, whereas only one gene was detected on chromosome 2.
Some of the localizations of anonymous sequences that we have found are very close to loci involved in autosomal dominant genetic diseases associated with progressive neuropathy such as Charcot-Marie-Tooth disease type B (3q13-q22), schizophrenia disorder 1 (5q11.2-q13.3), Charcot-Marie-Tooth neuropathy (8q13-q21.1), related 4 and 8 of spinocerebellar ataxia (16q, 10q23.1-24.1), or schizophrenia disorder 4 (22q11). Further studies are needed to investigate whether some of these genes might be related to these genetic diseases.
| |
METHODS |
|---|
|
|
|---|
cDNA Cloning
Poly(A)+ RNA from the testis of a 27-year-old man was isolated
as described previously (Matsuoka et al. 1992). The cDNA library was
constructed using the Superscript plasmid system and plasmid cloning
kit (Life Technologies, Inc.) according to the manufacturer's specifications, except that reverse transcription was done at 42°C
for 1 hr. The cDNA was size-selected on S400 Sephacryl columns, and the
material >700 bp was inserted in an oriented manner into the pSPORT1 vector, using NotI-SalI
adaptors. The resulting plasmids, once transfected into XL1-Blue
cells, gave 360,000 independent colonies. Ten copies of this library
(4 × 10^6 clones) were plated onto filters and hybridized
with a 5′-32P-labeled (CAG)14 oligonucleotide
in 6× SSC [1× SSC: 150 mM NaCl/15 mM sodium
citrate (pH 7.0)]; 5× Denhardt's solution (1× Denhardt's solution: 0.2% bovine serum albumin, 0.02% Ficoll, 0.02%
polyvinylpyrrolidone); 0.1% SDS; 5 mM EDTA (pH 7.5); and 100 µg/ml of denatured salmon sperm DNA for 16 hr at 42°C. After
hybridization, filters were washed twice in 0.5× SSC with 0.1% SDS
at 65°C for 1 hr and exposed at -80°C to Amersham Hyperfilm
for 16 hr with one intensifying screen.
DNA Sequencing
Plasmid minipreps were performed using a minikit Tip 20 (Qiagen,
Chatsworth, CA) according to the manufacturer's specifications. Plasmid
DNA concentrations were adjusted to 250 ng/µl based on absorbance
at 260 nm. Plasmids were sequenced according to Sanger's method using fluorescent dye-labeled primers and cycle sequencing kits
(Applied Biosystems) as described previously (Pawlak et al. 1995). The
reaction products were analyzed on a 373A automated DNA sequencer
(Applied Biosystems). Sequencing was done systematically from the
3′ end of the cDNA using the SP6 or -21M13 primer and, when necessary, from the 5′
end using the T7 or M13 reverse primer.
Sequence Analysis
The sequences were edited manually and limited to 400 bp and 2%
ambiguities (N). The redundancy was evaluated by internal comparison of
those sequences using the FASTA program. The sequences were sent to the
National Center for Biotechnology Information (NCBI) for BLASTX and
BLASTN analysis (Altschul et al. 1990) against the nonredundant nucleic acid
and protein libraries. Sequence similarities identified by the BLAST
programs were considered statistically significant when scores were
>150 and >75 for nucleic acid and amino acid sequences,
respectively, or when the Poisson P value was <0.05. The
BLASTX and BLASTN results for each clone were analyzed simultaneously
and processed manually. We always selected the protein match when a hit
was detected with both types of analyses.
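The significance criteria and the protein-first rule described above can be written as a small filter. This is a sketch only: the `Hit` record, its field names, and the function names are hypothetical, and the hits are assumed to have been parsed already from BLASTN and BLASTX reports. One further assumption is that the protein match is preferred only when it is itself significant.

```python
from typing import NamedTuple, Optional

class Hit(NamedTuple):
    program: str    # "BLASTN" (nucleic acid) or "BLASTX" (protein)
    subject: str    # database entry matched
    score: float    # BLAST score
    p_value: float  # Poisson P value

# Score cutoffs quoted in the text for each type of comparison.
SCORE_CUTOFF = {"BLASTN": 150, "BLASTX": 75}

def is_significant(hit: Hit) -> bool:
    """Significant if the score exceeds the cutoff or P < 0.05."""
    return hit.score > SCORE_CUTOFF[hit.program] or hit.p_value < 0.05

def choose_match(blastn: Optional[Hit], blastx: Optional[Hit]) -> Optional[Hit]:
    """Prefer the protein match when both analyses give a significant hit."""
    if blastx is not None and is_significant(blastx):
        return blastx
    if blastn is not None and is_significant(blastn):
        return blastn
    return None

# Hypothetical hits for one clone.
n_hit = Hit("BLASTN", "anonymous EST", 210.0, 1e-8)
x_hit = Hit("BLASTX", "known protein", 90.0, 1e-5)
print(choose_match(n_hit, x_hit).subject)
```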
PCR for Radiation Hybrid Mapping
Primers for the PCRs were designed using the program described
by Rychlick and Rhoads (1989), which was adapted to large-scale primer
design. The repeat elements, such as Alu, Kpn, and LINE, were
masked first and then the primers were selected according to the
desired criteria. PCRs were performed on DNA obtained from the
Genebridge 4 radiation hybrid panel (Gyapay et al. 1996). The PCRs were
carried out in a volume of 15 µl. The final concentrations in the
PCR were as follows: 2 ng/µl of DNA, 125 nM dNTP (31 nM of each), 1.33 µM primers (of each), 50 mM KCl, 2 mM MgCl2, 0.1% Triton
X-100, 0.01% gelatin, 10 mM Tris-HCl (pH 9.0 at 25°C), and 0.25 units per 15 µl of Taq polymerase. The
samples were overlaid with heavy mineral oil. Amplifications were
performed using the hot start procedure. The first three cycles
consisted of 30 sec of annealing at 61°C and 40 sec of denaturing at
94°C. The annealing temperature was lowered successively by 2°C
for each consecutive three cycles until 55°C, followed by 25 further cycles at an annealing temperature of 55°C. After completion of the
PCR reaction, 4 µl of loading mixture containing 0.1% (wt/vol) bromophenol blue and 50% (vol/vol) glycerol were added to each well.
The PCR products were allowed to migrate on an agarose gel containing
1% SeaKem and 3% NuSieve agarose in TBE buffer with 0.25 µg/ml
ethidium bromide. Then, the images of the gels were recorded with a CCD
camera and scoring of the results was carried out semiautomatically
with the BioImage software developed by Millipore. Typing results were
downloaded into a database. The calculations were performed using the
RHMAP package (Boehnke et al. 1991). Positioning of the CAG/CTG-containing
clones or ESTs was carried out relative to ~1000 evenly
distributed Genethon genetic markers. In the course of the
calculations, the program positioned the ESTs into each interval
defined by the adjacent genetic markers and the probability of this
position was calculated. The highest probability was retained and
considered as the real position of the given locus.
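The stepwise lowering of the annealing temperature described above can be laid out as an explicit cycle list. The sketch below is one reading of the protocol, not a confirmed program: it assumes that three cycles are run at each of 61°C, 59°C, 57°C, and 55°C, and that the 25 further cycles at 55°C come after those. The function name and parameters are hypothetical.

```python
def touchdown_schedule(start=61, stop=55, step=2, cycles_per_step=3, final_cycles=25):
    """Return the annealing temperature (in °C) of every PCR cycle, in order."""
    if step <= 0 or start < stop:
        raise ValueError("expected start >= stop and a positive step")
    temps = []
    temp = start
    # Touchdown phase: a few cycles at each temperature, stepping down to stop.
    while temp >= stop:
        temps.extend([temp] * cycles_per_step)
        temp -= step
    # Remaining cycles at the final annealing temperature.
    temps.extend([stop] * final_cycles)
    return temps

schedule = touchdown_schedule()
print(len(schedule), schedule[:12])
```

Under these assumptions the program runs 37 cycles in total, each with 30 sec of annealing and 40 sec of denaturing at 94°C. If the three cycles at 55°C are instead counted within the final 25, the total drops to 34.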
| |
ACKNOWLEDGMENTS |
|---|
We thank Y. Laperche and T. Rohn for the critical reading of the manuscript, R. Derreumaux for excellent technical assistance, and Edith Grandvilliers for her secretarial assistance. This work was funded by INSERM and the Groupement de Recherche sur l'Etude des Génomes.
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
| |
FOOTNOTES |
|---|
3 Corresponding author.
E-MAIL bulle{at}im3.inserm.fr; FAX 33-1-48-98-09-08.
| |
REFERENCES |
|---|
|
|
|---|
end of a transcript encoding a protein kinase family member. Cell 68: 799-808.
Received February 13, 1997; accepted in revised form May 1, 1997.