Vol. 9, Issue 7, 647-653, July 1999
LETTER
ABSTRACT
Microsatellites and minisatellites are two classes of tandem repeat sequences differing in their size, mutation processes, and chromosomal distribution. The boundary between the two classes is not defined. We have developed a convenient, hybridization-based human library screening procedure able to detect long CA-rich sequences. Analysis of cosmid clones derived from a chromosome 1 library shows that the cross-hybridizing sequences tested are imperfect CA-rich sequences, some of them showing a minisatellite organization. All but one of the 13 positive chromosome 1 clones studied are localized in chromosomal bands to which minisatellites have previously been assigned, such as the 1pter cluster. To test the applicability of the procedure to minisatellite detection on a larger scale, we then used a large-insert whole-genome PAC library. Altogether, 22 new minisatellites have been identified in positive PAC and cosmid clones, and 20 of them are telomeric. Among the 42 positive PAC clones localized within the human genome by FISH and/or linkage analysis, 25 (60%) are assigned to a terminal band of the karyotype, 4 (9%) are juxtacentromeric, and 13 (31%) are interstitial. The localization of at least two of the interstitial PAC clones corresponds to previously characterized minisatellite-containing regions and/or ancestrally telomeric bands, in agreement with this minisatellite-like distribution. The data obtained are in close agreement with the parallel investigation of human genome sequence data and suggest that long human (CA)s are imperfect CA repeats belonging to the minisatellite class of sequences. This approach provides a new tool to efficiently target genomic clones originating from subtelomeric domains, from which minisatellite sequences can readily be obtained.
[The sequence data described in this paper have been submitted to the EMBL data library under accession nos. AJ000377-AJ000383.]
INTRODUCTION
Tandem repeats represent an important proportion of vertebrate genomes and have been classified as satellites, midisatellites, minisatellites, and microsatellites according to the overall length of the entire array. In higher vertebrates, (CA)n microsatellites are the most numerous, with an average distance between two microsatellites of ~25 kb (Stallings et al. 1991). Ninety percent of human (CA)n microsatellite arrays are <40 bp and <1%-2% are longer than 30 repeats (Weber 1990). Minisatellite repeat units are usually 10-100 nucleotides long, and the array spans 0.5-100 kb. The chromosomal distribution of minisatellites in the human genome is highly skewed towards telomeres and ancestrally telomeric regions (Amarger et al. 1998).
The initial classification of minisatellites and microsatellites has now been strengthened on biological grounds by the demonstration that different modes of evolution operate on these two types of structures. Microsatellites mutate by replication slippage processes because of mispairing between the two strands during replication. They are stabilized by variant repeats, whose presence facilitates detection of the slipped-strand DNA by the mismatch repair system (Strand et al. 1993; Heale and Petes 1995). Minisatellites mutate predominantly in the germ line (Jeffreys and Neumann 1997) through mechanisms, including gene conversion-like events, that presumably arise from DNA double-strand breaks (DSBs) and are insensitive to internal variations within the tandem array (Buard and Vergnaud 1994; Jeffreys et al. 1994). However, a number of intermediate situations raise the question of the border between the mini- and microsatellite classes. For instance, mutation rates at some minisatellites, including MS1 (D1S7), are sensitive to mismatch repair deficiencies (Hoff-Olsen et al. 1995), reminiscent of microsatellite behavior. At the other end of the spectrum, some human (CA)n repeats have extremely long alleles, with internal heterogeneity (Wilkie and Higgs 1992). Also, the origin of both classes of tandem repeats is still poorly understood. Microsatellite arrays may arise by replication errors or as a result of nonhomologous end-joining repair following DNA DSB events (Liang et al. 1998), which can create de novo (CA)n > 20 stretches. Unequal crossing-over or replication slippage between fortuitous short direct repeats has been invoked to provide the initial duplication event of some minisatellites in human and yeast (Haber and Louis 1998).
To better understand the nature and origin of surprisingly long human (CA)s, we developed a technology that efficiently identifies clones containing long CA-rich sequences by a simple hybridization procedure. This approach was applied to human cosmid and PAC genomic libraries. The analysis of a subset of sequences strongly supports the conclusion that most long human CA-rich sequences are imperfect. The genome distribution of positive clones is highly skewed towards telomeres and minisatellites can usually be found in the vicinity. This observation is further strengthened by the parallel investigation of the currently available chromosome 7 human sequence data. Twenty-two new minisatellites have here been successfully identified, establishing the validity of this approach of minisatellite cloning by vicinity with long (CA)s.
RESULTS
Identification of Probes Appropriate for the Identification of Long CA Arrays
CHROMOSOME 1 COSMID LIBRARY SCREENING AND SEQUENCE ANALYSIS OF SOME POSITIVE CLONES
Five different (CA)n-derived DNA sequences were tested for their ability to detect genomic clones containing long (>100 bp) perfect or imperfect (CA)s, rather than short (CA)n < 40 microsatellites: (1) a long perfect synthetic (CA)n array; (2 and 3) two long natural imperfect (CA)s, R62 and R85, characterized previously in a search for rat minisatellite and microsatellite sequences (Amarger et al. 1998
CHROMOSOMAL ASSIGNMENT OF POSITIVE COSMID CLONES
A total of 22 cosmids from the chromosome 1 library, detected by one or more of the imperfect (CA)n probes (R62, R85, 14C32, and 16C46), were then assigned to a chromosomal band by FISH and/or linkage (Fig. 1, circles; Table 1). Thirteen (59%) are subtelomeric, seven (32%) are interstitial, and two (9%) are juxtacentromeric. Unexpectedly, nine clones (five of which are located in a terminal band) do not originate from chromosome 1. Seven of the 13 chromosome 1 cosmids are in telomeric bands: all but one are localized in the 1p36.3 region, and the last one gives a FISH signal at both ends of the chromosome. Among the nontelomeric cosmid clones, two are localized on 1p34.35, one in 1p12, one in 1q42, and two others in a juxtacentromeric region.
Application of the Methodology to the Screening of a Total Human Genome PAC Library
Probe R62 detects clones with a very good signal-to-background ratio and will not detect a (CA)22 array [but would still detect a longer (CA)40 array, independently characterized from a pig cosmid library; data not shown]. R62 was thus selected to hybridize a high-density filter carrying ~20,000 independent PAC clones, corresponding to one human genome equivalent. The 42 clones giving the strongest signal were successfully assigned to a chromosomal band by FISH and/or linkage analysis and are represented by squares on the 550-band karyotype presented in Figure 1. Twenty-five PACs are assigned to a terminal band (60%), 4 are juxtacentromeric (9.5%), and 13 are interstitial (31%).
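The proportions quoted for the 42 localized PAC clones can be recomputed from the raw counts. The following is a minimal sketch; the function name `band_shares` and the dictionary keys are illustrative labels, not terms from the study.

```python
def band_shares(counts):
    """Return each category's share of the total, as a percentage rounded to 0.1."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Counts of R62-positive PAC clones by karyotype location, as given in the text.
pac_counts = {"terminal": 25, "juxtacentromeric": 4, "interstitial": 13}

for location, share in band_shares(pac_counts).items():
    print(f"{location}: {share}%")
```

The same helper can be applied to the 22 cosmid assignments (13 subtelomeric, 7 interstitial, 2 juxtacentromeric).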
Identification of Minisatellites Within Positive PAC and Cosmid Clones
The cosmid and PAC clones identified by R62 screening were searched for minisatellites as described in Amarger et al. (1998). The DNA from each clone was digested separately with three combinations of two restriction enzymes: AluI and HaeIII; AluI and HinfI; HaeIII and HinfI. Seventy-three cosmid fragments with a size above 1.3 kb after the double digestion were excised from agarose and tested for the presence of a minisatellite by hybridization on a Southern blot. Three minisatellites (CEB117, CEB118, CEB119) were isolated (Table 2).
Using the same approach, 316 PAC fragments were tested. Eighteen minisatellites derived from 15 independent PAC clones were identified. Their main characteristics (allele size, polymorphism) are presented in Table 2. Twenty out of the 22 new minisatellites identified are derived from the telomeric PAC or cosmid clones. One (or more) minisatellite was identified in half of the telomeric PAC clones. PAC 1 contains (at least) four minisatellites: UPS17, UPS21, UPS22 (Table 2), in addition to CEB 70, which was characterized previously and independently (Spurr et al. 1994). PAC 50 contains two minisatellites: UPS6 and UPS7.
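The fragment-selection step can be illustrated in silico. The sketch below assumes the standard recognition sites of the three enzymes (AluI AG^CT, HaeIII GG^CC, HinfI G^ANTC) and a linear template. The function names and the toy sequence are hypothetical, and the rationale shown (a tandem array lacking the sites survives as a long fragment) is the usual logic of such digests rather than a statement taken from this report.

```python
import re

# Recognition pattern and cut offset (distance from site start to the cut).
ENZYMES = {
    "AluI": ("AGCT", 2),         # AG^CT
    "HaeIII": ("GGCC", 2),       # GG^CC
    "HinfI": ("GA[ACGT]TC", 1),  # G^ANTC
}


def cut_positions(seq, enzyme):
    """Cut coordinates for one enzyme; the lookahead allows overlapping sites."""
    pattern, offset = ENZYMES[enzyme]
    return [m.start() + offset for m in re.finditer(f"(?={pattern})", seq)]


def double_digest(seq, enzyme_a, enzyme_b):
    """Fragments of a linear sequence after digestion with both enzymes."""
    cuts = sorted(set(cut_positions(seq, enzyme_a) + cut_positions(seq, enzyme_b)))
    bounds = [0] + cuts + [len(seq)]
    return [seq[s:e] for s, e in zip(bounds, bounds[1:]) if e > s]


def long_fragments(seq, enzyme_a, enzyme_b, min_len=1300):
    """Keep only fragments above the 1.3-kb size threshold."""
    return [f for f in double_digest(seq, enzyme_a, enzyme_b) if len(f) > min_len]


# Toy template: a 1400-bp tandem array without AluI/HaeIII sites,
# flanked by sequence that is cut every 8 bp.
array = "GACACACTCACAGC" * 100
template = "TTAGCTTT" * 50 + array + "AAGGCCAA" * 50
candidates = long_fragments(template, "AluI", "HaeIII")
print([len(f) for f in candidates])
```

In this toy case only the fragment carrying the tandem array passes the size filter; real clones would of course also yield long fragments that contain no repeat, which is why each candidate is then tested by hybridization.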
Parallel Investigation of Sequence Databases
The current status of publicly available human sequence data is reflected at http://www.ncbi.nlm.nih.gov/genome/seq/. Significant progress has already been achieved for a number of chromosomes, such as chromosomes 7, 17, 21, and 22, so that the screening of genome libraries can be compared to some extent to the direct screening of genome sequence. We selected chromosome 7 for further investigations, because the available sequence is relatively well distributed along the whole chromosome (in contrast with chromosome 17) and because the distribution of minisatellites on chromosome 7 has been well documented in earlier reports (Amarger et al. 1998). At the time of this investigation, ~54 Mb of sequence data was available, corresponding to about 30% of a total estimate of 170 Mb for chromosome 7. Figure 2 presents some of the results obtained by searching and locating minisatellites and long CA sequences along the chromosome. As a reminder, Figure 2A (left) is compiled from this report and Amarger et al. (1998) and locates minisatellite loci obtained by screening cosmid or PAC libraries. Figure 2B presents the density of tandem repeats with a repeat unit of 20 nucleotides or more, spanning at least 1000 nucleotides, as identified in the sequence data using the tandem repeat finder described in Benson (1999). Figure 2C presents the relative density of long (spanning at least 300 nucleotides) CA-rich sequences detected by a FASTA search against the chromosome 7 sequence data using an 800-bp-long (CA)400 as the query. None of the matches in this range is a perfect CA repeat. Four matches span >800 bp, two of which originate from 7q36 (no higher-order organization of the degenerate CA-rich array could be found; data not shown).
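The idea of locating long, imperfect CA-rich stretches in sequence data can be sketched without FASTA. The code below is a simplified stand-in, not the search used here: it finds perfect (CA)n cores of at least two repeats, merges cores separated by short interruptions, and keeps merged stretches spanning at least 300 nucleotides. The function name and the `max_gap` tolerance of 10 nucleotides are assumptions chosen for illustration.

```python
import re


def ca_rich_stretches(seq, min_span=300, max_gap=10):
    """Return (start, end, is_perfect) for long CA-rich stretches in `seq`.

    Perfect (CA)n cores with n >= 2 are merged when separated by at most
    `max_gap` nucleotides; a stretch built from more than one core is
    flagged as imperfect.
    """
    merged = []
    for match in re.finditer(r"(?:CA){2,}", seq.upper()):
        start, end = match.span()
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1][1] = end
            merged[-1][2] = False  # an interruption was bridged
        else:
            merged.append([start, end, True])
    return [(s, e, perfect) for s, e, perfect in merged if e - s >= min_span]


# Toy sequence: a 400-bp imperfect array (16C46-like unit) and a short
# perfect (CA)22 microsatellite, separated by unrelated sequence.
toy = "G" * 50 + "CACACACATGCACATA" * 25 + "T" * 50 + "CA" * 22 + "G" * 50
print(ca_rich_stretches(toy))
```

With these settings the short perfect (CA)22 array is discarded by the length filter, while the long interrupted array is reported as an imperfect stretch, which mirrors the distinction drawn in the text between ordinary microsatellites and long CA-rich sequences.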
DISCUSSION
Minisatellites and microsatellites are two important classes of tandem repeats used as genome markers. Minisatellites have been shown to be useful tools to detect chromosomal rearrangements in a number of pathological situations, including mental retardation (Flint et al. 1995; Giraudeau et al. 1997). Some of them are suspected to be involved in gene regulation (Bennett et al. 1995). Very unstable human minisatellites have been characterized (Jeffreys et al. 1988; Vergnaud et al. 1991). The mutation rate is apparently increased by environmental agents such as radiation (Dubrova et al. 1997). To characterize new human minisatellites, as well as to investigate the boundary between mini- and microsatellites, we have devised a strategy enabling the rapid cloning of long CA-rich sequences from total genomic libraries by hybridization screening. Five CA-rich probes were evaluated for their ability to discriminate long CA-rich sequences from ordinary microsatellites. The long perfect (CA)n array does not discriminate against fragments containing a (CA)20 array. The imperfect synthetic tandem repeat 16C46 [16-bp repeat unit containing an internal stretch of four (CA)s] is more appropriate but cross-hybridizes with a cosmid clone containing a (CA)22 array (data not shown) and fails to detect many CA-rich sequences (Table 1). Probe R85, with internal stretches of up to three (CA)s, detects clones with weak intensity. Probe R62 appears to be a very good compromise. It will not detect a perfect (CA)22, although it will detect a longer (CA)40 stretch. It detects many clones, with a good signal-to-background ratio, as compared to the synthetic 14C32 array. In contrast with 14C32, R62 also detects complex stretches of imperfect (CA)s devoid of higher-order organization (Table 1).
First, to reveal the chromosomal distribution pattern of long CA sequences, a chromosome 1-specific cosmid library was screened. Among the 22 cosmids studied, 13 originate from chromosome 1 (Fig. 1). The nine cosmids coming from other chromosomes, including five telomeric loci, presumably reflect some contamination of the library. The fact that the actual contamination of the library is less than the proportion of non-chromosome 1 cosmids in our selection (40%) suggests a telomeric bias in the contamination of the chromosome 1 library. Seven of the 13 chromosome 1 cosmids are within the terminal 1p36.3 band, where a minisatellite cluster was previously and independently described (Amarger et al. 1998). Two are localized on 1p34.35, where minisatellite MS1 (D1S7) is localized. Two others are localized in a juxtacentromeric region where the MUC1 gene, characterized by tandem repeat units, has been isolated. Another is localized in 1q42, which contains minisatellite MS32 (D1S8). Overall, the chromosome 1 distribution pattern of long CA-rich sequences is highly similar to the chromosome 1 minisatellite distribution pattern that is shown in Amarger et al. (1998) or that can be deduced from the NIH/CEPH Collaborative Mapping Group (1992) data, suggesting that the procedure could be applied on a larger scale for the identification of minisatellite-associated regions.
For this purpose, a whole-genome PAC library was screened using the R62 probe to enable the cloning of new minisatellite sequences in the vicinity of CA-rich sequences. A significant proportion (25/42) of the R62-positive PAC clones studied are assigned to a terminal band of the karyotype, and 13 of them contain new minisatellites. The juxtacentromeric location of four (9.5%) PAC clones may reflect a peculiar behavior of these regions or may indicate an ancestrally telomeric location. Thirteen clones (31%) are interstitial. One of them is assigned to band 2q13 (Fig. 1), which is the position of a well-characterized chromosome fusion site (IJdo et al. 1991). Another one is located on 1p31-p32, where one minisatellite was described previously (Amarger et al. 1998). This human chromosomal region is homologous with 6qter in pig (Amarger et al. 1998). As shown here in PAC 1 and PAC 50 (Table 2), the use of large-insert clones further emphasizes the clustering of minisatellites within telomeric regions (Vergnaud et al. 1993).
The predominantly telomeric distribution is highly reminiscent of the distribution of minisatellites across the human genome and clearly different from the even distribution of microsatellites. In good agreement with this, a similar result is obtained by the investigation of sequence databases using a FASTA search (Pearson and Lipman 1988) and (CA)400 as the query. Although long perfect (CA)s (>200 bp, e.g., accession no. Z81056 from Caenorhabditis elegans) are represented in the database, none of these originate from primates or even other mammals (data not shown). Figure 2C shows the density of hits spanning at least 300 bp across human chromosome 7. All such hits are imperfect, CA-rich stretches, with or without higher-order redundancy. The distribution is almost identical to the patterns shown in Figure 2, A and B, which reflect chromosome 7 minisatellite distribution. No obvious correlation is seen between the chromosome 7 gene density presented in Figure 2D and the minisatellite and long (CA)s distribution, with the exception perhaps of segment 7, band 7q22, which is a common peak (Fig. 2).
METHODS
High-Density Filters from Human Genome Libraries
High-density filters corresponding to a human chromosome 1 cosmid library were obtained from the Max-Planck Institute for Molecular Genetics. This library is represented by two high-density filters with 20,000 clones spotted on each membrane. Each clone is named by the number (c112) of the library and a specific number.
High-density filters corresponding to a human PAC library were obtained from the Roswell Park Cancer Institute (RPCI) center (http://bacpac.med.buffalo.edu/; the RPCI6 segment was used).
Perfect and Scrambled (CA)n Array Probes
A perfect long (CA)n probe and the two imperfect (CA)n arrays 14C32 (GACACACTCACAGC)n and 16C46 (CACACACATGCACATA)n were synthesized as described in Vergnaud (1989). 14C32 and 16C46 were designed so as to contain a maximum of three and four uninterrupted CA repeats, respectively. The natural scrambled (CA)n arrays R62 (EMBL accession no. AJ000072) and R85 (EMBL accession no. AJ000073) were selected among rat minisatellite sequences (Pravenec et al. 1996; Amarger et al. 1998). R62 and R85 repeat units are (CACACT)1-2CACAGYRR (14 or 20 bp) and (CAGGACA)1-2GTGARCACA (16 or 23 bp), respectively.
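The design constraint on the two synthetic arrays can be checked directly from the repeat units given above. The following is a minimal sketch, assuming that an uninterrupted CA repeat may be read in either dinucleotide phase (CA or AC), which is an interpretation rather than something stated in the text; the function names are illustrative.

```python
import re


def longest_run(array, dinucleotide):
    """Length, in repeats, of the longest perfect run of `dinucleotide`."""
    runs = re.findall(f"(?:{dinucleotide})+", array)
    return max((len(run) // 2 for run in runs), default=0)


def longest_ca_run(unit, copies=3):
    """Longest perfect CA (or AC) run in a tandem array built from `unit`.

    Several copies are concatenated so that runs spanning the junction
    between two repeat units are also counted.
    """
    array = unit.upper() * copies
    return max(longest_run(array, "CA"), longest_run(array, "AC"))


for name, unit in [("14C32", "GACACACTCACAGC"), ("16C46", "CACACACATGCACATA")]:
    print(name, len(unit), longest_ca_run(unit))
```

Read strictly in the CA phase, the 14C32 unit contains runs of two repeats only; the value of three is reached in the AC phase (ACACAC), which is why both phases are examined.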
Probe Labeling and Hybridization
The DNA fragments were recovered from agarose by centrifugation through glass wool as described by Heery et al. (1990). The probes were labeled with [α-32P]dCTP (ICN) by the random priming procedure (Feinberg and Vogelstein 1984). Hybridization was done as described in Vergnaud (1989) in a hybridization oven. After hybridization, the filters were washed in 1× SSC/0.1% SDS or 0.1× SSC/0.1% SDS. Hybridization and washing were done at 60°C (screening of library filters) or 65°C (hybridization of Southern blots).
Subcloning and Sequencing
Restriction digest fragments were recovered from agarose using the Jetsorb kit (Bioprobe System). The fragments were ligated into SmaI-digested pUC18 vector (Pharmacia) before transfer to Escherichia coli XL1 strain (Stratagene) by electroporation.
Recombinant plasmids were sequenced using 33P-labeled direct and reverse M13 primers with the Delta Taq sequencing kit (U.S. Biochemical) in a Perkin Elmer GeneAmp PCR System 9600 thermocycler.
Identification of Minisatellites Within PAC and Cosmid Clones
DNA from each PAC or cosmid clone was digested by AluI and HaeIII, AluI and HinfI, or HaeIII and HinfI. Fragments >1.3 kb in size were recovered from agarose and hybridized to Southern blots carrying two reference individuals digested separately by AluI, HaeIII, HinfI, and PvuII, as described in Amarger et al. (1998).
Chromosomal Assignment by Linkage Analysis
Linkage analysis was performed on the CEPH (Centre d'Etudes du Polymorphisme Humain) panel of human families. Genotypes were managed using GENBASE, developed by Jean Marc Sebaoun (Spurr et al. 1994). Linkage output files were converted to CRIMAP file format using the LINK2CRI utility software written by John Attwood. CRIMAP version 2.4 (Green et al. 1990) was used for the analyses.
Chromosomal Assignment by FISH
Cosmid or PAC DNAs were labeled with biotin by nick translation. After overnight hybridization on target chromosome spreads, slides were washed in 2× SSC at 37°C. Probes were detected with FITC-avidin and analyzed with an epifluorescence microscope (DMRB-Leica) equipped with a CCD camera driven by the Powergene system from Perceptive Scientific International (PSI).
Sequence Database Searches
Chromosome 7 sequence data (54 Mb available at the time of this investigation) were retrieved from the National Center for Biotechnology Information (NCBI) site (http://www.ncbi.nlm.nih.gov/genome/seq/). Tandem repeats were identified using the online software accessible at http://c3.biomath.mssm.edu/trf.html (Benson 1999). Large CA-rich sequences were detected using FASTA (Pearson and Lipman 1988) and a (CA)400 synthetic sequence as the query. The FASTA analysis was done using the computing facilities provided by Infobiogen (information at http://www.infobiogen.fr/). Each sequence contig was assigned to a bin along the chromosome. Eleven bins of equal size were defined. The horizontal bars presented in Figure 2 represent the density of the object category per megabase of sequence in the corresponding bin. The current distribution of human ESTs on chromosome 7 was retrieved from the NCBI site (http://www.ncbi.nlm.nih.gov/genemap/).
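The binning step described above can be sketched as follows. This is an illustration under stated assumptions, not the original procedure: each contig is represented by a single map position in megabases, its sequenced length in base pairs, and a count of objects (for example, CA-rich hits); the chromosome length of 170 Mb is the estimate quoted in the text; and the function name and tuple layout are hypothetical.

```python
def bin_densities(contigs, chrom_length_mb=170.0, n_bins=11):
    """Objects per megabase of available sequence in each equal-size bin.

    `contigs` is an iterable of (position_mb, length_bp, n_objects).
    Bins without any sequence return None rather than a density of zero.
    """
    bin_size = chrom_length_mb / n_bins
    sequence_mb = [0.0] * n_bins
    objects = [0] * n_bins
    for position_mb, length_bp, n_objects in contigs:
        # Clamp so that a contig mapped at the very end falls in the last bin.
        index = min(int(position_mb / bin_size), n_bins - 1)
        sequence_mb[index] += length_bp / 1e6
        objects[index] += n_objects
    return [
        count / size if size > 0 else None
        for count, size in zip(objects, sequence_mb)
    ]


# Hypothetical contigs: two near the short-arm end, one near the long-arm end.
example = [(1.0, 500_000, 3), (5.0, 1_500_000, 1), (169.9, 1_000_000, 5)]
print(bin_densities(example))
```

Normalizing by the sequence actually available in each bin, rather than by bin size, matters when coverage is uneven, as it was for the roughly 30% of chromosome 7 sequenced at the time.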
ACKNOWLEDGMENTS
We thank the Resource Center/Primary Database of the German Human Genome Project, Berlin, Germany for providing the human cosmid clones. We thank Olivier Raineteau and France Denoeud for their participation at different stages of this project as summer students. This work was supported by the EUROGEM project (EC contract GENE-CT93-0101), the PiGMaP project (VA; EC contract BIO2-CT94-3044), an Action Concertée Coordonée-Sciences de la Vie grant from the French Ministry of Research, and by a grant from La Ligue contre le Cancer (Département de Vendée, France) to F.G.
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
FOOTNOTES
4 Present address: Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, 753 24 Uppsala, Sweden.
5 Corresponding author.
E-MAIL Vergnaud{at}igmors.u-psud.fr; FAX 33 1 69 15 66 78.
Received March 1, 1999; accepted in revised form May 25, 1999.