Vol. 12, Issue 9, 1345-1349, September 2002
LETTER
ABSTRACT
Complex cereal genomes are largely composed of small gene-rich regions intermixed with 5 kb to 200 kb blocks of repetitive DNA. The repetitive DNA blocks are usually 5-methylated at 5'-CG-3' and 5'-CNG-3' cytosines in most or all adult tissues, while the genes are generally unmethylated at these sites. We have developed methylation-spanning linker library (MSLL) technology as a tool to span large methylated DNA blocks and thereby link unmethylated genic regions. MSLL clones contain insertions of large fragments that are size fractionated over gels after complete digestion of total genomic DNA with restriction enzymes that are sensitive to the 5-methylation of cytosines in 5'-CG-3' and 5'-CNG-3' sequences. Our data indicate that the end sequences of maize MSLL clones are greatly depleted in repetitive DNAs and enriched in genes relative to total genomic DNA. Combined with other gene-enrichment approaches, MSLL technology can efficiently generate fully-linked contiguous sequences in complex genomes that are resistant to shotgun sequencing.
INTRODUCTION
Large grass genomes, including those of barley, maize, and wheat, are composed mostly of 5-20 kb blocks of genes intermixed with repetitive DNA blocks that range in size from a few kb up to more than 100 kb (SanMiguel et al. 1996; Panstruga et al. 1998; Tikhonov et al. 1999; Dubcovsky et al. 2001; Wicker et al. 2001). In a few cases, including within tandem gene families or the rare unrelated gene cluster (Llaca and Messing 1998; Fu et al. 2001), gene-rich regions may extend for 50 kb or more. In most cases, however, gene-rich chromosome segments contain only 1-4 genes in a region of 20 kb or less. The repetitive DNAs found in the intermixed repeat blocks are usually nested insertions of a class of mobile DNAs called long terminal repeat- (LTR-) retrotransposons (SanMiguel et al. 1996; Llaca and Messing 1998; Panstruga et al. 1998; Kumar and Bennetzen 1999; Tikhonov et al. 1999; Dubcovsky et al. 2001; Fu et al. 2001; Wicker et al. 2001). These nested LTR-retrotransposons can make up well over 50% of total genomic DNA, with most of this DNA coming from only a handful of different element families that have copy numbers of several thousand per nuclear genome (SanMiguel and Bennetzen 1998; Kumar and Bennetzen 1999; Vicient et al. 1999; Meyers et al. 2001). The LTR-retrotransposons are relatively large (usually greater than 5 kb) and can have numerous copies that are > 99% identical within the same genome (W. Ramakrishna, P. SanMiguel and J. Bennetzen, unpubl. obs.). Hence shotgun sequencing of complex grass genomes would not yield information that can be converted into long contiguous sequences (Bennetzen et al. 2001).
Nuclear genomes in higher plants contain
extensive 5-methylation of cytosine residues, much of it associated with 5'-CG-3' and 5'-CNG-3' sequences (Gruenbaum et al. 1981a). In many animals and plants, cytosine methylation is associated with heterochromatic regions, where it apparently contributes to the transcriptional inactivity of any sequences within the condensed chromatin. In maize, studies indicate that most of this cytosine methylation is associated with repetitive DNAs, including the LTR-retrotransposons. In adult tissues, most LTR-retrotransposons appear to be fully methylated at all 5'-CG-3' and 5'-CNG-3' sites, while genes appear to be unmethylated at these same sites (Gruenbaum et al. 1981b; Antequera and Bird 1988; Bennetzen et al. 1994). This lack of genic methylation differs somewhat from that observed in mammals (Bird 1986; Frank et al. 1991), for instance, because the absence of methylation is found even in genes that are not expressed in the tissues from which the characterized DNA was taken (Bennetzen et al. 1994). There are likely to be exceptions to this general rule (Jacobsen and Meyerowitz 1997), but overall it appears that most genic regions can be separated from most LTR-retrotransposon blocks by this difference in DNA methylation. Perhaps most interesting of all, the size of the methylated DNA blocks has an upper limit of 200 kb or less (Springer 1992; Bennetzen et al. 1994), suggesting that an 'open' region of chromatin is needed at this spacing to allow some essential nuclear function, such as the initiation of DNA replication or chromosome folding.
Martienssen and coworkers have used the difference in DNA methylation between repetitive and genic DNA as a tool to efficiently sequence gene-rich regions of the maize genome by a shotgun approach (Rabinowicz et al. 1999). In their methyl filtration technology, the insertion of sheared fragments of total genomic DNA into a plasmid vector is followed by transformation of this library into an Escherichia coli strain that will not tolerate 5-methylation of cytosines in the cloned DNA. In maize, this approach yielded a greater than two-fold enrichment for genic sequences and at least a six-fold depletion of known LTR-retrotransposon sequences, relative to the same library inserted into a methylation-tolerant E. coli host (Rabinowicz et al. 1999; Meyers et al. 2001). Application of methyl filtration technology to the full maize genome should yield contiguous sequences (contigs) for the genic regions, varying in size from a few kb up to a few dozen kb. However, this filtration technique does not localize the genic contigs relative to each other or to the maize genetic map. We have developed an approach that we call methylation-spanning linker library (MSLL) technology that overcomes this deficiency and also isolates the boundaries between methylated and unmethylated regions.
RESULTS AND DISCUSSION
Figure 1 depicts the structure of a contiguous 225 kb of maize nuclear DNA that contains the adh1-F locus, as derived from the sequence of Tikhonov and coworkers (Tikhonov et al. 1999). Subsequent studies have shown that this is the standard structure for most or all gene-containing portions of the genome of maize and other large-genome cereals such as wheat and barley (Edwards et al. 1992; Llaca and Messing 1998; Dubcovsky et al. 2001; Fu et al. 2001; Wicker et al. 2001). These genic regions contain small islands of genes separated from each other by relatively large blocks of repetitive DNA. Because these repetitive DNA blocks are cytosine 5-methylated at all 5'-CG-3' and 5'-CNG-3' sites, they will not be digested by restriction enzymes that are inhibited by this type of DNA methylation. The lines in Figure 1 show the fragments that would be generated by complete digestion with SalI or HpaII, two enzymes that are cytosine-methylation sensitive.
Because HpaII has a 4 bp recognition/cleavage sequence, it should cut maize DNA, which has an approximate 50% GC content (Hake and Walbot 1980), an average of once every 4^4 (256) bp. However, from the analysis of bulk maize sequence data (Meyers et al. 2001), we found that the maize genome is about 53% AT, and we also observed that 5'-CG-3' and 5'-CNG-3' are somewhat underrepresented in the maize genome (a respective 4.6% and 5.3%, compared to the predicted 5.6%). In accord with the depletion of these sequences, the actual frequency of HpaII sites that we detected in the bulk sequence data generated by Meyers et al. (2001) was once in every 305 bp. However, pulsed field gel analysis of HpaII-digested maize DNA indicates that the majority of the genome yields fragments that are larger than 50 kb (Springer 1992; Bennetzen et al. 1994). Hence isolation of HpaII fragments larger than a few kb should yield segments of DNA that contain internal methylated and repetitive sequences, while the ends are anchored in the unmethylated sequences associated with genic regions. Sequencing the ends of such fragments will mark adjacent genic regions, and provide clones that contain the sequences between those two genic regions. SalI can be used in the same manner, although its 6 bp specificity indicates that it will digest (on average) several kb away from the site at which a methylated DNA block begins.
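The expected cutting frequencies discussed above follow from a simple base-composition model. The following is a minimal sketch of that arithmetic, assuming independent bases with equal G and C (and equal A and T) frequencies; the function name `expected_site_spacing` is hypothetical and is not part of the original analysis.

```python
def expected_site_spacing(site, gc_fraction):
    """Expected bp between occurrences of `site` under an independent-base model.

    Assumption: each base is drawn independently, with G and C each at
    gc_fraction / 2 and A and T each at (1 - gc_fraction) / 2.
    """
    p_gc = gc_fraction / 2.0           # probability of G (or of C)
    p_at = (1.0 - gc_fraction) / 2.0   # probability of A (or of T)
    prob = 1.0
    for base in site.upper():
        prob *= p_gc if base in "GC" else p_at
    return 1.0 / prob


hpaii_50 = expected_site_spacing("CCGG", 0.50)    # 4 bp site, 50% GC: 4^4 = 256 bp
hpaii_47 = expected_site_spacing("CCGG", 0.47)    # genome that is 53% AT
sali_50 = expected_site_spacing("GTCGAC", 0.50)   # 6 bp site, 50% GC: 4^6 = 4096 bp

print(hpaii_50, round(hpaii_47), sali_50)
```

This model considers base composition only. It does not account for the underrepresentation of 5'-CG-3' and 5'-CNG-3' noted in the text, so the observed value of one HpaII site per 305 bp has to come from the sequence data themselves.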
To test this theory, we constructed three small BAC libraries. One library contained 9-14 kb HpaII fragments in the vector pBeloBAC11 (Kim et al. 1996). The fragments were generated by complete digestion of B73 maize genomic DNA with HpaII, followed by pulsed field gel electrophoresis of the digested DNA. Fragments of 9-14 kb were excised from the gel, half-filled (Zabarovsky and Allikmets 1986) with dCTP, and ligated into the vector digested with BamHI and partially filled with dATP, dGTP and dTTP. The other two libraries were similarly constructed except that they used fragments from a complete SalI digestion, half-filled and ligated into BamHI-digested and half-filled pBeloBAC11. The inserts in the SalI libraries were in the size ranges of 10-15 kb and 15-25 kb. The half-fill ligation approach and the choice of fragment size ranges that differed by less than two-fold were both designed to minimize the possibility of chimeric clones.
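The logic of the half-fill ligation can be checked with a short simulation. The sketch below is an illustration only: it assumes the standard 5' overhangs of HpaII (CG), SalI (TCGA), and BamHI (GATC) and uses the nucleotide combinations given in the text and Methods. The helper names `partial_fill` and `can_ligate` are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def partial_fill(overhang, dntps):
    """Return the 5' overhang (written 5'->3') left after a partial fill-in.

    The polymerase extends the recessed 3' end, so it copies the overhang
    starting from its 3'-most base and stops at the first template base
    whose complementary dNTP is not supplied.
    """
    remaining = overhang
    while remaining and COMPLEMENT[remaining[-1]] in dntps:
        remaining = remaining[:-1]
    return remaining


def can_ligate(end_a, end_b):
    """Two 5' overhangs anneal only if one is the reverse complement of the other."""
    return bool(end_a) and end_a == reverse_complement(end_b)


hpaii_insert = partial_fill("CG", {"C"})               # HpaII end + dCTP
vector_for_hpaii = partial_fill("GATC", {"A", "G", "T"})  # BamHI end + dATP, dGTP, dTTP
sali_insert = partial_fill("TCGA", {"T", "C"})         # SalI end + dTTP, dCTP
vector_for_sali = partial_fill("GATC", {"A", "G"})     # BamHI end + dATP, dGTP

print(hpaii_insert, vector_for_hpaii, can_ligate(hpaii_insert, vector_for_hpaii))
print(sali_insert, vector_for_sali, can_ligate(sali_insert, vector_for_sali))
print(can_ligate(sali_insert, sali_insert), can_ligate(hpaii_insert, hpaii_insert))
```

In this model each partially filled insert end pairs with the partially filled vector end but not with another insert end, which is the property that discourages insert-to-insert ligation and hence chimeric clones.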
Both ends of 192 BAC clones from the HpaII library and 96 clones from each of the two SalI libraries were subjected to DNA sequence analysis. The obtained sequences ranged in length from 100 to 759 bp and totaled 410 kb, with a mean read of 545 bp and a median read of 589 bp. Overall, just over 77% of the sequencing reactions yielded 100 bp or more of high quality (PHRED 20) (Ewing and Green 1998) sequence. These sequences were scored for the presence of genes, LTR-retrotransposons, other repeats, and organellar DNA. The same analysis was performed on 167 sequences that we generated for the ends of EcoRI BACs from maize inbred B73 (http://www.chori.org/bacpac). Table 1 shows the summarized results for these four sets of data. The HpaII library yielded end sequences that exhibited homology to genes, retrotransposons, and chloroplast DNA for a respective 5%, 25% and 4% of the clones. The end sequences of the smaller SalI library yielded these same classes in a respective 14%, 23%, and 17% of clones, while the larger SalI library yielded a respective 18%, 18%, and 5% for these homologies. In sharp contrast, the EcoRI library yielded ends that were homologous to genes, retrotransposons, and chloroplast DNA in a respective 1%, 52%, and 1% of cases. The results for the EcoRI BAC ends are very similar to those seen in random sequencing of sheared fragments of the maize genome, which gave these same homologies in about 1%, 48%, and 1% of the sequenced clones (Rabinowicz et al. 1999; Meyers et al. 2001).
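To make the comparison with the EcoRI control concrete, the percentages quoted above can be converted into approximate fold changes. This is a rough sketch only: the inputs are the rounded percentages from the text, so the ratios (especially those relative to the 1% control values) are coarse, and the function name `fold_vs_control` is hypothetical.

```python
# Percent of end sequences with homology to each class, as quoted in the text.
PERCENT = {
    "HpaII 9-14 kb": {"genes": 5, "retrotransposons": 25, "chloroplast": 4},
    "SalI 10-15 kb": {"genes": 14, "retrotransposons": 23, "chloroplast": 17},
    "SalI 15-25 kb": {"genes": 18, "retrotransposons": 18, "chloroplast": 5},
    "EcoRI control": {"genes": 1, "retrotransposons": 52, "chloroplast": 1},
}


def fold_vs_control(percent, control="EcoRI control"):
    """Ratio of each library's percentage to the control library's percentage."""
    base = percent[control]
    return {
        library: {cls: values[cls] / base[cls] for cls in values}
        for library, values in percent.items()
        if library != control
    }


folds = fold_vs_control(PERCENT)
for library, values in folds.items():
    print(library, {cls: round(ratio, 2) for cls, ratio in values.items()})
```

Ratios above 1 indicate enrichment relative to the EcoRI BAC ends and ratios below 1 indicate depletion.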
These results indicate that the MSLL clones are enriched for genes and deficient for LTR-retrotransposons at their ends. Even though the LTR-retrotransposons are underrepresented in the MSLL libraries, we were surprised that so many LTR-retrotransposon homologies were still detected. However, closer inspection of the sequence data indicated that just over 10% (14/136) of the sequence homologies to LTR-retrotransposons began within the first 10 bp of sequence for the MSLL clones. For the EcoRI library, about 55% (42/77) of the homologies to LTR-retrotransposons were found to begin in the first 10 bp of sequence. Hence many of the MSLL clones have ends that are outside, but very near, LTR-retrotransposon blocks. Because the average HpaII site will be less than 300 bp from the first methylated region, we expect that this close juxtaposition of the cleaved site and an LTR-retrotransposon block should be particularly frequent in HpaII-based MSLL clones.
Of the 292 HpaII sequences that gave 100 bp or more of PHRED 20 sequence, 106 sequences were found to have one or more additional HpaII sites within the sequences generated. Many of these HpaII sites (71%) are within annotated LTR-retrotransposons. Despite their extensive methylation at 5'-CG-3' and 5'-CNG-3' sites (Bennetzen et al. 1994), our sequence inspections have shown that the most abundant LTR-retrotransposons that make up over half of the maize genome (SanMiguel and Bennetzen 1998; Meyers et al. 2001) are actually enriched in HpaII sites relative to genes (one per 239 bp versus one per 492 bp for genes), largely because of the higher average GC content of the LTR-retrotransposons.
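Site densities such as "one per 239 bp" can be obtained by counting recognition sites in a sequence and dividing the sequence length by that count. The sketch below shows one way to do this for the HpaII site CCGG; it is not the authors' script, the function names are hypothetical, and the example sequence is made up for illustration.

```python
def count_sites(seq, site="CCGG"):
    """Count occurrences of a recognition site, allowing overlapping matches."""
    seq = seq.upper()
    site = site.upper()
    count = 0
    start = seq.find(site)
    while start != -1:
        count += 1
        start = seq.find(site, start + 1)
    return count


def bp_per_site(seq, site="CCGG"):
    """Average number of bp per site; infinite if the site never occurs."""
    n = count_sites(seq, site)
    return len(seq) / n if n else float("inf")


toy = "AACCGGTTAGCTCCGGATAT"  # made-up 20 bp sequence with two CCGG sites
print(count_sites(toy), bp_per_site(toy))
```

Applied separately to concatenated retrotransposon sequences and to gene sequences, the same calculation would give the two densities that the text compares.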
Various crude predictions for the maize genome suggest that around 5%-15% of the total nuclear DNA is composed of genes, while about 50%-80% is composed of LTR-retrotransposons (SanMiguel and Bennetzen 1998; Meyers et al. 2001). Part of the rationale for using an internal control with EcoRI clone ends was to balance our criteria for gene and LTR-retrotransposon identification. We expect that the true frequencies of both genic and LTR-retrotransposon sequences in our data sets are higher than the conservative numbers that we apply, but we used the same criteria for both MSLL and EcoRI clones. Because the predicted distance between the cleaved HpaII site and the LTR-retrotransposon block should average less than 300 bp, we expect that HpaII BAC end sites will rarely be within the peptide-encoding portion of a gene, and thus rarely identified as genic.
The frequency of chloroplast DNA homologies in our MSLL clones was about as expected, given that these libraries contained relatively small inserts. In the SalI libraries, for instance, the same chloroplast fragments were seen over and over again, and these were the rare chloroplast SalI fragments that were of the appropriate size to be found in these libraries. Libraries made with bigger SalI fragments, larger than 50 kb, should not have any organellar DNA fragments. These chloroplast DNA fragments were useful, however, in that they exhibited the expected chloroplast sequence homology at both ends, suggesting that the libraries did not have many chimeric clones.
The results indicate that MSLL technology can be used to link adjacent genic regions, while providing the intervening repetitive/methylated DNA block on a clone that is available for any subsequent analysis. To be comprehensive, several complete digestion libraries would need to be made across a large range of DNA sizes and with a variety of restriction enzymes. HpaII would be especially efficient for spanning small repetitive DNA blocks (those less than 15 kb or so), whereas SalI, PstI, SmaI, SstII or other methylation-sensitive enzymes with a 6 bp specificity could best characterize large methylated blocks. End sequences of these BACs would link and order all unmethylated regions. Combined with the sequences of these unmethylated regions obtained by methyl filtration shotgun sequencing (Rabinowicz et al. 1999), the MSLL data would permit the assembly of full chromosome contigs.
The sequences of the MSLL BAC ends identify the boundaries between unmethylated DNA (e.g., the cleaved HpaII site) and methylated DNA (e.g., the first HpaII site in the BAC end sequence, hence the first methylated HpaII site). It is not known how these "epigenetic boundaries" are composed in plants, how they are established, or what effects they may have on adjacent genes. The MSLL technology provides comprehensive access to these regions, making them available for more detailed study.
Although our experiments were focused on characterization of the maize genome, MSLL technology should be equally useful for application to any genome with a structure similar to that of maize. These similarly accessible genomes would certainly include barley, wheat, and numerous other vascular plants, but could also include many animals, protists, or fungi with complex genomes and DNA methylation that is enriched in repetitive DNAs.
METHODS
Preparation of High-Molecular-Weight (HMW) DNA From Maize
Maize inbred B73 seeds were kindly provided by Dr. Chris Staiger (Purdue University). HMW DNA was extracted from the leaves of 10-day-old seedlings, as previously described (Liu and Whittier 1994). The final nuclear pellet was embedded in an equal volume of 1.5% low-melting-point agarose. Plugs containing 4-5 ug of DNA were treated with lysis buffer (1% sodium lauryl sarcosine, 0.1 mg/ml proteinase K, 0.1% ascorbic acid, 0.5 M EDTA pH 9.1) in 50 ml volume for 48 h at 50°C, with one change of lysis buffer after 24 h.
Digestion, Size Selection of HMW DNA, and BAC Library Construction
Before digestion, the agarose plugs were washed at 50°C in several volumes of washing buffer containing 1 mM phenylmethylsulfonyl fluoride (PMSF). Six to eight plugs were equilibrated for 30 min to 1 h in 400 ul SalI or HpaII buffer. Digestion was performed in 200 ul volumes with 60 units of restriction enzyme at 37°C for 12-14 h to achieve complete digestion. Digested DNA plugs were size fractionated on the CHEF-DRII system (Bio-Rad) and visualized under DARK READER (Clare Chemical Research). Fragments of different sizes were cut from the gel and recovered with GElase (Epicentre Technologies). SalI fragments were half filled with dTTP and dCTP, and ligated into the pBeloBAC11 vector digested with BamHI and partially filled with dATP and dGTP. HpaII fragments were partially filled with dCTP and cloned into the same vector digested with BamHI and partially filled with dATP, dGTP, and dTTP. Ligations were transformed into ElectroMAX DH10B-competent cells (Life Technologies).
BAC End Sequencing and Analysis
The BAC DNA templates were prepared following a modification of the standard procedure (Kelley et al. 1999). In brief, a 96-well block containing 1.3 ml per well of LB medium with chloramphenicol was incubated at 37°C for 16 h. Then 100 ul of the overnight culture was transferred to four 96-well blocks containing 1.3 ml per well of LB medium with chloramphenicol, and grown at 37°C for 14 h. The BAC DNA was isolated from each block using the Qiagen R.E.A.L. prep 96 system following the manufacturer's instructions. The final DNA pellet containing the pooled DNA from each of the four identical cultures was dissolved in 40 ul of water. Sequencing reactions were set up with 10 ul template, 6 ul 5 X ABI (Perkin Elmer), 4 ul Big Dye (Perkin Elmer), 1 ul DMSO and 0.1 ul forward or reverse universal primer in a final reaction volume of 21.1 ul. The sequence traces were transferred to a Sun E450 server and bases were called using phred (Ewing and Green 1998). Vector sequences were masked by CROSS_MATCH. BLAST (Altschul et al. 1997) was employed to compare all the trimmed sequences with the public sequence database as both nucleotides and predicted amino acid translations. Retrotransposons were detected additionally by CROSS_MATCH and TBLASTX against a set of 103 known retroelements.
WEB SITE REFERENCES
http://www.chori.org/bacpac; BACPAC Resources home page for Pieter de Jong's lab at the Children's Hospital Oakland Research Institute.
ACKNOWLEDGMENTS
We thank the US National Science Foundation for support of this project (grants 9975618 and 9975793).
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
FOOTNOTES
3 Corresponding author.
E-MAIL maize{at}bilbo.bio.purdue.edu; FAX (765) 496-1496.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.185902.
Received February 14, 2002; accepted in revised form July 17, 2002.