Published online before print
July 12, 2001, 10.1101/gr.174001
Vol. 11, Issue 8, 1353-1364, August 2001
LETTER
ABSTRACT
Transposable elements (TEs) have been implicated in the generation of genetic rearrangements, but their potential to mediate changes in the organization and architecture of host genomes could be even greater than previously thought. Here, we describe the naturally occurring structural and nucleotide variation around two TE insertions in the genome of Drosophila buzzatii. The studied regions correspond to the breakpoints of a widespread chromosomal inversion generated by ectopic recombination between oppositely oriented copies of a TE named Galileo. A detailed molecular analysis by Southern hybridization, PCR amplification, and DNA sequencing of 7.1 kb surrounding the inversion breakpoints in 39 D. buzzatii lines revealed an unprecedented degree of restructuring, consisting of 22 insertions of ten previously undescribed TEs, 13 deletions, 1 duplication, and 1 small inversion. All of these alterations occurred exclusively in inverted chromosomes and appear to have accumulated after the insertion of the Galileo elements, within or close to them. The nucleotide variation at the studied regions is six times lower in inverted than in noninverted chromosomes, suggesting that most of the observed changes originated in only 84,000 years. Galileo elements thus seem to have promoted the transformation of these otherwise normal chromosomal regions into genetically unstable hotspots and highly efficient traps for transposon insertions. The particular features of two new Galileo copies found indicate that this TE belongs to the Foldback family. Together, our results strengthen the importance of TEs, and especially DNA transposons, as inducers of genome plasticity in evolution.
[The sequence data described in this paper have been submitted to the GenBank data library under accession nos. AF368842-AF368859 and AF368861-AF368900. In addition, sequences submitted under accession nos. AF162796-AF162799 were used as a basis for this study.]
INTRODUCTION
Transposable elements (TEs) are intrinsic components of the genomes of all living organisms, from the simplest prokaryotes to the most complex eukaryotes (Berg and Howe 1989; Capy et al. 1998). They make up a substantial fraction of most studied genomes, although TE content varies widely in different species and tends to be positively correlated with total genome size (Hartl 2000). Current sequencing projects are revealing the precise organization of genomes and how repetitive sequences are distributed and arranged within them. In the euchromatin, TEs are usually found scattered as individual repeats interspersed with single-copy sequences. The chromosomal arms of Drosophila melanogaster, for example, contain sporadic TE insertions separated by long stretches of unique DNA (Ashburner et al. 1999; Adams et al. 2000; Benos et al. 2000). In the human genome around 35%-45% of the euchromatic portion is taken up by TEs, mainly SINEs and LINEs, more or less randomly distributed in a short period interspersion pattern (Lander et al. 2001; Venter et al. 2001). Heterochromatic regions located around centromeres and telomeres of eukaryote chromosomes, however, show a very different organization. These regions consist almost exclusively of repeated sequences and harbor a great accumulation of TE sequences. A well-known case is the pericentromeric heterochromatin of D. melanogaster, where, besides simple sequence repeats, there are many different families of mostly rearranged TEs interspersed with very little unique DNA (Gatti and Pimpinelli 1992; Pimpinelli et al. 1995; Adams et al. 2000).
Traditionally, TEs have been considered as junk DNA or mere genomic parasites, exploiting cells for their own propagation (Doolittle and Sapienza 1980; Orgel and Crick 1980). However, though probably as indirect consequences of their existence (Charlesworth et al. 1994), TEs exert a great variety of effects on the genome of their hosts and could have played a very important role in the shaping of the genetic material during evolution (Finnegan 1989; McDonald 1995; Kidwell and Lisch 1997). TEs are a major source of mutation and genetic variation by getting inserted into coding sequences or regulatory regions of genes. These insertions are generally deleterious for the organism, as happens in many Drosophila phenotypic mutants (Lindsley and Zimm 1992) and several human genetic diseases (Wallace et al. 1991; Holmes et al. 1994), but some have been involved in new gene expression patterns and even new genes with apparently beneficial effects (Britten 1996, 1997; Lander et al. 2001). Moreover, TEs possess the ability to promote genetic recombination between homologous sequences and can produce large-scale chromosomal rearrangements (Lim and Simmons 1994; Gray 2000). Specifically, TEs have been implicated in the origin of some natural chromosomal inversions in different organisms, such as bacteria (Daveran-Mingot et al. 1998), yeast (Kim et al. 1998), flies (Cáceres et al. 1999), and hominids (Schwartz et al. 1998).
One of the most outstanding examples of natural variation in chromosome structure is the extraordinarily rich inversion polymorphism in the species of the Drosophila genus. Hundreds of polymorphic inversions have been described in Drosophila, and these inversions do not distribute at random among species or among chromosomal elements within species (Krimbas and Powell 1992). Furthermore, the breakpoints of inversions are not randomly distributed along chromosomes either (Krimbas and Powell 1992; Cáceres et al. 1997). Despite the fact that not all naturally occurring inversions have TEs at their breakpoints (Wesley and Eanes 1994; Cirera et al. 1995), inversion breakpoints have been found to be associated with TE insertion sites in D. melanogaster (Lyttle and Haymer 1992; Andolfatto et al. 1999), D. willistoni (Regner et al. 1996), and the D. virilis group (Evgen'ev et al. 2000), and direct evidence for the implication of TEs in the origin of chromosomal inversions has been obtained both in the laboratory (Lim and Simmons 1994) and in nature (Cáceres et al. 1999). Therefore, it has been suggested that TEs could be responsible for the hotspots where repeated breaks have been observed (Krimbas and Powell 1992; Evgen'ev et al. 2000). However, the molecular confirmation of the existence of the hotspots and the elucidation of their anatomy have remained elusive.
Recently, we cloned and sequenced the breakpoints of a highly successful chromosomal inversion of D. buzzatii, inversion 2j, which originated by ectopic recombination between oppositely oriented copies of a TE (Cáceres et al. 1999). This inversion inverted a central segment of the 2 standard (2st) chromosomal arrangement, the ancestral arrangement of chromosome 2 for all of the D. buzzatii cluster species (Ruiz and Wasserman 1993), comprising around one-fourth of its euchromatic fraction. In all 2j chromosomes both inversion breakpoints were found to contain large insertions that were absent from the noninverted 2st chromosomes. Because these insertions fulfilled all characteristic features of TEs (Capy et al. 1998), they were considered copies of a new transposon that was named Galileo. However, the insertion at the proximal breakpoint exhibited a very complex structure, with copies of several different internal repeats in an apparently chaotic arrangement. In addition, a preliminary study revealed that some variation in the structure of both breakpoint insertions existed among inverted chromosomes. Thus, the further characterization of the 2j breakpoints offered the opportunity to gain deeper insight into the molecular nature of inversion breakpoints and to investigate the long-term effects that TE insertions that have risen to high frequency might have on the organization of the genome.
Here, an exhaustive molecular analysis of the 2j breakpoint regions in 9 lines with 2st chromosomes and 30 lines with the 2j inversion has uncovered an amazing degree of naturally occurring structural variation among 2j chromosomes, caused by the insertion of multiple TEs inside each other, deletions, and other small DNA rearrangements. The observed structural diversity contrasts with the low level of nucleotide variation, suggesting that the structural changes have accumulated in a short period of time. Therefore, the breakpoints of inversion 2j appear to be highly variable hotspots.
RESULTS
Structural Variation at Inversion 2j Breakpoint Regions
Figure 1 shows the breakpoint regions of inversion 2j in the two D. buzzatii lines that were previously characterized, st-1 and j-1 (Cáceres et al. 1999). In 2st chromosomes the breakpoint regions have been designated as AB (distal breakpoint) and CD (proximal breakpoint). Inversion 2j took place between A and B sequences and between C and D sequences, and the breakpoint regions in 2j chromosomes consist of AC (distal breakpoint) and BD (proximal breakpoint). Large insertions not present in 2st chromosomes are found in the chromosomes with the inversion between A and C sequences and between B and D sequences. In this study, several molecular techniques with increasing resolution power and accuracy were sequentially used to examine the structure of the 2j breakpoints in other 2st and 2j lines: Southern blot hybridization, PCR amplification of different segments, restriction mapping of the PCR products, and DNA sequencing.
No structural variation in the AB or CD regions was found between nine 2st lines of diverse geographic origins. Southern blot hybridization of PstI-digested genomic DNA with AB and CD probes revealed in all 2st lines the same bands of 1.7 kb and 5.4 kb, respectively, corresponding to the distal and proximal 2j breakpoint regions (Fig. 1). PCR amplification of the 1.73-kb R1-B1 and 0.37-kb A1-B1 segments (distal breakpoint) or the 0.32-kb C1-D2 segment (proximal breakpoint) did not show any size variation between the 2st lines either. Restriction mapping of the PCR products corroborated the absence of differences within each segment.
Clearly contrasting results were found in 2j chromosomes. First, variation in the restriction map of the breakpoint regions in 30 2j lines was analyzed by Southern blot hybridization. Genomic DNA of all 2j lines was digested with PstI and hybridized with a CD probe. Two hybridization bands were observed in each of the 2j lines, corresponding to the proximal and distal breakpoints with their respective insertions, and remarkable variation was detected among them: There were 11 bands of different sizes for the proximal breakpoint, whereas there were 6 different bands for the distal breakpoint (Table 1). For those lines whose PstI hybridization pattern did not coincide with that of j-1 (Fig. 1), a more detailed restriction map of the breakpoint region was elaborated by repeated Southern hybridization using additional restriction enzymes (ClaI, DraI, EcoRI, EcoRV, HindIII, SalI, and XbaI) and AB and CD probes. This resulted in the identification of nine main structural types in the proximal breakpoint and six in the distal breakpoint (Table 1).
In the PCR analysis of the 2j lines, smaller regions, containing just the breakpoint insertions and the adjacent single-copy DNA, were studied. Primer pairs B2-G6 and G5-D1 (proximal breakpoint) and R1-C2 and A1-C1 (distal breakpoint) were used with genomic DNA of all 2j lines (Fig. 1). The PCR products of each line were compared by gel electrophoresis and were digested with restriction enzymes to detect and map any variation existing between them (Table 1). The PCR results revealed a small difference between two lines (j-16 and jz3-4) belonging to one of the previous nine structural types defined in the proximal breakpoint and between several lines previously ascribed to the same structural type of the distal breakpoint, but otherwise confirmed the restriction maps obtained from the Southern hybridizations. However, two problems arose in the PCR amplifications. First, Taq DNA polymerase sometimes jumped between distant parts of certain DNA templates, causing an excision of the intervening segment. By sequencing the G5-D1 PCR products of lines j-1 and j-19 we showed that two different ~1-kb deletions had occurred during the amplification. In both cases the deletions were found to take place between short homologous sequences repeated in direct orientation that were contained within long inverted repeats. Thus, the PCR excision mechanism resembles that of spontaneous deletion by slippage during DNA replication (Farabaugh et al. 1978; Albertini et al. 1982), which is stimulated by the formation of stem-loop secondary structures (Egner and Berg 1981). Second, no amplification occurred in some of the 2j lines (Table 1), so primer combinations different from the previous ones were assayed. Nevertheless, a few breakpoint segments could not be amplified either with the new combinations of primers or with PCR conditions specially designed for the amplification of difficult templates (see Methods).
As a final step, we sequenced the regions that were found to differ between 2j lines (Fig. 2). Fragments showing varying restriction patterns were cloned and sequenced completely from the corresponding PCR products. However, when two or more 2j lines did not show any variation in the restriction map of a particular region, only the DNA of one of them was sequenced as representative. A thorough effort was made to isolate and characterize all segments in which differences had been detected. Therefore, for those segments that were not PCR-amplified or that suffered deletions during PCR, we turned to traditional cloning. Two genomic libraries of the j-19 and jz3-4 lines were constructed and in both lines the two breakpoints of inversion 2j were isolated. Those segments differing with regard to the other 2j lines in each breakpoint were cloned and sequenced.
Altogether, the Southern blot hybridization and PCR data allowed us to infer the structures present at the breakpoints of the 30 2j lines studied, and DNA sequencing let us fully identify the changes that differentiate them (Fig. 2). Ten different structural types were found in the proximal breakpoint and seven in the distal breakpoint, and most of them were related by relatively simple changes, such as insertions or deletions of DNA segments. Thus, with this information we were able to postulate a plausible evolutionary sequence of changes between the breakpoint structures. To better illustrate the changes, five hypothetical variants (Hyp) have been represented as intermediaries between the observed ones. Also, for the sake of simplicity, we have considered that all insertions occurred independently, although a few of them could have originated in a single event. In the proximal breakpoint, the simplest structure is that of Hyp-P1, which contains a Galileo insertion between B and D sequences with three other TEs inserted inside (Fig. 2A). All of the TEs inside Galileo are flanked by direct repeats, presumably generated by the duplication of the target site during the insertion event, with the only exception of BuT1. In the latter case, the absence of the outermost nucleotide of the right inverted terminal repeat (ITR), suggests that a deletion after the BuT1 insertion removed its last base pair, the right target site duplication, and part of the left long ITR of Galileo (see below). From Hyp-P1, eight large insertions of seven different TEs, eight deletions, and the inversion of an internal segment are required to generate the structural diversity actually seen in the proximal breakpoint (see Fig. 2A for details). In the distal breakpoint, the simplest structure is that of j-12, formed by a 392-bp Galileo insertion between A and C sequences and an ISBu1 insertion in A (Fig. 2B). 
From here, eight insertions of seven different TEs, five deletions and a small duplication should have occurred to explain the other six structural variants observed (see Fig. 2B for details).
The most important features of the 22 large insertions (named from i1 to i22) found at the breakpoints of inversion 2j are summarized in Table 2. The target site duplications flanking most insertions, the presence of multiple copies, and the variation found among lines identify the inserted DNA sequences as TEs (Capy et al. 1998). According to sequence similarities between the inserted sequences, we have recognized ten different previously undescribed TEs (that will be described in detail elsewhere). Apart from the original Galileo-1 and Galileo-2 insertions that were implicated in the generation of inversion 2j (Cáceres et al. 1999), there are two more Galileo copies inserted at the 2j breakpoints, Galileo-3 and Galileo-4. These new Galileo copies are basically composed of very long ITRs, with a relatively small and heterogeneous central region that does not seem to encode any protein involved in their transposition. Like the first two copies, they do not show homology to any known sequence in the available databases, but they display significant structural similarity to the Foldback elements described in many organisms (Bingham and Zachar 1989; Hoffman-Liebermann et al. 1989; Hankeln and Schmidt 1990; Yuan et al. 1991; Rebatchouk and Narita 1997), including the ability to form stable secondary structures when denatured (as indicated by the difficulties encountered in the PCR amplification of the segments containing these elements). Five other insertions corresponding to two closely related TEs (average sequence identity 84%) also show similarities to Foldback elements. These new elements have been named Kepler and Newton and share many of their characteristics with Galileo (average sequence identity 73%), suggesting that they belong to the same family: (1) The terminal 40 bp of their ITRs are identical (except for one single nucleotide difference); (2) all of them tend to duplicate 7 bp of the target site upon insertion (Table 2); and (3) Newton elements exhibit very long ITRs resembling those of Galileo elements. Moreover, insertions i10 to i17 correspond to four different TEs that can be ascribed to Class II (Finnegan 1989; Capy et al. 1998) and have been designated as D. buzzatii transposons or BuTs. Based on sequence homologies they have been included in the hAT superfamily (Calvi et al. 1991). BuT1 and BuT2 show similarity to the element Gandalf of D. koepferae (Marín and Fontdevila 1995), whereas BuT3 and BuT4 are related to the element Hopper of Bactrocera dorsalis (Handler and Gomez 1997). Finally, five insertions could not be neatly classified into any of the previously known TE families. BuT5 ends in ITRs of just three base pairs (followed by subterminal imperfect inverted repeats of 17 bp), generates 9-bp duplications during insertion, shows a moderately repetitive pattern by in situ hybridization to D. buzzatii polytene chromosomes (J.M. Ranz, pers. comm.), and has been tentatively considered a Class II TE. The other four insertions belong to a new class of highly repetitive mobile elements, whose members do not possess ITRs and seem to duplicate two base pairs upon insertion. We have called them ISBu elements because of their structural and sequence similarity to the IS elements of the species of the obscura group of Drosophila (Hagemann et al. 1998).
Several other types of genetic rearrangements besides the multiple TE insertions have been found at the 2j breakpoints. We have detected 13 deletions of 17 bp or more (Fig. 2): d1, 93 bp; d2, 24 bp; d3, 238 bp; d4, 32 bp; d5, 179 bp; d6, 41 bp; d7, >536 bp; d8, 20 bp; d9, 17 bp; d10, 248 bp; d11, >649 bp; d12, 1023 bp; and d13, 136 bp (the lengths of d7 and d11 are minimum estimates, as the real size of the deleted fragments is not known). Five of these deletions seem to have originated by the well-established mechanism of slipped-strand mispairing (Farabaugh et al. 1978; Albertini et al. 1982): d2, d3, and d6 took place between two repeated sequences of 3-4 bp, eliminating one of them and the intervening DNA; d8 and d13 removed one copy of a sequence of 20 bp and 136 bp, respectively, duplicated in tandem. A similar mechanism could also have generated the tandem duplication of the terminal 41 bp of Galileo-2 in j-9 (Fig. 2B). Finally, in some of the 2j lines we have found a change of orientation of a 55-bp Galileo-1 internal fragment, which suggests that an inversion has occurred inside the proximal breakpoint insertion (Fig. 2A). This inversion spanned ~600 bp and was probably generated by recombination between the oppositely oriented ITRs of Kepler-1 and Kepler-2 in Hyp-P2.
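The signature of slipped-strand mispairing described above (a deletion that removes one copy of a direct repeat plus the intervening DNA) can be illustrated with a minimal sketch. The function name, the coordinates, and the toy sequences are hypothetical and are not taken from the sequenced breakpoints; the check simply measures how far the start of a deleted segment matches the sequence immediately downstream of it.

```python
def flanking_direct_repeat(reference, start, length):
    """Length of the direct repeat bordering a deletion.

    `reference` is the undeleted sequence; the deletion removes
    reference[start:start + length]. The repeat is measured by comparing
    the beginning of the deleted segment with the sequence that
    immediately follows it. A result equal to `length` means that the
    deletion removed one copy of a tandem duplication.
    """
    end = start + length
    repeat = 0
    while (
        repeat < length
        and end + repeat < len(reference)
        and reference[start + repeat] == reference[end + repeat]
    ):
        repeat += 1
    return repeat


# Toy case 1 (hypothetical): two 4-bp direct repeats ("TAGC") separated by
# 8 bp; the deletion removes one repeat copy and the intervening DNA.
spaced = "CCGGA" + "TAGC" + "AAACCCGG" + "TAGC" + "TTGA"
spaced_repeat = flanking_direct_repeat(spaced, start=5, length=12)

# Toy case 2 (hypothetical): a 6-bp unit duplicated in tandem; the deletion
# removes exactly one copy.
tandem = "GG" + "ACGTAC" + "ACGTAC" + "TT"
tandem_repeat = flanking_direct_repeat(tandem, start=2, length=6)

print(spaced_repeat, tandem_repeat)
```

Under this simplified view, deletions such as d2, d3, and d6 would correspond to the first case (a short repeat of 3-4 bp), and d8 and d13 to the second (repeat length equal to deletion length). Real breakpoints can be placed ambiguously within the repeat, which this sketch does not attempt to resolve.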
Nucleotide Variation at Inversion 2j Breakpoint Regions
In addition to the structural variation study, we sequenced 596 bp corresponding to the A, B, C, and D single-copy sequences in the nine 2st lines and 12 2j lines representing the diversity of structural types found. For comparison, we obtained the nucleotide sequence of the same regions in D. martensis, another species of the D. buzzatii complex (Ruiz and Wasserman 1993). These are seemingly noncoding intergenic regions, located 0.5-3.7 kb apart from the rox8 (A), Pp1α-96A (C), and nAcRβ-96A (D) coding sequences (Cáceres et al. 1999). However, the last 112 bp of D show homology to a putative D. melanogaster ORF recently discovered (Adams et al. 2000) that would require further investigation. In the 12 2j lines we also sequenced 839 bp of the distal breakpoint insertion and the ends of the proximal breakpoint insertion. Figure 3 summarizes the 81 polymorphic sites found and Table 3 shows the estimates of the nucleotide diversity, π (Nei 1987), calculated ignoring sites with alignment gaps or missing data only in pairwise comparisons.
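As an illustration of how nucleotide diversity can be computed with pairwise deletion of gaps, the following minimal sketch averages the proportion of differing sites over all pairs of aligned sequences. The function names, the set of characters treated as missing data, and the toy alignment are assumptions made for the example; they are not the data or the software used in this study.

```python
from itertools import combinations

# Characters treated as alignment gaps or missing data (an assumption).
MISSING = set("-N?")


def pairwise_difference(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or missing data are skipped
    for this pair only (pairwise deletion).
    """
    compared = 0
    different = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in MISSING or b in MISSING:
            continue
        compared += 1
        if a != b:
            different += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return different / compared


def nucleotide_diversity(alignment):
    """Average pairwise proportion of differences over all sequence pairs."""
    pairs = list(combinations(alignment, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_difference(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignment: four sequences of 10 sites, one with a gap.
toy_alignment = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGAACGTAC",
    "ACGAAC-TAC",
]
toy_pi = nucleotide_diversity(toy_alignment)
print(round(toy_pi, 4))
```

Because the gap is excluded only in comparisons involving the fourth sequence, the other pairs still use all ten sites, which is the point of pairwise rather than complete deletion.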
Considering the four single-copy regions together, nucleotide diversity is six times lower in 2j chromosomes than in 2st chromosomes (Table 3). We carried out computer simulations of the coalescent process using the DnaSP program (Rozas and Rozas 1999) to assess whether the nucleotide variation in each chromosomal arrangement was significantly different. Ten thousand trees were generated assuming the average number of nucleotide differences of 2st chromosomes, constant population size, and no recombination, and a statistically significant probability of 0.01 of obtaining nucleotide diversity values as low as or lower than the one observed in 2j chromosomes was found. In addition, 2st and 2j chromosomes exhibit a great number of fixed differences, including 17 nucleotide substitutions and six indels of 1-4 bp (TE insertions and target site duplications excluded). Using D. martensis as outgroup, a neighbor-joining tree (Saitou and Nei 1987) was built with the single-copy sequences of 2st and 2j lines (Fig. 4). All 2j sequences formed a monophyletic cluster of high bootstrap value, clearly separated from that of 2st sequences, confirming the proposed unique origin of the inversion (Cáceres et al. 1999).
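The logic of the coalescent test can be sketched with a small simulation. This is not DnaSP's implementation; it is a minimal neutral coalescent with constant population size and no recombination, written for illustration. The sample size, the value of theta, and the observed diversity passed to the function are hypothetical placeholders, since the underlying estimates are given in Table 3 and not reproduced here.

```python
import math
import random


def poisson(mean, rng):
    """Draw a Poisson variate (Knuth's method; adequate for small means)."""
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_mean_pairwise_differences(n, theta, rng):
    """One coalescent replicate for n sequences; theta = 4*N*mu per region.

    Returns the average number of pairwise differences in the sample.
    """
    descendants = [1] * n  # number of sampled sequences below each lineage
    total_pair_differences = 0
    while len(descendants) > 1:
        k = len(descendants)
        # Waiting time to the next coalescence, in units of 2N generations.
        interval = rng.expovariate(k * (k - 1) / 2)
        for d in descendants:
            mutations = poisson(theta / 2 * interval, rng)
            # A mutation on a branch with d descendants separates d*(n-d) pairs.
            total_pair_differences += mutations * d * (n - d)
        i, j = rng.sample(range(k), 2)
        merged = descendants[i] + descendants[j]
        descendants = [d for idx, d in enumerate(descendants) if idx not in (i, j)]
        descendants.append(merged)
    return total_pair_differences / (n * (n - 1) / 2)


def probability_of_low_diversity(n, theta, observed, replicates=10_000, seed=1):
    """Fraction of replicates with diversity as low as or lower than observed."""
    rng = random.Random(seed)
    hits = sum(
        simulate_mean_pairwise_differences(n, theta, rng) <= observed
        for _ in range(replicates)
    )
    return hits / replicates


# Hypothetical inputs: 12 sequences, theta of 4.0 differences per region,
# and an observed average of 0.7 pairwise differences.
print(probability_of_low_diversity(n=12, theta=4.0, observed=0.7, replicates=2000))
```

Setting theta to the average number of pairwise differences of the reference arrangement mirrors the assumption stated in the text, because the expected average number of pairwise differences under this model equals theta.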
No significant departures from the neutral model were found with the Tajima (1989) and Fu and Li (1993) tests, and nucleotide variation was used to date the origin of the inversion and of the sampled 2st and 2j alleles. The age of the inversion was estimated from the fixed differences between 2st and 2j chromosomes. The average number of nucleotide differences, d_xy (Nei 1987), between 2st and 2j chromosomes is 0.0353 and between D. buzzatii and D. martensis is 0.1094. Subtracting from both figures the intraspecific polymorphism (0.0197), the net average number of nucleotide substitutions is obtained (Nei 1987). Combining the available information (Russo et al. 1995; Rodríguez-Trelles et al. 2000), we have estimated the divergence time between D. buzzatii and D. martensis as 5.8 million years (Myr) and this results in a rate of 7.7 × 10⁻⁹ nucleotide substitutions per site and per year for the breakpoint regions. Therefore, the 2j inversion should be ~1 Myr old, which is consistent with its widespread distribution through most D. buzzatii populations. The coalescence time of 2st and 2j alleles was estimated from the average number of pairwise differences between the sequences of each chromosomal arrangement (Rozas et al. 1999). Accordingly, the sampled 2st alleles are estimated to be 485,000 years old and the sampled 2j alleles 84,000 years old.
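The arithmetic behind the substitution rate and the age of the inversion can be retraced in a few lines. The sketch below uses only the figures quoted above; the function names are illustrative, and the assumption that net divergence accumulates along two lineages (so that divergence equals twice the rate times the time) is ours, adopted because it reproduces the reported values.

```python
# Figures quoted in the text.
D_XY_ARRANGEMENTS = 0.0353   # 2st versus 2j chromosomes
D_XY_SPECIES = 0.1094        # D. buzzatii versus D. martensis
POLYMORPHISM = 0.0197        # intraspecific polymorphism subtracted from both
SPECIES_SPLIT_YEARS = 5.8e6  # estimated divergence time of the two species


def net_divergence(d_xy, polymorphism):
    """Net average number of substitutions per site between two groups."""
    return d_xy - polymorphism


def substitution_rate(net_div, split_years):
    """Rate per site per year, assuming both lineages accumulate changes."""
    return net_div / (2 * split_years)


def divergence_time(net_div, rate):
    """Time in years needed to accumulate net_div at the given rate."""
    return net_div / (2 * rate)


rate = substitution_rate(
    net_divergence(D_XY_SPECIES, POLYMORPHISM), SPECIES_SPLIT_YEARS
)
inversion_age = divergence_time(
    net_divergence(D_XY_ARRANGEMENTS, POLYMORPHISM), rate
)

print(f"rate = {rate:.2e} substitutions per site per year")
print(f"inversion age = {inversion_age / 1e6:.2f} Myr")
```

With these inputs the rate comes out close to 7.7 × 10⁻⁹ and the age of the inversion close to 1 Myr, in agreement with the estimates given in the text.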
Finally, we have used Kreitman and Hudson's homogeneity test to detect differences in polymorphism levels between the studied regions (Kreitman and Hudson 1991). In the pooled set of 21 2st and 2j sequences no significant differences in polymorphism across A, B, C, and D regions were found (X²_L = 2.86, df = 3, P = 0.41). However, the TE sequences inserted at the proximal breakpoint accumulate strikingly higher nucleotide variation between 2j chromosomes than the single-copy regions and the distal breakpoint insertion (X²_L = 8.61, df = 2, P = 0.01). The difference in polymorphism levels among 2j chromosomes at the TE insertions of each breakpoint (X²_L = 4.00, df = 1, P = 0.04), which are expected to be equally selectively constrained, suggests that there could be an intrinsic increased rate of nucleotide change at the proximal breakpoint insertion.
DISCUSSION
Our detailed analysis of the breakpoints of inversion 2j has allowed us to characterize and reconstruct the evolutionary sequence of changes that has occurred in these regions. This study has revealed a great extent of genetic rearrangement at the breakpoints, consisting of 22 insertions of 10 different TEs, 13 deletions, a duplication, and an internal inversion. The low level of nucleotide variation at the single-copy sequences among 2j chromosomes suggests that the different structures in each breakpoint were generated gradually from a common ancestor in a short period of time. According to the coalescence time of the sampled 2j alleles, the changes that differentiate them, that is, 16 of the TE insertions, the 13 deletions, the duplication, and the internal inversion, are estimated to have occurred <84,000 years ago. Together with the inversion 2j itself, this represents a rapid degree of genome restructuring never found before in nature and qualifies the 2j breakpoints as genetically unstable hotspots.
Typically, the density of TE insertions in D. melanogaster euchromatin is low. The 2.9-Mb sequence from the Adh region (Ashburner et al. 1999) and the 2.6-Mb sequence from the tip of the X chromosome (Benos et al. 2000) display just one insertion every 171 kb and 155 kb on average, respectively. These values coincide with the previously observed frequencies of polymorphic insertions in particular gene regions of D. melanogaster and other Drosophila species (Table 4). The frequency of insertions found at the 2j breakpoints in D. buzzatii 2j chromosomes is, however, ~100 times higher than the D. melanogaster average and ~40 times higher than the highest frequency of insertions ever found in the genus Drosophila, that of the vermilion locus of D. ananassae (Table 4). This complex array of broken and rearranged TEs accumulated in the 2j breakpoints in 2j chromosomes clearly differs from the expected organization of ordinary euchromatin and resembles more closely some D. melanogaster heterochromatic regions (Miklos et al. 1988; Vaury et al. 1989; Devlin et al. 1990; Locke et al. 1999).
What is the cause of these hotspots? The structural diversity in 2j chromosomes contrasts sharply with the lack of TE insertions and structural variation in the homologous regions of 2st chromosomes and points to an effect of the inversion or of the initial Galileo insertions as most likely explanations for the hotspots. It has been argued that TEs should accumulate around inversion breakpoints because the reduction of recombination protects them from being eliminated by deleterious ectopic exchanges (Montgomery et al. 1987; Eanes et al. 1992; Sniegowski and Charlesworth 1994), and this could in part account for the insertions at the 2j breakpoints. However, we think that this explanation does not agree completely with our observations. First, TE insertions accumulate exclusively in very small regions around the 2j inversion breakpoints. Of the 12.3 kb corresponding to the studied region in the 2j ancestral chromosome, all TE insertions have accumulated just in the 5.1 kb comprised by the Galileo-1, Galileo-2, and ISBu1-1 elements and none in the surrounding single-copy DNA. In the two other polymorphic inversions in which variation around the breakpoints was analyzed, In(3L)P and In(2L)t of D. melanogaster, only two TE insertions were found in 2.5 kb and 5 kb studied, respectively (Hasson and Eanes 1996; Andolfatto et al. 1999). Second, although differences in mobility levels may be involved, the complete absence among the TEs inserted in the 2j breakpoints of retrotransposons, which seem to constitute the majority of TEs in Drosophila (Arkhipova et al. 1995), is noteworthy. Third, given the current intermediate frequency of inversion 2j, the reduction in recombination is expected to affect 2st and 2j chromosomes in a similar way. Finally, the recombination reduction hypothesis does not account for deletions and other chromosomal rearrangements.
Accordingly, we favor the idea that the Galileo insertions were probably the main inducers of the generation of the hotspots. It is particularly remarkable that Galileo elements seem to belong to the Foldback family. These elements have a distinctive internally repeated structure and the FB elements of D. melanogaster are characterized by the production of extremely unstable mutations and chromosomal rearrangements at unusually high frequencies in laboratory populations (Bingham and Zachar 1989; Lovering et al. 1991). TE insertions, deletions, and the other DNA rearrangements are not distributed uniformly along the studied regions in 2j chromosomes. Instead, they appear to have occurred after Galileo-1 and Galileo-2 insertions, within or very close to them (Fig. 2). Fourteen TEs out of 20 are inserted within Galileo-1 or Galileo-2 elements and all of the observed deletions occurred inside or at the ends of pre-existing Galileo or Galileo-like elements. The fact that all 2j chromosomes share three TE insertions and one hypothetical deletion inside the Galileo-1 element and an ISBu1 insertion at the distal breakpoint is suggestive of the hotspots predating the origin of the 2j inversion, but a population bottleneck affecting 2j chromosomes could also be invoked.
There are several cases of nested insertion of TEs inside Foldback elements (Bingham and Zachar 1989; Hoffman-Liebermann et al. 1989). This sometimes has been interpreted as a mechanism to direct TE insertion outside of gene coding regions to reduce the damage inflicted on the host by their mobilization (Kidwell and Lisch 1997). Among Class II TEs, insertion site preference has been examined only for D. melanogaster P elements, which show some tendency to insert into accessible chromatin regions in the 5' end of genes and into pre-existing P copies (Engels 1996; Liao et al. 2000). Nevertheless, many more examples are known among retrotransposons. In Saccharomyces cerevisiae, Ty1, Ty2, Ty3, and Ty4 elements are mostly located in regions upstream of tRNA genes and other genes transcribed by RNA polymerase III, whereas Ty5 prefers to integrate near silent chromatin at the telomeres (Ji et al. 1993; Zou and Voytas 1997; Boeke and Devine 1998; Kim et al. 1998). In addition, blocks of nested retrotransposons are formed in the intergenic regions of the maize genome by repeated insertion of elements inside each other. In particular, 14 of the 23 retrotransposons found in the adh1-F region were inserted within other retrotransposons (SanMiguel et al. 1996, 1998). Finally, there are also retrotransposons that seem to preferentially target heterochromatic regions, such as the KERV-1 element of kangaroos (Waugh O'Neill et al. 1998) or the I element of D. melanogaster (Dimitri et al. 1997).
On the other hand, TEs, and especially DNA transposons, are well
known to mediate the production of various types of genetic rearrangements, including deletions, duplications, and inversions, with
high efficiency. In laboratory studies, P elements have been found to promote deletions and duplications of the flanking genomic sequences (Preston et al. 1996) and internal deletions of P
DNA (Staveley et al. 1995), whereas deletions recovered from
mariner elements usually affect the ITR of the element and the
DNA where it is inserted (Lohe et al. 2000). In both cases, extra DNA
sometimes appears between the deletion endpoints, as happens in our d4 and d5 deletions, which were accompanied by the introduction of a new
nucleotide. In addition, TEs are involved in promoting genetic recombination between homologous sequences (Sved et al. 1990; McCarron et al. 1994; Lohe et al. 2000). We have already shown that
recombination between Galileo copies was implicated in the generation of inversion 2j (Cáceres et al. 1999), and
several other naturally occurring inversions in Diptera could have
originated by a similar mechanism as well (Lyttle and Haymer 1992; Mathiopoulos et al. 1998; Andolfatto et al. 1999). At the molecular
level, genetic instability might result from the presence of inverted repeats or the mechanism of transposition of the TEs inserted at the
2j breakpoints. Apart from ISBu1 and ISBu2,
all of the elements are thought to transpose by a conservative
cut-and-paste mechanism (Finnegan 1989; Capy et al. 1998), in which DNA
breaks induced by the transposase at the transposon ends could be
aberrantly repaired by host repair functions, producing many different
types of DNA alterations (Lohe et al. 2000). Either an increased
mutation rate attributable to repeated repair events or an increased
frequency of genetic exchange with other copies of the element could
account for the higher nucleotide variation observed at the TE
insertion of the proximal breakpoint.
Several lessons can be drawn from this work. We have been able to
follow the effects of particular TE insertions on the genome through
evolutionary time and to see how these TEs seem to have altered the
dynamics of ordinary euchromatic regions, transforming them into highly
unstable heterochromatin-like structures. Previously, insertion and
expansion of P transposon transgenes in the D. melanogaster genome were found to induce local formation of
heterochromatin, and this was proposed to be caused by the pairing of
adjacent repeats (Dorer and Henikoff 1994). Also, the TE clustering at the 2j breakpoints is consistent with the retrotransposon
associations found in D. virilis chromosomes by in situ
hybridization (Evgen'ev et al. 2000) but challenges the prototypical
picture of the Drosophila genome provided by D. melanogaster (Ashburner et al. 1999; Adams et al. 2000; Benos et al. 2000). An analogous disparity in TE distribution is found between
two plant species with very different genome sizes, Arabidopsis
thaliana and Zea mays. Similar to D. melanogaster, A. thaliana has a relatively small genome
and is atypical in that most TEs are located in the pericentromeric
region (Lin et al. 1999; Mayer et al. 1999). Our results are
reminiscent of the explosive accumulation of 23 retrotransposons in the
originally 80-kb adh1-F region of maize over the last 6 Myr,
which tripled its size (SanMiguel et al. 1996, 1998). However, the TE insertion rate observed in the 7.1-kb
2j breakpoint regions of D. buzzatii is even higher.
The important effects that these blocks of TEs could have on genome
evolution and the possibility that Galileo or other
Foldback elements could be involved in analogous hotspots at
other locations of the D. buzzatii genome are very interesting
questions for further investigation.
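The comparison with the maize adh1 region can be made concrete with a back-of-the-envelope calculation. The sketch below is only illustrative: it takes the maize figures quoted in the text (23 retrotransposons, an originally 80-kb region, 6 Myr) and assumes, for D. buzzatii, the 22 insertions and the time span of about 84,000 years given in the abstract. The function name and the normalization per kb and per Myr are assumptions introduced here for illustration and are not part of the original analysis.

```python
def insertions_per_kb_per_myr(n_insertions, region_kb, time_myr):
    """Crude insertion density: insertions per kb of target per million years."""
    return n_insertions / (region_kb * time_myr)

# Maize adh1 region: figures quoted in the text (SanMiguel et al. 1996, 1998).
# Assumption: the original 80-kb size is used as the target length.
maize_rate = insertions_per_kb_per_myr(23, 80.0, 6.0)

# D. buzzatii 2j breakpoints: 22 insertions in 7.1 kb.
# Assumption: all insertions occurred within the ~84,000 years (0.084 Myr)
# quoted in the abstract.
buzzatii_rate = insertions_per_kb_per_myr(22, 7.1, 0.084)

print(f"maize:       {maize_rate:.3f} insertions/kb/Myr")
print(f"D. buzzatii: {buzzatii_rate:.1f} insertions/kb/Myr")
print(f"ratio:       {buzzatii_rate / maize_rate:.0f}-fold")
```

Under these assumptions the D. buzzatii figure exceeds the maize one by more than two orders of magnitude, although the estimate is sensitive to the age assumed for the 2j chromosomes and to how the target region is defined.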
| |
METHODS |
|---|
Drosophila Stocks
Thirty-nine lines of D. buzzatii and one of D. martensis were used in the study. The D. buzzatii lines (except jq7-3 and jq7-4) are isogenic for chromosome 2 and bear one of four different chromosome 2 arrangements: 2st, 2j, 2jz3, or 2jq7 (2jz3 and 2jq7 derive from the 2j arrangement and carry inversions 2z3 and 2q7, respectively). These lines were isolated from different natural populations covering the whole range of the species distribution. The geographic origins of the 2st lines are: st-1 and st-2, Carboneras (Spain); st-3, Vipos (Argentina); st-4, Guaritas (Brazil); st-5, Catamarca (Argentina); st-6, Salta (Argentina); st-7, Termas de Rio Hondo (Argentina); st-8, Ticucho (Argentina); and st-9, Trinkey (Australia). The geographic origins of the 2j lines are given in Table 1. The D. martensis line (Ma-4) is from Guaca (Venezuela).
Southern Hybridization and Construction of Genomic Libraries
Southern hybridization was carried out by standard methods as
described previously (Ranz et al. 1999). Two probes were used for the
analysis of the 2j breakpoint regions (Fig. 1). The
AB probe consists of a 1.7-kb PstI fragment
containing 1178 bp of A and 510 bp of B sequences,
whereas the CD probe consists of a 0.9-kb DraI
fragment containing 242 bp of C and 715 bp of D sequences (Cáceres et al. 1999). Two genomic libraries of the j-19
and jz3-4 D. buzzatii lines were constructed in the
λGEM-11 vector (Promega) as described in Cáceres et al. (1999). To
isolate the clones containing the 2j breakpoints, these
libraries were screened by plaque hybridization with the AB
and CD probes.
PCR Amplification
For PCR amplification, different pairs of oligonucleotide primers covering the entire regions of study were designed (see Table 5, available as an on-line supplement at http://www.genome.org, for primer sequences). To specifically amplify the breakpoint insertions, primers that anneal to inserted repetitive sequences were always used in combination with primers located in the flanking nonrepetitive DNA. PCRs were carried out in a volume of 50 µl, including 100-200 ng of genomic DNA of each line, 20 pmol of the different primers, 200 µM dNTPs, 1.5 mM MgCl2, and 1-1.5 units of Taq DNA polymerase. Typical temperature cycling conditions were 30 cycles of 30 sec at 94°C, 30 sec at 50-70°C (depending on the primer pair used), and 60-180 sec at 72°C. Difficult templates that could not be amplified under the standard PCR conditions were assayed with the GC-Rich PCR System (Roche), using 0.5-2 M GC-Rich resolution solution and an elongation temperature of 68°C.
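As a rough illustration of how the reaction conditions above translate into bench quantities, the following sketch applies the dilution relation C1·V1 = C2·V2 to the 50-µl reaction and sums the programmed hold times of the cycling profile. The final concentrations and cycling times are taken from the text; the stock concentrations (10 mM dNTPs, 25 mM MgCl2) and the function names are hypothetical values chosen for illustration only.

```python
def stock_volume_ul(final_conc, stock_conc, reaction_volume_ul=50.0):
    """Volume of stock to add (C1*V1 = C2*V2); both concentrations in the same unit."""
    return final_conc * reaction_volume_ul / stock_conc

# Final concentrations come from the text; stock concentrations are hypothetical.
dntp_ul = stock_volume_ul(final_conc=200.0, stock_conc=10_000.0)  # both in µM
mgcl2_ul = stock_volume_ul(final_conc=1.5, stock_conc=25.0)       # both in mM

def cycling_minutes(extension_s, cycles=30, denature_s=30, anneal_s=30):
    """Programmed hold time only; ramping and any initial or final steps are ignored."""
    return cycles * (denature_s + anneal_s + extension_s) / 60.0

print(f"dNTP stock:  {dntp_ul:.1f} µl")
print(f"MgCl2 stock: {mgcl2_ul:.1f} µl")
print(f"cycling time: {cycling_minutes(60):.0f}-{cycling_minutes(180):.0f} min")
```

The extension time is left as a parameter because the text gives a range (60-180 sec) that depends on the fragment being amplified.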
DNA Sequencing and Sequence Analysis
DNA fragments of interest coming from restriction enzyme digestion
or PCR amplification were cloned into Bluescript II SK (Stratagene) or
pGEM-T (Promega) vectors, respectively. These fragments were sequenced
on an ALFexpress (Amersham Pharmacia Biotech) or an ABI 373A
(Perkin-Elmer) automated DNA sequencer, using M13 universal and reverse
primers. Nucleotide sequences were analyzed with the Wisconsin Package
(Genetics Computer Group). Bestfit was used to align pairs
of homologous sequences in different lines to detect inserted or
deleted segments. Similarity searches through the GenBank/EMBL
databases using FASTA, BLASTX, and
TBLASTX were carried out to identify the inserted
sequences. To analyze the nucleotide variation at the 2j
breakpoints, we sequenced the same regions as in Cáceres et al. (1999) in six additional 2st lines and seven additional 2j lines. Both strands of PCR-generated templates were
sequenced completely with different pairs of primers (Table 5, available as an on-line supplement at http://www.genome.org). Sequences were multiply aligned with Clustal W (Thompson et al. 1994). Polymorphism analysis was performed using the DnaSP program (Rozas and Rozas 1999). Phylogenetic analysis was performed using the PHYLIP software package (J. Felsenstein).
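The kind of quantity reported in a polymorphism analysis can be illustrated with a minimal sketch of nucleotide diversity (π), the average proportion of nucleotide differences between pairs of aligned sequences. This is not the DnaSP implementation, whose estimators and treatment of alignment gaps may differ; here sites with a gap or ambiguous base are simply skipped pair by pair, and the toy alignment and function names are hypothetical.

```python
from itertools import combinations

def pairwise_differences(seq_a, seq_b):
    """Count differing sites, skipping positions with a gap or ambiguity in either sequence."""
    valid = set("ACGT")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            differing += a != b
    return differing, compared

def nucleotide_diversity(aligned_seqs):
    """Average per-site pairwise difference (pi) over all pairs of aligned sequences."""
    if len(aligned_seqs) < 2:
        raise ValueError("need at least two aligned sequences")
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    per_pair = []
    for a, b in combinations(aligned_seqs, 2):
        differing, compared = pairwise_differences(a, b)
        if compared:
            per_pair.append(differing / compared)
    if not per_pair:
        raise ValueError("no comparable sites in any pair of sequences")
    return sum(per_pair) / len(per_pair)

# Hypothetical toy alignment, not data from this study.
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
pi = nucleotide_diversity(toy_alignment)
print(f"pi = {pi:.4f}")
```

Computing π separately for the 2st and 2j sequence sets would allow the two arrangements to be compared on the same scale.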
| |
ACKNOWLEDGMENTS |
|---|
We are deeply indebted to J.M. Ranz for the data on the repetitive nature of BuT5 and general advice at all stages of this work. J.S.F. Barker kindly provided us with 15 of the D. buzzatii stocks used. J. Rozas contributed greatly to improving the nucleotide variation analysis. We also thank A. Barbadilla for helpful discussion of results, and M. Ashburner, A. Berry, P. Capy, F. Casares, A. Navarro, and D. Petrov for valuable comments and suggestions. Work was supported by grant PB98-0900-C02-01 from the Dirección General de Investigación Científica y Técnica (Ministerio de Educación y Cultura, Spain) awarded to A.R. and a doctoral FI fellowship from the Comissionat per a Universitats i Recerca (Generalitat de Catalunya, Spain) awarded to M.C.
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
| |
FOOTNOTES |
|---|
1 Corresponding author.
E-MAIL caceres{at}salk.edu; FAX (858) 558-7454.
Article published on-line before print: Genome Res., 10.1101/gr.174001.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.174001.
| |
REFERENCES |
|---|
-heterochromatin of Drosophila melanogaster. Proc. Natl. Acad. Sci. 85: 2051-2055.