Published online before print
August 16, 2001, 10.1101/gr.164201
Vol. 11, Issue 9, 1527-1540, September 2001
LETTER
| |
ABSTRACT |
|---|
|
|
|---|
The recent release of the complete euchromatic genome sequence of
Drosophila melanogaster offers a unique opportunity to explore the evolutionary history of transposable elements (TEs) within the
genome of a higher eukaryote. In this report, we describe the
annotation and phylogenetic comparison of 178 full-length long terminal
repeat (LTR) retrotransposons from the sequenced component of the
D. melanogaster genome. We report the characterization of 17 LTR retrotransposon families described previously and five newly
discovered element families. Phylogenetically, these families can be
divided into three distinct lineages that consist of members from the
canonical Copia and Gypsy groups as well as a newly discovered third
group containing BEL, mazi, and roo
elements. Each family consists of members with average pairwise
identities of ~99% at the nucleotide level, indicating they may be the
products of recent transposition events. Consistent with the recent
transposition hypothesis, we found that 70% (125/178) of the elements
(across all families) have identical intra-element LTRs. Using the
synonymous substitution rate that has been calculated previously for
Drosophila (0.016 substitutions per site per million years) and
the intra-element LTR divergence calculated here, the average age of
the remaining 30% (53/178) of the elements was found to be 137,000 ± 89,000 yr. Collectively, these results indicate that many full-length
LTR retrotransposons present in the D. melanogaster genome
transposed well after this species diverged from its closest
relative, Drosophila simulans, 2.3 ± 0.3 million years ago.
| |
INTRODUCTION |
|---|
|
|
|---|
Retrotransposons are the most abundant and
widespread class of eukaryotic transposable elements. For example,
>50% of the maize genome (SanMiguel et al. 1996) and >40% of the
human genome (Smit 1999) consist of retrotransposons. The
biological importance of retrotransposons ranges from their
contribution to mutation (Green 1988) and disease (Deininger and Batzer
1999) to their postulated role in evolution (McDonald 1990, 1993;
Kidwell and Lisch 1997). The genome sequencing of humans and selected experimental and agriculturally important species is providing an
unprecedented opportunity to view the patterns of variation existing
among the entire complement of retrotransposons in complete genomes.
Retrotransposons comprise short interspersed nuclear elements
(SINEs), long interspersed nuclear elements [LINEs, also known as
non-long terminal repeat (LTR) retrotransposons], LTR retrotransposons, and retroviruses. LTR retrotransposons are named for
their long terminal repeats, which contain transcriptional regulatory
sites and flank the internal coding regions of the elements (Boeke and
Stoye 1997). LTR retrotransposons are classically divided into two
groups, the Copia/Ty1 group and the Gypsy/Ty3 group. The distinguishing
characteristic between these groups is the order of the three protein
domains, protease (PR), reverse transcriptase (RT), and integrase
(IN), encoded within the polymerase (pol) gene of
the elements. The pol region of Copia/Ty1 elements has the
order (PR, IN, RT), whereas the Gypsy/Ty3 group has the more familiar
arrangement (PR, RT, IN), which is also the order found in
retroviruses. Recently, a third major group of LTR retrotransposons has
been described containing the BEL element from Drosophila melanogaster as well as the Cer7-12 elements of
Caenorhabditis elegans (Bowen and McDonald 1999; Malik et al.
2000). As in the Gypsy/Ty3 group, the IN domain is found downstream of the RT domain in this
third group of LTR retrotransposons.
LTR retrotransposons and retroviruses are nearly identical in structure
and are clearly related phylogenetically (Xiong and Eickbush 1988). The
main distinguishing characteristic is that some LTR retrotransposons,
such as Ty1 in yeast, lack the envelope gene that
renders retroviruses infectious. Many LTR retrotransposons, such as
gypsy from D. melanogaster, however, do encode
Envelope proteins and are infectious (Song et al. 1994). Therefore, LTR
retrotransposons also serve as excellent models for the study of the
evolution of infectious retroviruses. Previous large-scale analyses of
the LTR retrotransposons of Saccharomyces cerevisiae (Jordan
and McDonald 1998, 1999a,b; Kim et al. 1998), C. elegans
(Bowen and McDonald 1999), Zea mays, and Hordeum
vulgare (Shirasu et al. 2000) have provided novel insights into the
molecular evolution and phylogenetic distribution of these retrotransposons.
Because the long terminal repeats of LTR retrotransposons are
synthesized from a single template during reverse transcription, they
are identical at the DNA sequence level on integration. Therefore, if
the nucleotide substitution rate for the host DNA polymerase is known,
the relative integration time or age of the element can be estimated
from the level of sequence divergence existing between an element's
LTRs. Previously, LTR nucleotide identity has been used to estimate the
time of insertion of LTR retrotransposons from S. cerevisiae,
Zea mays, and humans. For example, the Ty1 and Ty2
elements from S. cerevisiae have been estimated to be <100,000
years old (Jordan and McDonald 1999b; Promislow et al. 1999). In
contrast, it has been reported that the LTR retrotransposons within the
ADH-region of the maize genome are much older, having transposed in the
past 2 to 6 million years (SanMiguel et al. 1998). In a similar study,
it has been reported that most human endogenous retroviruses (HERVs)
inserted into the human genome long before humans diverged from the Old
World monkeys, more than 25 million years ago (Tristem 2000).
In an initial effort to characterize all of the LTR retrotransposons
within the genome of D. melanogaster, we report the
annotation, phylogenetic analysis, and estimated ages of 178 full-length elements (i.e., those containing two intact LTRs and
intervening coding regions) from the nonredundant sequence found in
GenBank (Benson et al. 2000). Our results indicate that there are three
major groups of LTR retrotransposons found within the D. melanogaster genome. We find that these three groups consist of
over 20 individual families of elements and that each family of
elements is composed of a group of highly homologous individual
elements (~99% identity at the nucleotide level). We conclude that
many LTR retrotransposons from each family have resulted from
evolutionarily recent episodes of transpositional activity.
| |
RESULTS |
|---|
|
|
|---|
Isolation and Characterization of D. melanogaster LTR Retrotransposon Families from the Genome Sequence
The majority of the D. melanogaster genome sequence now
available in GenBank is from the euchromatic regions of the genome (Adams et al. 2000). In contrast, only 2.5% of the genome sequence is
derived from heterochromatic clones (Myers et al. 2000). Constitutive heterochromatin, which comprises roughly one-third of the D. melanogaster genome, is poorly represented in the genome sequence
because these regions are not easily cloned into large inserts (Myers
et al. 2000). Likewise, the assembly of DNA sequence from genomic
regions that contain many tandemly arranged repetitive elements can
result in the omission of internal sequences (E. Myers, pers. comm.). These issues are important to our study because D. melanogaster heterochromatin is thought to contain a substantial
number of transposable elements (TEs) (Pimpinelli et al. 1995). Also,
LTR retrotransposons have been shown to exist in nested arrays in other
species (SanMiguel et al. 1996a). Consequently, any LTR retrotransposons located in these regions of the genome are excluded from our analysis. Further sequencing and gap-filling efforts being
conducted by Celera and the Berkeley Drosophila Genome
Project (Myers et al. 2000) will likely identify additional elements
within both the euchromatic and heterochromatic portions of the genome. Therefore, our results represent a large sampling of LTR
retrotransposons from the euchromatin of D. melanogaster.
Following the initial characterization of each LTR retrotransposon (see
Methods section), ClustalW (Thompson et al. 1997) was used
to align the nucleotides of each element to known full-length LTR
retrotransposons of D. melanogaster and other related
organisms listed in Table 1. Information
concerning all elements identified previously can be obtained through
Flybase (http://flybase.harvard.edu). This initial alignment was done to group elements into known and unknown families. The phylogram generated from this preliminary alignment is shown in Figure 1.
For clarity, each family is labeled once,
followed by the number of elements in that family. Because of the low
level of interfamily nucleotide sequence identity, this initial
phylogram may not accurately represent all interfamily relationships,
but it does allow us to classify elements into distinguishable groups.
The long interfamily branches and the large clusters of nearly identical
elements at the termini of the family lineages apparent in this initial
phylogram indicate that most families of D. melanogaster LTR
retrotransposons consist of a group of highly homologous elements.
Subsequently computed pairwise nucleotide identities confirmed this
finding: each family was found to consist of elements with
average pairwise nucleotide identities of ~99% (Table 2).
The nucleotide sequences of the LTR retrotransposons that did not group
with known elements were translated and their RT motif was aligned to
the RT of the known elements (Fig. 2A). Only those RT motifs
that were uninterrupted by frame shifts or stop codons were used in the
characterization of novel families. This alignment was then used to
generate pairwise amino acid identities. Consistent with criteria
established previously (Bowen and McDonald 1999), if an element had a
pairwise identity of <90% to a known RT, it was classified as a new
family. These novel elements are shown in boldface type in the first
column of Table 3.
Additional RT sequences from other Drosophila and invertebrate
species were included in this analysis to ensure that novel elements
from D. melanogaster did not represent previously
characterized elements from other related species. Elements with RT
pairwise identities >90% to a previously characterized element were
given the name of that element followed by a number.
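The 90% rule described above can be illustrated with a short sketch. This is not the authors' pipeline: the function names and toy sequences are hypothetical, and two conventions are assumptions made here for illustration, namely that alignment columns with a gap in either sequence are skipped and that an identity of exactly 90% counts as a match to the known family.

```python
from typing import Dict, Optional, Tuple

GAP = "-"


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != GAP and y != GAP]
    if not compared:
        return 0.0
    return sum(x == y for x, y in compared) / len(compared)


def classify_rt(
    query: str, known: Dict[str, str], threshold: float = 0.90
) -> Tuple[Optional[str], float]:
    """Return (family, best identity); family is None when the query looks novel."""
    best_name: Optional[str] = None
    best_identity = 0.0
    for name, seq in known.items():
        identity = pairwise_identity(query, seq)
        if identity > best_identity:
            best_name, best_identity = name, identity
    if best_identity < threshold:
        # Below the cutoff against every known RT: candidate new family.
        return None, best_identity
    return best_name, best_identity


if __name__ == "__main__":
    # Toy aligned RT fragments (hypothetical, not real sequences).
    known_rts = {"famA": "MKLVDAGHTR", "famB": "MKIVEAGQSR"}
    print(classify_rt("MKLVDAGHTR", known_rts))
    print(classify_rt("MKLVDAGWWW", known_rts))
```

In practice the identities would be computed from the full RT alignment rather than from short fragments, and RTs interrupted by frame shifts or stop codons would be excluded first, as the text states.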
All elements characterized in this study are listed individually in
Table 3. Included in this table are other distinguishing characteristics of the LTR retrotransposons, including their accession numbers, chromosomal locations, inverted terminal repeats (ITRs), direct terminal repeats (DTRs), LTR length, complete element length, and estimated age of each element (see below). The DTRs result from a
duplication of the unoccupied insertion site following proviral or
element insertion (Coffin et al. 1997). In our study, the DTRs served
as internal controls for the assembly process following the whole
genome shotgun sequencing of D. melanogaster (Myers et al.
2000). If proviral elements located at different loci were incorrectly
assembled, they would contain a mixed set of DTRs. For the elements
that contained unique DTR sequences, 93% had identical DTRs. The other 7%
are either incorrectly assembled or are possibly the result of ectopic
recombination between proviral elements at different loci. This
hypothesis is currently under further investigation. In summary, we
identified the following numbers of elements per family: copia (23), 17.6 (6), 297 (10), 412 (18), antonia (4), blastopia (6), blood (21), burdock (1), hamilton (2),
HMS Beagle (8), mazi (4), mdg1 (5),
mdg3 (8), micropia (1), nik (1),
nomad (4), roo (40), tirant (6),
transpac (4), wolfman (2), and unclassified elements (5)
(Table 4). The elements' combined lengths
totaled 1,279,046 nucleotides, or nearly 1% of the sequenced
component of the genome.
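The use of DTRs as an assembly control, described above, amounts to comparing the short duplication on the left of each element with the one on its right. A minimal sketch follows; the `Element` record, its field names, and the toy data are hypothetical and are not taken from Table 3.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Element(NamedTuple):
    name: str
    left_dtr: str   # target-site duplication immediately 5' of the element
    right_dtr: str  # target-site duplication immediately 3' of the element


def dtr_consistency(elements: Iterable[Element]) -> Tuple[float, List[str]]:
    """Return the fraction of elements with identical DTRs and the names of the rest."""
    items = list(elements)
    if not items:
        raise ValueError("no elements supplied")
    mismatched = [e.name for e in items if e.left_dtr.upper() != e.right_dtr.upper()]
    fraction_identical = (len(items) - len(mismatched)) / len(items)
    return fraction_identical, mismatched


if __name__ == "__main__":
    toy = [
        Element("elem-1", "GTAAC", "GTAAC"),
        Element("elem-2", "ATATA", "ATATA"),
        Element("elem-3", "CCGTA", "TTGCA"),  # mixed DTRs: possible misassembly
        Element("elem-4", "TACAG", "tacag"),
    ]
    print(dtr_consistency(toy))
```

Elements flagged by such a check would be candidates for either misassembly or ectopic recombination; the check alone cannot distinguish the two.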
In general, the number of individual elements that we have
characterized for each LTR retrotransposon family is consistent with
average copy numbers estimated previously by in situ hybridization (Table 4). In situ hybridization also detects only those elements located in the polytenized, euchromatic component of the genome. For
example, the most abundant element found is the roo element, which occupies, on average, 68 ± 14 sites within the polytene chromosomes of natural D. melanogaster populations (Vieira et al. 1999). We also found that roo was the most abundant
element, with at least 40 full-length copies in the genome of the
sequenced D. melanogaster lab strain. In contrast, the least
abundant elements in natural populations are 1731,
gypsy, and zam. Each of these elements has,
on average, fewer than two copies in natural populations (Vieira et al.
1999). We did not find any copies of these elements in our analysis of
the D. melanogaster genome sequence. This is not surprising in
that both gypsy and zam are known to be most abundant
in constitutive heterochromatin and are in low abundance or absent from
the euchromatic regions of some D. melanogaster strains
(Pimpinelli et al. 1995; Baldrich et al. 1997).
Phylogenetic Characterization of D. melanogaster LTR Retrotransposons
The aligned RTs of all D. melanogaster LTR retrotransposons shown in Figure 2A were used to generate the phylogenetic trees presented in Figure 2, B and C. Additional RT sequences from other Drosophila and invertebrate species as well as D. melanogaster elements that were not identified in our analysis are also included in this phylogeny. The RT phylogeny indicates that there are three major groups of LTR retrotransposons within the D. melanogaster genome.
Copia Group
To date, members of the Copia group found in D. melanogaster include only copia and 1731. In our study, we did not find any representatives of the 1731 family. In one instance we found a copia element, copia-8, inserted into another element (mdg3-2). Similar composite insertions have been observed previously in maize (SanMiguel et al. 1996
Gypsy Group
Previously characterized Gypsy group members that we identified in this study include 17.6, 297, 412, blastopia, blood, burdock, HMS Beagle, mdg1, mdg3, micropia, nomad, tirant, and transpac. Novel Gypsy group members first identified and named here are antonia, hamilton, nik, and wolfman. Five additional elements were identified that are closely related to Gypsy group elements and not characterized previously at the level of RT amino-acid identity. These elements are listed by accession number only in Figure 1. The RTs of these elements contain frame shifts or stop codons and are difficult to characterize (see above discussion). These five elements will require further analysis before they can be confidently placed phylogenetically with respect to their RT identity. The Gypsy group found in D. melanogaster is composed of at least 20 different families that form three divergent clades seen in Figure 2, B and C. These clades all emerge from a central unsupported region deep within the Gypsy group. This is best illustrated in the unrooted phylogram shown in Figure 2B. One clade is composed of the elements 412, blood, mdg1, and the novel element we named wolfman. This clade is well supported with a bootstrap value of 100. These four element families are closely related and form a very tight cluster at the end of a long branch that separates them from the rest of the Gypsy group. A second clade that is less well supported (bootstrap value = 63) is composed of the elements micropia, mdg3, and blastopia. In contrast to the previously described group, these three elements are very distantly related to each other as indicated by very long branch lengths leading to each element. The third clade that is found within the Gypsy group is better supported with a bootstrap value of 71. This is the most abundant clade within the D. melanogaster genome and contains, to date, 13 different families of elements.
This clade can be divided further into two well-supported lineages containing five and eight families, respectively. In addition to gypsy, burdock, HMS Beagle, and nomad, the group of five contains one novel element we have named hamilton. Previously, only LTR nucleotide sequences were available for the HMS Beagle element. Here we describe the first full-length copies of this element family. HMS Beagle is most closely related to the yoyo element first characterized in the Mediterranean fruit fly, Ceratitis capitata. As mentioned earlier, each LTR retrotransposon family that we have characterized consists of a group of nearly identical elements (~99%
identity at the nucleotide level). One exception to this is the
HMS Beagle family, which contains elements that are highly related yet show some level of phylogenetic structure (Figure 1).
HMS Beagle elements consist of two well-supported phylogenetic groups that share 97% RT identity at the amino acid level (Figure 2C).
An additional phylogenetic comparison based on the entire DNA sequence
of the HMS Beagle elements supports the conclusion that
HMS Beagle elements consist of two well-defined subgroups (Fig. 3).
BEL Group
In addition to the Gypsy and Copia clades, there is a third well-supported clade (bootstrap value = 100) that contains the BEL, mazi, and roo families. The complete sequence of BEL has been published previously (Bell et al. 1985
Aging the LTR-Retrotransposons of D. melanogaster
As described previously, LTR nucleotide identity can be used to
estimate the time of integration (SanMiguel et al. 1998) of LTR
retrotransposons and retroviruses. We have found that 125 of the LTR
retrotransposons described here have identical LTRs, whereas the
remaining 53 have low levels of nucleotide divergence. Identical LTRs
indicate that the elements have inserted recently and have not had time
to accumulate mutations between the LTRs. Using the synonymous
substitution rate for Drosophila (Li 1997) of 0.016 substitutions per site per million years and the intra-element LTR
divergence calculated here, we have calculated the integration time of
the 53 elements with LTR nucleotide divergence. The average age of
these 53 elements (30% of the 178) was found to be
137,000 ± 89,000 yr. These results are shown in Table 3 and Figure 5A.
Our data indicate that all of the
D. melanogaster LTR retrotransposons analyzed in this study
have integrated within the last 500,000 years. Moreover, the level of
divergence for most elements indicates integration times of <200,000 years.
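As an illustration of how an intra-element LTR divergence can be obtained, the sketch below computes the Kimura 2-parameter distance between two aligned LTR sequences. It is a from-scratch illustration, not the PAUP computation used in this study; the function name and the decision to skip gapped or ambiguous columns are assumptions.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(ltr5: str, ltr3: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences of equal length.

    Columns containing anything other than A, C, G, or T are skipped.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to the same length")
    sites = transitions = transversions = 0
    for x, y in zip(ltr5.upper(), ltr3.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        sites += 1
        if x == y:
            continue
        pair = {x, y}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites      # proportion of transitional differences
    q = transversions / sites    # proportion of transversional differences
    inner = (1.0 - 2.0 * p - q) * math.sqrt(1.0 - 2.0 * q)
    if inner <= 0.0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(inner)


if __name__ == "__main__":
    five_prime = "ACGT" * 100
    three_prime = "GCGT" + "ACGT" * 99  # one A->G transition in 400 sites
    print(k2p_distance(five_prime, three_prime))
```

For identical LTRs the distance is zero, which is why 125 of the 178 elements yield no age estimate other than "very recent" under this approach.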
A second method for dating TEs is to calculate the average pairwise
nucleotide identity across the complete sequences of the elements that
are very closely related at the phylogenetic level (Kapitonov and Jurka
1996; Costas and Naveira 2000). The assumption underlying this method
is that phylogenetically related elements are identical at the time of
integration and have subsequently accumulated differences attributable
to host DNA polymerase substitutions. This method also assumes that no
homogenization of the element sequences by molecular mechanisms related
to gene conversion has occurred subsequent to their integration. Most
elements we characterized were found to contain unique flanking
sequences in the DTRs (see above). This indicates that gene conversion
has not affected any of the sequence directly adjacent to the elements
since their insertion. Although it is a formal possibility that gene
conversion may have some role in homogenizing repetitive sequences,
available data indicate that the magnitude of its influence is not
sufficient to account for the degree of similarity we observe
(Nevo-Caspi and Kupiec 1996). We analyzed each independent family of
elements using this second method. The results of this independent
method of aging elements also indicate that the full-length D. melanogaster LTR retrotransposons have integrated within
the last 500,000 years (Table 2; Fig. 5B).
Therefore, both available methods of computing the age of LTR
retrotransposon integration are consistent and indicate that many
full-length LTR retrotransposons in D. melanogaster are much younger than the age of the genome in which they reside. The estimated divergence time of D. melanogaster from its closest relative,
D. simulans, is 2.3 ± 0.3 million years ago (Li et al. 1999).
| |
DISCUSSION |
|---|
|
|
|---|
We have identified 178 full-length LTR retrotransposons from the
sequenced, euchromatic component of the D. melanogaster
genome. We have characterized the D. melanogaster LTR
retrotransposons phylogenetically with respect to other known LTR
retrotransposon families. In doing so, we have identified five novel
families of LTR retrotransposons within the genome of D. melanogaster that we have named antonia,
hamilton, mazi, nik, and wolfman.
Four of these elements fall into the canonical Gypsy group of LTR
retrotransposons. mazi groups with a third well-defined group
of LTR retrotransposons present within the genome of D. melanogaster. Also found within this third group is the abundant
element roo, which we found encodes a single polyprotein that
contains all of the enzymes necessary for LTR retrotransposon
replication. We have previously characterized six families of elements
from C. elegans (Cer7-12) belonging to this
newly defined third clade (Bowen and McDonald 1999), which also
contains Pao from Bombyx mori and Tas from
Ascaris lumbricoides (Xiong et al. 1993). This group is most
closely related in structure to the Gypsy group of elements in that its
integrase gene is found downstream or 3' of reverse
transcriptase. In the Copia group, integrase is found
upstream or 5' of reverse transcriptase. Judging from its
almost equal phylogenetic distance from both Copia and Gypsy groups,
however, this third clade likely diverged at or near the time of
divergence of the Copia and Gypsy groups and represents an ancient
group of LTR retrotransposons. Additional elements belonging to this
third clade have since been characterized from the genomes of
Anopheles mosquitoes (Cook et al. 2000). Even more recently,
it has been claimed that elements belonging to this clade have been
identified in the pufferfish Fugu rubripes, the ascidian
urochordate Ciona intestinalis, and the blood fluke Schistosoma mansoni (Malik et al. 2000). Therefore, this third major group of LTR retrotransposons is likely to be widespread within
the metazoan lineage.
Most LTR retrotransposons and retroviruses contain at least one
translational frame shift following the gag gene to regulate the necessary overproduction of Gag relative to the other element proteins (Coffin et al. 1997). In addition to roo, other
elements with single ORFs include copia from D. melanogaster as well as the Gypsy group members Cer1 from
C. elegans (Britten 1995) and Tf1 from
Schizosaccharomyces pombe (Levin et al. 1990). In the case of
Tf1, a differential protein degradation process regulates the
overproduction of Gag (Atwood et al. 1996). The presence of a long,
single ORF in the roo element (Fig. 4) indicates that this
characteristic is present within all three major groups of LTR retrotransposons.
Perhaps the most intriguing result to appear from our study is the fact
that the D. melanogaster genome contains many families of
full-length LTR retrotransposons, all of which have been
transpositionally active in the very recent evolutionary past.
Interestingly, this finding is similar to what has been observed
previously for the LTR retrotransposons in S. cerevisiae
(Jordan and McDonald 1998, 1999b) and C. elegans (Bowen and
McDonald 1999). As shown in our results, the full-length LTR
retrotransposons in the D. melanogaster genome are
substantially younger than the melanogaster species itself.
Notably, the average ages of all full-length LTR retrotransposons
in yeast (<100,000 yr) (Promislow et al. 1999) and nematode (<500,000
yr) (N. Bowen, unpubl.) are also much younger than the age of the
species in which they are contained. In contrast to these findings, it
has been reported using the same criteria we have used here that
several full-length LTR retrotransposons within the ADH-region of the
maize genome are much older, having transposed in the past two to six
million years (SanMiguel et al. 1998). Likewise, the average age of
full-length HERVs (>25 million years) (Tristem 2000) is significantly
older than the age of the human species (4-6 million years) (Yang
1996; Goodman et al. 1998).
One possible explanation for these contrasting comparisons may be
differential genome size constraints placed on these species. In this
regard, Adrian Bird (Bird 1995) has postulated that large increases in
genome size are necessarily associated with increases in informational
noise. Bird believes that the evolution of global epigenetic control
mechanisms, such as methylation, was prerequisite to the significant
expansions in genome size observed over the evolutionary history of
higher eukaryotes. Although methylation is known to have a key role in
the silencing of LTR retrotransposons in plants and vertebrates (Yoder
et al. 1997), it appears to be lacking this function in many
invertebrate species, including yeast, nematodes, and
Drosophila (Russo et al. 1996). We believe that it may be for
reasons such as this that full-length LTR retrotransposons have not
accumulated over evolutionary time within these invertebrate genomes.
As a consequence of the lack of methylation-mediated silencing in
invertebrates, there would be strong selective pressure to eliminate
LTR retrotransposons from these genomes.
Evidence has been presented that supports the existence of an active
mechanism for the deletion of TEs in S. cerevisiae (Jordan and
McDonald 1999b) and Drosophila (Petrov et al. 1996). Numerous solo LTRs exist in the S. cerevisiae genome as the result of
intra-element LTR recombination, which serves to eliminate Ty
elements from the host's genome (Jordan and McDonald 1999b). In
D. melanogaster, as well as in other Drosophila
species, DNA deletions of <400 bp are thought to occur at an
astonishingly high rate within the genome, leading to a very high
incidence of DNA loss (Petrov and Hartl 1997). The level of DNA loss
attributable to deletions in Drosophila is estimated to be 75 times higher than that produced by deletions in mammals (Petrov and
Hartl 1997). Consistent with this hypothesis, many of the elements that
we have characterized contain sequence deletions when compared to the
length of the canonical elements found in the public database (Tables 1
and 3). For example, every 17.6 element that we characterized
from the D. melanogaster genome is shorter than the 7439 bp
reported for the canonical 17.6 element (Saigo et al. 1984).
These active processes that eliminate elements from genomes supply
selective pressure for these elements to continually replicate or risk
elimination (Jordan and McDonald 1999b). In turn, this results in only
young, full-length elements within these genomes. Our results indicate that the full-length elements from the melanogaster genome are very young. Further support that genome size constraints can limit the
accumulation of older retrotransposons comes from the recent characterization of BARE-1 insertion patterns in Hordeum
spontaneum (barley) (Kalendar et al. 2000). These authors have
shown that there is a positive correlation between full-length
BARE-1 elements and increased genome size in barley. They
further suggest that, if needed, selection for increased genome size
can be regulated by limiting the amount of intra-element LTR
recombination as described above for S. cerevisiae.
A final question concerns the immediate source of the full-length LTR
retrotransposons present within the D. melanogaster genome.
One possibility is that the full-length LTR retrotransposons are
descendants from older elements that have been actively eliminated from
the D. melanogaster genome or from older elements sequestered within the yet to be sequenced heterochromatin (see above discussions). An additional possibility is that the LTR retrotransposons currently present in the melanogaster genome have derived from elements recently introduced from other species via horizontal transfer. Recent
analyses of specific families of Drosophila LTR
retrotransposons indicate that horizontal transfer of LTR
retrotransposons can occur (Jordan et al. 1999; Terzian et al. 2000).
The extent of horizontal transfer and the degree to which it may have
contributed to the overall composition of LTR retrotransposons that are
present within the D. melanogaster genome remain to be determined.
Subsequent to the submission of this manuscript for publication, others
(Frame et al. 2001) have reported a similar phylogenetic characterization for the members of the BEL clade. In their report, the
element Tinker is identical to the element family that we call
mazi. Similarly, a database of repetitive elements including a
section for Drosophila has been made available by Genetic
Information Research Institute (Jurka 2000), in which individual members
of the families that we identify as antonia,
hamilton, mazi, nik, and wolfman
have been given the names Quasimodo, Gtwin,
Diver, Gypsy5, and Tabor, respectively.
| |
METHODS |
|---|
|
|
|---|
Genome Query
Searches of the entire sequenced component of the D. melanogaster genome (using Advanced BLAST,
http://www.ncbi.nlm.nih.gov/blast/blast.cgi) were initiated by
performing TBLASTN (Altschul et al. 1997) searches using
the RT amino-acid sequence of the Drosophila LTR
retrotransposons BEL (U23420), copia (M11240), and
mdg3 (X95908). Based on preliminary phylogenies we have
constructed using the RT amino acid sequences of D. melanogaster LTR retrotransposons characterized previously, these
three elements were chosen to represent the most divergent lineages.
Nucleotide sequences with homology to the RTs were then subjected to a
dot matrix (see below) analysis to reveal the presence of LTR
sequences. Accession numbers that did not contain LTRs (as revealed by
dot matrix analysis) were not included for further characterization.
The characteristic ITRs as well as the DTRs that flank the LTRs were
identified (Coffin et al. 1997). The region between LTRs was then
translated to reveal coding sequences. Subsequently, the RT of each
identified element was used to query the genome until all queries
produced TBLASTN hits that overlapped into other element families.
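The iterative querying described above is, in outline, a worklist search that stops when no query returns an unseen element. The sketch below shows only that control flow. The `search` callable is a hypothetical stand-in for a TBLASTN run plus hit filtering, and the toy hit table is invented for illustration.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, Set


def exhaustive_search(
    seeds: Iterable[Hashable],
    search: Callable[[Hashable], Iterable[Hashable]],
) -> Set[Hashable]:
    """Query with every newly found element until no query yields anything new."""
    seen = set(seeds)
    queue = deque(seen)
    while queue:
        query = queue.popleft()
        for hit in search(query):
            if hit not in seen:
                seen.add(hit)
                queue.append(hit)  # a newly found element becomes a query itself
    return seen


if __name__ == "__main__":
    # Hypothetical hit table: which elements each RT query recovers.
    toy_hits = {
        "BEL": ["roo", "mazi"],
        "roo": ["BEL"],
        "copia": ["copia"],
        "mdg3": ["blood"],
        "blood": ["412"],
    }
    found = exhaustive_search(["BEL", "copia", "mdg3"], lambda q: toy_hits.get(q, []))
    print(sorted(found))
```

Starting from the three most divergent seeds keeps the number of rounds small, since each seed is expected to reach a different part of the RT phylogeny.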
Element Characterization
Each accession number containing a match to RT was retrieved from
NCBI and ~10,000 bp on each side of the TBLASTN hit were
subjected to further analysis. Sequences were characterized using
SeqLab: The Graphical User Interface to the Wisconsin Package (GCG
1999), maintained and made accessible by the Research Computing Resource (RCR) at the University of Georgia (UGA)
(http://www.rcr.uga.edu/biosci/home.html). The dot matrix program
COMPARE was used to identify regions of identity within
each sequence. DOTPLOT was used to visualize the dot
matrices generated with COMPARE. LTRs appeared as a line
offset from and parallel to the identity diagonal. The terminal direct
repeats were characterized from the flanking sequences of the LTRs. The
terminal dinucleotides of each element LTR were also identified. RT
motif amino-acid sequences of each element and the polyprotein of
roo were predicted using TRANSLATE.
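The reason LTRs show up as a line parallel to the main diagonal in a self-comparison dot matrix can be sketched without the GCG programs. In the illustration below, every pair of identical words in the sequence contributes one dot at offset j - i, so a direct repeat produces many dots at a single offset. The function names and the word size of 12 are assumptions, and this is not how COMPARE is implemented.

```python
from collections import Counter, defaultdict
from typing import Optional, Tuple


def repeat_offsets(seq: str, word: int = 12) -> Counter:
    """Count, for each offset, the word matches a self dot plot would show there."""
    positions = defaultdict(list)
    for i in range(len(seq) - word + 1):
        positions[seq[i:i + word]].append(i)
    offsets: Counter = Counter()
    for hits in positions.values():
        # Each pair of occurrences of the same word is one off-diagonal dot.
        for a in range(len(hits)):
            for b in range(a + 1, len(hits)):
                offsets[hits[b] - hits[a]] += 1
    return offsets


def best_direct_repeat(seq: str, word: int = 12) -> Optional[Tuple[int, int]]:
    """Return (offset, dot count) of the strongest off-diagonal line, if any."""
    offsets = repeat_offsets(seq.upper(), word)
    if not offsets:
        return None
    return offsets.most_common(1)[0]


if __name__ == "__main__":
    import random

    rng = random.Random(1)
    dna = lambda n: "".join(rng.choice("ACGT") for _ in range(n))
    ltr, internal = dna(40), dna(100)
    element = dna(30) + ltr + internal + ltr + dna(30)
    # The strongest offset should equal LTR length plus internal length.
    print(best_direct_repeat(element))
```

The pairwise loop is quadratic in the number of occurrences of a word, so low-complexity sequence would need masking or a cap before this sketch is applied to real genomic clones.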
Multiple Sequence Alignments and Phylogenetic Analyses
Se-Al (courtesy of Andrew Rambaut,
andrew.rambaut{at}zoo.ox.ac.uk) was used for multiple sequence file
format manipulation and labeling. The ClustalW (Thompson
et al. 1997) extension to SeqLab (GCG 1999) and ClustalX
(Thompson et al. 1997) were used to generate nucleotide and amino acid
alignments as described previously (Bowen and McDonald 1999). The seven
conserved domains of the RT motif (Xiong and Eickbush 1988), also known as the RT ordered series of motifs (OSM) (Hudak and McClure 1999), are
shown boxed in Figure 1A. Amino-acid and nucleotide alignment files may
be obtained from the authors by request. PHYLIP (Felsenstein 1993) was used for distance calculation, tree production, and bootstrap analysis of the
multiple sequence alignments. The PROTDIST program of PHYLIP, employing the Categories model, was used to
generate distance matrices that were analyzed with the
NEIGHBOR program to generate neighbor-joining tree files.
SEQBOOT was also used to generate 100 data replicates that
were subsequently analyzed with PROTDIST (Categories
model), followed by NEIGHBOR, and finally with
CONSENSE to generate an unrooted bootstrapped tree as
presented in Figure 2B. The phylogram presented in Figure 2C was rooted
with the 1731 and copia elements. All trees generated
were visualized with TreeViewPPC version 1.5.3 (Page 1996).
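The bootstrap step can be sketched in Python as resampling alignment columns with replacement, which is the general idea behind generating data replicates with SEQBOOT. This is an illustration only and does not reproduce PHYLIP's file formats or options; the function names `bootstrap_alignment` and `bootstrap_replicates`, the dictionary representation of the alignment, and the fixed seed are all assumptions made for the example.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    `alignment` maps sequence names to aligned strings of equal length.
    Columns are drawn with replacement, and the same drawn columns are
    applied to every sequence so that each column stays intact.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols)
            for name, seq in alignment.items()}


def bootstrap_replicates(alignment, n_replicates=100, seed=0):
    """Generate `n_replicates` resampled alignments (100 by default)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Hypothetical toy alignment of three short peptide fragments.
toy = {"elem1": "YVDDMLV", "elem2": "YVDDILI", "elem3": "FVDDLLV"}
replicates = bootstrap_replicates(toy)
print(len(replicates), replicates[0])
```

Each replicate would then be passed through the same distance and neighbor-joining steps as the original alignment, and the resulting trees summarized as a consensus.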
LTR Retrotransposon Age Calculation
PAUP (Swofford 1999) was used to calculate
intra-element LTR identities and entire element family pairwise
identities using the Kimura two-parameter method. Ages were calculated
using the formula T = K/(2r), where
T = time of divergence, K = divergence, and
r = substitution rate (Li 1997). The average synonymous or silent site substitution rate used was 0.016 substitutions per site per
million years, as calculated by E.N. Moriyama from 39 genes between the
melanogaster and obscura groups, where the time of
divergence was set at 30 million years ago (Li 1997).
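The age calculation can be sketched in Python as follows. The sketch assumes the standard Kimura two-parameter distance, K = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions among compared sites; it is not PAUP's implementation, and the function names and the choice to skip gapped or ambiguous positions are assumptions. The factor of 2 in T = K/(2r) reflects that the two LTRs of an element accumulate substitutions independently after insertion.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences.

    Positions where either sequence has a gap or a non-ACGT symbol are
    skipped (an assumption of this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1    # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def ltr_age_myr(k, rate=0.016):
    """Age in million years from T = K / (2r).

    `rate` is in substitutions per site per million years; the default
    is the 0.016 value stated in the text.
    """
    return k / (2 * rate)


# Hypothetical divergence value, for illustration only.
print(ltr_age_myr(0.0032))
```

With the rate of 0.016 substitutions per site per million years, a hypothetical divergence of K = 0.0032 corresponds to roughly 0.1 million years, and an element with identical LTRs (K = 0) has an estimated age of zero.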
| |
ACKNOWLEDGMENTS |
|---|
We are grateful to Drs. John Avise, Susan Wessler, Kelly Dawe, Michael Bender, and members of our laboratory for comments on earlier drafts of this manuscript. We thank Maney Mazloom for assistance in searching GenBank for Drosophila elements. This work was supported by a National Institutes of Health grant to J.F.M.
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
| |
FOOTNOTES |
|---|
1 Present Address: Section on Eukaryotic Transposable Elements, Laboratory of Gene Regulation and Development, National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD 20892, USA.
2 Corresponding author.
E-MAIL mcgene{at}arches.uga.edu; FAX (706) 542-3910.
Article published on-line before print: Genome Res., 10.1101/gr.164201.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.164201.
| |
REFERENCES |
|---|
|
|
|---|
Received December 28, 2000; accepted in revised form June 4, 2001.