Published online before print
December 14, 2001, 10.1101/gr.196001
Vol. 12, Issue 1, 122-131, January 2002
LETTER
ABSTRACT
The Athila retroelements of Arabidopsis thaliana encode a putative envelope gene, suggesting that they are infectious retroviruses. Because most insertions are highly degenerate, we undertook a comprehensive analysis of the A. thaliana genome sequence to discern their conserved features. One family (Athila4) was identified whose members are largely intact and share >94% nucleotide identity. As a basis for comparison, related elements (the Calypso elements) were characterized from soybean. Consensus Calypso and Athila4 elements are 12-14 kb in length and have long terminal repeats of 1.3-1.8 kb. Gag and Pol are encoded on a single open reading frame (ORF) of 1801 (Calypso) and 1911 (Athila4) amino acids. Following the Gag-Pol ORF are noncoding regions of ~0.7 and 2 kb, which, respectively, flank the env-like gene. The env-like ORF begins with a putative splice acceptor site and encodes a protein with a predicted central transmembrane domain, similar to retroviral env genes. RNA of Athila elements was detected in an A. thaliana strain with decreased DNA methylation (ddm1). Additionally, a PCR survey identified related reverse transcriptases in diverse angiosperm genomes. Their ubiquitous nature and the potential for horizontal transfer by infection implicate these endogenous retroviruses as important vehicles for plant genome evolution.
INTRODUCTION
Retrotransposons and retroviruses (collectively referred to as
retroelements) replicate by a common mechanism of
reverse transcription (for review, see Coffin et al. 1997).
Retroelement genomes are delimited by direct long terminal
repeats (LTRs), and they encode gag and pol genes,
whose products form a particulate replication intermediate wherein
reverse transcription takes place. The primary distinguishing feature
between the retrotransposons and retroviruses is that the latter have a
third gene called envelope (env). env encodes a transmembrane protein that associates with the cell membrane. The replication intermediate buds from the cell as a membrane-bound virion, and Env extends from the virion surface and
interacts with cellular receptors to mediate infection.
Phylogenetic relationships based on reverse transcriptase amino acid
sequences identify six distinct lineages of retroelements (Xiong and
Eickbush 1990; Malik 2000). One of these, the vertebrate retroviruses,
encodes env genes and is infectious. The five remaining groups consist
mostly of retrotransposons and include the well-studied Ty1-copia
(Pseudoviridae) and Ty3-gypsy (Metaviridae) elements (van Regenmortel et
al. 2000), the so-called DIRS1 and BEL groups, and the caulimoviruses
(Malik et al. 2000). With the exception of the caulimoviruses and the
sparsely populated DIRS1 group, some members of each lineage encode
open reading frames (ORFs) with env-like features, most notably
transmembrane domains. These include a large number of invertebrate
Ty3-gypsy elements (e.g., gypsy, 17.6, 297, and ZAM from Drosophila
melanogaster; TOM from Drosophila ananassae; TED from Trichoplusia ni;
Yoyo from Ceratitis capitata; for review, see Lerat and Capy 1999), two
Ty1-copia elements from plants (i.e., SIRE-1 from Glycine max [soybean]
and Endovir from A. thaliana; Laten et al. 1998; Kapitonov and Jurka
1999; Peterson-Burch et al. 2000), and several BEL group elements
(e.g., Tas from Ascaris lumbricoides and Cer7 from Caenorhabditis
elegans; Felder et al. 1994; Bowen and McDonald 1999). Analyses of
env-like genes from the various retroelement groups suggest that env was
independently acquired from viruses multiple times during evolution.
The env-like ORFs of several insect Ty3-gypsy elements are closely
related to env of the baculoviruses, and for some Cer elements, the
env-like gene is related to env of the phleboviruses (Malik et al.
2000). Despite the widespread presence of env-like ORFs and their
similarity to known viral env genes, gypsy of D. melanogaster is the
only known retroelement outside of the retroviruses for which Env is
known to play a role in infection (Kim et al. 1994; Song et al. 1994).
In our analysis of the A. thaliana genome sequence, we determined that
Athila, a degenerate, centromere-associated retroelement (Pelissier et
al. 1995, 1996; Copenhaver et al. 1999), is a Ty3-gypsy group
retrotransposon with an env-like ORF (Wright and Voytas 1998). A related
element called Cyclops-2 was also described in Pisum sativum (pea;
Chavanne et al. 1998). Because Cyclops-2 was less degenerate than Athila
and prevalent in related legumes, we sought potential functional
homologs in soybean. The soybean elements, called Calypso, encode an
env-like gene that shares 29% amino acid identity with the
corresponding gene of Cyclops-2 (Peterson-Burch et al. 2000). This
suggests that the env-like ORF has evolved under functional constraint
and likely plays a role in the life cycle of these elements. For
simplicity, we refer to Athila and related retroelements as endogenous
retroviruses, with the understanding that the biological role of their
env-like genes remains to be determined. The sequence degeneracy of the
endogenous plant retroviruses described to date has frustrated attempts
to define their structural features. However, further characterization
of the soybean Calypso elements and completion of the A. thaliana
genome sequence have enabled us to construct consensus elements that
likely approximate functional elements. Here we report a detailed
description of these endogenous retroviruses and provide evidence of
their widespread distribution in higher plants.
RESULTS
Athila Elements of A. thaliana
To further characterize the A. thaliana Athila elements,
reverse transcriptases from all Ty3-gypsy elements were
recovered from the A. thaliana genome sequence (Initiative 2000).
BLAST searches (Altschul et al. 1990) were performed with reverse
transcriptases from Athila1-1, Tat4-1, and Tma3-1, three divergent
A. thaliana Ty3-gypsy elements (Fig. 1; Wright and Voytas 1998).
Additional BLAST searches were performed with the most divergent
retroelement sequences recovered. A total of 191 unique reverse
transcriptases were identified. These were aligned, and when necessary,
conservative changes were made to correct frameshift mutations. A
phylogenetic tree was generated by the neighbor-joining method (Fig. 1;
Saitou and Nei 1987). The elements clustered into three distinct clades
designated the classic, Tat, and Athila lineages.
Phylogenetic analysis further resolved the Athila elements
into clades, which we designated as distinct families (Fig. 1). These
included the previously described Athila1 family (Wright and
Voytas 1998) and six additional families, designated
Athila4-Athila9. The Athila, Athila2, and Athila3 families are not
included in the tree, because they have deletions of reverse
transcriptase (Pelissier et al. 1995; Wright and Voytas 1998). Elements
in four of the seven families had potential coding regions flanking
reverse transcriptase and discernible LTRs (Athila1, Athila4,
Athila5, and Athila6). Relatively intact insertions
were given species designations (e.g., Athila1-1, Fig. 1). The
Athila4 family was the largest and included 22 members. Six of
these (designated Athila4-1 to Athila4-6) were ~14 kb in length and had
LTRs of ~1.8 kb (Fig. 2). Athila4-3 and
Athila4-4 were organized in tandem and shared a central LTR.
The tandem Athila4-3/Athila4-4 insertion and the
individual Athila4 elements were flanked by 5-bp target-site duplications (data not shown). In pairwise comparisons, the six Athila4 elements averaged 94% nucleotide identity across
their entirety. Despite this high degree of sequence identity,
gag and pol were broken by stop codons and
frameshifts.
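As a rough illustration of how an average pairwise identity such as the 94% figure above can be computed, the following sketch scores every pair of pre-aligned sequences. The function names, the toy sequences, and the decision to skip columns containing a gap are assumptions made for illustration only; they are not the procedure or the data used in this study.

```python
from itertools import combinations


def percent_identity(a, b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Assumption: columns in which either sequence has a gap are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        matches += (x == y)
    return 100.0 * matches / compared if compared else 0.0


def mean_pairwise_identity(aligned):
    """Average percent identity over all unordered pairs of sequences."""
    pairs = list(combinations(aligned, 2))
    return sum(percent_identity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignment, not real Athila4 sequence.
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACG-ACGT"]
print(round(mean_pairwise_identity(toy_alignment), 1))
```

With six elements, as for Athila4-1 to Athila4-6, the average would be taken over 15 pairwise comparisons.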
Calypso Elements of Soybean
In the initial description of the Cyclops-2 element from
pea, related DNA sequences (based on Southern hybridizations) were found to be abundant in other legumes, including soybean (Chavanne et
al. 1998). Cyclops-2 homologs were recovered from soybean by screening a
genomic phage library using the Cyclops-2 reverse transcriptase as a
hybridization probe. Sixty-three hybridizing phage were characterized, 35 of which were unique based on restriction endonuclease mapping (data not shown). Each of these latter clones was
partially sequenced, and 24 had identifiable amino acid sequence
similarity to Cyclops-2 and Athila (data not shown). The coding regions
of these 24 elements, however, were replete with stop codons,
frameshifts, deletions, and insertions. Five of the least degenerate
elements (designated Calypso1-1, Calypso2-1, Calypso3-1, Calypso4-1, and
Calypso5-1) were sequenced (Fig. 3). Despite being highly degenerate,
each had discernible features such as LTRs and coding regions with
similarity to gag, pol, and the env-like gene of Cyclops-2. In the case of Calypso2-1, the 5'
LTR depicted in Figure 3 is the 3' LTR of a second Calypso
element that inserted within Calypso2-1.
Calypso5-1 contained a 1.8-kb insertion within its reverse
transcriptase, with flanking 5-bp target-site duplications
and end sequences suggesting it is a retroelement solo LTR (Fig. 3;
data not shown). Despite the high level of sequence degeneracy, the
reverse transcriptases of the five Calypso elements shared, on
average, 81% amino acid identity.
Features of Athila4 and Calypso Elements
For most retroelements, the region adjacent to the 5' LTR is
complementary to a cellular tRNA and serves as the site for priming minus-strand DNA synthesis. The primer binding site (PBS) of
Athila4 and Calypso is complementary to the 3' end of
the aspartic acid tRNA for the GAC codon from A. thaliana and
soybean (Fig. 4A; Waldron et al. 1985; Wright and Voytas 1998).
Complementarity begins at variable positions from the boundary of the
5' LTR and extends for 13 bases in the Athila4 elements and for 18 or
19 bases in individual Calypso elements. For most retroelements, a
stretch of purines adjacent to
the 3' LTR serves as the priming site for plus-strand DNA synthesis. A
polypurine tract (PPT) is found at this location in Athila4
and Calypso, and all of the endogenous plant retroviruses
share a conserved core consensus sequence (TTTGGGGG), as well as less
conserved flanking sequences (Fig. 4B). A second PPT motif (PPT1) is
found after the env-like gene. The two PPTs delimit a large
noncoding region, which in Athila averages ~2 kb in length
(see Figs. 2, 3). A second noncoding region lies between gag-pol
and the env-like gene and approximates 0.7 kb.
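The notion of a PBS "complementary to the 3' end" of a tRNA can be made concrete with a small sketch: the PBS, read on the element's plus strand, should equal the reverse complement of the last k bases of the tRNA. The code below looks for the longest such k. The function names, the minimum length of 8, and both sequences are hypothetical placeholders chosen for illustration (the tRNA is written in the DNA alphabet); none of them come from the study.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_pbs(element, trna, min_len=8):
    """Return (start, length) of the longest stretch of `element` that is
    complementary to the 3' end of `trna`, or None if no stretch of at
    least `min_len` bases is found.

    The candidate site for k bases is the reverse complement of the last
    k bases of the tRNA, so each longer site extends the shorter one at
    its 3' end; once a site is missing, longer ones cannot be present.
    """
    element = element.upper()
    best = None
    for k in range(min_len, len(trna) + 1):
        site = reverse_complement(trna[-k:])
        pos = element.find(site)
        if pos == -1:
            break
        best = (pos, k)
    return best


# Hypothetical toy sequences, not a real tRNA or retroelement.
toy_trna = "ACGTTGCAGGCTCGCCA"
toy_element = "AAAA" + "TGGCGAGCCTGCA" + "TTTT"
print(find_pbs(toy_element, toy_trna))
```

In this toy case the match starts four bases into the element and runs for 13 bases, the same length reported above for the Athila4 PBS; the agreement is by construction of the example, not a result.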
Because of the large number of frameshifts and stop codons in
Calypso coding sequences, a quasiconsensus Calypso
element was generated. Additionally, a strict Athila4
consensus sequence was generated, which was possible because of the
high degree of sequence homogeneity. Figure 5A depicts the structural
organization of these consensus elements, as well as Cyclops-2 from pea
(Chavanne et al. 1998) and three partially sequenced homologs:
Diaspora from soybean, BAGY-2 from barley (Shirasu et al. 2000), and a
degenerate element from rice that we identified from
the rice genome sequence data. The consensus Athila4 and
Calypso elements encode Gag and Pol on a single ORF of 1911 and 1801 amino acids, respectively. These coding regions were aligned
with Gag-Pol of Cyclops-2, and the percent amino acid identity
was plotted along their entirety (Fig. 5A). The first third of the ORFs
shares ~20% amino acid identity, and we define this region as Gag
(~600 amino acid [aa], Fig. 5A). The Calypso and
Cyclops-2 Gag proteins encode a conserved finger domain
characteristic of retrotransposon and retroviral nucleocapsid proteins
(Fig. 5B). This motif is not present in any of the other elements
examined. A block of ~110 amino acid residues is conserved near the N
terminus of Gag, suggesting a conserved function. Similarity to this
region can be detected in the sequence of Diaspora and the
rice element but not in BAGY-2 (data not shown).
Following Gag is a motif (LI/CDLGA) that we believe is the active site
of an aspartic acid protease (Fig. 5B). We define protease as the
region of ~40% amino acid identity that spans ~300 amino acid
residues between Gag and reverse transcriptase (depicted in light gray,
Fig. 5A). Although we do not know the precise boundaries of protease,
this region is considerably larger than the proteases of
retrotransposons and retroviruses (e.g., 181 aa for Ty1, 99 aa for HIV;
Merkulov et al. 1996; Coffin et al. 1997). Following protease are ~520
amino acids that comprise reverse transcriptase. Reverse transcriptase
shares ~68% amino acid identity among elements. All seven conserved
amino acid sequence domains characteristic of retroviral and
retrotransposon reverse transcriptases are evident (depicted in gray,
Fig. 5A). The remainder of Gag-Pol constitutes an ~450 amino acid
integrase (depicted in dark gray, Fig. 5A). In addition to the
conserved N-terminal zinc-binding motif and the DD35E motif of the
catalytic domain, integrase has a C-terminal extension with a GPY/F
module (Fig. 5B; Malik and Eickbush 1999). The GPY/F module is found in
some retroviral and Ty3/gypsy element integrases and is
thought to bind DNA. Integrase shares ~64% amino acid identity among
Athila4, Calypso, and Cyclops-2.
Features of the env-like Gene
After gag and pol and between the two noncoding
regions, the Athila4 and Calypso consensus elements
encode ORFs of 619 and 420 amino acids, respectively (Fig. 6A).
Recognizable env-like ORFs are also found in members of the Athila,
Athila1 through Athila6, and Athila9 families (data not
shown). The env-like ORFs of Athila2, Athila3, Athila4, and Athila6 share an
average of 69% amino acid sequence identity in pairwise comparisons
(data not shown). The Athila1 and Athila5 elements
are divergent (Fig. 1), and their env-like ORFs do not align
well with the other Athila families. The consensus
Calypso env-like gene shares 29% amino acid sequence identity
with the env-like gene of Cyclops-2 (Peterson-Burch et al. 2000).
Between the pea/soybean and A. thaliana elements,
no significant amino acid sequence similarity was observed.
Retroviral Env proteins are typically transported through
the endomembrane system, where they are proteolytically cleaved to
generate surface (SU) and transmembrane (TM) proteins prior to being
released on the cell surface (Coffin et al. 1997). Targeting to the
endomembrane system is mediated by a signal sequence at the N terminus
of Env. The N termini of the Calypso and Cyclops-2 Env-like proteins are basic in nature (Fig. 6B). Additionally, the N
termini of Athila4 and Cyclops-2 are serine-rich. The
program PSORT predicts a variety of destinations for the
Env-like proteins within the cell (Nakai and Kanehisa 1992). The most
confident predictions are for Calypso2-1 and
Athila4-1, which suggest targeting to the plasma membrane
(70% confidence) and endoplasmic reticulum (85% confidence), respectively.
At the cell surface, the retroviral TM protein spans the plasma
membrane. We previously reported a predicted transmembrane domain in
the env-like ORFs of several Athila elements
(Athila, Athila1, Athila2, and Athila3; Wright and Voytas 1998). The
Athila4 consensus env-like ORF also encodes a transmembrane domain
(TM1, Fig. 6A-C), to which the program TMpred assigns a
score of 2006 (scores above 500 are considered significant; Hofmann and
Stoffel 1993). Similarly, a transmembrane domain is predicted near the
center of the Calypso env-like ORF (TMpred value of 947; Fig. 6A,B and data not shown). The Cyclops-2 env-like protein has a potential transmembrane domain at a similar location, but
at a reduced confidence level relative to the other elements (TMpred value of 650; Fig. 6A,B and data not shown).
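To give a feel for how a hydrophobic, potentially membrane-spanning segment can be flagged in a protein sequence, the sketch below runs a sliding-window scan with the Kyte-Doolittle hydropathy scale. This is explicitly not the TMpred method used above, and its scores are not comparable to TMpred values; the 19-residue window and the 1.6 cutoff are a commonly cited heuristic adopted here as assumptions, and the test protein is a hypothetical toy sequence.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(protein, window=19):
    """Mean hydropathy of every full-length window along the protein."""
    protein = protein.upper()
    scores = []
    for start in range(len(protein) - window + 1):
        segment = protein[start:start + window]
        scores.append(sum(KD[aa] for aa in segment) / window)
    return scores


def candidate_tm_windows(protein, window=19, cutoff=1.6):
    """(start, score) for windows whose mean hydropathy meets the cutoff."""
    return [
        (start, score)
        for start, score in enumerate(hydropathy_windows(protein, window))
        if score >= cutoff
    ]


# Hypothetical toy protein: a 19-residue leucine stretch flanked by lysines.
toy_protein = "K" * 10 + "L" * 19 + "K" * 10
scores = hydropathy_windows(toy_protein)
peak = max(range(len(scores)), key=scores.__getitem__)
print(peak, round(scores[peak], 2))
```

A dedicated predictor such as TMpred additionally models helix orientation and uses matrices derived from known membrane proteins, so a simple scan like this is only a first-pass screen.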
In our analysis of the Athila4 env-like gene, we noticed the
potential to encode additional transmembrane domains after the stop
codon. Strong transmembrane domains were predicted in either the same
frame as the env-like ORF (TM2, Fig. 6A-C) or in the +1 frame
(TM3, Fig. 6A-C). These potential coding regions extend the
env-like ORF to the first polypurine tract (PPT1) and are conserved among some element families (Fig. 6B). Small ORFs with predicted transmembrane domains are also found at the end of the Calypso and Cyclops-2 env-like ORFs. In the consensus
Calypso element, the ORF is in a -1 frame, although the
degree of degeneracy among Calypso elements reduces confidence
in this reading frame assignment. Unfortunately, sequences between
Athila families were too divergent to ascertain whether the
short ORFs are evolving as coding sequences based on frequencies of
synonymous versus nonsynonymous substitutions (data not shown).
Retroviral env genes are typically expressed from a spliced,
subgenomic mRNA (Coffin et al. 1997). The Calypso
env-like ORF has a predicted splice-site acceptor sequence
located at the first methionine, to which the program
NetGene2 assigns a confidence level of 100% (Fig. 6D;
Brunak et al. 1991; Hebsgaard et al. 1996). Although there are other
favorable splice acceptors in the vicinity of the Calypso
env-like ORF, only the putative acceptor at the first methionine is
conserved (Fig. 6D). For the Athila elements, a number of
possible splice acceptors are present near the beginning of the
env-like gene, one of which is located just before the first
methionine and is consistently predicted with a high level of
confidence (>94%, Fig. 6D). In the animal retroviruses, the splice-site donor is typically located near the 5' LTR or within Gag.
Of the several possible donors in these regions, none are well
conserved between element families (data not shown).
Distribution of Endogenous Retroviruses in Plants
To assess the distribution of the endogenous retroviruses, a set of degenerate primers was designed based on conserved sequences flanking the seven domains of the Athila4, Cyclops-2, and Calypso reverse transcriptases. Genomic DNAs were surveyed by PCR from 18 plant species, including several dicots (Gossypium hirsutum, cotton; Platanus occidentalis, sycamore; Lycopersicon esculentum, tomato; Solanum tuberosum, potato; and Nicotiana tabacum, tobacco), old-world monocots (Oryza sativa, rice; Avena sativa, oat; Secale cereale, rye; Hordeum vulgare, barley; Triticum aestivum, wheat; and Sorghum bicolor, sorghum), new-world monocots (Zea mays, corn; Zea mays ssp. parviglumis, teosinte; a Tripsacum species), and a gymnosperm (Pinus coulteri, pine). A. thaliana, soybean, and pea served as positive controls. PCR products were cloned and at least three independent clones were sequenced from each species. Most of the PCR products from dicots and old-world monocots encoded reverse transcriptases that shared >60% amino acid identity. In contrast, the new-world monocots and the single gymnosperm surveyed yielded only reverse transcriptases from more distantly related elements (data not shown). The dicot reverse transcriptases had numerous stop codons and insertions/deletions, whereas sequences from the old-world monocots were considerably less degenerate. The most intact reverse transcriptases were from oat, rye, and barley, which shared >85% nucleotide identity across species. All nucleotide and amino acid sequences were aligned, making it possible to identify and correct frameshifts. A neighbor-joining tree was constructed from these reverse transcriptases, and representative Tat elements were used as an outgroup (Fig. 7). The endogenous retroviruses clustered on a single branch, and with few exceptions (e.g., Diaspora from soybean), elements from a single species clustered together.
Athila4 Elements Are Expressed in a Methylation-Deficient Strain
The A. thaliana Athila elements are preferentially located
within heterochromatin flanking the centromeres (Pelissier et al. 1996;
Initiative 2000). These regions contain repeated sequences that are
methylated and likely transcriptionally quiescent (Jeddeloh et al. 1998;
Consortium 2000). Some Athila group elements and retrotransposons are expressed in genetic backgrounds, such as ddm1, which have reduced levels of DNA methylation (Hirochika et al. 2000; Steimer et al. 2000; Lindroth et al. 2001). We sought Athila4 mRNAs by RT-PCR in ddm1 backgrounds, using
five different Athila4 primers and a poly(T) primer/adaptor.
Fifteen separate Athila cDNAs were cloned and sequenced: eight
were Athila4 elements, four were Athila6 elements,
and three could not be easily assigned to a family because of sequence
degeneracy (Fig. 8). No transcripts were
recovered from a wild-type strain. All 15 transcripts terminated within
a 200-bp window of a consensus Athila LTR. One
Athila4 cDNA was primed with a gag primer and was 8.4 kb in length. A portion of this clone (1.8 kb) was sequenced and
matched Athila4-6, except for a single base change, which
could be the result of a PCR-induced error. No spliced transcripts were
detected.
DISCUSSION
We previously reported that the A. thaliana Athila
retroelements have a novel feature, a putative env gene,
that may enable them to be infectious (Wright and Voytas 1998).
Homologs of Athila elements have been described in other plant
species (e.g., Cyclops-2 of pea, Chavanne et al. 1998;
BAGY-2 of barley, Shirasu et al. 2000), all of which are
replete with deletions, rearrangements, or stop codons. To ascertain
conserved features of these endogenous plant retroviruses, we analyzed
Athila elements in the completed A. thaliana genome
sequence. We also recovered Athila homologs from soybean, the
so-called Calypso elements. By generating consensus sequences
from degenerate insertions, we were able to identify features that
likely define a functional element.
Shared Features Among Plant Endogenous Retroviruses
The characterized plant endogenous retroviruses range from 12 to 14 kb in length and have LTRs ranging from 1.3 to 1.8 kb, among the
largest LTRs described to date for Ty3/gypsy elements. Like
many plant retroelements, Gag and Pol are encoded on a single ORF. One
striking feature among Athila4, Calypso, and
Cyclops-2 is the high degree of sequence conservation of
pol. Between these elements, reverse transcriptase and
integrase, respectively, share ~68% and 64% amino acid identity.
Because reverse transcription is error prone and often leads to
accelerated rates of sequence evolution (Gabriel and Mules 1999), this
suggests that either Pol is under very tight functional constraints or
that the elements have invaded their plant hosts relatively recently.
The phylogenetic tree of A. thaliana Ty3/gypsy
elements provides some support for the recent acquisition of
Athila elements. The short branch lengths supporting the
Athila and Tat element groups suggest they share a
more recent common ancestor relative to classic Ty3/gypsy
element families (see arrows in Fig. 1). Because the Athila
elements encode an env-like ORF, horizontal transfer by
infection is one possibility for the apparent difference in their
evolutionary history.
In contrast to pol, gag shows higher levels of
sequence divergence. This is typical of retroelement gag
genes, whose products carry out structural roles. Nonetheless,
Calypso and Cyclops-2 Gag have conserved finger
motifs characteristic of nucleocapsid proteins, and all three elements
have a conserved domain near the Gag N terminus. Gag averages 675 amino
acid residues (measured from the first methionine to the active site of
protease), which is larger than most classic plant Ty3/gypsy
element Gag proteins (e.g., Reina, 482 aa; Avramova et al. 1996). If the
endogenous retroviruses are infectious, Gag may carry out
functions related to transmission. Many plant viruses encode movement
proteins that transport viral nucleic acids from cell to cell (Ghoshroy
et al. 1997) or factors that facilitate spread by insect vectors
(Woolston et al. 1983). These proteins are typically not
well-conserved, and no similarity to the Gag proteins of the endogenous
retroviruses is evident.
Another characteristic feature of the endogenous retroviruses is the
presence of two large noncoding regions that flank the env-like ORF. The upstream region approximates 0.7 kb, and the downstream region approximates 2 kb. In most retroelements, noncoding sequences are very small, and it is generally thought that extraneous sequences are lost to maximize the amount of genetic information that
can be encoded within an element. The conservation of noncoding domains
among the endogenous retroviruses suggests they play a role in
replication. Possibilities include regulating gene expression (either
transcription or translation) or facilitating expression of the
env-like ORF (e.g., in splicing or in enabling internal ribosome entry). Of the two noncoding regions, the 3' region is flanked
by conserved polypurine tracts (PPTs), which might serve as priming
sites for plus-strand DNA synthesis during reverse transcription.
Multiple PPTs are found in other retroelements such as Ty1 and HIV,
although in these elements, the upstream PPT resides within pol
(Hungnes et al. 1993; Heyman et al. 1995). A third, small noncoding
region is also found between the 5' LTR and the start of the gag-pol
ORF. This region carries the putative primer binding site (PBS) for
minus-strand DNA synthesis, which is complementary to an Asp tRNA. This
is a distinguishing feature of the endogenous retroviruses, for the
classic Ty3/gypsy elements and the Ty1/copia group
elements have PBSs complementary to initiator Met tRNAs, and
the Tat elements have PBSs complementary to Asn, Lys, and Arg tRNAs
(Wright and Voytas 1998; D.A. Wright, unpublished observation).
The env-Like ORF and Its Potential Role in Infection
We previously concluded that the env-like genes of the
endogenous retroviruses likely play a functional role in replication, based on sequence conservation between the Cyclops-2 and
Calypso env-like genes (Peterson-Burch et al. 2000).
With the availability of the A. thaliana genome sequence,
additional Athila env-like genes made it possible to discern
conserved features. Computer models predict the Env-like proteins are
expressed from a spliced subgenomic mRNA. The Env-like proteins are
also predicted to encode a central transmembrane domain. Env-like
proteins of animal retroviruses often have both central and C-terminal
transmembrane domains, the latter of which anchors Env within the
endoplasmic reticulum. In most endogenous plant retroviruses, there is
a short ORF after the env-like gene that is predicted to
encode a transmembrane domain and could serve an anchoring role.
Expression of the short ORF as part of Env would require read-through
of a stop codon. Alternatively, because a transmembrane domain is also
encoded in adjacent reading frames, ribosomal frameshifting may be
employed. Attempts to determine if this region is evolving as a coding
sequence were not productive because of the high degree of sequence
divergence between element families. As other endogenous plant
retroviruses are identified, it will be of interest to determine
whether they too have this short transmembrane domain-encoding ORF. A
functional element will be required to determine experimentally whether
it has a biological role.
If the endogenous retroviruses are infectious, then the Env-like
protein is likely important in this process. During infection by
retroviruses, Env facilitates the merging of the membrane-bound virion
with the target cell. The plant cell wall poses an obstacle to
membrane-mediated infection. Nonetheless, enveloped plant viruses do
exist, including members of Bunyaviridae and the
Rhabdoviridae (van Regenmortel et al. 2000). These viruses bud
from the endomembrane system and accumulate in the cell until a feeding
invertebrate ingests them and carries them to another plant. Recent
work has shown that some animal retrotransposons have acquired env
genes from viruses (Malik 2000). For example, the env gene
of the D. melanogaster gypsy element is related to env
of the baculoviruses and was likely acquired by gypsy
through transduction. To date, however, we have not identified
similarity between the env-like ORFs of the endogenous plant
retroviruses and those of viruses or other genes in the databases. It
should be mentioned that some plant Ty1/copia group
retrotransposons have env-like ORFs (Laten et al. 1998, 1999;
Kapitonov and Jurka 1999; Peterson-Burch et al. 2000). These genes are
unrelated to the env-like genes of Athila and its
homologs, but they are predicted to be transmembrane proteins. It is
tempting to speculate that env-like genes play a similar role
in both groups of elements.
Distribution of Endogenous Retroviruses in Plants
Using a PCR-based assay, we found that endogenous retroviruses are widely distributed among angiosperms. The recovered reverse transcriptases were strikingly similar and shared >60% amino acid identity. This high degree of sequence conservation belied the fact that most carried mutations, the exception being elements from cereals, namely oat, rye, and barley. The integrity of the cereal reverse transcriptases implies that these elements have undergone more recent episodes of replication, and to date, they are the best candidates for functional endogenous retroviruses. Elements were not recovered from a gymnosperm (pine) or from the three new-world monocot species tested (corn, teosinte, and Tripsacum). It may be that the endogenous retroviruses are not present in the genomes of these plants or that they are divergent and cannot be amplified by the primers. Phylogenetic analyses of the reverse transcriptases indicated that, with few exceptions, the relationships among the elements reflected relationships among their hosts. This suggests either that the endogenous retroviruses are inherited vertically or that, if they are viruses, they have a limited host range. As more plant genomes are characterized in greater detail, it will be of interest to determine whether high levels of sequence conservation are a general feature of the endogenous plant retroviruses. This will help address the question of whether they are young retroelements relative to the classic Ty3/gypsy elements.
Expression and Activity of A. thaliana Athila Group Elements
Most A. thaliana Athila elements are located within
centromeric heterochromatin, which is typically highly methylated
(Vongs et al. 1993; Pelissier et al. 1996; Copenhaver et al. 1999).
Methylation is thought to control transposable element activity (Yoder
et al. 1997; Martienssen 1998), and several recent studies in plants have shown that decreases in DNA methylation are associated with increased transposable element activity (Hirochika et al. 2000; Lindroth et al. 2001; Miura et al. 2001; Singer et al. 2001). Of
particular relevance to this study, truncated Athila
transcripts have been reported in strains with mom1
mutations, which derepress transcriptionally silent loci (Amedeo et
al. 2000; Steimer et al. 2000).
We performed RT-PCR on RNAs isolated from ddm1 plants and were able to amplify cDNA from Athila4 and Athila6 elements, two of the most intact Athila families. Transcripts terminated at a similar position within the LTR, thereby defining the LTR R/U5 boundary. cDNA as large as 8.4 kb was recovered; however, no spliced messages were identified. Although Athila elements are expressed in ddm1 backgrounds, they are probably not replicating because of sequence degeneracy. For future studies, it will be important to identify a functional Athila group element. We envision two approaches for how this might be accomplished: 1) a consensus Athila4 element could be constructed or 2) elements could be further characterized from species such as the small grains that appear to have structurally intact elements. The identification of a replication-competent Athila group element will be necessary to test the hypothesis that these elements are infectious plant retroviruses. If this proves to be the case, the Athila group elements may be useful as vectors for gene transfer and the genetic modification of plants.
| |
METHODS |
|---|
|
|
|---|
DNA Manipulations and Filter Hybridizations
A soybean genomic phage library (Chen et al. 1998) was screened with a reverse transcriptase probe under low stringency conditions (50°C with a 1% SDS wash; Ausubel et al. 1987). The probe was obtained by PCR amplification of Pisum sativum DNA using primers based on the Cyclops-2 reverse transcriptase (DVO701 5'-CCG-TCA-TCC-GGA-ATG-ACA-AGG-ATG and DVO702 5'-ACG-GAT-GAG-CCT-TTG-CTT-CGA-ATC). Phage subclones were sequenced by primer walking. Genomic DNAs from 18 plant species (see Results) were surveyed by PCR to identify Athila-group reverse transcriptases. DNAs were prepared using genomic tips and protocols supplied by Qiagen. Degenerate primers were designed based on two conserved amino acid sequence motifs flanking the seven core domains of reverse transcriptase (Xiong and Eickbush 1990): VRKEVLKL (DVO1197 5'-GTG-CGN-AAR-GAR-GTN-NTN-AAR-YT) and FIKDFSKV (DVO1198 5'-AAC-YTT-NGW-RAA-RTC-YTT-DAT-RAA). PCR was performed in 50 µL reactions with ~100 ng genomic DNA, 3 µmole of each primer, 2.5 units Taq DNA polymerase, 1× Taq buffer (Promega), and 2.5 mM MgCl2. PCR was performed for 30 cycles under the following conditions: 92°C for 20 sec, 50°C for 30 sec, and 72°C for 90 sec. The PCR products were purified on low-melting agarose gels and cloned into T-vector prepared from pBluescript II KS- (Hadjeb and Berkowitz 1996). Athila-group reverse transcriptases were sequenced in their entirety from vector-based primers.
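The degenerate primers above can be checked computationally. The following is a minimal sketch, not part of the original methods, that counts how many distinct oligonucleotides each degenerate primer represents and reverse-complements DVO1198 so that it can be read in the coding orientation. The function names (`clean`, `degeneracy`, `reverse_complement`) are hypothetical; the base-ambiguity table is the standard IUPAC nucleotide code.

```python
from math import prod

# Standard IUPAC nucleotide codes and the bases each code stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement of every IUPAC code (for example, D = A/G/T pairs with H = T/C/A).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def clean(primer: str) -> str:
    """Strip the triplet hyphens used in the text and upper-case the sequence."""
    return primer.replace("-", "").upper()


def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in clean(primer))


def reverse_complement(primer: str) -> str:
    """Reverse complement that preserves IUPAC ambiguity codes."""
    return clean(primer).translate(COMPLEMENT)[::-1]


# Primer sequences as given in the text.
DVO1197 = "GTG-CGN-AAR-GAR-GTN-NTN-AAR-YT"
DVO1198 = "AAC-YTT-NGW-RAA-RTC-YTT-DAT-RAA"

if __name__ == "__main__":
    print(degeneracy(DVO1197))          # 4096
    print(degeneracy(DVO1198))          # 768
    print(reverse_complement(DVO1198))  # TTYATHAARGAYTTYWCNAARGTT
```

Read in triplets, the reverse complement of DVO1198 (TTY-ATH-AAR-GAY-TTY-WCN-AAR-GTT) corresponds to the FIKDFSKV motif, with WCN allowing either serine or threonine at the sixth position.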
Sequence Analysis
DNA sequence analysis was performed using the GCG software package (Devereux et al. 1984), DNA Strider 1.2 (Marck 1991), and the BLAST search tool (Altschul et al. 1990). Phylogenetic relationships were determined by the neighbor-joining distance algorithm using PAUP v4.0 beta 4a (Saitou and Nei 1987; Swofford 1991) and were based on reverse transcriptase amino acid sequences that had been aligned with CLUSTALX v1.63b (Thompson et al. 1994). Transmembrane helices were identified using the PHDhtm program and TMpred (Hofmann and Stoffel 1993; Rost et al. 1995). Splice-site analysis was performed with NetGene2 (Brunak et al. 1991; Hebsgaard et al. 1996). All DNA sequences have been submitted to the DDBJ/EMBL/GenBank databases. The Calypso elements are under accession numbers AF186182, AF186183, AF186184, AF186185, and AF186186. BAC or P1 clone numbers for the Ty3/gypsy reverse transcriptases are listed in the Figure 1 legend. Accession numbers for the Athila4 elements are listed in the Figure 2 legend. The accession numbers of the Athila-group reverse transcriptases from various species are AF378012 to AF378081. Additional details regarding these sequences can also be found at our Web site (http://www.public.iastate.edu/~voytas/).
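As an illustration of the kind of pairwise comparison that underlies a distance-based tree, the sketch below computes percent amino acid identity and the corresponding uncorrected p-distance from aligned sequences. It is an assumption-laden example rather than the procedure used here: the text does not state which distance measure was supplied to the neighbor-joining algorithm in PAUP, the function names are hypothetical, and the three short sequences are invented toy data, not sequences from this study.

```python
from itertools import combinations


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def p_distance_matrix(aligned: dict) -> dict:
    """Uncorrected p-distance (1 - identity) for every pair of aligned sequences."""
    return {
        (n1, n2): 1.0 - percent_identity(s1, s2) / 100.0
        for (n1, s1), (n2, s2) in combinations(aligned.items(), 2)
    }


if __name__ == "__main__":
    # Hypothetical toy alignment for illustration only.
    toy = {"seqA": "VRKEVLKL", "seqB": "VRKEVIKL", "seqC": "VRK-VLRL"}
    for pair, dist in p_distance_matrix(toy).items():
        print(pair, round(dist, 3))
```

A matrix of this form is the input that a neighbor-joining implementation expects; corrected distances could be substituted without changing the overall structure.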
RT-PCR
Total RNA was isolated from A. thaliana ddm1 plants using the PUREscript RNA isolation kit (Gentra Systems, Inc.). RNA was annealed to the primer DVO1247, which is a poly(T) oligo with a specific tail (5'-GGA-CTT-CAG-GAC-TGC-TTG-ACA-AAG-T30). First-strand DNA synthesis was performed at 42°C for 2 h using Superscript II reverse transcriptase and the manufacturer's protocol (GIBCO BRL). RNase activity was inhibited by the addition of Super RNase IN per the manufacturer's instructions (Ambion). PCR was carried out using the Expand Long Template PCR System (Roche Molecular Biochemicals) with Athila-element-specific primers, along with DVO1248, which is specific to the tail of DVO1247.
| |
ACKNOWLEDGMENTS |
|---|
We thank Jim Keck for assistance with the figures and members of the Voytas lab for helpful comments on the manuscript. This work was supported by a grant from Phytodyne, Inc., the Center for Advanced Technology Development at Iowa State University, and NIH grant number R41 GM61420. This is journal paper No. J-19446 of the Iowa Agriculture and Home Economics Experiment Station, Ames, Iowa, project No. 3383 and was supported by Hatch Act and state of Iowa funds.
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
| |
FOOTNOTES |
|---|
1 Present address: Phytodyne, Inc., 2901 South Loop Drive, Building 3, Suite 3515, Ames, IA 50010, USA.
2 Corresponding author.
E-MAIL voytas{at}iastate.edu; FAX 515-294-7155.
Article published on-line before print in December 2001: Genome Res., 10.1101/gr.196001.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.196001.
| |
REFERENCES |
|---|
|
|
|---|
a database of membrane spanning protein segments. Biol. Chem. Hoppe-Seyler 374: 166.
Received May 10, 2001; accepted in revised form July 15, 2001.