Vol. 12, Issue 6, 930-943, June 2002
LETTER
ABSTRACT
We identified putative long terminal repeat (LTR) retrotransposon sequences among the 50,000 random sequence tags (RSTs) obtained by the Génolevures project from genomic libraries of 13 Hemiascomycetes species. In most cases, additional sequencing enabled us to assemble the whole sequences of these retrotransposons. These approaches identified 17 distinct families, 10 of which are defined by full-length elements. We also identified five families of solo LTRs that were not associated with retrotransposons. Ty1-like retrotransposons were found in four of five species that are phylogenetically related to Saccharomyces cerevisiae (S. uvarum, S. exiguus, S. servazzii, and S. kluyveri but not Zygosaccharomyces rouxii), and in two of three Kluyveromyces species (K. lactis and K. marxianus but not K. thermotolerans). Only multiply crippled elements could be identified in the K. lactis and S. servazzii strains analyzed, and only solo LTRs could be identified in S. uvarum. Ty4-like elements were only detected in S. uvarum, indicating that these elements appeared recently before speciation of the Saccharomyces sensu stricto species. Ty5-like elements were detected in S. exiguus, Pichia angusta, and Debaryomyces hansenii. A retrotransposon homologous with Tca2 from Candida albicans, an element absent from S. cerevisiae, was detected in the closely related species D. hansenii. A complete Ty3/gypsy element was present in S. exiguus, whereas only partial, often degenerate, sequences resembling this element were found in S. servazzii, Z. rouxii, S. kluyveri, C. tropicalis, and Yarrowia lipolytica. P. farinosa (syn. P. sorbitophila) is currently the only yeast species in which no LTR retrotransposons or remnants have been found.
Thorough analysis of protein sequences, structural characteristics of the elements, and phylogenetic relationships deduced from these data allowed us to propose a classification for the Ty1/copia elements of hemiascomycetous yeasts and a model of LTR-retrotransposon evolution in yeasts.
INTRODUCTION
The Génolevures project used a novel approach to evolutionary genomics (FEBS Lett. 2000, special issue 487). Comparison of approximately 50,000 random sequence tags (RSTs) from 13 yeasts selected across the entire Hemiascomycetes class (see Kurtzman and Robnett 1998 for phylogenetic relationships between these species and Souciet et al. 2000) provided a wealth of sequence information on genetic redundancy, the functional classification of genes, and the conservation of synteny.
This analysis also sought repeated sequences. Indeed, an understanding of repetitious elements can be of great value in sequence assembly. Entities such as retrotransposons are known to play a role in remodeling genomes; first when they transpose into new sites and second when they are subjected to homologous recombination, leading to chromosomal rearrangements (Zolan 1995; Kim et al. 1998) such as reciprocal translocations.
One ubiquitous group of retrotransposons contains long terminal repeats (LTRs) at both extremities of the element. Different types of LTR retrotransposons exist in a wide range of eukaryotes including insects, plants, fungi, yeasts, and fishes. Recently, fossils of LTR retrotransposons were identified in mammals at a very low copy number (Volff et al. 2001). The structure of LTR retrotransposons is comparable to that of the retroviruses that replicate via mRNA intermediates (Boeke 1989). Two genes are commonly found in LTR retrotransposons. These genes are homologs of the retroviral gag and pol genes. The gag gene of retroviruses encodes structural proteins of the viral particle and the retroviral pol locus encodes a polyprotein with protease (PR), integrase (IN), reverse transcriptase (RT), and RNAseH (RH) catalytic domains. The arrangement and functions of these entities in LTR retrotransposons correspond to those in retroviruses. Some elements, such as gypsy from Drosophila melanogaster, harbor a third gene, which is homologous with the retroviral env gene encoding the protein for the envelope of infectious viral particles.
LTR retrotransposons of fungi have been divided into two distinct groups on the basis of sequence similarities of their RTs (Xiong and Eickbush 1990) and the organization of the subunits within their pol genes. In the Ty1/copia group, these subunits are arranged in the order PR, IN, RT, and RH, whereas in the Ty3/gypsy group the order is PR, RT, RH, and IN.
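The domain-order criterion lends itself to a small lookup. The following is only an illustrative sketch: the function name `classify_by_pol_order` and the representation of a pol gene as a list of domain labels are assumptions made for this example and are not part of the analysis reported here.

```python
# Domain orders as stated in the text for the two groups of LTR retrotransposons.
DOMAIN_ORDERS = {
    "Ty1/copia": ("PR", "IN", "RT", "RH"),
    "Ty3/gypsy": ("PR", "RT", "RH", "IN"),
}


def classify_by_pol_order(domains):
    """Return the group whose pol domain order matches `domains`, or None.

    `domains` is a sequence of domain labels in 5'-to-3' order
    (hypothetical input format chosen for this sketch).
    """
    observed = tuple(domains)
    for group, order in DOMAIN_ORDERS.items():
        if observed == order:
            return group
    return None


print(classify_by_pol_order(["PR", "IN", "RT", "RH"]))  # Ty1/copia
print(classify_by_pol_order(["PR", "RT", "RH", "IN"]))  # Ty3/gypsy
```

In practice the group assignment also relies on RT sequence similarity, so a lookup of this kind would only be one of two lines of evidence.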
LTR retrotransposons have been extensively studied in the model yeast Saccharomyces cerevisiae. Five distinct families of retrotransposons exist in this organism: four Ty1/copia elements (Ty1, Ty2, Ty4, and Ty5) and one Ty3/gypsy element (Ty3). Only three of these families are known to be transpositionally active in S. cerevisiae; namely, Ty1, Ty2, and Ty3. Ty5 elements from S. cerevisiae are either solo LTRs or degenerate elements that have accumulated several deleterious mutations (Voytas and Boeke 1992). However, intact and active copies of Ty5 have been found in S. paradoxus, a species closely related to S. cerevisiae (Zou et al. 1996). The complete nucleotide sequence of S. cerevisiae revealed 52 different Ty elements (Goffeau et al. 1996) and thus provided a unique opportunity to study genome organization (Kim et al. 1998), evolution (Jordan and McDonald 1998, 1999b), and the coevolution of the mobile elements and their host (Jordan and McDonald 1999a).
More recently, investigations of Candida albicans identified 34 different LTR-retrotransposon families that belong to the Ty1/copia and Ty3/gypsy groups (Goodwin and Poulter 2000). Most of these families only contain solo LTRs or LTR remnants. Only three different full-length and intact retrotransposons were identified. These are (1) Tca2 (or pCa1), which is quite unusual because it carries two open reading frames (ORFs) separated by a stop codon and produces many extrachromosomal DNA copies (Matthews et al. 1997); (2) Tca5, which has a similar structure and sequence to Ty5 (Plant et al. 2000); and (3) Tca4, which is a Ty1/copia element close to Tca2 (Goodwin and Poulter 2001).
Other hemiascomycetous yeasts such as Yarrowia lipolytica (Schmid-Berger et al. 1994) are known to contain LTR retrotransposons, but no transposable elements have been described in the Zygosaccharomyces, Kluyveromyces, or Debaryomyces yeast genera, or in the Saccharomyces sensu lato group. The Génolevures project provided evidence for the presence of LTR retrotransposons in some of the 13 yeast species studied (Blandin et al. 2000a,b; Bolotin-Fukuhara et al. 2000; Bon et al. 2000a,b; Casaregola et al. 2000; Lépingle et al. 2000; Llorente et al. 2000b; Neuvéglise et al. 2000).
Therefore, on the basis of the data of Génolevures, we attempted to characterize fully and to compare the LTR retrotransposons present in the 13 hemiascomycetous yeasts by determining the full sequences of some of these elements. Thorough analysis of the data revealed phylogenetic relationships between some of the elements and enabled us to suggest a classification system for the Ty1/copia elements of hemiascomycetous yeasts.
Identification of New Retrotransposon Sequences
Our study was based on the Génolevures sequence data, which include 49,199 RSTs from 13 hemiascomycetous yeasts (Souciet et al. 2000). The partial sequencing of each genome provided a coverage of approximately 0.2× (2500 RSTs) of the S. exiguus, S. servazzii, S. kluyveri, K. marxianus, D. hansenii, and C. tropicalis genomes, and a coverage of approximately 0.4× (5000 RSTs) of the S. bayanus (syn. S. uvarum), Z. rouxii, K. lactis, Pichia angusta, P. farinosa (syn. P. sorbitophila), and Y. lipolytica genomes. All of the RSTs were compared with a database of Ty protein sequences (H. Feldmann, unpubl.). Because the copies of Ty5 elements in S. cerevisiae are degenerate, all of the RSTs were also compared with the Ty5 elements of S. paradoxus. Table 1 lists the RSTs that match Ty elements.
The RSTs of interest were thoroughly analyzed and manually assembled to avoid low-quality contigs. Then, LTRs associated with the identified full-length elements were defined by comparison of their 5' and 3' extremities. Finally, repeated sequences were screened for the presence of solo LTRs. We used the internal repeats TGTTG...CAACA that bound the LTR and the 5 bp of target site duplication, whenever they were present, in addition to the breaking points of sequence homology between two different copies, to define the border of the LTRs.
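The two border criteria named above (terminal inverted repeats of the TGTTG...CAACA type and a 5-bp target site duplication) can be checked mechanically. The sketch below is illustrative only: the function names and the toy sequences are hypothetical, only exact matches are considered, and the borders in this study were defined manually.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def terminal_inverted_repeat_length(ltr, max_len=10):
    """Length of the perfect inverted repeat shared by the two LTR ends."""
    ltr = ltr.upper()
    length = 0
    for n in range(1, min(max_len, len(ltr) // 2) + 1):
        if ltr[:n] != reverse_complement(ltr[-n:]):
            break
        length = n
    return length


def has_target_site_duplication(upstream, downstream, size=5):
    """True if the `size` bases flanking the LTR on each side are identical."""
    if len(upstream) < size or len(downstream) < size:
        return False
    return upstream[-size:].upper() == downstream[:size].upper()


# Toy sequences, invented for illustration.
toy_ltr = "TGTTG" + "ACGTACGGATTC" + "CAACA"
print(terminal_inverted_repeat_length(toy_ltr))                   # 5
print(has_target_site_duplication("CCGGAATTCA", "ATTCAGGTTC"))    # True
```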
This screening revealed a large variation in match number: from zero in K. thermotolerans and P. farinosa to 45 in S. exiguus (Table 1). No matches were detected when the Ty library was compared with the RSTs from K. thermotolerans and P. farinosa (de Montigny et al. 2000). This indicates that these strains possess few or no Ty-like elements. Alternatively, the sequence of the elements from these yeast species may be so divergent from conventional Ty elements that they were not identified. In a few species (S. servazzii, S. exiguus, S. kluyveri, and D. hansenii), the RSTs matched different types of Ty elements. This implies that, as in S. cerevisiae, different elements exist in a single host. A systematic nomenclature was used to name the newly identified LTR retrotransposons: T for Transposon followed by the initials of the genus and species of the yeast and a number referring to the S. cerevisiae homologs (1 for Ty1 or Ty2, 3 for Ty3, 4 for Ty4, and 5 for Ty5) or to the C. albicans homolog (2 for Tca2). For example, Tpa5 is a P. angusta element that is homologous to Ty5, whereas Tdh2 is from D. hansenii and is homologous to Tca2. Whenever several highly divergent copies of the same element type were identified, a decimal number was added to the name, as for Tkl1.1 and Tkl1.2 and for Tse5.1 and Tse5.2 (see below). In all other families of full-length elements, internal variability was less than 1.2% and did not affect the creation of the consensus sequences.
We identified a total of 17 families of LTR retrotransposons in the 13 yeast species (Table 1), a family being defined by the set of copies of a particular retrotransposon, itself defined by structural features and by sequence conservation, in a given yeast species.
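The nomenclature rule can be written down compactly. The helper below is a hypothetical illustration (the function name `element_name` and its parameters are not from the original work); it simply reproduces the naming scheme described above.

```python
def element_name(genus, species, homolog, variant=None):
    """Build an element name: 'T' + genus initial + species initial + homolog number.

    `homolog` is 1, 3, 4, or 5 for the S. cerevisiae Ty homologs, or 2 for Tca2.
    `variant` is the optional decimal number used for highly divergent copies.
    """
    name = f"T{genus[0].lower()}{species[0].lower()}{homolog}"
    if variant is not None:
        name += f".{variant}"
    return name


print(element_name("Pichia", "angusta", 5))             # Tpa5
print(element_name("Debaryomyces", "hansenii", 2))      # Tdh2
print(element_name("Kluyveromyces", "lactis", 1, 2))    # Tkl1.2
```

Note that the S. bayanus elements are named from the synonym S. uvarum (for example, Tsu4), so the species epithet passed to such a helper has to be chosen accordingly.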
Elements of three of the 17 families were assembled as complete consensus sequences from RST data (Tpa5, Tkm1, Tsu4), and elements of a further seven were described after additional sequencing (Tse1, Tse3, Tse5, Tsk1, Tkl1, Tdh2, and Tdh5). The remaining seven families are described from incomplete sequence information. These seven families were present at low copy number, all (except Tss1) were Ty3 homologs and were (except Tct3) highly degenerate. Table 2 lists the main characteristics of these newly identified elements in comparison with known LTR retrotransposons of hemiascomycetous yeasts.
Classification Based on Structural Features and Sequence Similarity
On the basis of their structural characteristics and sequence similarity, the 17 newly identified retrotransposon families were divided into five groups, which also included previously described elements from other yeasts (Table 2).
The first group (Ty1-like elements), which is a very homogeneous group, comprises Ty1 and Ty2 together with the five new homologs of Ty1. These are the partially sequenced element Tss1 and four families of completely described elements (Tse1, Tkl1, Tkm1, and Tsk1). Two different copies of Tkl1 were identified in the K. lactis strain studied, Tkl1.1 and Tkl1.2 (Fig. 1). All of the Ty1-like elements are approximately 5.9 kb long, with the exception of Tse1 from S. exiguus, which is only 5.6 kb long. Tkl1.1 is only 5425 bp in length, but the copy sequenced has lost the first 189 amino acids (aa) of gag (Fig. 1). All of the completely sequenced elements in the Ty1-like group have two overlapping ORFs separated by a +1 frameshift occurring within the highly conserved sequence CUUAGGC in the region of the overlap (Voytas and Boeke 1992). This indicates that the frameshift mechanism has also been conserved in these elements. The lengths of the LTRs are also conserved, varying from 322 bp in S. kluyveri to 424 bp in S. exiguus. All of the LTRs are terminated by the dinucleotides 5'-TG...CA-3' or longer terminal inverted repeats. In Tkm1, the terminal inverted repeat is 5'-TGTTG...CAACA-3'.
The second group (Ty4-like elements) comprises Ty4 and the newly described related retrotransposon Tsu4. The organization of Tsu4 and Ty4 is similar to that of the Ty1-like elements, with two ORFs subjected to the same frameshift mechanism. They have different primer binding sites (PBSs) (see below) and the sizes of ORF1 and ORF2 differ slightly: ORF1 is generally larger in Ty1-like elements and ORF2 is generally larger in Ty4 and Tsu4. The amino acid sequences of the peptides derived from ORF2 are conserved between Ty1-like and Ty4-like elements (29.6% identity and 41.1% similarity for 1240 aa), whereas their gag proteins are less well conserved and difficult to align. The nucleic acid binding motif CX2CX4HX4H of gag is found in the Ty4-like elements but not in the Ty1-like elements.
The third group (Ty5-like elements) comprises Ty5 and Tca5 together with the newly described retrotransposons Tdh5, Tpa5, and Tse5. These elements are characterized by a single ORF corresponding to a gag-pol gene fusion. The lengths of the ORFs are variable: 1417 aa in Tpa5 compared with 1698 aa in S. paradoxus Ty5. The lengths of the LTRs are also highly variable: from 251 bp in Ty5 to 685 bp in Tca5. The terminal inverted repeats are more highly conserved than in Ty1-like elements, consisting of at least 5 nt: 5'-TGTTG...CAACA-3'. Two related elements were identified in S. exiguus: Tse5.1 and Tse5.2. The sequences of their pol genes are highly conserved (97.4% identity for 1299 amino acids), with the exception of a 34 aa deletion in Tse5.1 located within the tether region, between IN and RT. Despite these differences, we consider that Tse5.1 and Tse5.2 belong to the same family.
The fourth group (Tca2-like elements) comprises Tca2, Tca4, and the new element Tdh2. Although initially detected because of its homology with Ty1, Tdh2 is more closely related to Tca2, which is not found in S. cerevisiae. In addition to having similar nucleotide sequences, Tdh2 and Tca2 share several characteristic features. ORF1 and ORF2 are in the same phase, separated by a stop codon (UAA in Tdh2 and UGA in Tca2). This arrangement is similar to that found in some mammalian retroviruses (Yoshinaka et al. 1985) but is unique in LTR retrotransposons. No purine-rich sequence occurs downstream from the UAA codon in Tdh2, a sequence required for the suppression of the UGA stop codon in MMLV and Tca2 (Matthews et al. 1997). Another structural peculiarity of the two elements is the occurrence of imperfect 6-bp inverted repeats at the ends of the LTRs.
The fifth group (Ty3/gypsy elements) consists of the Ty3 homologs including the seven new elements Tse3, Tss3, Tzr3, Tsk3, Tdh3, Tct3, and Tyl3. Five of these elements were found to be degenerate because of an accumulation of deletions and/or point mutations that introduce stop codons in the ORFs. For example, when the three copies of the Tzr3 element in Z. rouxii were partially sequenced, they were found to differ (3% nt divergence) and to contain a large number of stop codons. A sixth element (Tct3) present in a low copy number in C. tropicalis may be intact as no stop codons were detected in the 1360 nt sequenced. The only completely sequenced Ty3/gypsy element that appears to be structurally intact is Tse3 from S. exiguus. Its two overlapping ORFs are separated by a +1 frameshift that probably occurs within the heptanucleotide AUUAGUA, if the Ty3 model of translational frameshifting has been conserved (Voytas and Boeke 1992).
LTRs Not Associated with Complete Retrotransposons
In some yeast species, LTRs not associated with complete retroelements were detected either by analysis of repeated sequences or because they were located at strategic sites such as translocation breakpoints or in the vicinity of tRNA genes or other retrotransposons. In K. thermotolerans and P. farinosa, repeated sequences were systematically screened because no homology with Ty proteins was detected. We identified two putative LTR sequences in K. thermotolerans, LTRkt1 and LTRkt2, but none in P. farinosa. The "long" version of LTRkt1 is 261 bp long and the "short" version is 243 bp long. The LTR is surrounded by 5' TGT...ACA 3' and was found in at least 11 RSTs corresponding to nine different loci. At each locus, the putative LTR is flanked by a 5 bp direct repeat corresponding to a duplicate of the target site (Fig. 2). In addition, at each locus except one, at least one tRNA gene was located 59 to 76 bp upstream of or downstream from the LTR (Table 3). This indicates that this putative LTR integrates at highly specific sites, near tRNA genes, which is consistent with the findings in S. cerevisiae (Eigel and Feldmann 1982; Kim et al. 1998).
LTRkt2, the second putative LTR in K. thermotolerans, was found as a repeated sequence in 10 RSTs corresponding to nine different loci. The element is 293 or 417 bp long. The longer version of LTRkt2 is often flanked by 5' TATTG...TGACA 3' and the shorter version is often flanked by 5' TACGA...TGACA 3', but some copies have accumulated point mutations.
A repeated sequence was identified as a putative LTR in Y. lipolytica (Casaregola et al. 2000). This 273-bp sequence, LTRyl1, is present in 18 RSTs and contains the characteristic TGTTG repeat at the 3' end and the characteristic CAATA repeat at the 5' end. Five of the 18 LTR sequences are surrounded by 5-bp repeats, which account for the duplication of the target site.
In P. angusta, a repeated sequence (LTRpa1) was found preferentially associated with other LTR sequences or Tpa5 retroelements in 19 RSTs corresponding to 17 different loci (Blandin et al. 2000a). LTRpa1 is flanked by TGTTG...CAACA and has a mean length of 265 bp.
When the synteny breakpoints of S. bayanus were sequenced, a putative solo LTR of 331 bp flanked by a 5-bp duplication site was found (Fischer et al. 2001). This LTR (LTRsu1) possesses the imperfect internal repeats TGTTG...CAATA and is highly repeated in the S. bayanus genome, occurring in 59 RSTs. Some of the copies seem to be intact, whereas others are degenerate or truncated. We identified tRNA genes in approximately 40% of the RSTs containing LTRsu1, indicating that tRNA genes are hotspots for LTRsu1 integration.
P. farinosa is currently the only yeast species in which no LTR retrotransposons or remnants are known.
Insertion Sites
It is well known that all S. cerevisiae Ty elements are target specific. Ty1-4 elements insert upstream of genes transcribed by RNA polymerase III, such as tRNA genes (Chalker and Sandmeyer 1992; Kim et al. 1998), whereas Ty5 integrates preferentially into silent chromatin regions, such as at telomeres and mating loci (Zou et al. 1996). We systematically screened RSTs carrying solo LTRs or 3' LTRs for tRNA genes and for genes that are known to be subtelomeric or telomeric in S. cerevisiae. This analysis was probably biased because we do not have the complete genome sequence but only random sequences of approximately 1 kb on average. For example, in S. cerevisiae, Kim et al. (1998) considered that Ty1 insertions were tRNA-associated if they were located within 750 bp of a tRNA gene. Thus, we probably underestimated the number of associations.
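The distance criterion of Kim et al. (1998) can be illustrated as follows. This is a minimal sketch under stated assumptions: features are represented as half-open (start, end) coordinates on the same sequence, and the function names and toy coordinates are hypothetical. It also makes the source of the bias visible, because a tRNA gene lying beyond the end of a 1-kb RST cannot be scored at all.

```python
def gap_between(a, b):
    """Distance in bp between two half-open intervals (start, end); 0 if they overlap."""
    (a_start, a_end), (b_start, b_end) = a, b
    return max(0, b_start - a_end, a_start - b_end)


def is_trna_associated(ltr, trna_genes, max_distance=750):
    """True if any tRNA gene lies within `max_distance` bp of the LTR."""
    return any(gap_between(ltr, trna) <= max_distance for trna in trna_genes)


# Toy coordinates, invented for illustration.
ltr = (100, 431)
print(gap_between(ltr, (500, 572)))               # 69
print(is_trna_associated(ltr, [(500, 572)]))      # True
print(is_trna_associated(ltr, [(1300, 1372)]))    # False
```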
There is no proof that the Ty5-like elements considered here preferentially insert themselves into subtelomeric regions, because these regions evolve rapidly in yeasts and could not be identified during the Génolevures program (Llorente et al. 2000a).
A few retroelements follow a nonrandom mode of integration (Table 3). Five elements (LTRsu1, Tsu4, Tse3, Tsk1, and LTRkt1) are preferentially integrated into tRNA gene regions. At least one tRNA gene was identified in 35% to 40% of the RSTs containing LTRsu1, LTRs of Tsu4, Tse3, or Tsk1 and in all RSTs except one containing LTRkt1. For all other elements, we do not have enough information to draw a conclusion on their targeting specificity.
We observed two modes of integration specificity, as in S. cerevisiae. One type of targeted integration, as observed for Tse3, was found to be very precise. It always occurs 13-19 bp upstream of a tRNA gene as described for Ty3, indicating conservation of the integration mechanism of Tse3. Conversely, the four other LTR retrotransposons that carry out targeted integration have a wider target preference. Integration occurred between 58 bp and 697 bp upstream of, and sometimes downstream from, a tRNA gene. The distance is often increased if there is another LTR sequence between the LTR in question and the tRNA gene (transposition hotspot).
Copy Numbers
We tried to estimate the copy number of each retrotransposon without taking into account the solo LTRs. First, we measured the number of RSTs containing part of the full-length retrotransposons (Table 1). Considering the percentage of genes identified per genome (Génolevures 2000), and the size of the retrotransposons, we calculated the number of each element per genome. In some cases, we then used Southern hybridization with internal probes (not shown). For instance, in S. kluyveri, we identified 24% of the genes (Neuvéglise et al. 2000) and found 11 RSTs matching Ty1 or Ty2. Considering that one RST corresponds to 1/6 of Tsk1, the estimated number of elements per genome is 7.6. Interestingly, we observed eight bands on Southern blot hybridization with one Tsk1 internal probe on different digestions of S. kluyveri genomic DNA.
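The S. kluyveri figure can be reproduced with simple arithmetic: 11 RSTs, each covering about 1/6 of an element, sampled from a survey that recovered 24% of the genes, give 11 / 6 / 0.24 ≈ 7.6 copies. The sketch below restates this calculation; the function name and parameter names are hypothetical, and the fraction of genes identified is used as a proxy for the fraction of the genome sampled, as in the text.

```python
def estimate_copy_number(n_rsts, rsts_per_element, fraction_sampled):
    """Estimate elements per genome from RST counts.

    n_rsts: number of RSTs matching the element.
    rsts_per_element: number of RST-sized pieces needed to span one element.
    fraction_sampled: fraction of the genome (here, of genes) covered by the survey.
    """
    elements_sampled = n_rsts / rsts_per_element
    return elements_sampled / fraction_sampled


# Values quoted in the text for Tsk1 in S. kluyveri.
print(round(estimate_copy_number(11, 6, 0.24), 1))  # 7.6
```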
We found that the number of elements per genome is highly variable but lower than the number of Ty1 elements in S. cerevisiae. The retrotransposons could be classified into three groups based on their copy number: (1) highly repeated elements (Tse1, Tsu4, and Tdh5) present at 15-20 copies per genome; (2) moderately repeated elements (Tkm1, Tsk1, Tpa5, Tse5, and Tse3) with 8-15 copies per genome; and (3) weakly repeated elements (Tkl1.1 and Tkl1.2, Tdh2, Tdh3, Tsk3, Tss3, Tyl3, Tzr3, and Tct3). These highly and moderately repeated elements all seemed to be intact and to be potentially active. All identified copies of the weakly repeated elements, except Tdh2, have accumulated stop codons or deletions and are therefore defective. It is still unclear whether full-length copies of the degenerate Ty3/gypsy elements exist in the genome or not.
PBS
Most LTR retrotransposons use a specific tRNA from their host as a primer for RT. A short sequence (8-49 nt) of the retroelement located immediately downstream from the 5' LTR, termed the PBS, is complementary to part of this tRNA molecule. In S. cerevisiae, the PBS of Ty1, Ty2, and Ty3 retroelements are complementary to the 3' acceptor stem of the initiator methionine tRNA (tRNAiMet), whereas the PBS of Ty5 is complementary to 13 nt from an internal portion of tRNAiMet that includes the anticodon stem-loop (Voytas and Boeke 1993). S. cerevisiae Ty4 is an exception to this rule because its PBS is complementary to the 3' end including the acceptor stem of S. cerevisiae tRNAAsn with one mismatch (Stucka et al. 1992). New types of PBS were detected in C. albicans (Goodwin and Poulter 2000). For example, the PBS of Tca1 and Tca2 are complementary to an internal fragment of the tRNAArg(UCU).
We searched the newly identified elements for potential PBS and the tRNA genes for complementary sequences. As expected, the PBS of these elements were complementary to sequences within tRNAs that were homologous to their counterparts in S. cerevisiae or C. albicans. Elements homologous to Ty1 or Ty3 contained PBS that were complementary to the 3' acceptor stem of a tRNAiMet. We confirmed this in Tse1, Tkl1-2, and Tse3 by comparing the sequences of the tRNAiMet identified in S. exiguus and K. lactis with the sequences of the retroelements. We noticed that the PBS were longer in these elements than in Ty1, Ty2, or Ty3: 12 nt in Tse1 and Tse3 and 13 nt with one mismatch in Tkl1.2 (Fig. 3). For Tsk1 and Tkm1, we had to compare the sequence with that of S. cerevisiae Ty1 PBS because no tRNAiMet genes were found in the corresponding RSTs. PBS were also found to be highly conserved among Ty5-like elements, being complementary to the anticodon stem-loop of tRNAiMet. Comparisons with the nucleotide sequence of tRNAiMet genes of S. exiguus and P. angusta showed that the PBS of Ty5-like elements are longer in Tse5 and Tpa5 than in Ty5 or Tca5.
Unusual PBS were found in Tsu4 and Tdh2. The PBS of Tsu4 is complementary to 22 nt of the 3' end of a tRNAAsn whereas Ty4 is complementary to 23 nt (Stucka et al. 1992), although both have the same mismatch at position 16 (Fig. 3). The sequence of the tRNAAsn gene from S. bayanus is identical to that of S. cerevisiae. The PBS in Tsu4 and Ty4 are among the longest known PBS, although the longest one is the Tcn10 PBS from the basidiomycetous yeast Cryptococcus neoformans (Goodwin and Poulter 2001), which is 49 bp. In Tdh2, the PBS is complementary to an internal fragment of tRNAArg(UCU) including the anticodon stem-loop as described for its C. albicans homolog, Tca2 (Matthews et al. 1997). The identification of tRNAArg(UCU) in D. hansenii confirms that the Tdh2 PBS (15 nt) is longer than the Tca2 PBS (11 nt).
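The length of a PBS that pairs with the 3' end of a tRNA can be measured by extending the complementary stretch from the first base downstream of the 5' LTR. The sketch below is illustrative only: the function names and the toy sequences are hypothetical, mismatches are not tolerated (unlike the Ty4 and Tsu4 cases above), and PBS that pair with an internal tRNA fragment, as in Ty5-like or Tca2-like elements, would need a different search.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def pbs_length(downstream, trna):
    """Number of bases at the start of `downstream` that are perfectly
    complementary to the 3' end of `trna`.

    downstream: element sequence starting immediately after the 5' LTR.
    trna: tRNA sequence written 5' to 3' (U is accepted and read as T).
    """
    downstream = downstream.upper()
    trna = trna.upper().replace("U", "T")
    limit = min(len(downstream), len(trna))
    n = 0
    while n < limit and downstream[: n + 1] == reverse_complement(trna[-(n + 1):]):
        n += 1
    return n


# Toy sequences, invented for illustration.
toy_trna_3prime = "GATCGGCGCTACCA"
toy_downstream = "TGGTAGCGCCTGAAT"
print(pbs_length(toy_downstream, toy_trna_3prime))  # 10
```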
Conservation and Diversity of the ORFs of Yeast Retroelements
Sequence homologies between conserved coding regions within the ORFs of the newly identified LTR retrotransposons allowed us to align the amino acid sequences of all members of the five groups described above.
TYA is known to be extremely variable in different yeast species such that their sequences cannot be aligned. Only some of the nucleic acid-binding regions located at the carboxy terminus had homology with Ty3, Ty4, or Ty5. We found that the CX2CX4HX4H motif was conserved in all Ty5-like elements except in Tse5.1 and Tse5.2 (Fig. 4A). This motif is also conserved in Ty4 and Tsu4. This motif is found in Ty3 and Ylt1, which is a Y. lipolytica Ty3/gypsy element (Schmid-Berger et al. 1994), but is quite degenerate in Tse3.
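A strict search for the CX2CX4HX4H motif in a protein sequence can be expressed as a regular expression, where X stands for any residue. This is only a sketch: the function name and the toy peptide are hypothetical, and an exact pattern of this kind would by design miss degenerate copies such as the one described for Tse3.

```python
import re

# C, any 2 residues, C, any 4 residues, H, any 4 residues, H.
ZINC_MOTIF = re.compile(r"C.{2}C.{4}H.{4}H")


def find_zinc_motifs(protein):
    """Return (start, matched_sequence) for each non-overlapping motif occurrence."""
    return [(m.start(), m.group()) for m in ZINC_MOTIF.finditer(protein.upper())]


# Toy peptide, invented for illustration.
toy_gag_tail = "MKRCFNCGKPGHIARDHQE"
print(find_zinc_motifs(toy_gag_tail))  # [(3, 'CFNCGKPGHIARDH')]
```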
It is known that Ty1 and Ty2 elements lack the CX2CX4HX4H consensus motif (Jordan and McDonald 1999b) but possess a nucleic acid-binding motif that is homologous to a prokaryotic consensus DNA-binding sequence (Clare and Farabaugh 1985). Tsk1 is the only Ty1-like element that has a similar sequence to the putative nucleic acid-binding motifs of Ty1 and Ty2. As in Tca2 (Matthews et al. 1997), no conserved motifs were found in the first ORF of Tdh2.
The amino acid sequence of TYB is generally more stringently conserved than that of TYA. We aligned the sequences of the newly identified retroelements for each of the four internal TYB domains. The RT sequences were highly conserved, which allowed us to align them with other Ty1/copia and Ty3/gypsy elements. This alignment is available at http://www.inra.fr/clib/english/genolevu.htm.
The PR, IN, and RH domains of the Ty1/copia and Ty3/gypsy elements were too divergent to allow extensive alignments. Thus, we could only align the highly conserved boxes (Fig. 4B,C). The active site of the endogenous protease (D residues), which follows three to five hydrophobic amino acids, was found to be conserved in all hemiascomycetous retroelements and related LTR retrotransposons from other eukaryotes (Fig. 4B). The zinc finger, or HHCC domain, of the IN is highly conserved among all Ty1/copia elements, although the length of the "loop" between HH and CC varies. In contrast, Ty3/gypsy zinc fingers are less well conserved and the Tse3 zinc fingers appear to be degenerate (Fig. 4C).
Phylogenetic Relationships
The PR, IN, and RH domains were highly divergent and could not be used to deduce phylogenetic relationships between the various elements. Thus, we chose to base our phylogenetic analysis on the seven universally conserved domains of RT as described previously (Xiong and Eickbush 1990). The resulting phylogenetic tree (Fig. 5) comprises the hemiascomycetous retroelements and retrotransposons belonging to other eukaryotic genomes (plants: Ta1-3 and Tnt; Drosophila: copia, gypsy, 1731; fungi: CfT-1 and MAGGY; and yeasts other than hemiascomycetes: Tf1 from S. pombe, RF3, RF5, Tcn1, Tcn2, Tcn3, Tcn4, Tcn5, Tcn6, and Tcn9 from the basidiomycetous yeast C. neoformans). The tree was rooted with L1Hs, a non-LTR retrotransposon from Homo sapiens.
The Ty1/copia part of the tree is clearly divided into five clades corresponding to the Ty1-like, Ty4-like, Ty5-like, Tca2-like, and other Ty1/copia elements from nonhemiascomycetes. The four clades of hemiascomycetous elements support the clustering, which was established on the basis of LTRs, ORFs, frameshifting capacities, and PBS.
All known gypsy elements and Tse3 are located on a branch that is clearly distinct from the other branch, which contains all of the Ty1/copia elements (Fig. 5). Despite its original features, Tse3 is grouped with Ty3 and is phylogenetically distant from the retrotransposons of other yeasts such as Tca3 or Tca8 from C. albicans or Tcn2-5 from C. neoformans.
DISCUSSION
We used the Génolevures (2000) sequence data to identify 17 different families of LTR retrotransposons and five families of solo LTRs in 13 hemiascomycetous yeasts. We did not find any new types of LTR retrotransposon, although we identified a Tca2-like element (Tdh2) by using S. cerevisiae Ty proteins as bait. Unlike those of S. cerevisiae (Goffeau et al. 1996), C. albicans (Stanford DNA sequencing and Technology Center, http://www-sequence.stanford.edu/group/candida), and C. neoformans (Stanford DNA sequencing and Technology Center, http://www-sequence.stanford.edu/group/C.neoformans/index.html), the complete genome sequences of these 13 strains are not yet available. Therefore, our genome survey remains nonexhaustive and other LTR retrotransposons or particular remnants will probably be identified when complete genomic sequences become available. Nonetheless, we managed to obtain a reliable picture of transposable elements within the hemiascomycetes class.
Host Response toward LTR Retrotransposons
A first interesting conclusion that emerges from this work is that the studied yeast species all contain fewer LTR retrotransposon families than S. cerevisiae. Some species, such as P. farinosa, appear to be devoid of LTR retrotransposons or, like K. thermotolerans, to possess only remnants in the form of solo LTRs. Several of the other yeast species harbor a limited number of families of nonfunctional or degenerate elements. This is particularly true for species that only contain Ty3/gypsy elements: Z. rouxii and S. servazzii. S. exiguus and, to a lesser extent, D. hansenii contain a similar pattern of LTR retrotransposons to S. cerevisiae: S. exiguus harbors at least three different families of retroelements (Ty1-like, Ty3/gypsy, and Ty5-like), all of which seem to be intact and potentially active. Surprisingly, the differences were greatest between C. albicans and its most closely related species, C. tropicalis. Only one family was detected in C. tropicalis, whereas C. albicans carries 34 distinct families of LTR retrotransposons, as well as non-LTR retrotransposons and class II transposable elements (Goodwin and Poulter 2000; Goodwin et al. 2001).
This indicates that the host response to transposable elements is species-dependent rather than being linked to their phylogenetic position or to their ability to reproduce sexually. Clearly, genomes interact in various manners with LTR retrotransposons. Some species such as S. cerevisiae, S. exiguus, S. kluyveri, K. marxianus, and D. hansenii tend to conserve intact and probably active transposons, but this propensity is clearly not congruent with their phylogenetic relationship. Other species, such as K. lactis, S. bayanus, or S. servazzii tend to reduce the copy number and, moreover, to inactivate particular (or all) copies until only remnants of solo LTRs remain. Considering the paucity of elements in S. bayanus compared with S. cerevisiae, it seems that several transposons may have been lost simultaneously after recent speciation events of the host.
There seem to be two mechanisms for retrotransposon removal in
hemiascomycetous genomes. The first mechanism involves the gradual
erosion of LTR retrotransposons as a result of accumulation of point
mutations or minor deletions, as observed in Z. rouxii and
K. lactis. In these cases, the copy number per genome is low and the retrotransposons are being lost. There is no evidence that
this mechanism corresponds to repeat-induced point mutation (RIP) or
methylation induced premeiotically (MIP). These mechanisms are known to
occur in fungi, namely in Ascobolus (Goyon and Faugeron 1989) and
Neurospora (Cambareri et al. 1989), in which they silence or
inactivate retrotransposons by generating an accumulation of point
mutations as a result of G-C to A-T transitions.
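The G-C to A-T signature mentioned above can be made concrete with a minimal Python sketch. It is not part of the original analysis: the function name and the toy sequences are hypothetical, and it only assumes two copies that have already been aligned to the same length. It counts C-to-T and G-to-A changes separately from all other substitutions, which is the kind of tally one would inspect when asking whether a degenerate copy looks RIP-like.

```python
def count_gc_to_at_transitions(reference, derived):
    """Count G-C to A-T transitions between two aligned DNA strings.

    Returns (transitions, other_changes). Gap columns ('-') are skipped.
    Hypothetical helper for illustration only.
    """
    if len(reference) != len(derived):
        raise ValueError("sequences must be aligned to the same length")
    transitions = 0
    other_changes = 0
    for ref_base, new_base in zip(reference.upper(), derived.upper()):
        if ref_base == new_base or "-" in (ref_base, new_base):
            continue
        if (ref_base, new_base) in {("C", "T"), ("G", "A")}:
            transitions += 1  # C->T or G->A, i.e. a G-C pair became A-T
        else:
            other_changes += 1
    return transitions, other_changes


if __name__ == "__main__":
    # Toy sequences, not taken from any element described in this study.
    print(count_gc_to_at_transitions("ACGTCG", "ATATCA"))
```

A strong excess of the first count over the second would be consistent with a RIP-like process, whereas gradual erosion by random point mutation would not be expected to show such a bias.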
The second mechanism, the most documented in S. cerevisiae
(Jordan and McDonald 1999a), deletes elements through LTR-LTR
recombination, leaving only LTR remnants. In these cases, the copy
number is sometimes high probably because of a dynamic equilibrium
between the frequency of the transposition events, which tend to
increase the number of elements, and the host's regulatory mechanisms, which tend to decrease the number of high copy number elements. Nakayashiki et al. (2001)
recently introduced MAGGY into a naive genome
of Magnaporthe grisea and showed that after equilibrium was
reached at 20-30 copies, MAGGY was repressed by a mechanism that is
not directly dependent on methylation and that appeared rapidly after
the genome invasion. The retrotransposon copy number in the
hemiascomycetous yeasts studied does not seem to exceed 20, whereas
there are 32 in the sequenced S. cerevisiae strain. Several
other methylation-independent mechanisms that repress the transposable
elements have been described in Drosophila, but it is not
known whether such mechanisms of transposon repression and inactivation
exist in yeasts.
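The LTR-LTR recombination described above can be illustrated with a short Python sketch. This is a toy model under stated assumptions, not the authors' method: the function name and sequences are hypothetical, the two LTRs are assumed to be identical direct repeats, and recombination is modeled as a simple deletion of everything between the start of the first LTR and the start of the second.

```python
def ltr_ltr_recombination(genome, ltr):
    """Model recombination between the two direct LTRs of a full-length element.

    The internal region and one LTR copy are removed, leaving a solo LTR.
    If fewer than two LTR copies are found, the genome is returned unchanged.
    Hypothetical helper for illustration only.
    """
    first = genome.find(ltr)
    if first == -1:
        return genome
    second = genome.find(ltr, first + len(ltr))
    if second == -1:
        return genome  # already a solo LTR, nothing to recombine
    return genome[:first] + genome[second:]


if __name__ == "__main__":
    ltr = "TGTTGGAATA"  # toy LTR, uppercase
    full_length = "acgtacgt" + ltr + "ccgattgacc" + ltr + "ttgacagt"
    print(ltr_ltr_recombination(full_length, ltr))
```

The result keeps both flanking regions and a single LTR, which is the solo-LTR footprint referred to throughout this section.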
A New Subdivision of Ty1/copia Elements in Hemiascomycetes
We used the sequence data for full-length elements accumulated during this study to propose a phylogeny of the hemiascomycetous LTR retrotransposons. This phylogeny is also supported by sequence similarities and conservation of structural features (LTR conservation, organization of the ORF2 domains, frameshifts, and PBS) among the Ty1/copia hemiascomycetous elements. Therefore, we propose a subdivision of the hemiascomycetous Ty1/copia elements into four clades. Each of these clades is defined by a typical element, namely the first one historically described, such as Ty1 (Ty2 belongs to the Ty1 clade), Ty4 and Ty5 from S. cerevisiae, and Tca2 from C. albicans. The clade names refer to their typical elements, that is, Ty1-like, Ty4-like, Ty5-like, and Tca2-like (Fig. 5). These four clades are clearly distinct, each of them being monophyletic and only including retrotransposons from hemiascomycetous yeasts (see Fig. 5). In addition, the phylogenetic distances between the elements are consistent with the phylogeny of the host species on the basis of rDNA sequences, with the exception of Tsk1 from S. kluyveri (discussed below). In contrast, all other Ty1/copia LTR retrotransposons, including elements from plants, basidiomycetous yeasts, and insects, are grouped together in a separate clade.
The phylogeny of Ty3/gypsy elements in hemiascomycetous yeasts is much more complex than that of Ty1/copia elements. Ty3/gypsy elements in hemiascomycetes are widely spread over the phylogenetic tree of the gypsy group of retrotransposons (Fig. 5). Most of these Ty3/gypsy elements are degenerate, truncated, or remnants of solo LTRs. Contrary to the Ty1/copia elements, there are no monophyletic clades containing hemiascomycetous elements alone. Conversely, we showed that host-species with parts of Ty3/gypsy transposons are present in all branches of the hemiascomycetous phylogenetic tree on the basis of rDNA sequence data. In some yeast species, several divergent Ty3/gypsy elements were found, such as Ylt1 and Tyl3 in Y. lipolytica or Tca3 and Tca8 in C. albicans. In addition, the amino acid sequences of the RT of elements from closely related yeast species, such as Ty3 and Tse3 (45.4% identity and 53.6% similarity on 209 aa), differed enormously. These findings indicate that the acquisition of Ty3/gypsy elements by hemiascomycetes took place long enough ago to allow such a divergence and evolutionary events. This means that the modern hemiascomycetous Ty3/gypsy members can probably be further subdivided into different clades, like the hemiascomycetous Ty1/copia elements, although we need more sequence data on these elements.
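Percent identity and similarity values such as those quoted above for Ty3 and Tse3 are computed over aligned positions. The following Python sketch shows the arithmetic only. It is an illustration, not the Bestfit procedure used in this study: the function name is hypothetical, and the grouping of "similar" residues is an assumed, simplified classification rather than the scoring-matrix threshold that an alignment program would apply.

```python
# Assumed, simplified groups of chemically similar amino acids (illustrative only).
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG"]


def percent_identity_similarity(seq_a, seq_b):
    """Return (% identity, % similarity) over ungapped columns of an alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = similar = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are not counted
        compared += 1
        if a == b:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif any(a in group and b in group for group in SIMILAR_GROUPS):
            similar += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100 * identical / compared, 100 * similar / compared


if __name__ == "__main__":
    # Toy peptides, not taken from any element described in this study.
    print(percent_identity_similarity("MKVLD-", "MRVIEA"))
```

Because the similarity figure depends on how "similar" is defined, values from different programs are only comparable when the same definition is used.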
Retrotransposon Evolution: Sequence Evolution or Horizontal Transfer?
Previous evolutionary studies on retrotransposons (Malik and Eickbush 2001) show that LTR retrotransposons were present in early
eukaryotes, and a fortiori in the yeast ancestor. Ty1/copia and Ty3/gypsy elements are derived from an ancient and diverse group that radiated into modern yeast species, mostly via vertical inheritance. However, the ancestral LTR retrotransposons have undergone
evolutionary events. Some of these events (exceptional events) gave
some retrotransposons a selective advantage over less active elements,
leading to better proliferation and rapid invasion of the host. For
example, the acquisition of a PBS complementary to a tRNAAsn
might give Ty4 in S. cerevisiae an advantage because it does not compete with Ty1 and Ty2 for tRNAiMet molecules.
Similarly, the acquisition of a frameshift mechanism is probably an
advantage during expression.
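The PBS argument above rests on base pairing between the primer binding site and the 3' end of a tRNA. The following Python sketch shows how such complementarity could be checked. It is a hedged illustration: the function names, the minimum length, and the toy tRNA sequence are all hypothetical, and only perfect Watson-Crick pairing is considered.

```python
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq):
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def pbs_matches_trna(pbs, trna, min_len=8):
    """True if the PBS pairs perfectly with the 3'-terminal bases of the tRNA.

    Both sequences are written 5' to 3'. min_len is an arbitrary, assumed
    lower bound on the length of a meaningful PBS.
    """
    if len(pbs) < min_len or len(pbs) > len(trna):
        return False
    return reverse_complement_rna(pbs) == trna.upper()[-len(pbs):]


if __name__ == "__main__":
    toy_trna = "GCGGAUUUAGCUCAGUCGCACCA"  # toy sequence, not a real yeast tRNA
    toy_pbs = "UGGUGCGACU"
    print(pbs_matches_trna(toy_pbs, toy_trna))
```

Under this view, two elements whose PBS sequences match different tRNAs draw on different primer pools, which is the basis of the competition argument made for Ty4 versus Ty1 and Ty2.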
Given the conservation of structural features, the congruent evolution between hosts and elements within a clade, and the phylogenetic distribution of retrotransposon types, we propose that the present-day members of the four Ty1/copia clades of hemiascomycetes diverged from a common ancestor after three major exceptional events during the evolution of LTR retrotransposons (Fig. 6).
Ty5-like elements are probably the most ancient group of Ty1/copia
elements in hemiascomycetous yeasts. Although their genetic structure has been conserved, with all members encoding a single ORF
corresponding to a gag-pol gene fusion, they present
a low level of sequence conservation (up to 47% aa identity in
pol). Thus, these elements might be an old component of
hemiascomycetous yeasts that probably arose after the speciation of
Y. lipolytica (Fig. 6). This would also explain why most
hemiascomycetous species (9 of 14 in Fig. 6) have lost this type of
element. The presence of a single ORF, compared with two in the other
Ty1/copia elements, is reminiscent of the scenario for the
evolution of non-LTR retrotransposons, in which the most ancient
element has only one ORF, and the most recent types have two (Malik et al. 1999).
We propose that the first exceptional event that arose during Ty1/copia evolution in hemiascomycetous yeasts was the acquisition of different strategies for regulating the stoichiometry of gag and pol. Thus, the acquisition of a stop codon separating ORF1 and ORF2 took place in the lineage leading to the Tca2 clade. Because only C. albicans and D. hansenii possess modern members of this retrotransposon type, we suggest that this exceptional event arose after the speciation of P. angusta, just before the differentiation of D. hansenii and P. farinosa on the one hand and C. tropicalis and C. albicans on the other hand (Fig. 6). Another strategy, consisting of a +1 frameshift mechanism, was acquired by the Ty1-like ancestor and has been retained during evolution by Ty4-like elements (Fig. 6). The six families of full-length Ty1-like elements (Ty1, Ty2, Tse1, Tsk1, Tkl1, and Tkm1) form a very homogeneous group with highly conserved sequences (up to 79.9% aa identity in pol) and a phylogeny congruent with that of the host species, and all of them are found in species of the Saccharomyces and Kluyveromyces genera. We therefore propose that the event leading to the appearance of the Ty1 ancestor arose before the genera Saccharomyces and Kluyveromyces differentiated (Fig. 6). Similarly, we propose that the acquisition of a different PBS gave the Ty4-like ancestor a selective advantage. Given that Ty4-like elements (Ty4 and Tsu4) share highly conserved sequences (68% identity on the 1461 aa of pol) and are only present in S. cerevisiae and S. bayanus, we suggest that Ty4-like elements are relatively recent and that they arose before the speciation of the sensu stricto group (Fig. 6).
LTR retrotransposons are constantly evolving, so new types will
probably arise. Jordan and McDonald (1999b) showed that there is a high
level of genomic turnover of Ty elements in S. cerevisiae. However, we do not know how the new types appear. We suggest several different possibilities. The first entails the accumulation of point
mutations that allowed the new element to escape the regulatory repression of the host. As no major evolutionary events differentiate Ty1 from Ty2, this is a prominent example of elements separated from
each other by nucleotide divergence only. This implies that Ty1 and Ty2
evolved independently as a result of functional constraints, either in
the same genome or in two different genomes that were brought together, after
differentiation of the retrotransposons, by allotetraploidization (Seoighe and Wolfe 1999) or by a cross between two local races that
mixed recently. We do not have sufficient evidence to differentiate between these hypotheses concerning the origin of the modern
S. cerevisiae genome.
The second possibility is recombination, either between internal elements and the
host's DNA, which could lead, for example, to the acquisition of a new PBS
(contributing to coevolution between host and transposon), or between
elements, as reported by Jordan and McDonald (1998).
The third alternative is that a partial or complete retrotransposon or
an invading retrovirus recombined with an endogenous LTR
retrotransposon via horizontal transfer. Unlike for insect
retrotransposons (Jordan et al. 1999), there is no evidence that
horizontal transfer is involved in the acquisition of LTR
retrotransposons in yeasts. The only possible exception is Tsk1 from S. kluyveri, which is closer to Ty1 than is Ty2 (79.9% identity for
1322 aa in pol between Ty1 and Tsk1 vs. 75.6% for 1351 aa
between Ty1 and Ty2), whereas the host genomes are more distantly related.
However, this example needs further documentation.
CONCLUSIONS
We used an exceptionally large set of data on hemiascomycetous yeasts to analyze the LTR retrotransposons identified in these species in detail. Our results help to elucidate the evolution of these elements and, in particular, provide an insight into the genomic evolution of yeasts. We have developed a model for the evolution of elements from the Ty1/copia group. However, it was not as easy to establish a model for the evolution of the Ty3/gypsy group because there is little sequence information on this type of element. The large-scale genome sequencing project that is underway (P.F. Cliften, pers. comm.; GDR Génolevures) should confirm the evolutionary model proposed for Ty1/copia elements and provide accurate information that can be extended to Ty3/gypsy elements.
METHODS
Strains
The strains used in this study were the same as those used in the
Génolevures project (Souciet et al. 2000): S. bayanus syn. S. uvarum (CLIB533), S. exiguus (CBS379), S. servazzii (CBS4311), Z. rouxii (CBS732), S. kluyveri
(CBS3082), K. thermotolerans (CBS6340), K. lactis
(CLIB210), K. marxianus var. marxianus (CBS712), P. angusta (CBS4732), D. hansenii var. hansenii (CBS767),
P. farinosa (CBS7064), C. tropicalis (CBS94), and
Y. lipolytica (CLIB89).
Accession Numbers
The EMBL and GenBank accession numbers of the sequences of the elements used in this study and of the new transposable elements are listed in Table 4.
Sequence Screening and Element Assembly
Sequence data for the 13 hemiascomycetous yeasts are available on
the Génolevures website at
http://cbi.labri.u-bordeaux.fr/Genolevures/Genolevures.php3. In
addition to the Ty5 sequence of S. paradoxus, we used a
database of S. cerevisiae Ty elements (Horst Feldmann,
unpubl.). We compared the translated sequences of the RSTs from the 13 yeast species with the amino acid sequences of the Ty elements using the
BLASTX version 2.0.8 program with the BLOSUM62
substitution matrix. We used the Staden programs of the
Madison Institute Genetics Computer Group (GCG) sequence analysis
package (Devereux et al. 1984) to assemble the sequences. We assembled
the complete elements by sequencing the clones containing the RST of
interest and, if part of the element was missing from the plasmid
libraries, by PCR amplification of genomic DNA fragments. Sequencing
data were obtained from overlapping reads on both strands.
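A BLASTX search compares a nucleotide query, translated in all six reading frames, against protein sequences. The translation step can be sketched in Python as follows. This is only a conceptual illustration of that step, not the BLASTX implementation: the function names are hypothetical, and the standard genetic code is assumed for every species, which is a simplification.

```python
from itertools import product

# Standard genetic code in TCAG order (an assumption; see the lead-in).
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from the first base; unknown codons give 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return translations for frames +1, +2, +3, -1, -2, and -3."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translations("ATGGCCTAA").items():
        print(frame, protein)
```

Each of the six conceptual translations is then scored against the protein database with the chosen substitution matrix, which is why short, randomly positioned RSTs can still detect a retrotransposon ORF regardless of strand or frame.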
Multiple Sequence Alignment and Phylogenetic Analysis
Pairwise comparisons were performed using the FASTA or
Bestfit programs from the GCG package. We aligned multiple
amino acid sequences with the PILEUP program of the GCG
package and ClustalW (Thompson et al. 1994). Alignments
were adjusted manually with GeneDoc (K.B. Nicholas and
H.B. Nicholas, http://www.psc.edu/biomed/genedoc/) so that our
alignments were consistent with previously published multiple
alignments of protease, IN, RT, and RNase H domains (Xiong and Eickbush 1990; Springer and Britten 1993; Jordan and McDonald 1999b). The
GCG program Distances was used to calculate distance
matrices with the Kimura correction method, and the GCG program
GrowTree was used to construct and to view UPGMA trees.
For bootstrap analyses, neighbor-joining trees were constructed using
ClustalX (Thompson et al. 1997) and viewed with TreeView (Page 1996).
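The distance and clustering steps described above can be sketched in Python. This is a minimal illustration and not the GCG code: it assumes that the Kimura correction for proteins takes the commonly cited form d = -ln(1 - p - 0.2p²), where p is the fraction of differing ungapped positions, and it implements UPGMA as plain average-linkage clustering without branch lengths. Function names and the toy distances are hypothetical.

```python
import math
from itertools import combinations


def kimura_protein_distance(seq_a, seq_b):
    """Kimura-corrected distance between two aligned protein sequences.

    Assumes d = -ln(1 - p - 0.2 * p**2); gap columns are ignored.
    """
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    p = sum(a != b for a, b in pairs) / len(pairs)
    inner = 1.0 - p - 0.2 * p * p
    if inner <= 0:
        raise ValueError("sequences too divergent for this correction")
    return -math.log(inner)


def upgma(names, distances):
    """Build a UPGMA topology as nested tuples.

    distances maps frozenset({name_i, name_j}) to a distance.
    """
    sizes = {name: 1 for name in names}  # cluster label -> number of leaves
    dist = dict(distances)
    while len(sizes) > 1:
        # Join the two closest clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: dist[frozenset(pair)])
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        merged = (a, b)
        # Distance to the new cluster is the size-weighted average.
        for other in sizes:
            dist[frozenset((merged, other))] = (
                size_a * dist[frozenset((a, other))]
                + size_b * dist[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


if __name__ == "__main__":
    toy = {
        frozenset(("A", "B")): 0.1,
        frozenset(("A", "C")): 0.5,
        frozenset(("B", "C")): 0.6,
    }
    print(upgma(["A", "B", "C"], toy))
```

UPGMA assumes a constant rate of evolution across lineages, which is one reason a second method such as neighbor joining is useful as a cross-check on the topology.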
Southern Hybridization
Genomic DNA was prepared in Seakem GTG agarose (FMC, USA) plugs as
described previously (Vezinhet et al. 1990). The DNA was digested and
subjected to electrophoresis in a CHEF Mapper apparatus (Bio-Rad
Laboratories) in 1% Pulsed Field Certified Agarose (Bio-Rad Laboratories) gels in 0.5× TBE buffer at 12°C. Genomic DNA was digested with EcoRI, BamHI, HindIII, or
PstI and then was separated by field inversion gel
electrophoresis (FIGE) for 20 hr, 39 min with forward and reverse
voltages of 9 V/cm and 6 V/cm, respectively. Initial and final pulses
were 0.11 sec and 0.67 sec, respectively. DNA was transferred onto
GeneScreen nylon membranes (Dupont de Nemours NEN, USA) as described
previously (Zimmermann and Fournier 1996). DNA/DNA hybridizations were
performed as described previously (Sambrook et al. 1989) with DNA probes
labeled with (32P) dCTP using the Megaprime labeling kit
(Amersham Life Science, UK). Probes were obtained by PCR amplification
under the following conditions: 4 min at 94°C, then 30 cycles of 30 sec at
94°C, 30 sec at the Tm of the primers, and 1 min per kb at
72°C, with 2.5 units of Taq DNA polymerase (Appligene Oncor, France).
Long-range PCR amplifications of genomic DNA were run in a Perkin-Elmer
2400 thermocycler using the Expand high-fidelity PCR system (Boehringer Mannheim, Germany).
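Because the extension step above is specified per kilobase, the total programmed time depends on amplicon length. The following Python sketch works through that arithmetic. It is an illustration only: the function name is hypothetical, the defaults simply restate the probe amplification conditions given above, and ramping time between temperatures is ignored.

```python
def pcr_program_seconds(
    amplicon_kb,
    cycles=30,
    initial_denaturation_s=240,  # 4 min at 94°C
    denaturation_s=30,           # 30 sec at 94°C
    annealing_s=30,              # 30 sec at the Tm of the primers
    extension_s_per_kb=60,       # 1 min per kb at 72°C
):
    """Return the programmed duration in seconds, excluding ramp times."""
    extension_s = extension_s_per_kb * amplicon_kb
    per_cycle = denaturation_s + annealing_s + extension_s
    return initial_denaturation_s + cycles * per_cycle


if __name__ == "__main__":
    for size_kb in (1, 2, 6):
        minutes = pcr_program_seconds(size_kb) / 60
        print(f"{size_kb} kb amplicon: {minutes:.0f} min")
```

For a 2-kb probe this gives 240 + 30 × (30 + 30 + 120) = 5640 sec, or 94 min of programmed time.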
WEB SITE REFERENCES
http://bioc111.otago.ac.nz:591/retrobase/home.htm; Retrobase site.
http://cbi.labri.u-bordeaux.fr/Genolevures/Genolevures.php3; Génolevures project.
http://www.inra.fr/clib/english/genolevu.htm; Collection of yeasts of biotechnological interest.
http://www.psc.edu/biomed/genedoc/; GeneDoc.
http://www-sequence.stanford.edu/group/candida; Sequencing of Candida albicans at the Stanford Genome Technology Center.
http://www-sequence.stanford.edu/group/C.neoformans/index.html; Stanford Genome Technology Center, Cryptococcus neoformans genome project.
ACKNOWLEDGMENTS
E.B. was supported by the EEC scientific research grant QLRI-1999-01333. This work was supported by INRA, CNRS, the GDR/CNRS 2354 "GénolevuresII," and by two BRG grants (Ressources Génétiques des Microorganismes n°11-0926-99 and Gestion des Collections de Levures de Fromage).
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
FOOTNOTES
4 Corresponding author.
E-MAIL ncecile{at}grignon.inra.fr; FAX 33-1-30-81-54-57.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.219202.
REFERENCES