Vol. 10, Issue 2, 174-191, February 2000
Multiple LTR-Retrotransposon Families in the Asexual Yeast Candida albicans
Department of Biochemistry, University of Otago, Dunedin, New Zealand
We have begun a characterization of the long terminal repeat (LTR) retrotransposons in the asexual yeast Candida albicans. A database of assembled C. albicans genomic sequence at Stanford University, which represents 14.9 Mb of the 16-Mb haploid genome, was screened and >350 distinct retrotransposon insertions were identified. The majority of these insertions represent previously unrecognized retrotransposons. The various elements were classified into 34 distinct families, each family being similar, in terms of the range of sequences that it represents, to a typical Ty element family of the related yeast Saccharomyces cerevisiae. These C. albicans retrotransposon families are generally of low copy number and vary widely in coding capacity. A full-length and apparently intact retrotransposon was identified for only three families. For many families, only solo LTRs and LTR fragments remain. Several families of highly degenerate elements appear to be still capable of transposition, presumably via trans-activation. The overall structure of the retrotransposon population in C. albicans differs considerably from that of S. cerevisiae. In S. cerevisiae, retrotransposon insertions can be assigned to just five families. Most of these families still retain functional examples, and they generally appear at higher copy numbers than the C. albicans families. The possibility that these differences between the two species are attributable to the nonstandard genetic code of C. albicans or the asexual nature of its genome is discussed. A region rich in retrotransposon fragments, which lies adjacent to many of the CARE-2/Rel-2 subtelomeric repeats and appears to have arisen through multiple rounds of duplication and recombination, is also described. [The sequence data described in this paper have been submitted to the GenBank data library. Accession numbers are listed in Table 1 and in the Materials and Methods section.]
Candida albicans is a major fungal
pathogen of humans and C. albicans
infections have become more of a problem in recent years with the
spread of AIDS and the increased use of invasive surgical techniques
(Odds 1988 Several laboratories have devoted considerable effort over recent years
toward understanding the genomic organization of C. albicans
and how this varies among strains. Important results to date include
the construction of an SfiI restriction map of the complete
genome (Chu et al. 1993 Retrotransposons are a significant component of many eukaryote genomes.
They often make up a large proportion of the genome, for instance, the
L1 retrotransposon comprises ~15% of the human genome (Kazazian and
Moran 1998 Full-length retrotransposons seem to be quite unstable structures and are often lost as a result of recombination between their two LTRs. This results in single, isolated LTRs, termed solo LTRs, remaining at the original sites of insertion. Full-length retrotransposons, and also the solo LTRs, are often found flanked by short (4- or 5-bp) direct repeats. These are duplications of the target-site sequence, which are formed during the insertion process. Of the S. cerevisiae Ty elements, Ty1, Ty2, and Ty3 are known
to be functional (Curcio et al. 1988 Most LTR retrotransposons can be classified into one of two distinct
groups, the copia group or the gypsy group, on the
basis of their reverse transcriptase sequences and other structural features (Xiong and Eickbush 1990 Five retrotransposons or retrotransposon-like elements have been
identified in C. albicans to date. The first of these is Tca1
(Chen and Fonzi 1992 Here we describe a more thorough characterization of the retrotransposons in the C. albicans genome. This characterization was made possible by a C. albicans genome sequencing project at Stanford University. Our results show that the structure of the retrotransposon population in C. albicans differs considerably from that of S. cerevisiae. The differences suggest that the forces shaping retrotransposon evolution have differed in the two species. Further analyses and comparisons may yield interesting insights into more general aspects of genome structure, function, and evolution.
Identification of New C. albicans Retrotransposon Sequences
The genome of C. albicans strain SC5314 (Gillum et al.
1984 These sequence databases were used to identify new families of C. albicans retrotransposons. The major focus of this paper is the identification and characterization of the LTRs of the new retrotransposon families, although a number of new full-length elements were identified. These full-length retrotransposons are described briefly here but will be analyzed in greater depth elsewhere. In this paper we first describe how the new LTRs were discovered, then how they were divided into families, before presenting a more in-depth characterization of the various families. The criteria for assigning a C. albicans sequence as a new
retrotransposon LTR were that such a sequence should (1) have distinct termini, most likely 5'-TG...CA-3'; (2) be present at multiple locations in the genome, as evidenced by multiple different flanking sequences; (3) sometimes be found flanked by short (4- or 5-bp) direct
repeats, representing duplications of the target sequence; and (4) be
within a limited size range. Initially, we considered elements
~200-600 bp in length, a range that encompasses the size distribution of previously described fungal LTRs (251-388 bp, Lauermann et al. 1997 The new LTRs were identified in a variety of ways. A few were
identified simply as a result of their association with recognizable retrotransposon internal regions. An example is the san LTR.
One sequence carrying part of a san LTR also carries part of
an ORF predicting a protein similar in sequence to the carboxy-terminal end of the Tca2 Pol protein (Fig. 1A). This
similarity was recognized in the BLASTX search against the Genpept
database and was noted in the sequence's annotation. Closer
examination of this sequence revealed a region downstream of the ORF
closely resembling the polypurine tracts (PPTs) of other C. albicans retrotransposons (Fig. 1B), and adjacent to this PPT was a
new LTR-like repetitive sequence
The next method involved screening the database containing the
individual sequences (the database containing the assembled data was
not available at the time) for repetitive elements by performing
sequential BLAST searches of individual sequences against the database.
With the level of coverage of the genome in the database at the time,
most single-copy sequences produced just a single hit (themselves), or,
occasionally, two or three. Repetitive sequences, on the other hand,
usually had multiple high-quality matches. Most of the repetitive
sequences identified in this manner appeared to originate from the
mitochondrial DNA, the ribosomal DNA repeats, the MRS (Chibana et al.
1998 Further LTRs were identified as a result of the apparent pattern of evolution of retrotransposons in C. albicans (see below), which has resulted in the appearance of related families of elements that still retain significant sequence similarity. For instance, the san element mentioned above and the alpha LTR of Tca1 share ~60% sequence identity. These two elements are shown aligned in Figure 1C. They can be seen to share very similar terminal sequences and scattered regions of similarity in the internal regions. These areas of similarity may contain sequences important for LTR function, such as promoters, terminators, or recognition sites for integrase, etc. An example of a new LTR being detected as a result of its similarity to other elements is xi, which is related to the san and alpha LTRs. This element was detected following a BLASTN search of the assembled sequence data with a san LTR as a query. As expected, the search detected the contigs containing san LTRs with the most significant scores, followed by the contigs containing alpha LTRs. However, several other contigs were also detected with scores well above background. Close examination of these contigs revealed that they all contain a similar element, which bears all of the characteristics of an LTR. This new element was named xi. The final method by which new LTRs were discovered was as inserts within, or near to, previously recognized elements. For instance, while characterizing the members of one retrotransposon family, we would frequently find truncated examples or elements with large insertions. Analysis of the sequences causing the disruptions often revealed new LTRs. An example of an LTR identified in this manner is tui. This LTR was originally detected as a 193-bp element, present within a kappa LTR, with the terminal dinucleotides 5'-TG...TA-3' and flanked by a 5-bp duplication of the target site (Fig. 2). 
Further analysis of this putative LTR revealed multiple similar sequences in the genome with a variety of different flanking sequences.
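The four criteria used above to recognize a candidate LTR can be sketched in code. What follows is a minimal illustration only, not the procedure used in this study, which relied on BLAST searches and manual inspection. The function names, the input layout (one pair of flanking sequences per genomic copy), and the default 200-600 bp window (the range initially considered) are assumptions made for the example. TA is accepted as a 3' terminus only because the tui LTR is described as having 5'-TG...TA-3' ends.

```python
def has_ltr_termini(seq, allowed_ends=("CA", "TA")):
    """Criterion 1: distinct termini, most likely 5'-TG...CA-3'."""
    seq = seq.upper()
    return seq.startswith("TG") and seq[-2:] in allowed_ends


def target_site_duplication(left_flank, right_flank, sizes=(5, 4)):
    """Criterion 3: return the direct repeat if the bases immediately 5' of
    the element are repeated immediately 3' of it, otherwise None."""
    left_flank, right_flank = left_flank.upper(), right_flank.upper()
    for k in sizes:
        if len(left_flank) >= k and left_flank[-k:] == right_flank[:k]:
            return left_flank[-k:]
    return None


def check_candidate(element, insertions, min_len=200, max_len=600):
    """Evaluate one candidate element against the four criteria.

    `insertions` is a list of (left_flank, right_flank) pairs, one per
    genomic copy of the element (a hypothetical input layout).
    """
    distinct_flanks = {(l.upper(), r.upper()) for l, r in insertions}
    tsds = [target_site_duplication(l, r) for l, r in insertions]
    return {
        "termini": has_ltr_termini(element),
        # Criterion 2: present at multiple sites with different flanks.
        "multi_copy": len(distinct_flanks) >= 2,
        # Criterion 3 only asks that TSDs are *sometimes* present.
        "some_tsd": any(t is not None for t in tsds),
        # Criterion 4: within a limited size range.
        "size_ok": min_len <= len(element) <= max_len,
    }


if __name__ == "__main__":
    # Synthetic sequences, invented purely for illustration.
    element = "TG" + "A" * 250 + "CA"
    insertions = [("GGATTCAGTC", "CAGTCTTTAA"), ("CCCCCAAAAA", "GGGGGTTTTT")]
    print(check_candidate(element, insertions))
```

A 193-bp element such as the tui LTR described above would fail the default size window, so `min_len` would have to be lowered for elements of that kind.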
By these four methods, we have identified a total of 355 LTRs or LTR fragments in assembly 4 of the Stanford database, which represents 14.9 Mb of the 16-Mb haploid genome. It is worth noting, however, that our methods were not exhaustive, and it is possible that there are other families of LTR sequences in the SC5314 genome that we have failed to detect. In addition, it is likely, given that C. albicans is asexual and that many strains have thus been evolving independently for a long time, that there are retrotransposons present in other C. albicans strains that are not present in SC5314 at all. Furthermore, there are a few other LTR-like sequences present, which we have not included in this analysis because they do not fulfill all of the above criteria, such as having two distinct termini.
Division of the New Elements into Families
Many of the new LTRs that we have identified are simple to group
into distinct families as they are highly similar within a family,
bearing, for example, >80% sequence identity to all other members
of the family, yet they bear no detectable similarity to any other LTR
at all (apart from the terminal dinucleotides). For others, however, it
is not so simple, as there are quite a few cases of pairs of elements
sharing 60%-70% sequence identity. It is not immediately clear
whether such elements should be classified in the same family, or into
related, but distinct, families. To allow comparisons between our
results and the findings from S. cerevisiae, we have tried to
keep the concept of a family consistent between these two species.
Although the term family has not been defined precisely for
retrotransposons in S. cerevisiae, the extent of diversity of
sequences within a family has been fairly well documented (Kim et al.
1998 With these findings from S. cerevisiae in mind, we decided on
a working definition of a family of C. albicans
retrotransposon LTRs as being a monophyletic group of sequences, having
all the characteristics of LTRs, that typically share Phylogenetic trees were constructed for all families to allow the
diversity of sequences among and within families to be visualized. Examples of these are shown, drawn to the same horizontal scale, in
Figure 3A. These trees are useful to illustrate some
borderline cases and to compare them with well-defined examples of
related, but distinct, families. An example of the latter is provided
by alpha, san, and xi. Alpha and xi
each have
Another borderline case is the phi and chi LTRs. These two families fall into two monophyletic groups with 100% bootstrap support (Fig. 3A), but some phi LTRs share 71% identity with particular chi LTRs, suggesting that perhaps they should all be assigned to the one family. If, however, this were to be done, then this would become a very diverse family, with some elements sharing as little as 56% identity and many failing to align full-length at all. In addition, no evidence for recombination between the two families was detected. For the above reasons, these elements were assigned to two separate families. The somewhat greater divergence between phi and chi LTRs than between the two halves of the pi family can be seen by the longer branch length separating phi and chi than that separating the two halves of the pi family (Fig. 3A). A final borderline case was the tui LTRs. These elements fall into two groups receiving 100% bootstrap support, with the elements from one group typically sharing 67%-69% identity with the other group. However, all of the elements in the upper group, as shown in Figure 3A, have very similar flanking sequences and appear to have been duplicated by a mechanism other than autonomous retrotransposition (see below), suggesting that they are not worthy of a separate family designation. These elements were therefore grouped together with the related LTRs into one family. On the basis of the above family definition, designed to be broadly
comparable to S. cerevisiae, and the resolution of borderline cases as described, the C. albicans LTRs fall into 34 distinct families. To be consistent with the naming of previous C. albicans and S. cerevisiae LTRs, the new elements have
been named after letters of the Greek alphabet, although we tended to
avoid using letters already assigned to S. cerevisiae LTRs
(delta, omega, sigma, and tau).
When we ran out of classical Greek letters, we started naming them
after archaic Greek letters (sampi, san, etc.) and
phonetically similar names of New Zealand birds (moa,
tara, weka, etc.). The various families are listed in
Table 1. Related families of LTRs are listed in Table
2.
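The identity figures used in the family assignments above (>80% within clear-cut families, 60%-70% for borderline pairs) are pairwise comparisons between aligned LTRs. The sketch below shows one way such a figure might be computed. It assumes the sequences have already been aligned with "-" as the gap character and that identity is counted only over columns where neither sequence has a gap; the text does not state how the identities were calculated, so both points are assumptions, as are the function names.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identity of two pre-aligned sequences of equal length.

    Columns in which either sequence has a gap ('-') are skipped
    (an assumed convention; other definitions of identity exist).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def pairwise_identities(aligned):
    """Map each pair of names to its percent identity.

    `aligned` is a dict of name -> aligned sequence (hypothetical layout).
    """
    return {
        (m, n): percent_identity(aligned[m], aligned[n])
        for m, n in combinations(sorted(aligned), 2)
    }


if __name__ == "__main__":
    # Toy alignment, invented purely for illustration.
    toy = {"ltr_a": "ACGT-ACGTA", "ltr_b": "ACGTTACGTT", "ltr_c": "ACGTTACGTA"}
    for pair, ident in pairwise_identities(toy).items():
        print(pair, round(ident, 1))
```

Identity values alone do not settle the borderline cases discussed above; the family definition also requires a monophyletic grouping, which this sketch does not test.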
The number of distinct families we have identified in C. albicans is considerably more than the number found in S. cerevisiae. This raises the question of whether a C. albicans retrotransposon family, as we have defined it, is really
equivalent to an S. cerevisiae Ty family. We tested this in
two ways. Firstly, we constructed phylogenetic trees of the
delta, sigma, and tau Ty LTRs, using as a
dataset all of the full-length LTRs identified in the yeast genome
(available from the Voytas laboratory web site:
http://www.public.iastate.edu/~voytas/ltrstuff/ltrtables/yeast.html). These trees are shown, drawn to the same horizontal scale as the C. albicans trees (Fig. 3B). As discussed by Kim et al.
(1998) As a second means of comparing the concept of a family in C. albicans and S. cerevisiae, we calculated the level of
nucleotide diversity (
General Characteristics of the LTR Families
Having established that C. albicans retrotransposon
families are comparable to S. cerevisiae families, we went on
to characterize the LTRs of each family in more detail. Some of our
findings are summarized in Table 1. The elements range in length from
127 to 780 bp, with an average length of 359 bp. This is a somewhat wider range than that found in previously identified yeast
retrotransposon LTRs
For each LTR family, the sequences flanking all of the copies were
recorded and compared. Examples of such comparisons are given for the
pi and san elements in Figure 5.
More than half of the full-length LTRs were found to be flanked by
short direct repeats representing target-site duplications (TSDs). For
the majority of elements, such TSDs were 5 bp in length, as is the case
with the S. cerevisiae Ty elements. Five elements, however, were commonly found flanked by 4-bp TSDs. To the best of our knowledge, 4-bp TSDs have not been reported previously for fungal elements, but
are common among gypsy-class elements of insects (for a
summary, see Table 6 of Boeke and Stoye 1997
Kim et al. (1998)
Analysis of the regions flanking each C. albicans LTR suggested that recombination between LTRs at different locations has occurred. For instance, just one of the san LTRs in Figure 5B (2986) is flanked by a 5-bp direct repeat. However, the 3' target site of the san LTR in 2584 is identical to the 5' target site of that in 3083, and similarly its 5' flank is the same as the 3' flank of the san LTR in 2757. The san LTRs in 2757 and 3083 likely represent the left and right LTRs of a single full-length element (T. Goodwin, unpubl.). This apparent swapping of flanking sequences suggests that one LTR of this retrotransposon has undergone a recombination with a san LTR at some other location. There are several other examples of pairs of LTRs that appear to have swapped flanking sequences, and there are many examples of elements whose flanking sequences bear no resemblance to a direct repeat, probably as a result of either recombination or frequent mutation of the TSD.
Identification of Retrotransposons Retaining Internal Sequences
The finding that many of the new LTRs are flanked by direct repeats
suggests that these elements are solo LTRs. It is of interest to know
whether the actual retrotransposons, corresponding to the many LTR
families we have identified, are still present and to know what form
such retrotransposons might take. We have therefore searched the
databases for retrotransposon internal regions. Sixteen different
families were identified, including the previously described Tca1 and
Tca2. The new elements have been named Tca3 through Tca16 and are
listed in Table 5.
For some elements, such as nu, the identification of a
corresponding retrotransposon was relatively straightforward Retrotransposons that have highly degenerate or divergent coding
regions, or for which only fragments remain, are unlikely to be
detected by homology to other elements at the protein level. Therefore,
as a possible means of detecting such elements, we scanned the regions
flanking those LTRs that lack TSDs for possible minus-strand
primer-binding sites (PBSs). In most retrotransposons, the PBS is a
10-20-nucleotide sequence just downstream of the left LTR that is
complementary to part of a cytoplasmic tRNA. As examples, the Ty1 PBS
consists of a 10-nucleotide sequence, immediately adjacent to the LTR,
that is complementary to the 3' end of the initiator methionine
tRNA (tRNAiMet), whereas the Ty5 PBS is a 13-nucleotide
sequence, adjacent to the LTR, that is complementary to the anticodon
stem-loop of the tRNAiMet (Voytas and Boeke 1992 New PBSs were detected in a variety of ways. Some were found as a
result of their similarity to previously identified C. albicans PBSs. For instance, both the previously described
full-length retrotransposons, Tca1 (Chen and Fonzi 1992 In some cases, two LTRs of a family were found repeated in a direct orientation on a contig, but it wasn't clear whether they were two independent insertions or the two LTRs of a single retrotransposon. In such cases, we looked for sequences near to the LTR termini that began 5'-TGG-3'. This sequence is complementary to the CCA trinucleotide present at the 3' end of all tRNAs and is a feature of all PBSs that utilize the 3' end of a tRNA as a primer. Such sequences were visually compared with the C. albicans tRNAs in GenBank or to a database of all the S. cerevisiae tRNA sequences (http://biochimica.unipr.it/yeast/all_trna.txt). In the case in which a matching S. cerevisiae tRNA was found, its sequence was used to screen the C. albicans databases and identify the homologous gene, which was then compared with the putative PBS. By these methods, 14 different families of C. albicans LTRs
were found with associated PBSs. The PBSs could be divided into seven
different classes (Fig. 6): those with an extensive
region of homology to a tRNAArg(UCU) internal region
(e.g., gamma), those with a short region of homology to the
same tRNA (e.g., whio), those homologous to an internal region
of the tRNAiMet (e.g., zeta), and those homologous
to the 3' ends of various tRNAs including the iMet, Ile, Gln, and
Ala tRNAs. In several cases, the PBS was not immediately adjacent to
the LTR, but between 2 and 10 bp downstream. A short gap between the
LTR and the PBS is a common feature of gypsy-class
retrotransposons (see, e.g., Table 1 of Chavanne et al. 1998
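The search for PBSs that use the 3' end of a tRNA as a primer can be illustrated with a short sketch. Because every tRNA ends in CCA, the complementary PBS begins 5'-TGG-3'. The code below derives the expected PBS from the last few nucleotides of a tRNA and looks for it within a short distance downstream of the left LTR. It is an illustration under stated assumptions rather than the procedure used here, where candidate sequences were compared with tRNAs by eye: the function names, the default 10-nucleotide PBS length (the length given above for Ty1), the 10-bp maximum gap, and the exact-match requirement are all choices made for the example, and the tRNA sequence in the usage section is invented.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def expected_pbs(trna_dna, length=10):
    """PBS complementary to the last `length` nt of a tRNA (DNA alphabet,
    including the terminal CCA)."""
    return reverse_complement(trna_dna[-length:])


def find_pbs(downstream, trna_dna, length=10, max_gap=10):
    """Return the gap (in bp) between the LTR end and an exact PBS match,
    or None if no match starts within `max_gap` bp.

    `downstream` is the sequence immediately 3' of the left LTR.
    """
    pos = downstream.upper().find(expected_pbs(trna_dna, length))
    if pos == -1 or pos > max_gap:
        return None
    return pos


if __name__ == "__main__":
    # Invented tRNA-like sequence ending in CCA, for illustration only.
    trna = "ACGTACGTAGCTAGCTTCGAGCACCA"
    pbs = expected_pbs(trna)
    print(pbs, pbs.startswith("TGG"))
    print(find_pbs("AT" + pbs + "GGGG", trna))
```

PBSs that pair with an internal region of a tRNA, such as the gamma, whio, and zeta classes, would need the relevant internal fragment to be supplied in place of the 3' end.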
The 16 families of elements that we have found to retain some internal sequences vary considerably in their coding capacity (Table 5). A few are apparently intact, bearing all the characteristic features of functional retrotransposons (Tca2, Tca4, and Tca5). Others, such as Tca3 and Tca8, still have long, uninterrupted ORFs with homology to other retroelements, but we haven't yet detected a full-length element. One, Tca6, has no long ORFs in the internal region but has ORF fragments that still bear detectable similarity to related retrotransposons. Some, such as Tca9 and Tca13, can be found as composite elements, with identical LTRs, intact PBSs and PPTs, and flanked by 5-bp direct repeats. Between the LTRs, however, lies 4-5 kb of sequence with no apparent coding capacity at all, nor any detectable similarity to other retroelements. A further element, Tca10, consists of a composite element, flanked by a 5-bp direct repeat, whose LTRs share 99.5% identity. The internal region is ~2 kb long and contains a PBS and PPT adjacent to the left and right LTRs, respectively, and some extensive ORFs. The predicted products of these ORFs, however, bear no significant similarity to any protein sequence in the databases. Most of the elements could be assigned to either the copia or gypsy family (Table 5). For some elements without extensive ORFs, this assignment is tentative as it is based on the nature of their PBSs. For instance, a small gap between the LTR and the PBS is common among gypsy-class elements, but to the best of our knowledge, has not been found in copia-type elements. Conversely, the use of tRNA fragments as primers appears to be restricted to copia-like elements. None of the other 18 newly identified LTR families were found associated with sequences resembling the internal regions of retrotransposons. Solo LTRs and LTR fragments may be the only remnants of these retrotransposon families. 
However, it is possible that internal retrotransposon sequences corresponding to some of these LTRs may have escaped detection, given that the coverage of the SC5314 genome in the Stanford database, although extensive, is not complete. It is also possible that other C. albicans strains harbor full-length retrotransposons that are absent from SC5314.
Retrotransposons in the C. albicans Subtelomeric Regions
We have reported previously that some kappa LTRs are
associated with the C. albicans repetitive elements
CARE-2 (Lasker et al. 1992 As might be expected for repetitive elements, there are a relatively large number of contigs bearing CARE-2- and Rel-2-like sequences in the database. More than half of these contigs were found to contain a similar LTR-rich region neighboring the CARE-2/Rel-2-like sequences. A few other contigs also have a similar LTR-rich region but lack CARE-2/Rel-2. The LTR-rich regions of all of these contigs are depicted in Figure 7A.
To ascertain whether these areas are of subtelomeric origin, the
distribution of genes and repeated sequences in the surrounding areas
was studied. We found that upstream of the LTRs (Fig. 7A, left) the
sequences soon diverge. These different upstream flanking sequences
contain a variety of different genes. For example, contig 3048 has a
tRNALeu gene, contig 1846, part of a TUP1 gene, and
contig 2757, a homolog of the S. cerevisiae YJL004c gene,
each within 500 bp of the psi LTR fragment. The other contigs,
apart from 2250, each have a gene within 1 kb of the psi LTR.
Contig 2250 has a degenerate non-LTR retrotransposon starting ~300
bp upstream. Several of the contigs contain sequences extending 20 kb
or more upstream of the LTR-rich region. In contrast, the contigs with
CARE-2/Rel-2-like sequences downstream of the LTRs generally
extend only 2-6 kb downstream and these sequences contain no
recognizable genes. An exception is contig 2935, which extends >8 kb
downstream and contains a long ORF with similarity (not shown) to the
Y' subtelomeric elements of S. cerevisiae (Louis and Haber
1992 The LTR-rich regions that are present at the boundary between the
subtelomeric repeats and the centromere-proximal unique sequences
appear to have been subjected to high levels of sequence rearrangement
(Fig. 7A). Elements that are common to a majority of these regions are
kappa LTRs, each bearing a tui LTR insertion, and, a
short distance upstream, partial psi LTRs. The tui
LTRs are inserted into the kappa LTRs at the same position and
in the same orientation in all of the various contigs, although many of
them have suffered small deletions. The immediate upstream flanks of
the kappa LTRs are identical in all of the contigs, but there
are a variety of distinct downstream flanks. For instance, some are
immediately flanked by CARE-2/Rel-2-like sequences, others have an intervening partial weka LTR, whereas others are
truncated, etc. Upstream of the corrupted kappa LTRs, the
partial psi LTRs vary on the different contigs, some suffering
more widespread deletions than others. The upstream flanks of the
psi fragments mark the boundary between the repeated sequences
and the upstream unique sequence in some contigs, although other
contigs share similar sequence for up to several hundred base pairs
further upstream. Several contigs have a rho LTR between the
psi and kappa LTRs. Again, these rho LTRs
have the same flanking sequences and the same orientation. Some contigs
carry different combinations of these common features. For instance,
contig 3079 has the rho LTR between psi and
kappa and the downstream flank of kappa is within a
weka LTR. Contig 3048, however, has the rho LTR, but not the partial weka, whereas contig 2956 has the partial
weka but not the rho LTR. Other contigs carry unique
variations
The arrangement of these sequences suggests that they are the result of multiple rounds of sequence duplication and recombination, interspersed with a variety of transposition events. Presumably, the ancestral structure consisted of a kappa LTR, with a tui insertion, lying downstream of a partial psi LTR. For some reason, this sequence was the subject of multiple rounds of duplication. At some stage during the duplication process, a rho LTR likely became inserted between the kappa LTR and psi fragment of one copy and subsequently became part of the duplicated sequences. Recombination at various stages among kappa LTRs with different flanking sequences is also likely. In addition, a variety of deletions appear to have occurred in the tui and psi LTRs during the duplication process. The sequences of some of these contigs are suggestive of some quite dramatic rearrangements, involving areas of the genome not closely associated with the subtelomeres. For instance, contig 2898 has not only suffered a Tca2 insert within the rho LTR, but also a tara insert within the kappa LTR and another rho insert within the tara element. The tara insert in this contig is not flanked by a direct repeat, and its downstream flanking sequence, in which the remainder of the kappa LTR might be expected to lie, bears no resemblance to a kappa LTR. The sequences downstream of tara also bear no resemblance to the subtelomeric CARE-2/Rel-2 sequences. Instead, following an additional zeta LTR and a truncated toroa LTR, there are several apparent genes. This suggests that the LTR-rich region in this contig, although clearly related to the others, is no longer closely associated with the subtelomeres.
LTRs that lie in close proximity to one another are not restricted to
the subtelomeric regions. Several examples of LTRs grouped together in
intergenic regions are shown in Figure 7B. The arrangement of LTRs in
the CBP1 (corticosteroid-binding protein) gene is of particular
interest, as LTRs have inserted within the ORF and now supply its last
13 codons and presumably the transcriptional termination signals as
well. The protein encoded by this ORF is still functional, however,
with demonstrated high-affinity corticosteroid-binding activity (Malloy
et al. 1993
Retrotransposons are an abundant and ubiquitous component of the eukaryote genome, and, as such, are a common source of genetic variation. The analysis of the retrotransposon complements of different species is of interest, as it should further our understanding of the role played by these elements in host evolution and may reveal the various strategies by which the hosts have attempted to prevent the overproliferation of these elements. Conversely, the various strategies used by the elements to try to avoid any host-driven elimination mechanisms may also become apparent. We have presented here an analysis of the retrotransposons in the genome of the asexual yeast C. albicans. The results of such an analysis are of special interest as the retrotransposons of the related yeast S. cerevisiae have been analyzed in depth and thus provide an excellent reference for comparison.
Here, we have described in some detail the methods by which a wide variety of new C. albicans elements were identified and classified into families. We have presented a working definition of a retrotransposon family for C. albicans that may assist in the classification of any further C. albicans retrotransposons. We compared the concept of a retrotransposon family in C. albicans with the recognized families of Ty elements and concluded that they are roughly equivalent in that they represent a similar diversity of sequences. An initial characterization of the various families was also undertaken. A more in-depth analysis of the several relatively intact retrotransposons that we identified will be presented elsewhere.
Perhaps the most interesting finding to emerge from this work is that
the number of distinct retrotransposon families is much higher in
C. albicans than it is in S. cerevisiae. Even if the families of C. albicans elements that show some sequence
similarity (Table 2) were to be combined into superfamilies of related
elements, there would be 20 such groups, still considerably more than
in S. cerevisiae. Another difference is that the majority of
the C. albicans families appear to be nonfunctional and of low
copy number. In contrast, the Ty elements are largely intact and are present at higher copy numbers. Furthermore, C. albicans has
non-LTR retrotransposons and DNA transposons (Chibana et al. 1998 One major difference between C. albicans and S. cerevisiae is that C. albicans has a nonstandard genetic
code Another obvious difference between the two species is that S. cerevisiae can reproduce sexually, whereas C. albicans
appears to be strictly asexual (Scherer and Magee 1990 Several of the full-length retrotransposons in C. albicans
have highly degenerate internal regions, suggesting that they are nonfunctional, yet have the identical LTRs and perfect TSDs
characteristic of recently transposed elements. What could explain
these apparent contradictions? Frequent gene conversion among the LTRs
of these families is one possibility. If this were the case, however,
the expectation would then be that most or all of the LTRs of these families would be highly similar, whereas, in fact, they are fairly heterogeneous groups. Gene conversion also doesn't account for the
fact that these elements all have intact PBSs and PPTs as well. Rather,
it seems likely that these elements have indeed transposed fairly
recently. Presumably, this would occur via the mRNAs of these elements
being processed by the products of other retrotransposons in
trans. The minimum requirements for
trans-activation are that an element be transcribed and have
an intact PBS, PPT, and mRNA-packaging signal, and that these be
recognized by the trans-activating element. We know nothing
about the mRNA-packaging signals of C. albicans
retrotransposons, but all of the elements that appear to move via
trans-activation do have intact PBSs and PPTs, and Tca1, at
least, is known to be transcribed (Chen and Fonzi 1992
The various families of elements in C. albicans appear to represent the full range of stages of retroelement speciation (Fig. 3A). For some families, such as san or gamma, the members are all nearly identical. Other families, such as psi, include several elements that are very similar, but also some more divergent ones. Further families, such as pi, represent quite a diverse range of sequences and the members clearly fall into two distinct subfamilies. Then, there are closely related families of elements, such as phi and chi, followed by families such as lambda and zeta, which are related but probably diverged a long time ago. Finally, there are families, such as san and gamma, for which the LTRs are very different, but the PBSs and PPTs are similar in the full-length versions, suggesting that they are also related. These elements thus provide snapshots of retrotransposon speciation in action. Further analysis of such sequences may yield interesting insights into the speciation process. For example, the role of recombination, in particular, gene conversion, in maintaining sequence homogeneity and thus inhibiting speciation, may become apparent. Another example could be in determining whether positive selection ever has a role, for instance, in helping related elements to avoid competition for limited host factors, such as tRNA primers.
All of the families for which we have identified full-length and apparently intact members, gamma (Tca2), san (Tca4), and omega (Tca5), have very low levels of sequence diversity (Table 1), suggesting that they have arisen only recently. These elements are unlikely to have arrived via horizontal transmission, however, as they are clearly related to other C. albicans elements. Therefore, they must have arisen as a result of divergence from some progenitor element. These findings are consistent with the retrotransposons of C.
albicans being in a state of flux, with new elements being continually generated and diverging, whereas older elements become nonfunctional as a result of either random mutation or deletion via inter-LTR recombination. The abundance of nonfunctional families suggests that the remnants of ancient elements are not efficiently removed from the genome, but rather persist and gradually diverge as random mutations accumulate.

Each of the Ty element families of S. cerevisiae has a
preference for inserting at particular sites within the genome (Kim et
al. 1998).

At the junction between the C. albicans subtelomeric repeats
and the centromere-proximal unique sequences, there is often a region
that is rich in LTRs and other transposable elements (Fig. 7A). The
frequency of this region in the database suggests that it is likely to
be present on half or even more of the chromosome ends. The different
copies of this region share a similar underlying pattern of LTR
insertions, suggesting that they all have been derived from some
ancestral sequence via multiple rounds of duplication. Recombinations
and additional transpositions during the duplication process have
served to make each copy distinct. The frequent occurrence of this
LTR-rich region at C. albicans subtelomeres prompts the question of whether it has any adaptive significance or whether the
C. albicans chromosome ends are in a continual state of flux, and this just happens to be a prevalent structure at this time. Analysis of the subtelomeric regions of other strains and related Candida species may reveal the significance of such
structures. Of interest, a recent report (Morschhauser et al. 1999)

The arrangement of sequences in the subtelomeres suggests that recombination between LTRs is a common occurrence in C. albicans. For instance, the subtelomeric kappa LTRs all have the same upstream flanking sequence but a variety of downstream flanks, suggesting several recombination events. Recombination between LTRs at different locations in the genome has also apparently occurred. For instance, the arrangement of LTRs and genes in contig 2898 (Fig. 7A) is most easily explained by a recombination between a tara insert within a subtelomeric kappa LTR and a tara element at an intergenic location. Such an exchange would place subtelomeric sequences located upstream of the recombination site in an intergenic region, and vice versa for the other region involved. Similarly, groups of corrupted and rearranged elements in other locations (Fig. 7B) bear witness to past genomic rearrangements involving retrotransposon sequences.

Our results suggest that the pattern of retrotransposon evolution in C. albicans has differed markedly from that of S. cerevisiae. We hope that further analyses of retrotransposon populations in these and related species will contribute to our understanding of more general aspects of genome evolution. We also hope that our description of the methods by which we identified and classified a wide variety of C. albicans retrotransposons will assist in the study of retrotransposons in other species.