Vol. 9, Issue 8, 689-710, August 1999
RESEARCH
ABSTRACT
Phylogenetic analysis of aminoacyl-tRNA synthetases (aaRSs) of all
20 specificities from completely sequenced bacterial, archaeal, and
eukaryotic genomes reveals a complex evolutionary picture. Detailed
examination of the domain architecture of aaRSs using sequence profile
searches delineated a network of partially conserved domains that is
even more elaborate than previously suspected. Several unexpected
evolutionary connections were identified, including the apparent origin
of the β-subunit of bacterial GlyRS from the HD superfamily of
hydrolases, a domain shared by bacterial AspRS and the B subunit of
archaeal glutamyl-tRNA amidotransferases, and another previously
undetected domain that is conserved in a subset of ThrRS, guanosine
polyphosphate hydrolases and synthetases, and a family of GTPases.
Comparison of domain architectures and multiple alignments resulted in
the delineation of synapomorphies (shared derived characters, such as
extra domains or inserts) for most of the aaRS specificities. These
synapomorphies partition sets of aaRSs with the same specificity into
two or more distinct and apparently monophyletic groups. In conjunction
with cluster analysis and a modification of the midpoint-rooting
procedure, this partitioning was used to infer the likely root position
in phylogenetic trees. The topologies of the resulting rooted trees for
most of the aaRS specificities are compatible with the evolutionary
"standard model," whereby the earliest radiation event separated
bacteria from the common ancestor of archaea and eukaryotes, as opposed
to the two other possible evolutionary scenarios for the three major
divisions of life. For almost all aaRS specificities, however, this
simple scheme is confounded by displacement of some of the bacterial aaRSs by their eukaryotic or, less frequently, archaeal counterparts. Displacement of ancestral eukaryotic aaRS genes by bacterial ones, presumably of mitochondrial origin, was observed for three aaRSs. In
contrast, there was no convincing evidence of displacement of archaeal
aaRSs by bacterial ones. Displacement of aaRS genes by eukaryotic
counterparts is most common among parasitic and symbiotic bacteria,
particularly the spirochaetes, in which 10 of the 19 aaRSs seem to have
been displaced by the respective eukaryotic genes and two by the
archaeal counterparts. Unlike the primary radiation events between the
three main divisions of life, which were readily traceable through the
phylogenetic analysis of aaRSs, no consistent large-scale bacterial
phylogeny could be established. In part, this may be due to additional
gene displacement events among bacterial lineages. We argue that,
although lineage-specific gene loss might have
contributed to the evolution of some of the aaRSs, this is not a viable
alternative to horizontal gene transfer as the principal evolutionary
phenomenon in this gene class.
[Complete multiple alignments of all aaRSs from complete genomes as well as the alignments of conserved regions used for phylogenetic tree construction are available at ftp://ncbi.nlm.nih.gov/pub/koonin/aaRS]
INTRODUCTION
Aminoacyl-tRNA synthetases (aaRSs) are key components of the protein translation machinery that catalyze two basic reactions: (1) activation of amino acids via the formation of aminoacyl adenylates and (2) linking the activated amino acid to the cognate tRNAs. The aaRSs generate AMP as the second end product of the overall reaction, which differentiates them from the majority of ATP-dependent enzymes, which produce ADP. aaRSs specific for each of the 20 amino acids have been identified, and there are two structurally distinct and apparently unrelated classes of aaRS, each encompassing 10 specificities (Cusack et al. 1990, 1991; Eriani et al. 1990, 1995; Cusack 1995, 1997). The two classes have different modes of aminoacylation: aaRSs of class I aminoacylate the 2'OH of the terminal adenosine of the cognate tRNA, whereas those that belong to class II aminoacylate the 3'OH (with the exception of PheRS). aaRSs of each class contain a conserved core domain that is involved in ATP binding and hydrolysis and combines with additional domains that determine the specificity of interactions with the cognate amino acid and tRNA (Delarue and Moras 1993; Cusack 1995, 1997). The core domain of class I contains a parallel β-sheet, which resembles the nucleotide-binding Rossmann fold in its topology (Moras 1992). The class I core domain contains two conserved motifs, designated "HIGH" and "KMSKS" (after their characteristic amino acid signatures), that are directly involved in ATP binding (Eriani et al. 1990; Moras 1992; Arnez and Moras 1997). A specific structural similarity has been suggested between the class I aaRS core domain and a superfamily of nucleotidyltransferases that are typified by the bacterial cytidylyltransferase TagD and contain a conserved HIGH-like motif (Bork et al. 1995).
The class II core contains a mixed β-sheet similar to that found in biotin synthases (Artymiuk et al. 1994). This domain contains three loosely conserved motifs that participate in ATP binding; they are unrelated to the HIGH and KMSKS motifs (Eriani et al. 1990; Moras 1992; Arnez and Moras 1997). The extra domains of aaRSs are either inserted into loops within the core domain or appended to the amino and carboxyl termini of the core. These accessory domains show remarkable diversity, resulting in a complex, modular domain architecture that is largely amino acid specific, although some domains are shared by aaRSs of different specificities (Delarue and Moras 1993).
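The two reactions listed at the start of this section can be written as a two-step scheme. This is the standard textbook formulation rather than notation taken from this work; "aa" denotes the amino acid and PP_i inorganic pyrophosphate:

```latex
\begin{aligned}
\text{aa} + \text{ATP} + \text{aaRS} &\longrightarrow \text{aaRS}\cdot\text{aa-AMP} + \text{PP}_i \\
\text{aaRS}\cdot\text{aa-AMP} + \text{tRNA}^{\text{aa}} &\longrightarrow \text{aa-tRNA}^{\text{aa}} + \text{AMP} + \text{aaRS}
\end{aligned}
```

Summing the two steps gives aa + ATP + tRNA^aa → aa-tRNA^aa + AMP + PP_i, which makes explicit why the nucleotide product of aaRSs is AMP rather than ADP.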
Typically, aaRSs of the same specificity are highly conserved, whereas those with different specificities show only limited conservation, mostly confined to the core ATP pyrophosphatase domain. There are only three apparent exceptions to this rule: (1) GlnRS, unlike other aaRSs, is present only in eukaryotes and γ-Proteobacteria and appears to be specifically related to a subset of GluRS (Freist et al. 1997a,b; Siatecka et al. 1998); (2) the same type of relationship has been described for AsnRS and AspRS (Shiba et al. 1998); and (3) there are two types of LysRS that belong to class I and class II, respectively, and appear to be unrelated to each other (Ibba et al. 1997a,b; Koonin and Aravind 1998; Siatecka et al. 1998). aaRSs for 17 amino acids appear to be universal; that is, they are encoded by all organisms for which genome sequences are available. The exceptions are GlnRS, which, as already mentioned, is missing in most bacteria and archaea; AsnRS, which is missing in most archaeal and several bacterial species; and CysRS, which so far has not been identified in two archaea (Doolittle and Handy 1998; Koonin and Aravind 1998). The mechanism for post-aminoacylation formation of Gln and Asn via transamidation of tRNAs charged with Glu and Asp, respectively, has been characterized (Curnow et al. 1996; Wilcox and Nirenberg 1968). The mechanism of cysteine incorporation into proteins in those archaea that lack CysRS remains a mystery. These exceptions notwithstanding, the ubiquity of aaRSs indicates that the two classes evolved by serial duplication at a very early stage of evolution and had already been locked into their distinct specificities in the last common ancestor (LCA) of all extant life forms.
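The indirect (transamidation) route mentioned above can be summarized in the same notation. This sketch assumes glutamine as the amide donor; the misacylated substrates are made by GluRS and AspRS that do not discriminate against tRNA^Gln and tRNA^Asn. Unlike the aaRS reaction, this step releases ADP and inorganic phosphate:

```latex
\begin{aligned}
\text{Glu-tRNA}^{\text{Gln}} + \text{Gln} + \text{ATP} &\longrightarrow \text{Gln-tRNA}^{\text{Gln}} + \text{Glu} + \text{ADP} + \text{P}_i \\
\text{Asp-tRNA}^{\text{Asn}} + \text{Gln} + \text{ATP} &\longrightarrow \text{Asn-tRNA}^{\text{Asn}} + \text{Glu} + \text{ADP} + \text{P}_i
\end{aligned}
```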
For several reasons, aaRSs appear to be an excellent test case for analyzing the forces that shape gene and genome evolution on a large time scale:

1. The set of aaRSs is naturally defined by the 20 specificities required for protein synthesis.
2. aaRSs are ubiquitous (with the exceptions mentioned above) and essential; therefore, a gene encoding an aaRS generally cannot be lost in evolution unless it is displaced by another gene that encodes a different form of aaRS of the same specificity.
3. aaRSs with the same specificity typically do not form paralogous families; only a few isolated duplications of this type have been noticed. This significantly reduces ambiguity in phylogenetic analysis.
4. As the specificities of at least 17 of the 20 aaRSs (GlnRS and possibly AsnRS and LysRS being the exceptions) apparently were established in the LCA and have not changed since, it seems unlikely that the aaRS genes have undergone major changes in evolutionary rates.
5. Unlike, for example, ribosomal proteins, aaRSs typically are not involved in complex interactions with multiple protein partners. The only interactions that are essential for their function are those with amino acids, ATP, and the cognate tRNA (although exceptions are possible). Discrimination of cognate from noncognate tRNAs by aaRSs is a complex process, the details of which differ among specificities, but, at least in some cases, aaRSs are compatible with tRNAs even from phylogenetically distant organisms (Bedouelle et al. 1993).
6. Given the variety of modular domain arrangements seen in aaRSs, phylogenetic analysis might shed light on the modes whereby such modules are acquired and exchanged during evolution.
aaRSs have been among the most popular objects of molecular phylogenetic studies, and several unusual evolutionary patterns have been observed when tree topologies for aaRSs were compared to the topologies derived from the analysis of rRNAs and other molecules involved in translation (Nagel and Doolittle 1995; Brown and Doolittle 1997; Shiba et al. 1997a; Doolittle and Handy 1998). Phylogenetic analysis of aaRSs is becoming increasingly interesting with the growth of the collection of complete genome sequences, which currently consists of more than 20 genomes of bacteria, archaea, and eukaryotes. Because each of the aaRSs is indispensable in the context of the modern-type translation system, this collection provides at least 17 sets of sequences of functionally equivalent aaRSs from all these diverse organisms. Although many aaRS sequences have been available for a long time, complete genomes are critical for a convincing evolutionary analysis: only from complete genome sequences can the full information on all aaRSs encoded by each species, including all possible paralogs, be extracted. In a recent insightful overview, Doolittle and Handy (1998) note that the number of apparent evolutionary anomalies grows rapidly with the increase in genome sequence information, resulting in a highly complex picture.
Here, we describe an attempt at a comprehensive analysis of the evolutionary patterns for all 20 sets of aaRSs using, primarily, the available complete genome sequences. We pursued two principal goals: (1) using recently developed sensitive methods for sequence analysis, together with structural information, to delineate as completely as possible the domain architecture of all aaRSs; and (2) to generate phylogenetic trees for aaRSs of all specificities using carefully constructed multiple alignments and, whenever feasible, to infer the root position. The results of phylogenetic analysis of most of the aaRSs appear to be compatible with the "standard model," which postulates the original radiation of bacteria and archaea-eukaryotes, followed by the divergence of the latter two divisions. However, for at least 15 of the aaRS specificities, this straightforward scenario needs to be amended by including horizontal gene transfer events, in some cases multiple ones, between major phylogenetic lineages, as well as acquisition, loss, and exchange of accessory domains. Our general conclusion is that the available sequence information is sufficient for reconstructing the principal events in the evolution of most, if not all, of the aaRSs.
RESULTS AND DISCUSSION
Modular Domain Architectures of aaRSs
Previously Undetected Accessory Domains and New Occurrences of Known Domains
Careful examination of the multiple alignments of aaRSs of all 20 specificities shows that each of them, without exception, has a complex, modular architecture (Fig. 1). Furthermore, the accessory domains form a network that connects aaRSs of different specificities. Many of these domains have been described in previous studies (Delarue and Moras 1993; Koonin et al. 1994; Simos et al. 1996; Aravind and Koonin 1999), but using iterative profile searches with PSI-BLAST, we identified several previously undetected domains as well as new occurrences of known domains. The extensive structural characterization of the aaRSs has produced representative structures for almost all of the domains seen in these proteins (Fig. 1).
Four domains are shared by aaRSs of class I and class II (Fig. 1A,B). These are (1) a predicted RNA-binding domain that is a distinct version of the OB-fold (EMAP domain) and is found in all archaeal and a subset of bacterial MetRS, some of the eukaryotic TyrRS (both of class I), and the β-subunit of PheRS (class II); (2) the "DALR" domain, which is shared by seven aaRSs of class I and the β-subunit of bacterial GlyRS of class II (see below); (3) a small domain that is predicted to possess an α-helical, coiled-coil structure but nevertheless is highly specific to aaRSs (it is readily detectable by iterative database searches without any false positives); it is present in animal TrpRS, MetRS, and GlnRS (class I) and HisRS, ProRS, and GlyRS (class II) and has been shown to facilitate the formation of the multi-aaRS complexes isolated from animal cells, as well as their interaction with tRNAs (Rho et al. 1998); and (4) a small carboxy-terminal domain (designated "C-V/I/G" in Fig. 1A,B) shared by ValRS, eukaryotic and archaeal IleRS (class I), and archaeal and eukaryotic GlyRS (class II).
All these domains have been described previously, but, with the exception of the EMAP domain, which has been analyzed in considerable detail (Simos et al. 1996; Weiner and Maizels 1999), the present study expanded the range of aaRSs that contain each of them (Fig. 1). In particular, the domain that we designated DALR, after a characteristic pattern of amino acid residues that is conserved in many of the respective sequences, has been recognized in ArgRS (where it has been designated Add-2), MetRS, and the aaRSs for the three aliphatic amino acids (Cavarelli et al. 1998) but, to our knowledge, not in CysRS, class I LysRS, or the β-subunit of the bacterial GlyRS. The detection of the DALR domain in these additional sets of aaRSs makes it the most widespread domain in aaRSs, after the two core domains. It is an α-helical domain with a unique architecture that has been implicated in anticodon binding (Brunie et al. 1990; Cavarelli et al. 1998). Furthermore, it has been shown that deletions in the carboxy-terminal portion of the β-subunit of Escherichia coli GlyRS affect tRNA recognition (Toth and Schimmel 1990), which appears to be compatible with an anticodon-binding function of the DALR domain. In this regard, the presence of the DALR domain in the class I LysRS is especially interesting because this aaRS also contains an anticodon-binding domain shared with GluRS (Fig. 1A). The combination of these two domains may indicate a complex mode of anticodon binding by the LysRS.
Other connections between accessory domains are confined within class I
or class II. In particular, there is a remarkable colinearity of the
domain arrangements in class I aaRSs that are specific for aliphatic
amino acids (Val, Ile, and Leu) and methionine. In addition to the
carboxy-terminal DALR domain, these aaRSs share a large common insert
in the core that contains five partially conserved motifs subject to
deletion or rearrangement (Fig. 1A). Furthermore, all ValRS and subsets
of aaRSs for each of the other three amino acids in this subclass of
class I also contain an inserted Zn-ribbon module; a similar module is
inserted also in class I LysRS (Fig. 1A). Another domain typical of
class I is the insert shared by GluRS, GlnRS, and CysRS (Fig. 1A). In
class II, the most common domains, after the core, are the
α/β-structured anticodon-binding domain found in HisRS, ThrRS,
and ProRS as well as eukaryotic and archaeal GlyRS and the OB-fold
anticodon-binding domain present in AspRS, AsnRS, and LysRS (Fig. 1B).
The other accessory domains are found in aaRSs of one or two specificities.
Several aaRSs share conserved domains with other classes of proteins, both those involved in translation and those performing very different functions (cf. Koonin et al. 1994; Simos et al. 1996, 1998; Tas and Murray 1996; Markus et al. 1998; Aravind and Koonin 1999). In the course of the present study, we identified six additional domains that are shared by aaRSs and proteins of other functional classes. For each of these cases, the evolutionary relationship between the respective domains in the aaRSs and other proteins was supported by a statistically significant sequence similarity (E-value of 0.01 or lower, as computed using the PSI-BLAST program).
The first unexpected finding involves a domain that is inserted in the core of bacterial AspRS and in the B subunit (GatB) of archaeal Glu-tRNA^Gln amidotransferases [hereinafter GAD domain, after GatB-AaRS-for-Asp (D)] (Figs. 1B and 2A). In archaeal GatB proteins, the GAD domain also forms an insert that is readily detectable by comparison with the bacterial counterparts (data not shown). The GAD domain contains ~120 amino acid residues and, as seen in the X-ray structure of the Thermus thermophilus AspRS, consists of an antiparallel β-sheet flanked by α-helices (Delarue et al. 1994); it resembles a circularly permuted ferredoxin-like fold (data not shown). The GAD domain has been tentatively implicated in the stabilization of the interaction of the bacterial AspRS with the cognate tRNA (Delarue et al. 1994). This is generally compatible with the fact that GatB does not possess the transamidase activity (which resides in GatA) and is expected to be involved in tRNA recognition, although it may also be responsible for the ATPase activity of the complex (Curnow et al. 1997). The presence of the GAD domain in two different proteins that recognize tRNAs for acidic amino acids (Asp and Glu) suggests a specific role of this domain in the recognition of, and possibly discrimination between, these tRNAs. Given the presence of two versions of GatB (one with and one without the GAD domain) in most archaea (Table 1) and its presence in bacterial AspRS (but not GluRS), a simple hypothesis could be that the GAD domain is responsible for the specific recognition of tRNA^Asp and its discrimination from tRNA^Asn. This, however, is hardly compatible with the presence of the GAD-containing GatB protein in the archaeon Pyrococcus horikoshii, which also encodes an AsnRS (Table 1). Thus, the GAD domain might recognize tRNA^Glu in archaea and tRNA^Asp in bacteria.
The second previously undetected domain is shared by eukaryotic and some of the bacterial ThrRSs, a distinct family of GTPases (the Obg family), and guanosine polyphosphate hydrolase (SpoT) and synthetase (RelA), which are involved in the stringent response in bacteria (Cashel et al. 1996). We named it the TGS domain, after ThrRS, GTPase, and SpoT (Figs. 1B and 2B). Interestingly, the TGS domain was also detected at the amino terminus of the uridine kinase from the spirochaete Treponema pallidum (but not from any other organism, including the related spirochaete Borrelia burgdorferi), where it precedes the "HxxxH" domain, in an arrangement similar to that seen in ThrRS (Fig. 1B; see below). TGS is a small domain that consists of ~50 amino acid residues and is predicted to possess a predominantly β-sheet structure; it is one of the few domains in the aaRSs for which no structure has been determined so far (Fig. 1B). There is no direct information on the functions of the TGS domain, but its presence in two types of regulatory proteins (the GTPases and guanosine polyphosphate phosphohydrolases/synthetases) suggests a regulatory role that involves ligand (most likely nucleotide) binding.
We observed that the β-subunit of bacterial GlyRS contains a domain that shows a distant but statistically significant similarity to the recently described HD superfamily of hydrolases (Figs. 1B and 2C; Aravind and Koonin 1998). The principal predicted catalytic residues of the HD hydrolases, namely the histidine-aspartate doublet that is the namesake of the superfamily, are missing in GlyRS-β, although a carboxy-terminal aspartate also implicated in catalysis is conserved; this resembles the conservation pattern seen in the guanosine polyphosphate synthetases (RelA) (Fig. 2C; Aravind and Koonin 1998). The function of the HD domain in the β-subunit of GlyRS remains uncertain; it has been reported that the amino-terminal half of the β-subunit, along with the α-subunit, is required for glycyl-adenylate formation (Toth and Schimmel 1990). An interesting aspect of these observations is that they make the β-subunit of the bacterial GlyRS the only aaRS subunit that does not contain the core domain of either class I or class II (Fig. 1A,B).
ThrRS and AlaRS share a domain that is typified by the presence of two conserved histidines separated by three amino acid residues and was accordingly designated the HxxxH domain (Fig. 1B; data not shown). The HxxxH domain consists of ~120-140 amino acids and is predicted to possess a mixed α/β structure; along with the TGS domain, it is one of the remaining structurally uncharacterized domains in aaRSs (Fig. 1B). In addition to the aaRSs of these two specificities, the HxxxH domain was found in four uncharacterized gene products (three from the archaeon P. horikoshii and one from yeast) that contain additional sequences similar to those seen in AlaRS and seem to have evolved from the latter by gene truncation. A version of the HxxxH domain with a disrupted motif was detected in the ThrRS from Mycoplasma as well as in the uridine kinase from T. pallidum (see above). Finally, a fragment of the HxxxH domain is fused to the Pseudomonas syringae CmaT protein, which, interestingly, is involved in nonribosomal peptide synthesis (Ullrich and Bender 1994). An HxxxH signature is generally typical of metal-dependent hydrolases, for example, Zn-dependent proteases. The presence of a domain containing this motif in aaRSs may suggest a functionally important hydrolytic activity, for example, hydrolysis of mischarged aminoacyl-tRNAs.
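To make the HxxxH signature concrete, the sketch below scans a protein sequence for two histidines separated by exactly three residues of any kind. The helper `find_hxxxh` is hypothetical and not part of the original analysis: the domain assignments described here rest on profile searches, and a bare five-residue pattern also matches by chance in many unrelated proteins, so a hit is at most a pointer for closer inspection.

```python
import re
from typing import List

# Zero-width lookahead so that overlapping signatures (H-x-x-x-H-x-x-x-H)
# are each reported; "x" stands for any residue letter.
HXXXH = re.compile(r"(?=(H[A-Z]{3}H))")


def find_hxxxh(seq: str) -> List[int]:
    """Return 0-based start positions of every H-x-x-x-H match in a protein sequence."""
    return [m.start() for m in HXXXH.finditer(seq.upper())]


if __name__ == "__main__":
    # Toy sequence, not a real aaRS: overlapping matches start at 2 and 6.
    print(find_hxxxh("MKHAAAHAAAHL"))  # [2, 6]
```

The lookahead is the design choice that matters here: a plain `re.findall(r"H[A-Z]{3}H", ...)` would consume the shared middle histidine and miss the second, overlapping signature.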
We identified a winged helix-turn-helix (HTH) domain at the amino termini of the PheRS α-subunit from eukaryotes and archaea (including the crenarchaeon Sulfolobus solfataricus, but with a highly modified version in Methanococcus jannaschii) and the spirochaetes (Figs. 1B and 2D). This domain is specifically related to the similar nucleic-acid-binding domains from double-stranded (ds)RNA adenosine deaminases, ribosomal protein S10, and the poxvirus dsRNA-binding protein E3L (Fig. 2D). The structure of the amino-terminal domain of the adenosine deaminase, which has been shown to bind Z-DNA, has recently been determined, experimentally confirming its winged-HTH fold (Schade et al. 1999). Given that some of the proteins containing this domain, for example, S10 and E3L, bind RNA, particularly dsRNA, RNA binding seems a likely function of the winged-HTH domain in PheRS. Specifically, it seems possible that this domain contributes to a mode of tRNA binding via a stem, which would be unusual for aaRSs.
Finally, we observed that a domain inserted in the core of the bacterial ProRS (Fig. 1B) is also represented by a family of small proteins found in several bacterial species (typified by the E. coli YbaK protein and accordingly designated "YbaK domain"; data not shown). The structure and function of this domain remain to be determined.
Taken together, all these observations reinforce the notion that aaRSs are prone to recruiting domains from other types of proteins and thereby acquiring additional functional capabilities. The readily recognizable domain recruitments in aaRSs typically are lineage specific and can be mapped to very different (ancient or relatively recent) stages of evolution. For example, given their near ubiquity in bacteria, the TGS domain, the S4 domain, and the GAD domain most likely became fused to ThrRS, TyrRS, and AspRS, respectively, early in bacterial evolution (see also below). Other apparently ancient domain recruitments in aaRSs include the EMAP domain in MetRS and the winged-HTH domain in PheRS. The latter, for example, must have been present at the amino terminus of the PheRS α-subunit already in the common ancestor of archaea and eukaryotes. In contrast, the glutathione S-transferase domain and the small, coiled-coil interaction module appear to be relatively recent acquisitions because they are present in aaRSs of different specificities but exclusively within the animal lineage (Fig. 1). Other domains that we now consider integral parts of aaRSs, such as those involved in anticodon binding (e.g., the DALR domain), might have evolved in the same fashion very early in evolution, but their sources are no longer readily identifiable. "Horizontal evolution" of aaRSs, that is, transfer of domains between aaRSs of different specificities, has been discussed previously (Delarue and Moras 1993). It does seem likely that the observed mosaic of domains has been generated in part by recombination between aaRS genes themselves, as opposed to independent acquisition of domains. The presence in the β-subunit of GlyRS of the DALR domain, which generally is typical of class I aaRSs (see above), may be indicative of this type of evolutionary event; this mode of dissemination also seems likely for the EMAP domain (Fig. 1A,B).
Reconstructing the Evolution of aaRSs
Phylogenetic Trees
We used the multiple alignments of the conserved portions of the aaRSs of all 20 specificities to generate distance matrices and construct phylogenetic trees using the neighbor-joining and Fitch-Margoliash methods. For each of these methods, 1000 bootstrap replications were performed, to evaluate the reliability of the results, and the consensus topology was derived. The consensus topologies for the neighbor-joining and Fitch-Margoliash methods were then combined (see Materials and Methods for details) to produce the final trees shown in Figure 3.
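For readers unfamiliar with distance-based tree building, the following is a minimal sketch of the neighbor-joining step alone, assuming a symmetric matrix of pairwise distances stored as a dict of dicts. It is not the pipeline used in this work: distance estimation from the alignments, the Fitch-Margoliash runs, the 1000 bootstrap replicates, and the consensus procedure (see Materials and Methods) are all omitted, and the function names and the `#k` labels for internal nodes are illustrative choices.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Dist = Dict[str, Dict[str, float]]


def symmetric(pairs: Dict[Tuple[str, str], float]) -> Dist:
    """Build a symmetric dict-of-dicts distance matrix from (a, b) -> distance."""
    d: Dist = {}
    for (a, b), v in pairs.items():
        d.setdefault(a, {})[b] = v
        d.setdefault(b, {})[a] = v
    return d


def neighbor_joining(names: List[str], dist: Dist) -> str:
    """Return an unrooted neighbor-joining tree (at least two taxa) in Newick format.

    Internal nodes get temporary labels "#1", "#2", ...; taxon names are
    assumed not to start with "#".
    """
    d = {a: dict(row) for a, row in dist.items()}  # work on a copy
    newick = {n: n for n in names}                  # node -> Newick subtree
    active = list(names)
    step = 0
    while len(active) > 2:
        n = len(active)
        # Net divergence of each node from all other active nodes.
        r = {a: sum(d[a][b] for b in active if b != a) for a in active}
        # Join the pair minimizing the Q criterion (first pair wins ties).
        best = None
        for a, b in combinations(active, 2):
            q = (n - 2) * d[a][b] - r[a] - r[b]
            if best is None or q < best[0]:
                best = (q, a, b)
        _, a, b = best
        # Branch lengths from the two joined nodes to their new parent.
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        step += 1
        u = f"#{step}"
        newick[u] = f"({newick[a]}:{la:.4g},{newick[b]}:{lb:.4g})"
        # Distances from the new node to every remaining node.
        d[u] = {}
        for c in active:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        active = [c for c in active if c not in (a, b)] + [u]
    a, b = active
    # The last edge is unrooted; its whole length is placed on one side.
    return f"({newick[a]}:{d[a][b]:.4g},{newick[b]}:0);"


if __name__ == "__main__":
    # Toy additive distances for four hypothetical taxa.
    d = symmetric({("A", "B"): 3, ("A", "C"): 3, ("A", "D"): 5,
                   ("B", "C"): 4, ("B", "D"): 6, ("C", "D"): 4})
    print(neighbor_joining(["A", "B", "C", "D"], d))
    # ((A:1,B:2):1,(C:1,D:3):0);
```

On additive distances such as the toy matrix, neighbor joining recovers the generating tree and its branch lengths exactly; with real alignment-derived distances it gives only an estimate, which is why bootstrap support matters.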
Likely Synapomorphies in aaRSs and Their Use as Phylogenetic Markers
For many gene families, analysis of shared derived features (characters) of proteins, or synapomorphies, can be an important complement to traditional, alignment-based phylogenetic tree analysis (e.g., Makarova et al. 1999). In proteins, such features primarily manifest as unique domain arrangements. Synapomorphies can be used to define monophyletic groups and may be helpful in establishing the root position because the root cannot lie within a monophyletic group defined by a synapomorphy. Using synapomorphies as phylogenetic markers requires distinguishing them, first, from primitive features that were already present in the common ancestor of the analyzed protein family, although they might have been lost in some lineages, and second, from independently acquired features. When there are two distinct domain architectures within an aaRS specificity, one of these is likely to be a primitive feature and the other a derived feature (synapomorphy). Deciding which is which is not straightforward and can be done confidently only when conserved features of domain architecture are seen in different aaRS specificities, as discussed below (this is analogous to tree rooting by paralogy).
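The rule that the root cannot lie within a monophyletic group defined by a synapomorphy can be applied mechanically to an unrooted tree. The sketch below uses hypothetical function names and represents the tree as an adjacency dict in which leaves have exactly one neighbor; it tries every edge as a root position and keeps only those edges for which each synapomorphy-defined group comes out as a clade. It encodes only this one constraint and does not decide which of two architectures is derived; as explained above, that has to be argued separately.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Tree = Dict[str, Set[str]]  # unrooted tree: node -> neighbors; leaves have degree 1


def clades_if_rooted_on(tree: Tree, u: str, v: str) -> Set[FrozenSet[str]]:
    """All clades (as leaf sets) obtained when the tree is rooted on edge u-v."""
    clades: Set[FrozenSet[str]] = set()

    def walk(node: str, parent: str) -> FrozenSet[str]:
        # Leaves contribute themselves; internal nodes collect their subtrees.
        below = frozenset([node]) if len(tree[node]) == 1 else frozenset()
        for nbr in tree[node]:
            if nbr != parent:
                below |= walk(nbr, node)
        clades.add(below)
        return below

    walk(u, v)
    walk(v, u)
    return clades


def admissible_root_edges(tree: Tree, groups: Iterable[Set[str]]) -> List[Tuple[str, str]]:
    """Edges on which the root may lie without splitting any synapomorphy group."""
    wanted = [frozenset(g) for g in groups]
    edges = sorted({tuple(sorted((a, b))) for a in tree for b in tree[a]})
    ok = []
    for u, v in edges:
        clades = clades_if_rooted_on(tree, u, v)
        if all(g in clades for g in wanted):
            ok.append((u, v))
    return ok


if __name__ == "__main__":
    # Toy unrooted tree ((A,B),(C,D)) with internal nodes x and y.
    tree = {"A": {"x"}, "B": {"x"}, "x": {"A", "B", "y"},
            "y": {"x", "C", "D"}, "C": {"y"}, "D": {"y"}}
    # If A and B share a derived domain, the root cannot sit on A-x or B-x.
    print(admissible_root_edges(tree, [{"A", "B"}]))
    # [('C', 'y'), ('D', 'y'), ('x', 'y')]
```

Each additional synapomorphy can only shrink the admissible set; in the toy example, adding {C, D} as a second group leaves the internal edge x-y as the only possible root position.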
We attempted to systematically delineate the likely synapomorphies in aaRSs, to partition the aaRSs of the same specificity into likely monophyletic groups. For this purpose, domain architectures of aaRSs with the same and different specificities were compared in conjunction with clustering by sequence similarity and tree analysis using the Fitch-Margoliash and neighbor-joining methods.
With the exception of ValRS and CysRS, all ubiquitous aaRSs have more than one distinct domain architecture (Fig. 1A,B). Such distinctions do not exist in class I LysRS and in the bacterial-type GlyRS either, but these have limited phyletic distribution (Fig. 1A,B; see discussion below). The complete conservation of the elaborate domain architecture of ValRS, which consists of seven distinct domains, including the core (Fig. 1A), in all studied life forms seems unexpected given the diversity of domain organizations seen in the other aaRSs (see also discussion below).
For some of the aaRSs, synapomorphies appear unambiguous and allow us to easily detect distinct lines of descent. The most obvious of these are the two types of LysRS (see above) and GlyRS. As indicated above, the class I LysRS found in euryarchaea, spirochaetes, and rickettsiae is unrelated to the class II enzyme present in eukaryotes, the rest of the bacteria, and the crenarchaea. The majority of bacteria possess a GlyRS that consists of two unrelated subunits (see also the discussion of the domain architecture of the β-subunit above) and is distinct from the enzyme found in eukaryotes, archaea, and a small subset of bacteria (Freist et al. 1996; Fig. 1B). The α-subunit of the bacterial GlyRS contains a modified class II core domain and is no more similar to the eukaryotic-archaeal GlyRS than it is to other class II aaRSs. Thus, there is no indication that these two types of GlyRS have a common origin. These two exceptional cases apart, the synapomorphies seen in IleRS are the most striking. Here, the distinction between the eukaryotic, archaeal, and a small subset of bacterial enzymes, on one hand, and the rest of the bacterial enzymes, on the other hand, involves four distinct domains (modules). One of these (the Zn-ribbon) is located differently in the two sets of IleRSs; two others, namely the C-V/I domain and the C-V/I/G domain, are present only in the eukaryotic-archaeal subset; and finally, the more specific carboxy-terminal domains are conserved within each set but not between them (Fig. 1A). Notably, in this case, the arrangement of three of these domains, namely the Zn-ribbon and the C-V/I and C-V/I/G domains, is exactly the same in the eukaryotic-archaeal IleRS and in the ValRS (Fig. 1A). Thus, the ancestral domain architecture can be inferred with considerable confidence, and we are in a position to conclude that bacterial IleRSs have lost the C-V/I and C-V/I/G domains, whereas the Zn-ribbon has relocated in the course of bacterial evolution. In the same vein, a comparison of the two distinct domain architectures of the LeuRSs with those of the ValRSs and IleRSs suggests that the presence of the Zn-ribbon in the bacterial, as opposed to the archaeal-eukaryotic, LeuRS is an ancestral feature, whereas the rearrangement of the modules in the large insert of bacterial LeuRS is derived (Fig. 1A). Another convincing synapomorphy is seen in the bacterial TyrRSs, which possess a conserved arrangement of two domains (the α-helical anticodon-binding domain and the S4 domain) that are missing in the eukaryotic-archaeal set (Fig. 1A). Just as in the case of IleRS, it is possible to infer the ancestral state "by paralogy" because TrpRS shares a carboxy-terminal domain (C-Y/W) with the archaeal-eukaryotic but not the bacterial TyrRS (Fig. 1A). More tentatively, it can be hypothesized that more complex domain architectures are more likely to be derived states than simpler ones, particularly when inserts in the core domain are involved. This is the case for ProRS, AspRS, and eukaryotic-type GlyRS (Fig. 1; Table 2).
The analysis of other aaRSs illustrates the distinction between those features of domain organization that are likely to be bona fide synapomorphies and those that do not seem to qualify. Consider, for example, MetRS, for which five distinct domain arrangements are discernible (Fig. 1A). The EMAP domain, which is present in the archaeal MetRS and those from several diverse groups of bacteria but not in other bacteria or eukaryotes (Fig. 1A), does not seem to be a useful marker for large-scale phylogenetic analysis. Its distribution does not at all reflect clustering of MetRS by sequence similarity (data not shown) or the topology of the trees constructed using the neighbor-joining and Fitch-Margoliash methods (Fig. 3). Among the bacteria, those MetRSs that contain the EMAP domain do not form a compact group (Fig. 3). Thus, the phyletic distribution of the EMAP domain may be explained by lineage-specific losses, independent acquisitions, or, most likely, a combination thereof. The mobility of this domain is underscored by the fact that it has been detected also in subsets of TyrRS and PheRS (Fig. 1A,B). Among the other domains found in MetRS, the GST domain and the carboxy-terminal coiled-coil domain (Fig. 1A) could be valid phylogenetic markers, but these would be useful only to examine the evolution within the eukaryotic crown group. In contrast, the Zn-ribbon module that is inserted in the middle of the domain typical of the aliphatic aaRSs in the archaeal, eukaryotic, and some of the bacterial MetRSs (Fig. 1A) is a likely synapomorphy. The distribution of this module correlates with the clustering by sequence similarity (data not shown) and with the monophyletic groups that are apparent from tree analysis (Fig. 3).
Altogether, unique features of domain architecture that are likely synapomorphies, and thus may be valid phylogenetic markers allowing us to establish or corroborate the primary evolutionary partitioning in the given set of aaRSs, were found for 14 of the 20 specificities (Fig. 1A,B; Table 2). Thus, at least in the case of aaRSs, comparative analysis of domain architectures is a major source of evolutionary information that must be carefully reconciled with other lines of evidence to produce credible evolutionary scenarios.
Evolutionary Scenarios for aaRS
The most notable outcome of this analysis seems to be that, with only a few exceptions, the trees produced by the described procedures are readily interpretable in terms of relatively simple evolutionary scenarios (Fig. 3; Table 2). The most contentious issue in any phylogenetic analysis is the root position. The apparent synapomorphies do not directly indicate the root position; they only outline monophyletic groups. However, for the majority of the aaRSs, three lines of evidence agree: the partitioning produced by comparison of domain architectures, the results of clustering by sequence similarity, and modified midpoint rooting. This correlation typically allows one to locate the root with considerable confidence (Fig. 3; Table 2).
We examined the trees and the domain architectures of the aaRSs with
regard to the evolutionary relationships between the three major
divisions of life: bacteria, archaea, and eukaryotes. Assuming the
monophyly of each of these divisions and (for the moment) ignoring the
possibility of horizontal gene transfer, three topologies of a rooted
tree are possible: (1) B|A,E; (2) E|A,B; and (3) A|B,E
(A = archaea, B = bacteria, E = eukaryotes; the vertical line
indicates the root position). The predominant pattern in the aaRS
phylogenies that is seen in 12 of the 20 specificities involves
partitioning into two major groups, one of which includes archaea,
eukaryotes, and a subset of bacteria, and the second one the rest
(typically, the majority) of bacteria (Fig. 3; Table 2). The most
likely position of the root typically is between these groups. Thus,
these 12 phylogenies are most compatible with model 1, which has been
aptly designated the standard model by Doolittle and Handy (1998). The
conclusion that the standard model is most compatible with the data
rests on the assumption that some semblance of a "relaxed molecular clock" holds.
Most of the aaRS trees contain a long branch separating
archaea-eukaryotes (in many cases, with the admixture of several
bacterial species) and the bulk of bacteria; this is where the
procedures we used typically place the root, and this is supported by
the analysis of likely synapomorphies (Fig. 3; Table 2). However,
should this long branch correspond instead to a systematic, major
increase in the rate of aaRS evolution at the base of the bacterial
trunk, it would be impossible to rule out topologies 2 and 3. In that
case, the observed distinctions in domain architectures would be
interpreted to indicate that the archaeal-eukaryotic architecture is
the primitive state, whereas the bacterial architecture is the derived
state (synapomorphy).
Examination of the aaRS trees lends no support to evolutionary schemes
that postulate the origin of eukaryotes from a particular subdivision
of archaea, such as the eocyte hypothesis (Rivera and Lake 1992) or the
hydrogen hypothesis (Martin and Müller 1998). No specific
association was seen between eukaryotic aaRSs and those from
Crenarchaeota, as suggested by the first of these schemes, or those from
methanogens, as implied by the second one. It should further be noted
that the standard model is supported by the results of rooting by
paralogy where such results are available, namely for the aliphatic amino acid
aaRSs (Brown and Doolittle 1995; Hashimoto et al. 1998), for the tyrosine and
tryptophan aaRSs (Brown et al. 1997), and for the aspartate-asparagine and lysine aaRSs
(see below).
The standard model, however, requires major amendments to account for
the topology of the aaRS trees. There are only three trees, curiously
all from class I, that conform to this model precisely, namely those
for LeuRS, TyrRS, and TrpRS (this does not rule out interesting and
unusual events in the evolution of these aaRSs; see Table 2; discussion
below). The remaining trees fall into two categories: (1) those in
which eukaryotic aaRSs cluster with the bacterial ones, to the
exclusion of the archaea, namely ValRS, AlaRS, and ThrRS, and (2) those
in which varying subsets of bacterial aaRSs invade the
eukaryotic-archaeal cluster, namely all the rest except for LysRS, CysRS, and
HisRS (Fig. 3). Class I LysRS is seen only in archaea and a small
subset of bacteria (Fig. 3), so, by definition, the standard model (or
any alternative model) does not apply. Class II LysRS so far was found
in only one archaeal species, the crenarchaeon S. solfataricus; the Sulfolobus LysRS clearly groups with the
bacterial subtree (Fig. 3). CysRS so far has been detected in only two
archaeal species (see above). The CysRS tree is poorly resolved, and
there are no synapomorphies to complement it, but both clustering and
modified midpoint rooting methods suggest a root position between the
eukaryotic branch and the rest of the tree that includes the two
archaeal CysRSs along with the bacterial ones (Fig. 3). The HisRS tree
does not show a clear separation of the eukaryotic-archaeal and
bacterial clusters but rather contains a trifurcation in which archaea
and eukaryotes are equidistant from each other and from the bulk of bacteria; thus, this tree is not incompatible with the standard model
although it lends no direct support to it (Fig. 3).
The eukaryotic-bacterial affinity seen in three aaRSs is readily explained by displacement of the original eukaryotic gene by a cognate bacterial version, in all likelihood the mitochondrial gene transferred to the nuclear genome. Just as for the standard model and the alternative models discussed above, however, a major acceleration of evolution in the archaeal lineage cannot be formally ruled out as the underlying basis of the observed topology. In one case, that of ThrRS, this displacement was preceded by a duplication of the mitochondrial gene (Fig. 3). Conversely, displacement of the mitochondrial enzyme by the ancestral eukaryotic one seems to have occurred in the evolution of HisRS and SerRS, in the latter case following a duplication (Fig. 3).
Examination of the emerging complete evolutionary picture (Fig. 3; Tables 2 and 3) seems to suggest horizontal gene transfer, rather than lineage-specific gene loss, as the principal explanation for the anomalies in the evolution of aaRSs. Clustering of a subset of bacteria with eukaryotes (and/or archaea) is observed for the majority of the aaRSs. The lineage-specific gene loss theory would therefore imply that the LCA contained diverged duplicates of many, if not all, of the aaRS genes, which were differentially lost in different lineages during subsequent evolution. One would assume, however, that these diverged duplicates were fixed by selection in the lineage leading to the LCA as a result of the adaptation of the two versions to distinct functional niches. Should that be the case, there would be selective pressure to maintain both versions, even if there were also likely advantages to shedding one. We would then expect relics of the original duplication to persist in at least some species and for some aaRS specificities. Such traces, however, are conspicuously missing.
It is instructive to consider two cases that, at a superficial glance,
might have been considered evidence in support of the primordial
duplication theory. The first one involves the two unrelated types of
LysRS, one of which belongs to class I and the other one to class II.
It has been noted that if an organism was found that encoded both
types, this would support the differential loss theory (Doolittle and
Handy 1998). The genome of such an organism is available: that of the
spirochaete T. pallidum. A closer analysis shows, however, that T. pallidum encodes a distinct type of class II LysRS,
the small form consisting of the core domain alone. This form is present, in addition
to the typical bacterial LysRS, in γ-proteobacteria and
Aquifex (Figs. 1A and 3). Under the differential loss theory, one would be forced to conclude that the LCA encoded three LysRSs:
the class I enzyme and two distinct forms of the class II enzyme. Evolution
of the truncated version in one of the bacterial lineages, with its
subsequent dissemination by horizontal transfer, seems to be a more
realistic explanation for the observed phyletic distribution of LysRS.
The presence of two versions of HisRS in Aquifex,
Synechocystis, and Bacillus also might appear to
support the differential gene loss theory. The HisRS tree is more
difficult to interpret than those for other aaRSs because of the
uncertainty of the root position (Fig. 3). Nevertheless, assuming that
the standard model still applies, the differential loss scenario
predicts that whereas one of the HisRS in Aquifex,
Synechocystis, and Bacillus should be a typical
bacterial form, the other one, inherited directly from the LCA, should
be equidistant from the archaeal and eukaryotic orthologs. In reality,
however, reliable clustering of this second form with the archaeal
HisRS was observed (Fig. 3), which again makes horizontal transfer, in
this case from an archaeal source, the most likely explanation.
An equally strong argument for the major role of horizontal gene transfer, as opposed to differential loss, in aaRS evolution is the nonrandomness of the set of bacterial species that invade the archaeal-eukaryotic part of the phylogenetic trees for the aaRSs (Fig. 3; Table 3). The main contribution to this invasion is from bacterial groups that include intimate parasites and symbionts of eukaryotes, particularly spirochaetes and chlamydiae (Table 3). The differential gene loss theory offers no explanation why these groups of bacteria should have lost the aaRS versions retained by the majority of bacteria. In contrast, it is obvious that, compared with other bacteria, these organisms had a greater opportunity to acquire eukaryotic genes because of their long-term and intimate contact with the eukaryotic hosts.
Thus, multiple horizontal gene transfers, typically resulting in the displacement of the original aaRS genes in the recipient lineage, seem to have made the principal contributions to the deviations of the phylogenetic trees for the aaRSs from the standard model. It must be emphasized, however, that should this model prove to be wrong (see above), this would not affect the conclusion that these horizontal transfer events occurred in the course of evolution of aaRSs. Examination of the tree topologies in Figure 3 makes it clear that should the root for most of the trees lie, for example, on the branch connecting eukaryotes and archaea (model 2 above), the statistically supported clustering of subsets of bacterial aaRSs with eukaryotes still remains to be accounted for, horizontal gene transfer being the most likely explanation.
Most of the tree topologies are readily explained by a small number of
horizontal transfer events; three trees, namely, MetRS, ArgRS, and
HisRS, present a complex but seemingly interpretable picture, and
two (SerRS and CysRS) are hard to interpret (Table 2). It has been
suggested that the anomalies observed in some of the aaRS trees,
particularly for IleRS, can be explained by just one horizontal gene
transfer from eukaryotes, with subsequent dissemination among bacteria
(Shiba et al. 1997; Brown et al. 1998; Doolittle and Handy 1998). This
is a plausible idea that is compatible with the reliable clustering of
all bacterial species that are suspected to have acquired the
respective eukaryotic gene in the IleRS tree and in the HisRS tree
(Fig. 3). Furthermore, the bacterial groups that appear to be most
prone to horizontal transfer from eukaryotes, the spirochaetes and
chlamydiae, also form clusters in the CysRS and TrpRS trees, which
suggests gene exchange between them, although in these cases
horizontal transfer from eukaryotes is not suspected (Fig. 3). The eukaryotic-type
IleRS renders bacteria resistant to the antibiotic mupirocin, and its dissemination
on plasmids explains both the mechanism of, and the likely selective pressure behind,
at least some of these interbacterial horizontal gene transfers (Hodgson et al.
1994; Sassanfar et al. 1996; Brown et al. 1998). The topologies of the
trees for MetRS, ArgRS, and Asp-AsnRS, however, are not readily
compatible with this possibility and, rather, suggest multiple
transfers of eukaryotic genes into different bacteria (Fig. 3; Table 2).
Gene transfer from archaea to bacteria has been much less prominent, at
least among the bacterial taxa that have been sampled so far (Table
2). The apparent transfer of HisRS from archaea to bacteria has already
been discussed. Other events of this type involve class I LysRS, PheRS,
and possibly MetRS (Table 2). Class I LysRS is ubiquitous in archaea,
sporadically present in bacteria, and apparently absent in eukaryotes.
It therefore seems most likely that this enzyme evolved in archaea and was horizontally
transferred to bacteria. Notably, the tree topology, which is strongly
supported by bootstrap analysis, suggests two independent transfer
events: from euryarchaea to the spirochaetes and from crenarchaea to
Rickettsiae (Fig. 3). A striking aspect of the evolutionary scenario
for PheRS is that in most bacteria, including the spirochaetes, the
genes for the α and β subunits form an operon, whereas in the
archaea, which apparently donated both genes to the spirochaetes, they
are not adjacent. It appears likely that the operon organization is
ancestral, and an archaeon containing this operon might be found eventually.
Other types of interdivision transfer appear to be rare. Horizontal
gene transfer from bacteria to archaea seems to be a distinct possibility only for CysRS, which is present in two of the four completely sequenced archaeal genomes, AsnRS, so far identified only in
P. horikoshii, and class II LysRS that might have been acquired in the Sulfolobus lineage. In addition, the TrpRS of P. horikoshii apparently has been acquired from eukaryotes
(Fig. 3). This limited extent of aaRS gene exchange between bacteria and archaea appears rather unexpected, given the prominence of horizontal transfer of aaRS genes from eukaryotes to bacteria and
the apparently considerable exchange of other genes between archaea and
bacteria (Koonin et al. 1997; Aravind and Koonin 1998; Makarova et al.
1999). The most straightforward explanation, which is amenable to direct
experimental testing, is that bacterial aaRSs are generally poorly
compatible with archaeal tRNAs. Of course, a cautionary note regarding
the small available sampling of complete archaeal genomes, which all
come from thermophilic species, also applies to this direction of
horizontal gene transfer.
Unlike the relationship between the three primary divisions of life
that could be resolved for the majority of aaRSs in support of a
modified standard model, no consistent, large-scale bacterial phylogeny
emerged from the aaRS trees. This is in itself not surprising because
inconsistent and unreliable tree topologies have been observed
frequently for different bacterial genes. In the majority of the aaRS
trees, the "true bacterial" part (i.e., those bacterial aaRSs that
have not been horizontally transferred from eukaryotes or archaea as
discussed above) shows, more or less, a star topology, with no or
little statistical support for any particular relationship between the
major lineages (Fig. 3). The trees for TyrRS, TrpRS, and LeuRS are
exceptional in that a strongly supported (but different in each case)
partitioning of the bacteria into two clusters is observed (Fig.
3). In the rest of the trees, the only clusters that are consistently
seen are the terminal branches, namely, the two species of
γ-proteobacteria (E. coli and Haemophilus
influenzae), spirochaetes (B. burgdorferi and T. pallidum), and mycoplasmas (Mycoplasma genitalium and
Mycoplasma pneumoniae). Interestingly, however, even these
relatively close affinities are violated in some of the aaRS trees.
Thus, E. coli and H. influenzae behave differently in
the TyrRS tree, whereas the two spirochaetes show different affinities
in the ProRS and ThrRS trees, in addition to the aforementioned
presence of the truncated form of class II LysRS in Treponema
but not in Borrelia (Fig. 3). In each of these cases, the
members of the respective pair of related species cluster, with good
bootstrap support, with other, distantly related bacteria or with
eukaryotes. Thus, in the ThrRS tree, the Treponema protein
clusters with γ-proteobacteria, whereas the one from
Borrelia clusters with Aquifex and
Mycobacterium (Fig. 3). In the ProRS tree, there is strongly
supported clustering of Borrelia with eukaryotes and
Treponema with other bacteria (Fig. 3); the latter association
is corroborated by the insertion of the YbaK domain that is a hallmark
of bacterial ProRS and is present in Treponema but not in
Borrelia. Horizontal transfer of the eukaryotic ProRS gene
into the Borrelia lineage subsequent to its divergence from
Treponema, followed by the elimination of the typical
bacterial gene, might explain it.
Other unexpected, but statistically supported, bacterial clusters are seen in the trees for AspRS (Bacillus-Synechocystis) and HisRS (spirochaetes-Helicobacter); clustering of spirochaetes with chlamydiae, mentioned above, belongs in the same category. These observations seem to indicate horizontal transfer of at least some of the aaRS genes between distant bacterial species. Additional, more ancient gene transfer events might be obscured by the star topology.
Our phylogenetic analysis may clarify the evolutionary scenario for
AspRS and AsnRS. Eukaryotes encode both a cytoplasmic and a
mitochondrial aaRS for each of these amino acids; archaea typically
lack AsnRS (so far the only exception is P. horikoshii) and
incorporate asparagine into proteins via the transamidation route,
whereas the majority of bacteria encode AsnRS (Curnow et al. 1996;
Shiba et al. 1998). The insertion of the GAD domain identifies
bacterial AspRSs as a likely monophyletic group. Clustering by sequence
similarity suggested the root position between this group and the rest
of the AspRSs together with AsnRS. The midpoint procedure, however,
placed the root between all AspRSs and AsnRSs. To resolve the
ambiguity, we aligned the sequence of class II LysRS with those of
AspRS and AsnRS and rooted the tree using the LysRS as an outgroup.
Under this approach, the root was confidently placed between bacterial
AspRS and the rest of the AsxRSs (Fig. 3; data not shown). Thus, the
most likely scenario is that AsnRS originally evolved by duplication of
eukaryotic AspRS, which was followed by horizontal transfer into
bacteria, perhaps with subsequent dissemination among bacterial
species, and at least one archaeon (Fig. 3; Table 2). This scenario is
similar to that for GluRS and GlnRS (Siatecka et al. 1998; Fig. 3) but
different from the one recently proposed for AsnRS, which postulated
its origin by duplication of the AspRS gene early in bacterial
evolution (Shiba et al. 1998). It remains unclear why the topologies of
the AspRS and AsnRS trees observed in our analysis and in that of Shiba and coworkers (1998)
were different; differences in the alignments are
likely to contribute.
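Outgroup rooting is the approach used here with class II LysRS as the outgroup for the AspRS/AsnRS tree. It places the root on the branch that joins the outgroup to the rest of an unrooted tree, and the ingroup's basal split can then be read off directly. The Python sketch below illustrates the idea on a toy topology. Its node and leaf names are hypothetical, and it is not a reconstruction of the alignment-based tree in Figure 3.

```python
from collections import defaultdict


def build(edges):
    """Adjacency map of an unrooted tree from (node, node) pairs."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def leaves_beyond(adj, start, blocked):
    """Leaves reachable from start without passing through the blocked node."""
    seen, stack, found = {start, blocked}, [start], set()
    while stack:
        node = stack.pop()
        if len(adj[node]) == 1:
            found.add(node)
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return frozenset(found)


def outgroup_root(adj, outgroup):
    """Root on the outgroup's branch and return the ingroup's basal clades."""
    (anchor,) = adj[outgroup]  # a leaf has exactly one neighbor
    return {leaves_beyond(adj, nb, anchor) for nb in adj[anchor] if nb != outgroup}


# Toy unrooted tree (hypothetical labels); r, p, q, s are internal nodes.
edges = [("LysRS", "r"), ("r", "p"), ("r", "q"),
         ("p", "AspRS_bac1"), ("p", "AspRS_bac2"),
         ("q", "s"), ("q", "AsnRS_bac"),
         ("s", "AspRS_euk"), ("s", "AspRS_arc")]
for clade in outgroup_root(build(edges), "LysRS"):
    print(sorted(clade))
```

With this toy topology, the two basal clades are the bacterial AspRSs on one side and the remaining AspRSs plus AsnRS on the other. In real data, the unrooted topology itself depends on the alignment, which the text notes as a likely source of disagreement between studies.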
The case of the eukaryotic-type GlyRS is particularly interesting. Here, three lines of evidence all suggested the likely position of the root between archaea-eukaryotes and bacteria (Fig. 3): the analysis of domain architectures (a unique insert in the core of the eukaryotic and archaeal proteins), clustering, and the modified midpoint rooting procedure. However, only a minority of bacterial species possess this form of GlyRS. It appears that either the archaeal-eukaryotic GlyRS was horizontally transferred to bacteria very early in evolution or this is the ancestral GlyRS, which has been displaced by a newly evolved form in the majority of bacteria.
Evolutionary scenarios for CysRS and SerRS remain uncertain. There is a
notable correlation between the absence of CysRS and the presence of an
unusual, highly diverged SerRS in some of the archaea, namely the
methanogens M. jannaschii and Methanobacterium thermoautotrophicum (Fig. 3). The properties of this unique SerRS and the pathway of cysteine incorporation into proteins in these archaea remain to be investigated experimentally. The SerRSs from the
other two archaeal species reliably cluster with the eukaryotes and a
small subset of bacteria (Fig. 3). A recent study by others also
reported this dramatic difference between the two types of archaeal
SerRS, as well as clustering of the SerRSs from the methanogens with
those from Gram-positive bacteria and cyanobacteria (Lenhard et al.
1999); our analysis failed to provide support for the latter grouping.
Conclusions
Comparison of the complete sets of aaRSs from diverse species of bacteria, archaea, and eukaryotes reveals a number of unique domain architectures. Despite numerous structural studies on aaRSs, several previously undetected domains could be identified using improved methods of sequence analysis. The exact functions of these domains and the mode of their interaction with the aaRS core remain to be determined by a combination of structural and biochemical analyses. Some of the distinct domain arrangements appear to be synapomorphies, that is, they define monophyletic groups within a given aaRS specificity.
Combined with traditional phylogenetic trees, analysis of these
synapomorphies suggests relatively simple evolutionary scenarios for
most of the aaRSs. All these scenarios are based on the standard model
of evolution for the translation system, which postulates an original
radiation of bacteria and the common ancestor of archaea and
eukaryotes. This standard model is compatible with the results of the
phylogenetic analysis of aaRSs, both qualitatively (at the level of
synapomorphies) and quantitatively (in terms of the statistically
supported topology of phylogenetic trees). However, alternative models
for the relationships between bacteria, archaea, and eukaryotes cannot
be ruled out if a major, systematic increase in the evolutionary rates
at the base of the bacterial subtrees is postulated.
Regardless of the exact model of relationships between bacteria, archaea, and eukaryotes, phylogenetic analysis makes it clear that evolution of aaRSs involved a variety of horizontal gene transfers. The principal types of such events are transfer of eukaryotic aaRS genes into bacteria, resulting in the displacement of the respective ancestral bacterial genes, and displacement of original eukaryotic genes by mitochondrial genes transferred to the nuclear genome. Instances of likely horizontal transfer of aaRS genes from archaea to bacteria also were detected but are less common. There were no clear indications of horizontal transfer of aaRS genes from bacteria to archaea, although two likely cases of a eukaryotic gene being acquired by an archaeon were detected. In addition, for several aaRSs, there were strong indications of gene transfer between major bacterial lineages, and it appears that other events of this type might be obscured by the star topology of the bacterial trees.
The influx of eukaryotic aaRS genes into the bacterial world has been nonrandom. The fraction of transferred eukaryotic genes is greatest in bacterial groups that consist predominantly or exclusively of parasites or symbionts, particularly the spirochaetes. Thus, horizontal gene transfer seems to have been a major force in the evolution of aaRSs. Some routes, however, have been strongly favored (e.g., from eukaryotes to spirochaetes), whereas others might have been (nearly) prohibited (e.g., from bacteria to archaea). Further genome sequencing, for example, of nonthermophilic and particularly symbiotic archaea, should be revealing with regard to the nature of these preferences and restrictions. It will hopefully become clear which of them simply correlate with the intensity of contact between two particular taxa and which stem from intrinsic features of the translation system, such as compatibility (or lack thereof) between aaRSs and the cognate tRNAs.
Methods
Databases and the aaRS Sequence Set
The databases used in this study were the nonredundant database (NR) at the NCBI (NIH, Bethesda, MD) and a collection of aaRS sequences from completely sequenced genomes. The latter were initially extracted from the Genomes division of the Entrez system (http://www.ncbi.nlm.nih.gov/Entrez/Genome/org.html) using the available genome annotation. Additionally, all the protein sequences from complete genomes were searched (see below) using the E. coli aaRS sequences as queries, to detect any aaRS homologs that might have been misannotated. The aaRS sequence set used in this analysis included the entire complement of aaRSs from 12 complete bacterial genomes (E. coli, H. influenzae, Helicobacter pylori, M. genitalium, M. pneumoniae, Bacillus subtilis, Chlamydia trachomatis, B. burgdorferi, T. pallidum, Mycobacterium tuberculosis, Synechocystis sp., Aquifex aeolicus), four archaeal genomes (M. jannaschii, M. thermoautotrophicum, Archaeoglobus fulgidus, P. horikoshii), and one eukaryotic genome, that of the yeast Saccharomyces cerevisiae; in addition, all the available aa