Vol. 11, Issue 2, 240-252, February 2001
LETTER
ABSTRACT
By comparing the gene order in the completely sequenced archaeal genomes complemented by sequence profile analysis, we predict the existence and protein composition of the archaeal counterpart of the eukaryotic exosome, a complex of RNAses, RNA-binding proteins, and helicases that mediates processing and 3'->5' degradation of a variety of RNA species. The majority of the predicted archaeal exosome subunits are encoded in what appears to be a previously undetected superoperon. In Methanobacterium thermoautotrophicum, this predicted superoperon consists of 15 genes; in the Crenarchaea, Sulfolobus solfataricus and Aeropyrum pernix, one and two of the genes from the superoperon, respectively, are relocated in the genome, whereas in other Euryarchaeota, the superoperon is split into a variable number of predicted operons and solitary genes. Methanococcus jannaschii partially retains the superoperon, but lacks the three core exosome subunits, and in Halobacterium sp., the superoperon is divided into two predicted operons, with the same three exosome subunits missing. This suggests concerted gene loss and an alteration of the structure and function of the predicted exosome in the Methanococcus and Halobacterium lineages. Additional potential components of the exosome are encoded by partially conserved predicted small operons. Along with the orthologs of eukaryotic exosome subunits, namely an RNase PH and two RNA-binding proteins, the predicted archaeal exosomal superoperon also encodes orthologs of two protein subunits of RNase P. This suggests a functional and possibly a physical interaction between RNase P and the postulated archaeal exosome, a connection that has not been reported in eukaryotes. In a pattern of apparent gene loss complementary to that seen in Methanococcus and Halobacterium, Thermoplasma acidophilum lacks the RNase P subunits. 
Unexpectedly, the identified exosomal superoperon, in addition to the predicted exosome components, encodes the catalytic subunits of the archaeal proteasome, two ribosomal proteins and a DNA-directed RNA polymerase subunit. These observations suggest that in archaea, a tight functional coupling exists between translation, RNA processing and degradation (apparently mediated by the predicted exosome), and protein degradation (mediated by the proteasome), and may have implications for cross-talk between these processes in eukaryotes.
INTRODUCTION
Operonic organization of genes, whereby
groups of functionally linked genes are adjacent in the chromosome
allowing their regulated cotranscription and subsequent translation
from a single polycistronic mRNA, is the governing principle of
bacterial and archaeal genome organization and expression (Jacob et al. 1960; Miller and Reznikoff 1978; Huynen and Snel 2000). However,
comparisons of the arrangement of orthologous genes in completely
sequenced prokaryotic genomes have shown that not only is there very
little conservation of gene order above the operon level even between relatively close species, but operons themselves show considerable evolutionary plasticity (Mushegian and Koonin 1996; Tatusov et al. 1996; Koonin and Galperin 1997; Siefert et al. 1997; Watanabe et al. 1997; Dandekar et al. 1998; Itoh et al. 1999). Only a few operons
that encode physically interacting subunits of multiprotein complexes
such as the ribosomal subunits or the proton ATPase are conserved
across a wide range of genomes (Mushegian and Koonin 1996; Dandekar et al. 1998).
Conceptually, the operonic principle should allow for systematic
prediction of the functions of uncharacterized genes on the basis of
genomic context (Overbeek et al. 1999; Huynen and Snel 2000; Huynen et al. 2000). The underlying assumption is that genes that belong to the
same operon typically encode functionally linked proteins, i.e., proteins
comprising subunits of the same macromolecular complex, catalyzing
different stages of the same pathway or regulating different aspects of
the same process. The generally low conservation of gene order in
prokaryotes is a mixed blessing for this approach. The relatively small
number of conserved gene strings limits the possibilities for
systematic prediction of gene functions. However, those few gene
strings that are actually conserved are likely to form
operons and therefore provide robust material for functional predictions.
During a systematic comparative analysis of the gene order conservation
in the sequenced bacterial and archaeal genomes, we attempted to obtain
a conservative estimate of the predictive power of this approach and
found that, from the set of 2422 clusters of orthologous groups (COGs)
of proteins (Tatusov et al. 1997, 2000), major functional predictions
were possible for ~90, or ~4% of the total (Wolf et al. 2000).
In most of these cases, the prediction applied to just one
uncharacterized gene (a representative of a COG) that belonged to a
known or clearly predicted operon. In several instances, however,
previously undetected operons were identified and their functions could
be predicted through a combination of genome organization comparison
and detailed sequence analysis. Here we present and discuss in greater
detail the most notable of such cases, the prediction of the archaeal
counterpart to the eukaryotic exosome, a complex of RNAses, RNA-binding
proteins, and helicases that mediates processing and 3'->5'
degradation of a variety of RNA species (Mitchell et al. 1997; Decker 1998; van Hoof and Parker 1999). We predict several previously
undetected exosome subunits and show that the predicted operons coding
for potential exosome components also include genes for the catalytic subunit of the proteasome, those for two ribosomal proteins, and a
DNA-directed RNA polymerase subunit. These observations suggest tight
functional or perhaps even physical coupling between the exosome and
the proteasome and may have implications for the functions of these
complexes in eukaryotes.
RESULTS AND DISCUSSION
Prediction of Archaeal Exosome Subunits and the Potential Exosomal Superoperon
The eukaryotic exosome consists of several paralogous proteins
containing the RNase PH domain and known or predicted to possess 3'->5' exonuclease activity; two additional 3'->5'
exonucleases containing, respectively, the RNase II and RNase D
domains; RNA-binding proteins containing the S1 domain; and more
loosely associated, but functionally connected, helicases and adapter
proteins (the subunit composition apparently can vary in different
eukaryotes; the yeast subunits are listed in Table 1) (Mitchell et al. 1997; Decker 1998; van Hoof and Parker 1999). All archaea, except for Methanococcus
jannaschii and Halobacterium sp., encode highly conserved
orthologs of the Rrp41p and Rrp42p subunits predicted to possess the
exonuclease activity (Tables 1, 2); these
proteins have been annotated as an RNase PH homolog and polynucleotide phosphorylase homologs, respectively, in some of the original annotations of archaeal genomes (Smith et al. 1997; Kawarabayasi et al. 1999). A systematic comparative analysis of the archaeal genomes within
the framework of the COG project (Makarova et al. 1999; Tatusov et al. 2000) resulted in the identification of the archaeal ortholog of the
Rrp4p subunit which, again, is missing in M. jannaschii and
Halobacterium sp. (Tables 1, 2; Fig. 1). This protein contains two predicted
RNA-binding domains, namely a central S1 domain and a previously
undetected, carboxy-terminal KH domain (Fig. 1). In addition, it
contains a small amino-terminal domain, which we designated pre-S1,
that is predicted to adopt an all-β-sheet structure and includes a
characteristic, conserved GXG signature (Fig. 1). It has been reported
that Rrp4p is a 3'->5' exonuclease (Mitchell et al. 1997).
However, neither the S1 nor the KH RNA-binding domains are known to
possess enzymatic activity and the small pre-S1 domain has no features
suggestive of an enzymatic function either (Fig. 1). Thus it seems
possible that Rrp4p is an RNA-binding subunit of the exosome, and the
reported nuclease activity could be spurious; an alternative, unusual
possibility is that, in this case, the S1 domain itself is a nuclease.
During the recent systematic comparison of the gene order in
prokaryotic genomes (Wolf et al. 2000), we observed that the genes
coding for orthologs of Rrp4p, Rrp41p, and Rrp42p form a conserved
triad in all archaeal genomes except M. jannaschii and Halobacterium sp. (Fig. 2A).
Conservation of three genes in a row in multiple archaeal genomes,
particularly between Euryarchaeota and Crenarchaeota, is unusual and is
seen in only a few of the most conserved operons which encode
physically interacting subunits of large macromolecular complexes such
as the ribosome or the H+-ATPase (Mushegian and Koonin 1996; Dandekar et al. 1998; Huynen and Snel 2000; Huynen et al. 2000).
Therefore, the conservation of the order among the genes coding for the
archaeal counterparts of the core subunits of the eukaryotic exosome in
most of the archaeal genomes made us speculate that these proteins
could form a complex equivalent to the exosome and prompted a further
investigation in search of potential additional components and
connections with other functional systems. To this end, we applied an
iterative strategy for genome context analysis that combined comparison of genome organization with additional, in-depth sequence similarity searches. Detailed sequence analysis was performed for members of the
detected conserved gene strings, after which, if new homologs were
detected, the next round of genome context examination was done.
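The context-examination half of this iterative strategy can be illustrated with a minimal sketch. It is not the procedure actually used in this work: it assumes each genome is reduced to an ordered list of protein-family labels, ignores transcription direction, and leaves out the sequence-search round entirely. The function names, the `min_genomes` threshold, and the toy gene orders are all hypothetical.

```python
from typing import Dict, List, Set


def conserved_neighbors(genomes: Dict[str, List[str]],
                        seeds: Set[str],
                        min_genomes: int = 2) -> Set[str]:
    """Families found next to any seed family in at least `min_genomes` genomes."""
    support: Dict[str, Set[str]] = {}
    for name, order in genomes.items():
        for i, family in enumerate(order):
            if family not in seeds:
                continue
            for j in (i - 1, i + 1):
                if 0 <= j < len(order) and order[j] not in seeds:
                    support.setdefault(order[j], set()).add(name)
    return {fam for fam, found in support.items() if len(found) >= min_genomes}


def iterate_context(genomes: Dict[str, List[str]],
                    seeds: Set[str],
                    min_genomes: int = 2) -> Set[str]:
    """Grow the seed set until a round adds no new families."""
    current = set(seeds)
    while True:
        new = conserved_neighbors(genomes, current, min_genomes)
        if not new:
            return current
        current |= new


# Toy gene orders; genome names and family labels are placeholders.
toy = {
    "genome_A": ["f1", "f2", "f3", "f4", "f5"],
    "genome_B": ["f9", "f2", "f3", "f4", "f6"],
    "genome_C": ["f3", "f4", "f5", "f7"],
}
result = iterate_context(toy, {"f3"})
```

Starting from the single seed `f3`, the toy run first adds `f2` and `f4`, then `f5`, and stops when no further neighbor is supported by two genomes. In a real analysis each round would also include the sequence searches described above before the next round of neighborhood examination.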
A multiple alignment of the regions of the archaeal genomes around the
exosome gene triad was constructed by manually combining the relevant
sections of template-anchored genome alignments that were produced for
each of the genomes (see Methods; Wolf et al. 2000). The genes that
comprised the multiple alignment were reannotated using the information
already contained in the COG database, searches against a collection of
protein domains using the NCBI CD server, and iterative database
searches using the PSI-BLAST program. As a result of these searches,
the multiple alignment of the genome regions encoding the predicted
exosome components was supplemented with genes that, in some of the
archaea, are located in other parts of the genome but are orthologous
to genes in partially conserved positions of the alignment. In most
cases, the orthologous relationships between these archaeal genes could
be readily established on the basis of statistically highly significant
protein sequence similarity, with a large margin separating orthologs
and paralogs; the eukaryotic orthologs were much less similar but also
were identified confidently either through regular, single-pass BLAST searches or by additional, iterative PSI-BLAST searches (Table 2).
These analyses resulted in the delineation of a potential superoperon
(by superoperon, we mean an array of functionally linked genes that
could be coregulated in a complex fashion, probably forming several
partially independent operons) that, in addition to the predicted
exosome subunits, encodes a remarkable panoply of proteins involved in
other central functional systems of the archaeal cells (Fig. 2A). The
potential superoperon consists of genes for the following categories of
proteins: (1) predicted exosome subunits, which include not only the
orthologs of eukaryotic exosome proteins described above, but also
archaeal orthologs of two protein subunits of the tRNA-processing RNase
P (Frank and Pace 1998) and the ortholog of the eukaryotic protein
IMP4, a component of the eukaryotic U3 small nucleolar
ribonucleoprotein (Lee and Baserga 1999); (2) the catalytic subunit of
the proteasomal protease (one of the two archaeal paralogs) (Baumeister et al. 1998; De Mot et al. 1999); (3) two ribosomal proteins, L15E and L37AE; (4) prefoldin, a translation-associated molecular chaperone that
facilitates folding of nascent polypeptides (Vainberg et al. 1998; Leroux et al. 1999; Leroux and Hartl 2000); (5) DNA-directed RNA
polymerase subunit RPC10; and (6) three uncharacterized conserved proteins. All nine available archaeal genomes encode proteins from each
of these categories, with the single, puzzling exception of the
otherwise highly conserved RPC10 protein missing in Thermoplasma acidophilum; as noted above, subsets of the predicted exosome subunits are also missing in M. jannaschii,
Halobacterium sp. and T. acidophilum (Fig. 2A).
The organization of the potential superoperon is best preserved in
Methanobacterium thermoautotrophicum where it is
predicted to consist of 15 genes. Only one gene, that for RPC10, is
found in a different chromosomal location in the Crenarchaeon
Sulfolobus solfataricus, whereas in the second Crenarchaeon,
Aeropyrum pernix, two genes are relocated. In the rest of
the Euryarchaea, the perturbations in the superoperon organization are
more severe (Fig. 2A). A superoperon of this size is outstanding in
archaeal genomes; in terms of the scale of gene order conservation, it is second only to the ribosomal superoperon (Wolf et al. 2000). The
conservation of the (nearly) complete superoperon in a representative of the Euryarchaea and in the Crenarchaea, the two major archaeal lineages, strongly suggests that the superoperon is an ancestral feature that was already present in the common ancestor of the archaea.
To identify additional genes that could be connected functionally to
the predicted archaeal exosome, we extended the searches in two
directions. Firstly, the archaeal genomes were searched for orthologs
of those exosome subunits whose counterparts are not encoded in the
potential superoperon. This resulted in the identification of the
archaeal ortholog of the RNA-binding subunit Csl4p which, like the
other three core subunits, is missing in M. jannaschii and
Halobacterium sp. (Table 1; Fig. 2B). Csl4p and its orthologs
are paralogs of the Rrp4p group of exosome subunits. The two subunits
share the pre-S1 domain and the central S1 domain, but instead of the
KH domain, the archaeal Csl4p orthologs contain a different type of
predicted RNA-binding domain at their carboxyl-termini, namely a
rubredoxin-like Zn-ribbon (Fig. 1; Aravind and Koonin 1999). In the
eukaryotic Csl4p, the counterpart of the archaeal Zn-ribbon, although
retaining many of the conserved residues including a basic dyad, has
lost the metal-chelating cysteines, indicating that archaea possess the
primitive form of this protein (Fig. 1). The pre-S1 domain of the Csl4p
and Rrp4p orthologous groups is predicted to assume an all-β fold
that may form a five-stranded barrel (Fig. 1); the conservation of this
domain suggests a common interaction partner for these proteins. The
genomic context of the Csl4p orthologs appears to extend the theme of
juxtaposition of genes coding for proteins involved in different
central cellular processes that was noticed in the potential
superoperon. In all archaeal genomes that encode Csl4p, with the
exception of T. acidophilum, this gene is followed by the gene
for the RPC19 subunit of the DNA-directed RNA polymerase (with or
without an inserted uncharacterized gene; Fig. 2B), which reinforces
the exosome-transcription connection. In A. pernix
and Archaeoglobus fulgidus, adjacent to the gene for Csl4p is
a gene for a methyltransferase, which is conserved in all archaea and
eukaryotes, but in the rest of them is located elsewhere on the
chromosome. The phyletic distribution of this methyltransferase, which
is present in all archaea and eukaryotes, but not in bacteria, is
similar to that of other exosome, basal transcription, and translation
components, and together with the apparent operon organization,
suggests that it could belong to the exosome complex. By the same logic
as applied to the superoperon above, the Csl4p-methyltransferase gene
arrangement could be an ancestral character for the archaea. The
methyltransferase contains the motif [ND]PP[YF] which is typical of
nucleic acid purine methyltransferases (data not shown) and could be
involved in a yet-undetected RNA methylation event required for RNA
degradation by the exosome.
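As a small illustration of how a signature such as [ND]PP[YF] can be located in a protein sequence, the pattern maps directly onto a regular expression. The sketch below uses Python's standard `re` module; the helper name `find_motif` and the peptide fragments are made up for illustration and are not sequences from this study. A match shows only that the signature is present and says nothing by itself about function.

```python
import re

# The signature [ND]PP[YF] written as a regular expression:
# N or D, then two prolines, then Y or F.
MOTIF = re.compile(r"[ND]PP[YF]")


def find_motif(seq: str):
    """Return (1-based start position, matched residues) for each occurrence."""
    return [(m.start() + 1, m.group()) for m in MOTIF.finditer(seq.upper())]


# Made-up peptide fragments, for illustration only.
hits = find_motif("MKVLDPPYGSA")
no_hits = find_motif("MKVLAPPYGSA")
```

Here `hits` reports the occurrence starting at residue 5, while the second fragment, in which the first position is alanine, yields no match.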
A more complicated situation was revealed in the search for the
archaeal counterpart of the eukaryotic exosomal helicase. The
eukaryotic exosomal helicases, Mtr4p and Ski2p, define a distinct family (SKI2) within the helicase superfamily II, which includes both
predicted RNA helicases such as PRP44 (which contains two helicase
domains) and DNA helicases such as the Mus308/pol theta proteins (Harris et al. 1996; Aravind et al. 1999; Kim and Rossi 1999; L. Aravind and
E.V. Koonin, unpubl.). An orthologous group of SKI2 family helicases is
represented in all archaea (COG1204) and shows the greatest similarity
among the archaeal proteins to the Mtr4p and Ski2p helicases (Table 1;
Fig. 2B). However, reciprocal database searches indicate that these
proteins are orthologous to the helicase domain of the eukaryotic
MUS308-like proteins in which the helicase is fused to a DNA Pol I
domain (Harris et al. 1996). The domain organization of these helicases also supports a function in DNA repair because they contain a carboxy-terminal DNA-binding helix-hairpin-helix (HhH) module that is
shared with the Mus308/pol theta proteins (Aravind et al. 1999). The
genomic context of this helicase is mostly uninformative except for
M. jannaschii where there are some indications suggestive of a
possible association with other RNA-metabolism-related genes (Fig. 2B).
The adjacent gene encodes a predicted methyltransferase whose
specificity could not be pinpointed. Two genes next to the methyltransferase gene, albeit transcribed in the opposite direction, encode uncharacterized proteins, one of which contains the PilT amino-terminal (PIN) domain (Makarova et al. 1999). This gene pair is
conserved in three archaeal genomes, but the orthologs of these genes
are missing in A. pernix, M. thermoautotrophicum, Halobacterium sp. and T. acidophilum (Fig. 2B). The
PIN domain is predicted to be an RNA-binding domain and is present in
the Rrp44p/Dis3p subunit of the eukaryotic exosome, suggesting the possibility of an RNA-metabolism-related function for at least some of
the numerous archaeal PIN-containing proteins (Makarova et al. 1999).
Thus, whereas a dual role in DNA repair and the exosome is technically
possible for the archaeal helicases of COG1204, the evidence from the
above observations is at present weak.
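The distinction drawn above, between the eukaryotic proteins most similar to an archaeal helicase and its actual orthologs, rests on searching in both directions. One common operational criterion is the reciprocal best hit; the text does not state the exact criterion applied here, so the following is only a sketch under that assumption. The protein labels and similarity scores are invented placeholders.

```python
from typing import Dict, Optional

# query protein -> {subject protein: similarity score}
Scores = Dict[str, Dict[str, float]]


def best_hit(scores: Scores, query: str) -> Optional[str]:
    """Highest-scoring subject for `query`, or None if it has no hits."""
    hits = scores.get(query, {})
    return max(hits, key=hits.get) if hits else None


def is_reciprocal_best_hit(a_vs_b: Scores, b_vs_a: Scores, a: str, b: str) -> bool:
    """True if b is the best hit of a and a is the best hit of b."""
    return best_hit(a_vs_b, a) == b and best_hit(b_vs_a, b) == a


# Invented scores: arc_hel hits both eukaryotic proteins, but euk_Y
# has a better archaeal match elsewhere.
arc_vs_euk = {"arc_hel": {"euk_X": 300.0, "euk_Y": 220.0}}
euk_vs_arc = {
    "euk_X": {"arc_hel": 300.0},
    "euk_Y": {"arc_other": 250.0, "arc_hel": 220.0},
}
pair_x = is_reciprocal_best_hit(arc_vs_euk, euk_vs_arc, "arc_hel", "euk_X")
pair_y = is_reciprocal_best_hit(arc_vs_euk, euk_vs_arc, "arc_hel", "euk_Y")
```

In this toy case only the `arc_hel`/`euk_X` pair passes the reciprocal test, even though `euk_Y` is also a significant hit in the forward search.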
An alternative and perhaps stronger candidate for the role of a
helicase associated with the predicted archaeal exosome is suggested by
the juxtaposition of a gene coding for a predicted RNA helicase with
one of the fragments of the potential exosomal superoperon in A. fulgidus (AF1149; Fig. 2). This predicted helicase, a more
peripheral member of the SKI2 family, is represented by two paralogs in
all archaea except M. jannaschii and Halobacterium sp., and by a single copy in two bacteria, Escherichia coli
and Mycobacterium tuberculosis. M. jannaschii
and Halobacterium sp., however, lack one of these
paralogous genes, the actual ortholog of AF1149 (COG1201), which
correlates with the loss of the other predicted exosome subunits (see
above). The gene for Lhr, the homologous helicase from E. coli, is adjacent to the gene for RNAse T, which is compatible with
a role in RNA processing in this bacterium. Further genome comparisons
and experimental evidence will be required to verify the role of one or
perhaps both of the archaeal Lhr-like helicases in the predicted
exosome. If their function in the exosome is confirmed, this will be a
case of functional displacement by paralogs (Koonin and Mushegian 1996) in the eukaryotic lineage.
Finally, in light of the tight connection between genes coding for
predicted exosome subunits and proteasome subunits within the
superoperon, we examined the genomic context of the remaining proteasome subunits. Notably, in all archaeal genomes, with the exception of Halobacterium sp., the gene for the second
paralogous protease subunit is adjacent to a gene that encodes a
predicted RNAse containing a metallo-beta-lactamase (MBL) catalytic
domain (Aravind 1998) and an RNA-binding KH domain (Fig. 2B). The
eukaryotic ortholog of the latter protein is the catalytic subunit of
the mRNA polyadenylation cleavage/specificity complex, which is
distinct from the exosome and is involved in a different form of RNA
processing (Preker et al. 1997; Dickson et al. 1999; Takagaki and Manley 2000). Because in archaea, both the potential exosome components and the MBL-family RNAse are predicted to be functionally linked with
the proteasome, it seems plausible that this RNase is another exosome
subunit or at least functions along with the exosome in RNA
degradation. In three archaeal genomes, the gene for the regulatory ATPase subunit of the proteasome is adjacent to the gene coding for the
ortholog of the eukaryotic transcription factor MBF1; although the two
genes are transcribed divergently, coregulation is still likely given
the conservation of this gene arrangement (Fig. 2B). MBF1 shows
outstanding conservation among archaea and eukaryotes, particularly
within the DNA-binding helix-turn-helix domain and in light of the
evidence from eukaryotes, it is likely to be a basal transcription
factor (Aravind and Koonin 1999). Thus the juxtaposition of the genes
for MBF1 and the proteasomal ATPase probably reflects coordination
between the proteasome and transcription already suggested by the
presence of the catalytic subunit and RPC10 in the superoperon (Fig. 2).
For three proteins that are encoded in the potential exosomal
superoperon and are conserved in all completely sequenced archaeal genomes, no specific function could be predicted by sequence analysis (Fig. 2A). The superoperon encodes functionally diverse proteins (see
above) and therefore, caution is due in attempting to predict the
functions of these proteins on the basis of the genome context. Nevertheless, an association with the exosome seems most likely considering the numerical prevalence of predicted exosome subunits in
the superoperon, and also the fact that the subunit composition of the
archaeal proteasome has been characterized in detail (Macario et al. 1999; Wilson et al. 1999, 2000) and discovery of new subunits does not
seem particularly likely. One of the uncharacterized conserved proteins
(COG1500) has eukaryotic orthologs (e.g., yeast YLR022c) and it seems
plausible that these are so far undetected exosome subunits or at
least are functionally linked to the exosome; the remaining ones
appear to be archaea-specific.
Functional and Evolutionary Implications
The observations presented here suggest the existence of a complex
network of coregulation and functional and physical interactions in a
striking range of central cellular functions in the archaea, including
translation and cotranslational protein folding, RNA processing,
degradation and modification, and transcription. The previously
unsuspected connections seem to emerge at several levels. The
hypothetical archaeal exosome that appears to be taking shape as the
result of this analysis combines forms of RNA processing that are
thought to be distinct in eukaryotes. In particular, association of
RNase P with the exosome in eukaryotes has not been reported, but the
presence in the archaeal exosomal superoperon of the genes coding for
the orthologs of two RNase P subunits strongly suggests such an
association. Several archaeal RNase P subunits have not been described
previously; multiple alignments of the 30-kD subunit (yeast Rpp1p) and
the 14-kD subunit (yeast Pop5p) are shown in Figure 3. Both of these subunits contain no known
conserved domains, but secondary structure prediction based on their
alignments suggests that they assume distinct α/β folds that
could be unique to archaea and eukaryotes (Fig. 3).
Similarly, the eukaryotic ortholog of the archaeal MBL-family RNAse
functions within a distinct mRNA-processing system, the polyadenylation
cleavage/specificity complex (Dickson et al. 1999; Preker et al. 1997; Takagaki and Manley 2000), whereas the IMP4 protein, whose archaeal
ortholog belongs to the exosomal superoperon and is predicted to be a
subunit of the exosome, is part of the splicing machinery in eukaryotes
(Lee and Baserga 1999).
The apparent connection between the predicted archaeal exosome and the
proteasome is particularly intriguing given the functional parallels
between the two systems that are extensive enough to have prompted van
Hoof and Parker (1999) to call the exosome the "proteasome for RNA".
The salient common features of the two molecular machines include the
presence of several paralogous catalytic subunits (RNAses and
proteases, respectively) all of which are essential for the complex
function, and an ATPase (helicase) subunit (Baumeister et al. 1998; van Hoof and Parker 1999). The eukaryotic proteasomes and their archaeal
counterparts differ in the number of paralogous subunits; the total
number of subunits in the complex is the same, but instead of using 14 copies each of just two distinct subunits as the archaea do, eukaryotes
employ 14 distinct subunits with two copies of each incorporated in the complex
(DeMartino and Slaughter 1999). The findings presented here suggest
exactly the same kind of difference between the eukaryotic exosome and
its postulated archaeal counterpart, the latter including only two RNase PH homologs and two RNA-binding proteins in contrast to the six
and three, respectively, in the eukaryotes (Table 1). It should be
emphasized in this context that, given the evolution of the eukaryotic
exosome by duplication of the ancestral genes for the core exosomal
subunits, the small number of the actual archaeal orthologs of
eukaryotic exosomal proteins (Table 1) by no means should be
interpreted as evidence against the existence of an archaeal exosome.
The prediction is that the diversity of the eukaryotic exosomal
subunits created by paralogous evolution is countered by
multimerization of identical subunits in the hypothetical archaeal
exosome. The only two eukaryotic exosomal subunits whose evolutionary
counterparts appear to be genuinely missing in archaea are Rrp44p and
Rrp6p, two distinct nucleases (Table 1). One could speculate that the
predicted archaeal MBL-like exonuclease might substitute functionally
for at least one of these enzymes, in another case of
nonorthologous displacement.
The striking similarities discussed above indicate that the proteasome and the exosome are not only architecturally and functionally analogous, but also have evolved along parallel routes. Nor do they seem to have evolved independently: given the conservation of the predicted exosomal superoperon in Euryarchaea and Crenarchaea, a functional and perhaps even physical association between the proteasome and the exosome should have already existed at least in the common ancestor of the extant archaea, but more likely in the common ancestor of archaea and eukaryotes. For at least some aspects of their functioning, coupling between the proteasome and exosome seems to make perfect sense. For example, when the proteasome recognizes and destroys an abnormal protein coming off the ribosome, the exosome could start degrading the respective mRNA from the 3'-end.
In this context, physical association, perhaps a transient one, between the proteasome and the exosome seems plausible. For the next level of suggested functional connections, those between the exosome-proteasome and the translation and transcription machineries, physical associations appear to be less likely, although not impossible. However, a global regulatory network, within which transcription rate is tightly coordinated with those of translation and RNA and protein degradation via the regulation of expression of the key subunits of the respective multiprotein complexes, is suggested by the operonic organization of the respective archaeal genes.
Given the deep commonality between information processing systems in
archaea and eukaryotes, an attractive possibility is that the
(super)operon organization of genes that is prominent in archaea but
not in eukaryotes, could help predict functionally important
interactions between gene products that are common to both systems.
Along this line, one could envisage previously unsuspected functional
or even physical links between different types of RNA processing
complexes and between the proteasome and the exosome in eukaryotes.
Interestingly, a functional connection between RNase P and the
proteasome in yeast is suggested by the recent genetic experiments
demonstrating that mutations in a gene for a proteasome subunit and in
a gene for a chaperone involved in proteasome assembly suppress
mutations in the RPM2 gene coding for an RNase P subunit (Lutz et al. 2000).
Furthermore, the presence of shared domains (including the PINT and
JAB1/pad1 domains) in the eukaryotic proteasomal regulatory complex,
translation initiation factor eIF-3, and transcription regulators
strongly suggests deep evolutionary connections between these processes
(Aravind and Ponting 1998). Similarly, evolutionary links between the
translation machinery and the eukaryotic nonsense-codon-mediated RNA
degradation system are suggested by the presence of the NIC domain in
eIF4G and NMD2 and by the common functions of NMD3 in RNA degradation
and in translation (Aravind and Koonin 2000). These extrapolations
require caution because it is conceivable that with the considerable
growth in complexity that is the hallmark of the eukaryotic functional
systems, the ancient coupling could have become less tight and less
direct. Nevertheless, the deployment of proteins sharing a common
origin in translation and in RNA and protein stability regulation
suggests that, at least in the common ancestor of the eukaryotes, these
systems were closely associated as they are predicted to be in the
extant archaea.
Additionally, the present analysis indicates that some proteins of the eukaryote-specific mRNA splicing system, such as IMP4, could have evolved from ancestral exosome proteins. Regardless of the degree to which links between cellular systems previously thought to function independently are conserved between archaea and eukaryotes, these connections seem to deserve investigation in both the archaeal and the eukaryotic system.
Finally, the comparative analysis of the archaeal genes encoding
proteins implicated in the exosome activity, and particularly the
exosomal superoperon, reveals interesting cases of apparent concerted
loss of groups of functionally linked genes (Aravind et al. 2000) in
three archaea: M. jannaschii, Halobacterium sp., and T. acidophilum. The former two species show striking
parallel loss of three core subunits of the predicted exosome, Csl4p,
and one of the Lhr-like helicases; the gene for the IMP4 ortholog is
additionally missing in Halobacterium sp. (Fig. 2A). There is no indication of a general phylogenetic affinity between
Methanococcus and Halobacterium, and therefore, the
nearly identical patterns of apparent gene loss most likely result from
independent series of evolutionary events, in a striking support of the
notion of concerted gene loss (Aravind et al. 2000). Notably, the
partial conservation of the gene order in the potential exosomal
superoperon in M. jannaschii (Fig. 2A) appears to be
indicative of direct excision of the genes for three core exosome
subunits. T. acidophilum shows a complementary pattern of
apparent gene loss that involves two predicted Rnase P subunits, IMP4,
one of the uncharacterized conserved genes, and RPC10 (Fig. 2A),
although it seems premature to predict specific functional connections
between these genes on the basis of this single genome structure.
The prediction of the archaeal exosome, of variations in its composition,
and of its interactions with the proteasome and the translational and
transcriptional machineries illustrates context analysis, an approach
that is becoming increasingly popular in genomics, whereby gene
functions are predicted by a combination of detailed sequence analysis,
comparison of protein domain architectures and operon organization, and
examination of phyletic patterns (Marcotte et al. 1999; Aravind 2000;
Galperin and Koonin 2000; Huynen and Snel 2000; Huynen et al. 2000).
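As a rough illustration of the phyletic-pattern component of context analysis, the following Python sketch encodes the presence or absence of gene families across genomes and looks for families with identical or complementary patterns. The genome labels (`g1` to `g4`) and family names (`famA` to `famD`) are hypothetical placeholders and do not represent data from this study; the functions are an assumed minimal formulation, not the procedure used here.

```python
# Toy phyletic patterns: which genomes carry a member of each gene family.
# All genome and family names are hypothetical placeholders.
GENOMES = ["g1", "g2", "g3", "g4"]

PRESENCE = {
    "famA": {"g1", "g2"},
    "famB": {"g1", "g2"},
    "famC": {"g3", "g4"},
    "famD": {"g1", "g2", "g3", "g4"},
}


def pattern(family, presence=PRESENCE, genomes=GENOMES):
    """Return the presence/absence string of a family, one character per genome."""
    return "".join("1" if g in presence[family] else "0" for g in genomes)


def group_by_pattern(presence=PRESENCE, genomes=GENOMES):
    """Group families that share exactly the same phyletic pattern."""
    groups = {}
    for family in presence:
        groups.setdefault(pattern(family, presence, genomes), []).append(family)
    return groups


def complementary(fam_a, fam_b, presence=PRESENCE, genomes=GENOMES):
    """True if the two families never co-occur and together cover every genome."""
    a, b = presence[fam_a], presence[fam_b]
    return not (a & b) and (a | b) == set(genomes)


if __name__ == "__main__":
    for pat, families in group_by_pattern().items():
        print(pat, families)
    print("famA/famC complementary:", complementary("famA", "famC"))
```

Families that share a pattern (here `famA` and `famB`) are candidates for concerted loss or functional linkage, whereas complementary patterns (here `famA` and `famC`) may hint at alternative solutions for the same function. Either observation is only a starting point for the sequence and operon analyses described in the text.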
This case is rare because combined application of the above analyses
enabled us to predict an entire functional system and its structural
organization in archaea, opening up several lines of experimental
investigation, the results of which might have significant implications
for the corresponding eukaryotic systems.
| |
METHODS |
|---|
Genome Sequences, Databases, and Sequence Analysis
The annotated genome sequences of the archaea A. fulgidus (Klenk et al.
1997), M. thermoautotrophicum (Smith et al. 1997), M. jannaschii (Bult
et al. 1996), Pyrococcus horikoshii (Kawarabayasi et al. 1998),
Pyrococcus abyssi (Heilig, R., Genoscope; GenBank NC_000868),
Halobacterium sp. (Ng et al. 2000), and T. acidophilum (Ruepp et al.
2000) (Euryarchaeota), and A. pernix (Kawarabayasi et al. 1999)
(Crenarchaeota), with the accompanying information on the positions and
transcription directions of all protein-coding genes, were retrieved
from the Genomes division of the Entrez system (Tatusova et al. 1999).
The partial genome sequence of the crenarchaeon S. solfataricus
(Charlebois et al. 2000) was obtained from GenBank.
The nonredundant database of protein sequences at the National Center
for Biotechnology Information (NIH, Bethesda) was iteratively searched
using the PSI-BLAST program (Altschul et al. 1997; Altschul and Koonin
1998). A cut-off of E < 0.01 was typically used for inclusion of
sequences in the position-specific weight matrices. Nucleotide
sequences of archaeal genomes translated in all six reading frames were
searched using the TBLASTN program (Altschul et al. 1997).
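For readers unfamiliar with the translated search, the six reading frames that TBLASTN considers can be sketched as follows. This is a minimal Python illustration that assumes the standard genetic code and an unambiguous A/C/G/T sequence; the function names are hypothetical, and the code is not part of TBLASTN itself.

```python
# Standard genetic code, built from the conventional compact layout in which
# codons are enumerated with the base order T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an A/C/G/T sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return translations for frames +1..+3 and -1..-3, keyed by frame label."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translations("ATGGCCTAA").items():
        print(frame, protein)
```

Searching all six frames makes it possible to detect protein-coding regions that were missed or mispredicted in the genome annotation, because the comparison does not depend on previously called gene boundaries.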
Protein sequences were also compared to the database of Clusters of
Orthologous Groups (COGs) of proteins (http://www.ncbi.nlm.nih.gov/COG/)
using the COGNITOR program (Tatusov et al. 1997, 2000).
Conserved domains in protein sequences were identified by searching the
NCBI's CD collection of domain-specific, position-dependent weight
matrices using the reversed PSI-BLAST program
(http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi). Multiple
alignments of protein sequences were constructed using the Clustal_X
program (Thompson et al. 1997) and corrected on the basis of PSI-BLAST
results. Protein secondary structure was predicted using the PHD
program, with a multiple alignment submitted as the query (Rost and
Sander 1994). The construction of gene-by-gene pairwise and
template-anchored local alignments of gene orders using the Lamarck
program is described in Wolf et al. (2000).
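The idea of a pairwise local alignment of gene orders can be sketched with a Smith-Waterman-style dynamic program in which each genome is represented as an ordered list of gene-family labels. This is an assumed, simplified stand-in and not the Lamarck algorithm described in Wolf et al. (2000): the scoring values, the function name, and the example labels are hypothetical, and gene orientation is ignored.

```python
def local_gene_order_alignment(order_a, order_b, match=2, mismatch=-1, gap=-1):
    """Best local alignment of two gene orders (lists of family labels).

    Returns (score, pairs), where pairs lists aligned labels and None marks
    a gene that has no counterpart in the other genome (a gap).
    """
    rows, cols = len(order_a) + 1, len(order_b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the scoring matrix; cells are floored at zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if order_a[i - 1] == order_b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    pairs = []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if order_a[i - 1] == order_b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            pairs.append((order_a[i - 1], order_b[j - 1]))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            pairs.append((order_a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, order_b[j - 1]))
            j -= 1
    pairs.reverse()
    return best, pairs


if __name__ == "__main__":
    # Hypothetical gene orders; labels stand for gene-family identifiers.
    genome_a = ["x", "A", "B", "C", "y"]
    genome_b = ["p", "A", "B", "C"]
    print(local_gene_order_alignment(genome_a, genome_b))
```

In such a scheme, a high-scoring local alignment corresponds to a run of genes from the same families arranged in the same order in two genomes, which is the kind of signal used to propose conserved operons.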
| |
ACKNOWLEDGMENTS |
|---|
We thank Roman Tatusov and Darren Natale for help with the COG analysis.
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
| |
FOOTNOTES |
|---|
1 Corresponding author.
E-MAIL koonin{at}ncbi.nlm.nih.gov; FAX (301) 480-9241.
Article and publication are at www.genome.org/cgi/doi/10.1101/gr.162001.
| |
REFERENCES |
|---|
A tool for making discoveries in sequence databases. Trends Biochem. Sci. 23: 444-447.
Received August 23, 2000; accepted in revised form December 7, 2000.