Vol. 10, Issue 7, 1011-1019, July 2000
LETTER
ABSTRACT
Hammerhead ribozymes previously were found in satellite RNAs from plant viroids and in repetitive DNA from certain species of newts and schistosomes. To determine if this catalytic RNA motif has a wider distribution, we decided to scrutinize the GenBank database for RNAs that contain hammerhead or hammerhead-like motifs. The search shows a widespread distribution of this kind of RNA motif in different sequences suggesting that they might have a more general role in RNA biology. The frequency of the hammerhead motif is half of that expected from a random distribution, but this fact comes from the low CpG representation in vertebrate sequences and the bias of the GenBank for those sequences. Intriguing motifs include those found in several families of repetitive sequences, in the satellite RNA from the carrot red leaf luteovirus, in plant viruses like the spinach latent virus and the elm mottle virus, in animal viruses like the hepatitis E virus and the caprine encephalitis virus, and in mRNAs such as those coding for cytochrome P450 oxidoreductase in the rat and the hamster.
INTRODUCTION
The hammerhead ribozyme originally was discovered as a self-cleaving motif in viroids and satellite RNAs. These RNAs replicate using the rolling circle mechanism, which generates long multimeric replication intermediates. They use the cleavage reaction to resolve the multimeric intermediates into monomeric forms. The region able to self-cleave has three base-paired helices (I-III) connected by two conserved single-stranded regions and a bulged nucleotide (Forster and Symons 1987; for reviews see Symons 1992; Bratty et al. 1993; Birikh et al. 1997). The hammerhead ribozyme also seems to function in the generation of unit-length sequences from multimeric transcripts of repetitive DNA sequences. Two of these RNAs have been characterized: one in several newt species (Epstein and Gall 1987) and the other in three schistosome species (Ferbeyre et al. 1998). Note that not all of the repetitive sequences of these two organisms contained a bona fide hammerhead ribozyme; indeed, many mutations also were found, creating variants of the original motif. Overall, the rather limited distribution of this motif contrasts with the simplicity of its secondary structure, in which only a core of 14 nucleotides is absolutely required for cleavage.
We recently conducted an extensive search for different RNA motifs in the GenBank database (Bourdeau et al. 1999). The results showed that most of the motifs were distributed randomly among gene sequences, suggesting that most RNA motifs originate by random drift. We now wish to extend these observations to the self-cleaving hammerhead ribozyme and its variants in which either an essential nucleotide in the single-stranded positions is allowed to be random or the identity of a conserved base pair in Helix II or III is changed. We found that most of the hammerhead motifs are apparently underrepresented among gene sequences, but this comes from the bias of the GenBank for sequences with low CpG representation. We also report the finding of intriguing motifs in several repetitive sequences and mRNAs.
RESULTS
Searching for Self-cleaving RNA Motifs of the Hammerhead Type in the GenBank
The hammerhead ribozyme can be described by three helices separated by three single stranded regions of conserved nucleotides. There are three equivalent conformations of the self-cleaving hammerhead depending on which helix bears the 5' and 3' end of the motif. We named them HH-I, HH-II, and HH-III (Figure 1). The descriptors composed as input for the search program are presented beside each motif and described in the legend of Figure 1 (see also Methods). They were designed to detect any sequence with all the minimal nucleotide requirements to have some catalytic activity and with the possibility to fold like the hammerhead. In this context, it is expected that sequences will be found that combine several nonoptimum features and be inactive for this reason, i.e., a non-GUC cleavage, a C in position 4, short helices, and long loops. It is also possible that they contain all the requirements for being catalytically active but the active conformation is inaccessible because the RNA molecule that bears them folds into an alternative secondary structure.
The search for hammerhead self-cleaving motifs through the GenBank database (Benson et al. 1999) was performed using the program RNAMOT (Gautheret et al. 1990; Laferrière et al. 1994). The sequences detected with our descriptors are referred to as occurrences. The ability of the descriptors to identify the hammerhead motifs already characterized is illustrated in Table 1. The program recognizes most of the known plant-derived hammerheads (Symons 1997; see also http://callisto.si.usherb.ca/~jpperra/organisms.html; Bussière et al. 1996; Lafontaine et al. 1999) and all those present in satellite DNA sequences. Note that there is no known natural example of a hammerhead of the HH-II type.
Table 2 presents the frequencies of occurrences of potential hammerhead motifs in the different sections of the GenBank as well as the expected frequencies calculated from the number of occurrences obtained in a database of random sequences. In general, the number of occurrences observed is half of the number expected if our motifs were randomly distributed among the sequences of the GenBank. HH-I and HH-II detect twice as many motifs as HH-III because we designed the descriptors so that Helix III had a 2-base pair requirement in the HH-I and HH-II descriptors versus 3 base pairs in the HH-III descriptor (see Methods). This increase was predicted by the number of occurrences obtained in the random database.
The Frequency of Mutated Versions of the Hammerhead Self-cleaving RNAs
We also composed descriptors for variants of the hammerhead ribozyme motif. Substitutions were made by replacing, one at a time, each of the essential nucleotides located in the single stranded regions of the ribozyme core by N (boldface in Fig. 1) or by changing the identity of each one of the 2 conserved base pairs of the hammerhead motif (also boldface in Fig. 1).
Table 2 presents the data on the distribution of the mutated variants of HH-I, HH-II, and HH-III from the single-stranded region. It is expected that every mutant will increase the frequency of occurrences by a factor of four because we changed the requirements in every position from only one to all four nucleotides, except in position 4, where C and U already were allowed, and in the cleavage site, where only G originally was excluded. Thus, in position 4 we expected to double the frequency, and in the cleavage site we expected a 25% increase. The results are mostly those anticipated based on these calculations. However, the mutants of position 12 doubled the expected increase in all the orientations. This effect was not uniformly observed in the different subdivisions of the GenBank. In fact, most of the extra occurrences are located in the files containing ESTs (Expressed Sequence Tags) and mammalian sequences. These preferences were not observed in the random database, in which the mutants showed the anticipated increase in their frequency in comparison with the original motif. The number of occurrences obtained in the virus section of the GenBank for the HH-III-8 variant was 722 instead of the 113 expected (HH-III has 3774 expected occurrences and viruses represent 3% of the GenBank). However, a quick analysis of the occurrences obtained with this descriptor revealed that most of them are the same motif repeated in 679 hepatitis C sequences.
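The virus-section figures quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes that expected occurrences scale with a subdivision's share of the database, and the function and variable names are hypothetical, not part of the original analysis.

```python
def expected_in_subdivision(total_expected: float, fraction: float) -> float:
    """Expected occurrences in one GenBank subdivision, assuming occurrences
    scale with the subdivision's share of the whole database."""
    return total_expected * fraction


# Figures quoted in the text: 3774 expected occurrences, viruses = 3% of GenBank.
viruses_expected = expected_in_subdivision(3774, 0.03)

observed_in_viruses = 722     # occurrences reported for the virus section
hepatitis_c_repeats = 679     # occurrences that are one motif repeated in hepatitis C entries

print(round(viruses_expected))                    # rounds to the 113 quoted in the text
print(observed_in_viruses - hepatitis_c_repeats)  # occurrences not explained by the repeat
```

Once the repeated hepatitis C motif is set aside, the remaining count falls below the expected value rather than above it.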
Table 3 presents the frequencies obtained with the mutant hammerhead ribozymes using a different identity for the conserved base pair of Helix II or III (positions 10.1:11.1 and 15.1:16.1, respectively). One striking observation is that all the mutants in Helix II (iiNN) have total occurrences two to six times higher than expected, whereas the mutants in Helix III (iiiNN) have half the expected frequency. Another interesting point is the high number of occurrences obtained with the three orientations of the hammerhead ribozyme having an A:U base pair in Helix II (10.1:11.1) instead of the usual G:C.
The mutants in position 12 and the mutants of the conserved base pair of Helix II have in common that they disrupt the CpG dinucleotide in the resulting sequence. It is well known that CpG is underrepresented in vertebrate sequences (Karlin and Mrazek 1997). The GenBank is biased toward those sequences, mainly owing to human and rodent entries. In those files, the mutants that disrupt the CpG requirement have a higher frequency. To confirm that the overall frequency of the hammerhead motifs containing CpG dinucleotides is half of the expected one because of the low CpG content of vertebrate sequences, we built a new random database in which the frequency of CpG was reduced by half in favor of either AG, CA, CC, CT, GG, or TG to simulate the frequencies observed by Karlin and Mrazek (1997; see Methods). In this database, we observed an overall halving of the original expected frequencies for all the motifs needing a CpG but not for the others (data not shown).
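CpG underrepresentation of the kind discussed here is commonly quantified as an observed/expected odds ratio, f(CG) / (f(C) × f(G)). The sketch below computes that ratio on a single strand. It is an illustrative assumption, not necessarily the exact statistic used in the cited work, and the function name is hypothetical.

```python
from collections import Counter


def cpg_odds_ratio(seq: str) -> float:
    """Observed/expected CpG ratio f(CG) / (f(C) * f(G)) on one strand.

    Values well below 1 indicate CpG underrepresentation. Returns NaN when
    the ratio is undefined (sequence too short, or no C or no G present).
    """
    seq = seq.upper().replace("U", "T")
    n = len(seq)
    if n < 2:
        return float("nan")
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    f_c = mono["C"] / n
    f_g = mono["G"] / n
    if f_c == 0 or f_g == 0:
        return float("nan")
    f_cg = di["CG"] / (n - 1)
    return f_cg / (f_c * f_g)


print(cpg_odds_ratio("CCGG"))
print(cpg_odds_ratio("CGCG"))
```

A ratio near 0.5 in a set of sequences would correspond to the halved CpG frequency that the second random database was built to simulate.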
Still, the mutants with an A:U base pair in position 10.1:11.1 of Helix II have a very high frequency in all three conformations of the motif: two to three times higher than expected even considering the CpG effect discussed above. So far, we have no explanation for this intriguing observation.
Finally, we made three more searches by changing the cleavage site from NUH to NHH based on the report of Kore et al. (1998) that such hammerheads were still active. For these new mutants, we obtained a number of occurrences corresponding to half of what we expected according to the search in a random database with equal A, C, G, and T frequencies. Moreover, as for the previous motifs, the number of occurrences in the GenBank is comparable to the frequency expected from the search in the CpG-reduced database. All the occurrences found in the GenBank are available on our web site at http://www.centrcn.umontreal.ca/~bourdeav/HH.
Some Intriguing Hammerhead Motifs that Might Have Functional Significance
This section presents a sample of motifs considered interesting either because of their location or because their structure is optimal for self cleavage. The hammerhead ribozyme occurs naturally in satellite RNAs, viroids, and transcripts from repetitive sequences. The probability of finding an active hammerhead should be higher among these genetic elements. Several potential hammerhead motifs were found in distinct families of repetitive DNA.
Hammerhead ribozymes were found in the satellite DNA from Dolichopoda schiavazzii (cricket) by using the HH-I descriptor (example in Figure 2A). Fourteen have a conserved HH-I motif and two have an HH-I-iiGU motif (G:U in position 10.1:11.1 instead of G:C). This ribozyme cleaves after CUA (A.A. Rojas, A. Vazques-Tello, G. Ferbeyre, F. Venanzetti, L. Bachmann, B. Paquin, and R. Cedergren, in prep.). Helix I has the GG:CC base pairs and the internal loop common to the hammerhead motifs in schistosomes (Ferbeyre et al. 1998) and newts (Pabon-Peña et al. 1991). It is noteworthy that among the 20 similar sequences submitted to GenBank, the four sequences not found through the search contained either mismatches in one of the helices or a combination of two point mutations.
A hammerhead-like motif was detected in the Kpn-13 family of human repetitive DNA by using the descriptor HH-I-4 (Fig. 2B). The motif is found in several ESTs containing Kpn-repetitive sequences (also known as L1-repetitive elements), indicating its expression at the RNA level. All the occurrences but one contain a disabling A at position 4; the exception (AA564135) possesses a C. The latter motif is inactivated by a G-to-A substitution at position 12. Variants of this motif also are found in genomic clones containing Kpn repetitive sequences. Intriguingly, the L1 motif interrupting the dystrophin gene of a muscular dystrophy patient (accession number HSU09115) also has a disruption in Helix I. Four additional hammerhead-like motifs were found: in the satellite DNA array from the rodent Microtus chrotorrhinus (accession number MICSATB, position 921-1079, not shown) and in the repetitive DNA from the protozoan parasite Theileria parva (accession number S37077, position 84-223, not shown), both with the descriptor HH-I-7, and in mouse repetitive DNA with descriptors for the HH-I-iiUA and HH-III-iiAU motifs (Fig. 2C,D). The first two motifs are predicted to be inactive because they contain A instead of G in position 12.
Viruses are good candidates for using catalytic RNA motifs. We have found several new intriguing hammerhead motifs in different viruses (Fig. 2E). Two similar hammerhead ribozyme motifs were found in the 5' untranslated region of two viruses of the Ilarvirus genus, family Bromoviridae, which are single-stranded positive-sense RNA viruses. One motif is in the spinach latent virus (accession number PMOVRNA3, position 252-331) and the other in the elm mottle virus (accession number SLU57048, position 250-329) (Fig. 2E). Both motifs were found using the HH-III descriptor. The region containing the hammerhead is highly conserved among these viruses. The hammerhead motif found with HH-II in an RNA associated with carrot red leaf luteovirus is also very interesting because satellite RNAs were the first molecules found to contain hammerhead ribozymes (Fig. 2F). This motif is predicted to cleave after AUA. Mammalian viruses also contain potential hammerhead ribozymes, and two of them found with HH-II are illustrated in Figure 2G,H, one in the hepatitis E virus and the other in the caprine encephalitis virus.
Two hammerhead motifs in human mRNAs also are presented in Figure 2I,J. Self-cleaving motifs in mRNA might regulate gene expression by promoting RNA decay. The genes coding for the interferon-induced DAP1 and for neuroleukin possess potentially active hammerhead motifs, found with HH-III, that are predicted to cleave after UUC and CUC, respectively. Perhaps even more remarkable are the conserved hammerhead motifs found in the genes coding for NADPH-cytochrome P450 oxidoreductase in both the rat and the hamster (Fig. 2K,L). Taken together, the motifs presented here suggest that the hammerhead ribozyme might have functions other than those previously suggested for satellite RNA and transcripts of repetitive sequences.
DISCUSSION
We have used the search engine RNAMOT to scrutinize the GenBank for potential self-cleaving hammerhead ribozyme motifs. Our search extends earlier efforts to find a subset of potential hammerheads in Escherichia coli sequences (Ruffner et al. 1990). Because this motif has relatively few structural constraints, we designed an extensive set of descriptors for both the wild-type motif and variants of its essential nucleotides. The results show a wide distribution of potential hammerhead-like motifs in all regions of the GenBank, with a higher frequency for the variants that do not require the presence of a CpG dinucleotide in the final sequence of the motifs. This CpG dinucleotide in positions 11.1 and 12 is not absolutely required for self-cleavage because other base pairs are acceptable in positions 10.1:11.1. We conclude that the reduction we observed in the frequency of most hammerhead motifs in this search is fortuitous.
We expect that most of the motifs found here are inactive because we designed descriptors that include mutations or nonoptimal features of the hammerhead self-cleaving motif (Ruffner et al. 1990). However, our results illustrate the possibility that natural sequences might end up forming self-cleaving motifs by random drift. In other words, it would be sufficient to mutate one or two residues to activate the potential hammerhead ribozymes described here. This is not only true for the hammerhead ribozyme motif because other RNA motifs can be found randomly in natural sequences (Fontana et al. 1993; Reidys et al. 1997; Bourdeau et al. 1999).
The use of variants of the hammerhead ribozyme was stimulated by previous work that showed that satellite DNA encoding hammerhead ribozymes is enriched with mutated variants of the motif (Zhang and Epstein 1996; Ferbeyre et al. 1998). The ribozyme motif found in the cricket satellite DNA follows this rule because 14 of the 20 sequences deposited until now in the GenBank contain an active motif. Other mutant hammerheads were found in different families of repetitive DNA by using descriptors for hammerhead-like motifs, raising the possibility that other members of these families, not yet sequenced, contain the active motifs. The occurrence of hammerhead ribozymes in transcripts of repetitive DNA from different species suggests a functional role for the self-cleavage reaction in the propagation and/or the metabolism of these transcripts. We previously have proposed that self-cleavage might limit the expansion of repetitive sequences through the genome by retrotransposition (Ferbeyre et al. 1998). This model predicts that recent insertions of these elements will contain disabling mutations in the hammerhead motif. The family of L1 repetitive elements, for example, contains mutated versions of the hammerhead, and members of this family still retrotranspose in humans, sometimes causing genetic diseases (Holmes et al. 1994). Another intriguing possibility is that viroids and satellite RNAs originated from transcripts of repetitive sequences when these transcripts parasitized a viral replication machinery. Subsequently, they might have jumped from one organism to another using the virus as a vector, and as a result their distribution would cross phylogenetic barriers.
Many ESTs and mRNAs were found here to possess hammerhead-like motifs. To test any role of the hammerhead motifs identified in this work, we need a combination of biochemical and genetic analysis. Our group has finished the characterization of hammerhead motifs in repetitive DNA of schistosomes (Ferbeyre et al. 1998) and the cricket (A.A. Rojas, A. Vazques-Tello, G. Ferbeyre, F. Venanzetti, L. Bachmann, B. Paquin, and R. Cedergren, in prep.). All the occurrences we found in the GenBank are available at our web site (URL: http://www.centrcn.umontreal.ca/~bourdeav/HH) for those interested in finding where "hammers" can cut.
METHODS
The pattern searching for RNA secondary structures was performed by RNAMOT (Gautheret et al. 1990; Laferrière et al. 1994). The inputs for this program are nucleotide sequences and a descriptor file defining the structural motif to be searched. RNAMOT reports all the occurrences of the motif as well as their positions along the sequence. Two of the three helices defining the hammerhead self-cleaving motif are closed by loops. The remaining helix connects the motif to the rest of the RNA molecule. As a result, there are three ways of defining a self-cleaving hammerhead ribozyme motif. We have built descriptors for these three different orientations of the motif, taking into account the following constraints (Fig. 1):
1. Three nucleotides in Helix I. Helix I has no specific nucleotide requirements, although the hammerhead motifs found in the newt and in schistosomes possess a conserved GG:CC base pairing three nucleotides downstream from the cleavage site as well as an internal loop farther downstream (Pabon-Peña et al. 1991).
2. The conserved sequence CYGANGA. This sequence is part of the catalytic core of the ribozyme and is entirely conserved with the exception of position 7. In the latter, although all nucleotides are accepted, the preferred ones are U, then G or A, and finally C. More recently, position 4 was reported to accept also U, so we have included this feature in our search (Ambros and Flores 1998).
3. Three nucleotides in Helix II. There is a strong preference for an R:Y base pair in positions 10.1:11.1, but the pair G:C confers the best activity and was the only one allowed in our original descriptors.
4. The conserved sequence GAA is absolutely required for catalysis. In the X-ray model of the hammerhead, nucleotides G12 and A13 form two reverse Hoogsteen G-A base pairs with nucleotides A9 and G8, respectively, whereas A14 forms a non-Watson-Crick base pair with N7 (Scott et al. 1995).
5. Helix III requires an A:U base pair, which is also of the non-Watson-Crick type, and a minimum of one more pair in two of the orientations (HH-I and HH-II). When the helix is open as in HH-III, two more pairs are required.
6. The cleavage site was defined as NUH (H is any nucleotide but G). However, natural ribozymes contain GUC, GUA, AUA, and AUC because they allow the highest reaction rates (Shimayama et al. 1995).
7. The loops closing the helices were allowed to have from 0 to 100 nucleotides.
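The sequence-level constraints in this list (CYGANGA, GAA, and the NUH cleavage site) are written with IUPAC nucleotide codes. As a rough illustration of how such codes translate into a search pattern, the sketch below converts them to regular expressions and reports every match position. It checks primary sequence only and does not test helices or loops, so it is not a substitute for an RNAMOT descriptor; all function names are hypothetical.

```python
import re

# IUPAC codes used in the constraints above, written for RNA (U, not T).
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]",     # purine
    "Y": "[CU]",     # pyrimidine
    "H": "[ACU]",    # any nucleotide but G
    "N": "[ACGU]",   # any nucleotide
}


def iupac_to_regex(pattern: str) -> str:
    """Translate an IUPAC pattern such as 'CYGANGA' into a regex string."""
    return "".join(IUPAC_RNA[code] for code in pattern.upper())


def find_iupac(pattern: str, seq: str) -> list[int]:
    """Return 0-based start positions of all (possibly overlapping) matches.

    DNA-style input is accepted: T is read as U before matching.
    """
    rna = seq.upper().replace("T", "U")
    # A lookahead makes finditer report overlapping matches as well.
    regex = re.compile("(?=" + iupac_to_regex(pattern) + ")")
    return [m.start() for m in regex.finditer(rna)]


print(find_iupac("CYGANGA", "AACUGAUGAGG"))  # conserved core element
print(find_iupac("NUH", "GUCGUG"))           # candidate cleavage sites
```

In a real search, these sequence elements would have to be found in the right order and flanked by regions able to form Helices I, II, and III, which is the part a structural descriptor adds.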
Sixty-three additional mutants also were included in the study. These were derived from the original motifs shown in Figure 1 by changing either one base in the conserved single-stranded regions to an N (any nucleotide; 30 mutants) or the identity of one of the constrained base pairs (positions 10.1:11.1 and 15.1:16.1; 30 mutants), or by changing the cleavage site from NUH to NHH (three more motifs; Kore et al. 1998).
The search was performed in the July 15, 1998 release of the GenBank sequence database (National Center for Biotechnology Information-GenBank flat file release 108.0). Searches were performed on both strands and all occurrences of motifs involving unidentified bases denoted by N in the database were disregarded. A Power Challenge XL with 32 CPUs IP 19, R4400, 150-MHz processor (3072 Mbytes) running UNIX IRIX 6.2 was used.
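Searching both strands and discarding occurrences that involve unidentified bases can be sketched as follows. This is a minimal illustration with hypothetical helper names; how RNAMOT handles these steps internally is not described in the text.

```python
# DNA complement table; N (unidentified base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def both_strands(seq: str):
    """Yield (strand label, sequence) for the given strand and its reverse complement."""
    yield "+", seq.upper()
    yield "-", reverse_complement(seq)


def has_unidentified_base(occurrence: str) -> bool:
    """True if the matched region contains an N and should be disregarded."""
    return "N" in occurrence.upper()


for strand, s in both_strands("ACGTNGA"):
    print(strand, s, has_unidentified_base(s))
```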
To help establish the significance of their presence, the frequencies of each motif in the database were compared with frequencies in a random sequence database generated by a uniform pseudo-random number generator (L'Écuyer and Andres 1997) with a period length near $2^{121}$. The random sequence databases contained 10,000 sequences of 100,000 nucleotides each; the four nucleotides A, C, G, and T were used with equal probabilities. An "expected" frequency N in GenBank was calculated from the number M of occurrences of each motif in the random databases as follows: $N = (a \times M)/(10^{4} \times 10^{5})$, where a is the number of nucleotides in GenBank ($1.797 \times 10^{9}$ in release 108.0).
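The scaling from random-database counts to an expected GenBank count is a simple proportion. The sketch below implements it with the denominator taken from the formula (10^4 × 10^5 nucleotides in the random database) and the GenBank size quoted for release 108.0; the function name and the example count are hypothetical.

```python
GENBANK_NT = 1.797e9        # nucleotides in GenBank release 108.0, as quoted
RANDOM_DB_NT = 1e4 * 1e5    # size of the random database, from the formula's denominator


def expected_in_genbank(m_random: float,
                        genbank_nt: float = GENBANK_NT,
                        random_db_nt: float = RANDOM_DB_NT) -> float:
    """Expected occurrences N in GenBank given M occurrences in the random database.

    Implements N = (a * M) / (random database size in nucleotides).
    """
    return genbank_nt * m_random / random_db_nt


# Illustrative value only: a motif seen 1000 times in the random database.
print(expected_in_genbank(1000))
```

Because the two databases are of similar size under these figures, the expected GenBank count is a little under twice the random-database count.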
The random database reduced in CpG dinucleotides was generated using the same procedure, but each time a CpG dinucleotide was created, a second generator (running in parallel) decided, with 50% probability, whether the dinucleotide would be changed. If a change had to take place, a third generator (also running in parallel) chose among six replacement dinucleotides: AG, CA, CC, CT, GG, or TG (choices made according to the dinucleotide frequencies reported by Karlin and Mrazek 1997). The expected frequency was evaluated as before.
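The three-generator procedure can be sketched as below. Several points are assumptions made for illustration: Python's built-in Mersenne Twister stands in for the generator cited above, the replacement dinucleotide is drawn uniformly rather than according to published dinucleotide frequencies, and a CpG newly formed between a replacement and the preceding base is left untouched. The function name and seeds are hypothetical.

```python
import random

# Dinucleotides that may replace a CpG, as listed in the text.
REPLACEMENTS = ["AG", "CA", "CC", "CT", "GG", "TG"]


def random_sequence(length: int, seed: int, reduce_cpg: bool = False) -> str:
    """Generate a random DNA sequence with equal base probabilities.

    With reduce_cpg=True, each CpG created during generation is replaced
    with probability 0.5 by a dinucleotide drawn from REPLACEMENTS.
    """
    base_rng = random.Random(seed)        # generator 1: draws the bases
    decide_rng = random.Random(seed + 1)  # generator 2: change this CpG or not
    choose_rng = random.Random(seed + 2)  # generator 3: picks the replacement
    seq: list[str] = []
    for _ in range(length):
        seq.append(base_rng.choice("ACGT"))
        if reduce_cpg and len(seq) >= 2 and seq[-2] == "C" and seq[-1] == "G":
            if decide_rng.random() < 0.5:
                # Assumption: uniform choice among the six replacements.
                seq[-2:] = choose_rng.choice(REPLACEMENTS)
    return "".join(seq)


uniform = random_sequence(20000, seed=1)
reduced = random_sequence(20000, seed=1, reduce_cpg=True)
print(uniform.count("CG"), reduced.count("CG"))
```

Keeping the decision and replacement generators separate from the base generator means the underlying stream of bases is the same with and without CpG reduction, which makes the two databases directly comparable.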
ACKNOWLEDGMENTS
We thank Bruno Paquin for valuable comments and NSERC of Canada, which financed this project. V.B. holds a doctoral fellowship from NSERC of Canada. The late R.C. was a Richard Ivey Scholar of the Canadian Institute for Advanced Research (CIAR) program in Evolutionary Biology. We acknowledge previous efforts from Dr. Daniel Gautheret to search for hammerhead sequences with RNAMOT in our laboratory. In addition, we thank Bernard Lorazo, Daniel Raymond, and André Fourrier of the DITER (Direction des infrastructures technologiques d'enseignement et de recherche) at the Université de Montréal for their assistance. P.M. wishes to acknowledge the hospitality of the Université de Montréal and the Institute of Physics, UNAM.
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
FOOTNOTES
4 These authors contributed equally to this work.
5 Deceased.
6 Corresponding author.
E-MAIL ferbeyre{at}cshl.org; FAX (516) 367 8454.
Received December 17, 1999; accepted in revised form May 3, 2000.