Genome Res. 15:1411-1420, 2005
©2005 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/05 $5.00
Letter
Identification of programmed translational -1 frameshifting sites in the genome of Saccharomyces cerevisiae
1 Institut de Génétique et Microbiologie, CNRS UMR 8621, Université Paris-Sud, 91405 Orsay Cedex, France; 2 Laboratoire Statistique et Génome, CNRS-INRA-Université d'Evry, 91000 Evry, France.
Frameshifting is a recoding event that allows the expression of two polypeptides from the same mRNA molecule. Most recoding events described so far are used by viruses and transposons to express their replicase protein. The very few cellular proteins known to be expressed by -1 ribosomal frameshifting have been identified by chance. The goal of the present work was to set up a systematic strategy, based on complementary bioinformatics, molecular biology, and functional approaches, without a priori knowledge of the mechanism involved. Two independent methods were devised. The first looks for genomic regions in which two ORFs, each carrying a protein pattern, are in a frameshifted arrangement. The second uses Hidden Markov Models and likelihood in a two-step approach. When this strategy was applied to the Saccharomyces cerevisiae genome, 189 candidate regions were found, of which 58 were further functionally investigated. Twenty-eight of them expressed a full-length mRNA covering the two ORFs, and 11 showed a -1 frameshift efficiency varying from 5% to 13% (50-fold higher than background); some of these correspond to genes with known functions. Four frameshifted ORFs are fully conserved in other ascomycetes. Strikingly, most of the candidates do not display a classical viral-like frameshift signal and would have escaped a search based on current models of frameshifting. These results strongly suggest that -1 frameshifting might be more widely distributed than previously thought.
Sequencing programs, along with various projects in the pharmaceutical, agricultural, aquacultural, and forestry industries, are creating an explosion of DNA sequence data. With this abundance of data, there is a growing need for more effective tools and methods to extract vital information from raw DNA sequences. Algorithms for identifying protein-coding regions and predicting complete genes are of particular importance. Since the early 1990s, a number of computer programs for eukaryotic gene identification have been developed: GENMARK (Borodovsky and McIninch 1993
Twenty years ago, Jacks and Varmus described the first programmed -1 ribosomal frameshifting event, from which they established the canonical model of the eukaryotic -1 frameshifting site (Jacks and Varmus 1985
Although most translational recoding events are found in viruses and transposons, a few cellular genes have been identified that use this mode of expression (Namy et al. 2004
The goal of the present work was to set up a comprehensive strategy, based on complementary bioinformatics and molecular approaches, and on functional in vivo analyses, to identify -1 ribosomal frameshifting sites in cellular genomes, without a priori knowledge of the mechanism involved. We devised two independent methods to look for frameshifting sites in silico. The first is based on the search for genomic regions in which two domains, each carrying a protein pattern, can be associated on the same polypeptide by a single -1 frameshifting event. The second is performed by a two-step selection with HMM. The first step identifies potential candidates likely to possess a constrained coding region after their stop codon. The second step ranks the candidates by likelihood ratio, based on available biological knowledge. These two approaches do not rely on any model of the frameshifting site and thus are well adapted for de novo detection of frameshift events. We validated these methods by analyzing the genome of S. cerevisiae. Indeed, the sequence information about S. cerevisiae is highly reliable because of multiple sequencings and careful annotation maintenance. Furthermore, the availability of several other ascomycetes genome sequences offers a unique opportunity to explore eukaryotic genome evolution by comparative analysis of several species (Dujon et al. 2004 A total of 189 frameshifted candidate regions (fsORFs) were found. We assessed the presence of a full-length mRNA and quantified -1 frameshift efficiency for a subset of the highest ranked candidates. Among the 58 characterized regions, 28 were analyzed for their ability to induce -1 frameshifting in vivo; 11 showed a frameshift efficiency 50-fold higher than the background. Several of these candidates correspond to genes with known functions, which will allow further analysis of the physiological role of the frameshifting event. 
Overall, these results strongly suggest that -1 frameshifting might be a more widely used strategy for controlling gene expression than previously thought.
General strategy
Figure 1 shows the pipeline of our -1 frameshift identification strategy. We first downloaded and parsed the nucleic acid sequences, the intron/exon data, and their positions on the chromosomes, and stored them in a local database for greater reliability. Our system then sought genomic configurations compatible with a -1 ribosomal frameshifting event, using the following criterion: two open reading frames, one in the 0 frame (ORF0) and the other in the -1 frame (ORF-1), that overlap along an intermediate shared region (Step 1). The second step filtered out undesirable low-complexity sequences that might overload the subsequent steps. The remaining sequences were classified according to whether the 0 and/or -1 frames were already annotated as an ORF, in order to perform the subsequent HMM step. We defined four classes: "left" (ORF0 is annotated), "right" (ORF-1 is annotated), "both" (both ORFs are annotated), and "none" for all the others (Step 2). This classification is necessary because the model with a frameshift is compared either to a coding model (if an annotation already exists) or to a noncoding model. Two analyses were then carried out. Regions containing known protein motifs in both ORF0 and ORF-1 were retained (Step 3). In parallel, HMM filtering and estimation were performed to predict coding regions that may continue in the -1 frame after the stop codon of ORF0 (Step 3'). This was followed by a ranking step in which we compared, through a likelihood ratio, each selected candidate structure under the two following assumptions: "the sequence possesses a frameshift" and "the sequence does not possess any frameshift," taking into account the class of the candidate defined in Step 2 (Step 3'). We then tested the candidate regions for expression in vivo by looking for the presence of a full-length polyadenylated mRNA, using oligo(dT)-primed RT-PCR (Step 4). Finally, for the remaining candidates, -1 frameshifting efficiencies were determined in vivo, using a dual reporter system (Step 5).
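The Step 1 criterion can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, only the forward strand is scanned, and introns, start codons, and length thresholds are ignored. It assumes that an ORF is a run of non-stop codons, that ORF0 must end at a real stop codon, and that ORF-1 (read one nucleotide upstream of the ORF0 frame) must begin inside ORF0 and extend past that stop.

```python
STOPS = {"TAA", "TAG", "TGA"}

def stop_free_segments(seq, frame):
    """Half-open (start, end) spans of consecutive non-stop codons in one forward frame."""
    segments, start, end = [], frame, frame
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            if i > start:
                segments.append((start, i))
            start = i + 3
        end = i + 3
    if end > start:
        segments.append((start, end))
    return segments

def frameshift_configurations(seq):
    """List (frame, ORF0 span, ORF-1 span) triples compatible with a -1 shift."""
    seq = seq.upper()
    found = []
    for frame in range(3):
        # A -1 shift moves the reading window one nucleotide upstream.
        minus1 = (frame + 2) % 3
        for s0, e0 in stop_free_segments(seq, frame):
            if seq[e0:e0 + 3] not in STOPS:
                continue  # ORF0 must end at a stop codon, not at the sequence end
            for s1, e1 in stop_free_segments(seq, minus1):
                # ORF-1 starts within ORF0 and continues beyond the ORF0 stop.
                if s0 <= s1 < e0 < e1:
                    found.append((frame, (s0, e0), (s1, e1)))
    return found

# Toy sequence (hypothetical): ATG GCC AAA TAA in frame 0, open -1 frame beyond the stop.
print(frameshift_configurations("ATGGCCAAATAAGGGCCCGGG"))
```

A genome-scale scan would add the reverse strand, intron handling, and minimum ORF lengths, none of which are specified in this section.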
Creating a data set of potential -1 frameshift regions
Assessing functional frameshifting by InterProScan
All of the hit sequences were then subjected to a protein motif search. Each candidate sequence was kept only if it exhibited, in both frames, a pattern featured by the InterPro database and detected by InterProScan (http://www.ebi.ac.uk/interpro/). This database includes BlastProDom, FPrintScan, HMMPIR, HMMPfam, HMMSmart, HMMTigr, ProfileScan, ScanRegExp, and SuperFamily. The default parameter settings were used for the search. Since this step was the most time-consuming of the whole analysis, it was first performed on the smaller of the two ORFs in each putative frameshifted candidate. This approach was validated on the L-A virus genome, from which the only actual frameshifting region was retrieved. Moreover, 84 candidates were found in the S. cerevisiae genome and only 11 in the artificial (random) S. cerevisiae genome. Among these 84 S. cerevisiae genomic regions, three categories could be defined. In the first category, 69 regions exhibited domains that contain stretches of repeated amino acids in each of the two frames. These are not low-complexity sequences, which were already discarded at Step 2, but areas with a high density of a given amino acid rather than a linear repetition of the same amino acid. Notably, no such candidates were found in the random genome. The second category is composed of regions in which the two ORFs bear similar protein patterns, or two distinct but functionally compatible motifs (e.g., a sugar transporter and a sugar binding site). We found six such regions in the S. cerevisiae genome and none in the random genome. The third category includes eight regions that bear functional regions in one ORF and amino acid repetitions in the other ORF. All 11 candidate sequences from the random genome belong to this category.
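The "both frames must carry a pattern, shorter ORF first" rule can be sketched as follows. This is only an illustration: `keep_candidate` and `toy_has_motif` are hypothetical names, and the toy predicate stands in for an actual InterProScan search, whose invocation is not described here.

```python
def keep_candidate(orf0_protein, orf_minus1_protein, has_motif):
    """Keep a candidate only if both translated ORFs carry a motif.

    The shorter ORF is searched first, so the expensive search on the
    longer ORF is skipped whenever the shorter one has no hit.
    """
    shorter, longer = sorted((orf0_protein, orf_minus1_protein), key=len)
    return has_motif(shorter) and has_motif(longer)

calls = []  # records which sequences were actually searched

def toy_has_motif(protein):
    """Hypothetical stand-in for a motif search: a hit is any 'WW' dipeptide."""
    calls.append(protein)
    return "WW" in protein

print(keep_candidate("MKWWLAAAA", "MKA", toy_has_motif), calls)
```

Because `and` short-circuits, the longer sequence is never searched when the shorter one fails, which mirrors the time-saving order described above.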
Obtaining structure candidates by HMM
For each step to be performed in an HMM framework, one has to completely specify a model, i.e., a probability law on the structure of the hidden states and a law for the emission of observed letters within each state. Note that the aim here is not simply to detect genes, but rather to select candidates for which the extension after the stop, in the -1 frame, is similar to that of coding regions. As far as we know, existing software designed for gene detection does not offer such flexibility: at present, these programs are designed to detect nonoverlapping genes and are unlikely to detect a coding sequence with a frameshifting site. The beginning of such a gene may be missed if the distance between the start codon and the frameshift is too short. Even when it is found, the program will probably decide on a false end, based on the presence of a stop codon. In addition, the part after the frameshift will hardly be detected because of the lack of a start codon. In the following paragraphs, we detail the construction of the HMM and the strategy used for detection and ranking.
First, one needs to describe a model fitting with gene structure constraints. The simplest structure is summarized in Figure 3, and corresponds to the one used by common gene detectors (Burge and Karlin 1997
Then, to adapt our model for the detection of frameshifted genes, we allowed coding regions to appear in the -1 frame after the stop. For this purpose, we inserted a transition from the state corresponding to the last base of the stop to the -1 coding frame of each coding type. We kept only those sequences for which the sum of the corresponding transition probabilities was >0.95, which corresponds to the clear-cut threshold shown in Figure 4.
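The posterior probability of taking a given transition, on which this filter relies, is obtained with the forward-backward algorithm. The sketch below shows that computation on a generic discrete HMM; it is a toy illustration, not the gene-structure model of Figure 3. The function name, the two-state parameters, and the absence of numerical scaling (which a long genomic sequence would require) are all assumptions made for brevity.

```python
def transition_posteriors(pi, A, B, obs, i, j):
    """P(state_t = i, state_{t+1} = j | obs) for t = 0 .. T-2.

    pi: initial state probabilities; A[r][s]: transition probabilities;
    B[s][o]: emission probabilities; obs: list of observed symbol indices.
    """
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[1.0] * n for _ in range(T)]
    # Forward pass: alpha[t][s] = P(obs[0..t], state_t = s)
    for s in range(n):
        alpha[0][s] = pi[s] * B[s][obs[0]]
    for t in range(1, T):
        for s in range(n):
            alpha[t][s] = sum(alpha[t - 1][r] * A[r][s] for r in range(n)) * B[s][obs[t]]
    # Backward pass: beta[t][s] = P(obs[t+1..T-1] | state_t = s)
    for t in range(T - 2, -1, -1):
        for s in range(n):
            beta[t][s] = sum(A[s][r] * B[r][obs[t + 1]] * beta[t + 1][r] for r in range(n))
    likelihood = sum(alpha[T - 1])
    return [
        alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / likelihood
        for t in range(T - 1)
    ]

# Hypothetical two-state model and three-symbol observation sequence.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
print(transition_posteriors(pi, A, B, [0, 1, 0], 0, 1))
```

In the setting described above, the quantity of interest would be the summed posterior of the transitions from the last base of the stop to the -1 coding states, compared against the 0.95 threshold.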
As a positive control, we tested this step of our approach on the L-A virus. This virus is selected with a probability
Using this criterion, a final set of 110 candidates was retrieved. To incorporate, for each selected candidate, the known coding status of the two possible coding frames, we treated separately the sequences in the four classes defined above: left, right, both, and none. In each class, we then ranked the sequences according to the likelihood ratio, which is a measure of the confidence we may assign to the claim "X contains a frameshift" in comparison with "X does not contain a frameshift":

$$\mathrm{LR}(X) = \frac{P(X \mid \theta_{fs})}{P(X \mid \theta_{nofs})}$$

where $\theta_{fs}$ and $\theta_{nofs}$ stand, respectively, for the parameters of the model under the two following assumptions: "a frameshift exists" and "no frameshift exists," conditionally on the status of the ORF. More details about the models used for each subset can be found in the Methods section. Candidates with their rank are summarized in Table 1. From these scores, we selected 23 candidates to be tested (seven from the none class, seven from the both class, five from the left class, and four from the right class). Figure 5 shows a representation of a "good" (fsORF 25) and a "bad" (fsORF 36) candidate.
Common candidates
Finally, we compared the results obtained using the protein motif search and the HMM search. Five common candidates were identified by comparing the 84 regions obtained in the first approach with the 110 regions obtained in the second approach. As the two methods are independent, these five common candidates, together with 25 candidates from the protein motif approach and the 18 best-ranked candidates from the HMM approach, were selected for further biological investigation. We also selected the 10 worst candidates to serve as a control of the relevance of the ranking procedure (Table 1).
Genomic sequence of the candidates
Expression of candidate sequences
These results demonstrate that the same molecule of mRNA covers both ORFs and that these mRNAs are polyadenylated. The region of overlap of the cDNAs corresponding to all the bicistronic mRNAs was analyzed by gel electrophoresis and subsequently sequenced (data not shown). For three candidate regions, the presence of an unexpected intron was demonstrated (Table 1). Close examination of the sequence revealed that the regions harbor a degenerate intron boundary pattern. For the remaining candidates there was no evidence of length or sequence polymorphism, suggesting that no splicing or editing event had taken place.
Quantification of -1 frameshift efficiency
Ascomycetes conservation
In order to determine if the organization of the 11 fragments directing frameshifting in vivo is preserved in other yeasts, we carried out alignments of the sequences against the genomic sequences of other ascomycetes. We found four structures in which only ORF0 is conserved (- e-value 4.3 × 10^-23), one in which ORF0 is present only in the Candida glabrata genome (fsORF 12, e-value = 5.1 × 10^-13), and two in which no homolog could be found (Table 3). Interestingly, four structures (fsORF 33, 44, 51, and 52) are completely preserved (ORF0, ORF-1, and frameshifted organization). Surprisingly, fsORF 33 and 35 were reported to have a polymorphism (frameshift mutation) in S. cerevisiae and to present only one open reading frame (Brachat et al. 2003
Here, we describe a comprehensive analysis of the S. cerevisiae genome that attempted to identify cellular recoding events occurring during translational -1 frameshifting. We developed a genomic approach, seeking genes with an extended coding potential, without prior constraint from existing ideas on the -1 frameshift mechanism.
In a first step, 22,445 genomic structures were extracted from the genome of S. cerevisiae. This value relies on two strong assumptions. First, we chose to collect only events that extend the polypeptide and not those that end it prematurely, although biologically pertinent frameshifting events, such as that in Escherichia coli DnaX, can lead to the synthesis of a shortened product (Tsuchihashi and Kornberg 1990
Our approach identified 189 candidates in the S. cerevisiae genome. None of them had previously been found using a similar approach developed by Harrison et al. (2002 Among the 189 candidate regions, 58 were analyzed further. Fifty of them showed the expected sequence, of which 31 directed transcription of an mRNA spanning the two overlapping ORFs. These 28 regions were cloned in a dual reporter vector, and 11 directed a -1 frameshifting efficiency 50-fold higher than background. To detect a possible mRNA editing mechanism, we sequenced the RT-PCR products for each of them. No RNA post-transcriptional modification was identified (Table 2). Moreover, from the amplification of the mRNA using a poly(dT) primer in the reverse transcription step, we concluded that these mRNAs are polyadenylated and not rapidly degraded.
No candidate conformed to the canonical model of the -1 frameshifting sites of Jacks et al. (1988
Among the candidates, three carry compatible protein patterns in the two ORFs, which suggests that they might actually be biologically significant. More precisely, Sco2 contains "electron transport" and "bipartite nuclear localization signal" motifs in ORF0 and ORF-1, respectively. It is similar to Sco1p and may have a redundant function with Sco1p in delivery of copper to cytochrome c oxidase; it interacts with Cox2p (Lode et al. 2002
In conclusion, the combination of two simple approaches has allowed us to identify several candidate genes potentially controlled by a -1 frameshift mechanism. Up to now frameshifting in chromosomal genes has been considered as a rare event, except in the case of +1 frameshifting found in a high proportion (>5%) in the ciliates such as Euplotes (Klobutcher and Farabaugh 2002
Data sources The system uses entire chromosome sequences from the GenBank/RefSeq database (Maglott et al. 2000
Random sequences
Implementation
In terms of family coverage, the protein signature databases are similar in size but differ in content. While all the methods share a common interest in protein sequence classification, some focus on divergent domains (e.g., Pfam), some focus on functional sites (e.g., PROSITE), and others focus on families, specializing in hierarchical definitions from superfamily down to subfamily levels in order to pinpoint specific functions (e.g., PRINTS). TIGRFAMs focuses on building HMMs for functionally equivalent proteins, and PIR SuperFamilies produces HMMs over the full length of a protein and applies protein length restrictions to gather family members. SUPERFAMILY is based on structure, using the SCOP superfamilies as a basis for building HMMs. ProDom uses PSI-BLAST to find homologous domains that are clustered in the same ProDom entry. The clustered resources are derived automatically from the UniProt databases.
Low-complexity filtering
HMM specification and estimation
For the filter step (3'), the added links starting from the stop add three degrees of freedom to the model (the probabilities of shifting to the three possible coding states). In addition, three other parameters were added that correspond to the length laws of the three coding states from STOP2 to STOP3. We chose to estimate these three new length parameters only on the left, right, and both subsets. Such a conservative approach was necessary, since a substantial proportion of the 22,445 sequences considered could influence the length estimation through an atypical composition in their intergenic regions. More precisely, some intergenic regions appear to be better fitted by a mixture of two or three coding regions than by the intergenic law (Fig. 3B). Probabilities of transition from the stop to the shifted coding regions were then deduced with a classical forward-backward algorithm on the 22,445 candidate structures to achieve step 3'. For the ranking step, the likelihood of the filtered sequences was calculated under the two assumptions: "the sequence contains a frameshift" and "the sequence contains no frameshift." Whereas the first assumption corresponds to the same model for all of the candidates, different models were designed for each of the classes left, right, none, and both for the second assumption. These correspond to the following facts:
The sequences were then ranked within each class on the log odds ratio of the two assumptions concerned, rescaled by their length.
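A minimal sketch of this within-class ranking is given below. The field names, candidate names, and log-likelihood values are hypothetical placeholders, and "rescaled by their length" is assumed here to mean division of the log odds ratio by the sequence length.

```python
from collections import defaultdict

def rank_by_class(candidates):
    """Rank candidates within each class by length-rescaled log odds ratio.

    Each candidate is a dict with keys: name, cls, length,
    loglik_fs (log-likelihood under "frameshift") and
    loglik_nofs (log-likelihood under "no frameshift").
    """
    scored = defaultdict(list)
    for c in candidates:
        score = (c["loglik_fs"] - c["loglik_nofs"]) / c["length"]
        scored[c["cls"]].append((score, c["name"]))
    # Highest score first: strongest support for the frameshift model.
    return {cls: [name for _, name in sorted(items, reverse=True)]
            for cls, items in scored.items()}

# Hypothetical values, for illustration only.
toy = [
    {"name": "cand_B", "cls": "left", "length": 600, "loglik_fs": -800.0, "loglik_nofs": -830.0},
    {"name": "cand_A", "cls": "left", "length": 300, "loglik_fs": -400.0, "loglik_nofs": -430.0},
    {"name": "cand_C", "cls": "none", "length": 150, "loglik_fs": -210.0, "loglik_nofs": -200.0},
]
print(rank_by_class(toy))
```

Rescaling by length keeps long candidates from dominating the ranking simply because their log-likelihood differences accumulate over more positions.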
Ascomycetes comparison
Yeast strains and media
Plasmids
Enzymatic activities and -1 frameshift efficiency
Molecular biology procedures and RT-PCR
Total RNA was extracted from 5 mL of exponential yeast culture (Schmitt et al. 1990
We are very grateful to Florent Bourassé, Alain Denise, Jean-Paul Forest, Christine Froidevaux, Michel Termier, and members of the G.M.T. laboratory for stimulating discussions. We are especially grateful to Anne-Lise Haenni for critically reading the manuscript and to Michael DuBow for proofreading the work prior to resubmission.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.4258005.
3 Corresponding author.
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215: 403-410.
Baranov, P.V., Gesteland, R.F., and Atkins, J.F. 2002a. Recoding: Translational bifurcations in gene expression. Gene 286: 187-201.
Baranov, P.V., Gesteland, R.F., and Atkins, J.F. 2002b. Release factor 2 frameshifting sites in different bacteria. EMBO Rep. 3: 373-377.
Baranov, P.V., Gurvich, O.L., Hammer, A.W., Gesteland, R.F., and Atkins, J.F. 2003. Recode 2003. Nucleic Acids Res. 31: 87-89.
Bekaert, M. and Rousset, J.P. 2005. An extended signal involved in eukaryotic -1 frameshifting operates through modification of the E site tRNA. Mol. Cell 17: 61-68.
Bekaert, M., Bidou, L., Denise, A., Duchateau-Nguyen, G., Forest, J.P., Froidevaux, C., Hatin, I., Rousset, J.P., and Termier, M. 2003. Towards a computational model for -1 eukaryotic frameshifting sites. Bioinformatics 19: 327-335.
Bertrand, C., Prere, M.F., Gesteland, R.F., Atkins, J.F., and Fayet, O. 2002. Influence of the stacking potential of the base 3' of tandem shift codons on -1 ribosomal frameshifting used for gene expression. RNA 8: 16-28.
Birney, E., Thompson, J.D., and Gibson, T.J. 1996. PairWise and SearchWise: Finding the optimal alignment in a simultaneous comparison of a protein profile against all DNA translation frames. Nucleic Acids Res. 24: 2730-2739.
Borodovsky, M. and McIninch, J. 1993. Recognition of genes in DNA sequence with ambiguities. Biosystems 30: 161-171.
Brachat, S., Dietrich, F.S., Voegeli, S., Zhang, Z., Stuart, L., Lerch, A., Gates, K., Gaffney, T., and Philippsen, P. 2003. Reinvestigation of the Saccharomyces cerevisiae genome annotation by comparison to the genome of a related fungus: Ashbya gossypii. Genome Biol. 4: R45.
Brierley, I., Digard, P., and Inglis, S.C. 1989. Characterization of an efficient coronavirus ribosomal frameshifting signal: Requirement for an RNA pseudoknot. Cell 57: 537-547.
Burge, C. and Karlin, S. 1997. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268: 78-94.
Cliften, P., Sudarsanam, P., Desikan, A., Fulton, L., Fulton, B., Majors, J., Waterston, R., Cohen, B.A., and Johnston, M. 2003. Finding functional features in Saccharomyces genomes by phylogenetic footprinting. Science 301: 71-76.
Delneri, D., Gardner, D.C., and Oliver, S.G. 1999. Analysis of the seven-member AAD gene set demonstrates that genetic redundancy in yeast may be more apparent than real. Genetics 153: 1591-1600.
Denise, A., Ponty, Y., and Termier, M. 2003. Random generation of structured genomic sequences. In Recomb'03, Berlin.
Dujon, B., Sherman, D., Fischer, G., Durrens, P., Casaregola, S., Lafontaine, I., De Montigny, J., Marck, C., Neuveglise, C., Talla, E., et al. 2004. Genome evolution in yeasts. Nature 430: 35-44.
Gelfand, M.S., Mironov, A.A., and Pevzner, P.A. 1996. Gene recognition via spliced sequence alignment. Proc. Natl. Acad. Sci. 93: 9061-9066.
Gesteland, R.F., Weiss, R.B., and Atkins, J.F. 1992. Recoding: Reprogrammed genetic decoding. Science 257: 1640-1641.
Gurvich, O.L., Baranov, P.V., Zhou, J., Hammer, A.W., Gesteland, R.F., and Atkins, J.F. 2003. Sequences that direct significant levels of frameshifting are frequent in coding regions of Escherichia coli. EMBO J. 22: 5941-5950.
Hamada, H., Petrino, M.G., Kakunaga, T., Seidman, M., and Stollar, B.D. 1984. Characterization of genomic poly(dT-dG).poly(dC-dA) sequences: Structure, organization, and conformation. Mol. Cell. Biol. 4: 2610-2621.
Hammell, A.B., Taylor, R.C., Peltz, S.W., and Dinman, J.D. 1999. Identification of putative programmed -1 ribosomal frameshift signals in large DNA databases. Genome Res. 9: 417-427.
Harrison, P., Kumar, A., Lan, N., Echols, N., Snyder, M., and Gerstein, M. 2002. A small reservoir of disabled ORFs in the yeast genome and its implications for the dynamics of proteome evolution. J. Mol. Biol. 316: 409-419.
Ito, H., Fukuda, Y., Murata, K., and Kimura, A. 1983. Transformation of intact yeast cells treated with alkali cations. J. Bacteriol. 153: 163-168.
Ito, T., Chiba, T., Ozawa, R., Yoshida, M., Hattori, M., and Sakaki, Y. 2001. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc. Natl. Acad. Sci. 98: 4569-4574.
Jacks, T. and Varmus, H.E. 1985. Expression of the Rous sarcoma virus pol gene by ribosomal frameshifting. Science 230: 1237-1242.
Jacks, T., Madhani, H.D., Masiarz, F.R., and Varmus, H.E. 1988. Signals for ribosomal frameshifting in the Rous sarcoma virus gag-pol region. Cell 55: 447-458.
Kellis, M., Patterson, N., Endrizzi, M., Birren, B., and Lander, E.S. 2003. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423: 241-254.
Klobutcher, L.A. and Farabaugh, P.J. 2002. Shifty ciliates: Frequent programmed translational frameshifting in euplotids. Cell 111: 763-766.
Liphardt, J. 1999. The mechanism of -1 ribosomal frameshifting: Experimental and theoretical analysis. Churchill College, Cambridge, UK.
Lode, A., Paret, C., and Rodel, G. 2002. Molecular characterization of Saccharomyces cerevisiae Sco2p reveals a high degree of redundancy with Sco1p. Yeast 19: 909-922.
Maglott, D.R., Katz, K.S., Sicotte, H., and Pruitt, K.D. 2000. NCBI's LocusLink and RefSeq. Nucleic Acids Res. 28: 126-128.
Manktelow, E., Shigemoto, K., and Brierley, I. 2005. Characterization of the frameshift signal of Edr, a mammalian example of programmed -1 ribosomal frameshifting. Nucleic Acids Res. 33: 1553-1563.
Mironov, A.A., Roytberg, M.A., Pevzner, P.A., and Gelfand, M.S. 1998. Performance-guarantee gene predictions via spliced alignment. Genomics 51: 332-339.
Mulder, N.J., Apweiler, R., Attwood, T.K., Bairoch, A., Barrell, D., Bateman, A., Binns, D., Biswas, M., Bradley, P., Bork, P., et al. 2003. The InterPro Database, 2003 brings increased coverage and new features. Nucleic Acids Res. 31: 315-318.
Namy, O., Hatin, I., Stahl, G., Liu, H., Barnay, S., Bidou, L., and Rousset, J.P. 2002. Gene overexpression as a tool for identifying new trans-acting factors involved in translation termination in Saccharomyces cerevisiae. Genetics 161: 585-594.
Namy, O., Duchateau-Nguyen, G., Hatin, I., Hermann-Le Denmat, S., Termier, M., and Rousset, J.P. 2003. Identification of stop codon readthrough genes in Saccharomyces cerevisiae. Nucleic Acids Res. 31: 2289-2296.
Namy, O., Rousset, J.P., Napthine, S., and Brierley, I. 2004. Reprogrammed genetic decoding in cellular gene expression. Mol. Cell 13: 157-168.
Nicolas, P., Bize, L., Muri, F., Hoebeke, M., Rodolphe, F., Ehrlich, S.D., Prum, B., and Bessieres, P. 2002. Mining Bacillus subtilis chromosome heterogeneities using hidden Markov models. Nucleic Acids Res. 30: 1418-1426.
Nielsen, H., Brunak, S., and von Heijne, G. 1999. Machine learning approaches for the prediction of signal peptides and other protein sorting signals. Protein Eng. 12: 3-9.
Pearson, W.R. 1990. Rapid and sensitive sequence comparison with FASTP and FASTA. Meth. Enzymol. 183: 63-98.
Rabiner, L.R. 1989. A tutorial on Hidden Markov Models and selected applications in speech recognition. Proc. IEEE 77: 257-285.
Sato, M., Umeki, H., Saito, R., Kanai, A., and Tomita, M. 2003. Computational analysis of stop codon readthrough in D. melanogaster. Bioinformatics 19: 1371-1380.
Schmitt, M.E., Brown, T.A., and Trumpower, B.L. 1990. A rapid and simple method for preparation of RNA from Saccharomyces cerevisiae. Nucleic Acids Res. 18: 3091-3092.
Shigemoto, K., Brennan, J., Walls, E., Watson, C.J., Stott, D., Rigby, P.W., and Reith, A.D. 2001. Identification and characterisation of a developmentally regulated mammalian gene that utilises -1 programmed ribosomal frameshifting. Nucleic Acids Res. 29: 4079-4088.
Snyder, E.E. and Stormo, G.D. 1995. Identification of protein coding regions in genomic DNA. J. Mol. Biol. 248: 1-18.
Solovyev, V. and Salamov, A. 1997. The Gene-Finder computer tools for analysis of human and model organisms genome sequences. Proc. Int. Conf. Intell. Syst. Mol. Biol. 5: 294-302.
Solovyev, V.V., Salamov, A.A., and Lawrence, C.B. 1994. Predicting internal exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames. Nucleic Acids Res. 22: 5156-5163.
Sonnhammer, E.L., von Heijne, G., and Krogh, A. 1998. A hidden Markov model for predicting transmembrane helices in protein sequences. Proc. Int. Conf. Intell. Syst. Mol. Biol. 6: 175-182.
Stahl, G., Bidou, L., Rousset, J.P., and Cassan, M. 1995. Versatile vectors to study recoding: Conservation of rules between yeast and mammalian cells. Nucleic Acids Res. 23: 1557-1560.
Stajich, J.E., Block, D., Boulez, K., Brenner, S.E., Chervitz, S.A., Dagdigian, C., Fuellen, G., Gilbert, J.G.R., Korf, I., Lapp, H., et al. 2002. The Bioperl toolkit: Perl modules for the life sciences. Genome Res. 12: 1611-1618.
Tsuchihashi, Z. and Kornberg, A. 1990. Translational frameshifting generates the
Zdobnov, E.M. and Apweiler, R. 2001. InterProScan - an integration platform for the signature-recognition methods in InterPro. Bioinformatics 17: 847-848.
http://bioperl.org/; The Bioperl Project.
http://cbi.labri.fr/Genolevures/; Génolevures.
http://www.lri.fr/~denise/GenRGenS/; GenRGenS home page.
http://www.ebi.ac.uk/interpro/; InterPro database.
http://mips.gsf.de/genre/proj/yeast/; Munich Information Center for Protein Sequences (MIPS).
http://www.ncbi.nlm.nih.gov/; National Center for Biotechnology Information (NCBI).
http://www.yeastgenome.org/; Saccharomyces Genome Database (SGD).
http://www-mig.jouy.inra.fr/ssb/SHOW/; Structured HOmogeneities Watcher (SHOW).
Received June 10, 2005; accepted in revised form July 18, 2005.