Published online before print
January 8, 2007, 10.1101/gr.5509507 Genome Res. 17:175-183, 2007 ©2007 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/07 $5.00 OPEN ACCESS ARTICLE
Letter
Curated genome annotation of Oryza sativa ssp. japonica and comparative genome analysis with Arabidopsis thaliana
1 Division of Genome and Biodiversity Research, National Institute of Agrobiological Sciences, Tsukuba, Ibaraki 305-8602, Japan; 2 Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Koto-ku, Tokyo 135-0064, Japan; 3 Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Research Organization of Information and Systems, Mishima, Shizuoka 411-8540, Japan; 4 Japan Biological Information Research Center, Japan Biological Informatics Consortium, Koto-ku, Tokyo 135-0064, Japan; 5 EMBL Outstation-European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SD, United Kingdom; 6 Biometrics and Bioinformatics Unit, International Rice Research Institute, DAPO Box 7777, Metro Manila, Philippines; 7 Department of Biology, McGill University, Montreal, Quebec H3A 1B1, Canada; 8 Biology Department, Brookhaven National Laboratory, Upton, New York 11973, USA; 9 Department of Genetics, The University of Georgia, Athens, Georgia, 30602-7223, USA; 10 Waksman Institute of Microbiology, Rutgers University, Piscataway, New Jersey 08854, USA; 11 Institute for Bioinformatics, GSF National Research Center for Environment and Health, D-85764 Neuherberg, Germany; 12 Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 500 Caobao Road, Shanghai 200233, China; 13 Institute of the Society for Techno-innovation of Agriculture, Forestry and Fisheries, Tsukuba, Ibaraki 305-0854, Japan; 14 Institute of Botany, Academia Sinica, Nankang, Taipei 11529, Taiwan; 15 Tsukuba Division, Mitsubishi Space Software Co., Ltd., Tsukuba, Ibaraki 305-0032, Japan; 16 Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Hokkaido 060-0814, Japan; 17 Department of Plant Breeding, Cornell University, Ithaca, New York 14853, USA; 18 Department of Biological Sciences, Tokyo Metropolitan University, Hachioji-shi, Tokyo 192-0397, Japan; 19 Department of Plant Molecular Biology, University of Delhi South Campus, New Delhi 110021, India; 20 National Institute of Crop Science, National Agriculture and Food Research Organization, Tsukuba, Ibaraki 305-8518, Japan; 21 SWISS-PROT Group, Swiss Institute of Bioinformatics, CH-1211 Geneva 4, Switzerland; 22 Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11723, USA; 23 Division of Biology, California Institute of Technology, Pasadena, California 91125, USA; 24 Institute of Molecular Evolutionary Genetics and Department of Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA; 25 RIKEN BioResource Center, RIKEN Tsukuba Institute, Tsukuba, Ibaraki 305-0074, Japan; 26 Department of Molecular Genetics and Microbiology, and Center for Infectious Diseases, The State University of New York at Stony Brook, Stony Brook, New York 11794, USA; 27 Genoscope, 91057 Evry Cedex, France; 28 Metabolomics Research Group, RIKEN Plant Science Center, Yokohama, Kanagawa 230-0045, Japan; 29 Technische Universität München, Genome Oriented Bioinformatics, D-85354 Freising-Weihenstephan, Germany; 30 Plant Computational Biology, Max-Planck-Institute for Plant Breeding Research, D 50829 Cologne, Germany; 31 Plant Functional Genomics Research Group, RIKEN Plant Science Center, Yokohama, Kanagawa 230-0045, Japan; 32 RIKEN Plant Science Center, Yokohama, Kanagawa 230-0045, Japan; 33 National Research Centre on Plant Biotechnology, Indian Agricultural Research Institute, New Delhi 110012, India; 34 National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland 20894, USA; 35 Rice Gene Discovery Unit, Kasetsart University, Nakorn Pathom 73140, Thailand; 36 The Institute for Genomic Research, Rockville, Maryland 20850, USA; 37 Arizona Genomics Institute, The University of Arizona, Tucson, Arizona 85721, USA; 38 National Institute of Agrobiological Sciences, Tsukuba, Ibaraki 305-8602, Japan; 39 Bio-Oriented Technology Research Advancement Institution, Minato-ku, Tokyo 105-0001, Japan.
We present here the annotation of the complete genome of rice Oryza sativa L. ssp. japonica cultivar Nipponbare. All functional annotations for proteins and non-protein-coding RNA (npRNA) candidates were manually curated. Functions were identified or inferred in 19,969 (70%) of the proteins, and 131 possible npRNAs (including 58 antisense transcripts) were found. Almost 5000 annotated protein-coding genes were found to be disrupted in insertional mutant lines, which will accelerate future experimental validation of the annotations. The rice loci were determined by using cDNA sequences obtained from rice and other representative cereals. Our conservative estimate based on these loci and an extrapolation suggested that the gene number of rice is 32,000, which is smaller than previous estimates. We conducted comparative analyses between rice and Arabidopsis thaliana and found that both genomes possessed several lineage-specific genes, which might account for the observed differences between these species, while they had similar sets of predicted functional domains among the protein sequences. A system to control translational efficiency seems to be conserved across large evolutionary distances. Moreover, the evolutionary process of protein-coding genes was examined. Our results suggest that natural selection may have played a role for duplicated genes in both species, so that duplication was suppressed or favored in a manner that depended on the function of a gene.
The majority of the world's population depends on cereal crops as its primary source of carbohydrate. Among the cultivated cereal crops, rice makes up 20% of the total calorific intake for the human population as a whole (http://www.irri.org/science/ricestat/index.asp). Because of its importance as a staple and the need to cope with increasing global demand for food, many agrobiological studies have been performed with the aim of developing more efficient rice cultivars.
With the completion of the rice genome (Oryza sativa L. ssp. japonica cultivar Nipponbare) by the international consortium on rice genome sequencing (International Rice Genome Sequencing Project 2005
To cope with the enormous amount of information produced by large-scale sequencing, several automated annotation methods have been developed for efficient data processing. However, automated annotation alone tends to produce a high proportion of erroneous annotations, so annotation results should be carefully curated by experts before any public release. Currently, manual curation remains a necessary process for developing an accurate biological database (Misra et al. 2002
There are a large number of full-length cDNAs and expressed sequence tags (ESTs) available for rice and other cereals (Fernandes et al. 2002
Arabidopsis thaliana is one of the best-studied model organisms. Comparison of rice with this dicotyledon may assist in developing a greater understanding of intrinsic mechanisms among cereals at the molecular level. Use of knowledge accumulated about A. thaliana genes to quantify their counterparts in rice is one example of such a comparative study (Izawa et al. 2003
Number of loci in the rice genome
Early estimates of the total number of rice genes by various teams indicated that the rice genome probably contained between 40,000 and 60,000 protein-coding genes, many of which did not have any counterparts in the A. thaliana genome (Goff et al. 2002 93% of the mRNAs could be mapped onto the genome. For details about the unmapped mRNAs, see Supplemental Methods.
Despite advances in the field of ab initio gene-finding methods, it still remains a challenge to accurately predict the location of genes and exons among the genomes of higher eukaryotes including flowering plants (Schoof and Karlowski 2003 Thus, we decided to use the 6941 predicted loci to which rice mRNAs were not mapped but other cDNAs could be (Table 1). We did not use the loci that were supported only by ESTs and were detected by neither the ab initio prediction programs nor the mRNA-mapping, because there seem to be a multitude of aberrant transcripts that were possibly experimental artifacts. As a result, the candidate loci of our data set could be classified into two types: identified transcripts with mRNA (FLcDNA) clones and predicted transcripts with cDNA support. The number of loci predicted for the rice genome in this study was 29,550 including the unmapped mRNA clusters (Table 1).
However, loci may exist that ab initio predictions failed to detect or for which no cDNAs have been sequenced. In fact, 1728 (8.4%) of the 20,507 mapped-mRNA loci were not predicted in our analysis, suggesting that, in addition to the 6941 predicted loci (Table 1), there may be a further 637 loci that were not predicted. Furthermore, 3298 (16.1%) of the mapped-mRNA loci were not supported by any other cDNAs, so that 1332 predicted loci might be absent from our data set. Finally, 122 loci that were neither predicted nor supported by cDNAs should be added. If we consider all of these predicted loci, the estimated number of transcribed loci in the rice genome becomes 31,641. Recent total gene estimates have suggested that there are between 38,000 and 40,000 genes in rice (Yu et al. 2005
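The extrapolation in this paragraph can be reproduced with a few lines of arithmetic. The sketch below is one reading of the text, not necessarily the authors' exact procedure: it assumes that the rounded miss rates measured on the mapped-mRNA loci (8.4% and 16.1%) also apply to the 6941 predicted loci, and that the two kinds of miss are independent. All function and variable names are illustrative.

```python
def undetected(observed, miss_rate):
    """Estimate how many items were missed, given the number observed
    and the fraction of all items that the method fails to detect."""
    return observed * miss_rate / (1.0 - miss_rate)


counted_loci = 29_550     # loci in the data set (Table 1)
predicted_loci = 6_941    # predicted loci with cDNA support but no rice mRNA
miss_prediction = 0.084   # 1728 of 20,507 mapped-mRNA loci were not predicted
miss_cdna = 0.161         # 3298 of 20,507 mapped-mRNA loci lacked other cDNA support

not_predicted = undetected(predicted_loci, miss_prediction)
no_cdna = undetected(predicted_loci, miss_cdna)
# Loci missed by both criteria, assuming the two miss rates are independent.
missed_by_both = undetected(not_predicted, miss_cdna)

extra = [round(not_predicted), round(no_cdna), round(missed_by_both)]
estimated_total = counted_loci + sum(extra)
print(extra, estimated_total)
```

Under these assumptions the three corrections come to 637, 1332, and 122 loci, and the total is 31,641, which matches the figures quoted in the text.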
Comparison of transcript diversity between O. sativa and A. thaliana
Primary gene structures were found to be quite similar between the two species examined. There were on average five exons per transcript. The proportion of single-exon genes was
Curation of ORF functions
The ORFs were classified into five categories according to their level of sequence similarity (see Methods). The probable protein products of 7189 loci had functions identified or inferred by BLASTX searches (Categories I and II of Table 2). Functional domains were detected in 12,780 ORFs (Category III) by InterProScan (Zdobnov and Apweiler 2001
For the remaining sequences, the functions could not be inferred, but similarity to proteins of unknown function in the databases was detected for Category IV proteins. Since the proteins of Category V did not show any homology with proteins contained in the databases, many of the sequences classified in this category may be novel. It is also suspected that this category may contain a high percentage of spurious ORFs, produced by false predictions (Das et al. 1997
Identification of non-protein-coding RNAs
We identified 131 transcripts (11.2%) as putative npRNAs, and 108 of these were multi-exon transcripts with an average exon number of 2.8 (Supplemental Table 4). The remaining 23 npRNAs were single-exon transcripts with canonical 3'-end features and/or EST support. Interestingly, 55 putative npRNAs were found to overlap the exons and/or introns of sense genes (Supplemental Table 5) and may function as antisense npRNAs (as-npRNAs). For instance, the Os08g0103700 npRNA appears to overlap two predicted sense genes on the antisense strand. It overlaps the first exon of a BTB/POZ domain-containing protein (Os08g0103600) gene, and the last intron and exon of a NAM-like protein (Os08g0103900) gene (Supplemental Fig. 3). Previously, the NAC1 transcription factor gene, a member of the NAM family, was reported to be down-regulated by the small RNA gene miR164b (Guo et al. 2005 Most of the sense genes overlapped by as-npRNAs came under our classification of hypothetical proteins. However, using our annotation criteria, 27 predicted loci could be assigned a probable function. This set of candidates may constitute a good starting point for further analysis of plant as-npRNA mechanics.
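As an illustration of how antisense candidates of this kind can be flagged, the following minimal sketch reports genes that lie on the opposite strand of an npRNA and overlap its genomic span. It is not the pipeline used in this study; the record layout, the function name, and the coordinates in the usage example are hypothetical.

```python
from collections import namedtuple

# Hypothetical record layout: 1-based, closed genomic interval plus strand.
Transcript = namedtuple("Transcript", "name chrom start end strand")


def antisense_partners(nprna, genes):
    """Return the names of genes on the same chromosome and the opposite
    strand whose genomic span overlaps that of the npRNA."""
    return [
        g.name
        for g in genes
        if g.chrom == nprna.chrom
        and g.strand != nprna.strand
        and g.start <= nprna.end
        and nprna.start <= g.end
    ]


# Made-up coordinates, for illustration only.
nprna = Transcript("npRNA_x", "chr8", 100, 500, "-")
genes = [
    Transcript("geneA", "chr8", 50, 150, "+"),   # overlaps the 5' side, opposite strand
    Transcript("geneB", "chr8", 450, 900, "+"),  # overlaps the 3' side, opposite strand
    Transcript("geneC", "chr8", 200, 300, "-"),  # same strand, so not antisense
    Transcript("geneD", "chr1", 100, 500, "+"),  # different chromosome
]
print(antisense_partners(nprna, genes))
```

A span-level test like this does not distinguish exon overlap from intron overlap; doing so would require comparing individual exon coordinates.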
Correlation between tRNA gene numbers and codon usage
The number of isoacceptors in the rice genome was estimated on the basis of tRNAscan-SE predictions (Supplemental Table 6). First, we plotted the frequency of each amino acid obtained from the entire rice protein set against the number of corresponding tRNAs (Fig. 1A). We found a positive linear correlation between amino acid usage and the number of corresponding tRNA genes in the rice genome, which suggests that rice controls the expression of tRNAs vital for efficient protein synthesis via corresponding tRNA gene copy number, that is, tRNA gene copy numbers are proportional to individual amino acid biases. This is in contrast to current thinking that complex eukaryotes such as rice might have a complex gene regulation system. Moreover, the A. thaliana tRNA genes showed a similar pattern (Fig. 1B). Hence, it is strongly suggested that the tRNA abundance in both O. sativa and A. thaliana is determined simply by the number of gene copies rather than by complicated tRNA transcriptional regulation. Since the same tendency was found in C. elegans (Duret 2000
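The relationship described here, amino acid usage plotted against tRNA gene number, can be summarized with a Pearson correlation coefficient. The sketch below assumes two dictionaries keyed by amino acid; the function names and input layout are illustrative, and the text does not state which statistic was used for Figure 1.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def usage_vs_trna(aa_frequency, trna_gene_count):
    """Correlate amino acid frequency with tRNA gene number.

    aa_frequency:    amino acid -> fraction of residues in the protein set
    trna_gene_count: amino acid -> number of tRNA genes decoding it
    Only amino acids present in both dictionaries are used.
    """
    shared = sorted(set(aa_frequency) & set(trna_gene_count))
    return pearson_r(
        [aa_frequency[a] for a in shared],
        [trna_gene_count[a] for a in shared],
    )
```

In practice the gene counts would come from the tRNAscan-SE predictions (Supplemental Table 6) and the frequencies from the full protein set; a value of r close to 1 would correspond to the positive linear relationship reported above.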
Second, the numbers of isoacceptors and the RSCU were examined in rice, but a clear relationship between the two was not observed (Supplemental Table 6). It is currently thought that most tRNAs are modified after transcription, which allows two or more codons to be recognized by a single tRNA (Tranquilla et al. 1982
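For readers unfamiliar with RSCU (relative synonymous codon usage), the commonly used definition divides the observed count of a codon by the count expected if all synonymous codons for that amino acid were used equally. The sketch below follows that definition; the function name and input layout are assumptions made for illustration, not taken from this study.

```python
def rscu(codon_counts, synonymous_families):
    """Compute relative synonymous codon usage.

    codon_counts:        codon -> observed count
    synonymous_families: amino acid -> list of its synonymous codons
    Returns codon -> RSCU. A value of 1.0 means the codon is used exactly
    as often as expected under equal usage within its family.
    """
    values = {}
    for codons in synonymous_families.values():
        total = sum(codon_counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid not observed; RSCU is undefined
        expected = total / len(codons)
        for c in codons:
            values[c] = codon_counts.get(c, 0) / expected
    return values


# Toy counts for the four alanine codons, for illustration only.
families = {"Ala": ["GCT", "GCC", "GCA", "GCG"]}
counts = {"GCT": 10, "GCC": 20, "GCA": 30, "GCG": 40}
print(rscu(counts, families))
```

With these toy counts the expected count per codon is 25, so the four RSCU values are 0.4, 0.8, 1.2, and 1.6.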
Evolutionary process of the genes in O. sativa and A. thaliana
The protein sets still contained those lacking counterparts in the other species. In order to extensively examine the lineage-specific gene candidates for these proteins, all the proteins were compared with the UniProt Knowledgebase (UniProtKB). In both species, >14,000 proteins showed significant similarity to those obtained from nonplant species (Fig. 4), which implies that these have evolved so conservatively that the sequences did not alter drastically under strong purifying selection. In addition, the number of plant-specific homologs found in each species was similar, while there were several transcripts that were found to be specific to Oryzeae (5663 proteins) and Arabidopsis (3402 proteins) (Fig. 4). However, we could not rule out the possibility that these lineage-specific proteins were produced by false predictions of ORFs. Many of the lineage-specific proteins of rice could only be classified into Category V (Supplemental Fig. 6). The skewed length distribution of the Oryzeae-specific proteins (Supplemental Fig. 7) supported the hypothesis that there may be several bogus ORFs included in the Category V set, as noted in "Curation of ORF Functions." The rice genome might contain a large number of species-specific short proteins, but it seems also possible that many of the transcripts unique to rice are non-protein-coding or are experimental artifacts. In addition, only a few monocotyledon- or eudicotyledon-specific proteins were detected (Fig. 4), suggesting that investigations into plant species other than O. sativa and A. thaliana, at the molecular level, may not have been as detailed as they could have been. Further DNA sequencing in a variety of plant species may reduce the number of apparent lineage-specific protein-coding genes found in this study.
The curated annotation presented and described in this study revealed that the functions of 19,969 (70.0%) ORFs could be inferred by either sequence similarity or motif searches (Categories I, II, and III) (Table 2). Since we aimed to provide basic annotation only in this study, further functional assignment will be assisted in the future by sophisticated methods such as a tertiary structure-based approach (e.g., see McDermott and Samudrala 2003
Most ORFs were predicted computationally. However, we could confirm the ORFs for 834 transcripts by comparison with the proteome data (Table 2). As the number of proteins directly determined by protein sequencing increases over time, we expect to be able to filter out a greater percentage of bogus ORFs from our data set. The proteome data will also provide experimentally validated evidence of any post-translational modifications, tissue-specificity, and cellular localization (Komatsu et al. 2004
Since we focused on those genes that were validated by the cDNAs currently available and since the cDNA data set is incomplete, the estimated gene number may be regarded as a lower estimate. The presence of a transcript may not necessarily be used as the only criterion for identifying genes. In particular, there may exist a substantial number of rare transcripts or non-protein-coding genes that are currently undetected in rice. The experience in mice has shown that as more cDNA sequences were obtained, an increasingly large number of novel genes with no coding potential could be detected (Carninci et al. 2005
Although we detected 5663 lineage-specific gene candidates in rice (Fig. 4), it is unlikely that all of them were newly derived from nonfunctional DNA sequences. There are several possibilities that could account for those genes that appear to be unique to rice. First, these genes may have diverged to such an extent that their homologs could no longer be detected by sequence similarity search. This is a probable scenario among the duplicated genes for which purifying selection is not strong. Second, independent gene deletions and insufficient data sampling could have led to an apparent uniqueness of genes (Salzberg et al. 2001 Since the distributions of gene duplicates were quite similar between O. sativa and A. thaliana (Fig. 2; Supplemental Fig. 6), there may be some common factor that accounts for this observed similarity. A probable candidate is natural selection enforcing limitations on the number of duplicate genes. If the duplication was selectively neutral, genes would be duplicated or remain as single copies at random in both species. In order to assess whether duplication in the two species was random, we calculated the ratio of those orthologs that have undergone intraspecific duplication events to those that have not, for both species (Table 4). We found that the numbers obtained were different from those that would be expected if duplication and deletion events had been random (P < 10^-90, Fisher's exact test). It seems that duplication of some genes may have been neutral or beneficial, while others were so deleterious that, if the gene was retained at all, it only remained as a single copy. Thus, the current gene composition of both O. sativa and A. thaliana seems to be partly due to natural selection, which shaped the similar genetic makeup of the genomes of these representative flowering plants.
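A test of this kind can be sketched as follows, assuming the counts in Table 4 form a 2x2 table (orthologs duplicated or not duplicated in rice, against duplicated or not duplicated in A. thaliana). That layout is an assumption, and the function below is a plain standard-library implementation of the two-sided Fisher's exact test rather than the software used in this study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(k):
        # Probability that the top-left cell equals k, given fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(
        p for p in (prob(k) for k in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Small hypothetical table, for illustration only (not the Table 4 counts).
print(fisher_exact_two_sided(3, 1, 1, 3))
```

Because Python integers have arbitrary precision, the same function can handle tables with thousands of genes per cell, although probabilities far below the double-precision range will underflow to zero.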
The nucleotide sequence of the genome is so vast as to make it unreadable to human eyes alone. A representation of the underlying biology that is comprehensible to humans can only be inferred through analytical programs. The high-quality automated annotation polished by extensive manual curation provides a more sharply focused view of the genome that we hope will allow more accurately targeted experimental work and comparative analysis.
Automated annotation
We used the genome sequence assembled by the International Rice Genome Sequencing Project (2005)
Curating ORF functional assignment
If a locus contained more than one gene structure, curators selected one of them as a representative transcript by examining exon numbers and some other features (for details, see Imanishi et al. 2004
All information regarding the ORF functions and gene positions in the genome can be downloaded at http://rapdownload.lab.nig.ac.jp/ (Ohyanagi et al. 2006
Comparison of the O. sativa and A. thaliana protein data sets
correction, see Ota and Nei (1994)
We are grateful to C. Robin Buell, Hisakazu Iwama, Satoshi Fukuchi, Craig Gough, Kumiko Suzuki, Junko Sugiyama, Emiko Saito, Masato Kawabata, Chikatada Satoh, Shigetoyo Furukawa, Satoshi Nobushima, Ryo Aono, Tomohiro Endo, and Michitoshi Nagamochi for their support. We thank all the participants of the First Rice Annotation Project Meeting (RAP1). We also thank the Computer Center for Agriculture, Forestry and Fisheries Research for assisting RAP1. This work was supported by a grant from the Special Coordination Funds for Promoting Science and Technology of the Ministry of Education, Culture, Sports, Science and Technology of Japan.
2 Corresponding author: Takashi Gojobori.
E-mail tgojobor{at}genes.nig.ac.jp; fax 81-55-981-6848. [Supplemental material is available online at www.genome.org.] Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.5509507
The Arabidopsis Genome Initiative 2000. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796-815.
Bennetzen, J.L., Coleman, C., Liu, R., Ma, J., and Ramakrishna, W. 2004. Consistent over-estimation of gene number in complex plant genomes. Curr. Opin. Plant Biol. 7: 732-736.
Camon, E., Magrane, M., Barrell, D., Binns, D., Fleischmann, W., Kersey, P., Mulder, N., Oinn, T., Maslen, J., and Cox, A., et al. 2003. The Gene Ontology Annotation (GOA) Project: Implementation of GO in SWISS-PROT, TrEMBL, and InterPro. Genome Res. 13: 662-672.
Carninci, P., Kasukawa, T., Katayama, S., Gough, J., Frith, M.C., Maeda, N., Oyama, R., Ravasi, T., Lenhard, B., and Wells, C., et al. 2005. The transcriptional landscape of the mammalian genome. Science 309: 1559-1563.
Chaw, S.M., Chang, C.C., Chen, H.L., and Li, W.H. 2004. Dating the monocot-dicot divergence and the origin of core eudicots using whole chloroplast genomes. J. Mol. Evol. 58: 424-441.
Das, S., Yu, L., Gaitatzes, C., Rogers, R., Freeman, J., Bienkowska, J., Adams, R.M., Smith, T.F., and Lindelien, J. 1997. Biology's new Rosetta stone. Nature 385: 29-30.
Duret, L. 2000. tRNA gene number and codon usage in the C. elegans genome are co-adapted for optimal translation of highly expressed genes. Trends Genet. 16: 287-289.
Fernandes, J., Brendel, V., Gai, X., Lal, S., Chandler, V.L., Elumalai, R.P., Galbraith, D.W., Pierson, E.A., and Walbot, V. 2002. Comparison of RNA expression profiles based on maize expressed sequence tag frequency analysis and micro-array hybridization. Plant Physiol. 128: 896-910.
Gardiner, J., Schroeder, S., Polacco, M.L., Sanchez-Villeda, H., Fang, Z., Morgante, M., Landewe, T., Fengler, K., Useche, F., and Hanafey, M., et al. 2004. Anchoring 9,371 maize expressed sequence tagged unigenes to the bacterial artificial chromosome contig map by two-dimensional overgo hybridization. Plant Physiol. 134: 1317-1326.
Goff, S.A., Ricke, D., Lan, T.H., Presting, G., Wang, R., Dunn, M., Glazebrook, J., Sessions, A., Oeller, P., and Varma, H., et al. 2002. A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 296: 92-100.
Grantham, R., Gautier, C., Gouy, M., Mercier, R., and Pave, A. 1980. Codon catalog usage and the genome hypothesis. Nucleic Acids Res. 8: r49-r62.
Guo, H.S., Xie, Q., Fei, J.F., and Chua, N.H. 2005. MicroRNA directs mRNA cleavage of the transcription factor NAC1 to downregulate auxin signals for Arabidopsis lateral root development. Plant Cell 17: 1376-1386.
Hirochika, H., Guiderdoni, E., An, G., Hsing, Y.I., Eun, M.Y., Han, C.D., Upadhyaya, N., Ramachandran, S., Zhang, Q., and Pereira, A., et al. 2004. Rice mutant resources for gene discovery. Plant Mol. Biol. 54: 325-334.
Huttenhofer, A., Schattner, P., and Polacek, N. 2005. Non-coding RNAs: Hope or hype? Trends Genet. 21: 289-297.
Ikemura, T. 1981. Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes. J. Mol. Biol. 146: 1-21.
Ikemura, T. 1985. Codon usage and tRNA content in unicellular and multicellular organisms. Mol. Biol. Evol. 2: 13-34.
Imanishi, T., Itoh, T., Suzuki, Y., O'Donovan, C., Fukuchi, S., Koyanagi, K.O., Barrero, R.A., Tamura, T., Yamaguchi-Kabata, Y., and Tanino, M., et al. 2004. Integrative annotation of 21,037 human genes validated by full-length cDNA clones. PLoS Biol. 2: 856-875.
International Rice Genome Sequencing Project 2005. The map-based sequence of the rice genome. Nature 436: 793-800.
Izawa, T., Takahashi, Y., and Yano, M. 2003. Comparative biology comes into bloom: Genomic and genetic comparison of flowering pathways in rice and Arabidopsis. Curr. Opin. Plant Biol. 6: 113-120.
Jabbari, K., Cruveiller, S., Clay, O., Le Saux, J., and Bernardi, G. 2004. The new genes of rice: A closer look. Trends Plant Sci. 9: 281-285.
Jantasuriyarat, C., Gowda, M., Haller, K., Hatfield, J., Lu, G., Stahlberg, E., Zhou, B., Li, H., Kim, H., and Yu, Y., et al. 2005. Large-scale identification of expressed sequence tags involved in rice and rice blast fungus interaction. Plant Physiol. 138: 105-115.
Kikuchi, S., Satoh, K., Nagata, T., Kawagashira, N., Doi, K., Kishimoto, N., Yazaki, J., Ishikawa, M., Yamada, H., and Ooka, H., et al. 2003. Collection, mapping, and annotation of over 28,000 cDNA clones from japonica rice. Science 301: 376-379.
Komatsu, S. and Tanaka, N. 2005. Rice proteome analysis: A step toward functional analysis of the rice genome. Proteomics 5: 938-949.
Komatsu, S., Kojima, K., Suzuki, K., Ozaki, K., and Higo, K. 2004. Rice Proteome Database based on two-dimensional polyacrylamide gel electrophoresis: Its status in 2003. Nucleic Acids Res. 32: D388-D392.
Lai, J., Dey, N., Kim, C.-S., Bharti, A.K., Rudd, S., Mayer, K.F.X., Larkins, B.A., Becraft, P., and Messing, J. 2004. Characterization of the maize endosperm transcriptome and its comparison to the rice genome. Genome Res. 14: 1932-1937.
Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., and FitzHugh, W., et al. 2001. Initial sequencing and analysis of the human genome. Nature 409: 860-921.
Li, L., Wang, X., Stolc, V., Li, X., Zhang, D., Su, N., Tongprasit, W., Li, S., Cheng, Z., and Wang, J., et al. 2006. Genome-wide transcription analyses in rice using tiling microarrays. Nat. Genet. 38: 124-129.
Liu, Q., Feng, Y., Zhao, X.A., Dong, H., and Xue, Q. 2004. Synonymous codon usage bias in Oryza sativa. Plant Sci. 167: 101-105.
Lynch, M. and Conery, J.S. 2000. The evolutionary fate and consequences of duplicate genes. Science 290: 1151-1155.
MacIntosh, G.C., Wilkerson, C., and Green, P.J. 2001. Identification and analysis of Arabidopsis expressed sequence tags characteristic of non-coding RNAs. Plant Physiol. 127: 765-776.
McDermott, J. and Samudrala, R. 2003. Bioverse: Functional, structural and contextual annotation of proteins and proteomes. Nucleic Acids Res. 31: 3736-3737.
Misra, S., Crosby, M., Mungall, C., Matthews, B., Campbell, K., Hradecky, P., Huang, Y., Kaminker, J., Millburn, G., and Prochnik, S., et al. 2002. Annotation of the Drosophila melanogaster euchromatic genome: A systematic review. Genome Biol. 3: research0083.1-0083.22.
Miyao, A., Tanaka, K., Murata, K., Sawaki, H., Takeda, S., Abe, K., Shinozuka, Y., Onosato, K., and Hirochika, H. 2003. Target site specificity of the Tos17 retrotransposon shows a preference for insertion within genes and against insertion in retrotransposon-rich regions of the genome. Plant Cell 15: 1771-1780.
Nei, M. and Kumar, S. 2000. Molecular evolution and phylogenetics. Oxford University Press, Oxford.
Ohyanagi, H., Tanaka, T., Sakai, H., Shigemoto, Y., Yamaguchi, K., Habara, T., Fujii, Y., Antonio, B.A., Nagamura, Y., and Imanishi, T., et al. 2006. The Rice Annotation Project Database (RAP-DB): Hub for Oryza sativa ssp. japonica genome information. Nucleic Acids Res. 34: D741-D744.
Ota, T. and Nei, M. 1994. Estimation of the number of amino-acid substitutions per site when the substitution rate varies among sites. J. Mol. Evol. 38: 642-643.
Quevillon, E., Silventoinen, V., Pillai, S., Harte, N., Mulder, N., Apweiler, R., and Lopez, R. 2005. InterProScan: Protein domains identifier. Nucleic Acids Res. 33: W116-W120.
Salzberg, S.L., White, O., Peterson, J., and Eisen, J.A. 2001. Microbial genes in the human genome: Lateral transfer or gene loss? Science 292: 1903-1906.
Sasaki, T., Matsumoto, T., Yamamoto, K., Sakata, K., Baba, T., Katayose, Y., Wu, J., Niimura, Y., Cheng, Z., and Nagamura, Y., et al. 2002. The genome sequence and structure of rice chromosome 1. Nature 420: 312-316.
Schoof, H. and Karlowski, W.M. 2003. Comparison of rice and Arabidopsis annotation. Curr. Opin. Plant Biol. 6: 106-112.
Schoof, H., Ernst, R., Nazarov, V., Pfeifer, L., Mewes, H.W., and Mayer, K.F. 2004. MIPS Arabidopsis thaliana Database (MAtDB): An integrated biological knowledge resource for plant genomics. Nucleic Acids Res. 32: D373-D376.
Stanhope, M.J., Lupas, A., Italia, M.J., Koretke, K.K., Volker, C., and Brown, J.R. 2001. Phylogenetic analyses do not support horizontal gene transfers from bacteria to vertebrates. Nature 411: 940-944.
Sunkar, R., Girke, T., Jain, P.K., and Zhu, J.K. 2005. Cloning and characterization of microRNAs from rice. Plant Cell 17: 1397-1411.
Thompson, J.D., Higgins, D.G., and Gibson, T.J. 1994. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22: 4673-4680.
Tranquilla, T.A., Cortese, R., Melton, D., and Smith, J.D. 1982. Sequences of four tRNA genes from Caenorhabditis elegans and the expression of C. elegans tRNALeu (anticodon IAG) in Xenopus oocytes. Nucleic Acids Res. 10: 7919-7934.
Wu, J., Maehara, T., Shimokawa, T., Yamamoto, S., Harada, C., Takazaki, Y., Ono, N., Mukai, Y., Koike, K., and Yazaki, J., et al. 2002. A comprehensive rice transcript map containing 6591 expressed sequence tag sites. Plant Cell 14: 525-535.
Yamaguchi, T., Lee, D.Y., Miyao, A., Hirochika, H., An, G., and Hirano, H.-Y. 2006. Functional diversification of the two C-class MADS box genes OSMADS3 and OSMADS58 in Oryza sativa. Plant Cell 18: 15-28.
Yao, H., Guo, L., Fu, Y., Borsuk, L.A., Wen, T.J., Skibbe, D.S., Cui, X., Scheffler, B.E., Cao, J., and Emrich, S.J., et al. 2005. Evaluation of five ab initio gene prediction programs for the discovery of maize genes. Plant Mol. Biol. 57: 445-460.