Genome Research

Published online before print March 29, 2007, 10.1101/gr.5836207
Genome Res. 17:632-640, 2007
©2007 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/07 $5.00

Letter

A large number of novel coding small open reading frames in the intergenic regions of the Arabidopsis thaliana genome are transcribed and/or under purifying selection

Kousuke Hanada1,2, Xu Zhang2, Justin O. Borevitz2, Wen-Hsiung Li2, and Shin-Han Shiu1,3

1 Department of Plant Biology, Michigan State University, East Lansing, Michigan 48824, USA; 2 Department of Ecology and Evolution, University of Chicago, Chicago, Illinois 60637, USA

Large-scale cDNA sequencing projects and tiling array studies have revealed the presence of many unannotated genes. For protein coding genes, small coding sequences may not be identified by gene finders because of the conservative nature of prediction algorithms. In this study, we identified small open reading frames (sORFs) with high coding potential by a simple gene finding method (Coding Index, CI) based on the nucleotide composition bias found in most coding sequences. Applying this method to 18 Arabidopsis thaliana and 84 yeast sORF genes with evidence of expression at the protein level gives 100% accurate prediction. In the A. thaliana genome, we identified 7159 sORFs that are likely coding sequences (coding sORFs) with the CI measure at the 1% false-positive rate. To determine if these coding sORFs are parts of functional genes, we evaluated each coding sORF for evidence of transcription or evolutionary conservation. At the 5% false-positive rate, we found that 2996 coding sORFs are likely expressed in at least one experimental condition of the A. thaliana tiling array data. In addition, the evolutionary conservation of each A. thaliana sORF was examined within A. thaliana or between A. thaliana and five plants with complete or partial genome sequences. In 3997 coding sORFs with readily identifiable homologous sequences, 2376 are subject to purifying selection at the 1% false-positive rate. After eliminating coding sORFs with similarity to known transposable elements and those that are likely missing exons of known genes, the remaining 3241 coding sORFs with either evidence of transcription or purifying selection likely belong to novel coding genes in the A. thaliana genome.
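
As a rough illustration of the first step described above (enumerating candidate small open reading frames before any coding-potential score is applied), the following Python sketch scans both strands of a DNA sequence for ATG-to-stop ORFs within a size window. This is only a sketch under stated assumptions: the abstract does not give the sORF length limits or the CI formula, so the function names and the `min_codons` and `max_codons` cutoffs are hypothetical, and no attempt is made to reproduce the CI measure itself.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sorfs(seq, min_codons=10, max_codons=100):
    """Yield (strand, start, end, orf_seq) for ATG...stop ORFs in a size range.

    min_codons and max_codons are hypothetical cutoffs (not taken from the
    paper); they count codons from the start codon up to, but excluding,
    the stop codon. Coordinates are 0-based, half-open, and relative to the
    strand being scanned. Only A, C, G and T are handled.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None:
                    if codon == "ATG":
                        start = i
                elif codon in STOP_CODONS:
                    n_codons = (i - start) // 3
                    if min_codons <= n_codons <= max_codons:
                        yield strand, start, i + 3, s[start:i + 3]
                    start = None


if __name__ == "__main__":
    # Toy sequence: one 4-codon ORF on the forward strand.
    for hit in find_sorfs("CCATGGCTGCTGCTTAAG", min_codons=2, max_codons=10):
        print(hit)
```

Each candidate returned this way would then need to be scored for coding potential and checked for transcription or conservation, as the abstract describes; those steps are outside the scope of this sketch.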


3 Corresponding author.

E-mail shius{at}msu.edu; fax (517) 353-7244.

[Supplemental material is available online at www.genome.org. The replicated tiling array experiment data has been deposited in Gene Expression Omnibus (GEO) with accession no. GSE6562.]

Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.5836207
