Genome Research

Vol. 12, Issue 1, 203-214, January 2002

METHODS
Associating Genes with Gene Ontology Codes Using a Maximum Entropy Analysis of Biomedical Literature

Soumya Raychaudhuri,1 Jeffrey T. Chang,1 Patrick D. Sutphin,2 and Russ B. Altman1,3,4

Departments of 1 Genetics and 2 Radiation Oncology, Stanford University, Stanford, California 94305, USA

Functional characterizations of thousands of gene products from many species are described in the published literature. These discussions are extremely valuable for characterizing the functions not only of these gene products, but also of their homologs in other organisms. The Gene Ontology (GO) is an effort to create a controlled terminology for labeling gene functions in a more precise, reliable, computer-readable manner. Currently, the best annotations of gene function with the GO are performed by highly trained biologists who read the literature and select appropriate codes. In this study, we explored the possibility that statistical natural language processing techniques can be used to assign GO codes. We compared three document classification methods (maximum entropy modeling, naïve Bayes classification, and nearest-neighbor classification) on the problem of associating a set of GO codes (for biological process) with literature abstracts and thus with the genes associated with the abstracts. We showed that maximum entropy modeling outperforms the other methods and achieves an accuracy of 72% when ascertaining the function discussed within an abstract. The maximum entropy method provides confidence measures that correlate well with performance. We conclude that statistical methods may be used to assign GO codes and may be useful for the difficult task of reassignment as terminology standards evolve over time.
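
To make the idea of maximum entropy document classification concrete, the following is a minimal sketch of a log-linear classifier over bag-of-words features that returns a label together with a confidence value. It is not the authors' implementation: the training procedure (stochastic gradient ascent with L2 regularization), the hyperparameters, the function names, and the four toy "abstracts" are all assumptions made for illustration.

```python
import math
from collections import Counter

def featurize(text):
    """Bag-of-words counts for one abstract (lowercased, whitespace-split)."""
    return Counter(text.lower().split())

def predict_proba(weights, labels, feats):
    """Return P(label | document) under the log-linear (maximum entropy) model."""
    scores = {
        lab: sum(weights[lab].get(w, 0.0) * c for w, c in feats.items())
        for lab in labels
    }
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {lab: math.exp(s - top) for lab, s in scores.items()}
    z = sum(exps.values())
    return {lab: e / z for lab, e in exps.items()}

def train_maxent(docs, epochs=200, lr=0.1, l2=0.01):
    """Fit per-label word weights by stochastic gradient ascent (assumed optimizer)."""
    labels = sorted({lab for _, lab in docs})
    data = [(featurize(text), lab) for text, lab in docs]
    weights = {lab: {} for lab in labels}
    for _ in range(epochs):
        for feats, gold in data:
            probs = predict_proba(weights, labels, feats)
            for lab in labels:
                # Gradient of the log-likelihood: observed minus expected count.
                err = (1.0 if lab == gold else 0.0) - probs[lab]
                w_lab = weights[lab]
                for w, c in feats.items():
                    old = w_lab.get(w, 0.0)
                    w_lab[w] = old + lr * (err * c - l2 * old)
    return weights, labels

def classify(weights, labels, text):
    """Return the most probable label and its probability as a confidence value."""
    probs = predict_proba(weights, labels, featurize(text))
    best = max(probs, key=probs.get)
    return best, probs[best]

# Hypothetical toy training set; real input would be abstracts labeled with GO codes.
TOY_ABSTRACTS = [
    ("cyclin kinase regulates mitosis and cell division", "cell cycle"),
    ("checkpoint arrest before mitosis in dividing cells", "cell cycle"),
    ("enzyme catalyzes glucose breakdown in glycolysis", "metabolism"),
    ("lipid synthesis enzyme controls fatty acid metabolism", "metabolism"),
]

weights, labels = train_maxent(TOY_ABSTRACTS)
label, confidence = classify(weights, labels, "kinase required for mitosis")
print(label, round(confidence, 2))
```

The probability of the winning label plays the role of the confidence measure mentioned in the abstract; in this sketch, abstracts whose words were never seen in training contribute nothing to any score and yield a near-uniform, low-confidence prediction.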


3 Present address: Stanford Medical Informatics, 251 Campus Drive, MSOB X-215, Stanford, CA 94305, USA.

4 Corresponding author.


12:203-214 ©2002 by Cold Spring Harbor Laboratory Press  ISSN 1088-9051/02 $5.00






