Genome Research

Vol. 12, Issue 10, 1582-1590, October 2002

METHODS
Using Text Analysis to Identify Functionally Coherent Gene Groups

Soumya Raychaudhuri,1,2 Hinrich Schütze,3 and Russ B. Altman1,2,4

1 Department of Genetics and 2 Stanford Medical Informatics, Stanford University, Stanford, California 94305-5479, USA; 3 Novation Biosciences, San Mateo, California 94403, USA

The analysis of large-scale genomic information (such as sequence data or expression patterns) frequently involves grouping genes on the basis of common experimental features. Often, as with gene expression clustering, there are too many groups to easily identify the functionally relevant ones. One valuable source of information about gene function is the published literature. We present a method, neighbor divergence, for assessing whether the genes within a group share a common biological function based on their associated scientific literature. The method uses statistical natural language processing techniques to interpret biological text. It requires only a corpus of documents relevant to the genes being studied (e.g., all genes in an organism) and an index connecting the documents to appropriate genes. Given a group of genes, neighbor divergence assigns a numerical score indicating how "functionally coherent" the gene group is from the perspective of the published literature. We evaluate our method by testing its ability to distinguish 19 known functional gene groups from 1900 randomly assembled groups. Neighbor divergence achieves 79% sensitivity at 100% specificity, comparing favorably to other tested methods. We also apply neighbor divergence to previously published gene expression clusters to assess its ability to recognize gene groups that had been manually identified as representative of a common function.
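The evaluation figure quoted above, sensitivity at 100% specificity, can be illustrated with a minimal sketch. This is not the authors' code: the function name and the example scores are hypothetical, and it assumes only that each gene group receives one numerical coherence score, with higher meaning more coherent. At 100% specificity the threshold must sit at or above the highest score of any random group, so sensitivity is the fraction of functional groups that score strictly above that maximum.

```python
def sensitivity_at_full_specificity(functional_scores, random_scores):
    """Fraction of functional groups scoring above every random group.

    Assumes higher scores indicate greater functional coherence.
    """
    # No random group may be called coherent, so the cutoff is the
    # highest score observed among the random groups.
    threshold = max(random_scores)
    detected = sum(1 for score in functional_scores if score > threshold)
    return detected / len(functional_scores)


# Hypothetical scores, for illustration only (not data from the article).
functional = [5.2, 4.8, 0.9, 3.1]
random_groups = [0.2, 1.0, 0.7, 0.5]
print(sensitivity_at_full_specificity(functional, random_groups))  # 0.75
```

With 19 functional groups, the reported 79% would correspond to roughly 15 groups scoring above all 1900 random groups (15/19 is about 0.79).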


4 Corresponding author.


12:1582-1590 ©2002 by Cold Spring Harbor Laboratory Press  ISSN 1088-9051/02 $5.00



This article has been cited by other articles:

M. Kankainen, G. Brader, P. Toronen, E. T. Palva, and L. Holm
Identifying functional gene sets from hierarchically clustered expression data: map of abiotic stress regulated genes in Arabidopsis thaliana
Nucleic Acids Res., October 6, 2006; 34(18): e124 - e124.

C. Santos, D. Eggle, and D. J. States
Wnt pathway curation using automated natural language processing: combining statistical methods with partial and full parse for knowledge extraction
Bioinformatics, April 15, 2005; 21(8): 1653 - 1658.

J. Herrero, J. M. Vaquerizas, F. Al-Shahrour, L. Conde, A. Mateos, J. S. R. Diaz-Uriarte, and J. Dopazo
New challenges in gene expression data analysis and the extended GEPAS
Nucleic Acids Res., July 1, 2004; 32(suppl_2): W485 - W491.

D. M. Wilkinson and B. A. Huberman
A method for finding communities of related genes
PNAS, April 6, 2004; 101(suppl_1): 5241 - 5248.

U. Karaoz, T. M. Murali, S. Letovsky, Y. Zheng, C. Ding, C. R. Cantor, and S. Kasif
Whole-genome annotation by using evidence integration in functional-linkage networks
PNAS, March 2, 2004; 101(9): 2888 - 2893.

S. Raychaudhuri, J. T. Chang, F. Imam, and R. B. Altman
The computational analysis of scientific literature to define and recognize gene expression clusters
Nucleic Acids Res., August 1, 2003; 31(15): 4553 - 4560.

S. Albert, S. Gaudan, H. Knigge, A. Raetsch, A. Delgado, B. Huhse, H. Kirsch, M. Albers, D. Rebholz-Schuhmann, and M. Koegl
Computer-Assisted Generation of a Protein-Interaction Database for Nuclear Receptors
Mol. Endocrinol., August 1, 2003; 17(8): 1555 - 1567.

O. G. Troyanskaya, K. Dolinski, A. B. Owen, R. B. Altman, and D. Botstein
A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae)
PNAS, July 8, 2003; 100(14): 8348 - 8353.



