Genome Research

Vol. 12, Issue 11, 1703-1715, November 2002

LETTER
Systematic Learning of Gene Functional Classes From DNA Array Expression Data by Using Multilayer Perceptrons

Alvaro Mateos,1 Joaquín Dopazo,1 Ronald Jansen,2 Yuhai Tu,3 Mark Gerstein,2 and Gustavo Stolovitzky3,4

1 Bioinformatics Unit, Centro Nacional de Investigaciones Oncologicas (CNIO), 28039, Madrid, Spain; 2 Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA; 3 IBM Computational Biology Center, T.J. Watson Research Center, Yorktown Heights, New York 10598, USA

Recent advances in microarray technology have opened new avenues for the functional annotation of previously uncharacterised genes on a genomic scale. This has been demonstrated by unsupervised clustering of co-expressed genes and, more importantly, by supervised learning algorithms. Using prior knowledge, these algorithms can assign functional annotations based on more complex expression signatures found in existing functional classes. Previously, support vector machines (SVMs) and other machine-learning methods have been applied to a limited number of functional classes for this purpose. Here we present, for the first time, the comprehensive application of supervised neural networks (SNNs) to functional annotation. Our study is novel in that we report systematic results for ~100 classes in the Munich Information Center for Protein Sequences (MIPS) functional catalog. We found that only ~10% of these classes are learnable (based on the rate of false negatives). A closer analysis reveals that false positives (and false negatives) in a machine-learning context are not necessarily "false" in a biological sense. We show that the high degree of interconnection among functional classes confounds the signatures that ought to be learned for a unique class. We term this the "Borges effect" and introduce two new numerical indices to quantify it. Our analysis indicates that classification systems with a lower Borges effect are better suited to machine learning. Furthermore, we introduce a learning procedure that combines false positives with the original class. We show that within a few iterations this process converges to a gene set that is learnable with notably low rates of false positives and false negatives and that contains genes biologically related to the original class, allowing for a coarse reconstruction of the interactions between associated biological pathways. We exemplify this methodology using the well-studied tricarboxylic acid cycle.
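
The iterative procedure described in the abstract (train on a class, add the false positives to the class, retrain until the set stops changing) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: a nearest-centroid rule stands in for the multilayer perceptron so that the example needs nothing beyond the Python standard library, and the function names (predict_members, grow_class), the gene labels, and the two-condition expression profiles are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Profile = Tuple[float, ...]


def centroid(profiles: List[Profile]) -> Profile:
    """Mean expression profile of a non-empty list of genes."""
    n = len(profiles)
    return tuple(sum(column) / n for column in zip(*profiles))


def sq_dist(a: Profile, b: Profile) -> float:
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_members(expr: Dict[str, Profile], members: Set[str]) -> Set[str]:
    """Stand-in classifier (ASSUMPTION: nearest centroid, not an MLP).

    Fits one centroid to the class and one to all other genes, then
    returns every gene that lies closer to the class centroid.
    """
    pos = centroid([expr[g] for g in members])
    neg = centroid([expr[g] for g in expr if g not in members])
    return {g for g, p in expr.items() if sq_dist(p, pos) < sq_dist(p, neg)}


def grow_class(expr: Dict[str, Profile], seed: Set[str], max_iter: int = 10):
    """Repeatedly merge false positives into the class until none remain."""
    members = set(seed)
    history = []  # (false positives, false negatives) per iteration
    for _ in range(max_iter):
        if len(members) == len(expr):
            break  # no genes left outside the class to train against
        predicted = predict_members(expr, members)
        false_pos = predicted - members
        false_neg = members - predicted
        history.append((len(false_pos), len(false_neg)))
        if not false_pos:
            break  # the gene set has stopped growing
        members |= false_pos
    return members, history


# Hypothetical toy data: g4 is co-expressed with the class but not annotated.
expr = {
    "g1": (1.0, 1.0), "g2": (1.2, 0.9), "g3": (0.9, 1.1),
    "g4": (1.1, 1.0),
    "g5": (-1.0, -1.0), "g6": (-1.1, -0.9), "g7": (-0.9, -1.2),
}
seed = {"g1", "g2", "g3"}

members, history = grow_class(expr, seed)
print(sorted(members))  # ['g1', 'g2', 'g3', 'g4']
print(history)          # [(1, 0), (0, 0)]
```

In this toy run the unannotated gene g4 is reported as a false positive in the first pass, is absorbed into the class, and the second pass yields no false positives or false negatives. A real analysis would replace predict_members with a trained neural network and evaluate it on held-out genes.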


4 Corresponding author.


12:1703-1715 ©2002 by Cold Spring Harbor Laboratory Press  ISSN 1088-9051/02 $5.00






