Vol. 12, Issue 11, 1703-1715, November 2002
LETTER
Systematic Learning of Gene Functional Classes From DNA Array Expression Data by Using Multilayer Perceptrons
Alvaro Mateos,1 Joaquín Dopazo,1 Ronald Jansen,2 Yuhai Tu,3 Mark Gerstein,2 and Gustavo Stolovitzky3,4
1 Bioinformatics Unit, Centro Nacional de Investigaciones
Oncologicas (CNIO), 28039, Madrid, Spain; 2 Department of
Molecular Biophysics and Biochemistry, Yale University, New Haven,
Connecticut 06520, USA; 3 IBM Computational Biology Center,
T.J. Watson Research Center, Yorktown Heights, New York 10598, USA
Recent advances in microarray technology have opened new ways for
functional annotation of previously uncharacterised genes on a genomic
scale. This has been demonstrated by unsupervised clustering of
co-expressed genes and, more importantly, by supervised learning
algorithms. Using prior knowledge, these algorithms can assign
functional annotations based on more complex expression signatures
found in existing functional classes. Previously, support vector
machines (SVMs) and other machine-learning methods have been applied to
a limited number of functional classes for this purpose. Here we
present, for the first time, the comprehensive application of
supervised neural networks (SNNs) for functional annotation. Our study
is novel in that we report systematic results for ~100 classes in the
Munich Information Center for Protein Sequences (MIPS) functional
catalog. We found that only ~10% of these are learnable (based on
the rate of false negatives). A closer analysis reveals that false
positives (and negatives) in a machine-learning context are not
necessarily "false" in a biological sense. We show that the high
degree of interconnections among functional classes confounds the
signatures that ought to be learned for a unique class. We term this
the "Borges effect" and introduce two new numerical indices for its
quantification. Our analysis indicates that classification systems with
a lower Borges effect are better suited to machine learning.
Furthermore, we introduce a learning procedure for combining false
positives with the original class. We show that in a few iterations
this process converges to a gene set that is learnable with
low rates of false positives and negatives and contains
genes that are biologically related to the original class, allowing for
a coarse reconstruction of the interactions between associated
biological pathways. We exemplify this methodology using the
well-studied tricarboxylic acid cycle.
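
The iterative procedure described above (train on a class, fold the false positives back into the class, retrain until the set stops growing) can be sketched roughly as follows. This is only an illustration under stated assumptions: a nearest-centroid rule stands in for the multilayer perceptron used in the study, the function names (`predict_members`, `grow_class`) are hypothetical, and the two-condition expression profiles are invented toy data rather than results from the paper.

```python
def centroid(rows):
    """Mean expression profile of a non-empty list of profiles."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_members(profiles, members):
    """Stand-in classifier (NOT the paper's perceptron): a gene is predicted
    to be in the class when its profile is closer to the centroid of the
    current members than to the centroid of all other genes."""
    inside = [p for g, p in profiles.items() if g in members]
    outside = [p for g, p in profiles.items() if g not in members]
    if not inside or not outside:
        return set(members)  # nothing to contrast against
    c_in, c_out = centroid(inside), centroid(outside)
    return {g for g, p in profiles.items()
            if sq_dist(p, c_in) < sq_dist(p, c_out)}


def grow_class(profiles, seed_members, max_iter=10):
    """Repeatedly add false positives to the class until none remain."""
    members = set(seed_members)
    for _ in range(max_iter):
        predicted = predict_members(profiles, members)
        false_positives = predicted - members
        if not false_positives:
            break  # converged: the class is stable under retraining
        members |= false_positives
    return members


# Invented toy data: gene name -> expression under two conditions.
profiles = {
    "a1": [1.0, 0.9], "a2": [0.9, 1.1], "a3": [1.1, 1.0], "a4": [0.8, 1.2],
    "b1": [-1.0, -0.9], "b2": [-0.9, -1.1], "b3": [-1.1, -1.0], "b4": [-0.8, -1.2],
}

final_class = grow_class(profiles, {"a1", "a2"})
print(sorted(final_class))  # ['a1', 'a2', 'a3', 'a4']
```

In this toy run the seed class of two genes absorbs the two co-expressed genes in the first pass and is stable in the second, which mirrors the convergence behaviour the abstract reports only in a very loose sense.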
4 Corresponding author.
12:1703-1715 ©2002 by Cold Spring Harbor Laboratory Press ISSN 1088-9051/02 $5.00
