Genome Research

Vol 13, Issue 3, 503-512, March 2003

METHODS

A Classification-Based Machine Learning Approach for the Analysis of Genome-Wide Expression Data

James Lyons-Weiler1,2, Satish Patel and Soumyaroop Bhattacharya

Department of Biological Sciences/Graduate Program in Biochemistry/Center for Bioinformatics and Computational Biology, University of Massachusetts, Lowell, Lowell, Massachusetts 01854, USA

Three important tasks in the analysis of global gene expression data are class discovery, class prediction, and the identification of dysregulated genes (biomarkers). The clinical application of microarray data will require marker genes whose expression patterns are sufficiently well understood to allow accurate prediction of disease subclass membership. Commonly used methods of analysis include hierarchical clustering algorithms, t-, F-, and Z-tests, and machine learning approaches. We describe an approach called the maximum difference subset (MDSS) algorithm that combines classification algorithms, classical statistics, and elements of machine learning in a single coherent framework. By integrating prediction accuracy, the MDSS algorithm learns the critical threshold of statistical significance (the α level, or P-value cutoff), eliminating the arbitrariness of setting that threshold by hand and minimizing the effect of the normality assumptions. To reduce the false positive rate and to increase the external validity of the predictive gene set, a jackknife step is used. This step identifies and removes genes in the initial MDSS with low combined predictive utility. The overall MDSS provides a prediction that is less dependent on an arbitrary study design (sample inclusion or exclusion) and should thus have high external validity. We demonstrate that this approach, unlike other published methods, identifies biomarkers capable of predicting the outcome of anthracycline-cytarabine chemotherapy in cases of acute myeloid leukemia. By incorporating two criteria (statistical significance and predictive utility), the approach learns the significance level relevant for a given data set. The MDSS approach can be used with any pairing of a statistical test and a classifier.
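
The idea of learning the significance threshold from prediction accuracy, followed by a jackknife filter, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the test, the classifier, the accuracy estimate, or the exact jackknife criterion, so everything below is an assumption chosen for illustration. The sketch assumes a two-sample Z-test (normal approximation) as the test, a nearest-centroid rule as the classifier, leave-one-out accuracy as the measure of predictive utility, and a jackknife rule that keeps only genes selected in every leave-one-sample-out replicate. All function names and the synthetic data are hypothetical.

```python
import math
import numpy as np

def z_test_pvalues(X, y):
    """Two-sided P-value per gene from a two-sample Z statistic (assumed test)."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    z = (a.mean(axis=0) - b.mean(axis=0)) / np.maximum(se, 1e-12)
    return np.array([math.erfc(abs(v) / math.sqrt(2)) for v in z])

def select_genes(X, y, alpha):
    """Indices of genes whose P-value falls below the candidate threshold."""
    return np.flatnonzero(z_test_pvalues(X, y) < alpha)

def nearest_centroid_predict(X_train, y_train, x, genes):
    """Assign x to the class with the closer centroid (assumed classifier)."""
    c0 = X_train[y_train == 0][:, genes].mean(axis=0)
    c1 = X_train[y_train == 1][:, genes].mean(axis=0)
    return int(np.linalg.norm(x[genes] - c1) < np.linalg.norm(x[genes] - c0))

def loo_accuracy(X, y, alpha):
    """Leave-one-out accuracy; genes are re-selected inside each fold."""
    hits = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        genes = select_genes(X[keep], y[keep], alpha)
        if genes.size == 0:
            continue  # no gene passes the threshold: counted as a miss
        pred = nearest_centroid_predict(X[keep], y[keep], X[i], genes)
        hits += int(pred == y[i])
    return hits / len(y)

def learn_alpha(X, y, alphas):
    """Pick the threshold with the best accuracy; ties go to the strictest."""
    scores = [loo_accuracy(X, y, a) for a in alphas]
    best = max(scores)
    return min(a for a, s in zip(alphas, scores) if s == best), best

def jackknife_filter(X, y, alpha):
    """Keep genes selected on the full data and in every leave-one-out replicate."""
    genes = set(select_genes(X, y, alpha).tolist())
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        genes &= set(select_genes(X[keep], y[keep], alpha).tolist())
    return sorted(genes)

# Synthetic demonstration data (not from the paper): 20 samples, 50 genes,
# with the first 5 genes shifted upward in class 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))
y = np.array([0] * 10 + [1] * 10)
X[y == 1, :5] += 4.0

alpha, acc = learn_alpha(X, y, [0.05, 0.01, 0.001])
initial_genes = select_genes(X, y, alpha).tolist()
final_genes = jackknife_filter(X, y, alpha)
print("learned alpha:", alpha, "accuracy:", acc)
print("genes before jackknife:", initial_genes)
print("genes after jackknife:", final_genes)
```

Because the abstract states that the approach works with any test and classifier pair, `z_test_pvalues` and `nearest_centroid_predict` are the two pieces one would swap out; the threshold search and the jackknife filter do not depend on them.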


1 Present address: Department of Pathology/Center for Pathology Informatics/Benedum Center for Oncology Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania 15232, USA.

2 Corresponding author.

E-MAIL lyonsweilerj@msx.upmc.edu; FAX (412) 647-5380.

Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.104003.







