Genome Research

Vol. 9, Issue 11, 1093-1105, November 1999

LETTER
Large-Scale Clustering of cDNA-Fingerprinting Data

Ralf Herwig,1,3 Albert J. Poustka,1 Christine Müller,2 Christof Bull,1 Hans Lehrach,1 and John O'Brien1

1 Max-Planck Institut für Molekulare Genetik, Ihnestrasse 73, D-14195 Berlin, Germany; 2 Institut für Mathematische Stochastik, Georg-August-Universität, D-37083 Göttingen, Germany

Clustering is one of the main mathematical challenges in large-scale gene expression analysis. We describe a clustering procedure, based on a sequential k-means algorithm with additional refinements, that can handle high-throughput data on the order of hundreds of thousands of data items measured on hundreds of variables. The practical motivation for our algorithm is oligonucleotide fingerprinting, a method for simultaneously determining the expression level of every active gene in a specific tissue, although the algorithm can also be applied to other large-scale projects such as EST clustering and qualitative clustering of DNA-chip data. As a pairwise similarity measure between two p-dimensional data points, x and y, we introduce mutual information, which can be interpreted as the amount of information about x contained in y, and vice versa. We show that for our purposes this measure is superior to commonly used metric distances such as Euclidean distance. We also introduce a modified version of mutual information as a novel method for validating clustering results when the true clustering is known. The performance of our algorithm with respect to experimental noise is shown by extensive simulation studies. The algorithm is tested on a subset of 2029 cDNA clones representing 15 different genes from a cDNA library derived from human dendritic cells. Furthermore, the clustering of these 2029 cDNA clones is demonstrated when the entire set of 76,032 cDNA clones is processed.
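To make the similarity measure in the abstract concrete, the following minimal sketch computes the standard Shannon mutual information between two discretized data vectors. It is an illustration only: the abstract does not give the authors' exact definition, discretization, or normalization, so the function name `mutual_information` and the assumption that fingerprints have already been binned into discrete symbols are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Shannon mutual information (in bits) between two discrete sequences.

    Assumption: x and y are equal-length sequences of discrete symbols,
    e.g. hybridization signals that were binned beforehand. This is the
    textbook definition, not necessarily the variant used in the paper.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n == 0:
        raise ValueError("x and y must not be empty")

    count_x = Counter(x)            # marginal counts of symbols in x
    count_y = Counter(y)            # marginal counts of symbols in y
    count_xy = Counter(zip(x, y))   # joint counts of symbol pairs

    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a, b) * log2( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[a] * count_y[b]))
    return mi
```

Under these assumptions the measure is symmetric in x and y, is zero when the two vectors are empirically independent, and reaches the entropy of x when y determines x exactly. The same function could be applied to two label vectors (computed cluster labels versus known gene labels) to score a clustering, although the abstract's validation measure is a modified version whose details are not given here.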


3 Corresponding author.


9:1093-1105 ©1999 by Cold Spring Harbor Laboratory Press  ISSN 1088-9051/99 $5.00






