Vol 13, Issue 4, 693-702, April 2003
METHODS
Informatics for Unveiling Hidden Genome Signatures
Takashi Abe (1,2,3), Shigehiko Kanaya (3,4,5), Makoto Kinouchi (3,5,6), Yuta Ichiba (1,3), Tokio Kozuki (2,3), and Toshimichi Ikemura (1,3,7)
1 Division of Evolutionary Genetics, Department of Population Genetics, National Institute of Genetics, The Graduate University for Advanced Studies, Mishima, Shizuoka-ken 411-8540, Japan
2 Xanagen Inc., Sakado, Takatsu-ku, Kawasaki, Kanagawa-ken 213-0012, Japan
3 ACT-JST (Applying Advanced Computational Science and Technology, Japan Science and Technology Corp.), Kawaguchi, Saitama-ken 332-0012, Japan
4 Department of Bioinformatics and Genomes, Graduate School of Information Science, Nara Institute of Science and Technology, Takayama, Ikoma, Nara-ken 630-0101, Japan
5 CREST JST (Core Research for Evolutional Science and Technology, Japan Science and Technology Corp.), Kawaguchi, Saitama-ken 332-0012, Japan
6 Department of Bio-System Engineering, Faculty of Engineering, Yamagata University, Yonezawa, Yamagata-ken 992-8510, Japan
With the growing number of available genome sequences, novel
tools are needed for comprehensive analysis of species-specific
sequence characteristics for a wide variety of genomes. We used an
unsupervised neural network algorithm, a self-organizing map (SOM), to
analyze di-, tri-, and tetranucleotide frequencies in a wide variety of
prokaryotic and eukaryotic genomes. The SOM, which can cluster complex
data efficiently, was shown to be an excellent tool for analyzing
global characteristics of genome sequences and for revealing key
combinations of oligonucleotides representing individual genomes. From
analysis of 1- and 10-kb genomic sequences derived from 65 bacteria (a
total of 170 Mb) and from 6 eukaryotes (460 Mb), clear species-specific
separations of major portions of the sequences were obtained with the
di-, tri-, and tetranucleotide SOMs. The unsupervised algorithm could
recognize, in most 10-kb sequences, the species-specific
characteristics (key combinations of oligonucleotide frequencies) that
are signature features of each genome. We were able to classify DNA
sequences within one and between many species into subgroups that
corresponded generally to biological categories. Because the
classification power is very high, the SOM is an efficient and
fundamental bioinformatic strategy for extracting a wide range of
genomic information from vast amounts of sequence data.
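
As an illustration of the approach summarized above, the following is a minimal sketch, not the authors' implementation. It cuts sequences into fixed-size windows, converts each window into a vector of oligonucleotide frequencies, and trains a small self-organizing map on those vectors. The function names, the grid size, the linearly decaying learning rate and neighborhood radius, and the random initialization from data samples are all assumptions chosen for brevity; the article may use a different SOM variant and different parameters. The toy sequences in the demo are randomly generated and carry no biological meaning.

```python
import itertools
import math
import random

def kmer_frequencies(seq, k):
    """Relative frequencies of all 4**k oligonucleotides of length k in seq."""
    kmers = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip words containing N or other symbols
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in kmers]

def windows(seq, size):
    """Non-overlapping windows of length `size`; a shorter tail is dropped."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]

def best_matching_unit(weights, vec):
    """Grid position (row, col) of the node closest to vec (squared Euclidean)."""
    best, best_d = None, math.inf
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(w, vec))
            if d < best_d:
                best, best_d = (r, c), d
    return best

def train_som(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Generic online Kohonen SOM (assumed variant, hypothetical defaults)."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Assumption: initialize each node from a randomly chosen input vector.
    weights = [[list(rng.choice(data)) for _ in range(cols)] for _ in range(rows)]
    sigma0 = max(rows, cols) / 2
    steps = epochs * len(data)
    t = 0
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)
        for vec in order:
            frac = t / steps
            lr = lr0 * (1 - frac)                  # learning rate decays to 0
            sigma = max(sigma0 * (1 - frac), 0.5)  # neighborhood radius shrinks
            br, bc = best_matching_unit(weights, vec)
            for r in range(rows):
                for c in range(cols):
                    dist2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-dist2 / (2 * sigma ** 2))
                    w = weights[r][c]
                    for j in range(dim):
                        w[j] += lr * h * (vec[j] - w[j])
            t += 1
    return weights

if __name__ == "__main__":
    rng = random.Random(1)
    # Two toy "genomes" with different base composition (synthetic data).
    at_rich = "".join(rng.choices("ACGT", weights=[4, 1, 1, 4], k=5000))
    gc_rich = "".join(rng.choices("ACGT", weights=[1, 4, 4, 1], k=5000))
    labeled = [("AT-rich", w) for w in windows(at_rich, 500)]
    labeled += [("GC-rich", w) for w in windows(gc_rich, 500)]
    vectors = [kmer_frequencies(w, 2) for _, w in labeled]
    som = train_som(vectors, rows=3, cols=3)
    for (label, _), vec in zip(labeled, vectors):
        print(label, best_matching_unit(som, vec))
```

Running the script prints the map node assigned to each toy window. Training never sees the labels; they are attached only afterward for inspection, which is what "unsupervised" means here. For real data, `k` would be 2, 3, or 4 and the window size 1 kb or 10 kb as in the abstract, and a pure-Python loop like this one would need to be replaced by a vectorized implementation.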
[Supplemental material is available online at
www.genome.org.]
7 Corresponding author.
E-MAIL tikemura{at}lab.nig.ac.jp; FAX 81-55-981-6794.
Article and publication are at
http://www.genome.org/cgi/doi/10.1101/gr.634603.
