Genome Res. 13:2178-2189, 2003
©2003 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/03 $5.00
Methods
OrthoMCL: Identification of Ortholog Groups for Eukaryotic Genomes
Li Li, Christian J. Stoeckert, Jr., and David S. Roos¹
Departments of Biology and Genetics, Center for Bioinformatics, and
Genomics Institute, University of Pennsylvania, Philadelphia, Pennsylvania
19104, USA
The identification of orthologous groups is useful for genome annotation,
studies on gene/protein evolution, comparative genomics, and the
identification of taxonomically restricted sequences. Methods successfully
exploited for prokaryotic genome analysis have proved difficult to apply to
eukaryotes, however, as larger genomes may contain multiple paralogous genes,
and sequence information is often incomplete. OrthoMCL provides a scalable
method for constructing orthologous groups across multiple eukaryotic taxa,
using a Markov Cluster algorithm to group (putative) orthologs and paralogs.
This method performs similarly to the INPARANOID algorithm when applied to two
genomes, but can be extended to cluster orthologs from multiple species.
OrthoMCL clusters are coherent with groups identified by EGO, but improved
recognition of "recent" paralogs permits overlapping EGO groups
representing the same gene to be merged. Comparison with previously assigned
EC annotations suggests a high degree of reliability, implying utility for
automated eukaryotic genome annotation. OrthoMCL has been applied to the
proteome data set from seven publicly available genomes (human, fly, worm,
yeast, Arabidopsis, the malaria parasite Plasmodium falciparum, and
Escherichia coli). A Web interface allows
queries based on individual genes or user-defined phylogenetic patterns
(http://www.cbil.upenn.edu/gene-family).
Analysis of clusters incorporating P. falciparum genes identifies
numerous enzymes that were incompletely annotated in first-pass annotation of
the parasite genome.
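
The Markov Cluster step mentioned above can be illustrated with a toy sketch. This is not the OrthoMCL pipeline itself: the function name `mcl`, the gene labels, the edge weights, and the inflation value are all hypothetical, and the weights simply stand in for whatever pairwise similarity scores the method derives. The sketch shows only the general idea of Markov clustering, which alternates expansion (squaring a column-stochastic matrix) with inflation (element-wise powering followed by renormalization) until the matrix stops changing.

```python
def mcl(nodes, edges, inflation=1.5, max_iter=100, tol=1e-9):
    """Toy Markov clustering on a small weighted, undirected graph.

    nodes: list of labels; edges: {(a, b): weight} with positive weights.
    Returns a sorted list of clusters, each a sorted list of labels.
    """
    n = len(nodes)
    index = {name: i for i, name in enumerate(nodes)}

    # Symmetric weight matrix with self-loops (weight 1.0, an assumption).
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = 1.0
    for (a, b), w in edges.items():
        i, j = index[a], index[b]
        m[i][j] = m[j][i] = w

    def normalize(mat):
        # Scale every column so that it sums to 1 (column-stochastic).
        for j in range(n):
            total = sum(mat[i][j] for i in range(n))
            for i in range(n):
                mat[i][j] /= total
        return mat

    m = normalize(m)
    for _ in range(max_iter):
        # Expansion: matrix squaring spreads flow along longer paths.
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n))
                     for j in range(n)] for i in range(n)]
        # Inflation: powering then renormalizing favors strong flows.
        inflated = normalize([[x ** inflation for x in row]
                              for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break

    # Each row that still carries flow lists the members of one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(nodes[j] for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Hypothetical example: two gene families joined by one weak similarity.
genes = ["hsa_A", "dme_A", "cel_A", "hsa_B", "dme_B"]
similarity = {
    ("hsa_A", "dme_A"): 1.0, ("hsa_A", "cel_A"): 1.0,
    ("dme_A", "cel_A"): 1.0, ("hsa_B", "dme_B"): 1.0,
    ("cel_A", "hsa_B"): 0.05,
}
print(mcl(genes, similarity, inflation=2.0))
```

A higher inflation value tends to produce smaller, tighter clusters, so the value chosen here is illustrative only.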
Article and publication are at
http://www.genome.org/cgi/doi/10.1101/gr.1224503.
¹ Corresponding author. E-MAIL droos@sas.upenn.edu; FAX (215) 746-6697.
