Genome Research

Vol. 11, Issue 10, 1632-1640, October 2001

LETTER
Annotation Transfer for Genomics: Measuring Functional Divergence in Multi-Domain Proteins

Hedi Hegyi and Mark Gerstein1

Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA

Annotation transfer is a principal process in genome annotation. It involves "transferring" structural and functional annotation to uncharacterized open reading frames (ORFs) in a newly completed genome from experimentally characterized proteins similar in sequence. To prevent errors in genome annotation, it is important that this process be robust and statistically well-characterized, especially with regard to how it depends on the degree of sequence similarity. Previously, we and others have analyzed annotation transfer in single-domain proteins. Multi-domain proteins, which make up the bulk of the ORFs in eukaryotic genomes, present more complex issues in functional conservation. Here we present a large-scale survey of annotation transfer in these proteins, using scop superfamilies to define domain folds and a thesaurus based on SWISS-PROT keywords to define functional categories. Our survey reveals that multi-domain proteins have significantly less functional conservation than single-domain ones, except when they share the exact same combination of domain folds. In particular, we find that for multi-domain proteins, approximate function can be accurately transferred with only 35% certainty for pairs of proteins sharing one structural superfamily. In contrast, this value is 67% for pairs of single-domain proteins sharing the same structural superfamily. On the other hand, if two multi-domain proteins contain the same combination of two structural superfamilies, the probability of their sharing the same function increases to 80%; in the case of complete coverage along the full length of both proteins, this value increases further to >90%. Moreover, we found that only 70 of the current total of 455 structural superfamilies are found in both single- and multi-domain proteins, and only 14 of these were associated with the same function in both categories of proteins.
We also investigated the degree to which function could be transferred between pairs of multi-domain proteins with respect to the degree of sequence similarity between them, finding that functional divergence at a given amount of sequence similarity is always about two-fold greater for pairs of multi-domain proteins (sharing similarity over a single domain) in comparison to pairs of single-domain ones, though the overall shape of the relationship is quite similar. Further information is available at http://partslist.org/func or http://bioinfo.mbb.yale.edu/partslist/func.
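
The percentages quoted above are, in essence, fractions of protein pairs that share a functional category among all pairs meeting a given structural criterion. The following is only a minimal sketch of that bookkeeping, not the procedure used in the survey: the records, the superfamily labels (sfA, sfB, sfC), the function names, and the three pair filters are all hypothetical, and each protein is reduced to an unordered set of superfamilies, ignoring domain order, repeats, and sequence coverage.

```python
from itertools import combinations

# Hypothetical toy records: (protein id, set of domain superfamilies, functional category).
# None of these values come from the paper; they only illustrate the bookkeeping.
PROTEINS = [
    ("p1", frozenset({"sfA"}), "kinase"),
    ("p2", frozenset({"sfA"}), "kinase"),
    ("p3", frozenset({"sfA"}), "transport"),
    ("p4", frozenset({"sfA", "sfB"}), "kinase"),
    ("p5", frozenset({"sfA", "sfB"}), "kinase"),
    ("p6", frozenset({"sfA", "sfC"}), "transport"),
]


def conservation(proteins, pair_filter):
    """Fraction of protein pairs accepted by pair_filter that share a function.

    Returns None when no pair passes the filter.
    """
    same = total = 0
    for (_, doms_a, func_a), (_, doms_b, func_b) in combinations(proteins, 2):
        if pair_filter(doms_a, doms_b):
            total += 1
            same += func_a == func_b
    return same / total if total else None


def single_domain_same_superfamily(a, b):
    # Both proteins consist of one domain from the same superfamily.
    return len(a) == 1 and a == b


def multi_domain_one_shared(a, b):
    # Assumption: "sharing one superfamily" is read as exactly one in common.
    return len(a) > 1 and len(b) > 1 and len(a & b) == 1


def multi_domain_same_combination(a, b):
    # Both proteins are multi-domain with an identical superfamily combination.
    return len(a) > 1 and a == b


if __name__ == "__main__":
    for name, flt in [
        ("single-domain, same superfamily", single_domain_same_superfamily),
        ("multi-domain, one shared superfamily", multi_domain_one_shared),
        ("multi-domain, same combination", multi_domain_same_combination),
    ]:
        print(name, conservation(PROTEINS, flt))
```

On real data, the same function would be applied to pairs drawn from an annotated protein set, optionally binned by sequence similarity to trace how conservation falls off as similarity decreases.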


1 Corresponding author.


11:1632-1640 ©2001 by Cold Spring Harbor Laboratory Press  ISSN 1088-9051/01 $5.00


