Vol. 12, Issue 11, 1625-1641, November 2002
Structural Characterization of the Human Proteome
Arne Müller,1 Robert M. MacCallum,1,4 and Michael J.E. Sternberg1,2,3
1 Biomolecular Modelling Laboratory, Cancer Research UK,
London, United Kingdom; 2 Department of Biological Sciences,
Structural Bioinformatics Group, Imperial College of Science,
Technology and Medicine, South Kensington, London, United Kingdom
This paper reports an analysis of the encoded proteins (the
proteome) of the genomes of human, fly, worm, yeast, and
representatives of bacteria and archaea in terms of the
three-dimensional structures of their globular domains together with a
general sequence-based study. We show that 39% of the human proteome
can be assigned to known structures. We estimate that for 77% of the
proteome, there is some functional annotation, but only 26% of the
proteome can be assigned to standard sequence motifs that characterize function. Of the human protein sequences, 13% are transmembrane proteins, but only 3% of the residues in the proteome form
membrane-spanning regions. There are substantial differences in the
composition of globular domains of transmembrane proteins between the
proteomes we have analyzed. Commonly occurring structural superfamilies are identified within the proteome. The frequencies of these
superfamilies enable us to estimate that 98% of the human proteome
evolved by domain duplication, with four of the 10 most duplicated
superfamilies specific for multicellular organisms. The zinc-finger
superfamily is massively duplicated in human compared to fly and worm,
and occurrence of domains in repeats is more common in metazoa than in
single cellular organisms. Structural superfamilies over- and underrepresented in human disease genes have been identified. Data and
results can be downloaded and analyzed via web-based applications at
http://www.sbg.bio.ic.ac.uk.
[Supplemental material is
available online at http://www.genome.org.]
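The abstract states that superfamily frequencies allow an estimate of the fraction of the proteome that evolved by domain duplication, but it does not give the calculation. A minimal sketch of one common way to make such an estimate follows, assuming that each superfamily descends from a single ancestral domain, so that a superfamily with n members contributes n - 1 duplication events. The function name, the input format, and the toy counts are hypothetical illustrations, not the authors' code or data.

```python
from collections import Counter


def duplication_fraction(domain_superfamilies):
    """Estimate the fraction of domains that arose by duplication.

    Assumption (not taken from the paper's text): every superfamily has
    exactly one ancestral domain, so all other members are duplicates.
    `domain_superfamilies` is a list with one superfamily label per
    assigned domain.
    """
    counts = Counter(domain_superfamilies)
    total_domains = sum(counts.values())
    if total_domains == 0:
        raise ValueError("no domain assignments given")
    # One founder per superfamily; the remaining domains are duplicates.
    duplicated = total_domains - len(counts)
    return duplicated / total_domains


if __name__ == "__main__":
    # Toy labels and counts, invented purely for illustration.
    toy_assignments = (
        ["zinc_finger"] * 5 + ["immunoglobulin"] * 3 + ["sh3"] * 1
    )
    print(f"{duplication_fraction(toy_assignments):.2f}")
```

Under this assumption the estimate approaches 100% whenever the number of assigned domains is much larger than the number of distinct superfamilies, which is why large, highly duplicated superfamilies dominate the result.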
3 Present address: Stockholm Bioinformatics Center,
Department of Biochemistry and Biophysics, Stockholm University, S-106
91 Stockholm, Sweden.
4 Corresponding author.
12:1625-1641 ©2002 by Cold Spring Harbor Laboratory Press ISSN 1088-9051/02 $5.00
