Genome Res. 14:343-353, 2004
©2004 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/04 $5.00
Research
Comparative Analysis of Protein Domain Organization
Program in Bioinformatics and Systems Biology, The Burnham Institute, La Jolla, California 92037, USA
ABSTRACT
We have developed a set of graph theory-based tools, which we call Comparative Analysis of Protein Domain Organization (CADO), to survey and compare protein domain organizations of different organisms. In the language of CADO, the organization of protein domains in a given organism is shown as a domain graph in which protein domains are represented as vertices, and domain combinations, defined as instances of two domains found in one protein, are represented as edges. CADO provides a new way to analyze and compare whole proteomes, including identifying the consensus and difference of domain organization between organisms. CADO was used to analyze and compare >50 bacterial, archaeal, and eukaryotic genomes. Examples and overviews presented here include the analysis of the modularity of domain graphs and the functional study of domains based on the graph topology. We also report on the results of comparing domain graphs of two organisms, Pyrococcus horikoshii (an extremophile) and Haemophilus influenzae (a parasite with reduced genome) with other organisms. Our comparison provides new insights into the genome organization of these organisms. Finally, we report on the specific domain combinations characterizing the three kingdoms of life, and the kingdom "signature" domain organizations derived from those specific domain combinations.
With complete genomes of >100 organisms already known and hundreds of genomes in the final stages of assembly, there is less and less excitement associated with the completion of yet another genome. Genomic projects, to some extent, are victims of their own success: the pace of sequencing is outstripping our ability to analyze and comprehend all the new information. We lack the right tools, and perhaps even the right paradigm, to fully understand the wealth of information contained in even the smallest genome. Most genome analyses do not go much beyond presenting simple statistics, overviews of existing pathways, and perhaps some examples of novel or conspicuously missing elements (Frishman et al. 2003
Domain fusion/shuffling is one of the most important events in the evolution of modern proteins (Patthy 1999
Several applications of domain combination analysis, developed in the past few years, followed the realization that if two domains can be found in one protein their functions must somehow be related. For example, Bork et al. investigated the co-occurrence of domain families in eukaryotic proteins to predict protein cellular localization (Mott et al. 2002
Graph theory-based methods have been developed to study the global properties of domain graphs (Wuchty 2001
In the work presented here, we do not insist on any single interpretation of domain fusion. We believe that whatever the reasons are for two or more domains being fused into one protein, analysis of such fusions in the global picture of a genome domain graph may provide new insights into the function of specific domains, and into comparisons between organisms. At the same time, we do not limit the analysis to the global properties of domain graphs. Instead, we focus on detailed structures of domain graphs, the modularity, connectivity, and internal structure of the domain graph, which are applied to the functional study of protein domains. It should also be noted that the term "domain graph" in this paper describes the domain organization of proteins; this term is also used in the literature to denote a different type of relationship between proteins based on the structural similarity of domains (Dokholyan et al. 2002
The set of tools developed here, Comparative Analysis of Protein Domain Organization (CADO), has three major functions: (1) Provide a global view of domain organization in an entire genome. (2) Discover clusters of domains by domain clustering. (3) Compare domain graphs between genomes. These tools have been applied to survey and compare the domain graphs of 53 organisms. Among the questions we studied are the modularity of domain graphs and the functional homogeneity of the domains in a cluster, and the commonalities and differences of various organisms and kingdoms in terms of domain organization.
RESULTS
Domain Graphs in a Single Genome
The number of domains, the number of domain combinations, and the size of the giant component (measured as the number of domains it contains) of each organism (Supplemental Table) increase with the complexity of the organisms, but very slowly compared with the increase in the number of predicted open reading frames (ORFs) in each genome (Fig. 1). As noted many times before, all of the multicellular eukaryotes have many more domain combinations than prokaryotes or unicellular eukaryotes (represented in our study only by yeast), even though the numbers of domains present in their genomes are not significantly different. This observation corresponds to a well-known characteristic of eukaryotic proteins, which tend to be longer and contain more domains than archaeal or bacterial proteins. The rapid increase of the number of ORFs in these genomes may be the result of genome and gene duplications that are important for the evolution of complexity (Holland 1999
Modularity of Domain Graphs and Functional Homogeneity of Domain Clusters
Domain graphs have higher modularity (see Methods) than random scale-free graphs, with an average clustering coefficient of 0.45 for the giant components and 0.14 for the overall domain graphs. This implies that some groups of domains form almost independent networks, and they connect weakly to the rest of the domain graph. Based on this observation, domain graphs can be further dissected into clusters of domains by clustering domains according to their topological overlap (see Methods).
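The two quantities used above can be illustrated with a short Python sketch. It is not the CADO implementation: it assumes the standard local clustering coefficient and the topological overlap as defined by Ravasz et al. (2002), which may differ in detail from the definitions given in Methods. The function names and the toy graph are hypothetical.

```python
def clustering_coefficient(graph, node):
    """Fraction of possible links among the neighbours of `node` that exist."""
    nbrs = graph[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    # Each neighbour-neighbour link is seen twice, hence the division by 2.
    links = sum(1 for a in nbrs for b in graph[a] if b in nbrs) / 2
    return 2.0 * links / (k * (k - 1))


def average_clustering(graph):
    """Mean of the local clustering coefficients over all vertices."""
    return sum(clustering_coefficient(graph, n) for n in graph) / len(graph)


def topological_overlap(graph, i, j):
    """Assumed form: (shared neighbours, +1 if i and j are linked) / min degree."""
    shared = len(graph[i] & graph[j]) + (1 if j in graph[i] else 0)
    smallest_degree = min(len(graph[i]), len(graph[j]))
    return shared / smallest_degree if smallest_degree else 0.0


# Toy adjacency map (made-up vertices): a triangle A-B-C with a pendant D on C.
toy_graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
avg_cc = average_clustering(toy_graph)
overlap_ab = topological_overlap(toy_graph, "A", "B")
```

A hierarchical clustering run on the pairwise overlap values would then yield clusters of the kind discussed in this section; that step is omitted here.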
In biological networks, clusters in a connected graph are often used to infer the relationship between its elements. For example, it has been shown that clusters of genes based on the genomic association have a homogeneous functional composition (Snel et al. 2002
Our results (see Fig. 2 and the following discussion) confirm that (1) the domains that are clustered together in the domain graph have similar functions; and (2) the clusters have higher functional homogeneity when the domain graphs are dissected into smaller clusters. In our study, the functional distance between domains was defined according to the Gene Ontology (GO) functional category (Ashburner et al. 2000
To show the functional homogeneity of domain clusters, we use S. cerevisiae as an example. In its domain graph, the FHI of all domain pairs is 4.7, the FHI of connected domain pairs is 4.0, and the FHI of domain combinations is 1.7, reflecting the fact that directly connected domains have more similar functions than indirectly connected or disconnected domains. The entire domain graph was divided into clusters at different levels. For instance, with a cluster size of 5, the domain graph was divided into clusters each having no more than five domains. The functional homogeneity test reveals that the functional homogeneity of the domain clusters is correlated with the cluster size: smaller clusters are functionally more homogeneous. Small domain clusters also show stronger functional homogeneity than the domain combinations between clusters (i.e., combinations of domains in two different clusters; Fig. 2). For instance, when the cluster size is 5, the FHI of the domain clusters is 1.1, even smaller than the FHI of all domain combinations (1.7). This result can be explained by the large difference between the functional homogeneity of domain combinations within clusters (FHI 0.70) and that of domain combinations between clusters (FHI 2.54). It illustrates the advantage of using clusters over single domain combinations to study the function of domains, because some domain combinations, namely those between clusters, do not necessarily imply similar functions (Fig. 2). The significance of functional homogeneity of domain clusters was calculated by a permutation test. The FHIs of domain clusters in the real S. cerevisiae domain graph at all levels (Fig. 2) are all significantly lower than those of clusters in a random domain graph with the same graph topology but rearranged domains.
For instance, for cluster size 20, the FHI of the domain clusters in this domain graph is 2.6 with a P-value of 1.8e-27 (using 10,000 simulations). The same calculation was run for all the domain graphs studied here, in all cases showing significant functional homogeneity of domain clusters (see Supplemental Table B).
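The permutation test described above can be sketched in Python as follows. This is an illustration under explicit assumptions, not the procedure used to obtain the reported values: the FHI is taken here to be the mean pairwise functional distance within clusters (so that lower means more homogeneous), `distance` is a hypothetical stand-in for the GO-based functional distance, and the random graphs are modelled by shuffling domain labels over a fixed cluster partition. The z-score is included because an empirical P-value from 10,000 shuffles cannot fall below about 1e-4; a far smaller value presumably comes from a fitted null distribution, which is also an assumption.

```python
import random
from itertools import combinations
from statistics import mean, pstdev


def cluster_fhi(clusters, distance):
    """Assumed FHI: mean functional distance over all within-cluster pairs."""
    pairs = [distance(a, b)
             for members in clusters
             for a, b in combinations(sorted(members), 2)]
    return mean(pairs)


def permutation_test(clusters, distance, n_sim=1000, seed=0):
    """Shuffle domain labels over the fixed cluster sizes and compare FHIs."""
    rng = random.Random(seed)
    observed = cluster_fhi(clusters, distance)
    domains = [d for members in clusters for d in sorted(members)]
    sizes = [len(members) for members in clusters]
    null = []
    for _ in range(n_sim):
        shuffled = domains[:]
        rng.shuffle(shuffled)
        it = iter(shuffled)
        permuted = [[next(it) for _ in range(size)] for size in sizes]
        null.append(cluster_fhi(permuted, distance))
    # Empirical one-sided P-value: how often a random FHI is as low as observed.
    p_empirical = (1 + sum(x <= observed for x in null)) / (n_sim + 1)
    spread = pstdev(null)
    z_score = (observed - mean(null)) / spread if spread else float("nan")
    return observed, p_empirical, z_score


# Toy data: made-up domains with made-up functional categories (integers).
category = {"a1": 0, "a2": 0, "a3": 0, "b1": 5, "b2": 5, "b3": 5}


def toy_distance(x, y):
    return abs(category[x] - category[y])


toy_clusters = [{"a1", "a2", "a3"}, {"b1", "b2", "b3"}]
observed_fhi, p_value, z = permutation_test(toy_clusters, toy_distance)
```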
Application of Modularity in Functional Annotation
Domain combinations that connect two domain clusters are also important, providing unique coupling between otherwise independent processes. In particular, we concentrated our attention on unique domain combinations providing the only connection between two clusters of domains, and we call these combinations bridges. For instance, the domain combination of ABC_tran and Acetyltransf in P. horikoshii described above (Figs. 3 and 4) is a bridge in this sense. We can expect that mutations or deletions in bridge domains would decouple the two networks represented by the clusters and result in significant phenotypes. We found 151 domain combinations to be bridges in several genomes that connect two clusters each having at least three domains (Supplemental Table C).
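The bridge definition above translates directly into a small search over the edges of a clustered graph. The Python sketch below assumes that cluster assignments are already available as a mapping from domain to cluster label; the function name, the labels, and the toy data are hypothetical and do not come from CADO.

```python
from collections import defaultdict


def find_bridges(edges, cluster_of, min_cluster_size=3):
    """Return combinations that are the only link between two large clusters."""
    sizes = defaultdict(int)
    for label in cluster_of.values():
        sizes[label] += 1
    # Group the between-cluster combinations by the pair of clusters they join.
    between = defaultdict(set)
    for a, b in edges:
        ca, cb = cluster_of[a], cluster_of[b]
        if ca != cb:
            between[frozenset((ca, cb))].add(frozenset((a, b)))
    bridges = []
    for cluster_pair, links in between.items():
        big_enough = all(sizes[c] >= min_cluster_size for c in cluster_pair)
        if len(links) == 1 and big_enough:
            bridges.append(tuple(sorted(next(iter(links)))))
    return sorted(bridges)


# Toy example with made-up domains A..K in four clusters.
toy_clusters = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2, "F": 2,
                "G": 3, "H": 3, "I": 4, "J": 4, "K": 4}
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"),   # inside cluster 1
             ("D", "E"), ("E", "F"),               # inside cluster 2
             ("G", "H"),                           # inside cluster 3 (too small)
             ("I", "J"), ("J", "K"),               # inside cluster 4
             ("C", "D"),                           # only link between 1 and 2
             ("F", "G"),                           # cluster 3 has two domains
             ("A", "I"), ("B", "J")]               # two links between 1 and 4
toy_bridges = find_bridges(toy_edges, toy_clusters)
```

Only the C-D combination qualifies in this toy graph: the link to cluster 3 is discarded because that cluster has fewer than three domains, and clusters 1 and 4 are joined by two combinations.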
Comparing Domain Organizations of Various Organisms
In the second example, the domain graph of Haemophilus influenzae was compared with that of Escherichia coli K12. Both organisms belong to
Phylogenetic Profiling of Domains and Domain Combinations
Phylogenetic profiling is a simple yet helpful tool for the functional study of domains and their combinations. Similar to tools used by other groups (Pellegrini et al. 1999
Comparing Domain Organizations of the Kingdoms
Domain graphs of all the genomes studied here were compared with each other to extract common and specific domains and domain combinations of each of the kingdoms (Bacteria, Archaea, and Eukaryota; see Methods). Overall, many more specific domain combinations were found in eukaryotic genomes (280 in total) than in bacterial (40) and archaeal (7) genomes. The common and specific domain combinations were further mapped onto a "combined" domain graph, composed of all domains and combinations from all the genomes, to give an overview of the distribution of specific domains and domain combinations (Fig. 6). The analysis of this domain graph shows that both common and specific combinations are clustered into components, making it possible to derive kingdom "signature" domain organizations from those components. Selected examples are shown below, and a complete list of common and specific combinations is given in Supplemental Table D.
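A kingdom-specific comparison of this kind can be sketched with plain set operations in Python. The exact criterion is given in Methods; the sketch below assumes that a combination is "specific" to a kingdom if it occurs in at least a fraction `min_fraction` of that kingdom's genomes and in no genome of the other kingdoms. The function name, the threshold, the genome labels, and the toy combinations are all hypothetical.

```python
def specific_combinations(genome_combos, kingdom_of, kingdom, min_fraction=0.5):
    """Combinations seen in >= min_fraction of `kingdom` genomes and nowhere else."""
    inside = [g for g in genome_combos if kingdom_of[g] == kingdom]
    outside = [g for g in genome_combos if kingdom_of[g] != kingdom]
    seen_outside = set().union(*(genome_combos[g] for g in outside))
    candidates = set().union(*(genome_combos[g] for g in inside)) - seen_outside
    needed = min_fraction * len(inside)
    return {c for c in candidates
            if sum(c in genome_combos[g] for g in inside) >= needed}


def combo(a, b):
    """An unordered domain combination."""
    return frozenset((a, b))


# Toy data: four made-up genomes with illustrative combinations.
toy_combos = {
    "euk_A": {combo("PH", "pkinase"), combo("GTP_EFTU", "GTP_EFTU_D2")},
    "euk_B": {combo("PH", "pkinase"), combo("ank", "pkinase"),
              combo("GTP_EFTU", "GTP_EFTU_D2")},
    "bac_A": {combo("GTP_EFTU", "GTP_EFTU_D2"), combo("PCRF", "RF-1")},
    "arc_A": {combo("GTP_EFTU", "GTP_EFTU_D2")},
}
toy_kingdoms = {"euk_A": "Eukaryota", "euk_B": "Eukaryota",
                "bac_A": "Bacteria", "arc_A": "Archaea"}

euk_specific = specific_combinations(toy_combos, toy_kingdoms, "Eukaryota")
euk_strict = specific_combinations(toy_combos, toy_kingdoms, "Eukaryota",
                                   min_fraction=1.0)
```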
Common Domain Organization in All Genomes
Only 13 domain combinations were detected in all the genomes, but this number increased to 50 when we adopted a weaker definition (i.e., combinations found in at least 80% of the organisms). This is a surprisingly small number compared with the size of the entire domain graph (5236 combinations in total). Most of those domain combinations are found in fundamental proteins; some may also be artifacts of domain definitions, where Pfam domain definitions do not correspond to structurally independent modules. One cluster of common combinations contains domains EFG_C, EFG_IV, GTP_EFTU, GTP_EFTU_D2, and GTP_EFTU_D3, which represent various elongation factors involved in protein translation. Another cluster of common combinations contains domains RNA_pol_Rpb1_1, RNA_pol_Rpb1_2, RNA_pol_Rpb1_3, RNA_pol_Rpb1_4, RNA_pol_Rpb1_5, RNA_pol_Rpb2_6, and RNA_pol_Rpb2_7, which are found in RNA polymerase Rpb1 and RNA polymerase Rpb2.
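The strict and the weaker (80%) definitions of a common combination differ only in a threshold, as the following Python sketch shows. The function name and the toy genomes are hypothetical; the only element taken from the text is the idea of counting the fraction of organisms in which a combination occurs.

```python
from collections import Counter


def common_combinations(genome_combos, min_fraction=1.0):
    """Combinations present in at least `min_fraction` of the genomes."""
    counts = Counter(c for combos in genome_combos.values() for c in set(combos))
    needed = min_fraction * len(genome_combos)
    # Small tolerance guards against floating-point rounding of the threshold.
    return {c for c, n in counts.items() if n >= needed - 1e-9}


# Toy data: five made-up genomes; X occurs in all, Y in four, Z in two.
X = frozenset(("GTP_EFTU", "GTP_EFTU_D2"))
Y = frozenset(("EFG_C", "EFG_IV"))
Z = frozenset(("PH", "pkinase"))
toy_genomes = {
    "g1": {X, Y, Z},
    "g2": {X, Y, Z},
    "g3": {X, Y},
    "g4": {X, Y},
    "g5": {X},
}
strict_common = common_combinations(toy_genomes)          # every genome
relaxed_common = common_combinations(toy_genomes, 0.8)    # at least 80%
```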
Eukaryotic "Signature" Domain Organizations
One cluster (middle in Fig. 7) is related to ubiquitination (Hershko et al. 2000
The domains in the second cluster are mostly related to DNA-binding activity and RNA-binding activity, both of which are involved in various functions including transcriptional regulation, alternative splicing, DNA repair, and the like (bottom in Fig. 7). This cluster is a network of zinc fingers (Evans and Hollenberg 1988
We grouped the remaining domains in the third cluster (top in Fig. 7). Three highly connected domains, PH, pkinase, and ank, determine the topology of this cluster. This cluster might be involved in several different functions because the ank domain occurs in many functionally diverse proteins mainly from eukaryotes, and the PH domain occurs in a wide range of proteins involved in intracellular signaling or as constituents of the cytoskeleton.
The domain zf-C3HC4, a RING-finger domain of 40 to 60 residues (Borden and Freemont 1996
Bacterial Signature Domain Organizations
DISCUSSION
In this study, we have described a set of graph theory tools that can be used for characterizing domain graphs as well as comparing them among different organisms. By studying the topology of domain graphs, we can derive important clues to the functional roles of domains and their combinations. Graph analysis can identify domain organization features, such as clusters or bridges, which are not available from a standard analysis based on a list of domain combinations. One example, showing the central role of the zf-C3HC4 domain, was discussed here in detail; many more examples can be found at the CADO Web site. By comparing domain graphs of several genomes using CADO, we identified not only universal but also kingdom-specific combinations. These combinations provide important clues to domain functions and relations between organisms. Certainly, the specificity study of domain organizations is not limited to the three kingdoms. We can compare any arbitrary set of organisms with others or even search for groups of organisms that share some specific domain combinations. We also identified the domains and combinations specific for only two kingdoms, or, in other words, specifically missing in the third. For example, domains PCRF and RF-1 (found in peptide chain release factors) are specific for bacterial and eukaryotic but not archaeal genomes, whereas domains eRF1_1, eRF1_2, and eRF1_3 (found in release factor eRF1) are specific for archaeal and eukaryotic but not bacterial genomes, although they are all involved in the ubiquitous process of protein biosynthesis. Such specific domains and combinations might provide additional clues to the study of the relationships between specific pathways in each of the three kingdoms.
Other useful tests might be to compare the hyperthermophilic organisms with the other organisms to identify their specific domain organizations, or to compare the pathogenic organisms with nonpathogenic ones to extract domain organizations possibly related to virulence. As discussed above, domain graphs analyze one specific type of functional coupling of domains: their fusion into one protein. We can easily imagine that other mechanisms, such as coregulation or cocompartmentalization, may play a similar functional role, but would be missed in the domain graph language. Yet the fact that some organisms chose this particular way of coupling the two domains tells us about the functional relation between the domains and about the similarity between regulatory mechanisms for a given process in two (or more) organisms. CADO, applied here to analyze domain graphs, provides a general framework for dissecting as well as comparing large networks. It can also be applied to biological networks other than domain graphs, or even to nonbiological networks.
The real challenge, however, of describing the functional relations between proteins in a genome comes from its multidimensional nature: one can imagine that several other types of relations between proteins can also be defined and described in a similar language. Although there are some pioneering works, such as the inference of a complete protein-protein interaction network of organisms by combining both experimental and computational results (Jeong et al. 2001
METHODS
Domain Graph
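As defined in the abstract, a domain graph takes protein domains as vertices and places an edge between two domains whenever they occur in the same protein. The following Python sketch builds such a graph from per-protein domain assignments and extracts its giant component. It is only an illustration, not the CADO code: the input format, the function names, and the toy annotations are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def build_domain_graph(protein_domains):
    """Return an adjacency map: domain -> set of domains it co-occurs with."""
    graph = defaultdict(set)
    for domains in protein_domains.values():
        unique = sorted(set(domains))
        for d in unique:
            graph[d]  # register the vertex even if it has no partner
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def connected_components(graph):
    """Return the connected components as a list of sets, largest first."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Toy, made-up annotations (protein -> Pfam-style domain names).
toy_proteins = {
    "p1": ["PH", "pkinase"],
    "p2": ["pkinase", "ank"],
    "p3": ["zf-C3HC4"],
    "p4": ["GTP_EFTU", "GTP_EFTU_D2", "GTP_EFTU_D3"],
}
toy_graph = build_domain_graph(toy_proteins)
n_domains = len(toy_graph)
n_combinations = sum(len(nbrs) for nbrs in toy_graph.values()) // 2
giant_component = connected_components(toy_graph)[0]
```

The three counts computed at the end correspond to the per-genome statistics reported in Results: number of domains, number of domain combinations, and size of the giant component.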
A graph has a "scale-free" topology if its degree distribution decays as a power law, $P(k) \sim k^{-\gamma}$ (Jeong et al. 2000
Domain Graph Dissection
Comparative Analysis of Domain Organization (CADO)
Domain Assignment
Functional Similarity Between Domains
Genome Data
Acknowledgements
We thank Slawek K. Grzechnik for his assistance in computations of domain assignments. We thank Bruce Worcester for help in editing. We appreciate all helpful comments from the anonymous reviewers, especially on the statistical test of the functional homogeneity of domain clusters, and the explanation of the connection between the ubiquitination system and gene expression. The research described in this manuscript was supported by NIH grant GM60049. The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
Footnotes
E-MAIL yye{at}burnham.org; FAX (858) 646-3171.
E-MAIL adam{at}burnham.org; FAX (858) 646-3171.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.1610504.
[Supplemental material is available online at www.genome.org and http://ffas.ljcrf.edu/DomainGraph.]
REFERENCES
Aasland, R., Gibson, T.J., and Stewart, A.F. 1995. The PHD finger: Implications for chromatin-mediated transcriptional regulation. Trends Biochem. Sci. 20: 56-59.
Anantharaman, V., Koonin, E.V., and Aravind, L. 2001. TRAM, a predicted RNA-binding domain, common to tRNA uracil methylation and adenine thiolation enzymes. FEMS Microbiol. Lett. 197: 215-221.
Apic, G., Gough, J., and Teichmann, S.A. 2001. Domain combinations in archaeal, eubacterial and eukaryotic proteomes. J. Mol. Biol. 310: 311-325.
Aravind, L. and Koonin, E.V. 1998. The HD domain defines a new superfamily of metal-dependent phosphohydrolases. Trends Biochem. Sci. 23: 469-472.
Aravind, L. and Koonin, E.V. 2000. The U box is a modified RING finger - A common domain in ubiquitination. Curr. Biol. 10: R132-R134.
Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., et al. 2000. Gene ontology: Tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25: 25-29.
Bashton, M. and Chothia, C. 2002. The geometry of domain combination in proteins. J. Mol. Biol. 315: 927-939.
Bateman, A., Birney, E., Cerruti, L., Durbin, R., Etwiller, L., Eddy, S.R., Griffiths-Jones, S., Howe, K.L., Marshall, M., and Sonnhammer, E.L. 2002. The Pfam protein families database. Nucleic Acids Res. 30: 276-280.
Birney, E., Kumar, S., and Krainer, A.R. 1993. Analysis of the RNA-recognition motif and RS and RGG domains: Conservation in metazoan pre-mRNA splicing factors. Nucleic Acids Res. 21: 5803-5816.
Borden, K.L. and Freemont, P.S. 1996. The RING finger domain: A recent example of a sequence-structure family. Curr. Opin. Struct. Biol. 6: 395-401.
Bork, P., Hofmann, K., Bucher, P., Neuwald, A.F., Altschul, S.F., and Koonin, E.V. 1997. A superfamily of conserved domains in DNA damage-responsive cell cycle checkpoint proteins. FASEB J. 11: 68-76.
Carballo, E., Lai, W.S., and Blackshear, P.J. 1998. Feedback inhibition of macrophage tumor necrosis factor-
Dokholyan, N.V., Shakhnovich, B., and Shakhnovich, E.I. 2002. Expanding protein universe and its origin from the biological Big Bang. Proc. Natl. Acad. Sci. 99: 14132-14136.
Enright, A.J. and Ouzounis, C.A. 2001. Functional associations of proteins in entire genomes by means of exhaustive detection of gene fusions. Genome Biol. 2: RESEARCH0034.
Enright, A.J., Iliopoulos, I., Kyrpides, N.C., and Ouzounis, C.A. 1999. Protein interaction maps for complete genomes based on gene fusion events. Nature 402: 86-90.
Evans, R.M. and Hollenberg, S.M. 1988. Zinc fingers: Gilt by association. Cell 52: 1-3.
Frishman, D., Mokrejs, M., Kosykh, D., Kastenmuller, G., Kolesov, G., Zubrzycki, I., Gruber, C., Geier, B., Kaps, A., Albermann, K., et al. 2003. The PEDANT genome database. Nucleic Acids Res. 31: 207-211.
Galperin, M.Y. and Koonin, E.V. 2000. Who's your neighbor? New computational approaches for functional genomics. Nat. Biotechnol. 18: 609-613.
George, D.G., Hunt, L.T., Yeh, L.S., and Barker, W.C. 1985. New perspectives on bacterial ferredoxin evolution. J. Mol. Evol. 22: 20-31.
Guelzim, N., Bottani, S., Bourgine, P., and Kepes, F. 2002. Topological and causal structure of the yeast transcriptional regulatory network. Nat. Genet. 31: 60-63.
Hershko, A., Ciechanover, A., and Varshavsky, A. 2000. The ubiquitin system. Nat. Med. 6: 1073-1081.
Hoch, J.A. 2000. Two-component and phosphorelay signal transduction. Curr. Opin. Microbiol. 3: 165-170.
Holland, P.W.H. 1999. Gene duplication: Past, present and future. Cell Dev. Biol. 10: 541-547.
Holm, L. and Sander, C. 1998. Dictionary of recurrent domains in protein structures. Proteins 33: 88-96.
Hou, J., Sims, G.E., Zhang, C., and Kim, S.H. 2003. A global representation of the protein fold space. Proc. Natl. Acad. Sci. 100: 2386-2390.
Jeong, H., Tombor, B., Albert, R., Oltvai, Z.N., and Barabasi, A.L. 2000. The large-scale organization of metabolic networks. Nature 407: 651-654.
Jeong, H., Mason, S.P., Barabasi, A.L., and Oltvai, Z.N. 2001. Lethality and centrality in protein networks. Nature 411: 41-42.
Joazeiro, C.A., Wing, S.S., Huang, H., Leverson, J.D., Hunter, T., and Liu, Y.C. 1999. The tyrosine kinase negative regulator c-Cbl as a RING-type, E2-dependent ubiquitin-protein ligase. Science 286: 309-312.
Jones, D., Crowe, E., Stevens, T.A., and Candido, E.P. 2002. Functional and phylogenetic analysis of the ubiquitylation system in Caenorhabditis elegans: Ubiquitin-conjugating enzymes, ubiquitin-activating enzymes, and ubiquitin-like proteins. Genome Biol. 3: RESEARCH0002.
Kanno, M., Hasegawa, M., Ishida, A., Isono, K., and Taniguchi, M. 1995. mel-18, a Polycomb group-related mammalian gene, encodes a transcriptional negative regulator with tumor suppressive activity. EMBO J. 14: 5672-5678.
Katz, R.A. and Jentoft, J.E. 1989. What is the role of the cys-his motif in retroviral nucleocapsid (NC) proteins? Bioessays 11: 176-181.
Kelley, B.P., Sharan, R., Karp, R.M., Sittler, T., Root, D.E., Stockwell, B.R., and Ideker, T. 2003. Conserved pathways within bacteria and yeast as revealed by global protein network alignment. Proc. Natl. Acad. Sci. 100: 11394-11399.
Kriventseva, E.V., Koch, I., Apweiler, R., Vingron, M., Bork, P., Gelfand, M.S., and Sunyaev, S. 2003. Increase of functional diversity by alternative splicing. TIG 19: 124-128.
Lovering, R., Hanson, I.M., Borden, K.L., Martin, S., O'Reilly, N.J., Evan, G.I., Rahman, D., Pappin, D.J., Trowsdale, J., and Freemont, P.S. 1993. Identification and preliminary characterization of a protein motif related to the zinc finger. Proc. Natl. Acad. Sci. 90: 2112-2116.
Mande, S.S., Sarfaty, S., Allen, M.D., Perham, R.N., and Hol, W.G. 1996. Protein-protein interactions in the pyruvate dehydrogenase multienzyme complex: Dihydrolipoamide dehydrogenase complexed with the binding domain of dihydrolipoamide acetyltransferase. Structure 4: 277-286.
Marchler-Bauer, A., Anderson, J.B., DeWeese-Scott, C., Fedorova, N.D., Geer, L.Y., He, S., Hurwitz, D.I., Jackson, J.D., Jacobs, A.R., Lanczycki, C.J., et al. 2003. CDD: A curated Entrez database of conserved domain alignments. Nucleic Acids Res. 31: 383-387.
Marcotte, E.M., Pellegrini, M., Ng, H.L., Rice, D.W., Yeates, T.O., and Eisenberg, D. 1999a. Detecting protein function and protein-protein interactions from genome sequences. Science 285: 751-753.
Marcotte, E.M., Pellegrini, M., Thompson, M.J., Yeates, T.O., and Eisenberg, D. 1999b. Combined algorithm for genome-wide prediction of protein function. Nature 402: 83-86.
Minieka, E. 1978. Path algorithms. In Optimization algorithms for networks and graphs, pp. 41-84. Marcel Dekker, New York.
Mott, R., Schultz, J., Bork, P., and Ponting, C.P. 2002. Predicting protein cellular localization using a domain projection method. Genome Res. 12: 1168-1174.
Mulder, N.J., Apweiler, R., Attwood, T.K., Bairoch, A., Barrell, D., Bateman, A., Binns, D., Biswas, M., Bradley, P., Bork, P., et al. 2003. The InterPro database, 2003 brings increased coverage and new features. Nucleic Acids Res. 31: 315-318.
Muratani, M. and Tansey, W.P. 2003. How the ubiquitin-proteasome system controls transcription. Nat. Rev. Mol. Cell. Biol. 4: 192-201.
Murzin, A.G., Brenner, S.E., Hubbard, T., and Chothia, C. 1995. SCOP: A structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247: 536-540.
Musco, G., Stier, G., Joseph, C., Castiglione Morelli, M.A., Nilges, M., Gibson, T.J., and Pastore, A. 1996. Three-dimensional structure and stability of the KH domain: Molecular insights into the fragile X syndrome. Cell 85: 237-245.
Neuwald, A.F. and Landsman, D. 1997. GCN5-related histone N-acetyltransferases belong to a diverse superfamily that includes the yeast SPT10 protein. Trends Biochem. Sci. 22: 154-155.
Newman, M.E.J., Strogatz, S.H., and Watts, D.J. 2001. Random graphs with arbitrary degree distributions and their applications. Phys. Rev. 64: 026118.
Orengo, C.A., Michie, A.D., Jones, S., Jones, D.T., Swindells, M.B., and Thornton, J.M. 1997. CATH - A hierarchic classification of protein domain structures. Structure 5: 1093-1108.
Page, R.D.M. 1996. TREEVIEW: An application to display phylogenetic trees on personal computers. Computer Appl. Biosci. 12: 357-358.
Pao, G.M. and Saier, M.H.J. 1995. Response regulators of bacterial signal transduction systems: Selective domain shuffling during evolution. J. Mol. Evol. 40: 136-154.
Patthy, L. 1999. Protein evolution, pp. 142-183. Blackwell Science, Malden.
Pellegrini, M., Marcotte, E.M., Thompson, M.J., Eisenberg, D., and Yeates, T.O. 1999. Assigning protein functions by comparative genome analysis: Protein phylogenetic profiles. Proc. Natl. Acad. Sci. 96: 4285-4288.
Ponting, C.P., Blake, D.J., Davies, K.E., Kendrick-Jones, J., and Winder, S.J. 1996. ZZ and TAZ: New putative zinc fingers in dystrophin and other proteins. Trends Biochem. Sci. 21: 11-13.
Ravasz, E., Somera, A.L., Mongru, D.A., Oltvai, Z.N., and Barabasi, A.-L. 2002. Hierarchical organization of modularity in metabolic networks. Science 297: 1551-1555.
Reizer, J. and Saier, M.H.J. 1997. Modular multidomain phosphoryl transfer proteins of bacteria. Curr. Opin. Struct. Biol. 7: 407-415.
Rychlewski, L., Jaroszewski, L., Li, W., and Godzik, A. 2000. Comparison of sequence profiles. Strategies for structural predictions using sequence information. Protein Sci. 9: 232-241.
Saier, M.H.J. and Reizer, J. 1994. The bacterial phosphotransferase system: New frontiers 30 years later. Mol. Microbiol. 13: 755-764.
Saurin, A.J., Borden, K.L., Boddy, M.N., and Freemont, P.S. 1996. Does this have a familiar RING? Trends Biochem. Sci. 21: 208-214.
Schultz, J., Milpetz, F., Bork, P., and Ponting, C.P. 1998. SMART, a simple modular architecture research tool: Identification of signaling domains. Proc. Natl. Acad. Sci. 95: 5857-5864.
Servant, F., Bru, C., Carrere, S., Courcelle, E., Gouzy, J., Peyruc, D., and Kahn, D. 2002. ProDom: Automated clustering of homologous domains. Brief Bioinform. 3: 246-251.
Shakhnovich, B.E., Dokholyan, N.V., DeLisi, C., and Shakhnovich, E.I. 2003. Functional fingerprints of folds: Evidence for correlated structure-function evolution. J. Mol. Biol. 326: 1-9.
Shen-Orr, S.S., Milo, R., Mangan, S., and Alon, U. 2002. Network motifs in the transcriptional regulation network of Escherichia coli. Nat. Genet. 31: 64-68.
Snel, B., Bork, P., and Huynen, M.A. 2002. The identification of functional modules from the genomic association of genes. Proc. Natl. Acad. Sci. 99: 5890-5895.
Sofia, H.J., Chen, G., Hetzler, B.G., Reyes-Spindola, J.F., and Miller, N.E. 2001. Radical SAM, a novel protein superfamily linking unresolved steps in familiar biosynthetic pathways with radical mechanisms: Functional characterization using new analysis and information visualization methods. Nucleic Acids Res. 29: 1097-1106.
Sriskanda, V., Moyer, R.W., and Shuman, S. 2001. NAD+-dependent DNA ligase encoded by a eukaryotic virus. J. Biol. Chem. 276: 36100-36109.
Studholme, D.J. and Dixon, R. 2003. Domain architectures of
Tatusov, R.L., Mushegian, A.R., Bork, P., Brown, N.P., Hayes, W.S., Borodovsky, M., Rudd, K.E., and Koonin, E.V. 1996. Metabolism and evolution of Haemophilus influenzae deduced from a whole-genome comparison with Escherichia coli. Curr. Biol. 6: 279-291.
Wagner, A. and Fell, D.A. 2001. The small world inside large metabolic networks. Proc. R Soc. Lond. B Biol. Sci. 268: 1803-1810.
Wuchty, S. 2001. Scale-free behavior in protein domain networks. Mol. Biol. Evol. 18: 1694-1702.
Yaseen, N.R. and Blobel, G. 1999. Two distinct classes of Ran-binding sites on the nucleoporin Nup-358. Proc. Natl. Acad. Sci. 96: 5516-5521.
WEB SITE REFERENCES
ftp://ftp.ncbi.nih.gov/; NCBI GenBank.
http://ffas.ljcrf.edu/DomainGraph; CADO.
http://genome.jgi-psf.org/ciona4/ciona4.download.ftp.html; Ciona intestinalis.
http://genome.jgi-psf.org/fugu6/fugu6.download.ftp.html; Fugu rubripes sequence.
http://www.geneontology.org/; GO.
http://www.graphviz.org/; graphviz.
Received June 2, 2003; accepted in revised form December 10, 2003.