|
|
|
|
Published online before print
August 9, 2006, 10.1101/gr.5235706 Genome Res. 16:1169-1181, 2006 ©2006 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/06 $5.00 OPEN ACCESS ARTICLE
Methods Græmlin: General and robust alignment of multiple large interaction networks1Department of Computer Science, Stanford University, Stanford, California 94305, USA; 2Department of Developmental Biology, Stanford University, Stanford, California 94305, USA; 3Department of Electrical Engineering, Stanford University, Stanford, California 94305, USA
The recent proliferation of protein interaction networks has motivated research into network alignment: the cross-species comparison of conserved functional modules. Previous studies have laid the foundations for such comparisons and demonstrated their power on a select set of sparse interaction networks. Recently, however, new computational techniques have produced hundreds of predicted interaction networks with interconnection densities that push existing alignment algorithms to their limits. To find conserved functional modules in these new networks, we have developed Græmlin, the first algorithm capable of scalable multiple network alignment. Græmlin's explicit model of functional evolution allows both the generalization of existing alignment scoring schemes and the location of conserved network topologies other than protein complexes and metabolic pathways. To assess Græmlin's performance, we have developed the first quantitative benchmarks for network alignment, which allow comparisons of algorithms in terms of their ability to recapitulate the KEGG database of conserved functional modules. We find that Græmlin achieves substantial scalability gains over previous methods while improving sensitivity.
The publication of the second complete genome sequence in 1995 (Kaneko et al. 1995
Today, the most direct analog of the exponential growth in sequence data is the rise of large-scale protein interaction network data (Uetz et al. 2000
One promising way of answering such questions is through network alignment, a systems-biological analog of sequence alignment intended to identify conserved functional modules (Hartwell et al. 1999 In this paper we describe Græmlin, a novel network alignment framework that is fast, scalable, and capable of searching large sets of dense networks for conserved functional modules. Græmlin's probabilistic formulation of the topology-matching problem eliminates earlier restrictions on the possible architecture of conserved modules. Most important, Græmlin is the first program capable of multiple alignment of an arbitrary number of networks.
To assess Græmlin's ability to find conserved functional modules, we have performed the first quantitative comparison of network aligners. Using data sets containing known biological modules as a benchmark (Ashburner et al. 2000 Græmlin is available through a Web interface located at http://graemlin.stanford.edu, where users can search for conserved functional modules within a large database of microbial networks. Source code is also available under the GNU Public License.
Græmlin is a network alignment algorithm capable of searching large sets of dense interaction networks for evolutionarily conserved functional modules, which are groups of homologous proteins with conserved pairwise interactions. Græmlin supports both global and local search; it can be used either to generate an exhaustive list of conserved modules in a set of networks (network-to-network alignment) or to find matches to a particular module within a database of interaction networks (query-to-network alignment). Depending on the context of a study, one may wish to find functional modules that are present within all species or simply those that are enriched within a particular clade. Græmlin enables both kinds of comparative analysis, as it can rapidly search a large number (N > 3) of interaction networks to find functional modules that are significantly conserved in two or more species.
The efficient performance of Græmlin is due to the use of several strategies common in sequence alignment (Batzoglou 2005 Below we outline our definition of a network alignment, the scoring model used by Græmlin, and its algorithm for finding high-scoring pairwise and multiple alignments.
Definition of an alignment We represent each input network as a weighted graph G i = (V i ,E i), where nodes correspond to proteins and each weighted edge specifies the probability that two proteins interact. We define a network alignment as a set of subgraphs chosen from the interaction networks of different species, together with a mapping between corresponding, or aligned, proteins. To uniquely specify an alignment, we require that the mapping be transitive; that is, if protein A is aligned to proteins B and C, then protein B must also be aligned to protein C. Mathematically, this means that the mapping is an equivalence relation; consequently, the groups of aligned proteins are disjoint, and we refer to them as equivalence classes for this reason.
We also require that all aligned proteins be homologous. Therefore, proteins in the same equivalence class are in general members of the same protein family (Andreeva et al. 2004 This definition affords important advantages. Because the members of a protein family descend from a common ancestor, we can reconstruct the evolutionary events leading from each ancestral protein to its extant descendants. By combining this with a reconstruction of the evolutionary history of each pairwise interaction, we can interpret each network alignment as a hypothesis about the evolution of a conserved ancestral module. Intuitively, network alignments should receive high scores if their evolutionary dynamics resemble those of known, conserved functional modules rather than those of random collections of proteins. With this definition, there are two core problems in network alignment. First, we must devise a scoring framework that captures the knowledge we have about module evolution. Then, we must find a way to rapidly identify high-scoring alignments meaning conserved functional modulesfrom among the exponentially large set of possible alignments. We address each problem in turn.
Scoring of an alignment
Græmlin individually scores each equivalence class and each edge of an alignment. To score equivalence classes, it uses a straightforward scheme that reconstructs the most parsimonious ancestral history of an equivalence class, based on five types of evolutionary events: protein sequence mutations, protein insertions and deletions, protein duplications, and protein divergences; a protein divergence occurs when a paralogous protein loses its function and is the inverse of a duplication. The models
To determine the probabilities for sequence mutations, Græmlin uses weighted sum-of-pairs scoring (Altschul et al. 1989
As with equivalence classes, we define edge scores as the log-ratio of two probabilities: Each edge e is assigned a score
The random model
To address these issues, Græmlin uses a novel scoring scheme that allows a user to specify the desired ancestral topology; this generalizes previous edge-scoring approaches (Sharan et al. 2005b
Figure 1C shows three examples of an ESM, as well as the functions used by each to score the edges in a sample alignment; these include two special cases, pathways and multiprotein complexes, that have been the subjects of past studies (Kelley et al. 2003
The third type of ESM we consider is automatically generated when a user searches a large network for matches to a small query network. We refer to this as a Module ESM because the query network will often consist of a hypothetical or known biological module. In this case, Græmlin creates a label in the ESM for each node in the query and generates the alignment distribution based on the edges that are present or absent in the query. For each cell in the ESM, it defines the distribution based on the weight of the edge between the two corresponding proteins in the query. When aligning a query to multiple species, Græmlin refines the ESM as more species are added to the alignment; in this case, rather than creating a label for each protein, it creates a label for each equivalence class and uses kernel density estimation (Duda et al. 2000
Alignment algorithm Figure 3 shows an outline of the Græmlin algorithm, including the methodology it uses for pairwise and multiple alignment.
Pairwise alignment To search for high-scoring alignments between a pair of networks efficiently, Græmlin first generates a set of "seeds," which it uses to restrict the size of the search space. We refer to the structures used for seed generation as "d-clusters," which consist of d proteins that are close together in a network and are analogs of k-mers in seeded local alignment search. For each network, Græmlin constructs one d-cluster for every node by finding the d 1 nearest neighbors of that node, where the length of an edge is the negative logarithm of its weight. Græmlin compares two d-clusters D 1 and D 2 by mapping a subset of nodes in D 1 to a subset of nodes in D 2 and reporting a score equal to the sum of all pairwise scores induced by the mapping; the score of two d-clusters is the highest-scoring such mapping. Græmlin identifies pairs of d-clusters, one from each network, that score higher than a threshold T and uses these as seeds. Figure 3B shows a sample set of d-clusters generated from two networks, as well as a high-scoring pair.
The benefits of the d-cluster seeding technique are severalfold. First, Græmlin can compare d-clusters rapidly, since the comparison neglects edge scores. Second, the parameters d and T allow for a speed-sensitivity trade-off. As an example, a lower value of T will achieve higher sensitivity but require increased running time; this adjustable trade-off is not present in previous techniques (Koyuturk et al. 2005
Given two networks, Græmlin enumerates the set of seeds between them and tries to transform each, in turn, into a high-scoring alignment. In a manner similar to that used in existing methods (Koyuturk et al. 2005
Multiple alignment To enable comparisons of unaligned parts of a network to more distant species as it traverses the phylogenetic tree, rather than construct a network only from the high-scoring alignments, Græmlin also maintains two additional networks composed of the unaligned nodes from the two original networks. For example, in Figure 3D, Græmlin constructs three networks from the original two that it aligns; as a result, in Figure 3E, the parent of these two networks contains one network for each possible subset of its children. The end result is that after completion of the entire multiple alignment, Græmlin produces multiple alignments of all possible subsets of species. It avoids an exponential running time in practice because after each pairwise alignment, the networks it constructs have small overlaps. The total number of nodes in all networks therefore does not increase significantly.
We measured the performance of Græmlin by assessing its ability to align known biologically functional modules. We compared it to two alignment algorithms, NetworkBLAST (Sharan et al. 2005b
We assessed the sensitivity of each method by counting the number of KEGG pathways that it aligned between two species (Kanehisa and Goto 2000 We did not use all KEGG pathways for these comparisons, as SRINI does not accurately recapitulate each one. We therefore curated the set of KEGG pathways by ignoring all that did not have a connected component of size at least three in each of the assessed networks. To avoid biasing results toward a specific algorithm, we did not further curate the set by examining the conservation of the pathways. We refer to each pathway in the curated set as an "alignable" KEGG pathway.
As one measure of specificity, we computed the number of "enriched" alignments. To calculate enrichment, we first assigned to each protein all of its annotations from level eight or deeper in the GO hierarchy (Ashburner et al. 2000 As a second measure of specificity, we counted the fraction of nodes that have KEGG orthologs but were aligned to any nodes other than their KEGG orthologs. Both this measure and calculations of enrichment are imperfect measures of specificity, but they work as rough guides to ensure that an aligner is not completely sacrificing specificity to increase sensitivity. We also assessed multiple alignment algorithms using these metrics. When evaluating the sensitivity metric, we identified a KEGG pathway as hit if a method aligned at least three proteins in each species to their correct counterparts in all other species. We measured specificity by computing enrichments and counting misaligned nodes exactly as in the case of pairwise alignments.
To our knowledge, the only other quantitative tests of alignment quality measured the accuracy of predictions of interactions and protein function in eukaryotic networks (Kelley et al. 2003
One issue with the networks constructed by SRINI is that they are complete; that is, SRINI assigns an interaction probability to every pair of proteins. Because these networks are intractably large for any existing algorithm, we thresholded them by removing low-weight edges before running our tests. We generated two sets of networks: one with an edge threshold of P
We did not test on the eukaryotic networks because our sensitivity metric is inapplicable on them; as Table 1 shows, they recapitulate essentially no KEGG pathways. While in principle one could define other sensitivity metrics, the greater quality of the SRINI networks provides a much simpler and straightforward test scenario. In addition, the SRINI networks are much larger than the eukaryotic networks and consequently provide a better test of the scalability of an algorithm.
For all tests and all alignment algorithms, we considered alignments with P-values <0.01 as high-scoring; for each test case, we calculated P-values by sampling from a large number of runs on random data sets. We constructed the random data sets using techniques similar to those used in previous approaches by redistributing the edges of a real network while maintaining the original node degree distribution (Kelley et al. 2003
Network-to-network alignment
With all three methods, we aligned the networks of Escherichia coli K12 and Caulobacter crescentus; E. coli and Mycoplasma tuberculosis H37Rv; E. coli and Vibrio cholerae; and E. coli and Streptomyces coelicolor. Owing to its excessive running time on networks with the lower edge threshold, we report results for NetworkBLAST only on networks with a threshold of 0.5. In addition, MaWISh cannot perform alignments on S. coelicolor because for each input species it requires COG data (Tatusov et al. 1997
These results show that there is a legitimate reason for using networks with a lower edge threshold, since the sensitivities of both MaWISh and Græmlin drop dramatically on networks with a higher threshold without a corresponding increase in specificity. Consequently, the ability of both methods to scale to data sets of this size is important. When searching for highly connected components, or multiprotein complexes, Græmlin is significantly more sensitive than the other two methods, both with respect to the number of KEGGs hit and with respect to the average coverage of a KEGG, without sacrificing specificity. It also aligns significantly more nodes overall than the other two methods without misaligning a higher number of proteins.
When searching for pathways, Græmlin is more sensitive than NetworkBLAST, although both methods are not as sensitive as those that search for multiprotein complexes. This is predominantly because a pathway alignment must be much larger than an alignment of multiprotein complexes to be statistically significant. For example, if an alignable KEGG pathway contains a clique of four proteins, it will score highly as both a multiprotein complex and a pathway. However, because four-node cliques are much less likely to occur in unrelated networks than four-node pathways, the multiprotein complex alignment will be more statistically significant. As most alignable KEGGs appear as cliques in the SRINI microbial networks, searches for highly connected components are consequently more sensitive than pathway searches with respect to the metric of hitting KEGG pathways. However, past studies have shown that pathway searches do have uses beyond identifying conserved modules (Kelley et al. 2003 Of all the test cases, Græmlin and NetworkBLAST take the longest to run on E. coli versus S. coelicolor, primarily because of the size of the S. coelicolor network and the large number of homologs between these species. In this case, Græmlin can sacrifice sensitivity for speed by adjusting the parameters it uses for d-clusters. Figure 5A demonstrates the impact of T on running time and sensitivity. Running with its default parameters (d = 4, T = 7) on networks thresholded at 0.25, it finds 25 KEGG pathways in 1224 sec, but a slight increase in T yields 21 KEGGs in only 339 sec.
Multiple network alignment We also performed complete three-way alignments of (E. coli, C. crescentus, V. cholerae) and (E. coli, Campylobacter jejuni, Helicobacter pylori 26,695) using Græmlin and NetworkBLAST. Table 4 shows the results of these tests.
On the networks thresholded at 0.5, NetworkBLAST hits slightly more KEGGs than Græmlin; however, Græmlin covers a much higher fraction of each KEGG and also misaligns fewer nodes. In addition, Græmlin is orders of magnitude faster than NetworkBLAST; on one of our test cases, the latter did not complete after running for more than 2 mo. Because Græmlin scales effectively to large network sizes, it can efficiently multiply align networks with a low edge threshold. This is important because the networks with a low edge threshold contain many more conserved KEGGs than the high-thresholded networks, as evinced by the dramatically increased sensitivity of Græmlin on this data set. To further stress the scalability of Græmlin with respect to the number of species in a multiple alignment, Figure 5B shows the running times of Græmlin as it includes more species in the alignment. The roughly linear relation of running time to the number of species demonstrates the benefit of the progressive alignment technique.
Query-to-network alignment Our final set of tests assessed the performance of each method on query-to-network search; we searched E. coli against C. crescentus, C. crescentus against E. coli, E. coli against M. tuberculosis, and M. tuberculosis against E. coli. Table 5 shows the results of these tests; more detailed results are included in the Supplemental material.
For this test, sensitivity and specificity results are similar to those in the case of complete network alignment. One major difference is the relative running times of Græmlin and MaWISh; they are comparable when the database is C. crescentus, the smallest network, but Græmlin is much faster on the other networks. This is because Græmlin can amortize its indexing step over all queries and shows Græmlin's strength as a database search tool. In this test case, Græmlin performs comparably when using both the Pathway ESM and the Complex ESM. The Module ESM does not offer dramatic improvements over the other two ESMs, but it does give slightly higher KEGG coverage and misalign slightly fewer nodes. This is because most KEGGs that are alignable are highly connected, making the Complex ESM close to optimal under our metrics.
Biological applications
Functional annotation
Figure 6 shows an example of functional annotation through both pairwise and multiple network alignment. The pairwise alignment between E. coli and C. crescentus (Fig. 6A) shows a conserved DNA replication module. This includes the components of the primosome (dnaB, dnaA, gyrA, gyrB), the subunits of topoisomerase IV (parE, parC), and the
Two aspects of this alignment are of interest. First, we see that the recF repair protein is linked to DNA replication in both organisms. Although this is not the primary annotation of the protein, a link to DNA replication was, in fact, found fairly recently (Kogoma 1997 The multiple alignment diagram in Figure 6B extends the pairwise alignment to a multiple alignment of E. coli, S. typhimurium, V. cholerae, C. crescentus, C. jejuni, H. pylori, M. tuberculosis, S. pneumoniae, and Synechocystis. While some proteins from the pairwise alignment are absent, the core remains the same. The presence of the trmE protein in all nine species provides a compelling argument in favor of its role in DNA replication. This multiple alignment also offers another opportunity for landmark extension; the 60-kDa inner membrane protein yidC is present in all nine species and is highly connected to the other proteins in the alignment. Although known to be involved in protein secretion, the multiple alignment indicates that it is also likely to be linked to DNA replication.
Figure 7 shows an example of a 10-way multiple alignment relevant to bacterial cell division and cell envelope biogenesis. The alignment includes ftsZ, ftsW, and ftsI, well-known proteins involved in cell division, along with several other proteins from the mur and mra families known to be involved in peptidoglycan biogenesis. Many of these proteins are in contiguous operons in some species (Hara et al. 1997
Module identification While support for functional annotation of proteins is currently the primary application of network alignment, the availability of numerous interaction networks may provide a resource for the study of functional modules. For example, Figure 8 shows that in
E. coli, S. typhimurium, V. cholerae, C. jejuni, H. pylori, and C. crescentus, several proteins from the exb/tol family of biopolymer transporters are predicted to interact with a set of proteins involved in DNA recombination and integration. While the cooperation of these proteins is somewhat weak in any given species, the sum total of interactions in six distinct species suggests that DNA itself is the biopolymer transported through the tol channels before integrating into the chromosome. This alignment therefore may represent a part of a conserved module determining whether a cell is naturally competent for transformation (Dubnau 1999
Interaction networks will soon constitute a vast store of data, as exemplified by the upcoming availability of hundreds of microbial interaction networks (Srinivasan et al. 2006 As our test results show, Græmlin is a promising method for network alignment. It scales efficiently to large inputs, particularly when searching databases, and is the first method capable of performing multiple alignments of an arbitrary number of networks. In addition, Græmlin uses a novel, flexible scoring scheme that can incorporate biologically trained parameters, and introduces the Module ESM framework, which offers the potential to search for subnetworks of arbitrary structure. In contrast, existing methods are limited to searching for multiprotein complexes, which are represented as fully connected graphs, or pathways, which are represented as ordered lists of proteins. As biological networks become less noisy and more complete, the Module ESM framework will allow fine-grained searches and analyses, and it will also offer the potential to refine models of known biological modules by quantifying the level of conservation of individual parts and interactions.
Our analyses of four conserved subnetworks accentuates several applications of network alignment, one of which is the analysis of proteins that lack functional annotation. Network alignment can do this by using conserved interactions and sequence to align an unknown protein to one with known function in another species. Alternatively, even if a protein has no homolog of known function, its occurrence as part of an alignment near well-known "landmark" proteins permits inferences about its function (Srinivasan et al. 2006
As networks improve in quality and completeness, attention will focus on the functional annotation of modules in addition to proteins. Network alignment will play a key role by discovering groups of proteins that interact in more than one species, and it will thus offer additional evidence that such proteins work together to perform a common cellular function. As more networks become available, query-to-database network alignment will fulfill a similar role for modules as does BLAST for proteins (Altschul et al. 1997
Although multiple network alignment is still in its infancy, it offers the potential to study modules in the context of functional evolution. Græmlin is a first step in the development of tools that will permit such studies, as it is capable of aligning many networks simultaneously and uses an evolutionarily based scoring scheme. Further algorithmic development will undoubtedly lead to data-motivated population genetic models for network evolution (McAdams et al. 2004
Although there is an extensive literature (Conte et al. 2004 With the impressive recent advances in sequencing, high-throughput techniques for gathering biological data, and computational methodologies for integrating such information into networks of protein interactions, comparisons of networks should become an increasingly important methodology for the molecular biologist. As our results show, Græmlin is a general and systematic methodology for comparing an arbitrary number of large networks. Many important challenges remain; for instance, the ability to reason about directed edges and align different types of interactions, such as physical contact and gene regulation, will allow more detailed analyses of biochemical pathways and regulatory cascades. On a more practical note, the ability to automatically identify interesting alignments for further study will be an important research topic unto itself.
J.F. was supported in part by a Stanford Graduate Fellowship; A.N. was supported in part by NLM training grant LM-07033 and NIH grant 5-T15-LM007033. J.F., A.N., and B.S.S. were funded by NSF grant EF-0312459, NIH grant UO1-HG003162, the NSF CAREER Award, and the Alfred P. Sloan Fellowship. B.S.S. was supported in part by a Department of Defense National Defense Science and Engineering Graduate Fellowship through the Army Research Office, and B.S.S. and H.H.M. were supported by NIH grant 1 R24 GM073011-01 and DOE Office of Science grant DEFG02-01ER63219. We thank Andreas Sundquist for helpful comments on the manuscript.
4 These authors contributed equally to this work.
E-mail serafim{at}cs.stanford.edu; fax (650) 725-1449. Supplemental material is available online at www.genome.org. Græmlin is available at http://graemlin.stanford.edu, and source code is available under the GNU Public License. Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.5235706. Freely available online through the Genome Research Open Access option.
Alexandersson M., Cawley S., Pachter L. 2003. SLAM: Cross-species gene finding and alignment with a generalized pair hidden Markov model. Genome Res. 13: 496502. Altschul S.F., Carroll R.J., Lipman D.J. 1989. Weights for data related by a tree. J. Mol. Biol. 207: 647653.[CrossRef][Medline] Altschul S.F., Gish W., Miller W., Myers E.W., Lipman D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215: 403410.[CrossRef][Medline] Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W., Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 33893402. Andreeva A., Howorth D., Brenner S.E., Hubbard T.J.P., Chothia C., Murzin A.G. 2004. SCOP database in 2004: Refinements integrate structure and sequence family data. Nucleic Acids Res. 32: D226D229. Ashburner M., Ball C.A., Blake J.A., Botstein D., Butler H., Cherry J.M., Davis A.P., Dolinski K., Dwight S.S., Eppig J.T. et al. 2000. Gene Ontology: Tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25: 2529.[CrossRef][Medline] Bafna V. and Huson D.H. 2000. The conserved exon method for gene finding. Proc. Int. Conf. Intell. Syst. Mol. Biol. 8: 312.[Medline] Barabasi A.-L. and Oltvai Z.N. 2004. Network biology: Understanding the cell's functional organization. Nat. Rev. Genet. 5: 101113.[CrossRef][Medline] Batzoglou S. 2005. The many faces of sequence alignment. Brief. Bioinform. 6: 622. Batzoglou S., Pachter L., Mesirov J.P., Berger B., Lander E.S. 2000. Human and mouse gene structure: Comparative analysis and application to exon prediction. Genome Res. 10: 950958. Bejerano G., Pheasant M., Makunin I., Stephen S., Kent W.J., Mattick J.S., Haussler D. 2004. Ultraconserved elements in the human genome. Science 304: 13211325. Blanchette M., Kent W.J., Riemer C., Elnitski L., Smit A.F.A., Roskin K.M., Baertsch R., Rosenbloom K., Clawson H., Green E.D. et al. 2004. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 14: 708715. Bloom J.D. and Adami C. 2003. Apparent dependence of protein evolutionary rate on number of interactions is linked to biases in proteinprotein interactions data sets. BMC Evol. Biol. 3:21. Boyle E.I., Weng S., Gollub J., Jin H., Botstein D., Cherry J.M., Sherlock G. 2004. GO::TermFinderOpen source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes. Bioinformatics 20: 37103715. Bray N. and Pachter L. 2004. MAVID: Constrained ancestral alignment of multiple sequences. Genome Res. 14: 693699. Brudno M., Do C.B., Cooper G.M., Kim M.F., Davydov E., Green E.D., Sidow A., Batzoglou S. 2003. LAGAN and Multi-LAGAN: Efficient tools for large-scale multiple alignment of genomic DNA. Genome Res. 13: 721731. Chiaromonte F., Yap V.B., Miller W. 2002. Scoring pairwise genomic sequence alignments. Pac. Symp. Biocomput. 115126. Christie K.R., Weng S., Balakrishnan R., Costanzo M.C., Dolinski K., Dwight S.S., Engel S.R., Feierbach B., Fisk D.G., Hirschman J.E. et al. 2004. Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms. Nucleic Acids Res. 32: D311D314. Conte D., Foggia P., Sansone C., Vento M. 2004. Thirty years of graph matching in pattern recognition. IJPRAI 18: 265298. Cooper G.M., Brudno M., Stone E.A., Dubchak I., Batzoglou S., Sidow A. 2004. Characterization of evolutionary rates and constraints in three mammalian genomes. Genome Res. 14: 539548. Cooper G.M., Stone E.A., Asimenos G., Green E.D., Batzoglou S., Sidow A. 2005. Distribution and intensity of constraint in mammalian genomic sequence. Genome Res. 15: 901913. Dandekar T., Schuster S., Snel B., Huynen M., Bork P. 1999. Pathway alignment: Application to the comparative analysis of glycolytic enzymes. Biochem. J. 343: 115124. Deeds E.J., Ashenberg O., Shakhnovich E.I. 2006. From The Cover: A simple physical model for scaling in proteinprotein interaction networks. Proc. Natl. Acad. Sci. 103: 311316. Drummond D.A., Bloom J.D., Adami C., Wilke C.O., Arnold F.H. 2005. Why highly expressed proteins evolve slowly. Proc. Natl. Acad. Sci. 102: 1433814343. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||