Published online before print September 25, 2007, doi: 10.1101/gr.6678707
Genome Res. 17: 1626–1633, 2007. © 2007 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/07; $5.00
Letter

Systematic condition-dependent annotation of metabolic genes

1 School of Computer Science, Tel-Aviv University, Tel-Aviv 69978, Israel; 2 Department of Bioengineering, University of California, San Diego, California 92093-0412, USA; 3 School of Medicine, Tel-Aviv University, Tel-Aviv 69978, Israel
The task of deriving a functional annotation for genes is complex, as their involvement in various processes depends on multiple factors, such as environmental conditions and genetic backup mechanisms. This study employs a large-scale model of the metabolism of Saccharomyces cerevisiae to investigate the function of yeast genes and derive a condition-dependent annotation (CDA) for their involvement in major metabolic processes under various genetic and environmental conditions. The resulting CDA is validated on a large scale and shown to be superior to the corresponding Gene Ontology (GO) annotation: genes annotated with the same CDA term tend to be more coherently conserved in evolution and to display greater expression coherency than genes annotated with the same GO term. The CDA gives rise to new kinds of functional, condition-dependent metabolic pathways, some of which are described and further examined via substrate auxotrophy measurements of knockout strains. The CDA presented is likely to serve as a new reference source for metabolic gene annotation.
In recent years, high-throughput techniques have provided a wealth of data on the expression and activity of genes and proteins. The task of inferring the involvement of gene products in various cellular processes, commonly referred to as functional annotation, is a major goal of current biological research. It involves the definition of a set of biological functions, termed "ontology," and the association of the gene products with ontology terms. The most comprehensive and commonly used ontology is the Gene Ontology (GO), consisting of 20,000 terms and numerous associated gene products (Ashburner et al. 2000).
The involvement of a gene product in a specific process depends on multiple factors, such as the environmental conditions (Giaever et al. 2002).

The annotation of metabolic genes is a particularly difficult task, owing to the strong interdependency between the functions of the individual metabolic enzymes that together form the complex network of biochemical reactions. Yet, inspecting the sources of GO annotation for metabolic genes reveals that most annotations (56%) are based solely on the involvement of genes in classical biochemical pathways (Traceable Author Statement evidence code; Supplemental Fig. 1). Twenty-six percent of the annotations arise from gene knockout experiments measuring various metabolic phenotypes (Inferred from Mutant Phenotype evidence code). As no single "normal" condition is enforced, these experiments span a large range of environmental conditions and genetic backgrounds, whose identities are not reflected in the annotation. More complex phenotypic experiments involving the knockout of multiple genes to identify genetic backup mechanisms account for only 3% of the annotations (Inferred from Genetic Interaction evidence code). Notably, such experiments, which determine epistatic interactions between genes, are still difficult to conduct on a genome-wide scale, especially in multiple growth conditions.
In this study, we employ a genome-scale model of cellular metabolism to investigate the function of genes under multiple environmental and genetic conditions, deriving a condition-dependent annotation (CDA) of metabolic genes. The CDA associates genes with terms representing metabolic processes under multiple conditions. Specifically, we employ constraint-based modeling, which uses stoichiometric, thermodynamic, and flux capacity constraints to predict the space of possible flux distributions attainable by the metabolic network under various environmental and genetic conditions. Flux Balance Analysis (FBA) is a specific constraint-based optimization method that is commonly used to find flux distributions that minimize or maximize a defined cellular objective, such as the biomass production rate. Here we employ numerous optimization criteria to explore the organism's ability to synthesize the metabolites that contribute to biomass formation under multiple conditions. Constraint-based models have previously been used successfully to predict various metabolic phenotypes, such as growth, uptake rates, by-product secretion, knockout lethality, and pathway activity across different conditions (Edwards and Palsson 2000).

We derive a CDA for the metabolic genes of the yeast S. cerevisiae. The resulting CDA spans two dimensions, representing the growth medium (minimal/rich) and the availability of oxygen, and accounts for genetic backups in the form of isozymes and alternative pathways. The CDA obtained is compared with the standard GO annotation, generating novel annotation predictions, which are validated on a large scale using gene conservation and expression data. To gain insight into the dynamic metabolic mechanisms underlying the annotation, we derive functional pathways, which are condition-dependent, network-based representations of the biosynthetic processes.
We then conduct growth phenotyping of single and double gene deletion strains in auxotrophic growth conditions to examine these novel functional pathways.
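For reference, the FBA problem solved for each biomass metabolite can be written in the standard linear-programming form below. This is a generic sketch, not the exact formulation used in this study: S is the m × n stoichiometric matrix of the model, v the vector of reaction fluxes, and v_t the flux through the exchange (secretion) reaction added for the target metabolite. Irreversible reactions get a lower bound of 0; the growth medium and oxygen availability are encoded as bounds on the corresponding uptake reactions (for example, an oxygen uptake bound of 0 for anaerobic growth, which is an assumed but common convention); and a gene knockout sets to 0 the bounds of every reaction that can no longer be catalyzed. The second line restates the 20% contribution criterion from Methods in terms of the wild-type and knockout optima.

```latex
\begin{aligned}
\max_{v \in \mathbb{R}^{n}} \quad & v_{t} \\
\text{subject to} \quad & S\,v = 0, \\
& v_{j}^{\min} \le v_{j} \le v_{j}^{\max}, \qquad j = 1, \dots, n,
\end{aligned}
\qquad
\text{contributing} \iff
\frac{v_{t}^{*,\mathrm{wt}} - v_{t}^{*,\mathrm{ko}}}{v_{t}^{*,\mathrm{wt}}} > 0.2
```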
Deriving a condition-dependent annotation of yeast metabolism

We have used a genome-scale metabolic network model of the yeast S. cerevisiae (Duarte et al. 2004).
We define a gene's multifunctionality level as the number of processes it contributes to in a given condition. The distribution of the multifunctionality levels of all genes under single-knockout conditions is bimodal, peaking at levels 1 and 38; that is, most genes are annotated as being involved either in a small number of processes or in almost all processes (Supplemental Fig. 2). A similar bimodal distribution of the environmental specificity of predicted synthetic lethal interactions was previously observed (Harrison et al. 2007).

Examining the annotation within the different CDA conditions reveals the importance of each annotation dimension (Table 1). The annotation obtained with single-knockout simulations in rich media varies considerably between aerobic and anaerobic conditions: the aerobic condition provides 85% of the total annotations (obtained in either the aerobic or the anaerobic condition), whereas the anaerobic condition provides only 62%. Similarly, the growth-medium dimension is important for single-knockout experiments in anaerobic conditions, as only 87% and 60% of the annotations are obtained when considering poor or rich media, respectively. Interestingly, single knockouts in aerobic conditions, which form the basis for much of the existing GO annotation for yeast, are relatively insensitive to the specific choice of growth medium, as both poor and rich media provide >88% of the total annotations in these conditions. The double-knockout analysis more than doubles the number of functional annotations and exhibits even higher variation between conditions. These results show the importance of a CDA that considers the involvement of genes in various processes under multiple environmental conditions and genetic backup levels.
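The two quantities used above, a gene's multifunctionality level and the share of all annotations obtained in a single condition, are simple set computations once a CDA is available. The sketch below assumes a hypothetical representation of the CDA as a set of (gene, process) pairs per condition; the gene and process names are illustrative and not taken from the model. The share follows the text: annotations found in one condition divided by the union of annotations over the compared conditions.

```python
from collections import Counter

# Hypothetical toy CDA: the set of (gene, process) annotations obtained in each
# simulated condition. Gene and process names are illustrative only.
cda = {
    "aerobic": {("g1", "ala"), ("g1", "pro"), ("g2", "ala"), ("g3", "erg")},
    "anaerobic": {("g1", "ala"), ("g2", "ala"), ("g4", "his")},
}


def multifunctionality(annotations):
    """Return {gene: number of processes it contributes to} for one condition.

    Only genes with at least one annotation appear in the result.
    """
    return Counter(gene for gene, _process in annotations)


def condition_share(cda, conditions):
    """Fraction of the union of annotations (over `conditions`) found in each one.

    Shares can sum to more than 1, because the same annotation may be
    obtained in several conditions.
    """
    union = set().union(*(cda[c] for c in conditions))
    return {c: len(cda[c]) / len(union) for c in conditions}


levels = multifunctionality(cda["aerobic"])
histogram = Counter(levels.values())  # multifunctionality level -> number of genes
shares = condition_share(cda, ["aerobic", "anaerobic"])
print(shares)  # → {'aerobic': 0.8, 'anaerobic': 0.6}
```

As with the 85% and 62% reported above, the per-condition shares overlap and therefore need not add up to 100%.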
The similarity and differences between CDA and GO

Gene Ontology contains specific terms representing the process of synthesizing each of the 38 essential biomass compounds that are required for growth according to the metabolic network model. Yeast GO annotation consists of 361 associations between 199 metabolic genes and these ontology terms (Methods). The overlap between GO and the CDA, when considering all conditions, is highly significant, with 179 common annotations (hypergeometric P-value < 1 × 10⁻³⁰⁰). Focusing on specific CDA conditions, we find that 60% of the CDA annotations in single-knockout aerobic conditions appear in GO, covering 33% of the GO annotations. In the single-knockout anaerobic conditions (which are less common among the experiments underlying GO), only 49% of the annotations also appear in GO, covering 19% of the GO annotations. In the double-knockout conditions, <29% of the annotations also appear in GO, suggesting that double-knockout experiments would substantially enrich functional annotations by revealing functional contributions masked by genetic backup mechanisms. An inspection of GO annotations that are not included in the CDA shows that in some cases they were identified in experiments conducted in growth media other than the poor or rich media considered here, or in experiments involving higher-order knockouts (e.g., triple or quadruple knockouts) (Johnson et al. 1994).

The number of novel annotations varies across the different metabolic processes and the different dimensions of the annotation (Supplemental Table 2; Supplemental Fig. 3). For example, the CDA extends the current GO annotation for amino acid biosynthetic processes by 83% under anaerobic conditions, but by only 60% under aerobic conditions. Examining the known GO annotation of the novel CDA predictions, we find that 43% of them have a corresponding GO annotation within a category of closely related terms (Fig. 2).
Almost all of these novel predictions (96%) fall within the amino acids category (Supplemental Fig. 4). For example, several genes annotated in GO as involved in methionine biosynthesis are also annotated in the CDA as involved in cysteine and isoleucine biosynthesis. Other genes are annotated very differently than in GO. For example, we find several genes that are annotated in GO as involved in nucleotide biosynthesis but are predicted to be involved in the production of the amino acids histidine, cysteine, and methionine.
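The overlap statistics quoted in this section (a one-sided hypergeometric P-value plus the two overlap fractions, "share of CDA annotations that appear in GO" and "share of GO annotations covered") can be sketched as follows. The Methods details for GO annotation extraction are not reproduced here, so treating the universe as the set of all candidate (gene, term) pairs is an assumption of this sketch, and the toy labels and universe size are hypothetical. The tail is computed with exact integers and reported as a log10 value, since a P-value below about 1e-308 cannot be represented as a double.

```python
import math


def hypergeom_tail(k, N, K, n):
    """Exact P(X >= k) for X ~ Hypergeometric(N, K, n), as (numerator, denominator).

    N: number of candidate (gene, term) pairs (the universe; an assumption)
    K: number of GO annotations, n: number of CDA annotations, k: observed overlap.
    """
    num = sum(math.comb(K, i) * math.comb(N - K, n - i)
              for i in range(k, min(K, n) + 1))
    return num, math.comb(N, n)


def overlap_stats(cda, go, universe_size):
    """Overlap fractions and a one-sided hypergeometric log10 P-value."""
    overlap = len(cda & go)
    num, den = hypergeom_tail(overlap, universe_size, len(go), len(cda))
    return {
        "overlap": overlap,
        "frac_cda_in_go": overlap / len(cda),  # share of CDA annotations also in GO
        "frac_go_covered": overlap / len(go),  # share of GO annotations recovered
        # math.log10 accepts arbitrarily large Python ints, so this stays finite
        # even when the P-value itself would underflow a float.
        "log10_p": math.log10(num) - math.log10(den),
    }


# Toy example with hypothetical annotation labels and universe size.
go = {"a", "b", "c", "d", "e"}
cda = {"a", "b", "c", "x"}
stats = overlap_stats(cda, go, universe_size=20)
print(stats["overlap"], stats["frac_cda_in_go"], stats["frac_go_covered"])
# → 3 0.75 0.6
```

For moderate P-values, SciPy gives the same tail as a float via `scipy.stats.hypergeom.sf(k - 1, M, n, N)`, where SciPy names the population size M, the number of successes n, and the number of draws N; the integer version is used here only to avoid underflow.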
Figure 3 shows a network representation of the CDA under single-knockout, poor-media, aerobic and anaerobic conditions, together with its comparison with GO. The network is clustered, with distinct sets of genes annotated in each category. In the lipid biosynthesis cluster, many known GO annotations for ergosterol biosynthesis are found in the CDA only in aerobic conditions. These condition-specific annotations reflect the known dependency of the ergosterol biosynthesis pathway on the availability of oxygen (Deytieux et al. 2005).
Conservation coherency, expression coherency, multifunctionality, and evolutionary rate of CDA vs. GO annotated genes

Genes involved in the same biological processes have been previously shown to be coherently conserved in evolution (Pellegrini et al. 1999).
We next compared the expression coherency of CDA- and GO-annotated genes. Expression coherency is defined as the mean similarity in expression patterns across different conditions between genes annotated with the same term; its significance is assessed by comparison with the expression coherency of a random annotation (Methods). The CDA has a significantly higher expression coherency score than GO under all single-knockout conditions (Fig. 4B; Supplemental Table 5). The CDA obtained with double knockouts is also significantly coherently expressed, but to a lesser extent (Supplemental Table 6). This lower expression coherency score is partially attributable to isozymes, which tend to have anti-correlated expression patterns (Ihmels et al. 2004).
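Expression coherency, as defined above, can be sketched as the mean pairwise correlation between the expression profiles of genes sharing a term, with an empirical P-value from randomly relabeled annotations. The Methods details are not available here, so the use of Pearson correlation, the per-term averaging, and the relabeling scheme for the random annotation are assumptions of this sketch, and the expression profiles are invented.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes neither is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coherency(annotation, expr):
    """Mean within-term pairwise correlation, averaged over terms with >= 2 genes."""
    scores = []
    for genes in annotation.values():
        pairs = list(itertools.combinations(sorted(genes), 2))
        if pairs:
            scores.append(sum(pearson(expr[a], expr[b]) for a, b in pairs) / len(pairs))
    if not scores:
        raise ValueError("no term has at least two genes")
    return sum(scores) / len(scores)


def permutation_p(annotation, expr, n_perm=1000, seed=0):
    """Compare observed coherency with randomly relabeled annotations.

    Gene labels are shuffled among all profiled genes, so each random
    annotation keeps the term sizes of the real one. Every annotated gene
    must have a profile in `expr`.
    """
    rng = random.Random(seed)
    observed = coherency(annotation, expr)
    genes = sorted(expr)
    hits = 0
    for _ in range(n_perm):
        shuffled = genes[:]
        rng.shuffle(shuffled)
        relabel = dict(zip(genes, shuffled))
        random_annotation = {t: {relabel[g] for g in gs} for t, gs in annotation.items()}
        if coherency(random_annotation, expr) >= observed:
            hits += 1
    # Add-one correction keeps the empirical P-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical expression profiles (one value per condition).
expr = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [1.0, 3.0, 2.0, 4.0],
}
observed, p = permutation_p({"t1": {"g1", "g2"}}, expr, n_perm=200)
print(round(observed, 3), p)
```

With only four genes the permutation test has almost no power; the toy data only illustrate the mechanics. In the setting described above, the same comparison would be run over all annotated genes and many expression conditions, once for the CDA and once for GO.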
Previously it was shown that gene multifunctionality level (as reflected by GO) is correlated with its degree of pleiotropy, the latter measured by the extent to which its deletion affects survival under multiple different environmental conditions (He and Zhang 2006).
CDA-derived functional pathways

Figure 5A shows the functional pathway for alanine biosynthesis in poor media under single- and double-knockout, aerobic and anaerobic conditions. Using single knockouts, only the gene ALT2 (a putative cytoplasmic alanine transaminase) is predicted to contribute to alanine biosynthesis, in both aerobic and anaerobic conditions. Under aerobic conditions, the double-knockout analysis predicts that six additional genes, belonging to the tryptophan biosynthetic pathway and the kynurenine pathway for tryptophan degradation, also contribute to alanine biosynthesis. Under anaerobic conditions, their contribution vanishes owing to the oxygen dependence of the tryptophan degradation pathway. To identify potential additional alanine biosynthesis pathways, we tested whether the ALT2 deletion strain is an alanine auxotroph in anaerobic conditions (data not shown). The deletion strain showed no growth defect in these conditions, suggesting the existence of an additional alternative pathway. The most likely backup for ALT2 would be provided by ALT1, a mitochondrial isozyme of ALT2. ALT1 was previously thought not to contribute to alanine production in the cytoplasm, owing to the lack of a known mitochondrial alanine transporter. However, our experimental results suggest that an uncharacterized alanine transporter allows mitochondrially synthesized alanine to be utilized outside the mitochondria in an ALT2 deletion strain.
The functional pathway of proline biosynthesis consists of all the genes annotated in GO as involved in proline biosynthesis, as well as several novel CDA predictions (Fig. 5B). Proline is synthesized from L-glutamate gamma-semialdehyde, which in turn can be synthesized either through the standard proline biosynthesis pathway involving the PRO1 and PRO2 gene products, or through the arginine catabolic pathway involving the CAR1 and CAR2 gene products. The arginine catabolic pathway is inactive when a preferred nitrogen source (e.g., ammonium sulphate) is available, but in the absence of preferred nitrogen sources it allows yeast to utilize arginine as a nitrogen source in aerobic conditions (Dubois et al. 1978).
Utilizing a metabolic network model of S. cerevisiae, we derive a systematic condition-dependent annotation of yeast metabolic genes, associating genes with major metabolic processes under various genetic and environmental conditions. The resulting CDA, which promises to serve as a new reference source for metabolic gene annotation, is highly dependent on the growth medium (poor or rich), the availability of oxygen, and whether single or double knockouts are employed. Under these conditions, the CDA provides annotations for 233 genes that contribute specifically to some biosynthetic processes, and additional annotations for 62 genes that contribute to many processes and whose contribution is hence considered nonspecific (out of a total of 750 genes in the model). Considering that the annotation of many genes in the CDA was determined based on specific conditions (Table 1), we expect that extending the CDA to a variety of growth media and employing higher-order knockouts (Deutscher et al. 2006) […]

The common view of metabolic pathways as static, distinct entities with a defined function may be misleading, considering the interconnectivity of different pathways through shared cofactors and metabolites. Here, we extend the classical notion of metabolic pathways into functional pathways, which are condition-dependent, network-based representations of biosynthetic processes. We examine in detail the functional pathways for alanine, glutamine, and proline biosynthesis, and demonstrate their condition specificity. Experimental growth phenotyping of single and double gene deletion strains has verified the presence of the predicted, previously uncharacterized full and partial backup mechanisms in these pathways.
The CDA is inspired by, but differs from, the multidimensional genome annotation framework of Reed et al. (2006).
Major efforts have recently been made to identify genetic interactions on a large scale, both experimentally (Tong et al. 2004) […]
Metabolic network analysis

The metabolic network model of Duarte et al. (2004) […]
Flux Balance Analysis (FBA) was used to compute the production rate of each biomass precursor under various growth media and genetic backgrounds. To simulate the production of a given metabolite, a new exchange reaction representing the secretion of this metabolite is added to the model, and the flux through this reaction is maximized. For the single-knockout annotation, we systematically knocked out each gene and considered it as contributing to the production of a certain metabolite if its knockout reduced the metabolite's production rate by more than 20%. For the double-knockout annotation, we knocked out all pairs of genes that were noncontributing in the single-knockout experiments, and considered a pair as contributing if the joint knockout reduced the metabolite's production rate by more than 20% (a similar threshold was used in Deutscher et al. 2006).
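The single- and double-knockout screen described above can be sketched with SciPy's `linprog` on a toy network. Everything about the network is hypothetical (metabolites A, B, C, P; reactions UP, R1 to R4 and the exchange EX_P; genes g0 to g5 and their gene–reaction rules, with R1 catalyzed by two isozymes). The 20% threshold and the restriction of the double-knockout screen to genes that were noncontributing as singles follow the text. Blocking every reaction whose gene–reaction rule is no longer satisfied by setting its bounds to zero is a common way to simulate deletions in constraint-based models, but it is an assumption here, since the exact procedure is not stated.

```python
from itertools import combinations

import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network: rows = metabolites, columns = reactions.
METABOLITES = ["A", "B", "C", "P"]
REACTIONS = ["UP", "R1", "R2", "R3", "R4", "EX_P"]
S = np.array([
    # UP  R1  R2  R3  R4  EX_P
    [1, -1, 0, -1, 0, 0],   # A: taken up, consumed by both routes
    [0, 1, -1, 0, 0, 0],    # B: route 1 intermediate
    [0, 0, 0, 1, -1, 0],    # C: route 2 intermediate
    [0, 0, 1, 0, 1, -1],    # P: target, secreted through EX_P
], dtype=float)
BOUNDS = {"UP": (0, 10), "R1": (0, 1000), "R2": (0, 1000),
          "R3": (0, 1000), "R4": (0, 6), "EX_P": (0, 1000)}
# Gene-reaction rules as OR-of-ANDs: a reaction stays active if all genes of
# at least one complex survive. R1 has two isozymes (g1, g5).
GPR = {"UP": [{"g0"}], "R1": [{"g1"}, {"g5"}], "R2": [{"g2"}],
       "R3": [{"g3"}], "R4": [{"g4"}]}
GENES = sorted({g for rule in GPR.values() for cplx in rule for g in cplx})


def max_production(target, deleted=frozenset()):
    """FBA: maximize flux through the target exchange reaction."""
    bounds = []
    for r in REACTIONS:
        rule = GPR.get(r)
        active = rule is None or any(cplx.isdisjoint(deleted) for cplx in rule)
        bounds.append(BOUNDS[r] if active else (0, 0))
    c = np.zeros(len(REACTIONS))
    c[REACTIONS.index(target)] = -1.0  # linprog minimizes, so negate
    res = linprog(c, A_eq=S, b_eq=np.zeros(len(METABOLITES)),
                  bounds=bounds, method="highs")
    return -res.fun if res.status == 0 else 0.0


def annotate(target, threshold=0.2):
    """Single-knockout contributors, then pairs among the noncontributors."""
    wild_type = max_production(target)

    def contributes(deleted):
        return wild_type - max_production(target, frozenset(deleted)) > threshold * wild_type

    singles = {g for g in GENES if contributes({g})}
    rest = [g for g in GENES if g not in singles]
    pairs = {p for p in combinations(rest, 2) if contributes(p)}
    return singles, pairs


singles, pairs = annotate("EX_P")
print(sorted(singles), sorted(pairs))
# → ['g0', 'g2'] [('g1', 'g5')]
```

In this toy model, deleting g2 forces production through the capacity-limited route 2 (6 instead of 10, a 40% drop), so g2 is a single-knockout contributor. The isozymes g1 and g5 back each other up and only appear as a contributing pair, mirroring how the double-knockout analysis reveals contributions masked by genetic backups. The exchange reaction EX_P is built into the toy model here, whereas the text describes adding one per target metabolite.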
GO annotation extraction
Phylogenetic profiling analysis
Expression coherency analysis
Functional pathways construction

A functional pathway describing the production of a target compound under a given condition is derived via the following algorithm:
Experimental procedures
We thank Kai Tan for providing the phylogenetic profiles, Pep Charusanti for help with verifying knockout strains, and David Deutscher and Elhanan Borenstein for very helpful discussions. T.S. is supported by an Eshkol Fellowship from the Israeli Ministry of Science. R.S. was supported by an Alon Fellowship. E.R.'s research is supported by the Yishayahu Horowitz Center for Complexity Science, the Israeli Science Foundation (ISF), the German-Israeli Foundation for Scientific Research and Development (GIF), and the Tauber fund. M.H. and V.P. are supported by the National Institutes of Health (R01 GM071808) and the National Science Foundation (BES-0331342).
4 Corresponding authors. E-mail shlomito{at}post.tau.ac.il; fax +972-3-640-9357.
E-mail ruppin{at}post.tau.ac.il; fax +972-3-640-9357.

[Supplemental material is available online at www.genome.org.]

Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.6678707
Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., et al. 2000. Gene Ontology: Tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25: 25–29.
Ball, C.A., Awad, I.A., Demeter, J., Gollub, J., Hebert, J.M., Hernandez-Boussard, T., Jin, H., Matese, J.C., Nitzberg, M., Wymore, F., et al. 2005. The Stanford Microarray Database accommodates additional microarray platforms and data formats. Nucleic Acids Res. 33: D580–D582. doi: 10.1093/nar/gki006.
Baudin, A., Ozier-Kalogeropoulos, O., Denouel, A., Lacroute, F., and Cullin, C. 1993. A simple and efficient method for direct gene deletion in Saccharomyces cerevisiae. Nucleic Acids Res. 21: 3329–3330. doi: 10.1093/nar/21.14.3329.
Brown, J.A., Sherlock, G., Myers, C.L., Burrows, N.M., Deng, C., Wu, H.I., McCann, K.E., Troyanskaya, O.G., and Brown, J.M. 2006. Global analysis of gene function in yeast by quantitative phenotypic profiling. Mol. Syst. Biol. 2. doi: 10.1038/msb4100043.
Cherest, H., Thomas, D., and Surdin-Kerjan, Y. 1993. Cysteine biosynthesis in Saccharomyces cerevisiae occurs through the transsulfuration pathway which has been built up by enzyme recruitment. J. Bacteriol. 175: 5366–5374.
Christianson, T.W., Sikorski, R.S., Dante, M., Shero, J.H., and Hieter, P. 1992. Multifunctional yeast high-copy-number shuttle vectors. Gene 110: 119–122.
Deutscher, D., Meilijson, I., Kupiec, M., and Ruppin, E. 2006. Multiple knockout analysis of genetic robustness in the yeast metabolic network. Nat. Genet. 38: 993–998.
Deytieux, C., Mussard, L., Biron, M.J., and Salmon, J.M. 2005. Fine measurement of ergosterol requirements for growth of Saccharomyces cerevisiae during alcoholic fermentation. Appl. Microbiol. Biotechnol. 68: 266–271.
Dolan, M.E., Ni, L., Camon, E., and Blake, J.A. 2005. A procedure for assessing GO annotation consistency. Bioinformatics 21 (Suppl. 1): i136–i143.
Duarte, N.C., Herrgard, M.J., and Palsson, B.O. 2004. Reconstruction and validation of Saccharomyces cerevisiae iND750, a fully compartmentalized genome-scale metabolic model. Genome Res. 14: 1298–1309.
Dubois, E., Hiernaux, D., Grennon, M., and Wiame, J.M. 1978. Specific induction of catabolism and its relation to repression of biosynthesis in arginine metabolism of Saccharomyces cerevisiae. J. Mol. Biol. 122: 383–406.
Edwards, J.S. and Palsson, B.O. 2000. The Escherichia coli MG1655 in silico metabolic genotype: Its definition, characteristics, and capabilities. Proc. Natl. Acad. Sci. 97: 5528–5533.
Edwards, J.S., Ibarra, R.U., and Palsson, B.O. 2001. In silico predictions of Escherichia coli metabolic capabilities are consistent with experimental data. Nat. Biotechnol. 19: 125–130.
Famili, I., Forster, J., Nielsen, J., and Palsson, B.O. 2003. Saccharomyces cerevisiae phenotypes can be predicted by using constraint-based analysis of a genome-scale reconstructed metabolic network. Proc. Natl. Acad. Sci. 100: 13134–13139.
Farkas, I., Hardy, T.A., Goebl, M.G., and Roach, P.J. 1991. Two glycogen synthase isoforms in Saccharomyces cerevisiae are coded by distinct genes that are differentially controlled. J. Biol. Chem. 266: 15602–15607.
Forster, J., Famili, I., Palsson, B.O., and Nielsen, J. 2003. Large-scale evaluation of in silico gene deletions in Saccharomyces cerevisiae. OMICS 7: 193–202.
Giaever, G., Chu, A.M., Ni, L., Connelly, C., Riles, L., Veronneau, S., Dow, S., Lucau-Danila, A., Anderson, K., Andre, B., et al. 2002. Functional profiling of the Saccharomyces cerevisiae genome. Nature 418: 387–391.
Harrison, R., Papp, B., Pal, C., Oliver, S.G., and Delneri, D. 2007. Plasticity of genetic interactions in metabolic networks of yeast. Proc. Natl. Acad. Sci. 104: 2307–2312.
Hartman, J.L.T., Garvik, B., and Hartwell, L. 2001. Principles for the buffering of genetic variation. Science 291: 1001–1004.
He, X. and Zhang, J. 2006. Toward a molecular understanding of pleiotropy. Genetics 173: 1885–1891.
Herrgard, M.J., Lee, B.S., Portnoy, V., and Palsson, B.O. 2006. Integrated analysis of regulatory and metabolic networks reveals novel regulatory mechanisms in Saccharomyces cerevisiae. Genome Res. 16: 627–635.
Ibarra, R.U., Edwards, J.S., and Palsson, B.O. 2002. Escherichia coli K-12 undergoes adaptive evolution to achieve in silico predicted optimal growth. Nature 420: 186–189.
Ihmels, J., Levy, R., and Barkai, N. 2004. Principles of transcriptional control in the metabolic network of Saccharomyces cerevisiae. Nat. Biotechnol. 22: 86–92.
Johnson, D.R., Knoll, L.J., Levin, D.E., and Gordon, J.I. 1994. Saccharomyces cerevisiae contains four fatty acid activation (FAA) genes: An assessment of their role in regulating protein N-myristoylation and cellular lipid metabolism. J. Cell Biol. 127: 751–762.
Kafri, R., Bar-Even, A., and Pilpel, Y. 2005. Transcription control reprogramming in genetic backup circuits. Nat. Genet. 37: 295–299.
Kuepfer, L., Sauer, U., and Blank, L.M. 2005. Metabolic functions of duplicate genes in Saccharomyces cerevisiae. Genome Res. 15: 1421–1430.
Mahadevan, R. and Schilling, C.H. 2003. The effects of alternate optimal solutions in constraint-based genome-scale metabolic models. Metab. Eng. 5: 264–276.
Pellegrini, M., Marcotte, E.M., Thompson, M.J., Eisenberg, D., and Yeates, T.O. 1999. Assigning protein functions by comparative genome analysis: Protein phylogenetic profiles. Proc. Natl. Acad. Sci. 96: 4285–4288.
Reed, J.L., Famili, I., Thiele, I., and Palsson, B.O. 2006. Towards multidimensional genome annotation. Nat. Rev. Genet. 7: 130–141.
Schilling, C.H., Letscher, D., and Palsson, B.O. 2000. Theory for the systemic definition of metabolic pathways and their use in interpreting metabolic function from a pathway-oriented perspective. J. Theor. Biol. 203: 229–248.
Schuldiner, M., Collins, S.R., Thompson, N.J., Denic, V., Bhamidipati, A., Punna, T., Ihmels, J., Andrews, B., Boone, C., Greenblatt, J.F., et al. 2005. Exploration of the function and organization of the yeast early secretory pathway through an epistatic miniarray profile. Cell 123: 507–519.
Schuster, S., Fell, D.A., and Dandekar, T. 2000. A general definition of metabolic pathways useful for systematic organization and analysis of complex metabolic networks. Nat. Biotechnol. 18: 326–332.
Segre, D., Deluna, A., Church, G.M., and Kishony, R. 2005. Modular epistasis in yeast metabolism. Nat. Genet. 37: 77–83.
Shlomi, T., Berkman, O., and Ruppin, E. 2005. Regulatory on/off minimization of metabolic flux changes after genetic perturbations. Proc. Natl. Acad. Sci. 102: 7695–7700.
Tong, A.H., Lesage, G., Bader, G.D., Ding, H., Xu, H., Xin, X., Young, J., Berriz, G.F., Brost, R.L., Chang, M., et al. 2004. Global mapping of the yeast genetic interaction network. Science 303: 808–813.
Van Hoek, P., Van Dijken, J.P., and Pronk, J.T. 1998. Effect of specific growth rate on fermentative capacity of baker's yeast. Appl. Environ. Microbiol. 64: 4226–4233.
van Swinderen, B. and Greenspan, R.J. 2005. Flexibility in a gene network affecting a simple behavior in Drosophila melanogaster. Genetics 169: 2151–2163.
Winzeler, E.A., Shoemaker, D.D., Astromoff, A., Liang, H., Anderson, K., Andre, B., Bangham, R., Benito, R., Boeke, J.D., Bussey, H., et al. 1999. Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science 285: 901–906.
Received May 5, 2007; accepted in revised form August 6, 2007.