|
|
|
|
Genome Res. 14:1797-1805, 2004 ©2004 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/04 $5.00
Methods Genome-Scale In Silico Models of E. coli Have Multiple Equivalent Phenotypic States: Assessment of Correlated Reaction Subsets That Comprise Network StatesDepartment of Bioengineering, University of California, San Diego, San Diego, California 92092-0412, USA
The constraint-based analysis of genome-scale metabolic and regulatory networks has been successful in predicting phenotypes and useful for analyzing high-throughput data sets. Within this modeling framework, linear optimization has been used to study genome-scale metabolic models, resulting in the enumeration of single optimal solutions describing the best use of the network to support growth. Here mixed-integer linear programming was used to calculate and study a subset of the alternate optimal solutions for a genome-scale metabolic model of Escherichia coli (iJR904) under a wide variety of environmental conditions. Analysis of the calculated sets of optimal solutions found that: (1) only a small subset of reactions in the network have variable fluxes across optima; (2) sets of reactions that are always used together in optimal solutions, correlated reaction sets, showed moderate agreement with the currently known transcriptional regulatory structure in E. coli and available expression data, and (3) reactions that are used under certain environmental conditions can provide clues about network regulatory needs. In addition, calculation of suboptimal flux distributions, using flux variability analysis, identified reactions which are used under significantly more environmental conditions suboptimally than optimally. Together these results demonstrate the utilization of reactions in genome-scale models under a variety of different growth conditions.
Constraint-based modeling of reconstructed genome-scale metabolic networks has proven useful for understanding and predicting the genotype-phenotype relationship in microbes (Ibarra et al. 2002
Mixed-integer linear programming (MILP) has been used to study optimal solutions by identifying the minimum number of reactions needed for optimal growth (minimum reaction sets; Burgard and Maranas 2001
Here we applied the recursive MILP algorithm (Lee et al. 2000
We first computed a sample of 56,756 optimal solutions under 136 growth environments. We then assessed the properties of these solutions and found the correlated reaction subsets in these optimal solutions. The correlated reaction subsets were then compared to known regulon structures and expression profiling data sets. For aerobic and anaerobic growth on glucose, we compared the utilization of reactions in the alternative optima to described transcriptional regulation of the genes that are associated with some of these fluxes. Finally, we compared the reaction usage in optimal growth solutions to those in suboptimal growth solutions.
Comparing Results From MILP and Flux Variability Analysis
To investigate how well the calculated condition-specific optima spanned or represented the actual range of optimal solutions, the number of varying fluxes and the range of those fluxes were calculated from the different sets of condition-specific optima and compared to the results using flux variability analysis. Figure 1 shows that for the 88 aerobic conditions, the first 150 optimal solutions are sufficient to identify all of the variable fluxes among the set of condition-specific optima, whereas determining the numerical range of these fluxes in the set sometimes required the calculation of more optima. For some carbon sources, the magnitude of the flux ranges when looking at all 500 optimal solutions was smaller than the actual flux ranges calculated using flux variability analysis. These results imply that the first 500 optimal solutions are adequate for getting a sample of the full set of condition-specific optima, while still remaining computationally tractable. With adequate sampling, the set of condition-specific optima can be further analyzed.
Properties of Alternate Optima The average optimal flux distribution, found among the 136 growth conditions considered, used 294 of the 931 internal reactions in iJR904, and for a given growth condition the average number of variable fluxes was 49 (with the number of variable fluxes typically being higher under aerobic conditions). Interestingly, with the exception of anaerobic growth on adenosine and deoxyadenosine, none of the exchange fluxes with the environment varied across alternate optima. This result indicates that the external state or phenotype of the cell is normally unique for a set of condition-specific optima, whereas flexibility in the internal fluxes accounts for the different optima. In all, there were 140 internal fluxes and five exchange fluxes that were variable under at least one of the 136 tested environmental conditions (see Supplemental material). Most of the flexibility in the network resides with reactions using different electron carriers, nucleotide salvage reactions, and central metabolic reactions. There were quite a few instances in central metabolism where the flux through a reaction was variable across the different optima (most of glycolysis, TCA cycle, anaplerotic reactions, and oxidative phosphorylation reactions). The complete set of alternate optima, both aerobic and anaerobic, are a set of 56,756 flux distributions that were further studiedgrouped together they are referred to here as the `mixed optima'. For each reaction in the network, the fraction of the optimal solutions, in the set of mixed optima, which use that particular reaction (fopt) was calculated (Fig. 2A, also available in Supplemental material). Reactions in the iJR904 model were previously assigned to different metabolic subsystems based on metabolic function, and Figure 2B shows for each subsystem how many reactions are assigned to that subsystem and the distribution of fractional usage within this subsystem. For example, there are 40 oxidative phosphorylation reactions: 65% of these are never used in any of the mixed optima, 20% have a fopt between 0 and 0.25, 3% have a fopt between 0.25 and 0.5, 8% have a fopt between 0.75 and 1.0, and 5% are used in all optima.
A total of 201 reactions were used in all optimal solutions across the different environmental conditions; these include reactions involved in amino acid metabolism (cys, his, ile, leu, lys, met, phe, and tyrthe amino acids that cannot serve as carbon sources), folate metabolism, and membrane lipid biosynthesis. These 201 reactions are needed for optimal growth across all 136 simulated growth environments, and are related to the intersection of minimal reactions sets (Burgard and Maranas 2001
In contrast to the 201 reactions that are used across all of the mixed optima, a number of reactions (351) were never utilized by any of the optimal solutions for any of the tested environments. These include 185 blocked reactionsreactions that will never be used by the network even if all exchange fluxes are free (Burgard et al. 2004
Correlated Reaction Sets
The regulatory structure in E. colitaken from EcoCyc (Karp et al. 2002
As a further comparison, expression data were used to determine whether genes in a correlated reaction set have similar expression patterns under different conditions. Using publicly available AffyMetrix data, from the ASAP database, (Allen et al. 2003
The transcription units appear to have more significant correlation than the correlated reaction sets: 18 out of 66 (27.3%) correlated reaction sets have P-values less than 0.05, whereas 159 of 321 (49.5%) transcription units have P-values less than 0.05. Almost all of the correlated reaction sets with significant correlation were classified as having evidence of being part of the same regulon (17). The number of times a correlated reaction set is used in the mixed optima does not seem to affect how well the set correlates with the expression data.
Condition-Dependent Fluxes
A previously developed regulatory model, iMC1010v2, was used to compare these predictions with the transcriptional regulatory structure in E. coli. The iMC1010v2 model was constructed based on established regulatory mechanisms and went through one iteration of improvement using transcription factor knockout strains whose expression was measured under glucose aerobic and glucose anaerobic conditions (Covert et al. 2004 Most of the reactions that have predicted changes in the expression of associated genes, based on iMC1010v2, are not used in any of the condition-specific optimal solutions for glucose aerobic or glucose anaerobic conditions. For both the aerobic and anaerobic reaction sets, nine reactions in each set fell on the predicted half of the graph. Discrepancies occurred in both reaction sets, where more isozymes are expressed under the condition that utilizes a reaction the least under optimal conditions. One of the two discrepancies in the aerobic reaction set is used only slightly more in the anaerobic case (fanaerobic = 0.681 vs. faerobic = 0.676); further sampling of the alternate optima could result in higher aerobic usage (faerobic). The other aerobic reaction set discrepancy is for the glycolate transport reaction (GLYCLT2r) which is used in all glucose anaerobic optimal solutions and in no glucose aerobic optimal solutions. Deletion of GLYCLT2r from the network only drops the maximal anaerobic growth rate by 0.1%, so even if the transporter is not expressed there would be negligible differences in observed growth rate. It is also important to note that the model predicts that glycolate cannot serve as a carbon source under anaerobic conditions, explaining why this transporter might be expressed aerobically. There are also six discrepancies for reactions in the aerobic reaction set: hydrogenase reactions (HYD1, HYD2, HYD3), fumarate reductase (FRD2), formate-hydrogen lyase (FHL), and formate dehydrogenase (FDH2). Like the glycolate transporter, all six reactions can be deleted simultaneously with negligible effect on the predicted maximal growth rate (drops only by 0.1%). The fumarate reductase enzyme is responsible for two reactions, FRD2 and FRD3 (where different quinones are used as electron donors in the two reactions). The FRD3 reaction is used in all glucose anaerobic optima, explaining why the fumarate reductase enzyme is expressed under anaerobic conditions.
Suboptimal Fluxes
Looking at the usage patterns of exchange fluxes in optimal and suboptimal aerobic solutions (these are not shown in Fig. 6) gives insights into how certain metabolites are used by the cell (Table 2). Some metabolites can only serve as carbon sources and will never be secreted either optimally or suboptimally; this is true for 52 extracellular metabolites. Thirteen extracellular metabolites can be secreted suboptimally and cannot be used as carbon sources (they can only serve as by-products). Another set of 30 metabolites can be used as a sole carbon source and can be secreted during suboptimal growth on any of the other carbon sources, and additionally six carbon sources can be secreted during suboptimal growth on only a limited number of other carbon sources. Finally, the metabolites urea, xanthine, uracil, and indole cannot serve as carbon sources and are secreted optimally under aerobic growth conditions on a select number of carbon sources.
Interestingly, a set of 147 metabolic reactions are used suboptimally but never optimally (Fig. 6), so the presence of the responsible enzymes under the tested conditions would drive the cell towards a less optimal state. These reactions include ABC transporters, when alternate transport mechanisms are available; the first step in the Entner Dourdoroff pathway; phospholipid recycling; nucleotide degradation; transporters associated with by-products that cannot be utilized as carbon sources; and oxidative phosphorylation reactions.
Alternate optimal growth solutions exist in genome-scale models, and these solutions need to be studied. Herein, application of an MILP algorithm to a genome-scale metabolic model of E. coli revealed that the number of alternate optima for genome-scale models is often large, making it computationally challenging to enumerate all of the optima for some conditions. Analysis of the calculated sets of optimal solutions showed that: (1) only a small subset of reactions in the network have variable fluxes across optima, (2) correlated reaction sets showed moderate agreement with the regulatory structure elucidated to date using classical methods and with expression data, and (3) condition-dependent reactions help provide clues about network regulatory needs. The additional calculation of suboptimal flux distributions identified reactions which are used under more environmental conditions suboptimally than optimally. There are many more alternate optima than the number of fluxes that can vary across the set of alternate optimal solutions. Only a relatively small subset of reactions in the network (140 of 931 internal reactions) has variable fluxes across optimal solutions. For only a few environmental conditions (2 of 136) will the exchange fluxes vary across optima, indicating that the internal state of the cell is where the variability lies. In some cases it will be difficult to exactly determine which optima a cell might use based on expression data or protein levels, as some variable reactions are carried out by the same enzymes. For example, there are 45 nucleotide salvage reactions whose fluxes can vary under at least one of the tested environmental conditions; however, there are only 19 different enzymes associated with these reactions. For some variable reactions, such as those involved in central metabolism, expression data, proteomic data, and flux data could be used to help identify which reaction a cell is utilizing. Another level of complication in identifying the physiological solution is that an affine convex combination of the basic alternate optima is also an alternate optima; it will also be a valid flux distribution with the same objective value. For these reasons it may prove too difficult to determine exactly what solution a living cell utilizes.
The first 500 alternate optima for each condition were used to identify sets of correlated reactions. The majority of these correlated reaction sets (45 of 66) were consistent with established regulatory structure. However, comparisons made using established transcriptional regulatory structure are limited by the fact that a large portion of the transcriptional regulatory network has not been elucidated. A better agreement with regulon structure and correlated reaction sets in genome-scale models could emerge as the transcriptional regulatory network is further characterized. It was recently shown that the consistency between transcriptional regulatory network structure and expression data can vary depending on the structural features of network elements and functional classes of genes (Herrgard et al. 2003 Unraveling the transcriptional regulatory network in E. coli is currently of great interest to many researchers. Comparing condition-specific optima provides useful insights into why the bacteria choose to express certain enzymes under certain conditions. Comparisons between glucose anaerobic and glucose aerobic reaction usage in optimal solutions generated testable hypotheses, some of which have already been proven experimentally. It will be important to investigate multiple sets of conditional optima, as the reasons behind enzyme regulation might not be apparent by studying only two environmental conditions.
By looking at optimal and suboptimal flux distributions, reactions which are used only in suboptimal solutions can be identified. Why would E. coli over the course of evolution retain these enzymes, since these reactions are never used optimally? These enzymes might be useful for reasons not captured based on the assumed condition of optimal growth. Network topology cannot capture the importance of enzymes that might catalyze a less efficient overall reaction; these enzymes could have higher turnover rates, better kinetics, allosteric regulation, or other such reasons making them beneficial. For example, to convert pyruvate into acetyl-CoA, many of the aerobic optimal solutions use pyruvate formate lyase (PFL) rather than pyruvate dehydrogenase (PDH), because formate, a product of PFL reaction, can be converted to hydrogen gas whose electrons are then carried down the electron transport chain. Using PFL the cell can make slightly more ATP than if it used PDH (with NADH transferring electrons to the electron transport chain). The network topology alone cannot predict possible loss of hydrogen gas making PFL more efficient. As another example, E. coli has two cytochrome oxidases, one capable of translocating two protons and the other capable of translocating 2.5 protons per electron pair donated to oxygen. The enzymes have different affinities for oxygen, making one better than the other at different oxygenation levels (Gennis and Stewart 1996
For the analysis presented in this paper, further sampling of the alternate optima would not significantly affect the results. From the flux variability analysis results, the set of fluxes that are always used or never used across the mixed alternate optima would not change; however, deviations could occur in the fraction of mixed optima (fopt) that use a particular reaction for those reactions in Figure 2 with fopt between 0 and 1. Investigation of the correlated reaction sets, in conjunction with the flux variability analysis results, also indicated that these sets would not change if more sampling of the optima took place. It should be noted, however, that biases in the sampling could affect other types of analysis of the resulting flux distributions, such as the distribution of flux values through individual reactions (Wiback et al. 2004 Taken together, these in silico results indicate that an optimal E. coli growth phenotype might be achievable by a large number of internal flux distributions; distinguishing the differences between these optima experimentally might prove difficult. In addition, studying the optimal and suboptimal utilization of reactions in the network could lead to understanding why enzymes are expressed under different conditions.
Metabolic Network A recently reconstructed E. coli metabolic network was used in this study (Reed et al. 2003
Environmental Conditions Tested
MILP Algorithm
where S is the stoichiometric matrix, v is the steady-state flux vector, and
At each iteration, J, at least one of the non-zero fluxes from the previous solution (NZJ-1) must be set to zero, where the binary variable yi is 1 if that flux is selected to be removed from the basis at iteration J (equation 2a). The binary variable wi is subsequently forced to zero if yi is one (equation 2c), and the upper and lower bounds for that particular flux are then constrained to zero (equation 2d). Equation 2b ensures that alternate bases are not revisited by eliminating at least one non-zero variable found in previous iterations. The cplex solver in GAMS (GAMS Development) was used to enumerate the first 500 optima for each environmental condition.
Correlated Reactions Sets
Transcriptional Regulatory Network Predictions
Analysis of Expression Data
This work was funded by research grants from NIH (GM57089) and NSF (BES-01-20363). The authors and UCSD disclose potential conflicting financial interests. The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
1 Corresponding author. E-MAIL bpalsson{at}be-research.ucsd.edu; FAX (858) 822-3120. [Supplemental material is available online at www.genome.org.] Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.2546004.
Allen, T.E., Herrgard, M.J., Liu, M., Qiu, Y., Glasner, J.D., Blattner, F.R., and Palsson, B.Ø. 2003. Genome-scale analysis of the uses of the Escherichia coli genome: Model-driven analysis of heterogeneous data sets. J. Bacteriol. 185: 6392-6399. Almaas, E., Kovacs, B., Vicsek, T., Oltvai, Z.N., and Barabasi, A.L. 2004. Global organization of metabolic fluxes in the bacterium Escherichia coli. Nature 427: 839-843.[CrossRef][Medline]
Beard, D.A., Liang, S.D., and Qian, H. 2002. Energy balance for analysis of complex metabolic networks. Biophys. J. 83: 79-86. Burgard, A.P. and Maranas, C.D. 2001. Probing the performance limits of the Escherichia coli metabolic network subject to gene additions or deletions. Biotechnol. Bioeng. 74: 364-375.[CrossRef][Medline] Burgard, A.P., Vaidyaraman, S., and Maranas, C.D. 2001. Minimal reaction sets for Escherichia coli metabolism under different growth requirements and uptake environments. Biotechnol. Prog. 17: 791-797.[CrossRef][Medline] Burgard, A.P., Pharkya, P., and Maranas, C.D. 2003. Optknock: A bilevel programming framework for identifying gene knockout strategies for microbial strain optimization. Biotechnol. Bioeng. 84: 647-657.[CrossRef][Medline]
Burgard, A.P., Nikolaev, E.V., Schilling, C.H., and Maranas, C.D. 2004. Flux coupling analysis of genome-scale metabolic network reconstructions. Genome Res. 14: 301-312. Covert, M.W., Schilling, C.H., and Palsson, B. 2001. Regulation of gene expression in flux balance models of metabolism. J. Theoret. Biol. 213: 73-88.[CrossRef][Medline] Covert, M.W., Knight, E.M., Reed, J.L., Herrgard, M.J., and Palsson, B.Ø. 2004. Integrating high-throughput and computational data elucidates bacterial networks. Nature 429: 92-96.[CrossRef][Medline]
Edwards, J.S. and Palsson, B.Ø. 2000. The Escherichia coli MG1655 in silico metabolic genotype: Its definition, characteristics, and capabilities. Proc. Natl. Acad. Sci. 97: 5528-5533.
Fong, S.S., Marciniak, J.Y., and Palsson, B.Ø. 2003. Description and interpretation of adaptive evolution of Escherichia coli K-12 MG1655 using a genome-scale in silico metabolic model. J. Bacteriol. 185: 6400-6408. Forster, J., Famili, I., Palsson, B.Ø., and Nielsen, J. 2003. Large-scale evaluation of in silico gene knockouts in Saccharomyces cerevisiae. OMICS 7: 193-202.[CrossRef][Medline] Gennis, R.B. and Stewart, V. 1996. Respiration. In Escherichia coli and salmonella (ed. F.C. Neidhardt), pp. 217-261. ASM Press, Washington, DC.
Gerdes, S.Y., Scholle, M.D., Campbell, J.W., Balazsi, G., Ravasz, E., Daugherty, M.D., Somera, A.L., Kyrpides, N.C., Anderson, I., Gelfand, M.S., et al. 2003. Experimental determination and system level analysis of essential genes in Escherichia coli MG1655. J. Bacteriol. 185: 5673-5684.
Herrgard, M.J., Covert, M.W., and Palsson, B.Ø. 2003. Reconciling gene expression data with known genome-scale regulatory network structures. Genome Res. 13: 2423-2434. Ibarra, R.U., Edwards, J.S., and Palsson, B.Ø. 2002. Escherichia coli K-12 undergoes adaptive evolution to achieve in silico predicted optimal growth. Nature 420: 186-189.[CrossRef][Medline]
Karp, P.D., Riley, M., Saier, M., Paulsen, I.T., Collado-Vides, J., Paley, S.M., Pellegrini-Toole, A., Bonavides, C., and Gama-Castro, S. 2002. The EcoCyc Database. Nucleic Acids Res. 30: 56-58. Kauffman, K.J., Prakash, P., and Edwards, J.S. 2003. Advances in flux balance analysis. Curr. Opin. Biotechnol. 14: 491-496.[CrossRef][Medline] Lee, S., Phalakornkule, C., Domach, M.M., and Grossmann, I.E. 2000. Recursive MILP model for finding all the alternate optima in LP models for metabolic networks. Comp. Chem. Eng. 24: 711-716.[CrossRef] Mahadevan, R. and Schilling, C.H. 2003. The effects of alternate optimal solutions in constraint-based genome-scale metabolic models. Metab. Eng. 5: 264-276.[CrossRef][Medline]
Papin, J.A., Price, N.D., and Palsson, B.Ø. 2002. Extreme pathway lengths and reaction participation in genome-scale metabolic networks. Genome Res. 12: 1889-1900. Papin, J.A., Price, N.D., Wiback, S.J., Fell, D.A., and Palsson, B.Ø. 2003. Metabolic pathways in the post-genome era. Trends Biochem. Sci. 28: 250-258.[CrossRef][Medline] Phalakornkule, C., Lee, S., Zhu, T., Koepsel, R., Ataai, M.M., Grossmann, I.E., and Domach, M.M. 2001. A MILP-based flux alternative generation and NMR experimental design strategy for metabolic engineering. Metab. Eng. 3: 124-137.[CrossRef][Medline]
Price, N.D., Famili, I., Beard, D.A., and Palsson, B.Ø. 2002. Extreme pathways and Kirchhoff's second law. Biophys. J. 83: 2879-2882. Price, N.D., Papin, J.A., Schilling, C.H., and Palsson, B. 2003. Genome-scale microbial in silico models: The constraints-based approach. Trends Biotechnol. 21: 162-169.[CrossRef][Medline] Price, N.D., Schellenberger, J., and Palsson, B.Ø. 2004. Uniform sampling of steady state flux spaces: Means to design experiments and to interpret enzymopathies. J. Biol. Chem. (in press). Raamsdonk, L.M., Teusink, B., Broadhurst, D., Zhang, N., Hayes, A., Walsh, M.C., Berden, J.A., Brindle, K.M., Kell, D.B., Rowland, J.J., et al. 2001. A functional genomics strategy that uses metabolome data to reveal the phenotype of silent mutations. Nat. Biotechnol. 19: 45-50.[CrossRef][Medline]
Reed, J.L. and Palsson, B.Ø. 2003. Thirteen years of building constraint-based in silico models of Escherichia coli. J. Bacteriol. 185: 2692-2699. Reed, J.L., Vo, T.D., Schilling, C.H., and Palsson, B.Ø. 2003. An expanded genome-scale model of Escherichia coli K-12 (iJR904 GSM/GPR). Genome Biol. 4: R54.51-R54.12.
Salgado, H., Gama-Castro, S., Martinez-Antonio, A., Diaz-Peredo, E., Sanchez-Solano, F., Peralta-Gil, M., Garcia-Alonso, D., Jimenez-Jacinto, V., Santos-Zavaleta, A., Bonavides-Martinez, C., et al. 2004. RegulonDB (version 4.0): Transcriptional regulation, operon organization and growth conditions in Escherichia coli K-12. Nucleic Acids Res. 32: D303-306.
Schilling, C.H., Covert, M.W., Famili, I., Church, G.M., Edwards, J.S., and Palsson, B.Ø. 2002. Genome-scale metabolic model of Helicobacter pylori 26695. J. Bacteriol. 184: 4582-4593. Schuster, S., Klamt, S., Weckwerth, W., Moldenhauer, F., and Pfeiffer, T. 2002. Use of network analysis of metabolic systems in bioengineering. Bioprocess Biosyst. Eng. 24: 363-372.[CrossRef]
Segre, D., Vitkup, D., and Church, G.M. 2002. Analysis of optimality in natural and perturbed metabolic networks. Proc. Natl. Acad. Sci. 99: 15112-15117. Stelling, J., Klamt, S., Bettenbrock, K., Schuster, S., and Gilles, E.D. 2002. Metabolic network structure determines key aspects of functionality and regulation. Nature 420: 190-193.[CrossRef][Medline] Varma, A. and Palsson, B.Ø. 1994. Metabolic flux balancing: Basic concepts, scientific and practical use. Bio/Technology 12: 994-998.[CrossRef] Wiback, S.J., Famili, I., Greenberg, H.J., and Palsson, B.Ø. 2004. Monte Carlo sampling can be used to determine the size and shape of the steady state flux space. J. Theor. Biol. 228: 437-447.[CrossRef][Medline]
https://asap.ahabs.wisc.edu/~glasner/Protocols/DataDefinitionDefinitions.txt; Web site describes how the estimated transcript copy number is calculated from the gene expression data.
Received March 5, 2004; accepted in revised format July 8, 2004. This article has been cited by other articles:
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||