Genome Res. 13:2435-2443, 2003
©2003 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/03 $5.00
Letter
Regulatory Network of Escherichia coli: Consistency Between Literature Knowledge and Microarray Profiles
1 Program of Computational Genomics, Centro de Investigación sobre Fijación de Nitrógeno, Universidad Nacional Autónoma de México (CIFN-UNAM), Morelos 62100, México
2 Genome Center, University of Wisconsin, Madison, Wisconsin 53706, USA
3 Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, UNAM, 01000 México D. F., México
The transcriptional network of Escherichia coli may well be the most complete experimentally characterized network of a single cell. A rule-based approach was built to assess the degree of consistency between whole-genome microarray experiments in different experimental conditions and the accumulated knowledge in the literature compiled in RegulonDB, a database of transcriptional regulation and operon organization in E. coli. We observed a high and statistically significant level of consistency, ranging from 70% to 87%. When effector metabolites of regulatory proteins are not considered in the prediction of the active or inactive state of the regulators, consistency falls by up to 40%. Similarly, consistency decreases when rules for multiple regulatory interactions are altered or when "on" and "off" entries are assigned randomly. We modified the initial state of regulators and evaluated the propagation of errors in the network, which does not correlate linearly with the connectivity of regulators. We interpret this deviation mainly as a result of the existence of redundant regulatory interactions. Consistency evaluation opens a new space of dialogue between theory and experiment, as the consequences of different assumptions can be evaluated and compared.
Regulatory gene networks in the cell play an essential role in controlling the expression of specific genes according to environmental changes. The resulting patterns of gene expression vary temporally and spatially, as the outcome of a set of decisions executed by the regulatory network (Oosawa and Savageau 2002
Extensive molecular studies in Escherichia coli have determined details of regulatory mechanisms and have also revealed many global aspects of the gene regulatory network (Neidhardt and Savageau 1996
The understanding of how structural properties of regulatory networks determine the dynamics of their regulated genes has been the subject of studies for four decades (Kauffman 1974
Given an initial condition specifying the state of regulatory proteins derived from an experiment, the network of regulatory interactions and conformations of regulators determines theoretical expression states of the regulated genes that can be compared with experimental data. This comparison, which generates a single number, the consistency between experiment and theory, opens a new space of dialogue between theory and experiment. The effects of different assumptions and their corresponding propagation of errors can be tested and compared. Contrary to most recent studies aimed at reconstructing the regulatory wiring from microarray experimental data alone (Eisen et al. 1998
We analyzed expression profiles of E. coli under four conditions: minimal medium (the common control condition), heat shock, stationary phase, and anaerobic growth. Three independently repeated control experiments showed a correlation coefficient varying between 0.79 and 0.87. The filtering of noise, as explained in the Methods section, left a set of 2157 (49%) genes for all analyses presented here. Similarly, of 170 known regulatory proteins, only 83 (49%) satisfied these conditions and were used in our analyses. We used a relative expression scale to transform expression ratios into on and off states, as described in the Methods section. This discretization was possible because all experimental values are relative to a single control condition, that of minimal medium. Minimal medium shows a larger fraction of on genes, contrary to anaerobic and stationary phase conditions, consistent with overall results obtained in other laboratories (Richmond et al. 1999
To get an initial estimate of the expected consistency, we compared microarray expression values with well-defined known sets of genes of each stimulon. The results of this literature comparison are detailed in Tables 1-4S in the Supplemental material, available online at www.genome.org. Table 1 shows that in all conditions except stationary phase, the expression of genes corresponds to that reported in the literature for 86%-88% of the cases. The lower consistency of 69% in stationary phase is not a surprise, as the experimental settings of deprivation and stresses used to induce this condition are rather variable in the literature and do not correspond exactly with those used in the microarray experiment. In stationary phase, we considered the subset of genes induced by sigmaS, which precedes the most significant changes involved in the transcription of most of the genes when the cell enters stationary phase (Ishihama 2000
Definitions of Complex, Simple, and Strict Regulons and Homogeneity of Their Expression
Furthermore, complex and simple regulons are strict regulons if the role (activator or repressor) of each regulatory protein is the same for every gene in the regulon. For instance, the group of genes fruB, fruA, fruK, and pykF define the negative strict simple regulon of FruR. On the other hand, the genes edp and pgk form the complex regulon regulated by FruR and CRP. Once we have discretized individual genes, we identify those simple and complex strict regulons that are homogeneously expressed in a given condition (as explained in the Methods section). We can then refer to the on or off state of each strict regulon. We performed this process in all cases, excluding the regulator from the regulon set, even if subject to autoregulation, to prevent noise in the homogeneity due to possible conflicts in the expression of the regulatory gene, such as oscillations or other complex behavior. In the four conditions tested, 77% of the regulons, on average, show a homogeneous expression, as shown in Table 2. This is rather high, considering the experimental noise inherent in the methodology of microarrays, messenger stability, and the plausibly incomplete knowledge of gene regulation in the database, such as unknown additional regulators affecting transcription initiation, and alternative levels of regulation. In addition, we are not certain that the experiments as performed were done under steady-state conditions. This could be an additional source of error that may affect our estimates of consistency. The subsequent analysis of consistency is limited to regulons with a homogeneous expression profile. We have information for 77 simple regulons in RegulonDB, with an average of 4.71 genes per regulon. Of these, on average, only 18.25 were shown to be homogeneous in at least one condition. There are 171 strict complex regulons in the database, of which 39.5, on average, were homogeneously expressed and are analyzed.
Table 2 also shows that, on average, 57 simple regulons in each condition are regulating just one gene. Of these, only 20, on average, were used in evaluations of consistency. These left us
Prediction of Conformation and State of Regulators
When a regulator is present, a more elaborate process, using our basic knowledge of gene regulation, is required. First, we assume that when a gene gives an on value on the microarray, its protein product is present too. But the presence of a regulator does not mean it is in a conformation able to exert its positive (if an activator) or its negative (if a repressor) effect on regulated genes. We assume that regulators have two conformations (which is the case, with very few exceptions): one in which the allosteric metabolite (e.g., cAMP for CRP, allolactose for LacI, arabinose for AraC, etc.) is bound to the protein and one in which it is unbound. In some cases, alternative conformations involve phosphorylation or other processes, which, in the model, are similarly treated. Conformations are defined as active (for present regulators) or inactive (for present regulators, depending on the presence or absence of their effector, and for absent regulators), and are deduced from the expression of the strict simple regulon for each regulator. Conformations, as defined, can in fact also be assigned for regulators whose effector metabolites have not been identified. Briefly, we can say that a simple regulon is expressed either when its transcriptional activator is present and active, or when its transcriptional repressor is absent or inactive. We call this the simple rule, as it can be applied only to genes whose promoters are regulated by only one transcription factor, that is to say, to strict simple regulons. Table 3 summarizes the rules used to determine all possible active and inactive states of regulators determining simple regulons to be on or off. The first half of the table, when the regulatory gene is absent, is based on the promoter rules, whereas the second half, when regulatory proteins are present, uses the simple rule.
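The simple rule stated above can be sketched in a few lines of Python. This is only an illustrative reading of the rule as worded in the text, not the Prolog program used in this work; the function names `simple_regulon_expressed` and `infer_conformation` are hypothetical, and the full content of Table 3 (including the promoter rules for absent regulators) is not reproduced here.

```python
def simple_regulon_expressed(role: str, present: bool, active: bool) -> bool:
    """Simple rule: a strict simple regulon is on when its activator is
    present and active, or when its repressor is absent or inactive."""
    if role == "activator":
        return present and active
    if role == "repressor":
        return (not present) or (not active)
    raise ValueError(f"unknown role: {role!r}")


def infer_conformation(role: str, present: bool, regulon_on: bool) -> str:
    """Invert the simple rule: deduce the regulator conformation from the
    observed state of its strict simple regulon (hypothetical helper)."""
    if not present:
        # Absent regulators are counted as inactive by definition in the text.
        return "inactive"
    if role == "activator":
        return "active" if regulon_on else "inactive"
    if role == "repressor":
        return "inactive" if regulon_on else "active"
    raise ValueError(f"unknown role: {role!r}")


# A present repressor whose simple regulon is off is deduced to be active.
print(infer_conformation("repressor", present=True, regulon_on=False))
```

Under these assumptions, the deduced conformation is always the one that makes the simple rule reproduce the observed regulon state for a present regulator.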
We were able to predict in all of the conditions, on average, the conformation state as active or inactive for 64% of regulators from the initial set of 83. We found that 51% of the regulatory proteins have an inactive state (on average, around 25 are activators and 17 are repressors), whereas 13% are active (around five are activators and six are repressors). For 36% of the regulators (around 30), we were unable to predict whether the proteins were in an active or inactive state, either because the simple regulon did not behave homogeneously, or because there is no known strict simple regulon. With these data at hand, we are now in a position to exploit the structure of the network and evaluate consistency between simple and complex regulons.
Consistency in Expression Between Simple and Complex Regulons
The application of these rules, together with the defined active or inactive state of regulator proteins, enables us to predict the expression of complex regulons. We assessed the consistency of individual strict complex regulons, comparing our predictions with the observed expression state in the experiments. For example, ArcA and FNR are two repressors predicted to be repressing five genes. These five genes appeared off under heat shock, a state that is consistent with both repressors being in an active state. In this case, the experimental and the predicted expression state coincide, so we define the (ArcA, FNR) complex regulon as a case of direct consistency. All of the combinations of multiple interactions were considered, and different cases and types of rules associated with consistency of complex regulons are listed in the last column of Table 5. We found the highest level of consistency (87%) in heat shock, compared with that of stationary phase, in which we obtained 70% of consistent regulons. These values are similar to our previous estimates, on the basis of a small set of well-studied genes under these different conditions. We obtained a consistency of 76% and 71% for minimal medium and anaerobic growth, respectively. The random estimations of consistency on the basis of an average of 1000 simulated values, given the network and the rules (see the Methods section), are 45% for heat shock, 49% for stationary phase, 54% for minimal medium, and 53% in anaerobiosis. The corresponding Z scores of 4.48, 2.49, 2.54, and 2.24 clearly suggest that the observed values are significantly different from random values.
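As an illustration of how a single consistency number per condition can be obtained, the following Python sketch compares predicted and observed states of complex regulons and reports the fraction that coincide. The function name `consistency` and the regulon entries other than (ArcA, FNR) are hypothetical placeholders, and the states shown are invented for the example rather than taken from the data.

```python
def consistency(predicted: dict, observed: dict) -> float:
    """Fraction of complex regulons whose predicted state equals the
    observed state, over regulons present in both dictionaries."""
    shared = [regulon for regulon in predicted if regulon in observed]
    if not shared:
        raise ValueError("no regulons in common")
    matches = sum(predicted[r] == observed[r] for r in shared)
    return matches / len(shared)


# Hypothetical states; only the (ArcA, FNR) case mirrors the text's example.
predicted = {
    ("ArcA", "FNR"): "off",      # two active repressors, genes predicted off
    ("regulon_B",): "on",
    ("regulon_C",): "off",
    ("regulon_D",): "on",
}
observed = {
    ("ArcA", "FNR"): "off",      # direct consistency
    ("regulon_B",): "on",
    ("regulon_C",): "on",        # inconsistent case
    ("regulon_D",): "on",
}

print(f"{consistency(predicted, observed):.0%}")
```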
Re-evaluating Consistency: Alternative Model and Error Propagation
As shown in Figure 1, we found that in all conditions tested the difference between the maximum and minimum level of consistency of NC (no conformation) and NCRP (no conformation and no CRP rule) ranges from around 18% to 40%, with heat shock and anaerobiosis being the most affected conditions. This dramatic decrease illustrates how sensitive these numbers are to different definitions of rules.
We observe that simple negative autoregulated regulons tend to be less consistent in several conditions than simple negative regulons in which there is no autoregulation. This was observed in the autoregulated regulons governed by PdhR, PurR, FhlA, and OxyR, contrary to simple regulons like ArcA, DnaA, and OmpR. This observation can be rationalized in terms of the oscillations of expression of homeostatic systems (Thomas and D'Ari 1990
The E. coli network of regulatory interactions follows a power-law distribution (Oosawa and Savageau 2002
The results presented here suggest that our rule-based approach gives the best levels of congruence, in spite of the noise prevailing in the microarray data and the generalizations made about gene regulation. The results related to each condition are shown in Table 5S in the Supplemental material.
The final quantitative result of comparing the microarray experiments with the predictions based on the literature is a single number of consistency for each experiment. The range and reproducibility of the high consistency observed will depend strongly on the quality of the experiments. Note that we used a single control as a reference, and all experiments were performed in the same laboratory. The value of the work presented here is not only the precise degree of consistency, but also an elaborate construction of ideas and knowledge of gene regulation integrated into a rich system whose output is compared with that of the experiment. Furthermore, the virtue of the approach presented here lies in the fact that almost any piece in this construction can be substituted by a different alternative, and can be evaluated by means of the effect on consistency with experimental data. In this sense, this work opens a large window of possibilities for future research. It is also important to emphasize that this setting of ideas and construction is only applicable when the network is known. The large amount of accumulated knowledge on transcription initiation, as well as on operon organization (estimated to represent 25% of the total regulatory network in E. coli), is currently rather unique in this respect. RegulonDB describes knowledge in a discrete and static way, indicating regulatory interactions and their positive and negative effects. Consistency as described in this work was assigned to a single experimental condition. The expression levels of genes in simple regulons were used to assign the active and inactive state of regulators. We then made a comparison, using rules of multiple interactions and gene expression levels within complex regulons. In this way, the combination of these two levels (state assignment and multiple interaction comparison) determines our final measurements.
Given the rules of multiple interactions, one could think that positive regulators have more redundant interactions than negative regulators. However, we did not observe a difference in consistency when comparing repressors and activators; in all of the conditions analyzed, we found, on average, that 51% of the complex regulons regulated only by activators were consistent, and 49% of those regulated only by repressors were also consistent. One would expect that the total connectivity of nonredundant interactions would exhibit a linear relationship with error propagation. If this were the case, the confidence associated with the experimental determination of the expression of genes could depend on their place within the network.
Estimation of consistency puts together three ingredients: the knowledge of the network and precise interactions, the setting of the initial conditions of the state of regulatory proteins as derived from the experiment, and the rules determining the outcome of multiple interactions. As genome projects, modeling of regulatory networks (Covert et al. 2001 The ability to perform these comparisons opens questions for future research in order to precisely address and improve the adequate level of representation in the modeling of regulatory network interactions, and to integrate our understanding of the regulatory mechanisms as a function of the large set of interconnected regulatory interactions.
Growth Conditions
For all experiments, a single colony of E. coli strain MG1655 was inoculated into MOPS minimal medium supplemented with 0.2% glucose (Neidhardt et al. 1974
Microarray Experiments
Data Treatment
Because the three conditions tested used the same control, we determined the Pearson correlation coefficient of the logarithmic percentage intensities individually for each gene as a measure of reproducibility on these repeated initial conditions. These varied from 0.79 to 0.87. All analyses performed were restricted to the set of genes whose expression values, defined as (Intensity-background)/background were
We used the normalized intensities to generate logarithmic expression ratios, as shown in the equation
is the standard deviation estimated from the normalized expression values for gene i in the three control repetitions. Discretization in on and off values was performed for each gene by taking the midpoint of the maximum and minimum of the relative values (in which the control is also included with a value of zero). Only those genes whose values and relative errors did not touch the midline were considered as either on if above, or off if below such midline. Note that in this way, we could discretize values for all four conditions, including the control.
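The discretization step described above can be sketched as follows in Python. This is a minimal illustration under stated assumptions: the function name `discretize`, the condition labels, and the numeric values are hypothetical, the error is treated as a symmetric half-width around each relative value, and a value whose error bar touches the midline is simply left unassigned (`None`).

```python
def discretize(values: dict, errors: dict) -> dict:
    """Assign on/off per condition for one gene.

    values: relative expression per condition, with the control included
            at 0.0 (assumed log-ratio scale).
    errors: symmetric error half-width per condition (assumed form).
    """
    # Midline: midpoint between the maximum and minimum relative values.
    midline = (max(values.values()) + min(values.values())) / 2
    states = {}
    for condition, value in values.items():
        err = errors.get(condition, 0.0)
        if value - err > midline:
            states[condition] = "on"     # whole error bar above the midline
        elif value + err < midline:
            states[condition] = "off"    # whole error bar below the midline
        else:
            states[condition] = None     # error bar touches the midline
    return states


# Hypothetical values for one gene; the control is included with value zero.
values = {"control": 0.0, "heat_shock": 2.0, "stationary": -1.0, "anaerobic": 0.4}
errors = {"control": 0.2, "heat_shock": 0.2, "stationary": 0.2, "anaerobic": 0.2}
print(discretize(values, errors))
```

Because the control enters with a value of zero, it receives an on or off state in the same way as the other three conditions.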
Homogeneity in Strict Regulons
µ = Np and σ = √(Np(1 − p)) are the expected value and standard deviation, respectively, in the binomial distribution, and p is the global frequency of genes in an off state. For a particular strict regulon, frequencies k/N within this interval were rejected, and those outside of the interval were accepted. Values below µ − σ are considered off, and values above µ + σ are considered on.
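A hedged Python sketch of this homogeneity test is given below. Several details are assumptions, because the beginning of the passage is missing: N is taken as the number of genes in the regulon, k as the number of those genes in the on state, p as the global frequency matching the state that k counts (the text defines p through the off state, so this orientation is an assumption chosen so that low values map to off), and the interval is taken as µ − σ to µ + σ on the count scale. The function name `classify_regulon` is hypothetical.

```python
import math
from typing import Optional


def classify_regulon(k: int, n: int, p: float) -> Optional[str]:
    """Binomial homogeneity test for one strict regulon.

    k: number of regulon genes in the on state (assumed orientation)
    n: number of genes in the regulon
    p: global frequency of the state counted by k (assumption)
    Returns "on", "off", or None when the regulon is not homogeneous.
    """
    mu = n * p                            # expected value of the binomial
    sigma = math.sqrt(n * p * (1 - p))    # standard deviation of the binomial
    if k < mu - sigma:
        return "off"
    if k > mu + sigma:
        return "on"
    return None                           # within the interval: rejected


# With n = 4 and p = 0.5, mu = 2 and sigma = 1.
print(classify_regulon(4, 4, 0.5), classify_regulon(0, 4, 0.5), classify_regulon(2, 4, 0.5))
```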
Prediction of Conformation and State of Regulators
The effector prediction was performed automatically using a program implemented in Prolog, which uses as inputs (1) the set of on and off values of homogeneous strictly coregulated simple regulons, (2) the known conformations obtained from RegulonDB, and (3) the rules from Table 3 under "Regulator Presence".
Consistency Evaluation
We generated 1000 arrays with on and off entries selected randomly. Each of these randomized arrays of complex regulons was compared with the original arrays of complex regulons in such a way that we were able to generate the distribution of matches for each condition tested. With this information, we calculated the expected value of consistent entries and their standard deviation.
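The randomization procedure can be sketched in Python as follows. This is an illustration only: it assumes that on and off are drawn with equal probability, that the Z score is the usual (observed − expected) / standard deviation, and that the input array is invented for the example; the names `random_baseline` and `z_score` are hypothetical.

```python
import random
import statistics


def random_baseline(original, trials=1000, seed=0):
    """Distribution of matches between random on/off arrays and the
    original array; returns (expected matches, standard deviation)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    match_counts = []
    for _ in range(trials):
        # Assumption: on and off are equally likely in the random arrays.
        randomized = [rng.choice(("on", "off")) for _ in original]
        match_counts.append(sum(r == o for r, o in zip(randomized, original)))
    return statistics.mean(match_counts), statistics.pstdev(match_counts)


def z_score(observed_matches, expected, sd):
    """Standard Z score of the observed number of consistent entries."""
    return (observed_matches - expected) / sd


# Hypothetical array of 20 complex-regulon states.
original = ["on", "off"] * 10
expected, sd = random_baseline(original)
print(expected, sd, z_score(16, expected, sd))
```

Changing the seed or the number of trials alters the estimates slightly, which is why the expected value is reported as an average over the simulated arrays.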
R.M.G. had been supported by a Ph.D. fellowship from DGEPUNAM. This work has been supported by grant 0028 from Conacyt-México, and by grant GM62205-02 from NIH. We thank Edgar Díaz-Peredo, Julio Freyre, Heladia Salgado, César Bonavides, Delfino García, and Víctor del Moral for their computer support, and Alejandro Garcíarrubio, Jacques van Helden, Socorro Gama-Castro, Agustino Martínez-Antonio, and Gabriel Moreno-Hagelsieb for fruitful discussions during the performance of this work. We acknowledge the useful comments of an anonymous referee. The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
4 Corresponding author. E-MAIL collado{at}cifn.unam.mx; FAX 52-777-317-5581. Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.1387003. [Supplemental material is available online at www.genome.org. The data set and supplemental material are available at http://www.cifn.unam.mx/Computational_Genomics/Consistency/.]
Aki, T. and Adhya, S. 1997. Repressor induced site-specific binding of HU for transcriptional regulation. EMBO J. 16: 3666-3674.
Brown, M.P., Grundy, W.N., Lin, D., Cristianini, N., Sugnet, C.W., Furey, T.S., Ares Jr., M., and Haussler, D. 2000. Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc. Natl. Acad. Sci. 97: 262-267.
Browning, D.F., Cole, J.A., and Busby, S.J. 2000. Suppression of FNR-dependent transcription activation at the Escherichia coli nir promoter by Fis, IHF and H-NS: Modulation of transcription initiation by a complex nucleo-protein assembly. Mol. Microbiol. 37: 1258-1269.
Busby, S. and Ebright, R.H. 1997. Transcription activation at class II CAP-dependent promoters. Mol. Microbiol. 23: 853-859.
Cotter, P.A. and Gunsalus, R.P. 1992. Contribution of the fnr and arcA gene products in coordinate regulation of cytochrome o and d oxidase (cyoABCDE and cydAB) genes in Escherichia coli. FEMS Microbiol. Lett. 70: 31-36.
Covert, M.W., Schilling, C.H., Famili, I., Edwards, J.S., Goryanin, I.I., Selkov, E., and Palsson, B.O. 2001. Metabolic modeling of microbial strains in silico. Trends Biochem. Sci. 26: 179-186.
Darwin, A.J., Ziegelhoffer, E.C., Kiley, P.J., and Stewart, V. 1998. Fnr, NarP, and NarL regulation of Escherichia coli K-12 napF (periplasmic nitrate reductase) operon transcription in vitro. J. Bacteriol. 180: 4192-4198.
Dombrecht, B., Marchal, K., Vanderleyden, J., and Michiels, J. 2002. Prediction and overview of the RpoN-regulon in closely related species of the Rhizobiales. Genome Biol. 3: RESEARCH0076.
Eisen, M.B., Spellman, P.T., Brown, P.O., and Botstein, D. 1998. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. 95: 14863-14868.
Goosen, N. and van de Putte, P. 1995. The regulation of transcription initiation by integration host factor. Mol. Microbiol. 16: 1-7.
Halfon, M.S., Grad, Y., Church, G.M., and Michelson, A.M. 2002. Computation-based discovery of related transcriptional regulatory modules and motifs using an experimentally validated combinatorial model. Genome Res. 12: 1019-1028.
Ishihama, A. 2000. Functional modulation of Escherichia coli RNA polymerase. Annu. Rev. Microbiol. 54: 499-518.
Jacobson, B.A. and Fuchs, J.A. 1998. Multiple cis-acting sites positively regulate Escherichia coli nrd expression. Mol. Microbiol. 28: 1315-1322.
Kauffman, S. 1974. The large scale structure and dynamics of gene control circuits: An ensemble approach. J. Theor. Biol. 44: 167-190.
Lamark, T., Rokenes, T.P., McDougall, J., and Strom, A.R. 1996. The complex bet promoters of Escherichia coli: Regulation by oxygen (ArcA), choline (BetI), and osmotic stress. J. Bacteriol. 178: 1655-1662.
Lawley, B. and Pittard, A.J. 1994. Regulation of aroL expression by TyrR protein and Trp repressor in Escherichia coli K-12. J. Bacteriol. 176: 6921-6930.
Lee, D.H., Huo, L., and Schleif, R. 1992. Repression of the araBAD promoter from araO1. J. Mol. Biol. 224: 335-341.
Lee, N.L., Gielow, W.O., and Wallace, R.G. 1981. Mechanism of araC autoregulation and the domains of two overlapping promoters, Pc and PBAD, in the L-arabinose regulatory region of Escherichia coli. Proc. Natl. Acad. Sci. 78: 752-756.
Lewis, M., Chang, G., Horton, N.C., Kercher, M.A., Pace, H.C., Schumacher, M.A., Brennan, R.G., and Lu, P. 1996. Crystal structure of the lactose operon repressor and its complexes with DNA and inducer. Science 271: 1247-1254.
Maas, W.K. and Clark, A.J. 1964. Studies on the mechanism of repression of arginine biosynthesis in E. coli. II. Dominance of repressibility in diploids. J. Mol. Biol. 8: 365-370.
Martínez-Antonio, A. and Collado-Vides, J. 2003. Identifying global regulators in transcriptional regulatory networks in bacteria. Curr. Opin. Microbiol. 6: 1-8.
Matthews, K.S. 1996. The whole lactose repressor. Science 271: 1245-1246.
Neidhardt, F.C., Bloch, P.L., and Smith, D.F. 1974. Culture medium for enterobacteria. J. Bacteriol. 119: 736-747.
Neidhardt, F.S. and Savageau, M.A. 1996. Regulation beyond the operon. In Escherichia coli and Salmonella: Cellular and molecular biology, 2nd ed. (eds. F. Neidhardt et al.), vol. 2, pp. 1310-1324. ASM Press, Washington, DC.
Oh, M.K. and Liao, J.C. 2000. Gene expression profiling by DNA microarrays and metabolic fluxes in Escherichia coli. Biotechnol. Prog. 16: 278-286.
Oosawa, C. and Savageau, M.A. 2002. Effects of alternative connectivity on behavior of randomly constructed Boolean networks. Physica D: Nonlinear Phenomena 170: 143-161.
Palsson, S. 2001. The effects of deleterious mutations in cyclically parthenogenetic organisms. J. Theor. Biol. 208: 201-214.
Perez-Rueda, E. and Collado-Vides, J. 2000. The repertoire of DNA-binding transcriptional regulators in Escherichia coli K-12. Nucleic Acids Res. 28: 1838-1847.
Pilpel, Y., Sudarsanam, P., and Church, G.M. 2001. Identifying regulatory networks by combinatorial analysis of promoter elements. Nat. Genet. 29: 153-159.
Rasmussen, P.B., Holst, B., and Valentin-Hansen, P. 1996. Dual-function regulators: The cAMP receptor protein and the CytR regulator can act either to repress or to activate transcription depending on the context. Proc. Natl. Acad. Sci. 93: 10151-10155.
Ravasz, E., Somera, A.L., Mongru, D.A., Oltvai, Z.N., and Barabasi, A.L. 2002. Hierarchical organization of modularity in metabolic networks. Science 297: 1551-1555.
Richet, E., Vidal-Ingigliardi, D., and Raibaud, O. 1991. A new mechanism for coactivation of transcription initiation: Repositioning of an activator triggered by the binding of a second activator. Cell 66: 1185-1195.
Richmond, C.S., Glasner, J.D., Mau, R., Jin, H., and Blattner, F.R. 1999. Genome-wide expression profiling in Escherichia coli K-12. Nucleic Acids Res. 27: 3821-3835.
Salgado, H., Santos-Zavaleta, A., Gama-Castro, S., Millan-Zarate, D., Diaz-Peredo, E., Sanchez-Solano, F., Perez-Rueda, E., Bonavides-Martinez, C., and Collado-Vides, J. 2001. RegulonDB (version 3.2): Transcriptional regulation and operon organization in Escherichia coli K-12. Nucleic Acids Res. 29: 72-74.
Savageau, M.A. 1998. Rules for the evolution of gene circuits. Pac. Symp. Biocomput. 54-65.
Tao, H., Bausch, C., Richmond, C., Blattner, F.R., and Conway, T. 1999. Functional genomics: Expression analysis of Escherichia coli growing on minimal and rich media. J. Bacteriol. 181: 6425-6440.
Thieffry, D., Huerta, A.M., Perez-Rueda, E., and Collado-Vides, J. 1998. From specific gene regulation to genomic networks: A global analysis of transcriptional regulation in Escherichia coli. Bioessays 20: 433-440.
Thomas, R. and D'Ari, R. 1990. Biological Feedback. CRC Press, Boston, MA.
Wade, J.T., Belyaeva, T.A., Hyde, E.I., and Busby, S.J. 2001. A simple mechanism for co-dependence on two activators at an Escherichia coli promoter. EMBO J. 20: 7160-7167.
Wu, H., Tyson, K.L., Cole, J.A., and Busby, S.J. 1998. Regulation of transcription initiation at the Escherichia coli nir operon promoter: A new mechanism to account for co-dependence on two transcription factors. Mol. Microbiol. 27: 493-505.
Received March 28, 2003; accepted in revised form August 18, 2003.