Genome Res. 14:201-208, 2004 ©2004 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/04 $5.00
A Motif Co-Occurrence Approach for Genome-Wide Prediction of Transcription-Factor-Binding Sites in Escherichia coli
1 Harvard University Graduate Biophysics Program, Harvard Medical School, Boston, Massachusetts 02115, USA
2 Harvard Medical School Department of Genetics, Boston, Massachusetts 02115, USA
Various computational approaches have been developed for predicting cis-regulatory DNA elements in prokaryotic genomes. We describe a novel method for predicting transcription-factor-binding sites in Escherichia coli. Our method takes advantage of the principle that transcription factors frequently coregulate gene expression, but without requiring prior knowledge of which groups of genes are coregulated. Using position weight matrices for 49 known transcription factors, we examined spacings between pairs of matrix hits. These pairs were assigned probabilities according to the overrepresentation of their separation distance. The functions of many open reading frames (ORFs) downstream from predicted binding sites are unknown, and may correspond to novel regulon members. For five predictions, knockouts with mutated replacements of the predicted binding sites were created in E. coli MG1655. Quantitative real-time PCR (RT-PCR) indicates that for each of the knockouts, at least one gene immediately downstream exhibits a statistically significant change in mRNA expression. This approach may be useful in analyzing binding sites in a variety of organisms.
Although the pace of genome sequencing has been growing exponentially, much remains to be understood about how the genes in the various genomes are regulated. Even in Escherichia coli, probably the best-studied model organism, the complete mechanism of transcriptional regulation of many genes is still unknown, despite the fact that the E. coli genome contains only
The presence of multiple copies of a given transcription factor's binding site motif can be used to predict candidate target genes. For example, a search of the Drosophila melanogaster genome for three or more optimal binding sites within a span of 400 bp for the transcription factor Dorsal resulted in the identification of two additional functional regulatory regions containing at least three Dorsal binding sites (Markstein et al. 2002
The TRANSCompel database on composite regulatory elements in eukaryotic genes provides information on composite elements, containing two closely situated binding sites for distinct transcription factors, within a particular gene and experimental results confirming cooperative action between the transcription factors (Kel-Margoulis et al. 2002b
There are several different algorithms presently in use for finding sequence motifs shared by sets of genes (Bailey and Elkan 1995 Our hypothesis is that many high-scoring false positives can be filtered out by an additional criterion: the condition that most true binding sites co-occur with a second binding site, either for the same transcription factor or a different one. The basis for this assumption is twofold: (1) a transcription factor that regulates a particular ORF often has multiple binding sites in the upstream region either for purposes of binding or for simply increasing the local concentration of that particular transcription factor; and (2) ORFs tend to be coregulated by two or more transcription factors.
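The co-occurrence criterion can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the `(factor, position)` layout, the function name `cooccurring_hits`, and the use of start-to-start distance are assumptions made for illustration, and the 500-bp default is only a convenient parameter value.

```python
from typing import List, Tuple

# Hypothetical layout: (transcription factor name, start position in bp)
Hit = Tuple[str, int]


def cooccurring_hits(hits: List[Hit], max_spacing: int = 500) -> List[Hit]:
    """Keep only hits that have at least one other hit within max_spacing bp."""
    ordered = sorted(hits, key=lambda h: h[1])
    kept = []
    for i, (factor, pos) in enumerate(ordered):
        # In sorted order the nearest hit is always an immediate neighbour,
        # so checking the left and right neighbours is sufficient.
        left_ok = i > 0 and pos - ordered[i - 1][1] <= max_spacing
        right_ok = i + 1 < len(ordered) and ordered[i + 1][1] - pos <= max_spacing
        if left_ok or right_ok:
            kept.append((factor, pos))
    return kept


example = [("ArgR", 100), ("ArgR", 121), ("CRP", 5000)]
filtered = cooccurring_hits(example)
```

In this toy input the isolated CRP hit is discarded, whereas the two neighbouring ArgR hits are retained. The partner may be a hit for the same factor or for a different one, in keeping with the hypothesis stated above.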
There are several different algorithms presently in use for finding pairs of sequence motifs shared by sets of genes. One approach combines a search algorithm for transcription-factor-binding sites with a distance correlation function (Quandt et al. 1996 Likewise, there are several different ways that predicted transcription-factor-binding sites can be tested. One way is to knock out the predicted transcription factor and see whether the mRNA levels of the ORF(s) physically downstream from the predicted binding site(s) are either up- or down-regulated. However, several secondary effects may make interpretation of such data difficult (Lee et al. 2002b). A better way to test the predicted binding sites is to mutate the predicted site itself, so that no other genes regulated directly by the transcription factor in question are immediately expected to be up- or down-regulated. Any changes in the expression of other genes in these mutants are then secondary effects caused by a perturbation in the expression of the gene(s) downstream from the predicted binding site.
In this paper, we describe a new approach we have developed to predict sets of two or more transcription-factor-binding sites that coregulate the downstream genes in the E. coli genome. We used a database of E. coli binding site weight matrices (Robison et al. 1998 Pairs of candidate binding sites were then assigned probability scores, according to how overrepresented the spacing between the predicted binding sites is. Binding site substitutions were created such that MG1655 genomic DNA contained mutant versions of the predicted binding sites, rather than mere deletions of the predicted binding sites (see Fig. 1 for a summary of the binding site knockouts). Quantitative real-time PCR analysis indicates that at least one of the genes immediately downstream from each of the binding site knockouts exhibits a significant change in mRNA expression.
Binding Site Predictions
All instances of biochemically footprinted DNA-binding sites for 55 different E. coli DNA-binding proteins in the literature were assembled into a database previously (Robison et al. 1998
For example, it is highly significant that there are eight pairs of ArgR-binding sites separated by 3 bp found in the genome by our search matrices. Seven of these pairs were previously described in the literature; one pair is new. This new pair of ArgR sites lies in the aroP-pdhR intergenic region (IGR; see Fig. 1). A complete listing of our results and predictions is found at www.genome.org and http://arep.med.harvard.edu/ecoli_matrices/spacing/spacing_predictions.html
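Tallying how often each spacing occurs between hits of two search matrices can be sketched as follows. This is a hedged illustration only: the `(start, end)` site layout with an exclusive end coordinate, the function name `spacing_counts`, and the definition of spacing as the gap between the end of one site and the start of the next are assumptions, and the sketch presumes two different matrices whose hits have already been restricted to noncoding sequence.

```python
from collections import Counter
from typing import List, Tuple

Site = Tuple[int, int]  # hypothetical layout: (start, end), end exclusive


def spacing_counts(sites_a: List[Site], sites_b: List[Site],
                   max_spacing: int = 500) -> Counter:
    """Count pairs of sites (one from each list) at every gap 0..max_spacing."""
    counts: Counter = Counter()
    for start_a, end_a in sites_a:
        for start_b, end_b in sites_b:
            # Gap between the two sites; negative when they overlap.
            gap = max(start_a, start_b) - min(end_a, end_b)
            if 0 <= gap <= max_spacing:
                counts[gap] += 1
    return counts


sites_a = [(100, 118), (400, 418)]
sites_b = [(121, 139), (1000, 1018)]
observed = spacing_counts(sites_a, sites_b)
```

Here `observed[3]` is 1, corresponding to a pair of sites separated by 3 bp. Overlapping pairs are skipped in this sketch and would need to be tallied separately.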
Binding Site Knockouts
The knockouts created are not mere deletions of the predicted transcription-factor-binding sites, but, rather, they are substitutions of the most information-rich bases in the motif with those found with least frequency in a given transcription factor's footprinted binding sites (see Fig. 2). Furthermore, the replacement sequences were verified to ensure that they neither destroyed overlapping sites for other known transcription factors, nor created new potential sites.
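The substitution strategy can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the procedure actually used: it assumes that "information-rich" is measured by the standard per-position information content (2 bits minus the Shannon entropy of the base frequencies), that the footprinted sites are summarized as per-position base counts, and that the number of substituted positions is chosen by the user. The names `position_information` and `design_mutant_site` are hypothetical, and the check against overlapping sites for other factors is not included.

```python
import math
from typing import Dict, List

BASES = "ACGT"


def position_information(counts: Dict[str, int]) -> float:
    """Information content (bits) of one motif position from its base counts."""
    total = sum(counts.get(base, 0) for base in BASES)
    info = 2.0
    for base in BASES:
        p = counts.get(base, 0) / total
        if p > 0:
            info += p * math.log2(p)
    return info


def design_mutant_site(site: str, count_matrix: List[Dict[str, int]],
                       n_substitutions: int) -> str:
    """Replace the most informative positions with the least frequent base."""
    ranked = sorted(range(len(site)),
                    key=lambda i: position_information(count_matrix[i]),
                    reverse=True)
    mutant = list(site)
    for i in ranked[:n_substitutions]:
        # Ties between equally rare bases are broken in A, C, G, T order.
        mutant[i] = min(BASES, key=lambda b: count_matrix[i].get(b, 0))
    return "".join(mutant)


toy_counts = [
    {"A": 10},
    {"A": 3, "C": 3, "G": 2, "T": 2},
    {"G": 9, "T": 1},
    {"A": 5, "T": 5},
]
mutant = design_mutant_site("AAGA", toy_counts, n_substitutions=2)
```

With these toy counts the first and third positions carry the most information, so they are the ones substituted. A candidate produced this way would still have to be screened against the other search matrices before use.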
Quantitative Real-Time PCR Assays
The results of triplicate quantitative real-time PCR assays are shown in Table 2. These data indicate that at least one of the genes immediately downstream from each of the five binding site knockouts exhibits a significant change in mRNA expression. This indicates that the predicted binding sites, which were mutated in the binding site knockout strains, are most likely real and involved in regulation of these downstream genes.
Negative-control quantitative real-time PCR assays consisted of mispairings between binding site knockout RNAs and primer/probe pairs (i.e., quantitative real-time PCR assays were performed on genes assayed in this project, but not downstream from the binding site knockouts in the particular assayed RNA). Out of 10 such mispairings, eight resulted in essentially no change (see Supplemental material available online at www.genome.org). One negative control that resulted in a significant change was pdhR in the strain containing MetJ-binding site knockouts in the ybdH-ybdL IGR. YbdH is a hypothetical oxidoreductase, and YbdL is a hypothetical aminotransferase; it is possible that misregulation of ybdH and/or ybdL might impact glycolysis and the TCA cycle, causing a change in expression of pdhR. The other negative control that resulted in a change was aroP in the strain containing PhoB-binding site knockouts in the yqeF-yqeG IGR. The functions of these proteins are unknown; it is possible that misregulation of yqeF and/or yqeG might cause a change in expression of aroP.
Transcription Factor Knockouts
Primer Extension Assays
In primer extension experiments using the ArgR-binding sites knockout, a 1.2-fold derepression of pdhR expression was observed (see Fig. 3). This is consistent with the 2.4-fold derepression of pdhR observed by quantitative real-time PCR. It is also consistent with the 1.5-fold pdhR derepression observed in a primer extension assay of an argR transcription factor knockout (data not shown).
Affymetrix Oligonucleotide Array mRNA Expression Analysis
Many of the binding site knockouts are upstream of uncharacterized genes or operons. Some of these URFs have important homologies for connecting them to regulons via the predicted transcription-factor-binding sites. For example, ytfQ, which is downstream from predicted GalR- and CRP-binding sites, shows significant homology in a BLAST search to a number of D-ribose-binding periplasmic proteins (E-value = 2 × 10⁻²⁹) and ribose ABC transporters from a few different prokaryotes, including homology to D-galactose-binding periplasmic proteins (E-value = 5 × 10⁻⁵). Its highest scoring hit is for homology to a bifunctional carbohydrate binding and transport protein (E-value = 2 × 10⁻³⁰). These homologies provide further support that this URF might be regulated by transcription factors known to regulate galactose metabolism. In addition, the results of such experiments can indicate interconnections between various regulons. For example, our data indicate that the two ArgR sites, separated by 3 bp, predicted upstream of the pdhR-aceEF-lpd operon are functional. ArgR is presently only known to regulate genes specifically involved in arginine biosynthesis, whereas PdhR is the repressor of the pyruvate dehydrogenase complex. However, the product of the pyruvate dehydrogenase complex, acetyl-CoA, is needed in the first step of arginine biosynthesis. This biochemical pathway information further supports our finding that the ArgR-binding sites upstream of the pdhR-aceEF-lpd operon are functional, and thus that the ArgR regulon is interconnected with the PdhR regulon. Furthermore, once a predicted site has been demonstrated to be functional, that site can then be added to the set of sequences used to generate a binding site weight matrix for the given transcription factor. That refined weight matrix can then be used in a new search of the E. coli genome to identify a refined list of predicted binding sites.
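The refinement step can be illustrated with a small Python sketch. It is a simplified stand-in, not the matrix construction used for the search matrices in this work: the log-odds scoring against a uniform background, the pseudocount of 1, and the names `build_weight_matrix` and `score` are all assumptions, and the toy sequences are invented for illustration.

```python
import math
from typing import Dict, List

BASES = "ACGT"


def build_weight_matrix(sites: List[str],
                        pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Log-odds weight matrix (uniform background) from aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for i in range(length):
        column = {base: pseudocount for base in BASES}
        for site in sites:
            column[site[i]] += 1
        total = sum(column.values())
        matrix.append({base: math.log2((column[base] / total) / 0.25)
                       for base in BASES})
    return matrix


def score(matrix: List[Dict[str, float]], sequence: str) -> float:
    """Sum of per-position weights for a sequence of the motif length."""
    return sum(column[base] for column, base in zip(matrix, sequence))


known_sites = ["ACGT", "ACGA", "ACGT"]   # toy footprinted sites
validated_site = "TCGT"                  # toy newly validated site

original = build_weight_matrix(known_sites)
refined = build_weight_matrix(known_sites + [validated_site])
```

In this toy example the validated site scores higher against the refined matrix than against the original one, which is how adding a confirmed site can shift the list of hits in a subsequent genome search.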
This set of genes can then be analyzed to determine how the genes are involved in a regulon and thus to further characterize the functions of these genes. Moreover, if the gene(s) physically downstream from the predicted binding site(s) have not yet been functionally characterized, then the functions of the genes affected in a secondary manner should aid in assigning the URF to a regulon. A critical point in these experiments is selection of the proper culture conditions to permit analysis of the predicted binding site. The culture conditions used must be those that will induce expression of the transcription factor in the wild-type cells. Otherwise, if the transcription factor is not expressed, then none of the transcription factor's binding sites will be bound, and comparing wild-type versus the knockout will not provide data on the predicted binding site.
A particularly interesting finding is that despite the fact that the predicted binding sites we examined were all in divergent promoters, the immediately downstream genes in either direction were not affected equally by the binding site mutations. For example, mutating the three PhoB sites in the yqeF-yqeG IGR resulted in 5.0-fold down-regulation of yqeF, and 1.4-fold up-regulation of yqeG. Similarly, mutating the four MetJ sites in the ybdH-ybdL IGR resulted in 1.4-fold up-regulation of ybdH and 3.6-fold up-regulation of ybdL. These different changes in gene expression could not be explained by the distance between the predicted sites and the affected gene; that is, considering these two pairs of divergently transcribed genes, the genes closer to the predicted sites do not as a rule exhibit a stronger up- or down-regulation as a result of the binding site knockouts than the genes that are farther away from the predicted sites. The mechanism of this differential regulation at divergent promoters is unclear. In constructing the binding site mutations, care was taken neither to disrupt overlapping transcription-factor-binding sites, nor to create new sites for the 55 E. coli transcription factors for which weight matrices have been published. Nevertheless, it is possible that some as-yet-unknown binding site, whose binding factor functions in a directional manner, or some other kind of DNA sequence element that functions in an orientation-dependent manner, was either disrupted or created in the binding site knockouts. For example, a sequence-dependent bend upstream of the rRNA promoter P1 in E. coli is responsible for high promoter activity, and both the distance and angular orientation of the bent DNA are crucial for the degree of activation (Zacharias et al. 1991
Site clustering approaches that simply consider a certain number of sites within a given genomic sequence window have recently produced some initial successes in predicting DNA regulatory elements in eukaryotic genomes (Wagner 1999
Binding Site Prediction
Only those matrix hits that occurred within noncoding regions were analyzed because most experimentally confirmed binding sites for transcription factors occur in noncoding regions (of course, this could be at least in part caused by a bias in where people traditionally have looked for transcription-factor-binding sites). We used no size restrictions on noncoding regions; any nucleotide that does not code for protein is called noncoding in our analysis. All matrix hits scoring above two standard deviations below the mean of the scores of the known footprinted (input) sites (Robison et al. 1998
The rankings were based on the probability of obtaining the observed number of hits for the most overrepresented bin or spacing, given the number expected by chance for that particular bin or spacing. This number expected by chance was determined in the following manner:
(x) is the probability that two randomly chosen noncoding base pairs are separated by a distance x. (x) was computed by tabulating the actual frequencies of occurrence of separations between all pairwise combinations of noncoding bases in E. coli. (x) is a decreasing function of x (McGuire 2000
The probability P(x) of obtaining at least the observed number of pairs, obs(x), at each spacing x between 0 and 500 bp, given Na · Nb trials, where the probability of observing a pair at this spacing in a single trial is
By checking 500 different spacings, multiple hypotheses are being tested. To obtain a more reliable probability value, namely, the probability of observing obs(x) sites at any single spacing within a spacing range that includes x (i.e., 0 to x bp), P(x) was summed over this range of x values. All pairs of search matrices that have a spacing x for which this adjusted value of P(x) is <0.05 were saved.
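The tail probability and its adjustment can be sketched in Python. This is one possible reading of the description above, offered as a sketch rather than as the calculation actually performed: it treats P(x) as a binomial tail over Na · Nb independent trials, and it interprets the adjustment as the sum, over every spacing from 0 to x, of the probability of seeing at least obs(x) pairs at that spacing. The names `binomial_tail` and `adjusted_p`, and the list `phi` holding the single-trial probability for each spacing, are hypothetical.

```python
import math
from typing import Sequence


def binomial_tail(observed: int, trials: int, p: float) -> float:
    """P(at least `observed` successes in `trials` trials), for 0 < p < 1."""
    if observed <= 0:
        return 1.0
    if observed > trials:
        return 0.0
    below = 0.0
    for i in range(observed):
        # log of C(trials, i) * p**i * (1 - p)**(trials - i)
        log_term = (math.lgamma(trials + 1) - math.lgamma(i + 1)
                    - math.lgamma(trials - i + 1)
                    + i * math.log(p) + (trials - i) * math.log1p(-p))
        below += math.exp(log_term)
    return min(1.0, max(0.0, 1.0 - below))


def adjusted_p(observed: int, x: int, n_a: int, n_b: int,
               phi: Sequence[float]) -> float:
    """Sum the tail probability for `observed` pairs over spacings 0..x."""
    trials = n_a * n_b
    total = sum(binomial_tail(observed, trials, phi[s]) for s in range(x + 1))
    return min(1.0, total)


p_single = binomial_tail(1, 10, 0.1)
p_adjusted = adjusted_p(2, 1, 1, 2, [0.5, 0.5])
```

Because the tail is computed as one minus the lower sum, very small probabilities lose precision in this sketch; a dedicated survival function, such as `scipy.stats.binom.sf(observed - 1, trials, p)`, would be a more accurate choice for the extreme tail.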
Similarly, the probabilities of obtaining the observed number of hits within the eight different spacing bins was calculated:
In the case in which the two search matrices are identical, the number of hits expected by chance should be determined in the following manner:
The equations for P(x) and Pbin can be modified accordingly. The matrix pairs were ranked according to their values for Pbin for each of the spacing bins, and all those that had values for Pbin(xmin···xmax) <0.05 were saved. In our calculation of probabilities, we assumed the presence of two independent sites. This assumption is not valid in the case of overlapping sites. However, we found it useful to calculate a "significance index" for the overlapped data in the same way as we calculated the probabilities above. These values are not comparable with the probabilities listed above because of the nonindependence of the two overlapped sites, but we found this index to be useful for comparisons within our analysis of the overlapped data.
Because most of the search matrices, even the more nonspecific ones such as DnaA, Hns, Ihf, Lrp, OmpR, Fis, NarL, TyrR, and RpoS, are biased in their distribution within the noncoding regions, the false positives can be expected likewise to be distributed nonrandomly. This nonrandomness is caused by variation in AT content for different noncoding regions in E. coli. A sharp dip in AT content at
Strains
Binding Site and Transcription Factor Knockouts
The binding site substitutions were created by modifying the current pKOV knockout scheme (Link et al. 1997 Chloramphenicol-sensitive colonies were tested by PCR. For the transcription factor knockouts, the Co and No primers were used in PCR. Because these primers flanked the gene, the size of the PCR product indicated whether the template was from a wild-type or deletion strain. For the binding site knockouts, analytical PCRs were performed using the Co primer in conjunction with a primer representing either the wild-type or mutant binding site, in two separate PCR reactions. These primers were designed to be complementary to either the wild-type binding site sequence or the mutant binding site sequence. See Supplemental material for a listing of these primer sequences. Binding site knockouts were verified by sequencing. See Supplemental material for a listing of the sequencing primers.
PCR
Media and Culture Conditions
Duplicate wild-type and transcription factor knockout strains were grown under the conditions listed below. The argR knockout strains were grown at 37°C in M9 minimal medium with 11 mM (0.2%) glucose, 0.5% casamino acids, 1 mM arginine (Charlier et al. 1992
RNA Isolation and Purification
Primer Extension Analysis
Real-Time RT-PCR Primers and Fluorogenic Probes
Real-Time RT-PCR Amplification
mRNA Expression Analysis Using Affymetrix Oligonucleotide Arrays
We thank Dereth Phillips, Xiaohua Huang, Vasudeo Badarinarayana, Aimee Dudley, Doug Selinger, Martin Steffen, Rey Sequerra, and Jennifer C. Lee for technical assistance. We also thank Dereth Phillips, Jason Hughes, Pete Estep, Tzachi Pilpel, and other members of the Church Lab for helpful discussion. M.L.B. was partially supported by an NSF Graduate Fellowship. A.M.M. was a Howard Hughes Predoctoral Fellow. This work was supported in part by a grant from the Office of Naval Research (N00014-99-1-0783). The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.1448004.
3 These authors contributed equally to this work.
4 Present address: Division of Genetics, Brigham and Women's Hospital and Harvard Medical School, Harvard Medical School New Research Building, Boston, MA 02115, USA.
5 Corresponding author. [Supplemental material including detailed methods is available online at www.genome.org and http://arep.med.harvard.edu/ecoli_matrices/spacing/spacing_predictions.html.]
Affymetrix, Inc. 2002. Affymetrix GeneChip Expression Analysis Technical Manual. Affymetrix, Inc., Santa Clara, CA.
Bailey, T. and Elkan, C. 1995. The value of prior knowledge in discovering motifs with MEME. Proc. Int. Conf. Intell. Syst. Mol. Biol. 3: 21-29.
Benos, P., Bulyk, M., and Stormo, G. 2002. Additivity in protein-DNA interactions: How good an approximation is it? Nucleic Acids Res. 30: 4442-4451.
Berg, O. and von Hippel, P. 1987. Selection of DNA binding sites by regulatory proteins. Statistical-mechanical theory and application to operators and promoters. J. Mol. Biol. 193: 723-750.
Berman, B.P., Nibu, Y., Pfeiffer, B.D., Tomancak, P., Celniker, S.E., Levine, M., Rubin, G.M., and Eisen, M.B. 2002. Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome. Proc. Natl. Acad. Sci. 99: 757-762.
Blattner, F., Plunkett III, G., Bloch, C., Perna, N., Burland, V., Riley, M., Collado-Vides, J., Glasner, C., Rode, G., Mayhew, J., et al. 1997. The complete genome sequence of Escherichia coli K-12. Science 277: 1453-1474.
Browning, D., Beatty, C., Wolfe, A., Cole, J., and Busby, S. 2002. Independent regulation of the divergent Escherichia coli nrfA and acsP1 promoters by a nucleoprotein assembly at a shared regulatory region. Mol. Microbiol. 43: 687-701.
Bulyk, M., Johnson, P., and Church, G. 2002. Nucleotides of transcription factor binding sites exert interdependent effects on the binding affinities of transcription factors. Nucleic Acids Res. 30: 1255-1261.
Bussemaker, H., Li, H., and Siggia, E. 2000. Building a dictionary for genomes: Identification of presumptive regulatory sites by statistical analysis. Proc. Natl. Acad. Sci. 97: 10096-10100.
Charlier, D., Roovers, M., Vliet, F.V., Boyen, A., Cumin, R., Nakamura, Y., Glansdorff, N., and Pierard, A. 1992. Arginine regulon of Escherichia coli K-12: A study of repressor-operator interactions and of in vitro binding affinities versus in vivo repression. J. Mol. Biol. 226: 367-386.
Frech, K. and Werner, T. 1997. Specific modelling of regulatory units in DNA sequences. Pac. Symp. Biocomput. 151-162.
Frith, M., Hansen, U., and Weng, Z. 2001. Detection of cis-element clusters in higher eukaryotic DNA. Bioinformatics 17: 878-889.
Frith, M., Spouge, J., Hansen, U., and Weng, Z. 2002. Statistical significance of clusters of motifs represented by position specific scoring matrices in nucleotide sequences. Nucleic Acids Res. 30: 3214-3224.
Grundy, W., Bailey, T., and Elkan, C. 1996. ParaMEME: A parallel implementation and a web interface for a DNA and protein motif discovery tool. CABIOS 12: 303-310.
GuhaThakurta, D. and Stormo, G. 2001. Identifying target sites for cooperatively binding factors. Bioinformatics 17: 608-621.
Halfon, M., Grad, Y., Church, G., and Michelson, A. 2002. Computation-based discovery of related transcriptional regulatory modules and motifs using an experimentally validated combinatorial model. Genome Res. 12: 1019-1028.
Heid, C., Stevens, J., Livak, K., and Williams, P. 1996. Real time quantitative PCR. Genome Res. 6: 986-994.
Hengge-Aronis, R. 1999. Interplay of global regulators and cell physiology in the general stress response of Escherichia coli. Curr. Opin. Microbiol. 2: 148-152.
Hertz, G. and Stormo, G. 1999. Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 15: 563-577.
Hughes, J., Estep, P., Tavazoie, S., and Church, G. 2000. Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J. Mol. Biol. 296: 1205-1214.
Kel-Margoulis, O., Ivanova, T., Wingender, E., and Kel, A. 2002a. Automatic annotation of genomic regulatory sequences by searching for composite clusters. Pac. Symp. Biocomput. 187-198.
Kel-Margoulis, O., Kel, A., Reuter, I., Deineko, I., and Wingender, E. 2002b. TRANSCompel: A database on composite regulatory elements in eukaryotic genes. Nucleic Acids Res. 30: 332-334.
Klingenhoff, A., Frech, K., Quandt, K., and Werner, T. 1999. Functional promoter modules can be detected by formal models independent of overall nucleotide sequence similarity. Bioinformatics 15: 180-186.
Kolchanov, N., Ignatieva, E., Ananko, E., Podkolodnaya, O., Stepanenko, I., Merkulova, T., Pozdnyakov, M., Podkolodny, N., Naumochkin, A., and Romashchenko, A. 2002. Transcription Regulatory Regions Database (TRRD): Its status in 2002. Nucleic Acids Res. 30: 312-317.
Lawrence, C., Altschul, S., Boguski, M., Liu, J., Neuwald, A., and Wootton, J. 1993. Detecting subtle sequence signals: A Gibbs sampling strategy for multiple alignment. Science 262: 208-214.
Lee, M.-L., Bulyk, M., Whitmore, G., and Church, G. 2002. A statistical model for investigating binding probabilities of DNA nucleotide sequences using microarrays. Biometrics 58: 981-988.
Lee, T., Rinaldi, N., Robert, F., Odom, D., Bar-Joseph, Z., Gerber, G., Hannett, N., Harbison, C., Thompson, C., Simon, I., et al. 2002. Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298: 799-804.
Li, H., Rhodius, V., Gross, C., and Siggia, E. 2002. Identification of the binding sites of regulatory proteins in bacterial genomes. Proc. Natl. Acad. Sci. 99: 11772-11777.
Link, A., Phillips, D., and Church, G. 1997. Methods for generating precise deletions and insertions in the genome of wild-type Escherichia coli: Application to open reading frame characterization. J. Bacteriol. 179: 6228-6237.
Liu, X., Brutlag, D., and Liu, J. 2001. BioProspector: Discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. Pac. Symp. Biocomput. 127-138.
Liu, X., Brutlag, D., and Liu, J. 2002. An algorithm for finding protein-DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments. Nat. Biotechnol. 20: 835-839.
Man, T.K. and Stormo, G.D. 2001. Non-independence of Mnt repressor-operator interaction determined by a new quantitative multiple fluorescence relative affinity (QuMFRA) assay. Nucleic Acids Res. 29: 2471-2478.
Mangan, J.A., Sole, K.M., Mitchison, D.A., and Butcher, P.D. 1997. An effective method of RNA extraction from bacteria refractory to disruption, including mycobacteria. Nucleic Acids Res. 25: 675-677.
Markstein, M., Markstein, P., Markstein, V., and Levine, M. 2002. Genome-wide analysis of clustered Dorsal binding sites identifies putative target genes in the Drosophila embryo. Proc. Natl. Acad. Sci. 99: 763-768.
Matys, V., Fricke, E., Geffers, R., Gossling, E., Haubrock, M., Hehl, R., Hornischer, K., Karas, D., Kel, A., Kel-Margoulis, O., et al. 2003. TRANSFAC: Transcriptional regulation, from patterns to profiles. Nucleic Acids Res. 31: 374-378.
McGuire, A. 2000. "Computational studies of transcriptional regulation in prokaryotes." Ph.D. thesis, Harvard University, Cambridge.
Neidhardt, F. 1996. Escherichia coli and Salmonella: Cellular and molecular biology. American Society for Microbiology, Washington, DC.
Neidhardt, F., Bloch, P., and Smith, D. 1974. Culture medium for enterobacteria. J. Bacteriol. 119: 736-747.
Phillips, D. 2000. "Competitive growth analysis of E. coli in-frame deletion mutants across a spectrum of environmental conditions." Ph.D. thesis, Harvard University, Cambridge.
Pilpel, Y., Sudarsanam, P., and Church, G. 2001. Identifying regulatory networks by combinatorial analysis of promoter elements. Nat. Genet. 29: 153-159.
Quandt, K., Grote, K., and Werner, T. 1996. GenomeInspector: A new approach to detect correlation patterns of elements on genomic sequences. Comput. Appl. Biosci. 12: 405-413.
Rebeiz, M., Reeves, N., and Posakony, J. 2002. SCORE: A computational approach to the identification of cis-regulatory modules and target genes in whole-genome sequence data. Proc. Natl. Acad. Sci. 99: 9888-9893.
Robin, S., Daudin, J., Richard, H., Sagot, M., and Schbath, S. 2002. Occurrence probability of structured motifs in random sequences. J. Comput. Biol. 9: 761-773.
Robison, K., McGuire, A.M., and Church, G.M. 1998. A comprehensive library of DNA-binding site matrices for 55 proteins applied to the complete Escherichia coli K-12 genome. J. Mol. Biol. 284: 241-254.
Rosenblueth, D.A., Thieffry, D., Huerta, A.M., Salgado, H., and Collado-Vides, J. 1996. Syntactic recognition of regulatory regions in Escherichia coli. Comput. Appl. Biosci. 12: 415-422.
Roth, F.P., Hughes, J.D., Estep, P.W., and Church, G.M. 1998. Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation. Nat. Biotech. 16: 939-945.
Sambrook, J., Fritsch, E., and Maniatis, T. 1989. Molecular cloning: A laboratory manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
Sudarsanam, P., Pilpel, Y., and Church, G. 2002. Genome-wide co-occurrence of promoter elements reveals a cis-regulatory cassette of rRNA transcription motifs in Saccharomyces cerevisiae. Genome Res. 12: 1723-1731.
Thieffry, D., Salgado, H., Huerta, A., and Collado-Vides, J. 1998. Prediction of transcriptional regulatory sites in the complete genome sequence of Escherichia coli K-12. Bioinformatics 14: 391-400.
Tian, G., Lim, D., Oppenheim, J.D., and Maas, W.K. 1994. Explanation for different types of regulation of arginine biosynthesis in Escherichia coli B and Escherichia coli K12 caused by a difference between their arginine repressors. J. Mol. Biol. 235: 221-230.
van Helden, J., Andre, B., and Collado-Vides, J. 1998. Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J. Mol. Biol. 281: 827-842.
van Helden, J., Rios, A., and Collado-Vides, J. 2000. Discovering regulatory elements in non-coding sequences by analysis of spaced dyads. Nucleic Acids Res. 28: 1808-1818.
Wagner, A. 1999. Genes regulated cooperatively by one or more transcription factors and their identification in whole eukaryotic genomes. Bioinformatics 15: 776-784.
Winer, J., Jung, C., Shackel, I., and Williams, P. 1999. Development and validation of real-time quantitative reverse transcriptase-polymerase chain reaction for monitoring gene expression in cardiac myocytes in vitro. Anal. Biochem. 270: 41-49.
Workman, C. and Stormo, G. 2000. ANN-Spec: A method for discovering transcription factor binding sites with improved specificity. Pac. Symp. Biocomput. 467-478.
Zacharias, M., Theissen, G., Bradaczek, C., and Wagner, R. 1991. Analysis of sequence elements important for the synthesis and control of ribosomal RNA in E. coli. Biochimie 73: 699-712.
http://arep.med.harvard.edu/ecoli_matrices/spacing/spacing_predictions.html; Web site contains tab-delimited files containing predictions based on individual spacings, and separately based on spacing bins.
http://arep.med.harvard.edu/labgc/pko3.html; Descriptions of the gene replacement vectors pKO3 and pKOV.
http://twod.med.harvard.edu/labgc/estep/longPCR_protocol.html; Descriptions of the PCR conditions and protocols used in this project.
Received April 21, 2003; accepted in revised form November 5, 2003.