Vol. 10, Issue 6, 744-757, June 2000

Conservation of DNA Regulatory Motifs and Discovery of New Motifs in Microbial Genomes

Graduate Program in Biophysics, and Department of Genetics, Lipper Center for Computational Genetics, Harvard Medical School, Boston, MA 02115 USA
Regulatory motifs can be found by local multiple alignment of upstream regions from coregulated sets of genes, or regulons. We searched for regulatory motifs using the program AlignACE together with a set of filters that helped us choose the motifs most likely to be biologically relevant in 17 complete microbial genomes. We searched the upstream regions of potentially coregulated genes grouped by three methods: (1) genes that make up functional pathways; (2) genes homologous to regulons from a well-studied species (Escherichia coli); and (3) groups of genes derived from conserved operons. This last group is based on the observation that genes making up homologous regulons in different species are often assorted into coregulated operons in different combinations. This allows partial reconstruction of regulons by looking at operon structure across several species. Unlike other methods for predicting regulons, this method does not depend on the availability of experimental data other than the genome sequence and the locations of genes. New, statistically significant motifs were found in the genome sequence of each organism using each grouping method. The most significant new motif was found upstream of genes in the methane-metabolism functional group in Methanobacterium thermoautotrophicum. We found that at least 27% of the known E. coli DNA-regulatory motifs are conserved in one or more distantly related eubacteria. We also observed significant motifs that differed from the E. coli motif in other organisms upstream of sets of genes homologous to known E. coli regulons, including Crp, LexA, and ArcA in Bacillus subtilis; four anaerobic regulons in Archaeoglobus fulgidus (NarL, NarP, Fnr, and ModE); and the PhoB, PurR, RpoH, and FhlA regulons in other archaebacterial species. 
We also used motif conservation to aid in finding new motifs by grouping upstream regions from closely related bacteria, thus increasing the number of instances of the motif in the sequence to be aligned. For example, by grouping upstream sequences from three archaebacterial species, we found a conserved motif that may regulate ferrous ion transport that was not found in individual genomes. Discovery of conserved motifs becomes easier as the number of closely related genome sequences increases.
Motif-finding algorithms can be used to discover alignments of sites, which correspond to transcriptional regulatory motifs in upstream regions of genes. These motifs are often binding sites for DNA-binding proteins. Several different algorithms have been used previously for motif finding, including Gibbs sampling (Lawrence et al. 1993) and AlignACE (Roth et al. 1998).

Computationally identifying upstream regulatory motifs in bacterial genomes by AlignACE is complicated by the presence of operons. It is difficult to locate the regulatory region for a gene found within an operon, since the promoter for that operon can lie several genes upstream, and it is difficult to predict which gene is at the head of the operon. In addition, there are fewer instances of most regulatory motifs in a bacterial genome than in the S. cerevisiae genome, as there is usually only one instance of a regulatory motif per operon instead of one instance per gene. It is easier to discover a motif that is found in more copies in the genome. However, one can increase the number of instances of a conserved regulatory motif by pooling together upstream sequence from orthologous genes in closely related organisms, assuming the motif is conserved across these organisms. A similar method was employed recently by Gelfand et al. (2000).

Microarray data, as well as at least five additional methods based on comparative genomics, might be used to obtain functionally linked sets of genes that are good candidates for coregulation (e.g., Pellegrini et al. 1999).

By aligning the upstream regions of potentially coregulated sets of genes, we were able to find many known bacterial regulatory motifs, and to predict new regulatory motifs in 17 bacterial genomes. In addition, by pooling together upstream sequences from orthologous genes in closely related organisms, we were able to find conserved motifs with only a few instances in each genome. We also analyzed conservation between organisms of motifs in orthologous sets of upstream regions.
Potentially Coregulated Groups of Genes

We searched in each species for orthologs to the E. coli genes known to be regulated by 55 DNA-binding proteins (http://arep.med.harvard.edu/ecoli_matrices/; Robison et al. 1998).

The second method we used to predict coregulated groups of genes is based on analysis of conserved operons. Functional couplings between genes can be inferred from conserved spatial proximity of gene pairs on a chromosome (Dandekar et al. 1998).

We also used 68 different functional group categories in each of 17 bacterial species from the KEGG database (http://www.genome.ad.jp/kegg/). These groups are based on a compilation of experimental data on metabolic pathways (Ogata et al. 1999).

Motif Discovery Strategy

We used the AlignACE program (Roth et al. 1998). Running AlignACE on all of these groups of genes in each of 17 organisms, as well as in sets of closely related organisms, results in 104,282 motifs. To select the significant motifs that are most likely to be functional regulatory motifs, we calculated several indices for each matrix: MAP score, site specificity score (Ssite), positional bias, AT content, and palindromicity. The MAP score is a measure used by the AlignACE program to judge alignments sampled during the course of the algorithm, based on the over-representation of the motif in the input sequence (Liu et al. 1995). Table 1 shows how many motifs score above various cutoffs in the values for these indices. The AlignACE output files, as well as the values for these indices for all of the motifs, are available on our website (http://arep.med.harvard.edu/microbial_motifs).
Controls

By searching the upstream regions of genes making up the known regulons in E. coli with AlignACE, we can determine what fraction of the known E. coli DNA-binding motifs can be found, and how these known motifs score in terms of the parameters that we use to rank motifs. We used the 32 E. coli footprinted regulons in our database with between 5 and 100 known binding sites (Robison et al. 1998).
Figure 1 shows the MAP scores and
Ssite values for the positive controls (green
triangles), as well as all of the motifs found by AlignACE with MAP
scores greater than 5.0 (30,252 motifs from all AlignACE runs in three
kinds of gene groupings in 17 organisms). Some of the controls are
plotted more than once because the motif was found multiple times.
Motifs in the upper right-hand corner of the plot (nonspecific motifs
with good alignments) tend to be either repetitive elements (e.g.,
E. coli BIME elements) or common elements in the genome such
as Shine-Dalgarno sequences. The fraction of motifs corresponding to
known controls is highest in the upper left corner of the plot.
Motifs with MAP score > 10.0 and Ssite < 10
The most useful parameter for discriminating the known motifs in the E. coli regulon controls from the rest of the motifs found by AlignACE is Ssite (see Fig. 1). Palindromicity is also a useful parameter. Almost half of the known motifs are palindromic, but only ~5% of the 104,282 motifs found in our analysis are palindromic. Thus, selecting for palindromic motifs greatly increases the fraction of biologically relevant motifs. The MAP score was not as useful as Ssite for distinguishing known motifs in the E. coli regulon controls from the rest of the motifs, since many nonspecific chromosomal features have high MAP scores (e.g., Shine-Dalgarno sequences and repetitive elements). The positional bias statistic was also not useful in distinguishing the E. coli regulon controls, because most known motifs do not have significant values for our positional bias parameter (data not shown). However, many other motifs found in this study do have significant positional bias. Most of these likely correspond to locationally conserved features such as the ribosome binding site (Shine-Dalgarno sequence), which is always located close to the start codon (4-13 bp upstream of ATG).

Conservation of Known E. coli Motifs in Other Bacteria

In organisms other than E. coli, running AlignACE on upstream regions of orthologs of members of E. coli footprinted regulons allows for study of the conservation of E. coli regulatory motifs, as well as identification of other possible mechanisms for regulating the same cellular process. For each of the 34 E. coli regulons with more than five members, Figure 2 shows in which organisms the E. coli motif is conserved, and in which organisms there is a new and significant motif in the upstream regions of the homologous regulon.
A new motif can indicate either a different mechanism for regulating a similar cellular process, or divergence of binding site residues in a conserved DNA-binding protein. In Bacillus subtilis, there is no Crp protein; instead, CcpA regulates carbon metabolism by a different mechanism (Henkin 1996).

Five new motifs were identified in B. subtilis, and ten were identified in archaebacterial species (Methanococcus janaschii, Pyrococcus horokoshii, Methanobacterium thermoautotrophicum, and Archaeoglobus fulgidus). These motifs (Fig. 2, yellow) are listed in Tables 3a and 3c. In B. subtilis, the CcpA motif is found in the Crp, AraC, and PhoB categories. The motif found in the B. subtilis ArcA category is similar to the E. coli Crp motif. Therefore, this could simply be a variation on the Fnr motif, which is closely related to the Crp motif in E. coli. The Fnr and ArcA regulons overlap significantly. In A. fulgidus, there are new motifs in the categories for NarL, NarP, Fnr, and ModE. In E. coli, all four of these DNA-binding proteins regulate overlapping regulons related to anaerobic metabolism. The motifs from the NarL, NarP, and Fnr categories are very similar to each other (pairwise CompareACE scores > 0.7). However, the motif that is found in the ModE category is different. Thus, we predict that these two new motifs control anaerobic metabolism in A. fulgidus (see Tables 3a and 3c).

Combining upstream sequences from orthologous genes in closely related organisms can help in the discovery of conserved motifs with few instances in each genome. A known E. coli motif (MetR) is not found when AlignACE is run on the upstream regions of the members of the MetR regulon in E. coli alone because there are too few instances of the motif in the E. coli genome. However, when E. coli and B. subtilis upstream regions are pooled together, the MetR motif can be found because it is also present in B. subtilis. Eleven additional E. coli motifs can be identified in B. subtilis and/or H. influenzae using this method. Even for the motifs that occur frequently enough to be found in a single genome, pooling together sequence from closely related organisms increases the number of instances of the motif.
This improves the values of the MAP and Ssite parameters in the alignments, making the motif easier to identify (see Fig. 1).

New Motifs

Table 3 shows the most specific motifs, as well as the top palindromic motifs, that result from AlignACE runs. Some of these motifs are known, and some of these correspond to potential new regulatory motifs. Only motifs scoring above stringent cutoffs are presented here; alignments and parameters for all motifs found in this analysis are available (http://arep.med.harvard.edu/microbial_motifs).

Specific Motifs Found by Aligning Sequence from Individual Organisms

Table 3a displays the most specific motifs found in AlignACE runs on sequence from single organisms. Out of 53,778 motifs, the 41 motifs with Ssite < 1e-25 were clustered according to their
similarity using pairwise CompareACE scores (see Methods), resulting in
18 motif clusters. The member from each cluster with the lowest
Ssite value is shown in Table 3. Three of these clusters correspond to known E. coli motifs (the LexA, Crp, and PurR control sets), and one corresponds to a known B. subtilis motif (the T-box). The T-box is a known regulatory motif found in the functional group made up of aminoacyl-tRNA synthetases in gram-positive bacteria (Henkin et al. 1992).

The most specific motif for which we are not aware of any documentation in the literature is an AT-rich motif found in the methane metabolism functional group (00680) of the methane-producing archaebacterium M. thermoautotrophicum. Another very specific member of this motif cluster (Ssite = 3.5e-31) shows up in the closely related folate biosynthesis functional group (00790) in M. thermoautotrophicum. Therefore, this motif could regulate metabolism of one-carbon units in this organism. A similar motif also shows up with a slightly higher Ssite in the glyoxylate and dicarboxylate metabolism functional group (00630) and the group corresponding to the orthologs of the RpoH (heat shock) regulon in M. thermoautotrophicum. The relationship between these two groups and metabolism of one-carbon units is not clear. This motif resembles the central AT-rich core (ATATAAAxxTT) of the known archaeal heat-shock promoter (Thompson and Daniels 1998).

Specific Motifs Found by Combining Sequence from Closely Related Organisms

Of the 50,504 motifs found by aligning sequence from closely related organisms together, there are 18 motifs with Ssite < 1e-25, which fall into seven distinct clusters (Table 3b). Six of these
clusters correspond to known E. coli motifs conserved in
either H. influenzae or B. subtilis (PurR, LexA,
TrpR, Crp, Fur, and ArgR). Many of these conserved motifs show up with
lower values for Ssite and higher MAP scores than
when sequence from only one organism containing the motif is aligned,
because a stronger alignment can be obtained when more instances of the
motif are present. Therefore, more known E. coli motifs are
present in Table 3b than in Table 3a, which contains alignments of
sequence from E. coli only.
The only motif in this list that has not been documented previously is
found in a group derived from conserved operons (group 103) in A. fulgidus, M. thermoautotrophicum, and P. horokoshii. In A. fulgidus, the motif is found upstream of
two genes with homology to ferrous ion transporters; in M. thermoautotrophicum, the motif is found upstream of a ferrous ion
transporter and a gene with homology to the iron repressor; and in
P. horokoshii, the motif is found upstream of a ferrous ion
transporter (see Table 3d). This motif is highly palindromic (consensus
TTAGG-x4-CCTAA). This pattern of two palindromic halfsites separated by
a short linker sequence is common among the binding sites for known
bacterial regulatory DNA-binding proteins. Our prediction is that this
motif is a binding site for a protein regulating iron transport in
these archaebacteria. Since the motif is found upstream of a putative iron repressor in M. thermoautotrophicum, it is possible that this putative repressor (MTH214) is the regulatory protein that binds
to these sites, and that it is autoregulatory.
Top Palindromic Motifs Found by Aligning Sequence from Individual Organisms

Palindromicity is a parameter that is strongly correlated with regulatory function. Only about 5% of all of the motifs found by AlignACE in this study were palindromic, whereas almost half of all of the known motifs are palindromic. By considering palindromic motifs separately, we can increase the Ssite cutoff and retain a high rate of true positives (see Table 1). A selection of these motifs is shown in Table 3c. All 101 palindromic motifs with MAP > 10, Ssite < 1e-10, and AT content < 80%
were clustered into 30 clusters. One representative from each cluster
is shown here. If we do not impose the AT-content cutoff, there are
over twice as many motifs present (220 motifs). Ten of these 30 clusters correspond to known E. coli or H. influenzae
motifs, and two correspond to known B. subtilis motifs. The
known E. coli motifs scoring above this cutoff are Crp, LexA,
ArgR, Fnr, TyrR, FruR, TrpR, and GalR. The known B. subtilis
motifs are the Cheo motif (the B. subtilis SOS box), and the
CcpA motif. In B. subtilis there is also a variant of the Fnr
motif that resembles the E. coli Crp motif. ArgR and PurR are
also found above this cutoff in groups derived from conserved
operons (groups 054 and 001, which contain parts of the purine and
arginine biosynthesis pathways, respectively). Of the 22,134 motifs
found by running AlignACE on the upstream regions of all 343 gene
groups predicted from conserved operons, the ArgR and PurR motifs in
E. coli are among the most specific palindromic motifs.
Eighteen of the 30 motif clusters in Table 3c show no similarity to
known motifs. Thirty percent of these motifs are also AT rich
(70%-80% AT content). Three of these motifs are found in groups
derived from conserved operons. These are group 156 (Hyp operon genes)
in P. horokoshii, group 255 (heat shock genes) in M. genitalium, and group 033 (pyruvate synthase genes) in M. janaschii. The presence of regulatory motifs upstream of the genes making up these groups lends additional support to the hypothesis that
these are indeed functionally coupled groups of genes. Since this
method for predicting regulons is based purely on the chromosomal gene
order across genomes, the high-scoring motifs found in these groups are
not biased toward well-studied organisms. In contrast, the other kinds
of groups of potentially coregulated genes that we used in this study
(groups from metabolic pathways and groups based on footprinted E. coli regulons) both originate from experimental information
determined in the well-studied organisms, so the high-scoring motifs
from these two methods are largely from E. coli and B. subtilis.
The motif found upstream of group 255 (heat shock genes) in M. genitalium does not resemble the CIRCE (Controlling Inverted Repeat
of Chaperone Expression) motif, which is known to regulate several heat
shock-related genes in a wide variety of organisms, including M. genitalium, through the binding of the repressor HrcA (Naberhaus 1999).
Top Palindromic Motifs Found by Combining Sequence from Closely Related Organisms

Using this same cutoff (Ssite < 1e-10, MAP > 10, AT content < 80%) on the motifs obtained from aligning
upstream sequence from orthologous genes in two or more closely related
organisms together, we obtain 65 motifs that reduce to 19 clusters
(Table 3d). Of these 19 clusters, 12 correspond to known E. coli
motifs conserved in either H. influenzae or B. subtilis (PurR, ArgR, LexA, TrpR, Crp, GalR, Fur, TyrR, ModE, HipB,
ArcA, and Fnr). Again, these conserved motifs show up with lower
Ssite and higher MAP scores when they are aligned in
multiple organisms containing the same motif because there are more
instances of the motif to align.
Of the seven clusters corresponding to new motifs, one is the ferrous
ion-transport motif in A. fulgidus, M. thermoautotrophicum, and P. horokoshii described above.
Four of the remaining six clusters correspond to conserved motifs found
in groups derived from conserved operons. These are group 190 (ribonucleoside diphosphate) in E. coli and B. subtilis, group 199 (transcription) in M. janaschii and
P. horokoshii, group 073 (fatty acid biosynthesis) in E. coli and H. influenzae, and group 009 (nucleotide
biosynthesis) in E. coli and B. subtilis.
We found many significant new motifs in 17 bacterial genomes. Known regulons in E. coli were used as controls to calibrate the significance of these motifs. More motifs were found in larger genomes with more complex regulation. Some of the highest-scoring new motifs are found in archaebacteria, for which there is relatively little experimental data because of difficulties in performing experiments in these organisms. New, significant motifs include two motifs potentially regulating anaerobic metabolism in A. fulgidus, a motif potentially regulating methane metabolism in M. thermoautotrophicum, and a palindromic motif potentially regulating ferrous ion transport that is conserved in M. thermoautotrophicum, A. fulgidus, and P. horokoshii.

We also identified a number of motifs that are conserved in several eubacterial species. At least 22% of the known E. coli DNA regulatory motifs are conserved in H. influenzae. In the more distantly related organism B. subtilis, at least 15% of the known E. coli motifs are conserved. In even more distantly related organisms, motifs can differ considerably. We found cases in which there are different but significant motifs upstream of genes homologous to known E. coli regulons, including Crp, LexA, and ArcA in B. subtilis; four anaerobic regulons in A. fulgidus (NarL, NarP, Fnr, and ModE); and PhoB, PurR, RpoH, and FhlA in other archaebacterial species. This can indicate that the organisms have evolved different methods for regulating the same cellular processes, or it can indicate parallel mutations in the DNA-binding protein and the motif that it recognizes.

The three methods of predicting regulons that we use here are based on different sources of biological information. The gene groups obtained from homologs to members of known E. coli regulons and the groups based on functional pathways in each organism are based on the prior body of knowledge from biological experiments. Therefore, many known motifs are found in these groups. In contrast, the groups derived from conserved operons in other organisms are not based on any biological information other than the positions of genes within the genomic sequence. Therefore, these groups are less biased towards motifs that were already known. However, some known motifs are found in these groups, as some of the known regulons can be reconstructed in this manner (e.g., ArgR and PurR). By our criteria for ranking motifs, the PurR and ArgR motifs are two of the top 10 motifs to come out of the AlignACE runs on the groups derived from conserved operons, which lends credibility to the use of this method for predicting regulons and their regulatory motifs.

The usefulness of motif finding upstream of potential regulons depends strongly on how good the regulon predictions are. The groups of genes that were constructed based on conserved operons in other organisms were limited in several ways. The first limitation is that only top reciprocal FASTA hits were considered orthologs (Overbeek et al. 1999).

These methods will continue to become more powerful as the amount of available sequence data increases (see Overbeek et al. 1998).

The three methods that we use separately for predicting coregulated sets of genes can be combined to obtain larger and more complete groups. Groups obtained by these three methods can also be combined with groups of genes experimentally observed to have similar expression profiles, as well as groups obtained by methods such as that of Pellegrini et al. (1999).

The biological significance of some of the motifs presented here should be verified experimentally, including determination of factors binding to these motifs. Predictions for which DNA-binding protein might be interacting with the motif can be obtained by computational methods, such as finding which predicted DNA-binding proteins have the motif in their upstream region (assuming autoregulation), and searching for a member of a known DNA-binding protein family that is linked to the regulon via a conserved operon in another organism. The regulon prediction and motif discovery methods described here should be an increasingly powerful addition to the current array of tools used to elucidate connectivities in bacterial regulatory networks.
Organisms

AlignACE runs were performed on upstream regions from the following 17 bacterial organisms (abbreviations used in the tables and figures are given in parentheses): A. fulgidus (AG), Aquifex aeolicus (AA), Borrelia burgdorferi (BB), B. subtilis (BS), Chlamydia trachomatis (CT), E. coli K12 (EC), H. influenzae (HI), Helicobacter pylori (HP), Mycoplasma genitalium (MG), M. janaschii (MJ), Mycoplasma pneumoniae (MP), M. thermoautotrophicum (TH), M. tuberculosis (MT), P. horokoshii (PH), Rickettsia prowazekii (RP), Synechocystis sp. (CY), and Treponema pallidum (TP). For AlignACE runs on groups of genes derived from conserved operons, sequence from the following 12 groupings of closely related organisms was pooled together: EC and HI; EC and BS; EC, HI, and BS; BS and MT; EC and MT; AG and TH; AG, TH, and MJ; AG, TH, MJ, and PH; MJ and PH; MG and MP; CY and CT; and TP and BB. For AlignACE runs on upstream regions of groups of genes derived from E. coli footprinted regulons, 16 groups were constructed by pooling E. coli sequence together with sequence from each organism separately, including distantly related organisms. Two additional groups were also used: EC, HI, and BS; and BS and MT.

Identification of Orthologs

In each organism, we searched for orthologs to the members of the footprinted E. coli regulons (http://arep.med.harvard.edu/ecoli_matrices), as well as for orthologs to the E. coli DNA-binding proteins controlling these regulons. To identify potential orthologs, we performed reciprocal BLAST searches between E. coli and each of 16 other completely sequenced organisms. To avoid discounting closely related paralogs in our analysis, we allowed up to five potential orthologs for each gene to be included.
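As a minimal sketch of this reciprocal search (the thresholds are spelled out in the paragraph that follows: up to five hits scoring above 70% of the top raw score, confirmed by the reverse search), the selection logic might look like the Python below. The function names, the `(gene, raw_score)` hit format, and the two search callables are hypothetical stand-ins for real BLAST runs and are not taken from the original analysis.

```python
from typing import Callable, List, Tuple

Hit = Tuple[str, float]  # (gene id, raw BLAST score); assumed format


def top_hits(hits: List[Hit], frac: float = 0.7, max_hits: int = 5) -> List[str]:
    """Up to max_hits genes whose raw score exceeds frac * the best raw score."""
    if not hits:
        return []
    ranked = sorted(hits, key=lambda h: h[1], reverse=True)
    best = ranked[0][1]
    return [gene for gene, score in ranked if score > frac * best][:max_hits]


def potential_orthologs(
    xa: str,
    search_a_to_b: Callable[[str], List[Hit]],  # query a gene of Ga against Gb
    search_b_to_a: Callable[[str], List[Hit]],  # query a gene of Gb against Ga
    frac: float = 0.7,
    max_hits: int = 5,
) -> List[str]:
    """Genes in Gb that pass the reciprocal criterion for gene xa in Ga."""
    orthologs = []
    for xb in top_hits(search_a_to_b(xa), frac, max_hits):
        back = search_b_to_a(xb)
        if not back:
            continue
        best_back = max(score for _, score in back)
        # xa must itself score above frac * the top hit of the reverse search
        if any(g == xa and s > frac * best_back for g, s in back):
            orthologs.append(xb)
    return orthologs
```

In practice the two callables would wrap parsed BLAST output; here they are left abstract so that only the thresholding logic is shown.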
It is desirable to include upstream regions from several potential orthologs in the alignments because AlignACE can tolerate some superfluous sequence, and we want to make sure that the correct ortholog is included; however, if too much extra sequence is added, the real motif will not be found. To find potential orthologs in genome Gb for a gene xa in genome Ga, we performed a BLAST search over all genes in genome Gb using xa as a query. We identified the raw BLAST score of the top hit in genome Gb, and selected up to five hits in genome Gb with raw score >70% of this value. For each of these genes xbi in genome Gb, we performed a BLAST search over all genes in genome Ga. If the original gene xa turned up with a raw score >70% of the value of the top BLAST hit in genome Ga, then xa and xbi are potential orthologs.

Identification of Upstream Regulatory Regions

If a gene lies within an operon, its promoter and regulatory region could lie several genes upstream. It is difficult to predict the first gene in an operon, especially in less well-studied organisms. To ensure that the sequence we align contains the regulatory intergenic region, we must include several possible intergenic regions. However, if we add too many extra intergenic sequences, the regulatory motif will not be found by AlignACE because there will be too much noise. Our definition of an operon is two or more tandem genes separated by less than a certain cutoff distance. We used two different operon cutoff distances (100 and 300 bp). For each operon defined in this manner, we recorded the entire sequence of all of the intergenic segments of length greater than 20 bp between the gene of interest and the operon head, as well as 300 bp upstream of the operon head (see Fig. 3).
We ran AlignACE twice for each group of potentially coregulated genes: once using a loose distance cutoff in the operon definition (300 bp) to ensure inclusion of the correct upstream region, and once using a more conservative distance cutoff (100 bp) to reduce inclusion of extraneous intergenic regions.
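The operon definition and region selection described above can be illustrated with a short Python sketch. It makes simplifying assumptions that are not in the original text: all genes lie on the forward strand, are sorted by position, and are given as 0-based half-open `(start, end)` coordinates. The function and parameter names are hypothetical.

```python
from typing import List, Tuple

Gene = Tuple[int, int]  # (start, end), 0-based half-open, forward strand (assumed)


def upstream_regions(
    genes: List[Gene],
    idx: int,
    operon_cutoff: int = 300,  # genes closer than this are treated as one operon
    min_len: int = 20,         # intergenic segments must be longer than this
    head_upstream: int = 300,  # sequence taken upstream of the operon head
) -> List[Tuple[int, int]]:
    """Coordinates of candidate regulatory regions for genes[idx]."""
    regions = []
    i = idx
    # Walk upstream while the gap to the previous gene is below the cutoff.
    while i > 0:
        gap_start, gap_end = genes[i - 1][1], genes[i][0]
        gap = gap_end - gap_start
        if gap >= operon_cutoff:
            break  # genes[i] is the predicted operon head
        if gap > min_len:
            regions.append((gap_start, gap_end))
        i -= 1
    head_start = genes[i][0]
    regions.append((max(0, head_start - head_upstream), head_start))
    return regions
```

Calling the function once with `operon_cutoff=300` and once with `operon_cutoff=100` mirrors the two runs per gene group: the looser cutoff collects more intergenic segments, the stricter one fewer.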
To increase the signal to noise ratio, we took no more than 300 bp of
sequence upstream of the operon head because the overwhelming majority
of the binding sites for DNA-binding proteins in bacteria are found
within the first 300 bp upstream of the start codon. In cases where
there is a site further upstream, there is usually also a site close to
the promoter (Gralla and Collado-Vides 1996).

AlignACE Runs

The AlignACE program (Roth et al. 1998) was used for all motif searches. Each motif found in this analysis was compared to the 55 footprinted E. coli motifs (Robison et al. 1998). The CompareACE program was also used to compare and cluster new motifs coming out of the analysis. A matrix of all pairwise CompareACE scores was calculated, and then motifs were clustered using a simple joining algorithm (Hartigan 1975).

Parameters Used in Motif Analysis

Site Specificity Score (Ssite)

Ssite is a measure of how specific a motif is for the sequence in which it was aligned. This statistic is similar to the specificity score described by Hughes et al. (2000).
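The clustering of motifs from a matrix of pairwise CompareACE scores, described under AlignACE Runs, can be sketched as follows. The exact joining rule is not given here, so this example substitutes plain single-linkage joining at a score cutoff of 0.7 (the similarity cutoff quoted elsewhere in the text); that substitution, the dictionary layout of the scores, and all function names are assumptions made for illustration. The second function picks the member with the lowest Ssite from each cluster, as is done for Table 3.

```python
from typing import Dict, List, Tuple


def cluster_motifs(
    names: List[str],
    scores: Dict[Tuple[str, str], float],  # pairwise similarity scores (assumed layout)
    cutoff: float = 0.7,
) -> List[List[str]]:
    """Single-linkage clusters: join any two motifs scoring above the cutoff."""
    parent = {n: n for n in names}

    def find(n: str) -> str:
        # Follow parent links to the cluster root, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for (a, b), score in scores.items():
        if score > cutoff:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters: Dict[str, List[str]] = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(members) for members in clusters.values())


def representatives(clusters: List[List[str]], ssite: Dict[str, float]) -> List[str]:
    """The most specific (lowest Ssite) member of each cluster."""
    return [min(members, key=lambda m: ssite[m]) for members in clusters]
```

Single linkage can chain loosely related motifs into one cluster, so a stricter joining rule may give different groupings on real data.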
AT Content

Many of the motifs that were found by AlignACE, including motifs with low values of Ssite, are AT rich (>90% AT content). However, no known matrices for E. coli DNA-binding proteins have AT content greater than 80% (Robison et al. 1998).

Palindromicity

To select palindromic motifs for further analysis, we used the CompareACE program to compare a motif with its reverse complement. We used the same CompareACE cutoff score as for comparing motifs to one another (0.7).

Additional Cutoffs Used in Selecting Interesting Motifs

To exclude repetitive elements from our analysis, we excluded motifs in which more than half of the aligned sites came from a single upstream region. In the AlignACE runs combining sequence from several closely related organisms, we only looked at those motifs where < 70% of the aligned instances of the motif came from a single organism, in order to limit our analysis to conserved motifs.
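The AT-content, palindromicity, and site-origin cutoffs above can be combined into one filter, sketched here in Python. This is an illustration under stated assumptions, not the CompareACE implementation: the motif-to-reverse-complement comparison is approximated by a Pearson correlation between the base-count matrix and its reverse complement at a fixed alignment, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence

BASES = "ACGT"  # column order; the complement of BASES[i] is BASES[3 - i]


def count_matrix(sites: Sequence[str]) -> List[List[int]]:
    """Per-position base counts for equal-length aligned sites."""
    return [[sum(s[i] == b for s in sites) for b in BASES]
            for i in range(len(sites[0]))]


def reverse_complement(matrix: List[List[int]]) -> List[List[int]]:
    """Reverse the positions and swap A<->T, C<->G within each position."""
    return [col[::-1] for col in reversed(matrix)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return 0.0 if vx == 0 or vy == 0 else cov / math.sqrt(vx * vy)


def palindromicity(sites: Sequence[str]) -> float:
    """Similarity of a motif to its own reverse complement (stand-in score)."""
    m = count_matrix(sites)
    rc = reverse_complement(m)
    return pearson([v for c in m for v in c], [v for c in rc for v in c])


def at_content(sites: Sequence[str]) -> float:
    """Fraction of A and T bases over all aligned sites."""
    total = sum(len(s) for s in sites)
    return sum(s.count("A") + s.count("T") for s in sites) / total


def max_source_fraction(sources: Sequence[str]) -> float:
    """Largest fraction of sites contributed by a single source."""
    return max(Counter(sources).values()) / len(sources)


def keep_palindromic_motif(sites: Sequence[str], regions: Sequence[str],
                           organisms: Optional[Sequence[str]] = None) -> bool:
    """Apply the cutoffs: region <= 1/2, organism < 70%, AT < 80%, score > 0.7."""
    if max_source_fraction(regions) > 0.5:
        return False  # likely a repetitive element
    if organisms is not None and max_source_fraction(organisms) >= 0.7:
        return False  # not conserved across the pooled organisms
    return at_content(sites) < 0.8 and palindromicity(sites) > 0.7
```

Passing `organisms=None` corresponds to single-genome runs, where only the upstream-region filter applies.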
We thank Jason Johnson, John Aach, Jong Park, Martha Bulyk, and Ann Nichols for help and discussions. A.M.M. is a Howard Hughes Medical Institute predoctoral fellow. This work was supported by the Office of Naval Research (grant N00014-97-1-0865), the Department of Energy (grant DE-FG02-87-ER60565), and a grant from Hoechst Marion Roussel. The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
1 Corresponding author.
E-MAIL church{at}rascal.med.harvard.ed; FAX (617) 432-7266.
Received October 28, 1999; accepted in revised form March 28, 2000.
10:744-757 ©2000 by Cold Spring Harbor Laboratory Press ISSN 1088-9051/00 $5.00