Published online before print February 6, 2007; doi: 10.1101/gr.5989907
Genome Res. 17: 348-357, 2007. ©2007 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/07 $5.00
Methods
Identification of muscle-specific regulatory modules in Caenorhabditis elegans
Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
Transcriptional regulation is the major regulatory mechanism that controls the spatial and temporal expression of genes during development. This is carried out by transcription factors (TFs), which recognize and bind to their cognate binding sites. Recent studies suggest a modular organization of TF-binding sites, in which clusters of transcription-factor binding sites cooperate in the regulation of downstream gene expression. In this study, we report our computational identification and experimental verification of muscle-specific cis-regulatory modules in Caenorhabditis elegans. We first identified a set of motifs that are correlated with muscle-specific gene expression. We then predicted muscle-specific regulatory modules based on clusters of those motifs with characteristics similar to a collection of well-studied modules in other species. The method correctly identifies 88% of the experimentally characterized modules with a positive predictive value of at least 65%. The prediction accuracy of muscle-specific expression on an independent test set is highly significant (P < 0.0001). We performed in vivo experimental tests of 12 predicted modules, and 10 of those drive muscle-specific gene expression. These results suggest that our method is highly accurate in identifying functional sequences important for muscle-specific gene expression and is a valuable tool for guiding experimental designs.
In metazoans, the gene-regulatory information that directs development is encoded in the genomic DNA sequence. The temporal and spatial expression pattern of genes is controlled by short cis-regulatory elements that act as binding sites for transcription factors. Through interactions with the basal transcription apparatus and other regulatory proteins, transcription factors determine either activation or repression of the target gene at a particular developmental time or within a particular cell or tissue. Therefore, identification of cis-regulatory elements and their binding proteins constitutes an important part of deciphering the role of noncoding sequences. However, the binding of a single transcription factor to a regulatory element is rarely sufficient to confer context-specific expression. Mounting evidence suggests that complex, cooperative protein-protein interactions between transcription factors are required to determine gene expression patterns (Arnone and Davidson 1997
Although the number of sequenced genomes is increasing rapidly, our ability to decipher the information they encode lags far behind. For example, Caenorhabditis elegans was the first metazoan organism whose genome was sequenced, yet our understanding of the sequences that control tissue-specific gene expression is still limited. This understanding comes mainly from experimental investigation of the regulatory sequences of individual genes, which began almost 20 yr ago (Spieth et al. 1988
Studies from various organisms have revealed a common theme: transcription factor binding sites tend to be interconnected and to function together to confer a particular context-specific expression on the target gene. These clusters of transcription factor binding sites form a regulatory module that can be located in upstream, downstream, or intronic sequences, and that can be moved from its native context and still recapitulate a portion of the native expression pattern, independent of its position and orientation relative to the basal promoter (Arnone and Davidson 1997
Here we describe a de novo computational method for accurate identification of regulatory sequences that confer muscle-specific gene expression, as well as experimental tests of the predicted modules. Comparisons of the predicted modules with experimentally characterized modules show high sensitivity and positive predictive value (PPV, defined as True Positives/All Predictions). A total of 88% (22/25) of experimentally characterized modules are predicted, and 65% (30/46) of our predicted modules are located within experimentally defined regions. The remaining predicted modules have not been tested for function, so the true PPV could be considerably higher; even so, it is already much higher than that of currently available algorithms. We developed a scoring system to predict the muscle specificity of any DNA segment. When applied to the whole genome, this method can help discriminate muscle genes from non-muscle genes. Because no information about known modules was used for the predictions, we expected new predictions to have the same sensitivity and PPV. To examine this, we experimentally tested the functionality of 12 predicted modules. Of these 12 modules, three are located within known muscle gene promoters and nine are located in the promoters of genes with unknown expression patterns and unknown functions. 
Ten of the 12 tested modules drive gene expression in muscle tissue, demonstrating that our method is a valuable tool for guiding experimental design. Although we focus on muscle-specific gene expression in this work, we expect the method to be generally applicable to many other context-specific module identification tasks, because it requires no prior knowledge other than a set of likely coexpressed genes and their orthologs. The C. elegans muscle-specific module prediction tool can be accessed at http://ural.wustl.edu/software.html.
Identification of regulatory motifs
Promoters are commonly defined as the DNA regions located upstream of the transcription start sites that contain the necessary binding elements for proper transcriptional regulation. In C. elegans, 60% of predicted intergenic regions are fully included within a 2-kb upstream segment (Dupuy et al. 2004
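The 2-kb upstream definition can be made concrete with a short Python sketch. It assumes 0-based, half-open genome coordinates and a hypothetical `neighbor_edge` argument that clips the region at the facing boundary of the adjacent upstream gene; the excerpt does not state whether or how promoters were clipped in this study, so this is an illustration rather than the pipeline the authors used.

```python
from typing import Optional, Tuple

# Complement table for plain DNA; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_interval(gene_start: int, gene_end: int, strand: str,
                      chrom_len: int, max_len: int = 2000,
                      neighbor_edge: Optional[int] = None) -> Tuple[int, int]:
    """Half-open [lo, hi) interval of at most max_len bp upstream of a gene.

    neighbor_edge (hypothetical) is the facing boundary of the adjacent
    upstream gene; the interval is clipped so it never enters that gene.
    """
    if strand == "+":
        hi = gene_start
        lo = max(0, hi - max_len)
        if neighbor_edge is not None:
            lo = max(lo, neighbor_edge)
    elif strand == "-":
        lo = gene_end
        hi = min(chrom_len, lo + max_len)
        if neighbor_edge is not None:
            hi = min(hi, neighbor_edge)
    else:
        raise ValueError("strand must be '+' or '-'")
    return lo, max(lo, hi)  # empty interval if the neighbor overlaps the gene


def promoter_sequence(chrom_seq: str, gene_start: int, gene_end: int,
                      strand: str, max_len: int = 2000,
                      neighbor_edge: Optional[int] = None) -> str:
    """Upstream sequence written 5'->3' on the gene's own strand."""
    lo, hi = upstream_interval(gene_start, gene_end, strand, len(chrom_seq),
                               max_len, neighbor_edge)
    region = chrom_seq[lo:hi]
    return region if strand == "+" else reverse_complement(region)


if __name__ == "__main__":
    chrom = "AAAACCCCGGGGTTTT"
    print(promoter_sequence(chrom, 8, 12, "+", max_len=4))  # CCCC
    print(promoter_sequence(chrom, 4, 8, "-", max_len=4))   # CCCC (revcomp of GGGG)
```

Minus-strand genes are returned as reverse complements so that downstream motif scanning always reads the promoter in the gene's own orientation.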
We used the program PhyloCon (Wang and Stormo 2003
Muscle specificity of identified motifs
To identify motifs that are enriched in muscle gene promoters, we calculated the Over Representation Index (ORI) (Bajic et al. 2004
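As a sketch of the over-representation idea, the Python below scans both strands for an IUPAC consensus such as motif 6 (WCTTTGM) and compares site density (sites per kb) in a target set of promoters with that in a background set. Defining ORI as this density ratio is an assumption for illustration: the exact formula adopted from Bajic et al. (2004) is not given in this excerpt, and the motifs in this study are PhyloCon profiles rather than consensus strings.

```python
import re
from typing import Iterable

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of each IUPAC code (e.g. M = A/C pairs with K = G/T).
IUPAC_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def count_sites(seq: str, consensus: str) -> int:
    """Count (possibly overlapping) consensus matches on both strands."""
    seq = seq.upper()
    forward = consensus.upper()
    reverse = forward.translate(IUPAC_COMPLEMENT)[::-1]
    total = 0
    for pattern in {forward, reverse}:  # a set, so a palindrome is scanned once
        regex = "".join(IUPAC[c] for c in pattern)
        total += len(re.findall("(?=" + regex + ")", seq))  # lookahead allows overlaps
    return total


def site_density(seqs: Iterable[str], consensus: str) -> float:
    """Sites per kilobase, pooled over a set of sequences."""
    seqs = list(seqs)
    bp = sum(len(s) for s in seqs)
    if bp == 0:
        return 0.0
    return 1000.0 * sum(count_sites(s, consensus) for s in seqs) / bp


def ori(consensus: str, target: Iterable[str], background: Iterable[str]) -> float:
    """Hypothetical ORI: target site density divided by background density."""
    bg = site_density(background, consensus)
    return float("inf") if bg == 0 else site_density(target, consensus) / bg
```

A value above 1 means the motif is denser in the target (muscle) promoters than in the background (non-muscle) promoters; pooling lengths avoids giving short promoters undue weight.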
The top four motifs, ranked by ORI (Table 1), are similar to previously identified muscle-specific regulatory motifs (GuhaThakurta et al. 2002
Identification of muscle-specific regulatory modules in C. elegans promoter sequences
Because some genes have alternative promoters, there are 138 different muscle gene promoters for the 122 muscle-specific genes. We applied this method to the 138 muscle gene promoters and identified 373 modules, an average of 2.7 modules per promoter. The size of the modules ranges from 28 to 516 bp, with a mean of 144 bp. Kirchhamer et al. (1996)
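The module-building step can be illustrated by single-linkage clustering of motif sites along a promoter. In this Python sketch, sites separated by at most a distance parameter `max_gap` are merged (the study varies its distance parameter from 20 to 100 bp), and a cluster is kept only if it contains at least `min_distinct` different motifs. Both the merging rule and the `min_distinct` threshold are assumptions; the excerpt does not spell out the exact criteria derived from the well-studied modules.

```python
from typing import List, Tuple

Site = Tuple[int, int, str]                # (start, end, motif_id); 0-based, half-open
Module = Tuple[int, int, Tuple[str, ...]]  # (start, end, distinct motif ids)


def cluster_sites(sites: List[Site], max_gap: int = 50,
                  min_distinct: int = 2) -> List[Module]:
    """Merge motif sites separated by <= max_gap bp into candidate modules."""
    modules: List[Module] = []
    group: List[Site] = []
    group_end = 0

    def flush() -> None:
        # Keep the current cluster only if it has enough different motifs.
        motifs = tuple(sorted({m for _, _, m in group}))
        if len(motifs) >= min_distinct:
            modules.append((group[0][0], group_end, motifs))

    for site in sorted(sites):  # sorted by start position
        if group and site[0] - group_end <= max_gap:
            group.append(site)
            group_end = max(group_end, site[1])
        else:
            if group:
                flush()
            group = [site]
            group_end = site[1]
    if group:
        flush()
    return modules


if __name__ == "__main__":
    hits = [(0, 7, "m1"), (20, 27, "m2"), (200, 207, "m3"), (210, 217, "m3")]
    print(cluster_sites(hits, max_gap=20))  # [(0, 27, ('m1', 'm2'))]
```

Overlapping sites have a negative gap and are always merged; raising `max_gap` yields fewer, longer modules, which is the sensitivity/PPV trade-off explored when the distance parameter is varied.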
Verification of regulatory modules
A comparison of our predicted modules to those experimentally characterized modules shows that they match closely. For example, T18D3.4 encodes Myo-2, a pharyngeal-specific myosin heavy chain. The 17 to 239 region is defined as the minimal promoter that can drive reporter gene expression in pharyngeal muscles, while two overlapping 0.3-kb fragments (370 to 686 and 458 to 764) are sufficient for pharyngeal muscle-specific enhancer activity (Okkema et al. 1993 500 bp. The distance from the ends of predicted modules to the ends of experimentally characterized modules ranges from 5 to 182 bp, and the average is 69 bp. These results demonstrate that the predicted regulatory modules are highly correlated with experimentally determined enhancers that direct gene expression in muscle tissues.
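The sensitivity and PPV figures quoted in this paper follow directly from counts, and the same quantities can be computed from interval data. This Python sketch treats a known module as recovered, and a prediction as a true positive, when the two intervals share at least one base; that overlap criterion is an assumption, since the text counts predictions "located within" experimentally defined regions.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def sensitivity_ppv(predicted: Dict[str, List[Interval]],
                    known: Dict[str, List[Interval]]) -> Tuple[float, float]:
    """Sensitivity = recovered known modules / all known modules;
    PPV = predictions hitting a known module / all predictions."""
    n_known = n_found = n_pred = n_tp = 0
    for gene in set(predicted) | set(known):
        preds = predicted.get(gene, [])
        knowns = known.get(gene, [])
        n_known += len(knowns)
        n_pred += len(preds)
        n_found += sum(any(overlaps(k, p) for p in preds) for k in knowns)
        n_tp += sum(any(overlaps(p, k) for k in knowns) for p in preds)
    sens = n_found / n_known if n_known else 0.0
    ppv = n_tp / n_pred if n_pred else 0.0
    return sens, ppv


# Reported counts: 22 of 25 known modules recovered, 30 of 46 predictions confirmed.
print(f"sensitivity = {22 / 25:.0%}, PPV = {30 / 46:.0%}")  # sensitivity = 88%, PPV = 65%
```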
We performed simulations to estimate the statistical significance of obtaining the same sensitivity and PPV by chance, given the promoter sequences and the known regulatory modules. We simulated the distribution of predicted modules in the promoters by randomly picking a start position for each module, keeping the length and number of modules in each gene the same as those of the predicted modules in that gene. The simulation was repeated 100,000 times, and the sensitivity and PPV were calculated for each replicate. The average sensitivity is 48.8%, with a standard deviation of 7.8, and the average PPV is 35.5%, with a standard deviation of 5.5. Therefore, the P-values of obtaining 88% sensitivity and 65% PPV are both much less than 0.001.
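The randomization can be sketched as follows: each gene keeps its number of predicted modules and their lengths, start positions are redrawn uniformly within the promoter, and the empirical P-value is the fraction of replicates that match or beat the observed statistic (with the usual +1 correction). Uniform placement, the overlap criterion, and the function names are assumptions for illustration, and a small `n_iter` stands in for the 100,000 replicates used in the study. A rough normal approximation from the reported means and standard deviations, (88 - 48.8)/7.8 ≈ 5.0 for sensitivity and (65 - 35.5)/5.5 ≈ 5.4 for PPV, is consistent with P-values far below 0.001.

```python
import random
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open
Modules = Dict[str, List[Interval]]


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def sensitivity_ppv(pred: Modules, known: Modules) -> Tuple[float, float]:
    n_known = n_found = n_pred = n_tp = 0
    for gene in set(pred) | set(known):
        p, k = pred.get(gene, []), known.get(gene, [])
        n_known, n_pred = n_known + len(k), n_pred + len(p)
        n_found += sum(any(overlaps(x, y) for y in p) for x in k)
        n_tp += sum(any(overlaps(y, x) for x in k) for y in p)
    return (n_found / n_known if n_known else 0.0,
            n_tp / n_pred if n_pred else 0.0)


def shuffle_modules(pred: Modules, promoter_len: Dict[str, int],
                    rng: random.Random) -> Modules:
    """Redraw each module's start uniformly; keep per-gene count and lengths."""
    shuffled: Modules = {}
    for gene, mods in pred.items():
        placed = []
        for start, end in mods:
            width = end - start
            new_start = rng.randint(0, promoter_len[gene] - width)  # inclusive bounds
            placed.append((new_start, new_start + width))
        shuffled[gene] = placed
    return shuffled


def empirical_pvalues(pred: Modules, known: Modules, promoter_len: Dict[str, int],
                      n_iter: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """P(random sensitivity >= observed), P(random PPV >= observed)."""
    rng = random.Random(seed)
    obs_sens, obs_ppv = sensitivity_ppv(pred, known)
    hits_sens = hits_ppv = 0
    for _ in range(n_iter):
        sens, ppv = sensitivity_ppv(shuffle_modules(pred, promoter_len, rng), known)
        hits_sens += sens >= obs_sens
        hits_ppv += ppv >= obs_ppv
    return (hits_sens + 1) / (n_iter + 1), (hits_ppv + 1) / (n_iter + 1)


def z_score(observed: float, mean: float, sd: float) -> float:
    """Normal-approximation check against the simulated mean and SD."""
    return (observed - mean) / sd


if __name__ == "__main__":
    print(round(z_score(88, 48.8, 7.8), 2))  # 5.03
```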
Detection of muscle genes on a genome scale
Will prior information help?
Our module predictions did not rely on any knowledge about experimentally defined modules, such as which genes contained them, where they were located, or which motifs they contained. We next examined whether prior information about experimentally defined modules could identify a reduced set of motifs that is indispensable for module identification and could improve predictive performance. First, we tested the performance of module prediction using only the muscle-specific motifs. The sensitivity is greatly reduced compared with the prediction made with the full set of motifs: varying the distance parameter from 20 to 100 bp, the sensitivity ranges from 52% to 72%, whereas the full set of motifs gives a sensitivity range of 80% to 96%. The PPV (61.8% to 74.3%), however, is comparable to that of the prediction made with the full set of motifs (60.5% to 77.4%). Using this motif set for genomic predictions does not improve performance, as determined by the ROC curve of the 44 test-set muscle-specific genes (Supplemental Fig. 3). This suggests that some of the non-muscle-specific motifs are important components of muscle-specific modules. We next searched for a subset of motifs that would regain the prediction sensitivity with the same or a higher PPV. By adding back combinations of one, two, or three non-muscle-specific motifs and using distance parameters ranging from 20 to 100 bp, we found six cases in which both sensitivity and PPV are higher (Supplemental Table 3). In all cases, motif 6 (WCTTTGM) is included in the motif set. We used the three motif sets that give the highest sensitivity and PPV to perform genomic predictions and plotted the ROC curve of the 44 test-set muscle genes. The predictive performance of all three sets is comparable to, or worse than, that of the original set of motifs (Supplemental Fig. 3). 
Therefore, training on known modules can improve the performance on the training set, but this must be due to overfitting, because it does not improve the genomic predictions in any significant way. These results demonstrate that (1) our method for module identification does not need prior information in order to make high quality predictions; (2) our method is robust; (3) the initial step of motif prediction and redundant motif elimination effectively identifies motifs that are important for regulating muscle-specific gene expression.
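The ROC comparisons above can be outlined once every gene has a promoter score. The Python sketch below sweeps a threshold from the highest score down, records (false-positive rate, true-positive rate) after each distinct score so that tied genes are handled together, and integrates with the trapezoid rule. How promoter scores are computed is not shown in this excerpt, so the scores and labels are hypothetical inputs (1 for a test-set muscle gene, 0 for a non-muscle gene).

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs as the threshold moves from the highest score down."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative gene")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all genes tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

An AUC of 1.0 means every muscle gene outscores every non-muscle gene; 0.5 is the chance level.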
Experimental verification of predicted modules
First, we tested our predictive powers by locating the regulatory regions of three genes that are known to be muscle-specific genes, but whose promoters have not been subjected to comprehensive functional analyses. Our results confirmed that our predictions are correct in all three cases. C02D4.2 (ser-2) has at least three alternative promoters that drive C02D4.2 expression in a set of neurons, as well as pharyngeal cells and head muscles (Tsalik et al. 2003
Second, we tested whether our predictions help to identify muscle-expressed genes in the genome. We randomly picked eight genes of unknown function and unknown expression pattern from the top-ranking predicted muscle genes (ranked from 1 to 198 in the genomic ranking, Table 3). For each gene, we assayed whether the minimal upstream sequence encompassing the first predicted module could drive gene expression in muscle tissue. Table 3 lists the genes tested, together with the genomic rank of each gene, the location of the predicted modules, and the observed expression patterns. C01B7.3 and C01B7.1 share a 2.6-kb intergenic sequence. C01B7.3 is a predicted gene with no RNAi phenotype and no hit in a BLASTP search against the genomes of C. briggsae, Caenorhabditis remanei, Anopheles gambiae, D. melanogaster, Rattus norvegicus, Homo sapiens, C. elegans, and Saccharomyces cerevisiae (WormBase, http://www.wormbase.org/). In our experiment, the 553-bp C01B7.3 promoter did not drive any detectable expression. Therefore, C01B7.3 is likely to be a falsely annotated gene. Of the remaining seven genes, six are expressed in muscle, whereas the minimal promoter region of C10G11.7 drives reporter gene expression exclusively in neurons (Fig. 3A-L). It is known that muscle genes and neuronal genes share some regulatory elements (Wasserman and Fickett 1998
Third, we tested the functionality of modules located further upstream by deletion analysis. The first two predicted modules in K10G6.3 are clustered at 378 to 847. A DNA fragment containing this region drives gfp expression mainly in neurons and occasionally in the pharyngeal muscles (Fig. 3N). Deletion of this region results in complete loss of gfp expression. The first predicted module in F27D4.2 is located at 491 to 1041. A DNA fragment including the predicted module drives reporter gene expression in the pharyngeal muscle (Fig. 3C), body wall muscle (Fig. 3D), and intestine, whereas deletion of the predicted module from the DNA results in loss of gfp expression.
Fourth, we tested the enhancer activity of a predicted module. W06H8.6 is a gene of unknown function and unknown expression pattern with an upstream sequence of >7 kb. Six modules were predicted in the 2-kb W06H8.6 promoter sequence. The first is located at 256 to 591, and, as reported above, the first 675 bp upstream of the ATG drives reporter gene expression in body wall muscle (Fig. 3J), vulval muscle (Fig. 3K), and pharyngeal muscle (Fig. 3L). Another three are located between 764 and 1183, with intermodule distances of around 40 bp. We tested the functionality of this cluster of modules by introducing the DNA fragment upstream of a minimal pes-10 promoter (Fire et al. 1990
In summary, we tested the functionality of 12 predicted modules. Ten of them drive gene expression in muscle tissues, and one is involved in gene expression in neuronal cells. The remaining one showed no expression and may not even correspond to a true gene. This gives a positive predictive value of 83% (10/12), or 92% (11/12) if we count the neuronal regulatory module as positive. Generally, many similar experiments are needed to dissect long promoter sequences and identify the functional sequences of a single gene, and several of the genes we tested have very long upstream sequences. For example, the upstream sequences of F45D3.2, W06H8.6, and F27D4.2 are 9, 7, and 11 kb, respectively. These results demonstrate that our method is able both to predict unknown genes that are expressed in muscle cells and to narrow the important functional domains, which contain the essential modules, down to much smaller regions.
The accurate identification of regulatory modules within a genomic sequence would be very useful for the study of gene regulation. However, identifying modules experimentally is a time-consuming and labor-intensive process. We developed a computational approach to predict muscle-specific cis-regulatory modules in C. elegans and performed experimental evaluations of their accuracy. Analysis of the in vivo activity of 12 predicted modules, of which 10 showed the predicted activity, demonstrates the utility of our approach.
We chose muscle genes for this study because muscle has been a fertile ground for molecular genetics studies with C. elegans for three decades. Most of the work focused on the organization, structure, and function of muscle fibers and muscle cells (Moerman and Fire 1997
Although this study focused on modules for muscle expression, we did not use any muscle-specific characteristics, and we expect that our method would work equally well for other tissue-specific expression patterns. The approach is quite simple and requires very little prior information; in particular, it requires no initial information about motifs. The input is merely a set of C. elegans genes known to share a particular expression pattern and their orthologs in another Caenorhabditis genome, from which the program PhyloCon can identify significant motifs. We then used the promoters of non-muscle genes to determine which motifs were muscle specific and which were general. The motifs were then combined into predicted modules based on the characteristics of a few well-characterized modules found in human, mouse, rat, fly, and sea urchin (Arnone and Davidson 1997
While these results demonstrate the utility of our approach, we are still far from having a precise and completely accurate predictor of muscle expression patterns. Two of the 12 predicted modules we tested were not correct. From the ROC curve (Fig. 2) it can be seen that high-scoring promoters are highly enriched in muscle-specific genes, but there are a few non-muscle genes that also have high scores, and there are several muscle-specific genes with only low-scoring modules that do not distinguish well from non-muscle genes. Furthermore, we have only attempted to predict muscle expression in general, rather than for specific classes of muscles. Among the tested modules we see several distinct patterns that include specific subsets of muscles from the head, body wall, vulva, and pharynx, as well as some that also cause expression in subsets of neurons. More work is needed before we can fully model more specific expression patterns. For example, in this study, we have not considered possible modules occurring within introns or downstream of the genes, even though we know of such examples (Jantsch-Plunger and Fire 1994
Identification of C. elegans muscle genes and orthologs in C. briggsae
In this study, we define muscle-specific genes as those that are expressed only in muscle tissue or in at most two other tissues. We identified a total of 122 C. elegans muscle-specific genes by searching WormBase (Chen et al. 2005
Identification of putative regulatory motifs and elimination of redundant motifs
Calculation of over-representation index
We adopted the concept of over-representation of a particular pattern in one group of sequences with regard to another group of sequences from Bajic et al. (2004)
Searching for cis-regulatory modules
Calculation of module score and promoter score
Genome-wide searches
Construction of plasmids and GFP expression analysis
To test the enhancer activity of more distant predicted modules, PCR products were cloned into pPD107.94 (
We thank Ting Wang for assistance with the PhyloCon program and helpful discussions. We also thank Michael L. Nonet, Andrew Fire, and Susan E. Mango for providing reagents used in this work, and Dr. Frank E. Harrell Jr. for help with the statistical analysis of the predictions. This work was supported by NIH grant HG00249; G.Z. was supported by NIH institutional training grant 5 T32 HG000045-08 and by National Research Service Award (NRSA) 1 F32 GM73444-01 from the National Institute of General Medical Sciences.
1 Corresponding author.
E-mail stormo{at}genetics.wustl.edu; fax (314) 362-7855. [Supplemental material is available online at www.genome.org.] Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.5989907
References
Anyanful, A., Sakube, Y., Takuwa, K., and Kagawa, H. 2001. The third and fourth tropomyosin isoforms of Caenorhabditis elegans are expressed in the pharynx and intestines and are essential for development and morphology. J. Mol. Biol. 313: 525-537.
Ao, W., Gaudet, J., Kent, W.J., Muttumu, S., and Mango, S.E. 2004. Environmentally induced foregut remodeling by PHA-4/FoxA and DAF-12/NHR. Science 305: 1743-1746.
Arnone, M.I. and Davidson, E.H. 1997. The hardwiring of development: Organization and function of genomic regulatory systems. Development 124: 1851-1864.
Bajic, V.B., Choudhary, V., and Hock, C.K. 2004. Content analysis of the core promoter region of human genes. In Silico Biol. 4: 109-125.
Berman, B.P., Nibu, Y., Pfeiffer, B.D., Tomancak, P., Celniker, S.E., Levine, M., Rubin, G.M., and Eisen, M.B. 2002. Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome. Proc. Natl. Acad. Sci. 99: 757-762.
Buhler, J. and Tompa, M. 2002. Finding motifs using random projections. J. Comput. Biol. 9: 225-242.
Chen, L., Krause, M., Sepanski, M., and Fire, A. 1994. The Caenorhabditis elegans MYOD homologue HLH-1 is essential for proper muscle function and complete morphogenesis. Development 120: 1631-1641.
Chen, N., Harris, T.W., Antoshechkin, I., Bastiani, C., Bieri, T., Blasiar, D., Bradnam, K., Canaran, P., Chan, J., and Chen, C.K., et al. 2005. WormBase: A comprehensive data resource for Caenorhabditis biology and genomics. Nucleic Acids Res. 33: D383-D389.
Cho, J.H., Eom, S.H., and Ahnn, J. 1999. Analysis of calsequestrin gene expression using green fluorescent protein in Caenorhabditis elegans. Mol. Cells 9: 230-234.
Clark, M.P., Chow, C.W., Rinaldo, J.E., and Chalkley, R. 1998. Multiple domains for initiator binding proteins TFII-I and YY-1 are present in the initiator and upstream regions of the rat XDH/XO TATA-less promoter. Nucleic Acids Res. 26: 2813-2820.
Culetto, E., Combes, D., Fedon, Y., Roig, A., Toutant, J.P., and Arpagaus, M. 1999. Structure and promoter activity of the 5' flanking region of ace-1, the gene encoding acetylcholinesterase of class A in Caenorhabditis elegans. J. Mol. Biol. 290: 951-966.
Dupuy, D., Li, Q.R., Deplancke, B., Boxem, M., Hao, T., Lamesch, P., Sequerra, R., Bosak, S., Doucette-Stamm, L., and Hope, I.A., et al. 2004. A first version of the Caenorhabditis elegans Promoterome. Genome Res. 14: 2169-2175.
Fire, A., Harrison, S.W., and Dixon, D. 1990. A modular set of lacZ fusion vectors for studying gene expression in Caenorhabditis elegans. Gene 93: 189-198.
Gilleard, J.S., Shafi, Y., Barry, J.D., and McGhee, J.D. 1999. ELT-3: A Caenorhabditis elegans GATA factor expressed in the embryonic epidermis during morphogenesis. Dev. Biol. 208: 265-280.
Gower, N.J., Temple, G.R., Schein, J.E., Marra, M., Walker, D.S., and Baylis, H.A. 2001. Dissection of the promoter region of the inositol 1,4,5-trisphosphate receptor gene, itr-1, in C. elegans: A molecular basis for cell-specific expression of IP3R isoforms. J. Mol. Biol. 306: 145-157.
Gribskov, M. and Robinson, N.L. 1996. The use of receiver operating characteristic (ROC) analysis to evaluate sequence matching. Comput. Chem. 20: 25-34.
GuhaThakurta, D., Schriefer, L.A., Hresko, M.C., Waterston, R.H., and Stormo, G.D. 2002. Identifying muscle regulatory elements and genes in the nematode Caenorhabditis elegans. Pac. Symp. Biocomput. 7: 425-436.
GuhaThakurta, D., Schriefer, L.A., Waterston, R.H., and Stormo, G.D. 2004. Novel transcription regulatory elements in Caenorhabditis elegans muscle genes. Genome Res. 14: 2457-2468.
Harfe, B.D. and Fire, A. 1998. Muscle and nerve-specific regulation of a novel NK-2 class homeodomain factor in Caenorhabditis elegans. Development 125: 421-429.
Hertz, G.Z. and Stormo, G.D. 1999. Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 15: 563-577.
Hwang, S.B. and Lee, J. 2003. Neuron cell type-specific SNAP-25 expression driven by multiple regulatory elements in the nematode Caenorhabditis elegans. J. Mol. Biol. 333: 237-247.
Jantsch-Plunger, V. and Fire, A. 1994. Combinatorial structure of a body muscle-specific transcriptional enhancer in Caenorhabditis elegans. J. Biol. Chem. 269: 27021-27028.
Kagawa, H., Sugimoto, K., Matsumoto, H., Inoue, T., Imadzu, H., Takuwa, K., and Sakube, Y. 1995. Genome structure, mapping and expression of the tropomyosin gene tmy-1 of Caenorhabditis elegans. J. Mol. Biol. 251: 603-613.
Kamachi, Y., Uchikawa, M., and Kondoh, H. 2000. Pairing SOX off: With partners in the regulation of embryonic development. Trends Genet. 16: 182-187.
Kirchhamer, C.V., Yuh, C.H., and Davidson, E.H. 1996. Modular cis-regulatory organization of developmentally expressed genes: Two genes transcribed territorially in the sea urchin embryo, and additional examples. Proc. Natl. Acad. Sci. 93: 9322-9328.
Kostas, S.A. and Fire, A. 2002. The T-box factor MLS-1 acts as a molecular switch during specification of nonstriated muscle in C. elegans. Genes & Dev. 16: 257-269.
Krause, M., Harrison, S.W., Xu, S.Q., Chen, L., and Fire, A. 1994. Elements regulating cell- and stage-specific expression of the C. elegans MyoD family homolog hlh-1. Dev. Biol. 166: 133-148.
Landmann, F., Quintin, S., and Labouesse, M. 2004. Multiple regulatory elements with spatially and temporally distinct activities control the expression of the epithelial differentiation gene lin-26 in C. elegans. Dev. Biol. 265: 478-490.
Li, R., Pei, H., and Watson, D.K. 2000. Regulation of Ets function by protein-protein interactions. Oncogene 19: 6514-6523.
MacIsaac, K.D., Wang, T., Gordon, D.B., Gifford, D.K., Stormo, G.D., and Fraenkel, E. 2006. An improved map of conserved regulatory sites for Saccharomyces cerevisiae. BMC Bioinformatics 7: 113.
Markstein, M., Markstein, P., Markstein, V., and Levine, M.S. 2002. Genome-wide analysis of clustered Dorsal binding sites identifies putative target genes in the Drosophila embryo. Proc. Natl. Acad. Sci. 99: 763-768.
Matys, V., Fricke, E., Geffers, R., Gossling, E., Haubrock, M., Hehl, R., Hornischer, K., Karas, D., Kel, A.E., and Kel-Margoulis, O.V., et al. 2003. TRANSFAC: Transcriptional regulation, from patterns to profiles. Nucleic Acids Res. 31: 374-378.
Mello, C.C., Kramer, J.M., Stinchcomb, D., and Ambros, V. 1991. Efficient gene transfer in C. elegans: Extrachromosomal maintenance and integration of transforming sequences. EMBO J. 10: 3959-3970.
Moerman, D.G. and Fire, A. 1997. Muscle: Structure, function, and development. In C. elegans II (eds. D.L. Riddle et al.), pp. 147-184. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
Moerman, D.G. and Williams, B.D. 2006. Sarcomere assembly in C. elegans muscle. In WormBook (ed. The C. elegans Research Community). WormBook.
Okkema, P.G. and Fire, A. 1994. The Caenorhabditis elegans NK-2 class homeoprotein CEH-22 is involved in combinatorial activation of gene expression in pharyngeal muscle. Development 120: 2175-2186.
Okkema, P.G., Harrison, S.W., Plunger, V., Aryana, A., and Fire, A. 1993. Sequence requirements for myosin gene expression and regulation in Caenorhabditis elegans. Genetics 135: 385-404.
Polly, P., Haddadi, L.M., Issa, L.L., Subramaniam, N., Palmer, S.J., Tay, E.S., and Hardeman, E.C. 2003. hMusTRD1alpha1 represses MEF2 activation of the troponin I slow enhancer. J. Biol. Chem. 278: 36603-36610.
Reece-Hoyes, J.S., Deplancke, B., Shingles, J., Grove, C.A., Hope, I.A., and Walhout, A.J. 2005. A compendium of Caenorhabditis elegans regulatory transcription factors: A resource for mapping transcription regulatory networks. Genome Biol. 6: R110.
Remenyi, A., Scholer, H.R., and Wilmanns, M. 2004. Combinatorial control of gene expression. Nat. Struct. Mol. Biol. 11: 812-815.
Roy, P.J., Stuart, J.M., Lund, J., and Kim, S.K. 2002. Chromosomal clustering of muscle-expressed genes in Caenorhabditis elegans. Nature 418: 975-979.
Smith, P.A. and Mango, S.E. 2007. Role of T-box gene tbx-2 for anterior foregut muscle development in C. elegans. Dev. Biol. 302: 25-39.
Spieth, J., MacMorris, M., Broverman, S., Greenspoon, S., and Blumenthal, T. 1988. Regulated expression of a vitellogenin fusion gene in transgenic nematodes. Dev. Biol. 130: 285-293.
Staden, R. 1989. Methods for calculating the probabilities of finding patterns in sequences. Comput. Appl. Biosci. 5: 89-96.
Stormo, G.D. 2000. DNA binding sites: Representation and discovery. Bioinformatics 16: 16-23.
Teng, Y., Girard, L., Ferreira, H.B., Sternberg, P.W., and Emmons, S.W. 2004. Dissection of cis-regulatory elements in the C. elegans Hox gene egl-5 promoter. Dev. Biol. 276: 476-492.
Tsalik, E.L., Niacaris, T., Wenick, A.S., Pau, K., Avery, L., and Hobert, O. 2003. LIM homeobox gene-dependent expression of biogenic amine receptors in restricted regions of the C. elegans nervous system. Dev. Biol. 263: 81-102.
Vilimas, T., Abraham, A., and Okkema, P.G. 2004. An early pharyngeal muscle enhancer from the Caenorhabditis elegans ceh-22 gene is targeted by the Forkhead factor PHA-4. Dev. Biol. 266: 388-398.
Wagner, A. 1999. Genes regulated cooperatively by one or more transcription factors and their identification in whole eukaryotic genomes. Bioinformatics 15: 776-784.
Wang, X. and Chamberlin, H.M. 2004. Evolutionary innovation of the excretory system in Caenorhabditis elegans. Nat. Genet. 36: 231-232.
Wang, T. and Stormo, G.D. 2003. Combining phylogenetic data with co-regulated genes to identify regulatory motifs. Bioinformatics 19: 2369-2380.
Wang, T. and Stormo, G.D. 2005. Identifying the conserved network of cis-regulatory sites of a eukaryotic genome. Proc. Natl. Acad. Sci. 102: 17400-17405.
Wasserman, W.W. and Fickett, J.W. 1998. Identification of regulatory regions which confer muscle-specific gene expression. J. Mol. Biol. 278: 167-181.
Zhao, Z., Fang, L., Chen, N., Johnsen, R.C., Stein, L., and Baillie, D.L. 2005. Distinct regulatory elements mediate similar expression patterns in the excretory cell of Caenorhabditis elegans. J. Biol. Chem. 280: 38787-38794.
Received September 25, 2006; accepted in revised form December 12, 2006.