Published online before print
January 8, 2007, 10.1101/gr.5972507 Genome Res. 17:201-211, 2007 ©2007 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/07 $5.00 OPEN ACCESS ARTICLE
Methods
Predicting tissue-specific enhancers in the human genome
1 Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA; 2 U.S. Department of Energy, Joint Genome Institute, Walnut Creek, California 94598, USA; 3 Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, California 94550, USA; 4 Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA; 5 Computation Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, USA
Determining how transcriptional regulatory signals are encoded in vertebrate genomes is essential for understanding the origins of multicellular complexity; yet the genetic code of vertebrate gene regulation remains poorly understood. In an attempt to elucidate this code, we combined genome-wide gene-expression profiling, vertebrate genome comparisons, and transcription factor binding-site analysis to define sequence signatures characteristic of candidate tissue-specific enhancers in the human genome. We applied this strategy to microarray-based gene expression profiles from 79 human tissues and identified 7187 candidate enhancers that defined their flanking gene expression, the majority of which were located outside of known promoters. We cross-validated this method for its ability to predict tissue-specific gene expression de novo and confirmed its reliability in 57 of the 79 available human tissues, with an average precision in enhancer recognition ranging from 32% to 63% and a sensitivity of 47%. We used the sequence signatures identified by this approach to assign tissue-specific predictions to 328,000 human-mouse conserved noncoding elements in the human genome. By overlapping these genome-wide predictions with a data set of enhancers validated in vivo in transgenic mice, we were able to confirm our results with a 28% sensitivity and 50% precision. These results indicate the power of combining complementary genomic data sets as an initial computational foray into a global view of tissue-specific gene regulation in vertebrates.
Increasing lines of evidence support the notion that the majority of functional elements in the human genome do not code for proteins (Waterston et al. 2002
Predicting candidate regulatory elements for tissue-specific genes
As a first step toward directly relating gene expression to comparative sequence data, we clustered overlapping gene transcripts in the human genome and identified 18,504 unique protein-coding loci (the boundaries of each locus were defined by the neighboring genes, independent of the absolute size of the locus; see Methods). We next assigned transcriptional information obtained from the GNF Atlas2 gene expression database (gnfAtlas2) (Su et al. 2002
We initially observed a strong correlation between the tissue specificity of a gene and the size of the locus, such that loci of highly expressed genes in the central nervous system (CNS) were, on average, significantly larger than the global median locus length. In contrast, loci corresponding to highly expressed genes in the immune system or various tumor tissues were significantly shorter (Supplemental Fig. S1). For example, the median locus length of a human gene highly expressed in fetal brain was 245 kb, while genes highly expressed in testis were on average 3.6 times shorter (68 kb) (Supplemental Fig. S1). We also found that 10% of the brain and CNS loci coincided with vast noncoding regions termed gene deserts (Nobrega et al. 2003
Recent studies suggest that the most highly conserved noncoding ECRs within a locus commonly possess gene regulatory function (Nobrega et al. 2003
To explore the sequence motifs of these noncoding ECRs linked to genes displaying high versus low expression in the same tissue, we used a previously described motif-identification strategy (Loots and Ovcharenko 2004
Determining sequence signatures of candidate tissue-specific enhancers
The EI scoring optimization allowed us to maximize our resolving power to the point where 60% (±5%) of genes highly expressed in a tissue group contain signatures that are present in <15% of the low expressed genes for any given tissue (Fig. 2B). For example, EI identified at least one fetal lung candidate enhancer for 65% of genes with high fetal lung expression, while no such candidates were identified in the non-intergenic regions (promoter, UTR, or intronic) of >86% of genes with low fetal lung expression (intergenic regions were excluded from the negative control group to prevent potential associations with the regulation of neighboring genes [see Methods]). Of the original 24 k candidate regulatory elements linked to genes highly expressed in one or more of the 79 available tissues, EI optimization identified 7187 candidate enhancers with signatures that define tissue-specific expression. The database that summarizes these candidate tissue-specific enhancers is available at http://www.dcode.org/EI. Through this consolidation of the data set we found that 47% of human noncoding ECRs defined as candidate enhancers were predictive of expression in more than one tissue, consistent with our finding that 66% of the human genes in this study are highly expressed in multiple tissues. Since these candidate enhancers were mainly assigned to different tissues that are functionally related (e.g., CD4 and CD8 T-cells) (Supplemental Table S6), it is possible that the transcriptional regulation of genes expressed in similar tissues could be achieved through shared gene-regulatory mechanisms. These findings are consistent with in vivo expression data derived from enhancer scans in transgenic mice, indicating that one-third of embryonic enhancers active during a single time-point in development drive expression in more than one tissue type (Nobrega et al. 2003
Since the EI method is based on the weighting of multiple TFs for their association with tissue-specific expression, we sought to further explore the nature of this combinatorial TF-scoring scheme. We found that in no case was a single TF sufficient to predict tissue-specific gene expression, supporting the notion that tissue-specific gene regulation is a direct result of interplay among multiple TFs. To quantify the impact of an individual ith TF on predicting gene expression in a particular tissue t, we calculated the TF importance parameter (I), defined as the product of the TF occurrence (percentage of tissue-specific candidate enhancers with a particular conserved TFBS) and its weight in a tissue-specific group of candidate enhancers (Supplemental Table S2). Since TF importance compounds the effects of TF occurrence and weight, it presents an integrative measure of the TF's role in generating high positive scores of tissue-specific candidate regulatory elements. At the same time, it minimizes the impact of TFs that are rare or have small weights and thus do not contribute significantly to establishing either a positive or a negative tissue-specificity score. This quantification allowed for the identification of cohorts of TFs in candidate enhancers potentially involved in tissue-specific regulatory networks, i.e., those TFs with both high weights and high occurrences (see Supplemental Materials). As an example of a high TF impact on tissue-specific regulation, the photoreceptor-specific CRX TF has the highest importance parameter value in eye development (Supplemental Table S2), consistent with the known function of this regulatory protein in Cone-Rod Dystrophy (CRD), an inherited progressive disease that causes deterioration of the cone and rod photoreceptor cells and leads to blindness (Itabashi et al. 2004
To illustrate this method's ability to predict functional enhancers, we examined two well-characterized enhancers, one for skeletal muscle and one for liver, flanking the human cardiac/slow skeletal muscle troponin C (TNNC1) and the apolipoprotein B (APOB) genes, respectively (Fig. 3). An EI scan of the TNNC1 locus first identified four noncoding ECRs (of 12 total) as candidate regulatory elements (two intergenic, one intronic, and one promoter element). Subsequent EI optimization then correctly predicted the noncoding ECR in intron 1 as a skeletal muscle enhancer in precise agreement with the previously defined TNNC1 skeletal muscle enhancer (Christensen et al. 1993
To explore the possibility of synergistic TF linkage that may be biologically required for directing tissue-specific gene expression, we extracted the top 10 scoring TFs for each tissue based on their importance in predicting tissue-specific expression. As an example, we focused on the TF characteristics of two similar tissue types: heart and skeletal muscle (Fig. 4A) (a complete list of the top TF for each tissue is provided in Supplemental Table S2). We observed that five of the top 10 TF predictions for both these muscle types are shared, four of which (MEF2, SRF, myogenin, and ESRRA) are strongly linked to transcriptional regulation in muscle tissue and associated with various human cardiac myopathies (Sakuma et al. 2003
To globally address the power of the predicted TF-tissue associations in addition to the support gained from the above selected examples, we mapped TFs to the human genome and determined the tissue gnfAtlas2 expression profile for each TF gene. Our rationale was that if tissue-specific gene expression predictiveness is based on TFBS density in candidate enhancer sequences, then the TF required for this function should be expressed in the tissue of activity. Thus, we attempted to correlate positive TF importance with the level of TF gene expression in the available 79 human tissues. This was accomplished by increasing the minimal TF importance threshold from -0.25 to +0.25 (thus gradually increasing the ratio of TFs with positive importance values in the group) to determine whether TF expression and enhancer predictiveness were positively correlated (Fig. 4B). Indeed, we observed that 60% of predicted positive TF-tissue associations corresponding to TF importance thresholds of 0.1 were supported by an increased level of gene expression in the associated tissue (Fig. 4B). One possible explanation for the lack of total concordance between the predicted TF-tissue associations and tissue specificity is the ubiquitous nature of TF gene expression, which often leads to ambiguous definitions of tissue specificity with increased and decreased levels of gene expression in gnfAtlas2. Manual curation of these interactions revealed that 90% (142/158) of predicted TF-tissue associations with a 0.25 TF importance threshold are supported by published literature or alternative sources of experimental evidence (see Supplemental materials; Table S3).
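The importance parameter and the threshold sweep described above can be illustrated with a minimal Python sketch. It assumes that occurrence is expressed as a fraction between 0 and 1 and that a TF counts as "supported" when its gene is listed as expressed in the tissue; the TF names, values, and the `supported_fraction` helper are hypothetical illustrations, not the data or code used in this study.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class TFStat:
    """Per-tissue statistics for one transcription factor (toy values only)."""
    name: str
    occurrence: float  # assumed: fraction (0..1) of candidate enhancers with a conserved TFBS
    weight: float      # optimized tissue-specific weight; may be negative


def importance(tf: TFStat) -> float:
    """TF importance as described in the text: occurrence multiplied by weight."""
    return tf.occurrence * tf.weight


def supported_fraction(stats: Iterable[TFStat],
                       expressed: Set[str],
                       threshold: float) -> Optional[float]:
    """Fraction of TFs with importance >= threshold whose gene is in `expressed`.

    Returns None when no TF passes the threshold.
    """
    selected = [tf for tf in stats if importance(tf) >= threshold]
    if not selected:
        return None
    hits = sum(1 for tf in selected if tf.name in expressed)
    return hits / len(selected)


# Hypothetical TFs for a single tissue; names and numbers are invented.
toy_stats = [
    TFStat("TF_A", 0.5, 0.4),    # importance 0.2
    TFStat("TF_B", 0.25, 0.5),   # importance 0.125
    TFStat("TF_C", 0.5, -0.4),   # importance -0.2
    TFStat("TF_D", 0.1, 0.1),    # importance about 0.01
]
toy_expressed = {"TF_A", "TF_D"}  # TFs assumed to be expressed in the tissue

# Sweep the minimal importance threshold over the range used in the text.
for threshold in (-0.25, -0.1, 0.0, 0.1, 0.25):
    print(threshold, supported_fraction(toy_stats, toy_expressed, threshold))
```

Because importance multiplies occurrence by weight, a rare TF or a TF with a small weight ends up near zero and drops out as the threshold rises, which is the behavior the importance parameter is meant to capture.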
Since any parametric optimization approach could potentially introduce "overfitting" (the identification of random profiles that separate genes with high versus low expression purely by chance), we attempted to cross-validate our results. This was accomplished by characterizing the ability of the EI method to annotate tissue-specific enhancers in loci of highly expressed genes without any a priori knowledge of tissue specificity of gene expression (i.e., these genes were excluded from the training set; see Methods). This approach allowed us to quantify both the method's precision (defined as the proportion of predicted elements that act as tissue-specific enhancers) and sensitivity for each tissue (Fig. 2). Through this analysis, we observed a high variability in EI precision across the 79 sampled human tissues, and hence, these tissues were classified into three quality groups (Fig. 2A): (1) poor (lower-bound precision, P
Assigning tissue-specific predictions to conserved noncoding sequences in the human genome
Experimental validation of tissue-specific enhancer predictions
To expand these data beyond the limited published in vivo data for distant-acting enhancer elements, we next performed a large-scale analysis of our whole-genome predictions against a publicly available data set of 106 elements that have been shown to act as tissue-specific enhancers in the mouse at embryonic day 11.5 of development (E11.5) (data available at http://enhancer.lbl.gov) (Pennacchio et al. 2006
To further explore the relationship between the 20 concordant EI whole-genome predictions and the existing in vivo nervous system data set described above, we examined the distribution of the predictions within the 18 different brain tissues present in the gnfAtlas2 database. While we found that four or fewer of these in vivo-defined CNS enhancers were predicted to be expressed in each of the 17 adult brain tissues present in the expression annotation, 11 of them were annotated to the fetal brain category in gnfAtlas2 (the probability of this observation being random is <10^-7 [see Supplemental materials]). This high ratio of fetal-brain predictions is consistent with the entire in vivo expression data set, which corresponds to a single time point of enhancer analysis during embryonic development at E11.5. This suggests that the fetal brain-enhancer recognition profile of EI is a specific signature of in vivo embryonic brain enhancers, in contrast to enhancers active in specialized compartments of the adult brain. It is unclear, however, whether these enhancers are exclusively active during embryonic time points and not during adult stages. Additional in vivo data sets based on nonembryonic time points will further aid in assessing the ability of this approach to predict enhancer elements active in adult tissues.
Deciphering the genetic code of gene regulation in vertebrate genomes remains a significant challenge that has been partially aided by the availability of the human and other vertebrate genome sequences. However, while techniques such as comparative genomics can enrich for putative enhancer sequences based on evolutionary conservation, predicting their tissue specificity has been difficult. Nevertheless, several proof-of-principle studies have demonstrated that there is a vaguely defined, but computationally recognizable genetic code of gene regulatory elements corresponding to selected biological functions (Thompson et al. 2004
One of the inferences we can formulate based on the results of the EI method introduced here is the proportion of enhancer activity assigned to promoters versus more distant-acting sequences. This measurement was possible since the EI approach utilizes the three most highly conserved human-mouse elements neighboring the gene under investigation and thus goes beyond promoter-only exploration of cis-regulatory features, the dominant method currently used in regulatory genomics. Through the comparison of the EI signal strength in promoter versus nonpromoter conserved elements, we found that only 23% of EI candidate enhancers map to promoter regions of corresponding genes. While a caveat to this analysis is the incomplete status of precisely defined promoter boundaries, this result is consistent with ChIP-chip and in vivo enhancer studies, which also suggest that more than half of human genes potentially rely on distant mechanisms of gene regulation (Lettice et al. 2003
Since this method can be applied to the analysis of any set of coexpressed genes, this provides a rapid and efficient approach for translating gene-expression data into function-specific gene regulatory principles. Thereby, it should be straightforward to extend this method to other tissues, developmental time-points, or functional gene categories (such as Gene Ontology and KEGG data sets [Kanehisa and Goto 2000
It is likely that computational approaches that identify gene-regulatory elements and assign tissue specificity to enhancer function will greatly improve over time. Current challenges include the varying quality and the limited number of tissues (and primarily adult origin) uniformly profiled in humans and mice by microarray analysis. Further difficulties arise from the small size of available in vivo spatial and temporal enhancer data to further serve as training sets, as well as our incomplete knowledge of TFs and their precise sequence-based binding specificities currently available in the TRANSFAC database (Wingender et al. 2000
In summary, the data presented here provide further support for the notion that sequence-based features in vertebrate cis-regulatory elements are computationally recognizable, similar to previous successes in the inference of coding, intron-exon, core promoter, and repetitive DNA sequence signatures. Even though our study is limited by the availability and reliability of position weight matrices (PWM) of known TFs, the methods introduced here present a universal framework for the de novo prediction of regulatory elements with shared biological function, as well as for defining novel interactions among transcription factors that can explain tissue-specific function of enhancer elements. Future computational efforts linked to topics such as human disease and vertebrate phenotypic diversity are likely to refine the predictive ability of our strategy and provide insights into gene regulatory mechanisms of unexplained biological phenomena.
Gene annotation and expression data integration
The UCSC Genome Browser (Kent et al. 2002
Identification of noncoding ECRs and candidate regulatory elements
Profiling putative TFBS in candidate gene regulatory elements
Assigning tissue specificity scores to candidate enhancers
EI optimization to define TF tissue-specific weights
L+) and L− (l ∈ L−) was maximized to perform the optimization of weights (the distribution of positively scoring candidate enhancers in L+ and L− was allowed to change dynamically following the change in TF weights). The ratio of the total number of candidate enhancers in L+ (NE+) to the total number of candidate enhancers in L− (NE−) was introduced to the scoring function to account for differences in the number of genes with high versus low gene expression and the number of corresponding candidate enhancers. The signal enrichment coefficient served to increase the negative impact of positively scoring noncoding ECRs in L−. This coefficient was set to 1 during the initial optimization step and then gradually increased to 10,000 to achieve the greatest separation between loci of highly and lowly expressed genes. Optimization was initialized with TF weights estimated using the density of putative TFBS in L+ and L− as
Cross-validation
) in the de novo recognition of tissue-specific enhancers (which measures the probability of a tissue-specific enhancer to be detected by EI) in cases where the corresponding gene does not belong to a specific group of highly expressed genes.
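As a minimal sketch of the two quantities used in cross-validation, the following Python snippet computes precision (the proportion of predicted elements that act as tissue-specific enhancers) and sensitivity (the proportion of tissue-specific enhancers that are detected) from two sets of element identifiers. The function name and the element identifiers are hypothetical, and this simple set-overlap calculation is an assumption for illustration; it does not reproduce the lower-bound precision estimate used in this study.

```python
from typing import Optional, Set, Tuple


def precision_and_sensitivity(
    predicted: Set[str],
    true_enhancers: Set[str],
) -> Tuple[Optional[float], Optional[float]]:
    """Return (precision, sensitivity) for one tissue.

    precision   = true positives / number of predicted elements
    sensitivity = true positives / number of true tissue-specific enhancers
    Either value is None when its denominator is zero.
    """
    true_positives = len(predicted & true_enhancers)
    precision = true_positives / len(predicted) if predicted else None
    sensitivity = true_positives / len(true_enhancers) if true_enhancers else None
    return precision, sensitivity


# Hypothetical element identifiers for a single tissue.
toy_predicted = {"ecr1", "ecr2", "ecr3", "ecr4"}
toy_true = {"ecr2", "ecr3", "ecr5", "ecr6", "ecr7"}

toy_precision, toy_sensitivity = precision_and_sensitivity(toy_predicted, toy_true)
print(toy_precision, toy_sensitivity)  # 2 of 4 predictions are true; 2 of 5 true enhancers are found
```

Reporting both values matters because a method can reach high precision by predicting very few elements, at the cost of low sensitivity.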
Mapping TFs to known transcripts
Permutation analysis to identify significant tissue-specific inter-TF interactions
Assigning tissue-specific enhancer predictions to a whole-genome data set of human-mouse noncoding ECRs
We thank Shyam Prabhakar and Alex Poliakov for providing Gumby enhancer predictions in the human genome. G.G.L. and I.O. were supported by an LLNL LDRD-04-ERD-052 grant; and I.O. was in part supported by an LLNL LDRD-06-ERD-004 grant. The work was performed under the auspices of the United States Department of Energy by the University of California, Lawrence Livermore National Laboratory Contract W-7405-Eng-48. L.A.P. was supported by the Grant HL066681, Berkeley-PGA, under the Programs for Genomic Application, funded by National Heart, Lung, & Blood Institute, and HG003988 funded by National Human Genome Research Institute and performed under Department of Energy Contract DE-AC02-05CH11231, University of California, E.O. Lawrence Berkeley National Laboratory.
6 Corresponding author.
E-mail ovcharenko1{at}llnl.gov; fax (925) 422-2099. [Supplemental material is available online at www.genome.org.] Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.5972507
Bajic, V.B., Tan, S.L., Suzuki, Y., and Sugano, S. 2004. Promoter prediction analysis on the whole human genome. Nat. Biotechnol. 22: 1467-1473.
Cartharius, K., Frech, K., Grote, K., Klocke, B., Haltmeier, M., Klingenhoff, A., Frisch, M., Bayerlein, M., and Werner, T. 2005. MatInspector and beyond: Promoter analysis based on transcription factor binding sites. Bioinformatics 21: 2933-2942.
Cawley, S., Bekiranov, S., Ng, H.H., Kapranov, P., Sekinger, E.A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J., and Williams, A.J., et al. 2004. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116: 499-509.
Chang, W.T., Chen, H.I., Chiou, R.J., Chen, C.Y., and Huang, A.M. 2005. A novel function of transcription factor alpha-Pal/NRF-1: Increasing neurite outgrowth. Biochem. Biophys. Res. Commun. 334: 199-206.
Cheng, W., Guo, L., Zhang, Z., Soo, H.M., Wen, C., Wu, W., and Peng, J. 2006. HNF factors form a network to regulate liver-enriched genes in zebrafish. Dev. Biol. 294: 482-496.
Christensen, T.H., Prentice, H., Gahlmann, R., and Kedes, L. 1993. Regulation of the human cardiac/slow-twitch troponin C gene by multiple, cooperative, cell-type-specific, and MyoD-responsive elements. Mol. Cell. Biol. 13: 6752-6765.
Das, D., Nahle, Z., and Zhang, M.Q. 2006. Adaptively inferring human transcriptional subnetworks. Mol. Syst. Biol. 2: 2006.0029.
de la Calle-Mustienes, E., Feijoo, C.G., Manzanares, M., Tena, J.J., Rodriguez-Seguel, E., Letizia, A., Allende, M.L., and Gomez-Skarmeta, J.L. 2005. A functional survey of the enhancer activity of conserved non-coding sequences from vertebrate Iroquois cluster gene deserts. Genome Res. 15: 1061-1072.
Dermitzakis, E.T., Reymond, A., and Antonarakis, S.E. 2005. Conserved non-genic sequences: An unexpected feature of mammalian genomes. Nat. Rev. Genet. 6: 151-157.
Frazer, K.A., Tao, H., Osoegawa, K., de Jong, P.J., Chen, X., Doherty, M.F., and Cox, D.R. 2004. Noncoding sequences conserved in a limited number of mammals in the SIM2 interval are frequently functional. Genome Res. 14: 367-372.
Hallikas, O., Palin, K., Sinjushina, N., Rautiainen, R., Partanen, J., Ukkonen, E., and Taipale, J. 2006. Genome-wide prediction of mammalian enhancers based on analysis of transcription-factor binding affinity. Cell 124: 47-59.
Harris, M.A., Clark, J., Ireland, A., Lomax, J., Ashburner, M., Foulger, R., Eilbeck, K., Lewis, S., Marshall, B., and Mungall, C., et al. 2004. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 32: D258-D261.
Huss, J.M., Torra, I.P., Staels, B., Giguere, V., and Kelly, D.P. 2004. Estrogen-related receptor
Ilia, M., Sugiyama, Y., and Price, J. 2003. Gender and age related expression of Oct-6, a POU III domain transcription factor, in the adult mouse brain. Neurosci. Lett. 344: 138-140.
Itabashi, T., Wada, Y., Sato, H., Kawamura, M., Shiono, T., and Tamai, M. 2004. Novel 615delC mutation in the CRX gene in a Japanese family with cone-rod dystrophy. Am. J. Ophthalmol. 138: 876-877.
Kadi, F., Johansson, F., Johansson, R., Sjostrom, M., and Henriksson, J. 2004. Effects of one bout of endurance exercise on the expression of myogenin in human quadriceps muscle. Histochem. Cell Biol. 121: 329-334.
Kanehisa, M. and Goto, S. 2000. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28: 27-30.
Karolchik, D., Baertsch, R., Diekhans, M., Furey, T.S., Hinrichs, A., Lu, Y.T., Roskin, K.M., Schwartz, M., Sugnet, C.W., and Thomas, D.J., et al. 2003. The UCSC Genome Browser Database. Nucleic Acids Res. 31: 51-54.
Kent, W.J., Sugnet, C.W., Furey, T.S., Roskin, K.M., Pringle, T.H., Zahler, A.M., and Haussler, D. 2002. The human genome browser at UCSC. Genome Res. 12: 996-1006.
Kim, T.H., Barrera, L.O., Zheng, M., Qu, C., Singer, M.A., Richmond, T.A., Wu, Y., Green, R.D., and Ren, B. 2005. A high-resolution map of active promoters in the human genome. Nature 436: 876-880.
Lettice, L.A., Heaney, S.J., Purdie, L.A., Li, L., de Beer, P., Oostra, B.A., Goode, D., Elgar, G., Hill, R.E., and de Graaff, E. 2003. A long-range Shh enhancer regulates expression in the developing limb and fin and is associated with preaxial polydactyly. Hum. Mol. Genet. 12: 1725-1735.
Levine, M. and Tjian, R. 2003. Transcription regulation and animal diversity. Nature 424: 147-151.
Lindblad-Toh, K., Wade, C.M., Mikkelsen, T.S., Karlsson, E.K., Jaffe, D.B., Kamal, M., Clamp, M., Chang, J.L., Kulbokas III, E.J., and Zody, M.C., et al. 2005. Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature 438: 803-819.
Loots, G.G. and Ovcharenko, I. 2004. rVISTA 2.0: Evolutionary analysis of transcription factor binding sites. Nucleic Acids Res. 32: W217-W221.
Loots, G.G., Locksley, R.M., Blankespoor, C.M., Wang, Z.E., Miller, W., Rubin, E.M., and Frazer, K.A. 2000. Identification of a coordinate regulator of interleukins 4, 13, and 5 by cross-species sequence comparisons. Science 288: 136-140.
Nobrega, M.A., Ovcharenko, I., Afzal, V., and Rubin, E.M. 2003. Scanning human gene deserts for long-range enhancers. Science 302: 413.
Novak, E.M., Dantas, K.C., Charbel, C.E., and Bydlowski, S.P. 1998. Association of hepatic nuclear factor-4 in the apolipoprotein B promoter: A preliminary report. Braz. J. Med. Biol. Res. 31: 1405-1408.
Okuda, T., Tagawa, K., Qi, M.L., Hoshio, M., Ueda, H., Kawano, H., Kanazawa, I., Muramatsu, M., and Okazawa, H. 2004. Oct-3/4 repression accelerates differentiation of neural progenitor cells in vitro and in vivo. Brain Res. Mol. Brain Res. 132: 18-30.
Ovcharenko, I., Nobrega, M.A., Loots, G.G., and Stubbs, L. 2004a. ECR Browser: A tool for visualizing and accessing data from comparisons of multiple vertebrate genomes. Nucleic Acids Res. 32: W280-W286.
Ovcharenko, I., Stubbs, L., and Loots, G.G. 2004b. Interpreting mammalian evolution using Fugu genome comparisons. Genomics 84: 890-895.
Ovcharenko, I., Loots, G.G., Giardine, B.M., Hou, M., Ma, J., Hardison, R.C., Stubbs, L., and Miller, W. 2005. Mulan: Multiple-sequence local alignment and visualization for studying function and evolution. Genome Res. 15: 184-194.
Parlakian, A., Charvet, C., Escoubet, B., Mericskay, M., Molkentin, J.D., Gary-Bobo, G., De Windt, L.J., Ludosky, M.A., Paulin, D., and Daegelen, D., et al. 2005. Temporally controlled onset of dilated cardiomyopathy through disruption of the SRF gene in adult heart. Circulation 112: 2930-2939.
Parmacek, M.S., Ip, H.S., Jung, F., Shen, T., Martin, J.F., Vora, A.J., Olson, E.N., and Leiden, J.M. 1994. A novel myogenic regulatory circuit controls slow/cardiac troponin C gene transcription in skeletal muscle. Mol. Cell. Biol. 14: 1870-1885.
Pennacchio, L.A., Ahituv, N., Moses, A.M., Prabhakar, S., Nobrega, M.A., Shoukry, M., Minovitsky, S., Dubchak, I., Holt, A., and Lewis, K.D., et al. 2006. In vivo enhancer analysis of human conserved non-coding sequences. Nature 444: 499-502.
Prabhakar, S., Poulin, F., Shoukry, M., Afzal, V., Rubin, E.M., Couronne, O., and Pennacchio, L.A. 2006. Close sequence comparisons are sufficient to identify human cis-regulatory elements. Genome Res. 16: 855-863.
Riccio, A., Alvania, R.S., Lonze, B.E., Ramanan, N., Kim, T., Huang, Y., Dawson, T.M., Snyder, S.H., and Ginty, D.D. 2006. A nitric oxide signaling pathway controls CREB-mediated gene expression in neurons. Mol. Cell 21: 283-294.
Sakuma, K., Nishikawa, J., Nakao, R., Nakano, H., Sano, M., and Yasuhara, M. 2003. Serum response factor plays an important role in the mechanically overloaded plantaris muscle of rats. Histochem. Cell Biol. 119: 149-160.
Shalizi, A.K. and Bonni, A. 2005. Brawn for Brains: The role of MEF2 proteins in the developing nervous system. Curr. Top. Dev. Biol. 69: 239-266.
Sharan, R., Ovcharenko, I., Ben-Hur, A., and Karp, R.M. 2003. CREME: A framework for identifying cis-regulatory modules in human-mouse conserved segments. Bioinformatics 19: i283-i291.
Sharan, R., Ben-Hur, A., Loots, G.G., and Ovcharenko, I. 2004. CREME: Cis-Regulatory Module Explorer for the human genome. Nucleic Acids Res. 32: W253-W256.
Shih, D.Q., Bussen, M., Sehayek, E., Ananthanarayanan, M., Shneider, B.L., Suchy, F.J., Shefer, S., Bollileni, J.S., Gonzalez, F.J., and Breslow, J.L., et al. 2001. Hepatocyte nuclear factor-1
Su, A.I., Cooke, M.P., Ching, K.A., Hakak, Y., Walker, J.R., Wiltshire, T., Orth, A.P., Vega, R.G., Sapinoso, L.M., and Moqrich, A., et al. 2002. Large-scale analysis of the human and mouse transcriptomes. Proc. Natl. Acad. Sci. 99: 4465-4470.
Sun, Q., Chen, G., Streb, J.W., Long, X., Yang, Y., Stoeckert Jr., C.J., and Miano, J.M. 2006. Defining the mammalian CArGome. Genome Res. 16: 197-207.
Thompson, W., Palumbo, M.J., Wasserman, W.W., Liu, J.S., and Lawrence, C.E. 2004. Decoding human regulatory circuits. Genome Res. 14: 1967-1974.
Uchikawa, M., Takemoto, T., Kamachi, Y., and Kondoh, H. 2004. Efficient identification of regulatory sequences in the chicken genome by a powerful combination of embryo electroporation and genome comparison. Mech. Dev. 121: 1145-1158.
Waterston, R.H., Lindblad-Toh, K., Birney, E., Rogers, J., Abril, J.F., Agarwal, P., Agarwala, R., Ainscough, R., Alexandersson, M., and An, P., et al. 2002. Initial sequencing and comparative analysis of the mouse genome. Nature 420: 520-562.
Wingender, E., Chen, X., Hehl, R., Karas, H., Liebich, I., Matys, V., Meinhardt, T., Pruss, M., Reuter, I., and Schacherer, F. 2000. TRANSFAC: An integrated system for gene expression regulation. Nucleic Acids Res. 28: 316-319.
Woolfe, A., Goodson, M., Goode, D.K., Snell, P., McEwen, G.K., Vavouri, T., Smith, S.F., North, P., Callaway, H., and Kelly, K., et al. 2005. Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol. 3: e7.