Published online before print
July 15, 2005, 10.1101/gr.3642605 Genome Res. 15:1051-1060, 2005 ©2005 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/05 $5.00
Letter

Evaluation of regulatory potential and conservation scores for detecting cis-regulatory modules in aligned mammalian genome sequences

1 Center for Comparative Genomics and Bioinformatics, Huck Institutes of Life Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
2 Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
3 Department of Computer Science and Engineering, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
4 Department of Statistics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
5 Department of Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
6 National Human Genome Research Institute, Rockville, Maryland 20852, USA
Techniques of comparative genomics are being used to identify candidate functional DNA sequences, and objective evaluations are needed to assess their effectiveness. Different analytical methods score distinctive features of whole-genome alignments among human, mouse, and rat to predict functional regions. We evaluated three of these methods for their ability to identify the positions of known regulatory regions in the well-studied HBB gene complex. Two methods, multispecies conserved sequences and phastCons, quantify levels of conservation to estimate a likelihood that aligned DNA sequences are under purifying selection. A third function, regulatory potential (RP), measures the similarity of patterns in the alignments to those in known regulatory regions. The methods can correctly identify 50%–60% of noncoding positions in the HBB gene complex as regulatory or nonregulatory, with RP performing better than the other methods. When evaluated by the ability to discriminate genomic intervals, RP reaches a sensitivity of 0.78 and a true discovery rate of 0.6. The performance is better on other reference sets; both phastCons and RP scores can capture almost all regulatory elements in those sets along with 7% of the human genome.
A major aim of genomics is to identify the functional segments of DNA (Collins et al. 2003
DNA segments needed to control the level, developmental timing, and spatial pattern of gene expression, termed cis-regulatory modules (CRMs), are even more difficult to identify accurately. Few constraints analogous to the rules of the genetic code, used for coding exons, can be universally applied to CRMs (Wasserman and Sandelin 2004
Interspecies comparisons can be used to infer function if aligned sequences are scored for the likelihood that they are under evolutionary constraint (purifying selection), which is often measured by an evolutionary rate slower than that observed in neutral DNA (Waterston et al. 2002
In addition, aligned sequences can be analyzed for features other than degree of constraint in order to discriminate between alignments in distinct functional classes. Elnitski et al. (2003
In this article, we derive a set of all the known regulatory elements in the intensively studied
Reference set of known regulatory elements from the HBB complex

DNA sequences needed to regulate the set of developmentally controlled, erythroid-specific genes encoding β-globin and its relatives have been studied intensively, and in this gene complex, the fraction of human sequences aligning with mouse and rat (35%) is very close to the genome average (Gibbs et al. 2004
One limitation to using interspecies conservation to predict CRMs is that some bona fide regulatory elements do not align between the species being examined. Of the 23 CRMs in the human HBB complex, 20 are conserved in mouse, 19 are conserved in rat, and only four are conserved in chicken (Table 1), based on BLASTZ pairwise alignments (Schwartz et al. 2003b

It is important to realize that knowledge of CRMs is still incomplete, even in a rigorously studied region such as the HBB complex. DNA intervals identified by comparative genomics methods but not in our reference set are considered false positives (FPs), but in reality, they could be regulatory elements not yet tested for function.
Calibration of discriminatory thresholds
Our goal is to find the threshold for each score that optimizes the ability to find the CRMs (high Sn) while minimizing the amount of other DNA that also passes the threshold (high Sp; see Methods). As expected, Sn decreases and Sp increases with increasing score thresholds for each method (Fig. 2, center panel). Optimal performance occurs at the crossover point between the Sn and Sp curves. The Sn for RP at this point is higher than that for phastCons or MCS (Table 2), but it is only
In this binary discrimination analysis, the optimal threshold for RP scores is -0.006 (Table 2). The fact that it is a negative number is initially surprising, because negative values mean that the patterns of the alignments are more like those in the negative training set (aligned ancestral repeats) than those in the known regulatory elements. However, it is important to realize that in this binary discrimination analysis, the methods are evaluated by how much of all the regulatory regions are found.
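The position-wise calibration described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes the standard definitions Sn = TP/(TP + FN) and Sp = TN/(TN + FP), treats a position as predicted when its score meets or exceeds the threshold, and picks the candidate threshold where Sn and Sp are closest. The function names and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def sn_sp(scores: Sequence[float], is_regulatory: Sequence[bool],
          threshold: float) -> Tuple[float, float]:
    """Position-wise sensitivity and specificity at one score threshold.

    Assumes Sn = TP/(TP+FN) and Sp = TN/(TN+FP); a position is called
    regulatory when its score meets or exceeds the threshold.
    """
    tp = fn = tn = fp = 0
    for score, regulatory in zip(scores, is_regulatory):
        predicted = score >= threshold
        if regulatory and predicted:
            tp += 1
        elif regulatory:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    return sn, sp


def crossover_threshold(scores: Sequence[float],
                        is_regulatory: Sequence[bool],
                        thresholds: Sequence[float]) -> float:
    """Return the candidate threshold at which Sn and Sp are closest."""
    def gap(threshold: float) -> float:
        sn, sp = sn_sp(scores, is_regulatory, threshold)
        return abs(sn - sp)
    return min(thresholds, key=gap)


# Toy data (hypothetical): eight noncoding positions, four of them regulatory.
toy_scores: List[float] = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels: List[bool] = [True, True, False, True, False, True, False, False]
toy_thresholds: List[float] = [0.0, 0.25, 0.35, 0.5, 0.85]

best = crossover_threshold(toy_scores, toy_labels, toy_thresholds)
print(best, sn_sp(toy_scores, toy_labels, best))
```

Sweeping a fixed grid of candidate thresholds keeps the sketch simple; a real analysis would more likely sweep every distinct score value.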
Another important feature to evaluate is whether any part of a regulatory element passes a given threshold. Thus, we conducted a second analysis, in which the regulatory regions are considered as intervals (not individual positions) and the relevant score is the maximum within the interval. The intervals containing nonregulatory regions are continuous runs of positions whose RP scores meet or exceed the threshold; thus their size and number vary with the threshold. They also were evaluated by the maximum score within the interval. We computed the fraction of regulatory region intervals that exceed a threshold score, called the interval Sn, or Snint, and the fraction of intervals exceeding a threshold that are regulatory regions, called the "true discovery rate." With this approach, an RP threshold of zero achieves a Snint of 0.78 and a true discovery rate of 0.6.

The RP scores were trained on a set of 93 known regulatory regions, which included four of the CRMs in the reference set from the HBB gene complex, namely, HS2 of the LCR and promoters for the HBE1, HBG2, and HBB genes. To remove bias introduced by this overlap in training and testing sets, we repeated the training of the RP model excluding the CRMs from the HBB gene complex. The threshold, Sn, and Sp of RP scores generated in this way are similar to the ones generated by including the CRMs from the HBB gene complex (Table 2).
Genome-wide evaluation of alignment scores for regulatory elements
The distribution of RP scores for positions in the human genome (including coding regions) that align with mouse and rat shows that 20% have RP scores above zero (Fig. 3A). RP scores can be computed only for the 35% of the human genome that aligns with mouse and rat (Gibbs et al. 2004). Thus, about 7% of the human genome (20% of the aligning 35%) has RP scores in the range that is effective in finding CRMs in the HBB complex. Almost all of the known regulatory regions and miRNAs from dispersed loci have positive RP scores (Fig. 3A). Thus, the RP threshold of zero should capture most of the functional regions whose genomic alignments have properties similar to those in these data sets.

The distribution of phastCons scores for the sequences in the human genome that align with mouse and rat is dramatically skewed toward low values (Fig. 3B). About 22% of the aligning human positions have phastCons scores that exceed 0.13, the threshold determined from CRMs in the HBB gene complex. PhastCons scores in all the sets of functional DNA examined exceed this threshold, with the scores for miRNAs and developmental enhancers being particularly high (Fig. 3B). These results show that phastCons is a strong discriminatory function genome-wide, with a considerable dynamic range between scores for the known functional elements and the bulk genome scores.

One of the intriguing results for both methods is that their diagnostic effectiveness for the HBB complex reference set is less than that observed for the other sets of regulatory elements. The cumulative distributions for both scores in the HBB complex CRMs are shifted considerably to the left of the scores for other sets of CRMs, coding sequences, and miRNAs (Fig. 3). This illustrates the difficulty of finding all the CRMs in this gene complex.
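The genome-wide fractions quoted in this section follow from simple arithmetic: the share of aligned positions that pass a threshold, scaled by the share of the genome that aligns at all (20% of 35% is 7%). The sketch below is illustrative only. The function names and the four-element toy score list are hypothetical, and "passing" is taken to mean strictly exceeding the threshold, as the text words it.

```python
from typing import Sequence


def fraction_above(scores: Sequence[float], threshold: float) -> float:
    """Fraction of aligned positions whose score strictly exceeds the threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(score > threshold for score in scores) / len(scores)


def genome_fraction(fraction_of_aligned: float, aligned_fraction: float) -> float:
    """Convert a fraction of aligned positions into a fraction of the whole genome."""
    return fraction_of_aligned * aligned_fraction


# Toy scores (hypothetical); two of the four exceed a threshold of zero.
print(fraction_above([-0.1, 0.0, 0.2, 0.3], 0.0))

# Values from the text: 20% of aligned positions have RP > 0, and 35% of the
# human genome aligns with mouse and rat.
print(round(genome_fraction(0.20, 0.35), 3))
```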
We organized the extensive experimental results on the DNA segments regulating expression of the HBB gene complex into a reference data set, and then used this data set to evaluate three different approaches for analyzing multispecies alignments to find CRMs. Two of the methods, MCSs and phastCons, are based exclusively on conservation, whereas a third method, RP, uses a pattern-matching discriminatory function within the conserved regions. All three methods had some success in detecting the CRMs in the reference set, with sensitivities and specificities ranging from 50% to 60% when evaluated on all nucleotide positions. The RP function performed better than did the conservation measures on the reference set of CRMs in the HBB gene complex. This is expected, given that high conservation should reflect the effects of purifying selection for any function, whereas the RP function was trained to find patterns in alignments similar to those in transcriptional regulatory regions. When the performance was evaluated on the maximum score in each interval, RP scores reached a Sn of 78% with a true discovery rate of 60%. Strikingly, both conservation-based scores and RP scores perform much better against other sets of CRMs.
The RP and phastCons scores are deposited in databases such as the Genome Browser (Kent et al. 2002
Although the prospects for application of the current measures to experimental investigation of gene regulation are promising, our study also illustrates some important limitations in using multispecies alignments to predict CRMs. First, RP scores fail to distinguish at least 20% of the conserved CRMs in the HBB complex, and the other methods have lower Sn. Fortunately, for other reference data sets, the performance is better, but it is important to realize that some conserved CRMs will be missed using current methods.
Second, some human CRMs have no reliable matches with mouse and rat sequence, as is the case for four of the 23 CRMs known in the human HBB complex. Obviously, CRMs such as these are invisible to predictive algorithms based on primate–rodent alignments, but they may be detectable over shorter phylogenetic distances using techniques such as phylogenetic shadowing (Boffelli et al. 2003

A third limitation to the use of comparative genomics approaches for finding potential cis-regulatory elements is the incomplete knowledge of protein-coding regions. All the methods examined here, including RP, give high scores to exons. Exons that have not been annotated will not be excluded from the analysis of "noncoding" regions, and thus they can contribute to FPs in the predictions.
The reasons for Sn differing among data sets are of considerable interest. Recent studies show that genes encoding proteins involved in developmental and transcriptional regulation tend to have highly constrained CRMs (Sandelin et al. 2004b
Improvements are expected in the predictive power of all the scores being computed on multispecies alignments. The discriminatory power of alignments increases as more sequences are added, both for a particular locus (Thomas et al. 2003
Reference sets of transcriptional regulatory regions

The β-globin gene (HBB) complex contains several regulatory regions that have been well studied experimentally. A set of 23 experimentally determined CRMs was compiled from a literature survey and mapped within a 95-kb interval (chr11:5185001–5280000 in hg16), which encompasses the HBB complex and terminates at the surrounding olfactory receptor genes (Bulger et al. 2000
Of the 23 CRMs in this reference set, 19 can be found in multiple alignments of the human, mouse, and rat sequences. However, only 18 were available for the evaluation of the scores computed on the multiple alignment of hg16, mm3, and rn3 (see below) because much of the sequence of hypersensitive site HS4 (Stamatoyannopoulos et al. 1995
A set of 40,000 predicted promoters was compiled by Trinklein et al. (2003
Alignments
Scores based on alignments
MCS
phastCons
Evaluation of alignment scores for detecting known CRMs
A separate analysis was used to evaluate the ability of each score to discriminate the regulatory intervals from nonregulatory DNA, based on the highest score in each interval. In this analysis, the average value for each of the three scores was computed in 100-bp windows (with a 1-bp slide) for all the aligned positions in the HBB complex (and genome-wide for phastCons and RP). Windows whose average score met or exceeded a given threshold comprised the predicted set for that threshold. Overlapping windows were combined to make a single contiguous interval that passed the threshold. Regulatory regions that overlapped an interval that passed the threshold were counted as TPs, and those that did not were FNs. The intervals that passed the threshold but did not overlap with a regulatory region were counted as FPs. Note that the size of each regulatory interval is determined experimentally as the region required for regulation. The sizes vary among CRMs but are not affected by the score threshold. The sizes also vary among the FP intervals, being determined by the scores of overlapping windows; in addition, the sizes can differ for each threshold. Defining a TN interval is difficult, and thus the evaluation was based on interval-based Sn (Snint = TP/[TP + FN]) and the true discovery rate (TP/[TP + FP]). The optimal threshold is approximately the crossover between Snint and the true discovery rate. This evaluation is similar to procedures used to evaluate gene prediction programs (Burset and Guigó 1996

A similar discrimination based on the maximum RP or phastCons score in each interval was performed for several sets of functional elements in the human genome, and cumulative distributions are shown in Figure 3. These were compared with the cumulative distributions of the phastCons scores in all aligned positions in the human genome ("bulk DNA") and of every fifth aligned position for RP scores.
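The window-based interval evaluation described in this section can be sketched as follows. This is a simplified illustration rather than the software used in the study: it assumes the aligned positions form one gap-free array of per-position scores, uses half-open (start, end) coordinates, and all function names are hypothetical. Window averages come from prefix sums, windows that meet or exceed the threshold are merged when they overlap, and Snint and the true discovery rate are computed exactly as defined in the text.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end), an assumed convention


def window_means(scores: Sequence[float], window: int) -> List[float]:
    """Mean score of every window of the given size, sliding by 1 position."""
    prefix = [0.0] + list(accumulate(scores))
    return [(prefix[i + window] - prefix[i]) / window
            for i in range(len(scores) - window + 1)]


def passing_intervals(scores: Sequence[float], threshold: float,
                      window: int = 100) -> List[Interval]:
    """Merge overlapping windows whose mean meets or exceeds the threshold."""
    intervals: List[Interval] = []
    for start, mean in enumerate(window_means(scores, window)):
        if mean < threshold:
            continue
        end = start + window
        if intervals and start < intervals[-1][1]:
            # Overlaps the previous passing window: extend that interval.
            intervals[-1] = (intervals[-1][0], end)
        else:
            intervals.append((start, end))
    return intervals


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two half-open intervals share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def interval_metrics(predicted: Sequence[Interval],
                     regulatory: Sequence[Interval]) -> Tuple[float, float]:
    """Return (Snint, true discovery rate) as defined in the text."""
    tp = sum(any(overlaps(r, p) for p in predicted) for r in regulatory)
    fn = len(regulatory) - tp
    fp = sum(not any(overlaps(p, r) for r in regulatory) for p in predicted)
    sn_int = tp / (tp + fn) if tp + fn else 0.0
    tdr = tp / (tp + fp) if tp + fp else 0.0
    return sn_int, tdr


# Toy example (hypothetical scores, window of 2 instead of 100 for brevity).
toy = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
predicted = passing_intervals(toy, threshold=1.0, window=2)
print(predicted, interval_metrics(predicted, [(3, 4), (6, 7)]))
```

Repeating `interval_metrics` over a range of thresholds and locating where Snint and the true discovery rate cross would mirror the threshold selection described above.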
Availability
The reference set of regulatory regions for the HBB gene complex is available at http://www.bx.psu.edu/~ross/dataset/DatasetHome.html, in both hg16 and hg17 coordinates. More information, along with references to the supporting literature, is recorded in dbERGE II (Elnitski et al. 2005

The operations for the evaluations reported here can be performed in a UNIX environment using command-line pipes and wrapper scripts for software that is available on request.
This work was supported by NIH grants DK65806 (R.H.), HG02238 (W.M.), and HG02325 (L.E.). We thank Adam Siepel and David Haussler for access to the phastCons scores and programs prior to publication.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.3642605. Article published online before print in July 2005.
7 Corresponding author.
Allan, M., Lanyon, G., and Paul, J. 1983. Multiple origins of transcription in the 4.5 kb upstream of the
Antoniou, M., deBoer, E., Habets, G., and Grosveld, F. 1988. The human
Behringer, R.R., Hammer, R.E., Brinster, R.L., Palmiter, R.D., and Townes, T.M. 1987. Two 3' sequences direct adult erythroid-specific expression of human
Bender, M., Reik, A., Close, J., Telling, A., Epner, E., Fiering, S., Hardison, R., and Groudine, M. 1998. Description and targeted deletion of 5' HS5 and 6 of the mouse
Berman, B.P., Pfeiffer, B.D., Laverty, T.R., Salzberg, S.L., Rubin, G.M., Eisen, M.B., and Celniker, S.E. 2004. Computational identification of developmental enhancers: Conservation and function of transcription factor binding-site clusters in Drosophila melanogaster and Drosophila pseudoobscura. Genome Biol. 5: R61.
Blanchette, M., Schwikowski, B., and Tompa, M. 2002. Algorithms for phylogenetic footprinting. J. Comput. Biol. 9: 211–223.
Blanchette, M., Kent, W.J., Riemer, C., Elnitski, L., Smit, A.F.A., Roskin, K.M., Baertsch, R., Rosenbloom, K., Clawson, H., Green, E.D., et al. 2004. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 14: 708–715.
Bodine, D. and Ley, T. 1987. An enhancer element lies 3' to the human A
Boffelli, D., McAuliffe, J., Ovcharenko, D., Lewis, K.D., Ovcharenko, I., Pachter, L., and Rubin, E.M. 2003. Phylogenetic shadowing of primate sequences to find functional regions of the human genome. Science 299: 1391–1394.
Brent, M.R. and Guigó, R. 2004. Recent advances in gene structure prediction. Curr. Opin. Struct. Biol. 14: 264–272.
Bulger, M. and Groudine, M. 1999. Looping versus linking: Toward a model for long-distance gene activation. Genes & Dev. 13: 2465–2477.
Bulger, M., Bender, M.A., von Doorninck, J.H., Wertman, B., Farrell, C., Felsenfeld, G., Groudine, M., and Hardison, R. 2000. Comparative structural and functional analysis of the olfactory receptor genes flanking the human and mouse
Bulger, M., Schubeler, D., Bender, M.A., Hamilton, J., Farrell, C.M., Hardison, R.C., and Groudine, M. 2003. A complex chromatin "landscape" revealed by patterns of nuclease sensitivity and histone modification within the mouse
Burset, M.R. and Guigó, R. 1996. Evaluation of gene structure prediction programs. Genomics 34: 353–367.
Cao, S.X., Gutman, P.D., Dave, H.P.G., and Schechter, A.N. 1989. Identification of a transcriptional silencer in the 5'-flanking region of the human
Cavallesco, R. and Tuan, D. 1997. Modulatory subdomains of the HS2 enhancer differentially regulate enhancer activity in erythroid cells at different developmental stages. Blood Cells Mol. Dis. 23: 8–26.
Chao, M.V., Mellon, P., Charnay, P., Maniatis, T., and Axel, R. 1983. The regulated expression of
Chiaromonte, F., Weber, R.J., Roskin, K.M., Diekhans, M., Kent, W.J., and Haussler, D. 2003. The share of human genomic DNA under selection estimated from human–mouse genomic alignments. In The genome of Homo sapiens, pp. 245–254. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
Collins, F.S., Green, E.D., Guttmacher, A.E., and Guyer, M.S. 2003. A vision for the future of genomics research. Nature 422: 835–847.
Cooper, G.M. and Sidow, A. 2003. Genomic regulatory regions: Insights from comparative sequence analysis. Curr. Opin. Genet. Dev. 13: 604–610.
Cooper, G.M., Brudno, M., Stone, E.A., Dubchak, I., Batzoglou, S., and Sidow, A. 2004. Characterization of evolutionary rates and constraints in three mammalian genomes. Genome Res. 14: 539–548.
Cowie, A. and Myers, R.M. 1988. DNA sequences involved in transcriptional regulation of the mouse
Dermitzakis, E.T. and Clark, A.G. 2002. Evolution of transcription factor binding sites in mammalian gene regulatory regions: Conservation and turnover. Mol. Biol. Evol. 19: 1114–1121.
Dhar, V., Nandi, A., Schildkraut, C.L., and Skoultchi, A.I. 1990. Erythroid-specific nuclease-hypersensitive sites flanking the human
Elnitski, L., Li, J., Noguchi, C.T., Miller, W., and Hardison, R. 2001. A negative cis-element regulates the level of enhancement of hypersensitive site 2 of the
Elnitski, L., Hardison, R.C., Li, J., Yang, S., Kolbe, D., Eswara, P., O'Connor, M.J., Schwartz, S., Miller, W., and Chiaromonte, F. 2003. Distinguishing regulatory DNA from neutral sites. Genome Res. 13: 64–72.
Elnitski, L., Giardine, B., Shah, P., Zhang, Y., Riemer, C., Weirauch, M., Burhans, R., Miller, W., and Hardison, R.C. 2005. Improvements to GALA and dbERGEII: Databases featuring genomic sequence alignment, annotation and experimental results. Nucleic Acids Res. 33: D466–D470.
The ENCODE Project Consortium. 2004. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science 306: 636–640.
Farrell, C.M., West, A.G., and Felsenfeld, G. 2002. Conserved CTCF insulator elements flank the mouse and human
Felsenstein, J. and Churchill, G.A. 1996. A hidden Markov model approach to variation among sites in rate of evolution. Mol. Biol. Evol. 13: 93–104.
Fleenor, D.E. and Kaufman, R.E. 1993. Characterization of the DNaseI hypersensitive site 3' of the human
Forget, B.G. 2001. Molecular genetics of the human globin genes. In Disorders of hemoglobin: Genetics, pathophysiology, and clinical management (eds. M.H. Steinberg et al.), pp. 117–130. Cambridge University Press, Cambridge, UK.
Forrester, W.C., Thompson, C., Elder, J.T., and Groudine, M. 1986. A developmentally stable chromatin structure in the human
Fraser, P., Hurst, J., Collis, P., and Grosveld, F. 1990. DNase I hypersensitive sites 1, 2 and 3 of the human
Fraser, P., Pruzina, S., Antoniou, M., and Grosveld, F. 1993. Each hypersensitive site of the human
Frazer, K.A., Elnitski, L., Church, D., Dubchak, I., and Hardison, R.C. 2003. Cross-species sequence comparisons: A review of methods and available resources. Genome Res. 13: 1–12.
Giardine, B.M., Elnitski, L., Riemer, C., Makalowska, I., Schwartz, S., Miller, W., and Hardison, R.C. 2003. GALA, a database for genomic sequence alignments and annotations. Genome Res. 13: 732–741.
Gibbs, R.A., Weinstock, G.M., Metzker, M.L., Muzny, D.M., Sodergren, E.J., Scherer, S., Scott, G., Steffen, D., Worley, K.C., Burch, P.E., et al. 2004. Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature 428: 493–521.
Gimble, J.M., Max, E.E., and Ley, T.J. 1988. High-resolution analysis of the human
Gong, Q.-H., Stern, J., and Dean, A. 1991. Transcriptional role of a conserved GATA-1 site in the human
Griffiths-Jones, S. 2004. The microRNA Registry. Nucleic Acids Res. 32: D109–D111.
Hardison, R. 2001. Organization, evolution and regulation of the globin genes. In Disorders of hemoglobin: Genetics, pathophysiology, and clinical management (eds. M.H. Steinberg et al.), pp. 95–116. Cambridge University Press, Cambridge, UK.
Hardison, R.C. 2003. Comparative genomics. PLoS Biol. 1: 156–160.
Hardison, R., Slightom, J.L., Gumucio, D.L., Goodman, M., Stojanovic, N., and Miller, W. 1997. Locus control regions of mammalian
Hardison, R.C., Chiaromonte, F., Kolbe, D., Wang, H., Petrykowska, H., Elnitski, L., Yang, S., Giardine, B., Zhang, Y., Riemer, C., et al. 2003a. Global predictions and tests of erythroid regulatory regions. In The genome of Homo sapiens, pp. 335–344. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
Hardison, R.C., Roskin, K.M., Yang, S., Diekhans, M., Kent, W.J., Weber, R., Elnitski, L., Li, J., O'Connor, M., Kolbe, D., et al. 2003b. Covariation in frequencies of substitution, deletion, transposition and recombination during eutherian evolution. Genome Res. 13: 13–26.
Hillier, L.W., Miller, W., Birney, E., Warren, W., Hardison, R.C., Ponting, C.P., Bork, P., Burt, D.W., Groenen, M.A.M., Delany, M.E., et al. 2004. Sequencing and comparative analysis of the chicken genome. Nature 432: 695–716.
International Human Genome Sequencing Consortium. 2004. Finishing the euchromatic sequence of the human genome. Nature 431: 931–945.
Jackson, J.D., Petrykowska, H., Philipsen, S., Miller, W., and Hardison, R. 1996. Role of DNA sequences outside the cores of DNase hypersensitive sites (HSs) in functions of the
Karolchik, D., Hinrichs, A.S., Furey, T.S., Roskin, K.M., Sugnet, C.W., Haussler, D., and Kent, W.J. 2004. The UCSC Table Browser data retrieval tool. Nucleic Acids Res. 32: D493–D496.
Kent, W.J., Sugnet, C.W., Furey, T.S., Roskin, K.M., Pringle, T.H., Zahler, A.M., and Haussler, D. 2002. The human genome browser at UCSC. Genome Res. 12: 996–1006.
Kolbe, D., Taylor, J., Elnitski, L., Eswara, P., Li, J., Miller, W., Hardison, R., and Chiaromonte, F. 2004. Regulatory potential scores from genome-wide three-way alignments of human, mouse and rat. Genome Res. 14: 700–707.
Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., et al. 2001. Initial sequencing and analysis of the human genome. Nature 409: 860–921.
Li, J., Noguchi, C., Miller, W., Hardison, R., and Schechter, A. 1998. Multiple regulatory elements in the 5'-flanking sequence of the human
Li, Q., Peterson, K., Fang, X., and Stamatoyannopoulos, G. 2002. Locus control regions. Blood 100: 3077–3086.
Liu, Y., Liu, X.S., Wei, L., Altman, R.B., and Batzoglou, S. 2004. Eukaryotic regulatory element conservation analysis and identification using comparative genomics. Genome Res. 14: 451–458.
Lloyd, J.A., Case, S.S., Ponce, E., and Lingrel, J.B. 1994. Positive transcriptional regulation of the human
Ludwig, M.Z., Bergman, C., Patel, N.H., and Kreitman, M. 2000. Evidence for stabilizing selection in a eukaryotic enhancer element. Nature 403: 564–567.
Margulies, E.H., Blanchette, M., NISC Comparative Sequencing Program, Haussler, D., and Green, E.D. 2003. Identification and characterization of multi-species conserved sequences. Genome Res. 13: 2507–2518.
McAuliffe, J.D., Pachter, L., and Jordan, M.I. 2004. Multiple-sequence functional annotation and the generalized hidden Markov phylogeny. Bioinformatics 20: 1850–1860.
McDonagh, K.T., Lin, H.J., Lowrey, C.H., Bodine, D.M., and Nienhuis, A.W. 1991. The upstream region of the human
Miller, W., Makova, K.D., Nekrutenko, A., and Hardison, R.C. 2004. Comparative genomics. Annu. Rev. Genomics Hum. Genet. 5: 15–56.
Molete, J.M., Petrykowska, H., Bouhassira, E.E., Feng, Y.Q., Miller, W., and Hardison, R.C. 2001. Sequences flanking hypersensitive sites of the
Molete, J.M., Petrykowska, H., Sigg, M., Miller, W., and Hardison, R. 2002. Functional and binding studies of HS3.2 of the
Myers, R.M., Tilly, K., and Maniatis, T. 1986. Fine structure genetic analysis of a
Pedersen, J.S. and Hein, J. 2003. Gene finding with a hidden Markov model of genome structure and evolution. Bioinformatics 19: 219–227.
Pennacchio, L.A. and Rubin, E.M. 2001. Genomic strategies to identify mammalian regulatory sequences. Nat. Rev. Genet. 2: 100–109.
Perez-Stable, C. and Costantini, F. 1990. Role of fetal G
Philipsen, S., Talbot, D., Fraser, P., and Grosveld, F. 1990. The
Philipsen, S., Pruzina, S., and Grosveld, F. 1993. The minimal requirements for activity in transgenic mice of hypersensitive site 3 of the
Plessy, C., Dickmeis, T., Chalmel, F., and Strahle, U. 2005. Enhancer sequence conservation between vertebrates is favoured in developmental regulator genes. Trends Genet. 21: 207–210.
Pruitt, K.D. and Maglott, D.R. 2001. RefSeq and LocusLink: NCBI gene-centered resources. Nucleic Acids Res. 29: 137–140.
Pruzina, S., Hanscombe, O., Whyatt, D., Grosveld, F., and Philipsen, S. 1991. Hypersensitive site 4 of the human
Ryan, T.M., Behringer, R.R., Martin, N.C., Townes, T.M., Palmiter, R.D., and Brinster, R.L. 1989. A single erythroid-specific DNase I super-hypersensitive site activates high levels of human
Sandelin, A., Alkema, W., Engstrom, P., Wasserman, W.W., and Lenhard, B. 2004a. JASPAR: An open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res. 32: D91–D94.
Sandelin, A., Bailey, P., Bruce, S., Engstrom, P.G., Klos, J.M., Wasserman, W.W., Ericson, J., and Lenhard, B. 2004b. Arrays of ultraconserved non-coding regions span the loci of key developmental genes in vertebrate genomes. BMC Genomics 5: 99.
Schwartz, S., Elnitski, L., Li, M., Weirauch, M., Riemer, C., Smit, A., NISC Comparative Sequencing Program, Green, E.D., Hardison, R.C., and Miller, W. 2003a. MultiPipMaker and supporting tools: Alignments and analysis of multiple genomic DNA sequences. Nucleic Acids Res. 31: 3518–3524.
Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.C., Haussler, D., and Miller, W. 2003b. Human–mouse alignments with BLASTZ. Genome Res. 13: 103–105.
Shelton, D.A., Stegman, L., Hardison, R., Miller, W., Slightom, J.L., Goodman, M., and Gumucio, D.L. 1997. Phylogenetic footprinting of hypersensitive site 3 of the
Siepel, A. and Haussler, D. 2003. Combining phylogenetic and hidden Markov models in biosequence analysis. In Proceedings of the Seventh Annual International Conference on Computational Molecular Biology (RECOMB 2003), pp. 277–286. ACM Press, New York.
Siepel, A., Bejerano, G., Pedersen, J.S., Hinrichs, A., Hou, M., Rosenbloom, K., Clawson, H., Kent, W.J., Miller, W., and Haussler, D. 2005. Evolutionarily conserved elements in vertebrate, fly, worm and yeast genomes. Genome Res. (this issue).
Slightom, J., Bock, J., Tagle, D., Gumucio, D., Goodman, M., Stojanovic, N., Jackson, J., Miller, W., and Hardison, R. 1997. The complete sequences of the galago and rabbit
Spellman, P.T., Sherlock, G., Zhang, M.Q., Iyer, V.R., Anders, K., Eisen, M.B., Brown, P.O., Botstein, D., and Futcher, B. 1998. Comprehensive