Vol. 10, Issue 9, 1304-1306, September 2000
REPORT
| |
ABSTRACT |
|---|
Human and mouse genomic sequence comparisons are being increasingly used to search for evolutionarily conserved gene regulatory elements. Large-scale human-mouse DNA comparison studies have discovered numerous conserved noncoding sequences, of which only a fraction has been functionally investigated. A question therefore remains as to whether most of these noncoding sequences are conserved because of functional constraints or are the result of a lack of divergence time.
[The sequence data described in this paper have been submitted to the GenBank data library under accession no. AF276990.]
| |
INTRODUCTION |
|---|
Based on the supposition that actively conserved human-mouse noncoding sequences will be present in a third mammal, whereas noncoding regions that are similar because of an insufficient accumulation of random mutations will be absent, we sequenced ~200 kb of orthologous human (5q31), mouse (chromosome 11), and dog (chromosome 4) DNA. The functions of conserved noncoding sequences (syntenous gene regulatory elements) are unaffected by relatively small random insertions or deletions of base pairs, and therefore, standard local alignment algorithms that identify ungapped conserved regions are not ideally suited for their discovery. For this reason, comparative analysis was performed by generating pairwise global sequence alignments [human-dog (H/D), human-mouse (H/M), and mouse-dog (M/D)], and we developed an algorithm to search for blocks of similarity in the alignments. To view the conserved regions in the three pairwise sequence alignments simultaneously, we developed a new visualization tool, VISTA (visualization tool for alignment). Inspection of the graphical output of VISTA revealed that the H/D, H/M, and M/D alignments have almost identical patterns of noncoding sequence conservation (Fig. 1). The content and order of the six genes in this 200-kb region are the same for all three species; however, the coding regions of two genes, Interleukin-4 and Interleukin-13, are only moderately conserved (~50% identity).
|
Previous H/M DNA comparison studies have used arbitrary cutoff criteria (≥X% identity over ≥Y bp) to define noncoding sequences as evolutionarily conserved (Loots et al. 2000). Here, we statistically determine cutoff criteria for defining conserved noncoding sequences by examining the three pairwise sequence alignments, H/D, H/M, and M/D, using intersection/union (I/U) analyses (Table 1). The cutoffs for which the sum of the three pairwise I/U values was maximal (that is, those giving the largest number of overlapping and the smallest number of unique conserved noncoding elements) were as follows: H/D, ≥92% identity over ≥120 bp; H/M, ≥80% identity over ≥120 bp; and M/D, ≥77% identity over ≥120 bp. These data indicate that the high percent identity noncoding sequences in the ~200-kb region examined are most similar in humans and dogs and therefore suggest that H/D DNA comparisons may be better than H/M DNA comparisons for detecting conserved noncoding elements.
|
At the optimal cutoffs, 16 H/D conserved noncoding sequences (CNSs) were identified, of which 14 were present in all three pairwise sequence alignments. Two of the CNSs (at 97 kb and 108 kb) have been experimentally determined to be gene regulatory elements, supporting the cutoff criteria obtained from the I/U analyses (Henkel et al. 1992; Loots et al. 2000). The two CNSs present in humans and dogs (at 2 kb and 98 kb) but not in mice (Fig. 1) may represent gene regulatory elements that either are not conserved at the sequence level between humans and mice or have been lost in mice during evolution. Using the statistically determined percent identity and length thresholds resulted in few putative false negatives (CNSs that are present in all three species but fall slightly below the cutoff value); however, a significant number of exons in the genes within the region do not meet these criteria (Fig. 1). Less stringent cutoff criteria would have included these exons but would have resulted in the overprediction of noncoding sequences as conserved (false positives).
The vast majority of the H/M CNSs identified in the 200-kb region examined are also present in dog. This is an important finding, as it suggests that a large fraction of the high percent identity noncoding elements identified through H/M DNA comparison studies are conserved because of functional constraints. A problem with two-species sequence comparison studies is that cutoff values for defining noncoding elements as conserved are based on biologists' intuition for what constitutes a biologically significant threshold. Our simultaneous comparison of orthologous sequences from three mammals allowed us to determine statistically the percent identity and length thresholds that define actively conserved noncoding sequences. These cutoff values may be useful guidelines for identifying CNSs in genomic regions for which only human and mouse DNA sequences are available.
| |
METHODS |
|---|
Genomic Sequences
Human 5q31 (NT 000170) and mouse chromosome 11 (AC005742) sequences were obtained as described (Loots et al. 2000). A dog chromosome 4 bacterial artificial chromosome (BAC) was isolated from the BACPAC Resources library (RPCI-81) and sequenced in draft format, and the contigs were ordered and oriented (AF276990).
Sequence Alignments and Visualization
Sequences were globally aligned using GLASS (global alignment system) (Batzoglou et al. 2000), and conserved regions were identified by calculating the percent of identical nucleotides within a 100-nucleotide window moved in 25-nucleotide increments across the alignments. The source code of VISTA, the Java program for visualization of alignments, is available upon request, and a VISTA server can be accessed at http://www-gsd.lbl.gov/vista.
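The windowed percent identity calculation described in this subsection can be sketched in Python as follows. This is a minimal illustration and not the VISTA source code: the function name `window_identity`, the use of `-` as the gap character, the choice to measure the window in alignment columns, and the decision to count gap columns as mismatches are all assumptions made for the example.

```python
def window_identity(aligned_a, aligned_b, window=100, step=25):
    """Percent identity in each window of a pairwise global alignment.

    aligned_a and aligned_b are the two rows of the alignment (equal length,
    '-' marks a gap). Returns a list of (start_column, percent_identity).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # 1 where a column holds the same nucleotide in both rows, else 0.
    # Assumption: a column containing a gap never counts as identical.
    matches = [
        1 if a == b and a != "-" else 0
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
    ]
    profile = []
    # Slide the window across the alignment in fixed increments.
    for start in range(0, len(matches) - window + 1, step):
        identical = sum(matches[start:start + window])
        profile.append((start, 100.0 * identical / window))
    return profile


if __name__ == "__main__":
    # Toy alignment; real input would be the GLASS output for two species.
    for start, pct in window_identity("ACGT-A", "ACGA-A", window=4, step=2):
        print(start, pct)
```

With the defaults (`window=100`, `step=25`) the function mirrors the 100-nucleotide window and 25-nucleotide increment stated above; the resulting profile is the kind of curve a conservation plot would display.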
I/U Analysis
Conserved segments with percent identity X and length Y were defined to be regions in which every contiguous subsegment of length Y was at least X% identical to its paired sequence. These segments were then merged to define the conserved regions. The I/U analyses were performed to define length and identity cutoffs as follows: The sets of conserved regions in the H/M alignment (denoted by $A$) and in the M/D alignment (denoted by $B$) were identified. Regions $a \in A$ and $b \in B$ were considered equal if they overlapped in the mouse sequence. I/U was then obtained by computing $|A \cap B| / |A \cup B|$, where $|A \cap B| = \min(|A \subset B|, |B \subset A|)$ and $|A \cup B| = |A| + |B| - \max(|A \subset B|, |B \subset A|)$. Here $|A \subset B|$ is the number of regions in $A$ that are equal to regions in $B$. This number might be different from $|B \subset A|$ because it is possible that multiple regions in one alignment are equal to one region in the other.
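The definitions in this subsection can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, regions are represented as half-open `(start, end)` intervals, conserved windows that overlap or touch are merged, and the mapping from alignment columns to mouse sequence coordinates (needed before regions from two alignments can be compared) is omitted.

```python
def conserved_regions(matches, min_identity, length):
    """Merge every window of `length` columns whose identity is at least
    `min_identity` percent into half-open (start, end) regions.

    `matches` holds 1 for an identical alignment column and 0 otherwise.
    """
    regions = []
    needed = min_identity * length / 100.0  # identical columns required
    for start in range(len(matches) - length + 1):
        if sum(matches[start:start + length]) >= needed:
            end = start + length
            if regions and start <= regions[-1][1]:
                # Overlaps or touches the previous region: extend it.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


def overlaps(r, s):
    """True if half-open intervals r and s share at least one position."""
    return r[0] < s[1] and s[0] < r[1]


def count_equal(a, b):
    """Number of regions in a that are 'equal to' (overlap) a region in b."""
    return sum(1 for r in a if any(overlaps(r, s) for s in b))


def intersection_over_union(a, b):
    """I/U of two region sets given in a shared (mouse) coordinate system."""
    a_in_b = count_equal(a, b)
    b_in_a = count_equal(b, a)
    intersection = min(a_in_b, b_in_a)
    union = len(a) + len(b) - max(a_in_b, b_in_a)
    return intersection / union if union else 0.0


if __name__ == "__main__":
    # Invented coordinates: two regions of A overlap a single region of B,
    # so the two directional counts differ (2 versus 1).
    A = [(0, 100), (150, 250), (400, 500)]
    B = [(50, 200), (600, 700)]
    print(count_equal(A, B), count_equal(B, A), intersection_over_union(A, B))
```

In the toy example the intersection is min(2, 1) = 1 and the union is 3 + 2 - max(2, 1) = 3, giving I/U = 1/3. Taking the minimum for the intersection and the maximum for the union correction keeps the ratio conservative when several regions in one alignment map to one region in the other.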
| |
ACKNOWLEDGMENTS |
|---|
We thank Keith Lewis, Willow Dean, and Cathy Blankespoor for DNA sequencing and Nila Patil for valuable remarks on the manuscript. This work was supported by the following grants: U.S. Department of Energy contract DE-AC376SF00098 and NIH GM-5748202 (K.A.F.).
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
| |
FOOTNOTES |
|---|
4 Present address: Affymetrix, Santa Clara, California 95051 USA.
5 Corresponding author.
E-MAIL kelly_frazer{at}affymetrix.com; FAX (408) 481-0422.
Article and publication are at www.genome.org/cgi/doi/10.1101/gr.142200.
| |
REFERENCES |
|---|
|
|
|---|
Received March 28, 2000; accepted in revised form July 12, 2000.