Genome Res. 13:2507-2518, 2003
©2003 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/03 $5.00
Identification and Characterization of Multi-Species Conserved Sequences
Elliott H. Margulies1, Mathieu Blanchette3, NISC Comparative Sequencing Program1,2, David Haussler3,4,5 and Eric D. Green1,2,5
1 Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
2 NIH Intramural Sequencing Center (NISC), National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
3 Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California 95064, USA
4 Howard Hughes Medical Institute, University of California, Santa Cruz, California 95064, USA
Comparative sequence analysis has become an essential component of studies aiming to elucidate genome function. The increasing availability of genomic sequences from multiple vertebrates is creating the need for computational methods that can detect highly conserved regions in a robust fashion. Towards that end, we are developing approaches for identifying sequences that are conserved across multiple species; we call these "Multi-species Conserved Sequences" (or MCSs). Here we report two strategies for MCS identification, demonstrating their ability to detect virtually all known actively conserved sequences (specifically, coding sequences) but very little neutrally evolving sequence (specifically, ancestral repeats). Importantly, we find that a substantial fraction of the bases within MCSs (~70%) resides within non-coding regions; thus, the majority of sequences conserved across multiple vertebrate species has no known function. Initial characterization of these MCSs has revealed sequences that correspond to clusters of transcription factor-binding sites, non-coding RNA transcripts, and other candidate functional elements. Finally, the ability to detect MCSs represents a valuable metric for assessing the relative contribution of a species' sequence to identifying genomic regions of interest, and our results indicate that the currently available genome sequences are insufficient for the comprehensive identification of MCSs in the human genome.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.1602203.
5 Corresponding authors. E-MAIL egreen@nhgri.nih.gov; FAX (301) 402-2040. E-MAIL haussler@cse.ucsc.edu; FAX (831) 459-4829.

This article has been cited by other articles:

M. Bekaert and E. C. Teeling. UniPrime: a workflow-based platform for improved universal primer design. Nucleic Acids Res., April 19, 2008; (2008) gkn191v1.
L. A. Lettice, A. E. Hill, P. S. Devenney, and R. E. Hill. Point mutations in a distant sonic hedgehog cis-regulator generate a variable regulatory output responsible for preaxial polydactyly. Hum. Mol. Genet., April 1, 2008; 17(7): 978-985.
E. H. Margulies. Confidence in comparative genomics. Genome Res., February 1, 2008; 18(2): 199-200.
D. M. McGaughey, R. M. Vinton, J. Huynh, A. Al-Saif, M. A. Beer, and A. S. McCallion. Metrics of sequence constraint overlook regulatory sequences in an exhaustive analysis at phox2b. Genome Res., February 1, 2008; 18(2): 252-260.
S. Stephen, M. Pheasant, I. V. Makunin, and J. S. Mattick. Large-Scale Appearance of Ultraconserved Elements in Tetrapod Genomes and Slowdown of the Molecular Clock. Mol. Biol. Evol., February 1, 2008; 25(2): 402-408.
S. I. Nikolaev, J. I. Montoya-Burgos, K. Popadin, L. Parand, E. H. Margulies, National Institutes of Health Intramural Sequencin, and S. E. Antonarakis. Life-history traits drive the evolutionary rates of mammalian coding and noncoding genomic elements. PNAS, December 18, 2007; 104(51): 20443-20448.
P. Kheradpour, A. Stark, S. Roy, and M. Kellis. Reliable prediction of regulator targets using 12 Drosophila genomes. Genome Res., December 1, 2007; 17(12): 1919-1931.
A. M. Andres, C. de Hemptinne, and J. Bertranpetit. Heterogeneous Rate of Protein Evolution in Serotonin Genes. Mol. Biol. Evol., December 1, 2007; 24(12): 2707-2715.
C. Knibbe, A. Coulon, O. Mazet, J.-M. Fayard, and G. Beslon. A Long-Term Evolutionary Pressure on the Amount of Noncoding DNA. Mol. Biol. Evol., October 1, 2007; 24(10): 2344-2353.
M. Pheasant and J. S. Mattick. Raising the estimate of functional human sequences. Genome Res., September 1, 2007; 17(9): 1245-1253.
L.-W. Chang, B. R. Fontaine, G. D. Stormo, and R. Nagarajan. PAP: a comprehensive workbench for mammalian transcriptional regulatory sequence analysis. Nucleic Acids Res., July 13, 2007; 35(suppl_2): W238-W244.
E. H. Margulies, G. M. Cooper, G. Asimenos, D. J. Thomas, C. N. Dewey, A. Siepel, E. Birney, D. Keefe, A. S. Schwartz, M. Hou, et al. Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome. Genome Res., June 1, 2007; 17(6): 760-774.
D. J. Thomas, K. R. Rosenbloom, H. Clawson, A. S. Hinrichs, H. Trumbower, B. J. Raney, D. Karolchik, G. P. Barber, R. A. Harte, J. Hillman-Jackson, et al. The ENCODE Project at UC Santa Cruz. Nucleic Acids Res., January 12, 2007; 35(suppl_1): D663-D667.
H. Wang, Y. Zhang, Y. Cheng, Y. Zhou, D. C. King, J. Taylor, F. Chiaromonte, J. Kasturi, H. Petrykowska, B. Gibb, et al. Experimental validation of predicted mammalian erythroid cis-regulatory modules. Genome Res., December 1, 2006; 16(12): 1480-1492.
L. Elnitski, V. X. Jin, P. J. Farnham, and S. J.M. Jones. Locating mammalian transcription factor binding sites: A survey of computational and experimental techniques. Genome Res., December 1, 2006; 16(12): 1455-1464.
J. Taylor, S. Tyekucheva, D. C. King, R. C. Hardison, W. Miller, and F. Chiaromonte. ESPERR: Learning strong and weak signals in genomic sequence alignments to identify functional elements. Genome Res., December 1, 2006; 16(12): 1596-1604.
D. GuhaThakurta. Computational identification of transcriptional regulatory elements in DNA sequence. Nucleic Acids Res., July 19, 2006; 34(12): 3585-3598.
S. Prabhakar, F. Poulin, M. Shoukry, V. Afzal, E. M. Rubin, O. Couronne, and L. A. Pennacchio. Close sequence comparisons are sufficient to identify human cis-regulatory elements. Genome Res., July 1, 2006; 16(7): 855-863.
C. N. Dewey and L. Pachter. Evolution at the nucleotide level: the problem of multiple whole-genome alignment. Hum. Mol. Genet., April 15, 2006; 15(suppl_1): R51-R56.
G. K. McEwen, A. Woolfe, D. Goode, T. Vavouri, H. Callaway, and G. Elgar. Ancient duplicated conserved noncoding elements in vertebrates: A genomic and functional analysis. Genome Res., April 1, 2006; 16(4): 451-465.
M. Kamal, X. Xie, and E. S. Lander. A large family of ancient repeat elements in the human genome is under strong selection. PNAS, February 21, 2006; 103(8): 2740-2745.
A. Antonellis, W. R. Bennett, T. R. Menheniott, A. B. Prasad, S.-Q. Lee-Lin, NISC Comparative Sequencing Program, E. D. Green, D. Paisley, R. N. Kelsh, W. J. Pavan, et al. Deletion of long-range sequences at Sox10 compromises developmental expression in a mouse model of Waardenburg-Shah (WS4) syndrome. Hum. Mol. Genet., January 15, 2006; 15(2): 259-271.
P. Schattner, S. Barberan-Soler, and T. M. Lowe. A computational screen for mammalian pseudouridylation guide H/ACA RNAs. RNA, January 1, 2006; 12(1): 15-25.
G. E. Crawford, I. E. Holt, J. Whittle, B. D. Webb, D. Tai, S. Davis, E. H. Margulies, Y. Chen, J. A. Bernat, D. Ginsburg, et al. Genome-wide mapping of DNase hypersensitive sites using massively parallel signature sequencing (MPSS). Genome Res., January 1, 2006; 16(1): 123-131.
E. A. Grice, E. S. Rochelle, E. D. Green, A. Chakravarti, and A. S. McCallion. Evaluation of the RET regulatory landscape reveals the biological relevance of a HSCR-implicated enhancer. Hum. Mol. Genet., December 15, 2005; 14(24): 3837-3845.
J. E. Womack. Advances in livestock genomics: Opening the barn door. Genome Res., December 1, 2005; 15(12): 1699-1705.
M. Faham, J. Zheng, M. Moorhead, H. Fakhrai-Rad, E. Namsaraev, K. Wong, Z. Wang, S. G. Chow, L. Lee, K. Suyenaga, et al. Multiplexed variation scanning for 1,000 amplicons in hundreds of patients using mismatch repair detection (MRD) on tag arrays. PNAS, October 11, 2005; 102(41): 14717-14722.
P. D. Keightley, G. V. Kryukov, S. Sunyaev, D. L. Halligan, and D. J. Gaffney. Evolutionary constraints in conserved nongenic sequences of mammals. Genome Res., October 1, 2005; 15(10): 1373-1378.
S. L. Sabol, H. B. Brewer Jr., and S. Santamarina-Fojo. The human ABCG1 gene: identification of LXR response elements that modulate expression in macrophages and liver. J. Lipid Res., October 1, 2005; 46(10): 2151-2167.
B. Giardine, C. Riemer, R. C. Hardison, R. Burhans, L. Elnitski, P. Shah, Y. Zhang, D. Blankenberg, I. Albert, J. Taylor, et al. Galaxy: A platform for interactive large-scale genome analysis. Genome Res., October 1, 2005; 15(10): 1451-1455.
D. C. King, J. Taylor, L. Elnitski, F. Chiaromonte, W. Miller, and R. C. Hardison. Evaluation of regulatory potential and conservation scores for detecting cis-regulatory modules in aligned mammalian genome sequences. Genome Res., August 1, 2005; 15(8): 1051-1060.
A. Siepel, G. Bejerano, J. S. Pedersen, A. S. Hinrichs, M. Hou, K. Rosenbloom, H. Clawson, J. Spieth, L. W. Hillier, S. Richards, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res., August 1, 2005; 15(8): 1034-1050.
J. R. Hughes, J.-F. Cheng, N. Ventress, S. Prabhakar, K. Clark, E. Anguita, M. De Gobbi, P. de Jong, E. Rubin, and D. R. Higgs. Annotation of cis-regulatory elements by identification, subclassification, and functional assessment of multispecies conserved sequences. PNAS, July 12, 2005; 102(28): 9830-9835.
G. M. Cooper, E. A. Stone, G. Asimenos, NISC Comparative Sequencing Program, E. D. Green, S. Batzoglou, and A. Sidow. Distribution and intensity of constraint in mammalian genomic sequence. Genome Res., July 1, 2005; 15(7): 901-913.
K.-H. Kang and S.-H. Im. Differential Regulation of the IL-10 Gene in Th1 and Th2 T Cells. Ann. N.Y. Acad. Sci., June 1, 2005; 1050(1): 97-107.
J. Brosius. Disparity, adaptation, exaptation, bookkeeping, and contingency at the genome level. Paleobiology, June 1, 2005; 31(2_Suppl): 1-16.
J. D. McAuliffe, M. I. Jordan, and L. Pachter. Subtree power analysis and species selection for comparative genomics. PNAS, May 31, 2005; 102(22): 7900-7905.
E. H. Margulies, J. P. Vinson, NISC Comparative Sequencing Program, W. Miller, D. B. Jaffe, K. Lindblad-Toh, J. L. Chang, E. D. Green, E. S. Lander, J. C. Mullikin, et al. An initial strategy for the systematic identification of functional elements in the human genome by low-redundancy comparative sequencing. PNAS, March 29, 2005; 102(13): 4795-4800.
X. H. Zheng, F. Lu, Z.-Y. Wang, F. Zhong, J. Hoover, and R. Mural. Using shared genomic synteny and shared protein functions to enhance the identification of orthologous gene pairs. Bioinformatics, March 15, 2005; 21(6): 703-710.
E. H. Margulies, NISC Comparative Sequencing Program, V. V. B. Maduro, P. J. Thomas, J. P. Tomkins, C. T. Amemiya, M. Luo, and E. D. Green. Comparative sequencing provides insights about the structure and conservation of marsupial and monotreme genomes. PNAS, March 1, 2005; 102(9): 3354-3359.
S. Washietl, I. L. Hofacker, and P. F. Stadler. From The Cover: Fast and reliable prediction of noncoding RNAs. PNAS, February 15, 2005; 102(7): 2454-2459.
L. Ye and X. Huang. MAP2: multiple alignment of syntenic genomic sequences. Nucleic Acids Res., January 7, 2005; 33(1): 162-170.
W. A. Kellner, R. T. Sullivan, B. H. Carlson, NISC Comparative Sequencing Program, and J. W. Thomas. Uprobe: A genome-wide universal probe resource for comparative physical mapping in vertebrates. Genome Res., January 1, 2005; 15(1): 166-173.
M. Blanchette, E. D. Green, W. Miller, and D. Haussler. Reconstructing large regions of an ancestral mammalian genome in silico. Genome Res., December 1, 2004; 14(12): 2412-2423.
N. Martin, S. Patel, and J. A. Segre. Long-range comparison of human and mouse Sprr loci to identify conserved noncoding sequences involved in coordinate regulation. Genome Res., December 1, 2004; 14(12): 2430-2438.
R. W. Blakesley, N. F. Hansen, J. C. Mullikin, P. J. Thomas, J. C. McDowell, B. Maskeri, A. C. Young, B. Benjamin, S. Y. Brooks, B. I. Coleman, et al. An intermediate grade of finished genomic sequence suitable for comparative analyses. Genome Res., November 1, 2004; 14(11): 2235-2244.
B. S. Gill, R. Appels, A.-M. Botha-Oberholster, C. R. Buell, J. L. Bennetzen, B. Chalhoub, F. Chumley, J. Dvorak, M. Iwanaga, B. Keller, et al. A Workshop Report on Wheat Genome Sequencing: International Genome Research on Wheat Consortium. Genetics, October 1, 2004; 168(2): 1087-1096.
N. Ahituv, E. M. Rubin, and M. A. Nobrega. Exploiting human-fish genome comparisons for deciphering gene regulation. Hum. Mol. Genet., October 1, 2004; 13(suppl_2): R261-R266.
S. W. Scherer and E. D. Green. Human chromosome 7 circa 2004: a model for structural and functional studies of the human genome. Hum. Mol. Genet., October 1, 2004; 13(suppl_2): R303-R313.
A.-M. Mallon, L. Wilming, J. Weekes, J. G.R. Gilbert, J. Ashurst, S. Peyrefitte, L. Matthews, M. Cadman, R. McKeone, C. A. Sellick, et al. Organization and Evolution of a Gene-Rich Region of the Mouse Genome: A 12.7-Mb Region Deleted in the Del(13)Svea36H Mouse. Genome Res., October 1, 2004; 14(10a): 1888-1901.
T. Angata, E. H. Margulies, E. D. Green, and A. Varki. Large-scale sequencing of the CD33-related Siglec gene cluster in five mammalian species reveals rapid evolution by multiple mechanisms. PNAS, September 7, 2004; 101(36): 13251-13256.
I. Ovcharenko, M. A. Nobrega, G. G. Loots, and L. Stubbs. ECR Browser: a tool for visualizing and accessing data from comparisons of multiple vertebrate genomes. Nucleic Acids Res., July 1, 2004; 32(suppl_2): W280-W286.
G. Bejerano, M. Pheasant, I. Makunin, S. Stephen, W. J. Kent, J. S. Mattick, and D. Haussler. Ultraconserved Elements in the Human Genome. Science, May 28, 2004; 304(5675): 1321-1325.
M. Blanchette, W. J. Kent, C. Riemer, L. Elnitski, A. F.A. Smit, K. M. Roskin, R. Baertsch, K. Rosenbloom, H. Clawson, E. D. Green, et al. Aligning Multiple Genomic Sequences With the Threaded Blockset Aligner. Genome Res., April 1, 2004; 14(4): 708-715.
K. Silander, K. L. Mohlke, L. J. Scott, E. C. Peck, P. Hollstein, A. D. Skol, A. U. Jackson, P. Deloukas, S. Hunt, G. Stavrides, et al. Genetic Variation Near the Hepatocyte Nuclear Factor-4α Gene Predicts Susceptibility to Type 2 Diabetes. Diabetes, April 1, 2004; 53(4): 1141-1149.