Genome Res. 14:708-715, 2004
©2004 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/04 $5.00
Resources
Aligning Multiple Genomic Sequences With the Threaded Blockset Aligner
Mathieu Blanchette,1,6 W. James Kent,2 Cathy Riemer,3 Laura Elnitski,3 Arian F.A. Smit,4 Krishna M. Roskin,2 Robert Baertsch,2 Kate Rosenbloom,2 Hiram Clawson,2 Eric D. Green,5 David Haussler,1,2 and Webb Miller3,7
1 Howard Hughes Medical Institute, University of California at Santa Cruz, Santa Cruz, California 95064, USA
2 Center for Biomolecular Science and Engineering, University of California at Santa Cruz, Santa Cruz, California 95064, USA
3 Center for Comparative Genomics and Bioinformatics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
4 Institute for Systems Biology, Seattle, Washington 98103, USA
5 Genome Technology Branch and NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
We define a "threaded blockset," which is a novel generalization of the classic notion of a multiple alignment. A new computer program called TBA (for "threaded blockset aligner") builds a threaded blockset under the assumption that all matching segments occur in the same order and orientation in the given sequences; inversions and duplications are not addressed. TBA is designed to be appropriate for aligning many, but by no means all, megabase-sized regions of multiple mammalian genomes. The output of TBA can be projected onto any genome chosen as a reference, thus guaranteeing that different projections present consistent predictions of which genomic positions are orthologous. This capability is illustrated using a new visualization tool to view TBA-generated alignments of vertebrate Hox clusters from both the mammalian and fish perspectives. Experimental evaluation of alignment quality, using a program that simulates evolutionary change in genomic sequences, indicates that TBA is more accurate than earlier programs. To perform the dynamic-programming alignment step, TBA runs a stand-alone program called MULTIZ, which can be used to align highly rearranged or incompletely sequenced genomes. We describe our use of MULTIZ to produce the whole-genome multiple alignments at the Santa Cruz Genome Browser.
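The projection property can be illustrated with a small sketch. This is not TBA's code or data format: the `Row` and `Block` types, the 0-based `start` convention, and the helper names `project` and `aligned_pairs` are all hypothetical, and the toy data assume that each base of each sequence lies in exactly one block and that all segments are in the same order and orientation. Under those assumptions, projecting the blocks onto either species yields the same set of aligned position pairs.

```python
from typing import Dict, List, NamedTuple, Set, Tuple


class Row(NamedTuple):
    start: int  # 0-based start of the segment in its sequence (assumed convention)
    text: str   # aligned text; '-' marks a gap


Block = Dict[str, Row]  # species name -> aligned row (hypothetical representation)


def project(blocks: List[Block], ref: str) -> List[Block]:
    """Keep the blocks that contain `ref`, ordered by position in `ref`."""
    kept = [b for b in blocks if ref in b]
    return sorted(kept, key=lambda b: b[ref].start)


def aligned_pairs(blocks: List[Block], a: str, b: str) -> Set[Tuple[int, int]]:
    """Collect (position in a, position in b) pairs that share a column."""
    pairs = set()
    for blk in blocks:
        if a not in blk or b not in blk:
            continue
        i, j = blk[a].start, blk[b].start
        for x, y in zip(blk[a].text, blk[b].text):
            if x != "-" and y != "-":
                pairs.add((i, j))
            if x != "-":
                i += 1
            if y != "-":
                j += 1
    return pairs


# Toy blockset, deliberately listed out of order (illustrative data only).
blocks: List[Block] = [
    {"human": Row(4, "TTA"), "mouse": Row(5, "TTA"), "fugu": Row(0, "T-A")},
    {"mouse": Row(3, "GG")},
    {"human": Row(0, "ACGT"), "mouse": Row(0, "AC-T")},
    {"fugu": Row(2, "CC")},
]

by_human = project(blocks, "human")
by_mouse = project(blocks, "mouse")

# Both projections imply the same human/mouse correspondences.
same = aligned_pairs(by_human, "human", "mouse") == aligned_pairs(by_mouse, "human", "mouse")
print("projections agree:", same)
```

Because both projections are views of one underlying set of blocks, they cannot disagree about which positions are aligned; they differ only in which blocks are shown and in what order.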
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.1933104.
6 Present address: School of Computer Science, McGill University, Montreal, Canada.
7 Corresponding author. E-MAIL webb@bx.psu.edu; FAX (814) 863-1357.
[Supplemental material, including the Methods section, is available online at www.genome.org. The multiple alignments produced by MULTIZ can be viewed at the Santa Cruz Genome Browser or downloaded in bulk. TBA, simulated test data, and the Gmaj visualization tool can be downloaded from http://bio.cse.psu.edu/.]
