Published online before print
February 12, 2003, 10.1101/gr.424203
Vol 13, Issue 3, 496-502, March 2003
METHODS
SLAM: Cross-Species Gene Finding and Alignment with a Generalized Pair Hidden Markov Model
Marina Alexandersson1, Simon Cawley2 and Lior Pachter3,4
1Department of Statistics, University of California, Berkeley, Berkeley, California 94720, USA
2Affymetrix Inc., Santa Clara, California 95051, USA
3Department of Mathematics, University of California, Berkeley, Berkeley, California 94720, USA
Comparative-based gene recognition is driven by the principle that
conserved regions between related organisms are more likely than
divergent regions to be coding. We describe a probabilistic framework
for gene structure and alignment that can be used to simultaneously
find both the gene structure and alignment of two syntenic genomic
regions. A key feature of the method is the ability to enhance gene
predictions by finding the best alignment between two syntenic
sequences, while at the same time finding biologically meaningful
alignments that preserve the correspondence between coding exons. Our
probabilistic framework is the generalized pair hidden Markov model, a
hybrid of (1) generalized hidden Markov models, which have been used
previously for gene finding, and (2) pair hidden Markov models, which
have applications to sequence alignment. We have built a gene finding
and alignment program called SLAM, which aligns and identifies complete
exon/intron structures of genes in two related but unannotated
sequences of DNA. SLAM reliably predicts gene structures for
any suitably related pair of organisms, most notably with fewer
false-positive predictions than previous methods (examples are
provided for Homo sapiens/Mus musculus and
Plasmodium falciparum/Plasmodium vivax comparisons).
Accuracy is obtained by distinguishing conserved noncoding sequence
(CNS) from conserved coding sequence. CNS annotation is a novel feature
of SLAM and may be useful for the annotation of UTRs, regulatory
elements, and other noncoding features.
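
To make the pair hidden Markov model component concrete, the following is a minimal sketch of Viterbi alignment under a plain three-state pair HMM (match, gap in the second sequence, gap in the first). It is not SLAM's generalized pair HMM: it has no exon, intron, or CNS states and no generalized state durations. The function name and all parameter values (delta, epsilon, p_same, q) are illustrative assumptions, the begin state is folded into the match state, and no end-state probability is modeled.

```python
import math

NEG_INF = float("-inf")
STATES = ("M", "X", "Y")
# How many symbols of x and y each state consumes per emission.
STEP = {"M": (1, 1), "X": (1, 0), "Y": (0, 1)}


def pair_hmm_viterbi(x, y, delta=0.1, epsilon=0.3, p_same=0.2, q=0.25):
    """Most probable alignment of DNA strings x and y under a 3-state pair HMM.

    M emits an aligned pair, X emits a base of x against a gap, and
    Y emits a base of y against a gap. Parameter values are illustrative.
    """
    # Pair emissions: 4 identical pairs at p_same each; 12 mismatches share the rest.
    p_diff = (1.0 - 4.0 * p_same) / 12.0
    trans = {
        ("M", "M"): 1.0 - 2.0 * delta, ("M", "X"): delta, ("M", "Y"): delta,
        ("X", "M"): 1.0 - epsilon, ("X", "X"): epsilon,
        ("Y", "M"): 1.0 - epsilon, ("Y", "Y"): epsilon,
    }
    n, m = len(x), len(y)
    v = {s: [[NEG_INF] * (m + 1) for _ in range(n + 1)] for s in STATES}
    back = {s: [[None] * (m + 1) for _ in range(n + 1)] for s in STATES}
    v["M"][0][0] = 0.0  # log(1): the begin state is treated as M

    for i in range(n + 1):
        for j in range(m + 1):
            for s in STATES:
                di, dj = STEP[s]
                pi, pj = i - di, j - dj
                if pi < 0 or pj < 0:
                    continue
                if s == "M":
                    emit = p_same if x[pi] == y[pj] else p_diff
                else:
                    emit = q
                # Best predecessor state among the allowed transitions into s.
                best, arg = NEG_INF, None
                for r in STATES:
                    if (r, s) in trans and v[r][pi][pj] > NEG_INF:
                        score = v[r][pi][pj] + math.log(trans[(r, s)])
                        if score > best:
                            best, arg = score, r
                if arg is not None:
                    v[s][i][j] = best + math.log(emit)
                    back[s][i][j] = arg

    # Trace back from the best final state to recover the alignment.
    state = max(STATES, key=lambda s: v[s][n][m])
    log_prob = v[state][n][m]
    i, j, row_x, row_y = n, m, [], []
    while i > 0 or j > 0:
        di, dj = STEP[state]
        row_x.append(x[i - 1] if di else "-")
        row_y.append(y[j - 1] if dj else "-")
        prev = back[state][i][j]
        i, j, state = i - di, j - dj, prev
    return log_prob, "".join(reversed(row_x)), "".join(reversed(row_y))
```

For example, `pair_hmm_viterbi("ACGT", "AGT")` returns the log probability of the best path together with the two gapped rows of the alignment. A generalized pair HMM of the kind described above would additionally let a single state emit a pair of variable-length subsequences, such as a pair of aligned exons.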
4 Corresponding author.
E-MAIL lpachter{at}math.berkeley.edu; FAX (510) 642-8204.
Article and publication are at
http://www.genome.org/cgi/doi/10.1101/gr.424203. Article published online before print in February
2003.
