Genome Research, Vol. 13, Issue 1, 108-117, January 2003
METHODS
Comparative Gene Prediction in Human and Mouse
Genís Parra¹, Pankaj Agarwal², Josep F. Abril¹, Thomas Wiehe³, James W. Fickett⁴, and Roderic Guigó¹,⁵
¹Grup de Recerca en Informàtica Biomèdica, Institut Municipal d'Investigació Mèdica / Universitat Pompeu Fabra / Centre de Regulació Genòmica, 08003 Barcelona, Catalonia, Spain
²GlaxoSmithKline, King of Prussia, Pennsylvania 19406, USA
³Freie Universität Berlin and Berlin Center for Genome Based Bioinformatics (BCB), 14195 Berlin, Germany
⁴AstraZeneca R&D Boston, Waltham, Massachusetts 02451, USA
The completion of the sequencing of the mouse genome
promises to help predict human genes with greater accuracy. While
current ab initio gene prediction programs are remarkably sensitive
(i.e., they predict at least a fragment of most genes), their
specificity is often low; they predict a large number of false-positive
genes in the human genome. Sequence conservation at the protein level
with the mouse genome can help eliminate some of those false positives.
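In gene-prediction benchmarks, sensitivity and specificity are usually measured at the nucleotide, exon, or gene level, and specificity is conventionally TP/(TP+FP), the fraction of predicted coding sequence that is real, rather than the TN-based definition used in diagnostics. The Python sketch below computes the nucleotide-level version under simplifying assumptions (half-open coordinates on a single strand, reading frame ignored); the function names are illustrative and not part of SGP2.

```python
def coding_positions(intervals):
    """Expand half-open (start, end) intervals into a set of coding positions."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end))
    return positions


def sensitivity_specificity(annotated, predicted):
    """Nucleotide-level Sn = TP/(TP+FN) and Sp = TP/(TP+FP)."""
    real = coding_positions(annotated)
    pred = coding_positions(predicted)
    tp = len(real & pred)  # coding bases predicted as coding
    fn = len(real - pred)  # coding bases missed by the prediction
    fp = len(pred - real)  # non-coding bases predicted as coding
    sn = tp / (tp + fn) if real else 0.0
    sp = tp / (tp + fp) if pred else 0.0
    return sn, sp


if __name__ == "__main__":
    annotated = [(100, 200), (300, 400)]              # two real coding exons
    predicted = [(100, 200), (300, 400), (600, 800)]  # both found plus one false exon
    sn, sp = sensitivity_specificity(annotated, predicted)
    print(f"Sn={sn:.2f} Sp={sp:.2f}")
```

With two real exons and one spurious prediction, every coding base is found (Sn = 1.00) but only half of the predicted coding bases are real (Sp = 0.50): high sensitivity with low specificity, the pattern described above for ab initio predictors.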
Here we describe SGP2, a gene prediction program that combines ab
initio gene prediction with TBLASTX searches between two genome
sequences to provide both sensitive and specific gene predictions. The
accuracy of SGP2 when used to predict genes by comparing the human and
mouse genomes is assessed on a number of data sets, including
single-gene data sets, the highly curated human chromosome 22
predictions, and whole-genome predictions from ENSEMBL. The results
indicate that SGP2 outperforms purely ab initio gene prediction
methods and that it works about as well with 3x
shotgun data as it does with fully assembled genomes. SGP2 provides a
high enough specificity that its predictions can be experimentally
verified at a reasonable cost. SGP2 was used to generate complete sets
of gene predictions for both human and mouse by comparing the genomes
of the two species. Our results suggest that another few
thousand human and mouse genes currently not in ENSEMBL are
worth verifying experimentally.
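The abstract does not give SGP2's scoring formula. As a rough illustration only, the Python sketch below shows one way protein-level conservation can raise specificity: candidate exons that overlap TBLASTX HSPs (assumed to be already projected onto the genome being annotated) receive a score bonus, unsupported exons receive a penalty, and the best-scoring set of non-overlapping exons is then chosen by dynamic programming. The classes `Exon` and `HSP`, the `weight` and `penalty` values, and the omission of strand, frame, and splice-site compatibility are all hypothetical simplifications, not the published SGP2 method.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    start: int    # half-open [start, end) on the annotated genome
    end: int
    score: float  # ab initio score (hypothetical units)


@dataclass(frozen=True)
class HSP:
    start: int    # TBLASTX HSP projected onto the annotated genome (assumed)
    end: int
    score: float  # alignment score, assumed spread evenly along the HSP


def rescore(exons, hsps, weight=0.5, penalty=-2.0):
    """Return exons with a conservation term added (illustrative rule).

    Each overlapping HSP contributes weight * score * (overlap / HSP length);
    exons that overlap no HSP receive the fixed penalty instead.
    """
    rescored = []
    for ex in exons:
        bonus, supported = 0.0, False
        for h in hsps:
            overlap = min(ex.end, h.end) - max(ex.start, h.start)
            if overlap > 0:
                supported = True
                bonus += weight * h.score * overlap / (h.end - h.start)
        rescored.append(Exon(ex.start, ex.end, ex.score + (bonus if supported else penalty)))
    return rescored


def best_chain(exons):
    """Highest-scoring set of non-overlapping exons (weighted interval scheduling)."""
    exons = sorted(exons, key=lambda e: e.end)
    ends = [e.end for e in exons]
    best = [0.0] * (len(exons) + 1)  # best[i]: optimum over the first i exons
    take = [False] * (len(exons) + 1)
    for i, ex in enumerate(exons, start=1):
        # Number of earlier exons that end at or before this one starts.
        j = bisect_right(ends, ex.start, 0, i - 1)
        if best[j] + ex.score > best[i - 1]:
            best[i], take[i] = best[j] + ex.score, True
        else:
            best[i] = best[i - 1]
    chosen, i = [], len(exons)
    while i > 0:  # trace back through the DP table
        if take[i]:
            ex = exons[i - 1]
            chosen.append(ex)
            i = bisect_right(ends, ex.start, 0, i - 1)
        else:
            i -= 1
    return chosen[::-1]


if __name__ == "__main__":
    exons = [Exon(100, 200, 1.0), Exon(300, 400, 0.5),
             Exon(350, 450, 0.8), Exon(600, 700, 1.5)]
    hsps = [HSP(110, 190, 40.0), HSP(300, 380, 20.0)]
    print([(e.start, e.end) for e in best_chain(exons)])
    print([(e.start, e.end) for e in best_chain(rescore(exons, hsps))])
```

On this toy data the ab initio scores alone select the exons at 100-200, 350-450, and 600-700. After rescoring, the exon at 600-700, which has no conservation support, is dropped, and the conserved exon at 300-400 replaces the one at 350-450.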
⁵Corresponding author. E-mail rguigo{at}imim.es; fax 34 93 224-0875.
Article and publication are at
http://www.genome.org/cgi/doi/10.1101/gr.871403.

This article has been cited by other articles:

C. Ansong, S. O. Purvine, J. N. Adkins, M. S. Lipton, and R. D. Smith. Proteogenomics: needs and roles to be filled by proteomics in genome annotation. Brief Funct Genomic Proteomic, March 10, 2008; eln010v1.

M. J. Fullwood, J. J. S. Tan, P. W. P. Ng, K. P. Chiu, J. Liu, C. L. Wei, and Y. Ruan. The use of multiple displacement amplification to amplify complex DNA libraries. Nucleic Acids Res., March 1, 2008; 36(5): e32.

A. Siepel, M. Diekhans, B. Brejova, L. Langton, M. Stevens, C. L. G. Comstock, C. Davis, B. Ewing, S. Oommen, C. Lau, et al. Targeted discovery of novel human exons by comparative genomics. Genome Res., December 1, 2007; 17(12): 1763-1773.

A. M. Andres, C. de Hemptinne, and J. Bertranpetit. Heterogeneous Rate of Protein Evolution in Serotonin Genes. Mol. Biol. Evol., December 1, 2007; 24(12): 2707-2715.

R. Lyle, P. Prandini, K. Osoegawa, B. ten Hallers, S. Humphray, B. Zhu, E. Eyras, R. Castelo, C. P. Bird, S. Gagos, et al. Islands of euchromatin-like sequence and expressed polymorphic sequences within the short arm of human chromosome 21. Genome Res., November 1, 2007; 17(11): 1690-1696.

L. A. Cogburn, T. E. Porter, M. J. Duclos, J. Simon, S. C. Burgess, J. J. Zhu, H. H. Cheng, J. B. Dodgson, and J. Burnside. Functional Genomics of the Chicken: A Model Organism. Poult. Sci., October 1, 2007; 86(10): 2059-2094.

A. Coghlan and R. Durbin. Genomix: a method for combining gene-finders' predictions, which uses evolutionary conservation of sequence and intron-exon structure. Bioinformatics, June 15, 2007; 23(12): 1468-1475.

T. R. Gingeras. Origin of phenotypes: Genes and transcripts. Genome Res., June 1, 2007; 17(6): 682-690.

G. Parra, K. Bradnam, and I. Korf. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics, May 1, 2007; 23(9): 1061-1067.

H. Mix, A. V. Lobanov, and V. N. Gladyshev. SECIS elements in the coding regions of selenoprotein transcripts are functional in higher eukaryotes. Nucleic Acids Res., January 28, 2007; 35(2): 414-423.

S. J. Hsieh, C. Y. Lin, N. H. Liu, W. Y. Chow, and C. Y. Tang. GeneAlign: a coding exon prediction tool based on phylogenetical comparisons. Nucleic Acids Res., July 1, 2006; 34(Web Server issue): W280-W284.

R. Agrawal and G. D. Stormo. Using mRNAs lengths to accurately predict the alternatively spliced gene products in Caenorhabditis elegans. Bioinformatics, May 15, 2006; 22(10): 1239-1244.

M. J. van Baren and M. R. Brent. Iterative gene prediction and pseudogene removal improves genome annotation. Genome Res., May 1, 2006; 16(5): 678-685.

D. W. Burt. Chicken genome: Current status and future opportunities. Genome Res., December 1, 2005; 15(12): 1692-1698.

M. R. Brent. Genome annotation past, present, and future: How to define an ORF at each locus. Genome Res., December 1, 2005; 15(12): 1777-1786.

J. E. Allen and S. L. Salzberg. JIGSAW: integration of multiple sources of evidence for gene prediction. Bioinformatics, September 15, 2005; 21(18): 3596-3603.

R. Castelo, A. Reymond, C. Wyss, F. Camara, G. Parra, S. E. Antonarakis, R. Guigo, and E. Eyras. Comparative gene finding in chicken indicates that we are closing in on the set of multi-exonic widely expressed human genes. Nucleic Acids Res., April 4, 2005; 33(6): 1935-1939.

L. Ding, A. Sabo, N. Berkowicz, R. R. Meyer, Y. Shotland, M. R. Johnson, K. H. Pepin, R. K. Wilson, and J. Spieth. EAnnot: A genome annotation tool using experimental evidence. Genome Res., December 1, 2004; 14(12): 2503-2509.

L. Taher, O. Rinner, S. Garg, A. Sczyrba, and B. Morgenstern. AGenDA: gene prediction by cross-species sequence comparison. Nucleic Acids Res., July 1, 2004; 32(suppl_2): W305-W308.

M. Stanke, R. Steinkamp, S. Waack, and B. Morgenstern. AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic Acids Res., July 1, 2004; 32(suppl_2): W309-W312.

E. Birney, M. Clamp, and R. Durbin. GeneWise and Genomewise. Genome Res., May 1, 2004; 14(5): 988-995.

C. Dewey, J. Q. Wu, S. Cawley, M. Alexandersson, R. Gibbs, and L. Pachter. Accurate Identification of Novel Human Genes Through Simultaneous Gene Prediction in Human, Mouse, and Rat. Genome Res., April 1, 2004; 14(4): 661-664.

J. Q. Wu, D. Shteynberg, M. Arumugam, R. A. Gibbs, and M. R. Brent. Identification of Rat Genes by TWINSCAN Gene Prediction, RT-PCR, and Direct Sequencing. Genome Res., April 1, 2004; 14(4): 665-671.

K. Chakrabarti and L. Pachter. Visualization of Multiple Genome Annotations and Alignments With the K-BROWSER. Genome Res., April 1, 2004; 14(4): 716-720.

I. M. Meyer and R. Durbin. Gene structure conservation aids similarity based gene prediction. Nucleic Acids Res., February 4, 2004; 32(2): 776-783.

D. F. Kinane and T. C. Hart. Genes and gene polymorphisms associated with periodontal disease. Crit. Rev. Oral Biol. Med., November 1, 2003; 14(6): 430-449.

S. Cawley, L. Pachter, and M. Alexandersson. SLAM web server for comparative gene finding and alignment. Nucleic Acids Res., July 1, 2003; 31(13): 3507-3509.

R. Guigo, E. T. Dermitzakis, P. Agarwal, C. P. Ponting, G. Parra, A. Reymond, J. F. Abril, E. Keibler, R. Lyle, C. Ucla, et al. Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes. PNAS, February 4, 2003; 100(3): 1140-1145.