Genome Res. 15:54-66, 2005
©2005 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/05 $5.00
Methods
Gene and alternative splicing annotation with AIR
Liliana Florea1,4,5, Valentina Di Francesco2, Jason Miller1, Russell Turner1, Alison Yao2, Michael Harris2, Brian Walenz1, Clark Mobarry1, Gennady V. Merkulov3, Rosane Charlab3, Ian Dew1, Zuoming Deng3, Sorin Istrail1, Peter Li2 and Granger Sutton1
1 Informatics Research, Applied Biosystems, Rockville, Maryland 20850, USA
2 Advanced Solutions, Celera Genomics, Rockville, Maryland 20850, USA
3 Scientific Content and Applications, Celera Genomics, Rockville, Maryland 20850, USA
Designing effective and accurate tools for identifying the functional and structural elements in a genome remains at the frontier of genome annotation owing to incompleteness and inaccuracy of the data, limitations in the computational models, and shifting paradigms in genomics, such as alternative splicing. We present a methodology for the automated annotation of genes and their alternatively spliced mRNA transcripts based on existing cDNA and protein sequence evidence from the same species or projected from a related species using syntenic mapping information. At the core of the method is the splice graph, a compact representation of a gene, its exons, introns, and alternatively spliced isoforms. The putative transcripts are enumerated from the graph and assigned confidence scores based on the strength of sequence evidence, and a subset of the high-scoring candidates is selected and promoted into the annotation. The method is highly selective, eliminating the unlikely candidates while retaining 98% of the high-quality mRNA evidence in well-formed transcripts, and produces annotation that is measurably more accurate than some evidence-based gene sets. The process is fast, accurate, and fully automated, and combines the traditionally distinct gene annotation and alternative splicing detection processes in a comprehensive and systematic way, thus considerably aiding the ensuing manual curation efforts.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.2889405.
4 Present address: Department of Computer Science, George Washington University, Washington, DC 20052, USA.
5 Corresponding author. E-mail florea{at}gwu.edu; fax (240) 453-3324.
[Supplemental material is available online at www.genome.org and https://panther.appliedbiosystems.com/publications.jsp.]
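
The splice-graph idea summarized in the abstract can be illustrated with a small sketch. This is not the AIR implementation: the function names, the exon coordinates, and in particular the scoring rule (the weakest evidence support along a path) are assumptions made only for illustration. Exons are nodes, introns observed in the evidence are edges, candidate transcripts are source-to-sink paths, and only candidates whose score reaches a threshold are kept.

```python
from collections import defaultdict


def build_splice_graph(evidence):
    """Collect exons (nodes) and evidence-supported introns (edges).

    evidence: list of aligned sequences, each a list of (start, end) exon
    tuples in genomic order. support[(a, b)] counts how many evidence
    sequences splice exon a directly to exon b.
    """
    exons = set()
    support = defaultdict(int)
    for transcript in evidence:
        exons.update(transcript)
        for a, b in zip(transcript, transcript[1:]):
            support[(a, b)] += 1
    return exons, dict(support)


def enumerate_transcripts(exons, support):
    """Enumerate every path from an exon with no predecessor to one with no successor."""
    successors = defaultdict(list)
    has_predecessor = set()
    for a, b in support:
        successors[a].append(b)
        has_predecessor.add(b)

    paths = []

    def walk(path):
        next_exons = sorted(successors.get(path[-1], []))
        if not next_exons:
            paths.append(path)
            return
        for exon in next_exons:
            walk(path + [exon])

    for source in sorted(e for e in exons if e not in has_predecessor):
        walk([source])
    return paths


def score(path, support):
    """Hypothetical confidence score: the weakest intron support on the path.

    Single-exon paths have no introns and score 0 in this toy rule.
    """
    return min((support[(a, b)] for a, b in zip(path, path[1:])), default=0)


def select_transcripts(paths, support, min_score):
    """Keep candidates scoring at least min_score, best first."""
    kept = [p for p in paths if score(p, support) >= min_score]
    return sorted(kept, key=lambda p: score(p, support), reverse=True)


# Made-up coordinates: two sequences support the full-length isoform,
# one supports an isoform that skips exon B.
A, B, C, D = (100, 200), (300, 400), (500, 600), (700, 800)
evidence = [[A, B, C, D], [A, B, C, D], [A, C, D]]

exons, support = build_splice_graph(evidence)
candidates = enumerate_transcripts(exons, support)
selected = select_transcripts(candidates, support, min_score=2)
```

With this toy evidence the graph yields two candidates, the full-length isoform and the exon-skipping isoform, and a threshold of 2 keeps only the former. A real scoring scheme would weigh evidence quality and type rather than simple counts.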
