Genome Research

Genome Res. 6:829-845, 1996
©1996 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051

Toward the development of a gene index to the human genome: an assessment of the nature of high-throughput EST sequence data.

J S Aaronson, B Eckman, R A Blevins, J A Borkowski, J Myerson, S Imran, and K O Elliston

Merck Research Laboratories, Department of Bioinformatics, Rahway, New Jersey 07065, USA. aaronson@merck.com

Abstract

A rigorous analysis of the Merck-sponsored EST data with respect to known gene sequences increases the utility of the data set and helps refine methods for building a gene index. A highly curated human transcript data base was used as a reference data set of known genes. A detailed analysis of EST sequences derived from known genes was performed to assess the accuracy of EST sequence annotation. The EST data was screened to remove low-quality and low-complexity sequences. A set of high-quality ESTs similar to the transcript data base was identified using BLAST; this subset of ESTs was compared with the set of known genes using the Smith-Waterman algorithm. Error rates of several types were assessed based on a flexible match criterion defining sequence identity. The rate of lane-tracking errors is very low, approximately 0.5%. Insert size data is accurate within approximately 20%. Reversed clone and internal priming error rates are approximately 5% and 2.5%, respectively, contributing to the incorrect identification of reads as 3' ends of genes. Follow-up investigation reveals that a significant number of clones, miscategorized as reversed, represent overlapping genes on the opposite strand of entries in the transcript data base. Relevance of these results to the creation of a high-quality index to the human genome capable of supporting diverse genomic investigations is discussed.
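
The abstract names the Smith-Waterman algorithm for comparing BLAST-selected ESTs with known transcripts, but it does not give the scoring scheme or the match criterion. The following is therefore only an illustrative sketch of Smith-Waterman local alignment with a linear gap penalty. The function names (smith_waterman, percent_identity) and the scores (match=2, mismatch=-1, gap=-2) are assumptions chosen for the example, not the authors' settings.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment.

    Scores are illustrative assumptions, not values from the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns that are identical, non-gap characters."""
    if not aligned_a:
        return 0.0
    same = sum(1 for p, q in zip(aligned_a, aligned_b) if p == q and p != "-")
    return 100.0 * same / len(aligned_a)


if __name__ == "__main__":
    # Hypothetical EST read and transcript fragment, for illustration only.
    score, est_aln, ref_aln = smith_waterman("TTACGTTT", "GGACGTGG")
    print(score, est_aln, ref_aln, percent_identity(est_aln, ref_aln))
```

A flexible match criterion of the kind the abstract mentions could then be expressed as a threshold on percent_identity together with a minimum aligned length. Any specific cutoff would be an assumption, since the abstract does not state one.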








