Published online before print March 17, 2008, 10.1101/gr.070227.107
Genome Res. 18:763-770, 2008
©2008 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/08 $5.00

Methods

Quality scores and SNP detection in sequencing-by-synthesis systems

William Brockman1,3,4, Pablo Alvarez1,3,5, Sarah Young1, Manuel Garber1, Georgia Giannoukos1, William L. Lee1, Carsten Russ1, Eric S. Lander1,2, Chad Nusbaum1, and David B. Jaffe1,6

1 Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02141, USA; 2 Whitehead Institute for Biomedical Research, MIT, Cambridge, Massachusetts 02139, USA

Promising new sequencing technologies, based on sequencing-by-synthesis (SBS), are starting to deliver large amounts of DNA sequence at very low cost. Polymorphism detection is a key application. We describe general methods for improved quality scores and accurate automated polymorphism detection, and apply them to data from the Roche (454) Genome Sequencer 20. We assess our methods using known-truth data sets, which is critical to the validity of the assessments. We developed informative, base-by-base error predictors for this sequencer and used a variant of the phred binning algorithm to combine them into a single empirically derived quality score. These quality scores are more useful than those produced by the system software: They both better predict actual error rates and identify many more high-quality bases. We developed a SNP detection method, with variants for low coverage, high coverage, and PCR amplicon applications, and evaluated it on known-truth data sets. We demonstrate good specificity in single reads, and excellent specificity (no false positives in 215 kb of genome) in high-coverage data.
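As a rough illustration of what an "empirically derived quality score" means, the sketch below groups bases by a predictor bin, counts errors against a known-truth data set, and converts each bin's observed error rate to the standard phred scale, Q = -10 log10(p). It is not the authors' variant of the phred binning algorithm. The function names, the input layout, the add-one smoothing, and the quality cap of 60 are all assumptions made for the example.

```python
import math
from collections import defaultdict


def phred(error_rate, cap=60):
    """Convert an error probability to a phred-scaled quality, Q = -10*log10(p).

    The cap is an illustrative assumption; it keeps Q finite when p is 0.
    """
    if error_rate <= 0:
        return cap
    return min(cap, round(-10 * math.log10(error_rate)))


def empirical_quality_table(observations):
    """Build a {predictor_bin: quality} table from known-truth observations.

    observations: iterable of (predictor_bin, is_error) pairs, one per base,
    where is_error says whether the base call disagreed with the known truth.
    This input layout is hypothetical, not taken from the paper.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for bin_key, is_error in observations:
        totals[bin_key] += 1
        errors[bin_key] += int(is_error)
    # Add-one smoothing (an assumption) so that a bin with zero observed
    # errors is not assigned an unbounded quality.
    return {k: phred((errors[k] + 1) / (totals[k] + 2)) for k in totals}


if __name__ == "__main__":
    # Toy data: bin "a" has 98 correct bases; bin "b" has 17 correct, 1 wrong.
    obs = [("a", False)] * 98 + [("b", False)] * 17 + [("b", True)]
    print(empirical_quality_table(obs))
```

In a table like this, each base's score reflects the error rate actually observed for similar bases in the truth set, which is the sense in which a quality score can be said to predict actual error rates.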


3 These authors contributed equally to this work.

4 Present addresses: Google, Inc., Cambridge, MA 02142, USA;

5 Akamai, Cambridge, MA 02142, USA.

6 Corresponding author.

E-mail jaffe@broad.mit.edu; fax (617) 452-4588.

[Supplemental material is available online at www.genome.org.]

Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.070227.107.







