Genome Research


Vol. 9, Issue 3, 297-307, March 1999

METHODS
Analysis of Sequence-Tagged-Connector Strategies for DNA Sequencing

Andrew F. Siegel,1,3 Barbara Trask,2 Jared C. Roach,2 Gregory G. Mahairas,2 Leroy Hood,2 and Ger van den Engh2

1  Departments of Management Science, Finance, and Statistics and 2 Department of Molecular Biotechnology, University of Washington, Seattle, Washington 98195 USA

The BAC-end sequencing, or sequence-tagged-connector (STC), approach to genome sequencing involves sequencing the ends of BAC inserts to scatter sequence tags (STCs) randomly across the genome. Once any BAC or other large segment of DNA is sequenced to completion by conventional shotgun approaches, these STC tags can be used to identify a minimum tiling path of BAC clones overlapping the nucleation sequence for sequence extension. Here, we explore the properties of STC-sequencing strategies within a mathematical model of a random target with homologous repeats and imperfect sequencing technology to understand how varying key parameters affects the incidence of problem clones and the cost of the sequencing project. Problem clones are defined as clones for which either (A) there is no identifiable overlapping STC to extend the sequence in a particular direction or (B) the identified STC with minimum overlap comes from a nonoverlapping clone, owing either to random false matches or to repeat-family homology. Based on the minimum overlap, we estimate the number of clones to be entirely sequenced and then, using cost estimates, identify the decision rule (the degree of sequence similarity required before a match is declared between an STC and a clone) that minimizes overall sequencing cost. A method to optimize the overlap decision rule is highly desirable, because both the total cost and the number of problem clones are shown to be highly sensitive to this choice. For a target of 3 Gb containing ~800 Mb of repeats with 85%-90% identity, we expect <10 problem clones with 15-fold coverage by 150-kb clones. We derive the optimal redundancy and insert sizes of clone libraries for sequencing genomes of various sizes, from microbial to human. We estimate that establishing the resource of STCs as a means of identifying minimally overlapping clones represents only 1%-3% of the total cost of sequencing the human genome, and, up to a point of diminishing returns, a larger STC resource is associated with a smaller total sequencing cost.
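
To give a feel for the scale of the quantities in the abstract, the following is a minimal back-of-the-envelope sketch in Python. It is not the model analyzed in the paper: it assumes clones are placed uniformly at random (a simple Poisson approximation), ignores repeats, sequencing errors, and the minimum detectable overlap, and therefore addresses only an idealized version of type (A) problem clones. The function name `stc_library_stats` and its parameter names are hypothetical, chosen for illustration.

```python
import math


def stc_library_stats(genome_bp, clone_bp, coverage):
    """Idealized STC library statistics under a uniform random-placement assumption.

    This is an illustrative simplification, not the paper's model: it ignores
    repeats, sequencing errors, and any minimum overlap needed to declare a match.
    """
    # Number of clones needed to reach the requested fold coverage.
    n_clones = coverage * genome_bp / clone_bp
    # Each clone contributes two end sequences (STCs).
    n_stcs = 2 * n_clones
    # Average distance between neighboring STCs along the genome.
    mean_stc_spacing_bp = genome_bp / n_stcs
    # A clone can extend a sequenced region to the right if its left end falls
    # within the last clone_bp bases of that region. The expected number of such
    # clones equals the fold coverage.
    expected_extenders = n_clones * clone_bp / genome_bp
    # Poisson approximation: probability that no such clone exists.
    p_no_extender = math.exp(-expected_extenders)
    return {
        "n_clones": n_clones,
        "n_stcs": n_stcs,
        "mean_stc_spacing_bp": mean_stc_spacing_bp,
        "expected_extenders": expected_extenders,
        "p_no_extender": p_no_extender,
    }


if __name__ == "__main__":
    # Parameters quoted in the abstract: 3-Gb target, 150-kb clones, 15-fold coverage.
    stats = stc_library_stats(3_000_000_000, 150_000, 15)
    for name, value in stats.items():
        print(f"{name}: {value:g}")
```

With the abstract's parameters, this sketch gives 300,000 clones and an average of one STC every 5 kb. In this idealized setting the chance of finding no extending clone in a given direction is exp(-15), which is very small, so the problem clones that remain in the full model would be expected to arise mainly from the effects omitted here, such as repeat-family homology and false matches.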


3   Corresponding author.
1   These sequencing error rates are considerably higher than those currently achievable for substitution errors in single reads but were chosen to improve robustness of the results to polymorphism (<1%) as well as to encompass insertion and deletion errors (1%-2%) in single reads.


9:297-307 ©1999 by Cold Spring Harbor Laboratory Press  ISSN 1088-9051/99 $5.00
