Vol. 9, Issue 3, 297-307, March 1999
METHODS
Analysis of Sequence-Tagged-Connector Strategies for DNA Sequencing
Andrew F. Siegel,1,3 Barbara Trask,2 Jared C. Roach,2 Gregory G. Mahairas,2 Leroy Hood,2 and Ger van den Engh2
1 Departments of Management Science, Finance, and
Statistics and 2 Department of Molecular Biotechnology,
University of Washington, Seattle, Washington 98195 USA
The BAC-end sequencing, or sequence-tagged-connector (STC), approach
to genome sequencing involves sequencing the ends of BAC inserts to
scatter sequence tags (STCs) randomly across the genome. Once any BAC
or other large segment of DNA is sequenced to completion by
conventional shotgun approaches, these STC tags can be used to identify
a minimum tiling path of BAC clones overlapping the nucleation sequence
for sequence extension. Here, we explore the properties of
STC-sequencing strategies within a mathematical model of a random
target with homologous repeats and imperfect sequencing technology to
understand how varying the model parameters affects the
incidence of problem clones and the cost of the sequencing project.
Problem clones are defined as clones for which either (A) there is no
identifiable overlapping STC to extend the sequence in a particular
direction or (B) the identified STC with minimum overlap comes from a
nonoverlapping clone, either owing to random false matches or
repeat-family homology. Based on the minimum overlap, we estimate the
number of clones to be entirely sequenced and, then, using cost
estimates, identify the decision rule (the degree of sequence
similarity required before a match is declared between an STC and a
clone) to minimize overall sequencing cost. A method to optimize the
overlap decision rule is highly desirable, because both the total cost
and the number of problem clones are shown to be highly sensitive to
this choice. For a target of 3 Gb containing ~800 Mb of repeats with
85%-90% identity, we expect <10 problem clones with 15-fold
coverage by 150-kb clones. We derive the optimal redundancy and insert
sizes of clone libraries for sequencing genomes of various sizes, from
microbial to human. We estimate that establishing the resource of STCs
as a means of identifying minimally overlapping clones represents only
1%-3% of the total cost of sequencing the human genome, and, up to a
point of diminishing returns, a larger STC resource is associated with
a smaller total sequencing cost.
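
As a rough illustration of the scale implied by the example above (3 Gb target, 150-kb clones, 15-fold coverage), the following back-of-the-envelope sketch computes the size of the clone library and the average spacing between STCs. It is not the model analyzed in the paper: it assumes two successful end reads per clone and uniformly random clone placement, and it ignores repeats, failed reads, and sequencing errors. The function name and dictionary keys are hypothetical.

```python
def stc_library_stats(genome_bp, clone_bp, coverage):
    """Rough library-size arithmetic for an STC resource.

    Assumptions (not taken from the paper's model): every clone yields
    two usable end reads, and clone ends fall uniformly at random.
    """
    # Clones needed so that total insert length = coverage * genome size.
    n_clones = coverage * genome_bp / clone_bp
    # One STC from each end of every clone.
    n_stcs = 2 * n_clones
    # Average distance between neighboring STCs along the genome.
    mean_spacing_bp = genome_bp / n_stcs
    # Expected number of STCs that land inside one fully sequenced clone.
    stcs_per_clone = clone_bp / mean_spacing_bp
    return {
        "clones": n_clones,
        "stcs": n_stcs,
        "mean_spacing_bp": mean_spacing_bp,
        "stcs_per_clone": stcs_per_clone,
    }


if __name__ == "__main__":
    stats = stc_library_stats(genome_bp=3_000_000_000,
                              clone_bp=150_000,
                              coverage=15)
    for name, value in stats.items():
        print(f"{name}: {value:,.0f}")
```

Under these simplifying assumptions, the example corresponds to about 300,000 clones and 600,000 STCs, or one STC roughly every 5 kb, so a sequenced 150-kb clone would be expected to contain about 30 STCs from other clones.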
3 Corresponding author.
1 These sequencing error rates are considerably higher
than those currently achievable for substitution errors in single reads
but were chosen to improve robustness of the results to polymorphism
(<1%) as well as to encompass insertion and deletion errors
(1%-2%) in single reads.
9:297-307 ©1999 by Cold Spring Harbor Laboratory Press ISSN 1088-9051/99 $5.00