Vol. 9, Issue 9, 853-858, September 1999
LETTER
ABSTRACT
We report results from a microdevice for DNA sequencing, using samples from chromosome 17 obtained from the Whitehead Institute Center for Genome Research (WICGR) production line. The device had an effective separation distance of 11.5 cm and a lithographically defined injection width of 150 µm. The four-color raw data were processed, base-called by the sequencing software Trout, and compared with the corresponding ABI 377 sequence from WICGR. Using a criterion of 99% accuracy, we achieved average continuous reads of 505 bases in 27 min with 3% linear polyacrylamide (LPA) at 150 V/cm, and 460 bases in 22 min with 4% LPA at 200 V/cm, both at 45°C. In the best case, up to 565 bases could be base-called with the same accuracy in <25 min. In some instances, Trout allowed accurate base-calling at resolutions as low as R = 0.35. This may be due in part to the high signal-to-noise ratio of the microdevice. Unlike many results reported on capillary machines, no sample cleanup other than ethanol precipitation was required. In addition, DNA fragment biasing (i.e., discrimination against larger fragments) was significantly reduced by the unique sample injection mechanism of the microfabricated device. This led to increased signal strength for long fragments, which is of great importance for the high performance of the microdevice.
INTRODUCTION
Significant advances in DNA analysis technology are expected from the use of microfabricated electrophoretic devices for sequencing and genotyping. In this approach, photolithography, combined with wet etching and thermal wafer bonding, is used to construct enclosed, intricate microchannel structures in glass and fused-silica substrates; these structures are then used for electrophoresis (Harrison et al. 1993). It has been speculated that these devices will allow DNA separations approaching the theoretical limits of electrophoresis, in a format that reduces analysis time and extends parallelism and automation (Freemantle 1999), and might therefore increase throughput well beyond that of current capillary array machines. For example, in recent experiments we demonstrated genotyping on microdevices with analysis times reduced 10-fold compared with capillaries and 100-fold compared with slab gels (Schmalzing et al. 1997, 1999). DNA sequencing of single-color pGEM and four-color M13 DNA standard sequencing samples has been demonstrated on 3.5-, 11.5-, and 7-cm-long microdevices (Woolley et al. 1995; Schmalzing et al. 1998; Liu et al. 1999). The feasibility of ultra-high sample throughput has been shown through still-modest multiplexing of up to 96 microchannels (Simpson et al. 1998; Koutny et al. 1999). However, to the best of our knowledge, all published studies of DNA sequencing on microdevices have used DNA standard samples such as M13 or pGEM. Practical sequencing must deal with additional factors typical of production DNA sequencing samples, such as variable salt and template concentrations (Ruiz-Martinez et al. 1998; Salas-Solano et al. 1998), highly sample-specific compression regions, and the interplay between electrophoretic separation and base-calling software. We report initial results on how microdevices perform under practical conditions, using DNA sequencing samples prepared for high-throughput, cost-sensitive sequencing under the Human Genome Project. Our results suggest that much of the anticipated throughput improvement for microdevice sequencing is feasible.
RESULTS AND DISCUSSION
The electrophoretic microdevice used in this study consists of a 12.5-cm-long straight separation channel with an effective separation distance of 11.5 cm from the injection point to the detection point (see Fig. 1A). The channel cross section is nearly semicircular, with a 40-µm radius. A 1.0-cm-long loading channel, which connects the sample and waste reservoirs, intersects the separation channel 0.5 cm from its cathodic end. The intersection geometry defines a 150-µm-long injection plug of ~0.5-nl volume. The loading and injection mechanism is illustrated in Figure 1B. The channel surfaces were passivated to suppress electro-osmotic flow and minimize sample adsorption (Hjerten 1985). In this study we used a replaceable, high-molecular-weight linear polyacrylamide (LPA) separation matrix in 1× TBE with 3.5 M urea and 30% (vol/vol) formamide, chosen for its extremely high performance in DNA sequencing applications (Carrilho et al. 1996). The analysis temperature was 45°C.
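As a rough check on the ~0.5-nl plug volume quoted above, the sketch below multiplies the area of an ideal semicircle of 40-µm radius by the 150-µm plug length. The ideal-semicircle geometry is our simplifying assumption (the text describes the cross section only as nearly semicircular), so the result, about 0.38 nl, should be read as agreeing with the quoted value in order of magnitude rather than exactly.

```python
import math

# Geometry from the text; the ideal semicircular cross section is an assumption.
RADIUS_UM = 40.0        # channel radius (um)
PLUG_LENGTH_UM = 150.0  # injection-plug length set by the channel intersection (um)
UM3_PER_NL = 1.0e6      # 1 nl = 1e-12 m^3 = 1e6 um^3


def semicircle_area_um2(radius_um: float) -> float:
    """Cross-sectional area of an ideal semicircular channel (um^2)."""
    return math.pi * radius_um ** 2 / 2.0


def plug_volume_nl(radius_um: float, length_um: float) -> float:
    """Volume (nl) of a plug with an ideal semicircular cross section."""
    return semicircle_area_um2(radius_um) * length_um / UM3_PER_NL


if __name__ == "__main__":
    print(f"Estimated plug volume: {plug_volume_nl(RADIUS_UM, PLUG_LENGTH_UM):.2f} nl")
```

A real wet-etched channel deviates from a perfect semicircle, which is one reason the estimate and the quoted ~0.5 nl need not coincide.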
We analyzed 12 different samples from chromosome 17 under two separation conditions. The samples were taken at random from the Whitehead Institute Center for Genome Research (WICGR) production line after ethanol precipitation; no additional sample treatment was performed. Six samples were sequenced under condition 1 [C1: 4% (wt/vol) LPA at 200 V/cm], and another six under condition 2 [C2: 3% (wt/vol) LPA at 150 V/cm]. Based on previous work, these conditions are expected to yield near-optimum electrophoretic performance of the microdevice for DNA sequencing (Schmalzing et al. 1998). The four-color raw data from the microdevice were base-called with the Trout sequencing software, and the results were then edited manually. Minor modifications of the Trout color matrix and temporal filters were used to adapt the software to the unusually high data rate and custom detector of the microelectrophoresis device. The final sequence was compared in blind tests with the sequence previously generated at WICGR on an ABI 377 sequencer. Table 1 summarizes the results. For these comparisons, we defined read length as the contiguous length of sequence, in bases, with a base-calling accuracy of 99%.
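The read-length definition above can be made concrete in several ways; the sketch below is one possible reading, not the procedure used for Table 1. It assumes the called sequence has already been aligned to the ABI 377 reference so that each position is flagged as correct or miscalled (the alignment step and the `errors` input are hypothetical here), reads the 99% criterion as a lower bound, and returns the longest contiguous window in which at most 1% of calls are wrong.

```python
from fractions import Fraction
from typing import Sequence, Tuple


def longest_accurate_read(
    errors: Sequence[bool],
    max_error_fraction: Fraction = Fraction(1, 100),
) -> Tuple[int, int]:
    """Return (start, length) of the longest contiguous window whose error
    fraction is at most `max_error_fraction` (1/100 means >= 99% accuracy).

    errors[i] is True if base i disagrees with the reference (hypothetical input).
    Exact Fraction arithmetic avoids floating-point edge cases at the threshold.
    """
    n = len(errors)
    # prefix[i] = number of miscalled bases among errors[:i]
    prefix = [0] * (n + 1)
    for i, is_error in enumerate(errors):
        prefix[i + 1] = prefix[i] + int(is_error)

    best_start, best_len = 0, 0
    for start in range(n):
        # Only windows longer than the current best can improve it; scan from
        # the longest down so the first hit is the longest valid window here.
        for end in range(n, start + best_len, -1):
            length = end - start
            if prefix[end] - prefix[start] <= max_error_fraction * length:
                best_start, best_len = start, length
                break
    return best_start, best_len


if __name__ == "__main__":
    # Toy error profile (illustrative only): 5 miscalls, then 100 correct calls.
    print(longest_accurate_read([True] * 5 + [False] * 100))
```

For the toy profile the function returns `(4, 101)`: a window may absorb one miscall once it spans at least 100 bases. The quadratic scan is adequate for reads of several hundred bases.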
Condition C1 produced an average read length of 460 bases [root-mean-square deviation (RSD) of 66 bases] in an average total run time of 21.7 min (RSD = 1.2 min). Condition C2 resulted in a somewhat longer average read length of 505 bases (RSD = 36 bases) in 26.7 min (RSD = 1.2 min). The longer reads under C2 can be attributed to the lower field strength and LPA concentration, both of which usually improve the electrophoretic separation of longer DNA fragments (Carrilho et al. 1996). Both conditions occasionally produced exceptionally long reads, for example, 520 bases for sample 3 under C1 and 565 bases for sample 7 under C2.
We evaluated several aspects of the errors in the called sequences. Most of the runs (9 of 12) were error-free between bases 100 and 450. Errors clustered at the beginning and the end of the runs. Because of anomalous electrophoretic migration early in the run, base-calling was unreliable below ~35 bases under both conditions. However, no valuable sequence information was lost, because most of the sequence in this region was M13 vector sequence. Some errors (20 in total) occurred between bases 35 and 100; several of them (7 of 20) could be attributed to strong compressions. Less severe compressions were also observed in other parts of the runs, but these could usually be resolved by manual editing. This observation suggests that even stronger denaturing conditions should be used in the future. C1 had noticeably fewer errors than C2 in this region (5 vs. 15), in agreement with the general finding that both higher field strength (less time for diffusion) and higher sieving-matrix concentration (smaller pore size) improve the separation of small fragments (Carrilho et al. 1996).
The frequency of base-calling errors increased steadily for the late-migrating fragments. The increase was much more gradual for C2 than for C1 because of the superior electrophoretic performance of C2 in this region, which outweighed the higher error rate at the beginning of the C2 runs. Interestingly, the vast majority of base-calling errors between bases 451 and 500 (19 of 23) were associated with multiplets. Typically, only n − 1 bases (n ≥ 2) were called when n bases were actually present in a given multiplet. Inspection of the raw data revealed that, in this region, multiplets began to appear as single broad peaks lacking the fine structure that directly indicates the number of bases in the multiplet. We speculate that further adaptation of the sequencing software to microdevice electrophoresis could reduce this type of error and lead to longer average read lengths. Beyond 500 bases, all types of base-calling errors were seen, and the sequences became unassignable at ~600 bases. DNA fragments were still separated by size in this region, but single-base resolution fell far below 0.5.
The question arises to what extent sequencing performance is limited by raw-data patterns specific to the microdevice format, combined with our current degree of optimization of the base-calling software. One way to express electrophoretic performance without these confounding factors is through resolution, which depends only on the separation mechanism and the peak-broadening mechanisms. A resolution of 0.5 is defined as the point at which the migration-time difference between two Gaussian peaks equals their average full width at half maximum (Luckey et al. 1993). This resolution is commonly taken as the minimum requirement for accurate, robust sequencing (Best et al. 1994), although specialized base-calling software has been reported to operate at resolutions as low as 0.25 (B.L. Karger, unpubl.). In Figure 2 we plot average resolution as a function of DNA fragment size for the two electrophoretic conditions C1 and C2 (n = 6 in both cases). Both curves have the typical shape of resolution curves in DNA sequencing. Resolution was lower at the beginning of the runs, where good selectivity was offset by high diffusion, and at the end, where poor selectivity outweighed the benefit of low diffusion. The best performance was found in the mid-range of fragment sizes, where selectivity and diffusion were balanced. Assuming a minimum resolution criterion of 0.5, Figure 2 gives average read lengths of 415 bases for C1 (from base 35 to 450) and 475 bases for C2 (from base 35 to 510). Trout extended these read lengths on average by 45 bases for C1 and by 30 bases for C2. For sample 7, which gave the longest read (565 bases), Trout extended the read length by 90 bases beyond what would be expected from the minimum resolution criterion of 0.5, reading the sequence without errors up to base 600, where the resolution was only 0.35.
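The read-length bookkeeping behind the resolution criterion can be sketched as follows. Given a resolution curve sampled at a few fragment sizes, the function finds where the curve first drops below a threshold after its maximum and subtracts the first reliably called base. The threshold of 0.5 and the starting base of 35 come from the text; the linear interpolation between samples and the example curve are our own assumptions, and the curve values are invented for illustration (they are not the Figure 2 data).

```python
from typing import Sequence

FIRST_RELIABLE_BASE = 35  # from the text: base-calling unreliable below ~35 bases


def resolution_read_end(
    sizes: Sequence[float],
    resolution: Sequence[float],
    threshold: float = 0.5,
) -> float:
    """Fragment size at which a resolution curve first falls below `threshold`
    after its maximum, linearly interpolated between sampled points."""
    if len(sizes) != len(resolution) or len(sizes) < 2:
        raise ValueError("need at least two (size, resolution) pairs of equal length")
    peak = max(range(len(resolution)), key=lambda i: resolution[i])
    if resolution[peak] < threshold:
        raise ValueError("resolution never reaches the threshold")
    for i in range(peak, len(sizes) - 1):
        r0, r1 = resolution[i], resolution[i + 1]
        if r0 >= threshold > r1:
            frac = (r0 - threshold) / (r0 - r1)
            return sizes[i] + frac * (sizes[i + 1] - sizes[i])
    # The curve stays at or above the threshold over the whole sampled range.
    return float(sizes[-1])


if __name__ == "__main__":
    # Hypothetical resolution curve, invented for illustration only.
    sizes = [35, 100, 200, 300, 400, 500, 600]
    res = [0.6, 0.9, 1.0, 0.9, 0.8, 0.4, 0.3]
    end = resolution_read_end(sizes, res)
    print(f"R >= 0.5 up to ~{end:.0f} bases; read length ~{end - FIRST_RELIABLE_BASE:.0f} bases")
```

With the invented curve this reports a crossing near base 475 and a read length of about 440 bases. Applied to the numbers in the text, the same bookkeeping gives 450 − 35 = 415 bases (C1) and 510 − 35 = 475 bases (C2); adding Trout's average extensions of 45 and 30 bases recovers the 460- and 505-base averages reported earlier.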
The signal-to-noise ratio varied from run to run between 30 and 70. This may be a consequence of variability in the PCR amplifications or of the failure of the simple ethanol-precipitation cleanup to remove salt completely (Salas-Solano et al. 1998). Residual salt in the sample might affect the electrical resistance of the loading channel, resulting in salt-dependent DNA velocities during the 2-min loading step and thereby introducing some variability in sample concentration at injection (Huang et al. 1988). DNA velocities in the separation channel would not be affected, because only a minuscule amount of sample salt enters the separation channel during injection. The signals remained relatively stable in amplitude during the microdevice runs and did not decrease with increasing DNA fragment size, as is often observed in capillary electrophoresis. The microdevice cross-injector appears to permit representative DNA sample loading regardless of sample composition. In addition, neither base-calling nor run times were found to be compromised.
Figure 3 shows the processed four-color sequencing profile of sample 7, run on a microfabricated device with an effective separation length of 11.5 cm, filled with 3% LPA, at 150 V/cm and 45°C. The data were processed and base-called by Trout, followed by manual editing. Errors are indicated as hyphens in the letter sequence. The primer region (eluting at ~4 min) and the very end of the run are not shown. The display starts with the four-letter sequence TCCC (bases 32-35), the last section of the M13 vector sequence adjacent to the insert. Sequencing fragments of 400, 500, and 600 bases passed the detector after ~16.5, 19.1, and 21.3 min, respectively. The total run time was 25.7 min, and the total read length was 565 bases.
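The three passage times quoted for sample 7 give a rough feel for how peak spacing evolves late in the run. The sketch below simply converts them into average seconds per base between the landmark sizes; because the times are approximate, the result is indicative only.

```python
from typing import Dict, List, Tuple

# Approximate detector passage times for sample 7, in minutes (from the text).
PASSAGE_TIMES_MIN: Dict[int, float] = {400: 16.5, 500: 19.1, 600: 21.3}


def seconds_per_base(times_min: Dict[int, float]) -> List[Tuple[int, int, float]]:
    """Average time spacing (s/base) between consecutive landmark fragment sizes."""
    sizes = sorted(times_min)
    spacing = []
    for a, b in zip(sizes, sizes[1:]):
        delta_s = (times_min[b] - times_min[a]) * 60.0
        spacing.append((a, b, delta_s / (b - a)))
    return spacing


if __name__ == "__main__":
    for a, b, s in seconds_per_base(PASSAGE_TIMES_MIN):
        print(f"{a}-{b} bases: {s:.2f} s per base")
```

This gives about 1.56 s per base between 400 and 500 bases and about 1.32 s per base between 500 and 600 bases. The time separating adjacent fragments shrinks as fragments get longer, consistent with the loss of selectivity at long fragment sizes discussed above.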
In conclusion, we have shown that microdevices are capable of high-quality DNA sequencing with practical de novo sequencing samples. Our data suggest that significantly less sample pretreatment might be required for consistent operation of microdevices than of capillaries. Analysis time is also reduced significantly in many cases. Robust continuous read lengths exceeded 500 bases in <30 min with limited optimization of the base-calling software.
METHODS
Micromachining
Microdevices were built from 150-mm-diameter fused-silica wafers (Hoya, Tokyo, Japan) using photolithography, chemical wet etching, laser drilling to form access holes, and thermal bonding (Koutny et al. 1996). Individual microdevices were cut from the bonded wafer pairs with a wafer saw (CHIPS, North Peabody, MA). Glass reservoirs (Ace Glass, Vineland, NJ) of 50-µl volume were affixed around the exit holes to hold sample and buffer.
Instrumentation
The apparatus for single-channel microdevice DNA analysis has been described previously (Schmalzing et al. 1997).
Chemistry
The entire microchannel structure was chemically modified by grafting acrylamide to the channel surfaces according to the procedure of Hjerten (1985). High-molecular-weight LPA was used as the replaceable sieving material. It was synthesized in-house by inverse emulsion polymerization (Goetzinger et al. 1998). Appropriate amounts of the LPA powder were dissolved in 1× TBE buffer containing 3.5 M urea and 30% (vol/vol) formamide.
Sample Preparation
The samples were prepared at WICGR. The vector was M13mp18 with ~2-kb human DNA inserts from chromosome 17. The GenBank clone names were hRPK.1090_M_7 and hRPK.721_K_11. DYEnamic ET M13(−21) primer chemistry (Amersham) was used to prepare the sequencing reaction mixtures. Template DNA (200 ng) was added to each of the four monomer reactions, which were thermocycled for 20 cycles under standard conditions, pooled, and ethanol precipitated.
Slab Gel Electrophoresis
The standard samples for comparison were run on an ABI 377 sequencer at WICGR using 52-cm plates with a 48-cm well-to-read distance. They were run at 2.4 kV (~46 V/cm) for 10 hr. The gel was 5% Long Ranger cross-linked acrylamide.
Microdevice Electrophoresis
Between runs, the polymer buffer solution filling the entire microchannel structure was replaced from the anodic end of the separation channel using a syringe attached to a mechanical fixture. The device was pre-electrophoresed for 10 min at 200 V/cm and 45°C. The sequencing samples were dissolved in 20 µl of deionized water, heated to 95°C for 2 min, chilled on ice, and pipetted into the sample reservoir at one end of the loading channel. For representative sample loading, the samples were electrophoresed for 2 min at 200 V/cm across the separation channel. For injection and separation, the voltages were switched to create the desired field strength in the separation channel. Sequencing was performed at 45°C. To prevent leakage of excess sample into the separation channel, an electric field of ~20 V/cm was applied to both side arms of the loading channel during electrophoresis.
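The conditions above are specified as field strengths rather than applied voltages. The sketch below relates the two through field = voltage / length, under the idealizing assumption (ours, not the text's) that the field is uniform along the full 12.5-cm separation channel; the real potential distribution in a cross-injector also depends on the side-arm voltages. The same relation is used to cross-check the ~46 V/cm quoted for the ABI 377 slab-gel runs.

```python
SEPARATION_CHANNEL_CM = 12.5  # total separation-channel length (from the text)


def field_strength(voltage: float, length_cm: float) -> float:
    """Average field strength (V/cm) for `voltage` (V) applied over `length_cm`."""
    return voltage / length_cm


def voltage_for_field(field: float, length_cm: float) -> float:
    """Voltage (V) needed for an average field `field` (V/cm) over `length_cm`."""
    return field * length_cm


if __name__ == "__main__":
    # Assumes a uniform field along the whole separation channel.
    for name, field in (("C1", 200.0), ("C2", 150.0)):
        volts = voltage_for_field(field, SEPARATION_CHANNEL_CM)
        print(f"{name}: ~{volts:.0f} V for {field:.0f} V/cm")
    # Cross-check against the slab-gel numbers: 2.4 kV over 52-cm plates.
    print(f"ABI 377: ~{field_strength(2400.0, 52.0):.0f} V/cm")
```

Under this assumption, C1 corresponds to about 2.5 kV and C2 to about 1.9 kV, and 2.4 kV over 52 cm reproduces the ~46 V/cm stated for the ABI 377.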
Data Analysis
The ABI 377 data were signal-processed using the Plan package (Ewing and Green 1998) and base-called using Phred (Ewing et al. 1998). Plan is a Unix-based signal-processing tool, similar in format to ABI processing software. It uses a mobility correction file (specific to the dye chemistry), a multicomponent matrix (specific to the sequencing machine) for color separation, amplitude normalization of the four channels, baseline subtraction, and a smoothing algorithm. The microdevice data were collected using custom software written in HP VEE (Hewlett Packard) and were processed further using the base-caller Trout. Trout is available on the WICGR ftp site (genome.wi.mit.edu) in the directory distribution/software/trout. Documentation is provided with the program.
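Three of the processing steps listed above (color separation with a multicomponent matrix, baseline subtraction, and smoothing) can be illustrated generically. The NumPy sketch below is not Plan's or Trout's code: the linear cross-talk model, the running-minimum baseline, the moving-average smoother, and all window sizes are our own illustrative choices, and mobility correction and amplitude normalization are omitted.

```python
import numpy as np


def separate_colors(raw: np.ndarray, crosstalk: np.ndarray) -> np.ndarray:
    """Undo spectral cross-talk between the four detection channels.

    raw:       (n_samples, 4) detector readings, one column per channel.
    crosstalk: (4, 4) matrix; column j is the channel response to unit dye j.
    Assumes raw[t] = crosstalk @ dyes[t] and returns dyes as (n_samples, 4).
    """
    return np.linalg.solve(crosstalk, raw.T).T


def subtract_baseline(x: np.ndarray, window: int) -> np.ndarray:
    """Subtract a running-minimum baseline (odd `window`, in samples) per channel."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    padded = np.pad(x, ((half, half), (0, 0)), mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, window, axis=0)
    return x - windows.min(axis=-1)


def smooth(x: np.ndarray, window: int) -> np.ndarray:
    """Length-preserving moving-average smoothing of each channel."""
    kernel = np.ones(window) / window
    return np.column_stack(
        [np.convolve(x[:, j], kernel, mode="same") for j in range(x.shape[1])]
    )
```

A typical chain would be `smooth(subtract_baseline(separate_colors(raw, crosstalk), 11), 3)`, with the cross-talk matrix calibrated for the particular detector and dye set; the text notes that the Trout color matrix itself required adjustment for the microdevice detector.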
Resolution Measurement
Single-base resolution R was calculated using the relationship

$$R = \frac{(2\ln 2)^{1/2}\,(t_2 - t_1)}{(\mathrm{fwhm}_1 + \mathrm{fwhm}_2)\,\Delta b}$$

where $t_n$ is the migration time of the nth peak, $\mathrm{fwhm}_n$ is the full width at half-maximum of the nth peak, and $\Delta b$ is the difference between the two peaks in base numbers ($\Delta b > 1$). C Grams software (Galactic Industries, Salem, NH) was used to measure $t$ and fwhm of both isolated and partially resolved peaks in the A traces of the four-color runs of samples 1-12.
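The resolution formula above translates directly into code. The sketch below implements it as written; the peak times and widths in the example are hypothetical numbers, not measurements from this study. For Gaussian peaks, fwhm = 2(2 ln 2)^{1/2} σ, so the same expression can be rewritten as R = (t₂ − t₁)/[2(σ₁ + σ₂)Δb], which the test checks numerically.

```python
import math


def single_base_resolution(
    t1: float, t2: float, fwhm1: float, fwhm2: float, delta_b: float
) -> float:
    """R = (2 ln 2)^(1/2) (t2 - t1) / ((fwhm1 + fwhm2) * delta_b).

    t1, t2: migration times of the two peaks (any consistent time unit).
    fwhm1, fwhm2: full widths at half-maximum, in the same unit.
    delta_b: separation of the two peaks in base numbers.
    """
    if fwhm1 <= 0 or fwhm2 <= 0:
        raise ValueError("peak widths must be positive")
    if delta_b <= 0:
        raise ValueError("delta_b must be a positive number of bases")
    return math.sqrt(2.0 * math.log(2.0)) * (t2 - t1) / ((fwhm1 + fwhm2) * delta_b)


if __name__ == "__main__":
    # Hypothetical A-trace peaks two bases apart (times and widths in seconds).
    r = single_base_resolution(t1=1000.0, t2=1004.0, fwhm1=2.0, fwhm2=2.0, delta_b=2)
    print(f"R = {r:.2f}")
```

For these illustrative values the script prints R = 0.59.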
ACKNOWLEDGMENTS
We thank WICGR for kindly providing access to the base-caller Trout. We also thank Mark Daly and Steve Rozen for valuable advice on Trout. This work was supported by the National Institutes of Health (grant HG01389) and by the Air Force Office of Scientific Research (F49620-98-1-0235).
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
FOOTNOTES
1 Corresponding author.
E-MAIL ehrlich{at}wi.mit.edu; FAX (617) 258-7663.
REFERENCES
Received May 27, 1999; accepted in revised form July 7, 1999.