Published online before print
March 12, 2003, 10.1101/gr.886203
METHODS
Efficient High-Throughput Resequencing of Genomic DNA
Washington University, Division of Dermatology, St. Louis, Missouri 63110, USA
Targeted resequencing of genomic DNA from organisms such as humans is an important tool that provides experimental access to variation within a species and between closely related species. When the reference genome sequences are fully exploited to design robust, specific PCR assays and stringent conditions are used, resequencing can be done efficiently without purification of the PCR product. In this new version of asymmetric PCR, the PCR is set up with a 10-fold excess of one primer; at the end of PCR, one simply adds the remaining sequencing reagents and allows the sequencing reaction to proceed, with the excess PCR primer serving as the sequencing primer. We demonstrated that this streamlined protocol can be used with PCR products up to 1300 bp, and it achieved success rates of up to 97% in high-throughput analysis of allele frequencies for >30,000 single-nucleotide polymorphisms (SNPs). SNP primers and characterization results are provided at http://snp.wustl.edu.
The detailed study of genetic variation within populations and among related species has been greatly aided by resequencing techniques, coupled with the availability of draft genome sequences from a number of organisms, including humans (International Human Genome Sequencing Consortium 2001
Our group has developed a high-throughput pipeline using targeted
resequencing of pooled human DNAs as a means to characterize the allele
frequencies of SNP candidates (Marth et al. 2001
A time-consuming and costly step in our original targeted resequencing protocol was the use of size-exclusion columns to remove the PCR primers and dNTPs at the end of PCR, followed by re-addition of one primer for cycle sequencing (e.g., Taillon-Miller et al. 1999
To determine the range of PCR product lengths that work with the streamlined resequencing approach, we designed a series of PCR primers approximately every 100 or 200 bases across an 800-bp interval in the human β-glucuronidase (GUSB) gene on Chromosome 7. We
used high-throughput primer design methods to avoid placing any primer
in repetitive DNA and to yield uniform but stringent PCR reactions (55°C optimal
Tm, with annealing at 58°C and a hot-start
Taq DNA polymerase), in which any left primer
could be paired with any right primer (see Methods). Using different
combinations of these primers, we generated PCR products
ranging from 100 to 790 bp in length.
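Because any left primer can be paired with any right primer, the grid yields many product lengths from a dozen primers. The Python sketch below enumerates those combinations. It is an illustration only, not the authors' tooling: positions are read off the first-group primer names (which Methods describes as approximate locations on an arbitrary grid), and the helper `grid_pairs` and its 50-bp `min_len` cutoff are hypothetical.

```python
from itertools import product

# Approximate grid coordinates (bp) inferred from the first-group primer names.
# They are nominal only: 1left + 800right actually gives a 790-bp product.
LEFT_PRIMERS = {"1left": 1, "100left": 100, "200left": 200,
                "400left": 400, "600left": 600, "700left": 700}
RIGHT_PRIMERS = {"100right": 100, "200right": 200, "400right": 400,
                 "600right": 600, "700right": 700, "800right": 800}


def grid_pairs(left, right, min_len=50):
    """Return (left, right, approx_len) for every left/right combination that
    gives a product of at least min_len bp (hypothetical cutoff), longest first."""
    pairs = []
    for (l_name, l_pos), (r_name, r_pos) in product(left.items(), right.items()):
        approx_len = r_pos - l_pos
        # Left primers extend rightward and right primers leftward, so a product
        # forms only when the right primer lies downstream of the left primer.
        if approx_len >= min_len:
            pairs.append((l_name, r_name, approx_len))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


if __name__ == "__main__":
    for l_name, r_name, approx_len in grid_pairs(LEFT_PRIMERS, RIGHT_PRIMERS):
        print(f"{l_name:>8} + {r_name:<9} ~{approx_len} bp")
```

With the nominal coordinates, 12 primers give 21 usable pairs; the printed lengths are only approximate because the grid positions are.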
We analyzed PCR products of various sizes from the grid by agarose gel
electrophoresis, using DNA from each of three individuals, to compare
1:1 with 10:1 primer ratios and to test dNTP amounts of 0.25, 0.5, 1.0,
and 2.0 nmol of each dNTP per 10-µL reaction. Theoretically, with 1.0 pmol of
primer used in the reaction, an 800-bp product would maximally require
Having obtained PCR products of up to 790 bp with the streamlined protocol, we sought to extend this range. Because of repetitive sequences, we could not extend the primer grid to the right, but we were able to extend it to the left, yielding products of up to 1310 bp when the new left primers were paired with right primers. Additionally, we chose a new primer, 40left, to replace the problematic primer 1left (see Methods). These primers, combined with right primers, produced unique bands of the expected sizes. For PCR products up to the maximum size we tested, we observed little difference in product yield between PCR extension times of 30 and 60 seconds (data not shown). We then sequenced a range of PCR products using the streamlined protocol with a 10:1 primer ratio and a reduced dNTP concentration. One concern was that, without a cleanup step, the Taq DNA polymerase and unincorporated dNTPs would be carried over from the PCR into the sequencing reaction and might unbalance the dNTP/ddNTP ratios set by the manufacturer for sequencing, leading to faint signals for shorter products. In practice, this effect was not a problem (data not shown). With longer PCR products, we obtained good sequence for more than 500 bases. The practical read length was limited by the quality of data from the capillary sequencer, not by the protocol.
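The dNTP carryover concern can be put into rough numbers. The sketch below is a back-of-envelope estimate under deliberately simplified assumptions: equal base composition, primer-contributed bases ignored, duplex yield capped at the limiting-primer amount, and no accounting for extra single strands made from the excess primer. It is not the authors' calculation. The 10-µL PCR volume, the 2.5 µL transferred into sequencing, the 1.0-pmol primer amount, and the tested dNTP amounts come from the text; the function names are hypothetical.

```python
# Back-of-envelope dNTP budget (illustrative sketch, not the paper's calculation).
# Assumptions: equal base composition, primer bases ignored, duplex yield capped
# by the limiting primer, single strands from the excess primer not counted.


def dntp_consumed_per_base_nmol(product_bp: int, limiting_primer_pmol: float) -> float:
    """Upper bound on nmol of EACH dNTP incorporated into the duplex product."""
    # At most limiting_primer_pmol duplexes form; each holds 2 * product_bp
    # nucleotides, shared equally among the four dNTPs.
    total_pmol = 2 * product_bp * limiting_primer_pmol
    return total_pmol / 4 / 1000  # pmol -> nmol, per dNTP


def carryover_nmol(dntp_each_nmol: float, consumed_each_nmol: float,
                   pcr_volume_ul: float = 10.0, transferred_ul: float = 2.5) -> float:
    """nmol of each dNTP left after PCR that ends up in the sequencing reaction."""
    leftover = max(0.0, dntp_each_nmol - consumed_each_nmol)
    return leftover * transferred_ul / pcr_volume_ul


if __name__ == "__main__":
    consumed = dntp_consumed_per_base_nmol(800, 1.0)
    print(f"max consumption per dNTP (800-bp duplex): {consumed:.2f} nmol")
    for amount in (0.25, 0.5, 1.0, 2.0):
        carried = carryover_nmol(amount, consumed)
        print(f"{amount:.2f} nmol each -> {carried:.3f} nmol carried into sequencing")
```

In this simplified model an 800-bp duplex uses at most about 0.4 nmol of each dNTP, and only a quarter of whatever remains (2.5 of 10 µL) reaches the sequencing reaction; actual yields and consumption will differ.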
A method for resequencing called direct amplification and sequencing
(DEXAS) is nominally simpler than our protocol because the PCR and
sequencing steps are performed in the same tube at the same time using
two kinds of DNA polymerases (Kilger and Pääbo 1997
Asymmetric PCR is a well-known approach to produce single-stranded DNA
for a variety of applications. Because the PCR primers are used at
different concentrations, the method presented here is, by definition,
a version of asymmetric PCR. However, early forms of asymmetric PCR
were designed with a different aim. Asymmetric PCR was first presented
some 15 years ago (before cycle sequencing was possible) in response
to the problem that double-stranded PCR products rapidly reanneal and
block priming during sequencing. The first protocol used a 100:1 ratio
of PCR primers to produce a mixture of double- and single-stranded DNA,
which, after purification, was sequenced by adding back the limiting
primer and other reagents (Gyllensten and Erlich 1989
Although we used the protocol for high-throughput resequencing, it could be applied to projects of any size. We have used our methods to design primers for every human SNP in dbSNP for which design was possible, and the sequences are freely available (http://snp.wustl.edu). For other design needs, one can use the parameters listed in the Methods section below with the Web-based version of the Primer3 program. Our collaborators have successfully used the protocol to resequence various exons in human and mouse DNA. The streamlined protocol has been the workhorse of our project to characterize the allele frequencies of human SNP candidates (e.g., Fig. 2). More than 30,000 SNPs in three populations have been successfully characterized and made publicly available (http://snp.wustl.edu; also dbSNP and The SNP Consortium Web sites). In general, the technical success rate of the protocol has been high; for example, one of us (E.G.L.) has had a success rate of 97% with 6099 SNPs. When DNA sequence is available, the streamlined protocol provides a successful and efficient means to study genetic variation, as long as the PCR product is unique and one of the PCR primers is exhausted at the end of PCR.
Primer Design
Briefly, to pick primers for high-throughput targeted resequencing, we used several steps: targets were chosen, the DNA sequences were obtained from databases, repetitive sequences were identified and marked using RepeatMasker software (http://ftp.genome.washington.edu/RM), primers were chosen using fixed, stringent parameters and Primer3 software (0.9, Unix version, http://www-genome.wi.mit.edu/genome_software/; Rozen and Skaletsky 2000
To provide a grid of left and right primers for testing the 10-to-1 protocol, we obtained genomic sequence surrounding SNP rs2008188 in intron 1 of GUSB from the Golden Path (April 2001 freeze, June 2002 chr7:64064170-64073837, http://genome.ucsc.edu/). The sequence was RepeatMasked and formatted with a Perl script that set the Primer3 parameters, including an optimal product size of 800 bp. After this pair was designed, the left primer (designated 1left) was fixed as a Primer3 parameter, and the product lengths were successively adjusted to create a series of right primers giving additional product lengths of up to 790 bp at 100-bp or 200-bp increments. Similarly, the farthest right primer (designated 800right) was fixed as a Primer3 parameter, and the product lengths were adjusted to choose a series of left primers. Later, to extend the product range to 1300 bp and to replace one problematic primer, additional primers were chosen using similar methods. In each primer name, the number refers to the approximate location (in base pairs) on an arbitrary grid; any left primer could be paired with any right primer for PCR.
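The stepwise grid design (fix one primer, then vary the target product length) maps naturally onto Primer3's Boulder-IO input format. The sketch below assumes the tag names of current Primer3 releases (2.x), not those of the 0.9 version used here, whose tags differed; the helper names, the ±20-bp size window, and the placeholder template are hypothetical. Only the 55°C optimal Tm and the 1left sequence come from this paper. Setting PRIMER_MAX_NS_ACCEPTED=0 keeps Primer3 out of stretches that RepeatMasker has replaced with N.

```python
# Sketch: Primer3 (2.x tag names) Boulder-IO records for a stepwise primer grid.
# Helper names, the size window, and the placeholder template are hypothetical.
from typing import List, Optional


def boulder_record(seq_id: str, template: str, size_min: int, size_max: int,
                   opt_size: int, left_primer: Optional[str] = None,
                   right_primer: Optional[str] = None, opt_tm: float = 55.0) -> str:
    """Build one Boulder-IO record; records end with a line holding only '='."""
    tags = [
        ("SEQUENCE_ID", seq_id),
        ("SEQUENCE_TEMPLATE", template),
        ("PRIMER_OPT_TM", f"{opt_tm:.1f}"),
        ("PRIMER_PRODUCT_SIZE_RANGE", f"{size_min}-{size_max}"),
        ("PRIMER_PRODUCT_OPT_SIZE", str(opt_size)),
        # RepeatMasker replaces repeats with N; reject primers containing any N.
        ("PRIMER_MAX_NS_ACCEPTED", "0"),
    ]
    if left_primer:
        tags.append(("SEQUENCE_PRIMER", left_primer))           # fixed left primer
    if right_primer:
        tags.append(("SEQUENCE_PRIMER_REVCOMP", right_primer))  # fixed right primer, 5'->3'
    return "".join(f"{key}={value}\n" for key, value in tags) + "=\n"


def right_primer_series(seq_id: str, template: str, fixed_left: str,
                        targets: List[int], window: int = 20) -> List[str]:
    """One record per target product length, all sharing the fixed left primer."""
    return [
        boulder_record(f"{seq_id}_{t}", template, t - window, t + window, t,
                       left_primer=fixed_left)
        for t in targets
    ]


if __name__ == "__main__":
    template = "N" * 50 + "ACGT" * 250  # placeholder, NOT the GUSB sequence
    records = right_primer_series("GUSB_grid", template, "ACTTGTAAATGCTGCCAAAT",
                                  targets=[100, 200, 400, 600, 700, 790])
    print("".join(records), end="")
```

The printed records can be piped into the `primer3_core` program on standard input. With a real RepeatMasked template in place of the placeholder, each record should return candidate right primers near the requested length, if any satisfy the constraints; swapping `left_primer` for `right_primer` gives the mirror-image left-primer series.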
For the first group, the primers were 1left, 5'-ACTTGTAAATGCTGCCAAAT-3'; 100left, 5'-TTTCGCAAGTAATATACAACAGA-3'; 200left, 5'-TCACTATAGCTGACTCTCCTGTT-3'; 400left, 5'-CCAACTTTGTTTCCAATATTCT-3'; 600left, 5'-GGTACTGCTCTAGCAGACTTTT-3'; 700left, 5'-AAAATAAAGATCCACTTGATGGT-3'; 100right, 5'-CTGTTGTATATTACTTGCGAAAAG-3'; 200right, 5'-GATTTACTTTTGGGATACACTCA-3'; 400right, 5'-GAAGCTGGTTTAATCCATGTAG-3'; 600right, 5'-GTTCACTGAAGAGTACCAGAAAA-3'; 700right, 5'-ACCATCAAGTGGATCTTTATTTT-3'; 800right, 5'-TTTTATTCTGGGTTACATCATTC-3'. For the second group, the primers were 500left, 5'-ATTCTCACTCTTACGCTTTACCT-3'; 400left, 5'-ATCTTCAGTTTATGGTAAGTCCA-3'; 300left, 5'-GTTATTCTCTTTGAAGACCAATCT-3'; and 40left, 5'-GTTGAAACTCACCTGTATTTGAT-3'. The primer pairs 1left with 800right and 500left with 800right produce 790-bp and 1310-bp PCR products, respectively.
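When transcribing a primer table like the one above, a few string checks catch copying errors and show how the grid fits together. The following Python sketch (helper names are hypothetical; sequences copied from the first group) computes reverse complements, GC content, and the longest stretch shared between each left primer's reverse complement and the right primer with the same grid number. Run on these sequences, it shows that 700right is the exact reverse complement of 700left and that 100right overlaps the reverse complement of 100left by 22 bases, so at those grid positions the left and right primers occupy the same site on opposite strands.

```python
# Sanity checks for primer sequences (illustrative sketch; helper names hypothetical).
FIRST_GROUP = {
    "1left": "ACTTGTAAATGCTGCCAAAT",
    "100left": "TTTCGCAAGTAATATACAACAGA",
    "200left": "TCACTATAGCTGACTCTCCTGTT",
    "400left": "CCAACTTTGTTTCCAATATTCT",
    "600left": "GGTACTGCTCTAGCAGACTTTT",
    "700left": "AAAATAAAGATCCACTTGATGGT",
    "100right": "CTGTTGTATATTACTTGCGAAAAG",
    "200right": "GATTTACTTTTGGGATACACTCA",
    "400right": "GAAGCTGGTTTAATCCATGTAG",
    "600right": "GTTCACTGAAGAGTACCAGAAAA",
    "700right": "ACCATCAAGTGGATCTTTATTTT",
    "800right": "TTTTATTCTGGGTTACATCATTC",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G + C bases."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest exact stretch shared by a and b (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    for name, seq in FIRST_GROUP.items():
        print(f"{name:>8}: {len(seq)} nt, GC {gc_fraction(seq):.0%}")
    for pos in ("100", "200", "400", "600", "700"):
        left, right = FIRST_GROUP[pos + "left"], FIRST_GROUP[pos + "right"]
        shared = longest_common_substring(revcomp(left), right)
        print(f"grid {pos}: revcomp(left) shares {shared} nt with right")
```

GC content is only a rough proxy; Primer3 itself uses a nearest-neighbor Tm model, so this sketch does not reproduce the 55°C design criterion.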
PCR/Sequencing Protocol
Each sequencing reaction contained 2.5 µL of the PCR reaction, 6.5
µL of water, 2.0 µL of BigDye version 3 mix (Applied Biosystems),
and 1.0 µL of 5× sequencing buffer, according to the dye manufacturer's
protocol. The thermocycling program consisted of an initial
step at 96°C for 2 min, followed by 25 cycles of denaturation at 96°C for
15 sec, annealing at 50°C for 1 sec, and extension at 60°C for 4
min. Unincorporated dye terminators were removed according to the
manufacturer's protocol using columns in 96- or 384-well format
(Princeton Separation or Genetix). Electrophoresis and sequence
detection were performed on an ABI PRISM 3700 DNA Analyzer (Applied
Biosystems). Electropherograms were aligned with Sequencher software
(Gene Codes), and allele frequencies were estimated as described (Kwok
et al. 1994
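For plate-scale setup it can help to encode the sequencing recipe and cycling program as data. The sketch below uses only the volumes, temperatures, and times given above; the 10% pipetting overage and all helper names are assumptions. The computed hold time ignores temperature ramping, so the actual run on a thermocycler will take longer.

```python
# Sequencing-reaction recipe and cycling program as data (illustrative sketch).
# Volumes, temperatures, and times are from the Methods; the overage is assumed.
from dataclasses import dataclass

SEQ_REACTION_UL = {
    "PCR reaction": 2.5,
    "water": 6.5,
    "BigDye v3 mix": 2.0,
    "5x sequencing buffer": 1.0,
}


@dataclass(frozen=True)
class Step:
    label: str
    temp_c: float
    seconds: float


INITIAL = Step("initial denaturation", 96.0, 120.0)
CYCLE = (Step("denaturation", 96.0, 15.0),
         Step("annealing", 50.0, 1.0),
         Step("extension", 60.0, 240.0))
N_CYCLES = 25


def reaction_volume_ul(recipe: dict) -> float:
    """Total volume of one sequencing reaction."""
    return sum(recipe.values())


def hold_time_s(initial: Step, cycle: tuple, n_cycles: int) -> float:
    """Sum of programmed hold times; ramping between temperatures is ignored."""
    return initial.seconds + n_cycles * sum(step.seconds for step in cycle)


def plate_master_mix_ul(recipe: dict, n_reactions: int, overage: float = 0.10) -> dict:
    """Volumes of every component except the PCR product, scaled for a plate.
    The 10% default overage is an assumption, not from the paper."""
    scale = n_reactions * (1.0 + overage)
    return {name: ul * scale for name, ul in recipe.items() if name != "PCR reaction"}


if __name__ == "__main__":
    print(f"reaction volume: {reaction_volume_ul(SEQ_REACTION_UL):.1f} uL")
    print(f"programmed hold time: {hold_time_s(INITIAL, CYCLE, N_CYCLES) / 60:.1f} min")
    for reagent, ul in plate_master_mix_ul(SEQ_REACTION_UL, 96).items():
        print(f"{reagent}: {ul:.1f} uL for a 96-well plate")
```

Keeping the PCR product out of the master mix matches the protocol's logic: 9.5 µL of premix would be dispensed per well, and 2.5 µL of each uncleaned PCR reaction added directly.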
http://snp.wustl.edu; SNP primers and characterization results are provided at the common link homepage at Washington University.
http://ftp.genome.washington.edu/RM; The RepeatMasker Server at the University of Washington.
http://genome.ucsc.edu/; The University of California at Santa Cruz Genome Browser.
http://snp.cshl.org/; The SNP Consortium.
http://www-genome.wi.mit.edu/genome_software/; The Whitehead Institute for Biomedical Research.
http://www.ncbi.nlm.nih.gov/blast/Blast.cgi; BLAST.
http://www.ncbi.nlm.nih.gov/SNP/; dbSNP.
We thank other members of the SNP characterization team including Mathew Minton, Nicholas Addleman, Andrew J. Reinhart, Rachel Donaldson, and Nicholas Pavelka, who have commented on and extensively used the streamlined protocol. We thank one reviewer for providing references to asymmetric PCR. This work was funded in part by grants from the NIH (HG01720 and GM63340) and The SNP Consortium. The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
1 Present address: University of California, San Francisco, Cardiovascular Research Institute, San Francisco, California 94143-0130, USA. E-MAIL kwok{at}cvrimail.ucsf.edu; FAX (415) 476-2283. Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.886203.
locus in primates. Proc. Natl. Acad. Sci. 86: 9986-9990.
Received October 8, 2002; accepted in revised form January 23, 2003.
13:717-720 © 2003 by Cold Spring Harbor Laboratory Press ISSN 1088-9051/03 $5.00