Genome Res. 14:126-133, 2004
©2004 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/04 $5.00
Methods
High-Throughput MALDI-TOF Discovery of Genomic Sequence Polymorphisms
1 Methexis Genomics NV, B-9052 Zwijnaarde, Belgium
2 SEQUENOM, Inc., San Diego, California 92121, USA
We describe a comparative sequencing strategy that is based on matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) analyses of complete base-specific cleavage reactions of a target sequence. The target is converted to a DNA/RNA mosaic structure after PCR amplification using in vitro transcription. Cleavage with defined specificity is achieved by ribonucleases. The set of cleavage products is subjected to mass spectrometry without prior fractionation. The presented resequencing assay is particularly useful for single-nucleotide polymorphism (SNP) discovery. The combination of mass spectra from four complementary cleavage reactions detects approximately 98% of all possible homozygous and heterozygous SNPs in target sequences with a length of up to 500 bases. In general, both the identity and location of the sequence variation are determined. This was exemplified by the discovery of SNPs in the human gene coding for the cholesteryl ester transfer protein using a panel of 96 genomic DNAs.
The identification and scoring of polymorphisms and mutations are playing an increasingly important role in medical genetics. Single-nucleotide polymorphisms (SNPs) are, by virtue of their abundance and stability, the focus of attention in large-scale association studies and pharmacogenomics (Lander 1996
A variety of techniques are used to discover sequence variations. These include denaturing gradient gel electrophoresis (Sheffield et al. 1989
In this report, we present a novel approach to comparative sequence analysis. The use of RNase-based approaches for detection of sequence variations by mass spectrometry has been proposed by our groups and others earlier (Rodi et al. 2002
Assay Concept
The method described here for the identification of sequence variations consists of four individual base-specific cleavage reactions, followed by analysis of each mixture of cleavage products by MALDI-TOF MS. Sequence variations are identified on the basis of discrepancies between the experimentally determined masses and predicted masses using in silico-generated mass spectra from a reference sequence. In the present study only qualitative spectral changes were considered, although in many cases normalized peak areas allow determination of the relative quantity of a fragment (Buetow et al. 2001
10,000 Da. Fragments that fall within this window and that range in size from 4-mers to 30-mers can be calculated to cover 74% ($\sum_{n=4}^{30} D_n \cdot n / N$) of the target sequence. Thus, a single mass fingerprint generated by complete cleavage does not allow for a comprehensive screen for sequence variations. This deficiency is offset in the present methodology, where all four base-specific cleavage reactions are performed. The significance of complementary base-specific cleavages extends beyond the mere increase in likelihood that a sequence variation is detected. Typically, the combination of the information in the four spectra also results in the identity of the substitution as well as its unambiguous localization. When ambiguity does occur, in most cases it is restricted to a small number of closely spaced positions (generally two adjacent identical bases; see below). In essence, the combined information from all four base-specific cleavages already largely determines the sequence of the underlying target amplicon. The use of a reference sequence allows the resolution of ambiguities remaining after reconstruction of sequence candidates from the cleavage pattern. In most cases, the initial ambiguities arise from the loss of orientation of a sequence stretch between inverted repeats.
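The coverage figure quoted above can be illustrated with a short sketch. It assumes that D_n denotes the number of cleavage fragments of length n and N the length of the target (neither definition survives in this section), and it models complete cleavage simply as a cut after every occurrence of one base. The function names `cleave` and `coverage` and the toy sequence are hypothetical and are not part of the published method.

```python
from collections import Counter


def cleave(seq, base):
    """Cut seq after every occurrence of base (idealized complete cleavage)."""
    fragments, start = [], 0
    for i, b in enumerate(seq):
        if b == base:
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):  # trailing fragment that does not end in the cleavage base
        fragments.append(seq[start:])
    return fragments


def coverage(seq, base, min_len=4, max_len=30):
    """Fraction of seq lying in fragments of min_len..max_len bases.

    Computes sum_{n=min_len}^{max_len} D_n * n / N, where D_n is taken
    (as an assumption) to be the number of fragments of length n.
    """
    counts = Counter(len(f) for f in cleave(seq, base))
    covered = sum(n * d for n, d in counts.items() if min_len <= n <= max_len)
    return covered / len(seq)


toy = "AAAGAGAAAAAG"          # hypothetical 12-base target
print(cleave(toy, "G"))       # ['AAAG', 'AG', 'AAAAAG']
print(coverage(toy, "G"))     # 10 of the 12 bases fall in 4- to 30-mers
```

In this toy case the 2-mer is too short to be informative, so one reaction leaves part of the target unscreened; this is the gap that the complementary cleavage reactions are meant to close.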
Assay Development
For conditioning of the nucleic acid cleavage products prior to MALDI-MS analysis, two alternative procedures were used in the present study. In the first procedure, the transcript is allowed to hybridize onto an immobilized oligonucleotide that is complementary to the transcript's 3'-end. Note that all transcripts can be captured with a generic T7 and SP6 oligonucleotide (see Fig. 1). The captured transcript is extensively washed with ammonium to replace the metal ions and then digested with RNase under MALDI-MS-compatible conditions. In the second protocol, the transcription, cleavage, and sample conditioning are carried out as a series of add-on reactions in a single tube. This approach avoids the use of solid phases. Conditioning is realized through the addition of ammonium-loaded cation exchange resin to the reaction vessel, and the resultant mixture is directly applied onto a chip array for MALDI analysis.
Proof of Concept
Figure 2 illustrates the discovery of a G/A SNP in a 660-bp amplicon using the present methodology. The concept of multiple observations becomes evident when analyzing the mass signal changes in the four base-specific cleavage reactions. In a homozygous individual, the T-specific cleavage of the forward transcript reveals a missing signal of known identity at 2613.6 Da, whereas a new mass signal appears at 2597.6 Da. The most straightforward explanation for this observation is that one of the G residues on the 2613.6-Da fragment is replaced by an A, reducing its mass by 16 Da. For a heterozygous individual the spectrum would contain both peaks. The C-specific reaction on the forward transcript confirms the observation of a G/A substitution seen in the T-reaction, but does not help to further define the sequence variation; that is, the forward reactions, individually or in combination, do not allow one to pinpoint which of the four successive G residues is mutated. Unambiguous positioning of the G/A polymorphism can, however, be drawn from the complementary spectra. In the T- and C-specific reaction on the reverse strand, the G/A substitution would either create or eliminate a cleavage site, and the resultant spectral changes would permit the unequivocal localization of the polymorphism. In the T-reaction on mutant samples, a 2348.4-Da fragment is observed which can only be explained by assuming that the first of the four successive G's is replaced. Substitution of the other G residues would have resulted in a fragment of 2637.6, 2926.8, or 3216.0 Da, all of which would have been visible as distinct mass signal peaks. The mass signal that appears in the mutant C (reverse)-spectrum yields confirmatory information. The mass signal at 3326.1 Da would normally be expected to disappear in the homozygous mutant sample. 
In this particular case, two cleavage products with identical nucleotide compositions (and thus identical masses) were generated from different regions of the target sequence. As a consequence, the sequence change leads only to a signal-to-noise reduction of the corresponding mass signal. Close inspection of all of the spectra shown in Figure 2 reveals that overall peak intensity correlates well with the relative amount of cleavage products; that is, most of the peaks that relate to the G/A sequence change have in the case of heterozygous substitution roughly half the intensity seen with homozygous samples. It should also be pointed out that the interrogated amplicon incorporates multiple SNPs (see Fig. 2 legend). In general, the presence of additional SNPs does not confound the analysis, although it becomes more difficult the more the test sequence diverges from the reference DNA. The identification of SNPs in the present study required visual interpretation of spectra (see the Methods section); proprietary software has since become available for automated SNP discovery based on the integration of the information in the four complementary spectra (Böcker 2003
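The mass arithmetic behind the Figure 2 example can be sketched as follows. The residue masses are approximate textbook averages and are an assumption of this sketch, not values from the article. The peak masses (2613.6, 2597.6, and 2348.4 Da) and the 289.2-Da spacing between the four candidate fragments are read off the text above. The helper names `candidate_substitutions` and `locate` are hypothetical.

```python
# Approximate average residue masses in Da (textbook values, assumed here;
# they are not taken from the article). "dC" is the deoxycytidine residue
# incorporated into the transcript.
RESIDUE_MASS = {"A": 329.21, "G": 345.21, "dC": 289.18}


def candidate_substitutions(shift, tol=0.5):
    """Return (old, new) residue pairs whose mass difference matches shift."""
    return [
        (old, new)
        for old in RESIDUE_MASS
        for new in RESIDUE_MASS
        if old != new
        and abs((RESIDUE_MASS[new] - RESIDUE_MASS[old]) - shift) <= tol
    ]


def locate(observed, first=2348.4, step=289.2, positions=4, tol=0.5):
    """Map an observed reverse T-reaction fragment to the substituted G.

    The candidate masses 2348.4, 2637.6, 2926.8 and 3216.0 Da quoted in the
    text are spaced by 289.2 Da, so candidate k has mass first + k * step.
    Returns the 0-based index of the matching candidate, or None.
    """
    for k in range(positions):
        if abs(observed - (first + k * step)) <= tol:
            return k
    return None


shift = 2597.6 - 2613.6                 # new peak minus missing peak
print(candidate_substitutions(shift))   # [('G', 'A')]
print(locate(2348.4))                   # 0, i.e. the first of the four G's
```

The forward reactions only constrain the type of substitution, whereas the reverse T-reaction fragment selects one of the four positions, mirroring the reasoning in the text.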
Simulation Study
We performed a number of simulation studies to demonstrate the performance of the MS-based cleavage assays for the discovery of homozygous and heterozygous single-nucleotide changes (substitutions, insertions, and deletions) as well as to explore the effect of the length of the interrogated target sequence. To that end, a program was used that systematically introduced all possible single-nucleotide mutations in a given test sequence and categorized the sequence variations depending on the ability to detect them using the four cleavage reactions, the ability to determine the nature of the sequence change, and the ability to unambiguously localize them. In the simulations, only mass signals between 1100 Da and 10,000 Da were considered. We assumed a mass resolution (m/Δm) of 1000 in the 5-10-kDa region, a value routinely achieved with state-of-the-art equipment when desorbing nucleic acids from chip arrays. In the region below 5000 Da, we considered peaks detectable only when separated by 5 Da. Four alternative cleavage schemes were tested (Fig. 1): (1) RNase-A cleavage of the dC- and dU-transcripts of each strand, (2) RNase-A cleavage of the dC- and dT-transcripts of the two strands, (3) RNase-A and RNase-T1 cleavage of the dC-transcripts of both strands, and (4) RNase-A cleavage of the dC- and dU-transcripts of the top strand, RNase-T1 cleavage of the top strand dC-transcript, and RNase-A on the dC-transcript of the bottom strand. Details of the model system are given in the Methods section. The results of the simulations where one of the above reaction sets is used for the identification of single-nucleotide substitutions are summarized in Table 1. The data obtained with alternative sets of cleavage reactions are not significantly different (data not shown).
The results demonstrate that the MS-analyses of complementary cleavage reactions represent a very sensitive method; overall, >98% of all possible SNPs can be detected in target sequences with a length of up to 500 bases. Even with sequences of 1000 bases, over 95% of the SNPs can be observed. As discussed above, the sequence content of the mass spectra is considerable and the method, in general, also permits determination of the identity as well as the positions of the SNPs. The one other sizable category consists of SNPs that cannot be localized unambiguously. This is almost invariably the result of mutations at nearby (in most cases adjacent) positions yielding indistinguishable profiles. Taken together, more than 95% of the single-nucleotide substitutions in 500-bp target sequences can be identified and localized to a single position or within a few closely spaced identical bases. A further simulation study covering around 4 Mb of sequence in over 16 randomly selected gene regions (including coding and noncoding sequence) revealed detection rates equivalent to those obtained for the 64-kb region depicted in Table 1. This supports the notion that our original data set is not biased towards increased detection rates. A more extensive simulation study including a detailed account of sequence dependency of our method will be published elsewhere.
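The logic of such a simulation can be sketched in a few lines. This is a deliberately simplified toy model and not the program used in the study. It handles substitutions only, cuts after each of the four bases in a single strand representation, and uses one approximate residue-mass table plus an assumed 18.02-Da end-group correction for every reaction. The mass window and the two resolution rules are taken from the description above. All function names and the test sequence are hypothetical.

```python
# Toy residue masses in Da (approximate textbook values, assumed here); a
# single table is used for all four reactions, unlike the real transcripts.
MASS = {"A": 329.21, "C": 289.18, "G": 345.21, "T": 306.17}
LOW, HIGH = 1100.0, 10000.0   # mass window used in the simulations


def cleave(seq, base):
    """Cut seq after every occurrence of base (idealized complete cleavage)."""
    fragments, start = [], 0
    for i, b in enumerate(seq):
        if b == base:
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])
    return fragments


def spectrum(seq, base):
    """Sorted fragment masses inside the window (18.02 Da end groups assumed)."""
    masses = (sum(MASS[b] for b in f) + 18.02 for f in cleave(seq, base))
    return sorted(m for m in masses if LOW <= m <= HIGH)


def resolvable(m1, m2):
    """Peaks are distinct if >= 5 Da apart below 5 kDa, else if m/dm <= 1000."""
    delta, m = abs(m1 - m2), max(m1, m2)
    return delta >= 5.0 if m < 5000.0 else delta >= m / 1000.0


def changed(ref, mut):
    """True if at least one peak appears or disappears (qualitative change)."""
    gained = any(all(resolvable(p, q) for q in ref) for p in mut)
    lost = any(all(resolvable(p, q) for q in mut) for p in ref)
    return gained or lost


def detection_rate(seq):
    """Fraction of all single-base substitutions that alter any spectrum."""
    detected = total = 0
    for i, ref_base in enumerate(seq):
        for alt in "ACGT":
            if alt == ref_base:
                continue
            mutant = seq[:i] + alt + seq[i + 1:]
            total += 1
            if any(changed(spectrum(seq, b), spectrum(mutant, b)) for b in "ACGT"):
                detected += 1
    return detected / total


reference = "AGGAGGCATTCAGGACCTAGGATCCAGT"   # hypothetical short target
print(f"detected: {detection_rate(reference):.1%}")
```

Because the sketch ignores peak intensities, a substitution that only changes the height of a shared peak counts as undetected, in line with the qualitative criterion used in the study.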
Discovery of SNPs in the CETP Gene
A contiguous 5-kb region of the gene coding for CETP (cholesteryl ester transfer protein), running from exon 9 to exon 11, was selected as a target sequence for the present SNP discovery study. Gene ENSG00000087237 (nucleotide position 11486-16410) from the Ensembl genomic database was taken as the reference sequence. The target region was divided into 10 amplicons ranging in size from 373 to 646 bp. DNA samples of 96 independent individuals of Caucasian ancestry were analyzed in duplicate using the captured transcript format and the homogeneous assay format (refer to the Methods section). The following cleavage reactions were performed (see also Fig. 1): an RNase-A digest of the dC-containing T7 transcript (i.e., T-reaction), an RNase-A digest of the dC- and dU-containing SP6-transcripts (i.e., A- and G-reactions), and RNase-T1 cleavage of the dC SP6-transcript (i.e., C-reaction). Thus, a total of 7680 spectra were acquired; the assay failure rate was 2%. A total of 27 candidate SNPs (Table 2) were discovered using the two assay formats; the results obtained with the two alternative procedures were only different when assay failure of indicative reactions occurred.
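The reported number of spectra is consistent with the study design described above, as the following arithmetic check shows. It assumes that the duplicate analysis corresponds to the two assay formats. The variable names are illustrative only.

```python
amplicons = 10      # amplicons covering the CETP target region
samples = 96        # genomic DNA samples
reactions = 4       # T-, A-, G- and C-reactions
formats = 2         # captured transcript and homogeneous assay formats

spectra = amplicons * samples * reactions * formats
failed = spectra * 0.02   # reported assay failure rate of 2%

print(spectra)            # 7680
print(round(failed))      # roughly 154 spectra
```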
Of the 27 SNPs, 16 were described previously. All of the SNPs identified by our approach were experimentally validated by means of MassEXTEND reactions. The ambiguity that remains after the cleavage reactions for particular SNPs does not present a problem. In fact, the primer extension assays could be designed such that they not only validate the candidate SNPs but, additionally, resolve the positional uncertainty, when present. Three public SNPs were not identified in the present study. Reexamination of the data set showed that two of these polymorphisms are absent from our sample collection. The third SNP, however, is present in some of our samples as evidenced by the MassEXTEND assays and is a clear case of a false negative. This SNP was missed because (1) it is genetically linked to another nearby SNP, and (2) the concurrence of both SNPs on a cleavage product is mass-neutral (Table 2; SNPs at position 14176 and 14185). No false positives were identified in the present study. The robustness of the methodology rests primarily on the fact that a sequence variation is associated with one or more distinct and characteristic signals in the various cleavage reactions. To illustrate this feature, all base-specific cleavage patterns for a newly discovered CETP SNP are displayed in Figure 3. This G/A polymorphism was detected in a heterozygous sample. The sequence change generates in aggregate four additional mass signals.
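The false negative discussed above arises when two linked substitutions on the same cleavage product cancel each other's mass shift. A minimal sketch of that situation follows. The residue masses are approximate textbook averages (an assumption), and the pair of substitutions is purely hypothetical, since the alleles at positions 14176 and 14185 are not given in this section.

```python
# Approximate average residue masses in Da (textbook values, assumed here).
RESIDUE_MASS = {"A": 329.21, "G": 345.21}


def net_shift(substitutions):
    """Sum the mass changes of (old, new) substitutions on one fragment."""
    return sum(RESIDUE_MASS[new] - RESIDUE_MASS[old] for old, new in substitutions)


def is_mass_neutral(substitutions, tol=0.5):
    """True if the combined substitutions leave the fragment mass unchanged."""
    return abs(net_shift(substitutions)) <= tol


# Hypothetical pair of linked SNPs on the same cleavage product.
linked = [("G", "A"), ("A", "G")]
print(is_mass_neutral(linked))          # True: -16 Da and +16 Da cancel
print(is_mass_neutral([("G", "A")]))    # False: a lone G-to-A change shifts the mass
```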
In principle, it should be possible to use sample pools to detect SNPs. This analysis was not explored in the present study, but preliminary experiments with samples consisting of various ratios of two allelic sequences suggest a detection threshold of 5%. As MALDI-TOF MS resequencing results in highly discernible new signals, this approach for SNP discovery using pooled samples should have a clear advantage over capillary electrophoresis (CE) sequencing methods, where SNP signals are coincident and often difficult to discern.
The base-specific cleavage assays described here, especially the homogeneous format, are readily amenable to automation. Combined with the chip-based MassARRAY platform, this allows high-throughput serial spectrum acquisition. At a rate of 14,592 cleavage reactions (38 x 384-chip elements) performed on about 500 base-pair-long target sequences,
Cleavage Assay
Captured Transcript Assay
The CETP regions of interest were first PCR-amplified starting from human genomic DNA using primers that incorporate T7 [5'-CAGTAATACGACTCACTATAGGGAGA] and SP6 [5'-CGATTTAGGAGACACTATAGAAGAG] promoter sequences. The PCR reactions were carried out in a total volume of 20 µL using 5 pmol of each primer, 200 µM dNTP, 0.1 µL Taq DNA polymerase (5 U/µL; Promega), 1.5 mM MgCl2, and a buffer supplied with the enzyme. Typically, 2 µL of the PCR reaction (25-250 ng amplicon) was directly used as template in a 10-µL transcription reaction. A mutant polymerase (25 units T7 or SP6 R&DNA polymerase; Epicentre) was used to incorporate either dCTP or dUTP/dTTP in the transcripts. Ribonucleotides were used at 1 mM and the dNTP substrate at 5 mM; other components in the reaction were as recommended by the supplier. The reaction additionally contained 10 µg streptavidin-coated paramagnetic beads (Seradyn) preloaded with a 5'-biotinylated oligonucleotide (sequence as shown above) that is complementary to the generic 3'-end of the T7- or SP6-transcripts. Incubation was performed at 37°C for 2 h. Following transcription, the mixture was heated and slowly cooled so as to allow the full-length in vitro transcripts to anneal to the immobilized oligonucleotide. Using an automated 96-channel pipetter equipped with a washing station and a magnetic particle collector, the captured transcript was washed twice with 5 M NH4Cl and once with 100 mM (NH4)3-citrate. The beads were finally resuspended in 10 µL of 30 mM (NH4)3-citrate containing an appropriate amount of RNase (Roche Diagnostics) and incubated at 37°C for about 30 min to digest the transcripts to completion.
Homogeneous Format
MassEXTEND Assay
Mass Spectrometry Measurements
Identification of SNPs
Simulations
We thank Dr. Charles Cantor for valuable discussions and Gabi Sperling and Julia Clemens for expert technical assistance. The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.1692304.
3 These two authors contributed equally to this work.
4 Corresponding author.
Altshuler, D., Pollara, V.J., Cowles, C.R., Van Etten, W.J., Baldwin, J., Linton, L., and Lander, E.S. 2000. An SNP map of the human genome generated by reduced representation shotgun sequencing. Nature 407: 513-516.
Bansal, A., van den Boom, D., Kammerer, S., Honisch, C., Adam, G., Cantor, C.R., Kleyn, P., and Braun, A. 2002. Association testing by DNA pooling: An effective initial screen. Proc. Natl. Acad. Sci. 99: 16871-16874.
Böcker, S. 2003. SNP and mutation discovery using base-specific cleavage and MALDI-TOF mass spectrometry. Bioinformatics (Suppl.) 19: 44-53.
Buetow, K.H., Edmonson, M., MacDonald, R., Clifford, R., Yip, P., Kelley, J., Little, D.P., Strausberg, R., Koester, H., Cantor, C.R., et al. 2001. High-throughput development and characterization of a genome-wide collection of gene-based single nucleotide polymorphism markers by chip-based matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Proc. Natl. Acad. Sci. 98: 581-584.
Chee, M., Yang, R., Hubbell, E., Berno, A., Huang, X.C., Stern, D., Winkler, J., Lockhart, D.J., Morris, M.S., and Fodor, S.P. 1996. Accessing genetic information with high-density DNA arrays. Science 274: 610-614.
Cotton, R.G., Rodrigues, N.R., and Campbell, R.D. 1988. Reactivity of cytosine and thymine in single-base mismatches with hydroxylamine and osmium tetroxide and its application to the study of mutations. Proc. Natl. Acad. Sci. 85: 4397-4401.
Elso, C., Toohey, B., Reid, G.E., Poetter, K., Simpson, R.J., and Foote, S.J. 2002. Mutation detection using mass spectrometric separation of tiny oligonucleotide fragments. Genome Res. 12: 1428-1433.
Forrest, S. and Cotton, R.G. 1990. Methods of detection of single base substitutions in clinical genetic practice. Mol. Biol. Med. 7: 451-459.
Gibbs, R.A., Nguyen, P.N., McBride, L.J., Koepf, S.M., and Caskey, C.T. 1989. Identification of mutations leading to the Lesch-Nyhan syndrome by automated direct DNA sequencing of in vitro amplified cDNA. Proc. Natl. Acad. Sci. 86: 1919-1923.
Glavac, D. and Dean, M. 1995. Applications of heteroduplex analysis for mutation detection in disease genes. Hum. Mutat. 6: 281-286.
Hacia, J.G. 1999. Resequencing and mutational analysis using oligonucleotide arrays. Nat. Genet. 21: 42-47.
Hacia, J.G., Brody, L.C., Chee, M.S., Fodor, S.P.A., and Collins, F.S. 1996. Detection of heterozygous mutations in BRCA1 using high density oligonucleotide arrays and two-colour fluorescence analysis. Nat. Genet. 14: 441-447.
Hartmer, R., Storm, N., Boecker, S., Rodi, C.P., Hillenkamp, F., Jurinke, C., and van den Boom, D. 2003. RNase T1 mediated base-specific cleavage and MALDI-TOF MS for high-throughput comparative sequence analysis. Nucleic Acids Res. 31: e47.
Kirpekar, F., Douthwaite, S., and Roepstorff, P. 2000. Mapping posttranscriptional modifications in 5S ribosomal RNA by MALDI mass spectrometry. RNA 6: 296-306.
Krebs, S., Medugorac, I., Seichter, D., and Förster, M. 2003. RNaseCut: A MALDI mass spectrometry-based method for SNP discovery. Nucleic Acids Res. 31: e37.
Kwok, P.Y., Carlson, C., Yager, T.D., Ankener, W., and Nickerson, D.A. 1994. Comparative analysis of human DNA variations by fluorescence-based sequencing of PCR products. Genomics 23: 138-144.
Lander, E.S. 1996. The new genomics: Global view of biology. Science 274: 536-539.
Little, D.P., Braun, A., O'Donnell, M.J., and Köster, H. 1997. Mass spectrometry from miniaturized arrays for full comparative DNA analysis. Nature Med. 3: 1413-1416.
McCarthy, J.J. and Hilfiker, R. 2000. The use of single-nucleotide polymorphism maps in pharmacogenomics. Nat. Biotechnol. 18: 505-508.
Mohlke, K.L., Erdos, M.R., Scott, L.J., Fingerlin, T.E., Jackson, A.U., Silander, K., Hollstein, P., Boehnke, M., and Collins, F.S. 2002. High-throughput screening for evidence of association by using mass spectrometry genotyping on DNA pools. Proc. Natl. Acad. Sci. 99: 16928-16933.
Mullikin, J.C., Hunt, S.E., Cole, C.G., Mortimore, B.J., Rice, C.M., Burton, J., Matthews, L.H., Pavitt, R., Plumb, R.W., and Sims, S.K. 2000. An SNP map of human chromosome 22. Nature 407: 516-520.
Myers, R.M., Larin, Z., and Maniatis, T. 1985. Detection of single base substitutions by ribonuclease cleavage at mismatches in RNA:DNA duplexes. Science 230: 1242-1246.
Nickerson, D.A., Tobe, V.O., and Taylor, S.L. 1997. PolyPhred: Automating the detection and genotyping of single nucleotide substitutions using fluorescence-based resequencing. Nucleic Acids Res. 25: 2745-2751.
O'Donovan, M.C., Oefner, P.J., Roberts, S.C., Austin, J., Hoogendoorn, B., Guy, C., Speight, G., Upadhyaya, M., Sommer, S.S., and McGuffin, P. 1998. Blind analysis of denaturing high-performance liquid chromatography as a tool for mutation detection. Genomics 52: 44-49.
Orita, M., Iwahana, H., Kanazawa, H., Hayashi, K., and Sekiya, T. 1989. Detection of polymorphisms of human DNA by gel electrophoresis as single-strand conformation polymorphisms. Proc. Natl. Acad. Sci. 86: 2766-2770.
Patil, N., Berno, A.J., Hinds, D.A., Barrett, W.A., Doshi, J.M., Hacker, C.R., Kautzer, C.R., Lee, D.H., Marjoribanks, C., McDonough, D.P., et al. 2001. Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science 294: 1719-1723.
Risch, N. and Merikangas, K. 1996. The future of genetic studies of complex human diseases. Science 273: 1516-1517.
Rodi, C.P., Darnhofer-Patel, B., Stanssens, P., Zabeau, M., and van den Boom, D. 2002. A strategy for the rapid discovery of disease markers using the MassARRAY system. BioTechniques 32: S62-S69.
Sheffield, V.C., Cox, D.R., Lerman, L.S., and Myers, R.M. 1989. Attachment of a 40-base-pair G+C-rich sequence (GC-clamp) to genomic DNA fragments by the polymerase chain reaction results in improved detection of single-base changes. Proc. Natl. Acad. Sci. 86: 232-236.
Sousa, R. and Padilla, R. 1995. A mutant T7 RNA polymerase as a DNA polymerase. EMBO J. 14: 4609-4621.
Youil, R., Kemper, B.W., and Cotton, R.G. 1995. Screening for mutations by enzyme mismatch cleavage with T4 endonuclease VII. Proc. Natl. Acad. Sci. 92: 87-91.
Zabeau, M. and Stanssens, P. 2000. Diagnostic sequencing by a combination of specific cleavage and mass spectrometry. Patent Application No. PCT/EP0003904.2.
Received June 25, 2003; accepted in revised format October 31, 2003.