Genome Res. 14:414-425, 2004
©2004 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/04 $5.00
Methods
Parallel Genotyping of Over 10,000 SNPs Using a One-Primer Assay on a High-Density Oligonucleotide Array
1 Affymetrix, Inc., Santa Clara, California 95051, USA
2 Center for Inherited Disease Research (CIDR), Johns Hopkins University School of Medicine, Baltimore, Maryland 21224, USA
3 Department of Anthropology, Pennsylvania State University, University Park, Pennsylvania 16802, USA
4 Genetics and Molecular Biology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
The analysis of single nucleotide polymorphisms (SNPs) is increasingly utilized to investigate the genetic causes of complex human diseases. Here we present a high-throughput genotyping platform that uses a one-primer assay to genotype over 10,000 SNPs per individual on a single oligonucleotide array. This approach uses restriction digestion to fractionate the genome, followed by amplification of a specific fractionated subset of the genome. The resulting reduction in genome complexity enables allele-specific hybridization to the array. The selection of SNPs was primarily determined by computer-predicted lengths of restriction fragments containing the SNPs, and was further driven by strict empirical measurements of accuracy, reproducibility, and average call rate, which we estimate to be >99.5%, >99.9%, and >95%, respectively. With average heterozygosity of 0.38 and genome scan resolution of 0.31 cM, the SNP array is a viable alternative to panels of microsatellites (STRs). As a demonstration of the utility of the genotyping platform in whole-genome scans, we have replicated and refined a linkage region on chromosome 2p for chronic mucocutaneous candidiasis and thyroid disease, previously identified using a panel of microsatellite (STR) markers.
Single nucleotide polymorphisms (SNPs) are the most abundant form of genetic variation in the human genome (Brookes 1999
Almost all currently available SNP genotyping methods (for review, see Tsuchihashi and Dracopoli 2002
Here we present a robust high-throughput SNP genotyping platform that is based on the approach proposed by Dong et al. (2001
Complexity Reduction Assay
Figure 1 is a schematic of the one-primer amplification assay that reduces the complexity of the genome, and enables allele-specific hybridization. The assay involves five primary steps, starting with restriction digestion, ligation of adaptor, amplification, fragmentation, and labeling, prior to hybridization to the oligonucleotide array (Fig. 1). The complexity reduction occurs at the PCR step which preferentially amplifies restriction fragments that are between 250 and 1000 bp. The sequence complexity of the PCR products is estimated to be 60 Mbases, which represents a 50-fold reduction in genome complexity. The adaptor sequence was selected to have no homology with known genome sequences. The one primer used in the PCR is the forward strand of the adaptor; thus only two oligonucleotides are necessary for genotyping over 10,000 SNPs. In contrast, alternative genotyping methods, such as single-base extension (SBE; Nikiforov et al. 1994
The choice of restriction enzyme determines the sequence content of the reduced fraction of the genome. The locations of restriction sites vary for each restriction enzyme, and sequence complexity is directly proportional to the frequency of restriction sites. We chose to use Xba I in the current implementation of the assay, and have also demonstrated the assay with Bgl II and EcoRI (Kennedy et al. 2003
Oligonucleotide Array Design and SNP Content
The relative allele signal (RAS) is a measure of the signal intensities contributed from the A allele probes compared to signals from both A and B allele probes. In the ideal case, RAS values range from 1 for AA homozygotes to 0 for BB homozygotes, with AB heterozygotes in between at 0.5. For each SNP, two median RAS values are calculated separately for the five forward and five reverse probe quartets. The two median RAS values define points for each of the individuals assayed (Fig. 2B). A genotype-calling algorithm, described by Liu et al. (2003
The 11,555 SNPs are the result of a selection process that progressively imposed stricter criteria to cull SNPs from the TSC repository that were the most compatible with the one-primer amplification assay and allele-specific hybridization. SNP selection was primarily determined by computer-predicted fragment lengths based on restriction sites immediately upstream and downstream of the SNP sites. The restriction fragment predictions were initially done on BAC sequence records from GenBank, and later on contig sequence records from the UCSC Golden Path. Restriction fragment length predictions that spanned known contig gaps or other sequence gaps (>30 N's), particularly in draft sequence records, were omitted. RepeatMasker (A.F.A. Smit and P. Green, unpubl., http://ftp.genome.washington.edu/RM/RepeatMasker.html) was run on the sequences flanking the SNP sites to check for proximity to known repeat regions. SNPs located inside or within 30 bp of known repeat regions were omitted. A total of 55,605 candidate SNPs drawn from the January 2001 and September 2001 releases of the TSC database were predicted to be on Xba I fragments in the size range of 250 to 1000 bp.
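The fragment-length prediction described above can be illustrated with a short sketch. It is not the authors' pipeline. It assumes the Xba I recognition sequence TCTAGA, takes fragment length as the distance between the starts of the two sites flanking a SNP, and uses hypothetical function names and a toy sequence; gap and repeat filtering are omitted.

```python
from bisect import bisect_right

XBA_I_SITE = "TCTAGA"  # Xba I recognition sequence (assumed cut: T^CTAGA)


def find_sites(sequence, site=XBA_I_SITE):
    """Return the 0-based start position of every occurrence of `site`."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def predicted_fragment_length(sequence, snp_pos, site=XBA_I_SITE):
    """Length of the restriction fragment containing `snp_pos`.

    Returns None when the SNP is not flanked by a site on both sides,
    because no complete fragment can then be predicted.
    """
    sites = find_sites(sequence, site)
    idx = bisect_right(sites, snp_pos)  # index of first site after the SNP
    if idx == 0 or idx == len(sites):
        return None
    return sites[idx] - sites[idx - 1]


def is_candidate(sequence, snp_pos, min_len=250, max_len=1000):
    """True if the SNP lies on a fragment within the amplifiable size range."""
    length = predicted_fragment_length(sequence, snp_pos)
    return length is not None and min_len <= length <= max_len


if __name__ == "__main__":
    # Toy sequence with sites at positions 0 and 400.
    toy = "TCTAGA" + "A" * 394 + "TCTAGA" + "A" * 100
    print(predicted_fragment_length(toy, 200), is_candidate(toy, 200))
```

A SNP downstream of the last site in a record has no predicted fragment, which mirrors the reason predictions spanning sequence gaps were omitted.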
Four primary selection criteria were applied to these SNPs: (1) clustering into the three expected genotype groups in 133 ethnically diverse individuals, (2) Mendelian inheritance across 33 families, (3) reproducibility across as many as 12 replicates, and (4) SNP call rates across more than 300 experiments. Additional criteria, including Hardy-Weinberg distribution, uniqueness of map positions, and cross-hybridization predictions, were applied to define the final set of 11,555 SNPs. A detailed accounting of the SNP selection is described in the Supplemental material (SNP Selection).
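The Mendelian inheritance criterion can be sketched for a single biallelic SNP in a father-mother-child trio. This is an illustrative check only, not the software used in the study. The genotype encoding ('AA', 'AB', 'BB', with None for a no call) and the function names are assumptions.

```python
def is_mendelian_consistent(father, mother, child):
    """True if the child's genotype can be formed from one allele of each parent.

    Genotypes are 'AA', 'AB', or 'BB'; None marks a no call. Trios with
    any no call are treated as uninformative rather than as errors.
    """
    if None in (father, mother, child):
        return True
    c1, c2 = child[0], child[1]
    f, m = set(father), set(mother)
    return (c1 in f and c2 in m) or (c2 in f and c1 in m)


def inheritance_error_rate(trios):
    """Fraction of fully called trios that violate Mendelian inheritance."""
    informative = [t for t in trios if None not in t]
    if not informative:
        return 0.0
    errors = sum(not is_mendelian_consistent(*t) for t in informative)
    return errors / len(informative)


if __name__ == "__main__":
    trios = [
        ("AA", "BB", "AB"),   # consistent
        ("AA", "AA", "AB"),   # error: no parent carries B
        ("AB", "AB", "BB"),   # consistent
        ("AA", None, "AB"),   # uninformative (no call in mother)
    ]
    print(inheritance_error_rate(trios))
```

A SNP that produces such errors repeatedly across families is a natural candidate for removal, whereas isolated errors are harder to distinguish from random miscalls.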
Genotyping Accuracy
Concordance Analysis
The first concordance measure was a comparison with allele frequencies reported by the TSC. The TSC initiated the Allele Frequency Project to determine allele frequencies for
The second concordance measure was a comparison with genotypes generated by a proprietary high-throughput SBE platform. We compared genotype calls for 538 SNPs out of 11,560 SNPs across 40 individuals, and found 98 discordances in 21,191 comparisons giving a concordance of 99.5% (Table 1). A tallying of the discordances by SNP revealed that five of the 538 SNPs accounted for a disproportionate 59 of the 98 discordances in 37 of the individuals. That five SNPs representing less than 1% of the sampling set contributed to over 60% of the discordances is an indication of nonrandom and systematic error in either the reference SBE calls or the array-based calls; therefore, the five SNPs were excluded from the set of 11,555 SNPs. The remaining 39 of the 98 discordant calls were scattered among 28 SNPs across 25 individuals, which is a pattern more consistent with random and nonsystematic errors. We attempted to resolve these discordances by comparing both SBE-based and array-based calls with genotypes determined by dideoxy sequencing. Interestingly, 66% of array calls that were discordant with SBE calls were found to be concordant with sequencing calls. Assuming that genotypes concordant with sequencing are correct, the concordance with the SBE genotypes is an underestimate of the actual genotyping accuracy.
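The arithmetic behind the SBE comparison can be checked directly from the counts quoted above. The snippet below is a worked calculation only; the variable and function names are illustrative.

```python
def concordance(discordances, comparisons):
    """Fraction of genotype comparisons on which two methods agree."""
    return 1.0 - discordances / comparisons


# Counts reported for the comparison with SBE genotypes.
TOTAL_DISCORDANCES = 98
TOTAL_COMPARISONS = 21191
SNPS_COMPARED = 538
OUTLIER_SNPS = 5
OUTLIER_DISCORDANCES = 59

overall = concordance(TOTAL_DISCORDANCES, TOTAL_COMPARISONS)
outlier_snp_share = OUTLIER_SNPS / SNPS_COMPARED
outlier_discordance_share = OUTLIER_DISCORDANCES / TOTAL_DISCORDANCES
remaining_discordances = TOTAL_DISCORDANCES - OUTLIER_DISCORDANCES

if __name__ == "__main__":
    print(f"overall concordance: {overall:.2%}")
    print(f"outlier SNPs as share of SNPs compared: {outlier_snp_share:.2%}")
    print(f"outlier SNPs as share of discordances: {outlier_discordance_share:.2%}")
    print(f"discordances left after removing outliers: {remaining_discordances}")
```

The contrast between the two shares (under 1% of SNPs, over 60% of discordances) is what marks the five SNPs as systematic rather than random failures.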
The third concordance study is a comparison with a set of 60 SNPs genotyped by dideoxy sequencing in six individuals from the Human Variation Panel. To ensure equal representation of genotype calls in the comparison, the SNPs and individuals were chosen so that each SNP had two AA homozygotes, two AB heterozygotes, and two BB homozygotes. Sequencing-based genotypes were obtained for 341 of the 360 attempted calls. Of the 341 sequencing calls, there was one discordance with the array-based calls (Table 1). Based on trace data from three independent sequencing data sources, this one discordance with sequencing was due to an unexpected polymorphism immediately adjacent to the SNP site in one of the six individuals. Neither the TSC nor dbSNP had a record of a polymorphism at this adjacent position. The occurrence of unreported polymorphisms in close proximity to the interrogated SNP site can destabilize hybridization to probes for one or both alleles, and lead to erroneous genotype calls such as in this isolated instance.
Mendelian Inheritance Error Analysis
Reproducibility
Detection of Sample Contamination and Degradation
A potential problem, particularly in high sample-throughput situations, is inadvertent mixing of DNA from different individuals. To assess the ability of the platform to identify cases of mixed samples, DNA from two individuals was combined in various proportions. The two individuals were from the Human Variation Panel, and reference genotypes based on SBE were available for both. Figure 3A shows that call rates, which are the percentage of SNPs assigned a genotype instead of a no call, decreased as the proportion of the second individual was increased, whereas the detection rate, a measure of the number of SNPs passing the signal versus noise discrimination filter, remained constant. As the two individuals were increasingly mixed together, the RAS values gradually shifted for any SNP at which the genotypes differed. For example, in SNPs that were homozygous in one individual but heterozygous in the second, the RAS points would gradually shift toward a midpoint between the two genotypes. The occurrence of no calls steadily increased as more RAS values shifted outside the call zones drawn around the median points (described above). Interestingly, the concordance with reference genotypes remained high whereas call rates fell much more rapidly, demonstrating that the genotyping algorithm gives priority to high accuracy over call rates by assigning no calls rather than muddled and incorrect genotypes.
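The shift in RAS values under sample mixing can be sketched with an idealized model. The model assumes that allele signals add linearly with the proportion of each individual's DNA and that a call is made only when the RAS falls within a fixed half-width of an ideal genotype value. Both assumptions, the 0.1 half-width, and all function names are hypothetical simplifications; the actual call zones are drawn around trained median points in two dimensions.

```python
IDEAL_RAS = {"AA": 1.0, "AB": 0.5, "BB": 0.0}


def relative_allele_signal(a_signal, b_signal):
    """RAS: the A allele signal as a fraction of the A plus B signal."""
    return a_signal / (a_signal + b_signal)


def mixed_ras(genotype_1, genotype_2, fraction_2):
    """Expected RAS when `fraction_2` of the DNA comes from individual 2."""
    return ((1.0 - fraction_2) * IDEAL_RAS[genotype_1]
            + fraction_2 * IDEAL_RAS[genotype_2])


def call_genotype(ras, half_width=0.1):
    """Return the genotype whose call zone contains `ras`, else None (no call)."""
    for genotype, centre in IDEAL_RAS.items():
        if abs(ras - centre) <= half_width:
            return genotype
    return None


if __name__ == "__main__":
    # Individual 1 is AA and individual 2 is AB at this SNP.
    for fraction in (0.0, 0.05, 0.3, 0.5):
        ras = mixed_ras("AA", "AB", fraction)
        print(fraction, round(ras, 3), call_genotype(ras))
```

Under this model a contaminated sample drifts out of the AA zone into the uncalled region between zones before it reaches the AB zone, which is consistent with call rates falling while concordance among the remaining calls stays high.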
A blinded set of 61 samples from the Center for Inherited Disease Research (CIDR) was genotyped, among which were mixtures containing contamination of a second individual. Blinded samples with detection rates >98% and unusually low call rates of 87.3%, 83.2%, 79.4%, and 69.3% (Fig. 3B) were successfully detected as mixed samples of 20%, 40%, 50%, and 50%, respectively. Eight samples in the blinded set that previously had low and intermediate call rates with STR genotyping all had >90% call rates by our SNP genotyping method (Fig. 3B).
Another potential problem with starting samples is sheared or degraded DNA. Five degraded DNA samples, as judged by gel electrophoresis, were genotyped, but resulted in low call rates, which ranged from 82.8% to 86.7%. SNP call rates were plotted against the predicted lengths of Xba I restriction fragments containing the SNPs (Fig. 3C). For comparison, SNP call rates from 75 nondegraded DNA samples were plotted, showing that the SNP call rates in nondegraded samples were for the most part independent of the predicted amplicon length. In contrast, SNP call rates in the five degraded samples were lower across the size range, and as expected, particularly lower for SNPs predicted to be on longer amplicons.
Call Rates and Marker Heterozygosities
To discount the possibility of biases introduced from the ethnicities in the training data set, a second algorithm training set was constructed using 103 individuals from five other ethnic groups. Genotypes were determined based on this alternative training set, and the overall call rate across 307 individuals was 96.5%. To assess the accuracy of these new genotypes, calls were compared with SBE genotypes in 40 individuals. The concordance was 99.5%, where there were 105 discordances in 20,953 comparisons, which was essentially the same as the 99.5% concordance based on the original training set (Table 1). Similarly, the occurrence of inheritance errors in the five CEPH trios (described above) was fairly consistent between the two algorithm training sets, at 0.083% compared with 0.036% previously. Thus, call rates as well as measures of genotyping accuracy are not dependent on a particular algorithm training data set.
Heterozygosities of the 11,555 genotyped SNPs were calculated across the 13 ethnic groups (Table 3A). When all 307 individuals were aggregated together, the overall median and mean heterozygosity values were 0.41 and 0.38, respectively. For comparison, commonly used panels of
Genome-Wide Coverage
Of the 11,555 genotyped SNPs, 11,384 (98.5%) are currently mapped to unique positions in the UCSC Golden Path (release hg13, November 2002). The remaining unmapped SNPs are on sequence records that allowed restriction fragment predictions, but could not be assigned to unique positions in this build of the Golden Path. To visualize the genome-wide coverage of the genotyped SNPs, physical maps of the chromosomes were plotted with red vertical bars representing the presence of at least one SNP in 100-kb regions, and black vertical bars representing large contig gaps that are 100,000 N's or longer (Fig. 4A). Large contig gaps are gaps between map contigs, and also represent large regions of heterochromatin, including centromeres and telomeres. The SNPs are well distributed across the genome, but coverage is not absolutely uniform, with some regions containing fewer markers than other regions. The distribution of the genotyped SNPs is determined by the occurrence of Xba I sites in the genome, which is essentially random, but certain regions have fewer sites. Also, regions of the genome heavily represented by BAC and contig records in draft stages containing many clone gaps will have given fewer predictions of restriction fragments and contributed proportionately fewer candidate SNPs from which the 11,555 were selected. SNPs with physical map positions were assigned genetic distances by interpolating against a high-resolution genetic map based on 5136 STR markers, made available by deCODE (Kong et al. 2002
Inter-SNP distances provide an estimate of SNP coverage across the genome and are a useful measure of marker utility. Physical distances between the 11,384 mapped SNPs were calculated with and without accounting for large contig gaps. There were 927 contig gaps of size 10,000 N's or longer which all together totaled over 209 Mb (roughly 9% of the genome). In addition, 568 of the 11,384 SNPs had at least one contig gap in between. The median and mean inter-SNP distances were 104.0 kb and 209.8 kb, respectively, when the large contig gaps (10,000 N's or longer) were excluded from the genome. The longest inter-SNP distance was a 4-Mbase stretch in chromosome 7. We found that 49% of the genotyped SNPs are less than 100 kb apart, and 97% of the SNPs are less than 1 Mb apart (Suppl. Table S-4B). The median and mean when including the contig gaps were artificially higher at 116.2 kb and 254.1 kb, respectively (Suppl. Table S-4A), and the longest inter-SNP distance was the 24-Mbase centromere in chromosome 1. Inter-SNP genetic distances were estimated based on interpolated genetic distances (Suppl. Table S-4A). The median inter-SNP distance was 0.10 cM, and the mean was 0.31 cM. The longest distance was a 9.98-cM span in chromosome 19. Fifty percent of the genotyped SNPs had distances less than 0.1 cM, and over 92% had distances less than 1 cM (Suppl. Table S-4B). Because of cases where multiple SNPs were located between pairs of unresolved STRs, the interpolated genetic distances of these genotyped SNPs were not unique, and account for the 684 SNPs, or 6%, with zero inter-SNP distances. For comparison, panels of 400 STRs that are commonly used for genome-wide scans in linkage analysis have average intermarker distances of 10 cM (Dubovsky et al. 1995
Replication of Linkage at a Disease Locus
We have shown that the complexity reduction and parallel genotyping platform is a highly accurate method for high-throughput genome-wide SNP genotyping. Based on concordance measures with current genotyping methods, and analysis of inheritance, the genotyping accuracy is conservatively estimated to be >99.5%. The reproducibility in as many as nine replicate experiments was 99.99%. The assay procedures have been rigorously optimized to achieve robustness.
As demonstrated by the replication of linkage of chronic mucocutaneous candidiasis and/or thyroid disease on chromosome 2p, the set of 11,555 SNPs genotyped on the array presents a highly attractive alternative to panels of STR markers for whole-genome scans. The average heterozygosity of the 11,555 SNPs in 307 individuals from 13 ethnic groups was 0.38. The call rate across the 307 individuals, representing DNA isolated by a variety of methods, was 95.9%. The genotyped SNPs are spaced on average every 210 kb across the genome, and based on interpolated genetic distances are spaced every 0.31 cM. The average inter-SNP distance of 0.31 cM suggests that the genome scan resolution of the 11,555 SNPs on the array may be as much as 30-fold higher than currently used panels of
In study designs that involve isolated populations, a proportion of the 11,555 SNPs are likely to be noninformative. The utility of any given SNP is highly dependent on the ethnic context of the individuals that are genotyped. Moreover, very rare polymorphisms and mutations may disrupt the ability to genotype particular SNPs in certain individuals. The occurrence of an unexpected polymorphism immediately adjacent to an SNP site resulted in an erroneous genotype call. Similarly, rare polymorphisms or mutations that disrupt Xba I restriction sites can result in no calls or possibly incorrect genotype calls.
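The spacing figures quoted above are summary statistics of distances between adjacent mapped SNPs. The sketch below shows one way such statistics could be computed from per-chromosome map positions. The input layout, function names, and toy positions are assumptions, and contig gaps are not handled. The final ratio uses the 10 cM average intermarker distance cited for 400-STR panels.

```python
from statistics import mean, median


def inter_snp_distances(positions_by_chrom):
    """Distances between adjacent SNPs, computed within each chromosome."""
    distances = []
    for positions in positions_by_chrom.values():
        ordered = sorted(positions)
        distances.extend(b - a for a, b in zip(ordered, ordered[1:]))
    return distances


def spacing_summary(positions_by_chrom):
    """Median, mean, and longest inter-SNP distance across all chromosomes."""
    distances = inter_snp_distances(positions_by_chrom)
    return {
        "median": median(distances),
        "mean": mean(distances),
        "max": max(distances),
    }


# Ratio of the STR-panel spacing to the mean SNP spacing, both in cM.
STR_PANEL_SPACING_CM = 10.0
SNP_MEAN_SPACING_CM = 0.31
resolution_fold = STR_PANEL_SPACING_CM / SNP_MEAN_SPACING_CM

if __name__ == "__main__":
    toy = {"chr1": [100, 250, 1000], "chr2": [50, 90]}  # toy positions in bp
    print(spacing_summary(toy))
    print(round(resolution_fold, 1))
```

A mean well above the median, as reported for both the physical and genetic distances, indicates a skewed distribution in which a small number of long intervals pull the average up.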
The genotype-calling algorithm prioritizes accuracy over call rates. The current implementation of call zones drawn around median points assumes that the scatter of RAS points about a median point is (1) circular and normally distributed, and (2) equal for all three genotypes. Neither of these simplifying assumptions, however, is true. In some cases, orientation-specific hybridization asymmetry appears as RAS points scattered along one axis but not the other; in others, allele-specific hybridization asymmetry appears as RAS points scattered for heterozygotes and one of the homozygotes, but not for the opposite homozygotes. The noncircular and unequal distributions of RAS points observed in the training data, however, are lost when the clustering process aggregates over 100 points down to three median points. Model-based algorithms that retain the RAS point distributions could capture more genotype calls from RAS points scattered outside call zones, while correctly filtering out spurious RAS points as no calls. However, such model-based algorithms have been difficult to generalize across all SNPs, particularly in low-heterozygosity SNPs where there are very few minor allele homozygotes from which to construct meaningful distribution models. Improvements to the genotype-calling algorithm and alternative approaches, exemplified by Cutler et al. (2001
The scalability of the genotyping platform is driven by two underlying trends: (1) continuous SNP discovery, and (2) increasing density of oligonucleotide arrays. The November 2002 release of the TSC database contains over 1.8 million SNPs, which are a subset of the over 4 million human reference SNP (RS) cluster records contained in the current build of the dbSNP (build 114). In the near future, these public SNP repositories combined with large private repositories will undoubtedly contain a complete catalog of SNPs in the genome. To access greater numbers of SNPs, the complexity reduction assay can be run in parallel on different fractions of the genome defined by more than one restriction enzyme. The parallel use of multiple restriction enzymes should also result in a more uniform distribution of SNPs across the genome by compensating for the scarcity of particular restriction sites in certain regions. Multiple genome fractions coupled with very-high-density oligonucleotide arrays containing several million probes will enable the parallel genotyping of hundreds of thousands of SNPs. Ultimately, scaling to upwards of half a million SNPs should enable whole-genome case-control association studies that may help identify the causative genes and mechanisms at work in complex diseases (for review, see Jorde 2000
In conclusion, we have developed a genotyping platform that represents a new approach to genome-wide SNP genotyping. The platform extracts information value from publicly available data, in the form of SNP content provided by the TSC and sequences from the Human Genome Project, by combining a simple complexity reduction assay with the enormous capacity and allele-specific sensitivity of high-density oligonucleotide arrays.
The high levels of throughput, genotyping accuracy, marker heterozygosity, and genome-wide coverage each contribute to the functionality of the genotyping platform to greatly broaden the scope of previous studies, and accelerate advancements across a range of applications, starting with linkage analysis, LOH, population genetics, and ultimately whole-genome association studies.
Preparation of Reduced Complexity Samples
To increase sample throughput, procedures were carried out in 96-well plates. For each individual assayed, 250 ng of genomic DNA was digested with 10 U of Xba I (New England BioLabs) in a volume of 15 µL for 2 h at 37°C. Following heat inactivation at 70°C for 20 min, 0.25 µM of adaptor (5'phosphate-CTAGAGATCAGGCGTCTGTCGTGCTCATAA-3', and 5'-ATTATGAGCACGACAGACGCCTGATCT-3' synthesized by QIAGEN) was ligated to the digested DNA with T4 DNA Ligase (New England BioLabs) in 25 µL for 2 h at 16°C. The ligation was stopped by heating to 70°C for 20 min, and then diluted fourfold with water. For each sample, four PCRs were run using 10 µL of the diluted ligation reaction (25 ng of starting DNA) in 100 µL volumes containing 0.75 µM of primer (5'phosphate-CTAGAGATCAGGCGTCTGTCGTGCTCATAA-3'), 0.25 mM dNTPs, 2.5 mM MgCl2, 10 U AmpliTaq Gold (Applied Biosystems), and PCR Buffer (Applied Biosystems). Thirty-five cycles of PCR were run in either MJ DNA Engine Tetrad (MJ Research) or GeneAmp PCR System 9700 (Applied Biosystems) cyclers. The cycling program in the MJ Tetrads was 95°C denaturation for 20 sec, 59°C annealing for 15 sec, and 72°C extension for 15 sec. The denaturation, annealing, and extension times were each increased to 30 sec when using the GeneAmp cycler. As a check, 3 µL of PCR products were visualized on 2% TBE agarose gels to confirm the size range of amplicons. PCR products from the four reactions were combined and purified over MinElute 96 UF PCR Purification plates (QIAGEN). PCR amplicons from the four 100 µL reactions were recovered in 40 µL of EB buffer (QIAGEN). PCR yields, based on absorbance readings at 260 nm, were typically 30 µg. To allow efficient hybridization to the 25-mer oligonucleotides on the array, PCR amplicons were fragmented with DNase I (Amersham Biosciences).
Here, 0.24 U of DNase I was added to 20 µg of purified PCR amplicons in a 55 µL volume containing 50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, and 1 mM dithiothreitol for 30 min at 37°C, followed by heat inactivation at 95°C for 15 min. Fragmentation products were visualized on 4% TBE agarose gels. The 3' ends of the fragmented amplicons were biotinylated by adding 143 µM of a proprietary DNA labeling reagent (Affymetrix) using Terminal Deoxynucleotidyl Transferase (Promega) in a 70 µL volume containing 100 mM cacodylic acid (pH 6.8), 0.1 mM dithiothreitol, and 1 mM CoCl2 for 2 h at 37°C, followed by heat inactivation at 95°C for 15 min.
Genotyping by Allele-Specific Hybridization
Sources of Samples and Reference Genotypes
TSC allele frequency data from the TSC Allele Frequency Project were downloaded from the FTP site: ftp://snp.cshl.org/pub/SNP/frequency/. Allele frequency contributors included the Whitehead Institute, Sanger Center, Washington University, Orchid Biosciences, Celera, and Motorola. For SNPs that had frequencies reported by more than one contributor, the frequency value based on the higher number of individuals was used in the allele frequency comparison. Genotypes based on single base extension (SBE) were obtained from one of the allele frequency contributors. Dideoxy sequencing was performed by Qiagen Genomics, SeqWright, and Lark Technologies.
Restriction Fragment Predictions and SNP Mapping
Linkage Analysis
We thank Kimberly F. Doheny, Elizabeth W. Pugh, and Paul Boyce for facilitating our linkage study and providing critical comments; Richard Chiles, Fred Christians, Carsten Rosenow, Teresa Webster, and David Kulp for helpful discussions; and Sarah A. Tishkoff, Jonathan S. Friedlaender, Theodore G. Schurr, and W. Scott Watkins for contributing valuable DNA samples. The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.2014904.
5 Corresponding author. [Supplemental material is available online at www.genome.org. The following individuals kindly provided reagents, samples, or unpublished information as indicated in the paper: S.A. Tishkoff, J.S. Friedlaender, T.G. Schurr, and W.S. Watkins.]
Abecasis, G.R., Cherny, S.S., Cookson, W.O., and Cardon, L.R. 2002. Merlin: Rapid analysis of dense genetic maps using sparse gene flow trees. Nat. Genet. 30: 97-101.
Altshuler, D., Pollara, V.J., Cowles, C.R., Van Etten, W.J., Baldwin, J., Linton, L., and Lander, E.S. 2000. An SNP map of the human genome generated by reduced representation shotgun sequencing. Nature 407: 513-516.
Atkinson, T.P., Schaffer, A.A., Grimbacher, B., Schroeder Jr., H.W., Woellner, C., Zerbe, C.S., and Puck, J.M. 2001. An immune defect causing dominant chronic mucocutaneous candidiasis and thyroid disease maps to chromosome 2p in a single family. Am. J. Hum. Genet. 69: 791-803.
Brookes, A.J. 1999. The essence of SNPs. Gene 234: 177-186.
Carrasquillo, M.M., McCallion, A.S., Puffenberger, E.G., Kashuk, C.S., Nouri, N., and Chakravarti, A. 2002. Genome-wide association study and mouse model identify interaction between RET and EDNRB pathways in Hirschsprung disease. Nat. Genet. 32: 237-244.
Chee, M., Yang, R., Hubbell, E., Berno, A., Huang, X.C., Stern, D., Winkler, J., Lockhart, D.J., Morris, M.S., and Fodor, S.P. 1996. Accessing genetic information with high-density DNA arrays. Science 274: 610-614.
Cutler, D.J., Zwick, M.E., Carrasquillo, M.M., Yohn, C.T., Tobin, K.P., Kashuk, C., Mathews, D.J., Shah, N.A., Eichler, E.E., Warrington, J.A., et al. 2001. High-throughput variation detection and genotyping using microarrays. Genome Res. 11: 1913-1925.
Dong, S., Wang, E., Hsie, L., Cao, Y., Chen, X., and Gingeras, T.R. 2001. Flexible use of high-density oligonucleotide arrays for single-nucleotide polymorphism discovery and validation. Genome Res. 11: 1418-1424.
Dubovsky, J., Sheffield, V.C., Duyk, G.M., and Weber, J.L. 1995. Sets of short tandem repeat polymorphisms for efficient linkage screening of the human genome. Hum. Mol. Genet. 4: 449-452.
Fan, J.B., Chen, X., Halushka, M.K., Berno, A., Huang, X., Ryder, T., Lipshutz, R.J., Lockhart, D.J., and Chakravarti, A. 2000. Parallel genotyping of human SNPs using generic high-density oligonucleotide tag arrays. Genome Res. 10: 853-860.
Fodor, S.P., Read, J.L., Pirrung, M.C., Stryer, L., Lu, A.T., and Solas, D. 1991. Light-directed, spatially addressable parallel chemical synthesis. Science 251: 767-773.
Grant, S.F., Steinlicht, S., Nentwich, U., Kern, R., Burwinkel, B., and Tolle, R. 2002. SNP genotyping on a genome-wide amplified DOP-PCR template. Nucleic Acids Res. 30: e125.
Hall, J.G., Eis, P.S., Law, S.M., Reynaldo, L.P., Prudent, J.R., Marshall, D.J., Allawi, H.T., Mast, A.L., Dahlberg, J.E., Kwiatkowski, R.W., et al. 2000. Sensitive detection of DNA polymorphisms by the serial invasive signal amplification reaction. Proc. Natl. Acad. Sci. 97: 8272-8277.
Jordan, B., Charest, A., Dowd, J.F., Blumenstiel, J.P., Yeh, R.F., Osman, A., Housman, D.E., and Landers, J.E. 2002. Genome complexity reduction for SNP genotyping analysis. Proc. Natl. Acad. Sci. 99: 2942-2947.
Jorde, L.B. 2000. Linkage disequilibrium and the search for complex disease genes. Genome Res. 10: 1435-1444.
Kennedy, G.C., Matsuzaki, H., Dong, S., Liu, W.M., Huang, J., Liu, G., Su, X., Cao, M., Chen, W., Zhang, J., et al. 2003. Large-scale genotyping of complex DNA. Nat. Biotechnol. 21: 1233-1237.
Kent, W.J. 2002. BLAT: The BLAST-like alignment tool. Genome Res. 12: 656-664.
Kong, A., Gudbjartsson, D.F., Sainz, J., Jonsdottir, G.M., Gudjonsson, S.A., Richardsson, B., Sigurdardottir, S., Barnard, J., Hallbeck, B., Masson, G., et al. 2002. A high-resolution recombination map of the human genome. Nat. Genet. 31: 241-247.
Kruglyak, L. and Nickerson, D.A. 2001. Variation is the spice of life. Nat. Genet. 27: 234-236.
Kruglyak, L., Daly, M.J., Reeve-Daly, M.P., and Lander, E.S. 1996. Parametric and nonparametric linkage analysis: A unified multipoint approach. Am. J. Hum. Genet. 58: 1347-1363.
Kwok, P.Y. 2001. Methods for genotyping single nucleotide polymorphisms. Annu. Rev. Genomics Hum. Genet. 2: 235-258.
Liu, W.-m., Di, X., Yang, G., Matsuzaki, H., Huang, J., Mei, R., Ryder, T.B., Webster, T.A., Dong, S., Liu, G., et al. 2003. Algorithms for large scale genotyping microarrays. Bioinformatics 19: 2397-2403.
Miller, R.D. and Kwok, P.Y. 2001. The birth and death of human single-nucleotide polymorphisms: New experimental evidence and implications for human history and medicine. Hum. Mol. Genet. 10: 2195-2198.
Mullikin, J.C., Hunt, S.E., Cole, C.G., Mortimore, B.J., Rice, C.M., Burton, J., Matthews, L.H., Pavitt, R., Plumb, R.W., Sims, S.K., et al. 2000. An SNP map of human chromosome 22. Nature 407: 516-520.
Nikiforov, T.T., Rendle, R.B., Goelet, P., Rogers, Y.H., Kotewicz, M.L., Anderson, S., Trainor, G.L., and Knapp, M.R. 1994. Genetic Bit Analysis: A solid phase method for typing single nucleotide polymorphisms. Nucleic Acids Res. 22: 4167-4175.
O'Connell, J.R. and Weeks, D.E. 1998. PedCheck: A program for identification of genotype incompatibilities in linkage analysis. Am. J. Hum. Genet. 63: 259-266.
Thorisson, G.A. and Stein, L.D. 2003. The SNP Consortium website: Past, present and future. Nucleic Acids Res. 31: 124-127.
Tsuchihashi, Z. and Dracopoli, N.C. 2002. Progress in high throughput SNP genotyping methods. Pharmacogenomics J. 2: 103-110.
Wang, D.G., Fan, J.B., Siao, C.J., Berno, A., Young, P., Sapolsky, R., Ghandour, G., Perkins, N., Winchester, E., Spencer, J., et al. 1998. Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome. Science 280: 1077-1082.
Xiong, M. and Jin, L. 1999. Comparison of the power and accuracy of biallelic and microsatellite markers in population-based gene-mapping methods. Am. J. Hum. Genet. 64: 629-640.
http://snp.cshl.org/; SNP Consortium (TSC) home page.
http://www.ncbi.nlm.nih.gov/SNP/; dbSNP (NCBI) home page.
http://ftp.genome.washington.edu/RM/RepeatMasker.html; RepeatMasker documentation.
http://www.ncbi.nlm.nih.gov/Genbank/index.html; GenBank (NCBI) home page.
http://genome.ucsc.edu/; UCSC Genome Bioinformatics home page.
http://bioperl.org/; BioPerl home page.
http://www.nature.com/ng/journal/v31/n3/suppinfo/ng917_S1.html; Web supplement to Kong et al. (2002
Received October 6, 2003; accepted in revised form January 6, 2004.