Published online before print
September 13, 2004, 10.1101/gr.2730004 Genome Res. 14:1821-1831, 2004 ©2004 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/04 $5.00
Pattern of Sequence Variation Across 213 Environmental Response Genes
1 Department of Genome Sciences, University of Washington, Seattle, Washington 98195-7730, USA
2 Department of Human Genetics, University of Utah, Salt Lake City, Utah 84112-5330, USA
3 Division of Pediatric Informatics and Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio 45229, USA
To promote the clinical and epidemiological studies that improve our understanding of human genetic susceptibility to environmental exposure, the Environmental Genome Project (EGP) has scanned 213 environmental response genes involved in DNA repair, cell cycle regulation, apoptosis, and metabolism for single nucleotide polymorphisms (SNPs). Many of these genes have been implicated by loss-of-function mutations associated with severe diseases attributable to decreased protection of genomic integrity. Therefore, the hypothesis for these studies is that individuals with functionally significant polymorphisms within these genes may be particularly susceptible to genotoxic environmental agents. On average, 20.4 kb of baseline genomic sequence, or 86% of each gene, including a substantial amount of introns, all exons, and 1.3 kb upstream and downstream, was scanned for variations in the 90 samples of the Polymorphism Discovery Resource panel. The average nucleotide diversity across the 4.2 Mb of these 213 genes is 6.7 × 10⁻⁴, or one SNP every 1500 bp, when two random chromosomes are compared. The average candidate environmental response gene contains 26 PHASE-inferred haplotypes, 34 common SNPs, 6.2 coding SNPs (cSNPs), and 2.5 nonsynonymous cSNPs. SIFT and Polyphen analysis of 541 nonsynonymous cSNPs identified 57 potentially deleterious SNPs. An additional eight polymorphisms predict altered protein translation. Because these genes represent 1% of all known human genes, extrapolation from these data predicts the total genomic set of cSNPs, nonsynonymous cSNPs, and potentially deleterious nonsynonymous cSNPs. The implications for the use of these data in direct and indirect association studies of environmentally induced diseases are discussed.
The link between environmental agents and disease risk has been recognized for more than a century with the discovery of the link between coal soot and scrotal cancer in young chimney sweeps (Doll 1975
The initial efforts of the EGP have focused on the discovery and annotation of single nucleotide polymorphisms (SNPs), the most common form of human genetic variation, in candidate environmental disease genes and the development of databases integrating sequence polymorphism data into individually annotated human and mouse gene models (GeneSNPs, http://www.genome.utah.edu/genesnps; PolyDom, http://polydoms.cchmc.org; Trafac, http://genometrafac.cchmc.org). These efforts began with 213 candidate environmental susceptibility genes from a list of 550 candidates submitted based on their involvement in processes influenced by environmental exposure. Broadly grouped into pathways related to cell cycle, cell signaling, cell structure, DNA repair, gene expression, and metabolism, many of these genes have been implicated by loss-of-function mutations associated with severe diseases attributable to decreased protection of genomic integrity. Thus, the hypothesis for these studies is that individuals with functionally significant polymorphisms within these genes may be particularly susceptible to genotoxic environmental agents. For example, the tumor suppressor genes RB1, ATM, and MLH1 are associated with familial cancers and may also harbor minor risk alleles that may conspire with exposure to environmental agents to diminish the fidelity of DNA repair and increase the lifetime risk of developing cancer (Alonso et al. 2001
The first phase of the EGP is now complete, and herein we report the discovery of 23,443 SNPs in 213 candidate environmental response genes by systematically resequencing DNA samples from 90 individuals of the Polymorphism Discovery Resource (PDR) panel. The PDR panel is a representative panel of individuals drawn from the United States population, including Americans of European, African, Mexican, and Asian descent and Native Americans. The explicit objective of the panel is to facilitate detection of polymorphic sites that occur in any one of the represented populations (Collins et al. 1998). The results of this analysis, cataloged in the GeneSNPs and dbSNP databases, suggest the presence of a significant number of polymorphisms that may confer sensitivity to environmental agents, and are stimulating ongoing efforts to (1) develop mouse models of potentially functional polymorphisms (Comparative Mouse Genomics Centers Consortium); (2) explore the common variant-common disease hypothesis via molecular epidemiology studies of environmentally induced diseases; (3) address the ethical, legal, and social implications (ELSIs) of the genetics of environmental disease susceptibility; and (4) improve strategies for the discovery of genetic variations responsible for the elevated sensitivity to environmental agents.
Candidate Genes
Comprehensive polymorphism discovery was performed by resequencing 213 candidate environmental response genes involved in DNA repair (70), apoptosis (41), cell cycle control (62), and drug metabolism (40). These genes are distributed across all the human chromosomes except for the Y chromosome, and altogether represent slightly <1% of all known human genes (Ewing and Green 2000 86% of the genomic sequence for each gene was scanned for variation across 90 DNA samples from the PDR (Collins et al. 1998 5%. Of the 25% of our SNPs previously reported, only 1763 variations (29%) included allele frequency estimates.
Sequence Diversity in Candidate Genes
The average nucleotide diversity was classified by gene structure in the 3' flanking region (7.0 × 10⁻⁴, ±11.7 × 10⁻⁴), intronic sequence (6.9 × 10⁻⁴, ±4.2 × 10⁻⁴), and 5' UTR (6.9 × 10⁻⁴, ±18.0 × 10⁻⁴), and was similar to the 213-gene-wide average of 6.7 × 10⁻⁴. However, nucleotide diversity in coding regions was half that of noncoding regions, or 3.5 × 10⁻⁴ (Fig. 1A; Supplemental Table 2). Interestingly, in noncoding regions conserved in the mouse, rat, and dog, the average nucleotide diversity was intermediate (5.2 × 10⁻⁴, or one SNP every 1928 bp between any two chromosomes) between that of the coding and 5' flanking sequence and that of the intronic sequence (Fig. 1B; Supplemental Table 3). To show that this result was not due to the difference in target size for which nucleotide diversity was being determined, we also performed a random sampling of the introns by sampling the same number of nucleotides identified as conserved noncoding sequence (Fig. 1B).
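The relationship between nucleotide diversity and SNP spacing quoted above can be illustrated with a short Python sketch. It is a minimal illustration rather than the pipeline used in this study: it assumes the standard heterozygosity-based estimator of per-site diversity for biallelic SNPs, and the function names and the toy allele frequencies are hypothetical.

```python
def nucleotide_diversity(minor_allele_freqs, n_chromosomes, length_bp):
    """Per-site nucleotide diversity for biallelic SNPs.

    Assumed estimator: sum of 2p(1-p) over sites, with the usual
    n/(n-1) small-sample correction, divided by the scanned length.
    """
    correction = n_chromosomes / (n_chromosomes - 1)
    heterozygosity = sum(2 * p * (1 - p) for p in minor_allele_freqs)
    return correction * heterozygosity / length_bp


def bp_per_snp(pi):
    """Expected spacing (bp) between differences for two random chromosomes."""
    return 1.0 / pi


# Toy gene (hypothetical values): three SNPs scanned over 5 kb in 180 chromosomes.
toy_pi = nucleotide_diversity([0.05, 0.20, 0.40], n_chromosomes=180, length_bp=5000)
print(f"toy gene: pi = {toy_pi:.2e}, one difference every {bp_per_snp(toy_pi):.0f} bp")

# Averages reported in the text, converted to spacing.
for label, pi in [("all sequence", 6.7e-4), ("coding", 3.5e-4)]:
    print(f"{label}: one difference every {bp_per_snp(pi):.0f} bp")
```

Because the spacing is simply the reciprocal of the diversity, halving the diversity (as in coding regions) doubles the expected distance between differences.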
A View of Sequence Diversity in the Average Candidate Gene
On average, 20.4 kb of reference baseline sequence was scanned for each candidate gene. The cell cycle gene E2F transcription factor 2 (E2F2) is representative of the average gene structure, sequence diversity, and size. A representation of the polymorphism distribution and gene structure of E2F2 is shown in Figure 2, and each of the candidate genes examined by the EGP is available in a similar format via the GeneSNPs database (http://genome.utah.edu/genesnps). E2F2 is coded by seven exons distributed across 21.3 kb. One hundred twelve single nucleotide substitutions and six small insertion/deletion polymorphisms were identified in this gene and are depicted by position in the gene by vertical descending bars with length that is proportional to the MAF in the study population. However, because of the masked population stratification of the PDR panel, allele frequencies within the constituent ethnic subpopulations may vary from the frequencies of the whole panel. The nucleotide diversity across E2F2 is 6.9 × 10⁻⁴, or one SNP every 1.4 kb between two random chromosomes. The number of common polymorphisms is similar to the 213-gene average, with 41% of the total (46 of 112 SNPs) having a MAF >5% in the PDR 90 panel. Also typical of the average gene, E2F2 has four cSNPs, with two nonsynonymous cSNPs indicated by the red vertical bars in Figure 2.
Site Selection for Functional and Association Studies
In this study, the average candidate gene contains 34 common SNPs. Because functional analysis via animal models or genotype-phenotype studies is costly, reducing the number of sites for further analysis (from an average of 34) is a major consideration in designing effective association studies. Two complementary approaches have been proposed to identify phenotypically important SNPs. The direct interrogation approach focuses on testing the nonsynonymous (potentially functional) variations in coding sequence for specific phenotypes (Collins et al. 1997 10% of total nonsynonymous SNPs) were identified as potentially deleterious by both of these approaches (Table 1). For a subset of these variants (n = 36), we were able to identify a functional domain associated with the polymorphism by using annotation from the Human Gene Mutation Database. Notably, we predict intolerant cSNPs in 31 genes with no entry in the Human Gene Mutation Database (Table 1). Seven of these predicted intolerant cSNPs have allele frequencies >5% in the PDR, four of which are not listed in the Human Gene Mutation Database and are discussed below.
Of the 57 cSNPs predicted to be intolerant, seven are reported to be associated with a known phenotype (three of which intersect with the set of seven common predicted intolerant SNPs mentioned above). BRCA1 Q356R (MAF = 1%) is implicated in the breast cancers of a mother and two daughters of a Swiss family (Schoumacher et al. 2001
To gauge the potential for functional consequences of the remaining 50 variants identified as "intolerant" by both SIFT and Polyphen, we determined the number of the SIFT "intolerant" classifications for 32 candidate genes with known disease mutations. For these 32 genes, we queried all known mutations (n = 545) from the Human Gene Mutation Database (Stenson et al. 2003 We also identified eight variations predicted to truncate or alter protein translation (Table 2). SMUG1, HGF, RAD23A, and ERCC4 had nonsense SNPs that predict truncation at positions 136, 1156, 140, and 2169, respectively, in the polypeptides. RAG1, MSH6, and MGST2 had insertion/deletion polymorphisms that predict an altered reading frame and premature termination of translation. We identified a one-base insertion/deletion in codon 461 of RAG1. MSH6 has a 4-bp insertion/deletion in codon 4159 that predicts a nonsynonymous K4159D substitution and truncation of the last two amino acids. MGST2 has a one-base insertion/deletion in codon 352. GTF2H3 contains a SNP that abolishes the start codon. With the exception of SMUG1, all of these polymorphisms were found in the heterozygous state in a single individual, for an allele frequency of <1% in the PDR 90 panel. The SMUG1 nonsense cSNP was observed in two heterozygotes in the PDR 90 panel (MAF = 1%).
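The allele frequencies quoted for these rare variants follow directly from the heterozygote counts. The Python sketch below works through that arithmetic; it assumes that all 90 diploid samples were successfully genotyped at each site (the text does not state per-site call rates), and the function name is hypothetical.

```python
def minor_allele_frequency(n_het, n_hom_minor, n_individuals):
    """MAF in a diploid sample: minor-allele copies / total chromosomes."""
    minor_copies = n_het + 2 * n_hom_minor
    return minor_copies / (2 * n_individuals)


PANEL_SIZE = 90  # individuals in the PDR panel, per the text

# A variant seen once, in a single heterozygote.
single_het = minor_allele_frequency(n_het=1, n_hom_minor=0, n_individuals=PANEL_SIZE)

# The SMUG1 nonsense cSNP, seen in two heterozygotes.
smug1 = minor_allele_frequency(n_het=2, n_hom_minor=0, n_individuals=PANEL_SIZE)

print(f"one heterozygote:  MAF = {single_het:.2%}")
print(f"two heterozygotes: MAF = {smug1:.2%}")
```

With 180 chromosomes in the panel, a single heterozygote corresponds to a frequency just over 0.5%, and two heterozygotes to just over 1%, consistent with the values reported above.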
Many approaches are emerging to identify functional sites in noncoding regions. Trafac (Transcription Factor binding site Comparison) is a Web-accessible tool for identifying transcription regulatory regions by using a comparative sequence analysis approach (Jegga et al. 2002
Site Selection for Indirect Association Studies
Indirect association studies rely on linkage disequilibrium between genetic markers to measure the association between the SNP genotyped, as well as the SNPs in linkage disequilibrium with the assayed site, and the disease phenotype (Collins et al. 1997
Therefore, to provide insight into the process of site selection to facilitate indirect association studies for these 213 candidate environmental response genes, we examined the extent of correlation of nucleotide diversity with two metrics of genomic variation: common SNPs (MAF > 5%; Fig. 4A) and haplotypes (Fig. 4B). Intragenic nucleotide diversity was modestly correlated (r² = 0.44, P < 0.001) with the frequency of common SNPs (Fig. 4A). This was not unexpected because this measure of nucleotide diversity is sensitive to allele frequency. However, only 29% of the variability in the number of haplotypes per gene is associated with variability in nucleotide diversity per gene (r² = 0.29; P < 0.001; Fig. 4B). Indeed, considerable gene-to-gene variation was observed in the number of haplotypes per gene, ranging from a low of three (FEN1) to a high of 102 (CCND2). Overall, the mean number of haplotypes per gene for the EGP was 26, which is lower than previous estimates from a set of genes related to inflammation, blood pressure regulation, and lipid metabolism (Crawford et al. 2004b
The gene-to-gene variability observed by the EGP in nucleotide diversity is also evident by site correlations or LD. Figure 5 illustrates the extremes observed in LD, as measured by the metric r², across the genes involved in environmental responses. For genes with average or high LD, such as BNIP1 (Fig. 5A) and BRCA1 (Fig. 5C), respectively, few sites are required for genotyping in association studies. However, for genes with very weak LD, such as CCND2 (Fig. 5B), many more sites will be required for a genetic association study because very few sites within this gene are correlated. It is important to note that the extent of LD across a gene is independent of gene size. For example, LD extends across the 85-kb BRCA1, whereas fewer correlated sites are present in the smaller CCND2. For these genes with weak LD, attempts to choose sites with either LD-based (Carlson et al. 2004
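For readers unfamiliar with the pairwise LD metric used in Figure 5, the following Python sketch computes it for two biallelic sites from phased haplotypes. It uses the standard textbook definition (squared allelic correlation); the function name and the toy haplotypes are hypothetical, and the sketch is not a description of how Figure 5 was produced.

```python
def ld_r_squared(haplotypes):
    """Pairwise LD between two biallelic sites.

    haplotypes: list of (a, b) tuples, one per chromosome, with the
    allele at each site coded 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n           # frequency of allele 1 at site A
    p_b = sum(b for _, b in haplotypes) / n           # frequency of allele 1 at site B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                              # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic in the sample")
    return d * d / denom


# Toy data: two sites that always co-occur, and two that are unrelated.
perfectly_correlated = [(0, 0)] * 6 + [(1, 1)] * 4
uncorrelated = [(0, 0), (0, 1), (1, 0), (1, 1)] * 3

print(f"correlated sites:   r^2 = {ld_r_squared(perfectly_correlated):.2f}")
print(f"uncorrelated sites: r^2 = {ld_r_squared(uncorrelated):.2f}")
```

When two sites are perfectly correlated, genotyping one is enough to infer the other, which is why genes with strong LD need few assayed sites and genes with weak LD need many.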
This study presents one of the most comprehensive sets of gene-based SNPs assembled, including both coding and noncoding SNPs, and provides an important view of the structure of sequence diversity across the human genome. Our analysis of 1% of the potential candidate genes located in the human genome reveals the range of gene-to-gene variation in overall nucleotide diversity, linkage disequilibrium, and number of haplotypes. As previously described for coding regions, there is wide-ranging gene-to-gene variation in coding region SNPs (Cargill et al. 1999
By extrapolating our findings from 213 candidate genes to the human genome, containing an estimated 24,000 to 35,000 total genes (Ewing and Green 2000
By using SIFT and Polyphen to score potential functionally intolerant nonsynonymous cSNPs, we identified 57 SNPs predicted to alter protein function. Combined with the eight variations predicting altered polypeptide translation (four nonsense SNPs, three frameshifts, and one abolished start codon), the extrapolation of these observations predicts 7300 to 10,700 (assuming there are 24,000 to 35,000 genes) potentially deleterious SNPs in all human genes (with MAF > 1%). Of these 65 potential intolerant polymorphisms, only seven had a MAF >5%, suggesting a genic set of 790 to 1150 common deleterious SNPs. Although this estimate could potentially reflect relatively high conservation from a functional bias of these 213 genes, the gene-to-gene variation we observe in different measures of sequence diversity is consistent with previous observations of sets of genes encoding proteins involved in inflammation, lipid metabolism, and endocrine function (Cargill et al. 1999
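The extrapolation described here is proportional scaling from the 213 surveyed genes to the whole genome. The Python sketch below applies it to the seven common predicted-intolerant variants; it assumes, as the text does, that the surveyed genes are representative of all genes, and the function and constant names are hypothetical.

```python
GENES_SURVEYED = 213
GENOME_GENE_ESTIMATES = (24_000, 35_000)  # range of total gene counts used in the text


def extrapolate(count_in_survey, total_genes, genes_surveyed=GENES_SURVEYED):
    """Scale a count observed in the surveyed genes to a genome-wide estimate."""
    return count_in_survey / genes_surveyed * total_genes


# Seven predicted-intolerant variants had MAF > 5% in the panel.
common_low, common_high = (extrapolate(7, g) for g in GENOME_GENE_ESTIMATES)
print(f"common deleterious SNPs genome-wide: {common_low:.0f} to {common_high:.0f}")
```

For seven variants this gives roughly 790 to 1150, the range quoted above; the same function can be applied to any other per-survey count.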
Informing Association Studies
To be explored further are the variants in the regulatory regions and conserved noncoding regions of these genes. Our observations of a general trend of lower nucleotide diversity in conserved noncoding regions identified by cross-species comparisons suggest these regions may be undergoing selection in human populations. Further refinement of these regions will become practical as more mammalian genomes are sequenced. The 767 variants in conserved noncoding regions have the potential to dysregulate expression levels and alter target-cell specificities of their respective genes and contribute to the development or progression of environmental diseases (see Supplemental Table 3). Examination of these polymorphisms in the context of gene feature views such as the "regulograms" provided by tools like Trafac, as shown in Figure 3, will facilitate these studies by identifying candidate SNPs in consensus transcription factor binding site regions.
Our limited ability to a priori predict SNPs with functional consequences has led to the development of large-scale projects to discover and type common variation to identify regions with high and low linkage disequilibrium and haplotypes (The International HapMap Consortium 2003
Future of the EGP
Candidate Environmental Response Genes
The targeted candidate genes for the EGP encode well-characterized groups of interacting proteins involved in pathways for DNA repair, cell cycle control, drug metabolism, and apoptosis. The candidate genes were selected by soliciting recommendations from investigators studying toxicogenomics and environmental susceptibility (Olden and Wilson 2000
SNP Discovery
The study population consisted of 90 DNA samples obtained from the PDR (Collins et al. 1998 Reaction products were air dried and diluted to 30 µL with ddH2O. Chromatograms were generated from reaction products on Applied Biosystems ABI 3700 or ABI 3730 capillary sequencers. Data flow was tracked by using a custom-designed LIMS system.
All chromatograms were base-called by using Phred, assembled into contigs by using Phrap, and scanned for SNPs with Polyphred, version 4.1 (Nickerson et al. 1997
Identification of Potential Functional Variants
Identification of Conserved Noncoding Sequence
Identification of Putative cis-Regulatory Regions
Haplotype Analysis
We acknowledge the talented technical and analytical staff that produced and reviewed these data: Moon Wook-Chung, Monica A. Montoya, Christine Vo, Laura A. Witrak, Annie M. Sherwood, Mike Daniels, Amy N. Olson, Brent J. Leithauser, Tasha K. Downing, Becky Borrayo, J. Tucker Jackson, Christopher Baier, R. Luke Daniels, Christa L. Poel, J. Kristofer Sherwood, Dan Nguyen, Wendy S. Schackwitz, Peggy D. Robertson, Sally W. Chambers, Ann C. Braun, Katy E. Miyamoto, Jonathan Alder, Diane Dunn, M. Hadi Islam, and Allan Tingey. We thank Drs. Samuel Wilson and Elizabeth Maull for their critical review of this manuscript and Eric Torskey for his assistance in its preparation. This work was supported by National Institute of Environmental Health Sciences grants N01-ES-15478 (D.A.N.), N01-ES-35501 (R.B.W.), and U01-ES-11038 (B.J.A.).
4 Corresponding author. E-MAIL debnick{at}u.washington.edu; FAX (206) 221-6498. Supplemental material is available online at www.genome.org. All sequence data from this study have been submitted to GenBank and are available from our Web site at http://egp.gs.washington.edu and at other sites listed herein. Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.2730004. Article published online ahead of print in September 2004.
Alonso, J., Garcia-Miguel, P., Abelairas, J., Mendiola, M., Sarret, E., Vendrell, M.T., Navajas, A., and Pestana, A. 2001. Spectrum of germline RB1 gene mutations in Spanish retinoblastoma patients: Phenotypic and molecular epidemiological implications. Hum. Mutat. 17: 412-422.
Altshuler, D., Hirschhorn, J.N., Klannemark, M., Lindgren, C.M., Vohl, M.C., Nemesh, J., Lane, C.R., Schaffner, S.F., Bolk, S., Brewer, C., et al. 2000. The common PPAR
Aoufouchi, S., Flatter, E., Dahan, A., Faili, A., Bertocci, B., Storck, S., Delbos, F., Cocea, L., Gupta, N., Weill, J.C., et al. 2000. Two novel human and mouse DNA polymerases of the polX family. Nucleic Acids Res. 28: 3684-3693.
Aynacioglu, A.S., Brockmoller, J., Bauer, S., Sachse, C., Guzelbey, P., Ongen, Z., Nacak, M., and Roots, I. 1999. Frequency of cytochrome P450 CYP2C9 variants in a Turkish population and functional relevance for phenytoin. Br. J. Clin. Pharmacol. 48: 409-415.
Barth, M.L., Fensom, A., and Harris, A. 1995. Identification of seven novel mutations associated with metachromatic leukodystrophy. Hum. Mutat. 6: 170-176.
Bateman, A., Coin, L., Durbin, R., Finn, R.D., Hollich, V., Griffiths-Jones, S., Khanna, A., Marshall, M., Moxon, S., Sonnhammer, E.L., et al. 2004. The Pfam protein families database. Nucleic Acids Res. 32: D138-D141.
Bell, D.A., Taylor, J.A., Butler, M.A., Stephens, E.A., Wiest, J., Brubaker, L.H., Kadlubar, F.F., and Lucier, G.R. 1993. Genotype/phenotype discordance for human arylamine N-acetyltransferase (NAT2) reveals a new slow-acetylator allele common in African-Americans. Carcinogenesis 14: 1689-1692.
Blackburn, A.C., Coggan, M., Tzeng, H.F., Lantum, H., Polekhina, G., Parker, M.W., Anders, M.W., and Board, P.G. 2001. GSTZ1d: A new allele of glutathione transferase
Botstein, D. and Risch, N. 2003. Discovering genotypes underlying human phenotypes: Past successes for mendelian disease, future approaches for complex disease. Nat. Genet. 33(Suppl): 228-237.
Buchholz, T.A., Weil, M.M., Ashorn, C.L., Strom, E.A., Sigurdson, A., Bondy, M., Chakraborty, R., Cox, J.D., McNeese, M.D., and Story, M.D. 2004. A Ser49Cys variant in the ataxia telangiectasia, mutated, gene that is more common in patients with breast carcinoma compared with population controls. Cancer 100: 1345-1351.
Cargill, M., Altshuler, D., Ireland, J., Sklar, P., Ardlie, K., Patil, N., Shaw, N., Lane, C.R., Lim, E.P., Kalyanaraman, N., et al. 1999. Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nat. Genet. 22: 231-238.
Carlson, C.S., Eberle, M.A., Rieder, M.J., Yi, Q., Kruglyak, L., and Nickerson, D.A. 2004. Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. Am. J. Hum. Genet. 74: 106-120.
Chakravarti, A. 1999. Population genetics: Making sense out of sequence. Nat. Genet. 21: 56-60.
Collins, F.S., Guyer, M.S., and Chakravarti, A. 1997. Variations on a theme: Cataloging human DNA sequence variation. Science 278: 1580-1581.
Collins, F.S., Brooks, L.D., and Chakravarti, A. 1998. A DNA polymorphism discovery resource for research on human genetic variation. Genome Res. 8: 1229-1231.
Crawford, D.C., Bhangale, T., Li, N., Hellenthal, G., Rieder, M.J., Nickerson, D.A., and Stephens, M. 2004a. Evidence for substantial fine-scale variation in recombination rates across the human genome. Nat. Genet. 36: 700-706.
Crawford, D.C., Carlson, C.S., Rieder, M.J., Carrington, D.P., Yi, Q., Smith, J.D., Eberle, M.A., Kruglyak, L., and Nickerson, D.A. 2004b. Haplotype diversity across 100 candidate genes for inflammation, lipid metabolism, and blood pressure regulation in two populations. Am. J. Hum. Genet. 74: 610-622.
Dahlqvist, A., Hammond, J.B., Crane, R.K., Dunphy, J.V., and Littman, A. 1963. Intestinal lactase deficiency and lactose intolerance in adults: Preliminary report. Gastroenterology 45: 488-491.
Doll, R. 1975. Pott and the path to prevention. Arch. Geschwulstforsch. 45: 521-531.
Ewing, B. and Green, P. 2000. Analysis of expressed sequence tags indicates 35,000 human genes. Nat. Genet. 25: 232-234.
Frosst, P., Blom, H.J., Milos, R., Goyette, P., Sheppard, C.A., Matthews, R.G., Boers, G.J., den Heijer, M., Kluijtmans, L.A., van den Heuvel, L.P., et al. 1995. A candidate genetic risk factor for vascular disease: A common mutation in methylenetetrahydrofolate reductase. Nat. Genet. 10: 111-113.
Garcia-Diaz, M., Dominguez, O., Lopez-Fernandez, L.A., de Lera, L.T., Saniger, M.L., Ruiz, J.F., Parraga, M., Garcia-Ortiz, M.J., Kirchhoff, T., del Mazo, J., et al. 2000. DNA polymerase
Haemmerli, U.P., Kistler, H., Ammann, T., Marthaler, T., Semenza, G., Auricchio, S., and Prader, A. 1965. Acquired milk intolerance in the adult caused by lactose malabsorption due to a selective deficiency of intestinal lactase activity. Am. J. Med. 38: 7-30.
Halushka, M.K., Fan, J.B., Bentley, K., Hsie, L., Shen, N., Weder, A., Cooper, R., Lipshutz, R., and Chakravarti, A. 1999. Patterns of single-nucleotide polymorphisms in candidate genes for blood-pressure homeostasis. Nat. Genet. 22: 239-247.
Hutchison, D.C., Cook, P.J., and Barter, C.E. 1970. Pulmonary emphysema and
Hyytinen, E.R., Frierson Jr., H.F., Sipe, T.W., Li, C.L., Degeorges, A., Sikes, R.A., Chung, L.W., and Dong, J.T. 1999. Loss of heterozygosity and lack of mutations of the XPG/ERCC5 DNA repair gene at 13q33 in prostate cancer. Prostate 41: 190-195.
The International HapMap Consortium 2003. The international HapMap project. Nature 426: 789-796.
Jegga, A.G., Sherwood, S.P., Carman, J.W., Pinski, A.T., Phillips, J.L., Pestian, J.P., and Aronow, B.J. 2002. Detection and visualization of compositionally similar cis-regulatory element clusters in orthologous and coordinately controlled genes. Genome Res. 12: 1408-1417.
Johnson, G.C., Esposito, L., Barratt, B.J., Smith, A.N., Heward, J., Di Genova, G., Ueda, H., Cordell, H.J., Eaves, I.A., Dudbridge, F., et al. 2001. Haplotype tagging for the identification of common disease genes. Nat. Genet. 29: 233-237.
Jorde, L.B., Watkins, W.S., Kere, J., Nyman, D., and Eriksson, A.W. 2000. Gene mapping in isolated populations: New roles for old friends? Hum. Hered. 50: 57-65.
Kaiser, J. 2003. Tying Genetics to the risk of environmental diseases. Science 300: 563.
Klotz, A.P. 1964. Intestinal lactase deficiency and diarrhea in adults. Am. J. Dig. Dis. 10: 345-354.
Kruglyak, L. and Nickerson, D.A. 2001. Variation is the spice of life. Nat. Genet. 27: 234-236.
Ladiges, W., Kemp, C., Packenham, J., and Velazquez, J. 2004. Human gene variation: from SNPs to phenotypes. Mutat. Res. 545: 131-139.
Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., et al. 2001. Initial sequencing and analysis of the human genome. Nature 409: 860-921.
Li, W.H. and Sadler, L.A. 1991. Low nucleotide diversity in man. Genetics 129: 513-523.
Lieberman, J., Mittman, C., and Schneider, A.S. 1969. Screening for homozygous and heterozygous
Mathonnet, G., Krajinovic, M., Labuda, D., and Sinnett, D. 2003. Role of DNA mismatch repair genetic polymorphisms in the risk of childhood acute lymphoblastic leukaemia. Br. J. Haematol. 123: 45-48.
Motulsky, A.G. 1972. Hemolysis in glucose-6-phosphate dehydrogenase deficiency. Fed. Proc. 31: 1286-1292.
Ng, P.C. and Henikoff, S. 2003. SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res. 31: 3812-3814.
Nickerson, D.A., Tobe, V.O., and Taylor, S.L. 1997. PolyPhred: Automating the detection and genotyping of single nucleotide substitutions using fluorescence-based resequencing. Nucleic Acids Res. 25: 2745-2751.
Nickerson, D.A., Taylor, S.L., Weiss, K.M., Clark, A.G., Hutchinson, R.G., Stengard, J., Salomaa, V., Vartiainen, E., Boerwinkle, E., and Sing, C.F. 1998. DNA sequence diversity in a 9.7-kb region of the human lipoprotein lipase gene. Nat. Genet. 19: 233-240.
Olden, K. and Wilson, S. 2000. Environmental health and genomics: Visions and implications. Nat. Rev. Genet. 1: 149-153.
Quandt, K., Frech, K., Karas, H., Wingender, E., and Werner, T. 1995. MatInd and MatInspector: New fast and versatile tools for detection of consensus matches in nucleotide sequence data. Nucleic Acids Res. 23: 4878-4884.
Ramensky, V., Bork, P., and Sunyaev, S. 2002. Human non-synonymous SNPs: Server and survey. Nucleic Acids Res. 30: 3894-3900.
Reich, D.E., Cargill, M., Bolk, S., Ireland, J., Sabeti, P.C., Richter, D.J., Lavery, T., Kouyoumjian, R., Farhadian, S.F., Ward, R., et al. 2001. Linkage disequilibrium in the human genome. Nature 411: 199-204.
Sachidanandam, R., Weissman, D., Schmidt, S.C., Kakol, J.M., Stein, L.D., Marth, G., Sherry, S., Mullikin, J.C., Mortimore, B.J., Willey, D.L., et al. 2001. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409: 928-933.
Schoumacher, F., Glaus, A., Mueller, H., Eppenberger, U., Bolliger, B., and Senn, H.J. 2001. BRCA1/2 mutations in Swiss patients with familial or early-onset breast and ovarian cancer. Swiss Med. Wkly. 131: 223-226.
Stenson, P.D., Ball, E.V., Mort, M., Phillips, A.D., Shiel, J.A., Thomas, N.S., Abeysinghe, S., Krawczak, M., and Cooper, D.N. 2003. Human Gene Mutation Database (HGMD): 2003 update. Hum. Mutat. 21: 577-581.
Stephens, M. and Donnelly, P. 2003. A comparison of bayesian methods for haplotype reconstruction from population genotype data. Am. J. Hum. Genet. 73: 1162-1169.
Stephens, J.C., Schneider, J.A., Tanguay, D.A., Choi, J., Acharya, T., Stanley, S.E., Jiang, R., Messer, C.J., Chew, A., Han, J.H., et al. 2001. Haplotype variation and linkage disequilibrium in 313 human genes. Science 293: 489-493.
Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., and Bork, P. 2001. Prediction of deleterious human alleles. Hum. Mol. Genet. 10: 591-597.
Thomson, K.L., Gloyn, A.L., Colclough, K., Batten, M., Allen, L.I., Beards, F., Hattersley, A.T., and Ellard, S. 2003. Identification of 21 novel glucokinase (GCK) mutations in UK and European Caucasians with maturity-onset diabetes of the young (MODY). Hum. Mutat. 22: 417.
van der Put, N.M., Steegers-Theunissen, R.P., Frosst, P., Trijbels, F.J., Eskes, T.K., van den Heuvel, L.P., Mariman, E.C., den Heyer, M., Rozen, R., and Blom, H.J. 1995. Mutated methylenetetrahydrofolate reductase as a risk factor for spina bifida. Lancet 346: 1070-1071.
Wall, J.D. and Pritchard, J.K. 2003. Haplotype blocks and linkage disequilibrium in the human genome. Nat. Rev. Genet. 4: 587-597.
Wang, Q., Lasset, C., Desseigne, F., Saurin, J.C., Maugard, C., Navarro, C., Ruano, E., Descos, L., Trillet-Lenoir, V., Bosset, J.F., et al. 1999. Prevalence of germline mutations of hMLH1, hMSH2, hPMS1, hPMS2, and hMSH6 genes in 75 French kindreds with nonpolyposi