Vol. 9, Issue 8, 677-679, August 1999
INSIGHT/OUTLOOK
A key aspect of research in genetics is associating sequence
variations with heritable phenotypes. The most common variations are
single nucleotide polymorphisms (SNPs), which occur approximately once
every 500-1000 bases in a large sample of aligned human sequence. Because SNPs are expected to facilitate large-scale association genetics studies, there has recently been great interest in SNP discovery and detection. In collaboration with the National Human Genome Research Institute (NHGRI), the National Center for
Biotechnology Information (NCBI) has established the dbSNP database
(http://www.ncbi.nlm.nih.gov/SNP) to serve as a central repository
for molecular variation. Designed as a general catalog of
molecular variation to supplement GenBank (Benson et al. 1999),
dbSNP accepts submissions covering a broad range of molecular polymorphisms:
single base nucleotide substitutions, short deletion and insertion
polymorphisms, microsatellite markers, and polymorphic insertion
elements such as retrotransposons.
Although the name dbSNP is a slight misnomer given the variations represented, SNPs are the largest class of variation in the database, and the name dbSNP, selected at the request of NHGRI, reflects this fact. For the sake of brevity, we elected to use the term SNP as a shorthand for "variation" in the database notation and documentation (http://www.ncbi.nlm.nih.gov/SNP/get_html.cgi?whichHtml=how_to_submit). Thus terms used in the documentation such as "submitted SNP" or "reference SNP" refer to all classes of variation in the database and should be read as "a submitted report of variation" and "a reference report of variation." Furthermore, in serving as the variation complement to GenBank, dbSNP does not restrict submissions to neutral polymorphisms. Submissions are welcome on all classes of simple molecular variation, including those that cause rare clinical phenotypes.
Submissions to dbSNP come from a variety of sources including individual laboratories, collaborative polymorphism discovery efforts, large-scale genome sequencing centers, and private industry. The data collected range from the tightly focused characterization of particular genes to broadly sampled levels of variation from random genomic sequence. The distribution of reported marker density across the genome is thus expected to be mixed, with an expected minimum density of 1/3000 bases in regions of random genomic sequence, and local regions of higher density around well-characterized genes. Each variation submitted to dbSNP must have an identifier provided by the submitter (called a "local" identifier by dbSNP), and each is issued a unique identifier, formatted as an integer prefixed with ss (for submitted SNP), for example, ss334. An ss number is thus permanently associated with the submitter's identifier, and it can be treated as a formal accession number by the scientific publishing community.
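The pairing of a submitter's local identifier with an ss number can be sketched in a few lines of Python. This is an illustration only: the `SubmittedSnp` class, the `parse_ss` helper, and the sample local identifier are hypothetical names and not part of any dbSNP interface. The only detail taken from the text is that an accession is an integer prefixed with ss; treating the prefix as case-sensitive is an assumption.

```python
import re
from dataclasses import dataclass

# Assumed from the text: "ss" followed by an integer, e.g. ss334.
# Case-sensitivity of the prefix is an assumption of this sketch.
_SS_PATTERN = re.compile(r"ss([0-9]+)")


def parse_ss(accession: str) -> int:
    """Return the integer part of an ss accession, or raise ValueError."""
    match = _SS_PATTERN.fullmatch(accession.strip())
    if match is None:
        raise ValueError(f"not an ss accession: {accession!r}")
    return int(match.group(1))


@dataclass(frozen=True)
class SubmittedSnp:
    """Hypothetical record tying a submitter's local ID to its ss number."""
    local_id: str   # identifier supplied by the submitter
    ss_number: int  # integer part of the accession issued by dbSNP

    @property
    def accession(self) -> str:
        """Format the accession the way the text shows it (e.g. ss334)."""
        return f"ss{self.ss_number}"


# Usage: the local identifier "lab-marker-1" is made up for illustration.
record = SubmittedSnp(local_id="lab-marker-1", ss_number=parse_ss("ss334"))
print(record.local_id, record.accession)
```

Keeping the record immutable (`frozen=True`) mirrors the statement that an ss number is permanently associated with the submitter's identifier.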
For Each Variation, dbSNP Includes Links to Populations, Specific Locations within Molecular Sequences, and Assay Methods
Linking to Genomic Location
The sequence location permits us to specify the base(s) altered, and although it can be obtained in several ways, it is always pinpointed within flanking sequence in the dbSNP submission. Simultaneous submission of STS data documenting how to isolate the marker with PCR techniques, explicit linking to a GenBank accession number, and postsubmission computational analysis of the polymorphism and flanking sequence can all be used to align the flanking sequence to other sequence records in the NCBI databases. These alignments are analyzed to localize the variation and its flanking sequence within the genome. The quality and accuracy of this localization are determined by the quality and nature of the sequence; variations in segments of low-complexity sequence will be more difficult to localize than variations reported from segments of complex, unique sequence.
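Why low-complexity sequence is harder to localize can be illustrated with a minimal Python sketch. It uses exact string matching as a stand-in for a real alignment, which is an assumption made for brevity and not a description of the analysis dbSNP performs. The function names, toy reference sequences, and flanks are all hypothetical.

```python
from __future__ import annotations


def find_all(reference: str, query: str) -> list[int]:
    """Return every 0-based start offset of query in reference (overlaps allowed)."""
    hits = []
    start = reference.find(query)
    while start != -1:
        hits.append(start)
        start = reference.find(query, start + 1)
    return hits


def locate_variation(reference: str, flank5: str, allele: str, flank3: str) -> list[int]:
    """Return the 0-based position of the variant base(s) for each exact hit.

    Exact matching is a simplifying assumption; real flanking sequence
    would be aligned with tolerance for mismatches and gaps.
    """
    query = flank5 + allele + flank3
    return [hit + len(flank5) for hit in find_all(reference, query)]


# Complex, unique sequence: the flanks pin the variation to one position.
unique_hits = locate_variation("GATTACAGGCTTAACGT", "TACAG", "G", "CTTAA")

# Low-complexity repeat: the same flanks match at several offsets,
# so the location is ambiguous.
repeat_hits = locate_variation("CACACACACACA", "CA", "C", "AC")

print(len(unique_hits), len(repeat_hits))
```

A single hit localizes the variation unambiguously, whereas several hits, as in the dinucleotide repeat, leave its position within the reference unresolved.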
Linking Populations
Submitters describe the populations containing the variations using free-text fields to classify their sample as specifically as possible. Submissions must also specify the sample size, given as the number of chromosomes examined in the course of discovering the variation. Certain publicly available population samples have standardized descriptions that can be used by any submitter with data from those resources. Examples include the National Institutes of Health Polymorphism Discovery Resource (NIHPDR) (Collins et al. 1998).
Future Directions
dbSNP is a database in the early stages of maturing. In addition to obtaining variation data by original submissions, dbSNP is developing data exchange protocols with other public variation and mutation databases, such as HGBASE (human genic biallelic sequences) (http://hgbase.interactiva.de) and The SNP Consortium (TSC) public database (Masood 1999).
FOOTNOTES
4 Corresponding author.
E-MAIL sherry{at}ray.nlm.nih.gov; FAX (301) 435-7794.
1 The NCBI Reference Sequence project (RefSeq) provides sequence standards for the naturally occurring molecules of the central dogma, from chromosomes to mRNAs to proteins. RefSeq standards provide a foundation for the functional annotation of the human genome and a stable reference point for mutation analysis, gene expression studies, and polymorphism discovery. (http://www.ncbi.nlm.nih.gov/LocusLink/refseq.html)
2 UniGene is an experimental system for automatically partitioning GenBank sequences into a nonredundant set of gene-oriented clusters. Each UniGene cluster contains sequences that represent a unique gene, as well as related information such as the tissue types in which the gene has been expressed and map location. (http://www.ncbi.nlm.nih.gov/UniGene/index.html)