Genome Research

Published online before print May 7, 2008
Genome Research, DOI: 10.1101/gr.070169.107
OPEN ACCESS ARTICLE
Methods

Detecting polymorphic regions in Arabidopsis thaliana with resequencing microarrays

Georg Zeller1,2, Richard M. Clark2,3, Korbinian Schneeberger2,3, Anja Bohlen1, Detlef Weigel2, and Gunnar Rätsch1,4

1 Friedrich Miescher Laboratory of the Max Planck Society, Tübingen 72070, Germany; 2 Max Planck Institute for Developmental Biology, Department of Molecular Biology, Tübingen 72070, Germany

Whole-genome oligonucleotide resequencing arrays have allowed the comprehensive discovery of single nucleotide polymorphisms (SNPs) in eukaryotic genomes of moderate to large size. With this technology, the detection rate for isolated SNPs is typically high. However, it is greatly reduced when other polymorphisms are located near a SNP, because multiple mismatches inhibit hybridization to arrayed oligonucleotides. Contiguous tracts of suppressed hybridization therefore typify polymorphic regions (PRs) such as clusters of SNPs or deletions. We developed a machine learning method, designated margin-based prediction of polymorphic regions (mPPR), to predict PRs from resequencing array data. Conceptually similar to hidden Markov models, the method is trained with discriminative learning techniques related to support vector machines, and accurately identifies even very short polymorphic tracts (<10 bp). We applied this method to resequencing array data previously generated for the euchromatic genomes of 20 strains (accessions) of the best-characterized plant, Arabidopsis thaliana. Nonredundantly, 27% of the genome was included within the boundaries of PRs predicted at high specificity (~97%). The resulting data set provides a fine-scale view of polymorphic sequences in A. thaliana; patterns of polymorphism not apparent in SNP data were readily detected, especially for noncoding regions. Our predictions provide a valuable resource for evolutionary genetic and functional studies in A. thaliana, and our method is applicable to similar data sets in other species. More broadly, our computational approach can be applied to other segmentation tasks related to the analysis of genomic variation.
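The segmentation idea in the abstract (contiguous tracts of suppressed hybridization marking a polymorphic region) can be illustrated with a toy two-state decoder. The sketch below is not mPPR: it is a plain Viterbi-style dynamic program with hand-set parameters, whereas the published method learns its scoring functions discriminatively. The function names, the intensity scale, and every parameter value (`low_mean`, `high_mean`, `sd`, `switch_penalty`) are assumptions chosen only for illustration.

```python
def segment_polymorphic(intensities, low_mean=0.2, high_mean=1.0,
                        sd=0.25, switch_penalty=4.0):
    """Label each position 0 (conserved) or 1 (polymorphic).

    Toy two-state Viterbi decoding; all parameters are hypothetical.
    """
    means = (high_mean, low_mean)  # state 0 = conserved, state 1 = polymorphic

    def score(x, state):
        # Gaussian log-likelihood, up to a constant shared by both states
        return -((x - means[state]) ** 2) / (2 * sd ** 2)

    if not intensities:
        return []

    best = [score(intensities[0], 0), score(intensities[0], 1)]
    back = []  # back[t][s] = best previous state for state s at step t+1
    for x in intensities[1:]:
        new_best, pointers = [], []
        for s in (0, 1):
            stay = best[s]
            switch = best[1 - s] - switch_penalty  # penalize state changes
            if stay >= switch:
                new_best.append(stay + score(x, s))
                pointers.append(s)
            else:
                new_best.append(switch + score(x, s))
                pointers.append(1 - s)
        best = new_best
        back.append(pointers)

    # Trace the highest-scoring path backwards
    state = 0 if best[0] >= best[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def regions(labels):
    """Convert per-position labels into half-open (start, end) intervals."""
    out, start = [], None
    for i, lab in enumerate(labels):
        if lab == 1 and start is None:
            start = i
        elif lab == 0 and start is not None:
            out.append((start, i))
            start = None
    if start is not None:
        out.append((start, len(labels)))
    return out
```

With these illustrative settings, a run of several low-intensity positions is reported as one region, while a single low position is absorbed into the conserved state because two state switches cost more than one poorly fitting observation. A learned model could weigh short tracts very differently.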


3 These authors contributed equally to this work.

4 Corresponding author.

E-mail Gunnar.Raetsch@tuebingen.mpg.de; fax 49-7071-601-801.

[Supplemental material is available online at www.genome.org.]

Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.070169.107


Copyright © 2008 by Cold Spring Harbor Laboratory Press.