Published online before print September 5, 2006
Genome Research, DOI: 10.1101/gr.5431206

Methods

Inference of population genetic parameters in metagenomics: A clean look at messy data

Philip L.F. Johnson (1,3) and Montgomery Slatkin (2)

(1) Biophysics Graduate Group, University of California, Berkeley, California 94720, USA; (2) Department of Integrative Biology, University of California, Berkeley, California 94720, USA

Metagenomic projects generate short, overlapping fragments of DNA sequence, each deriving from a different individual. We report a new method for inferring the scaled mutation rate, θ = 2N_e u, and the scaled exponential growth rate, R = N_e r, from the site-frequency spectrum of these data while accounting for sequencing error via Phred quality scores. After obtaining maximum likelihood parameter estimates for θ and R, we calculate empirical Bayes quality scores reflecting the posterior probability that each apparently polymorphic site is truly polymorphic; these scores can then be used for other applications such as SNP discovery. For realistic parameter ranges, analytic and simulation results show our estimates to be essentially unbiased with tight confidence intervals. In contrast, choosing an arbitrary quality score cutoff (e.g., trimming reads) and ignoring further quality information during inference yields biased estimates with greater variance. We illustrate the use of our technique on a new project analyzing activated sludge from a lab-scale bioreactor seeded by a wastewater treatment plant.
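
To make the role of Phred quality scores concrete, the following is a minimal Python sketch, not the authors' maximum likelihood method. It relies only on the standard Phred definition (Q = -10 log10 of the error probability) and, purely as a point of comparison, on Watterson's classical estimator of θ, which assumes constant population size and error-free data. The function names are hypothetical and chosen for illustration.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to a base-call error probability.

    Standard Phred definition: Q = -10 * log10(p_error).
    """
    return 10.0 ** (-q / 10.0)


def prob_any_error(phred_scores):
    """Probability that at least one base call covering a site is wrong.

    Assumes (for illustration) that errors in different reads are independent.
    """
    p_all_correct = 1.0
    for q in phred_scores:
        p_all_correct *= 1.0 - phred_to_error_prob(q)
    return 1.0 - p_all_correct


def watterson_theta(num_segregating_sites, num_sequences):
    """Watterson's estimator: S divided by the harmonic sum over 1..n-1.

    A classical estimator that ignores sequencing error; it is NOT the
    estimator described in the abstract above.
    """
    harmonic = sum(1.0 / i for i in range(1, num_sequences))
    return num_segregating_sites / harmonic


if __name__ == "__main__":
    # A site covered by five reads, one of them low quality (Q10).
    site_scores = [40, 40, 30, 30, 10]
    print("P(at least one erroneous call):", prob_any_error(site_scores))
    # Naive theta from 12 apparently polymorphic sites among 5 sequences.
    print("Watterson theta (error-blind):", watterson_theta(12, 5))
```

In this sketch a single low-quality call dominates the chance that a site merely appears polymorphic, which suggests why an estimator that counts every apparent polymorphism, or that discards quality information after a hard cutoff, could be biased.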


(3) Corresponding author. E-mail: plfjohnson@berkeley.edu; fax (510) 643-6264.

Article published online before print. Article and publication date are online at http://www.genome.org/cgi/doi/10.1101/gr.5431206




This article has been cited by other articles:


P. L. F. Johnson and M. Slatkin
Accounting for Bias from Sequencing Error in Population Genetic Estimates
Mol. Biol. Evol., January 1, 2008; 25(1): 199 - 206.


H. Chen, R. E. Green, S. Paabo, and M. Slatkin
The Joint Allele-Frequency Spectrum in Closely Related Species
Genetics, September 1, 2007; 177(1): 387 - 398.




Copyright © 2006 by Cold Spring Harbor Laboratory Press.