Published online before print
March 9, 2007, 10.1101/gr.6023607 Genome Res. 17:520-526, 2007 ©2007 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/07 $5.00
Methods
Recent human effective population size estimated from linkage disequilibrium
1 Colon Cancer Genetics Group, University of Edinburgh, Western General Hospital, Edinburgh EH4 2XU, United Kingdom; 2 MRC Human Genetics Unit, Western General Hospital, Edinburgh EH4 2XU, United Kingdom; 3 Institute of Evolutionary Biology, University of Edinburgh, Edinburgh EH9 3JT, United Kingdom; 4 Victorian Institute of Animal Science, DPI, Attwood 3049, Australia; 5 Queensland Institute of Medical Research, Royal Brisbane Hospital, Brisbane 4006, Australia; 6 The Wellcome Trust Centre for Human Genetics, The University of Oxford, Oxford OX3 7BN, United Kingdom; 7 Institute of Land and Food Resources, University of Melbourne, Parkville 3010, Australia
Effective population size (Ne) determines the amount of genetic variation, genetic drift, and linkage disequilibrium (LD) in populations. Here, we present the first genome-wide estimates of human effective population size from LD data. Chromosome-specific effective population size was estimated for all autosomes and the X chromosome from estimated LD between SNP pairs <100 kb apart. We account for variation in recombination rate by using coalescent-based estimates of fine-scale recombination rate from one sample and correlating these with LD in an independent sample. Phase I of the HapMap project produced between 18 and 22 million SNP pairs in samples from four populations: Yoruba from Ibadan (YRI), Nigeria; Japanese from Tokyo (JPT); Han Chinese from Beijing (HCB); and residents from Utah with ancestry from northern and western Europe (CEU). For CEU, JPT, and HCB, the estimate of effective population size, adjusted for SNP ascertainment bias, was 3100, whereas the estimate for the YRI was 7500, consistent with the out-of-Africa theory of ancestral human population expansion and concurrent bottlenecks. We show that the decay in LD over distance between SNPs is consistent with recent population growth. The estimates of Ne are lower than previously published estimates based on heterozygosity, possibly because they represent one or more bottlenecks in human population size that occurred 10,000 to 200,000 years ago.
Effective population size (Ne) is an important population parameter that helps to explain how human populations evolved and expanded, and to improve the understanding and modeling of the genetic architecture underlying complex traits (Reich and Lander 2001).
In this study we estimated genome-wide Ne from LD data. LD between each pair of SNPs depends on both Ne and the recombination rate between the SNPs. The distances between SNPs that we used (5–100 kb) are too small to estimate recombination rate using pedigree-based linkage analysis, so we have used other methods. Since errors in estimates of recombination rates from population data might bias the estimate of Ne, we have used three different methods to estimate these recombination rates. Each method resulted in very similar estimates of effective population size.
All our analyses were based on the known approximate relationship between LD, as measured by $r^2$, the squared correlation of allele frequencies at a pair of loci, and $N_e$. In particular, we used $E(r^2) \approx 1/(\alpha + 4N_e c) + 1/n$ for markers on the same autosome, where $c$ is the recombination rate between the SNPs and $n$ is the experimental sample size in chromosomes. The constant $\alpha = 1$ in the absence of mutation (Sved 1971) and $\alpha = 2$ if mutation is taken into account (Hill 1975).
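To make the autosomal relationship concrete, the following minimal sketch evaluates the expected $r^2$ for a given effective population size. The function name, the argument names, and the conversion of 1 cM/Mb used in the loop are illustrative assumptions, not part of the original analysis.

```python
def expected_r2(ne, c, n, alpha=1.0):
    """Approximate E(r^2) for two markers on the same autosome.

    ne    : effective population size
    c     : recombination rate between the two SNPs (in Morgans)
    n     : number of sampled chromosomes
    alpha : 1 without mutation, 2 if mutation is taken into account
    """
    if ne <= 0 or c < 0 or n <= 0:
        raise ValueError("ne and n must be positive and c non-negative")
    # First term: LD generated by drift; second term: chance LD from sampling.
    return 1.0 / (alpha + 4.0 * ne * c) + 1.0 / n


if __name__ == "__main__":
    # Assumption for illustration only: 1 cM/Mb, i.e. 1e-5 Morgans per kb.
    for kb in (5, 50, 100):
        c = kb * 1e-5
        print(kb, "kb:", round(expected_r2(ne=3100, c=c, n=120, alpha=2.0), 4))
```

The sampling term 1/n does not depend on distance, so it sets a floor on the LD that can be observed in a finite sample.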
Relationship between Ne and E(r2) without mutation
For the X chromosome, recombination occurs only in females. The X chromosome in males may have recombined (since it is a maternal chromosome), and only the maternal X chromosomes in females may have recombined. Hence, two-thirds of X chromosomes may have recombined and one-third may not. The sample size for the disequilibrium (correlation) coefficient is (3/2)Ne because females produce Ne X gametes and males produce (1/2)Ne gametes. Hence, for X-linked markers the term $4N_e c$ is replaced by $2N_e c$, giving $E(r^2) \approx 1/(\alpha + 2N_e c) + 1/n$, where $\alpha$ is the same mutation constant as for autosomes.
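The X-linked factor can be traced through the two-thirds and three-halves arguments above. The sketch below is one reading of that argument, assuming that the autosomal term $4N_e c$ is interpreted as twice the number of gametes ($2N_e$) times the recombination rate; the symbol $\alpha$ for the mutation constant is also an assumption of this sketch.

```latex
\begin{align}
\text{autosomes:}\quad & 2 \times (2N_e) \times c = 4N_e c \\
\text{X chromosome:}\quad & 2 \times \tfrac{3}{2}N_e \times \tfrac{2}{3}c = 2N_e c \\
\Rightarrow\quad & E(r^2) \approx \frac{1}{\alpha + 2N_e c} + \frac{1}{n}
\end{align}
```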
Relationship between Ne and E(r2) with mutation
Chance LD due to finite experimental sample size
Writing the relationship in the general form $E(r^2) \approx 1/(\alpha + kN_e c) + 1/n$, $\alpha = 1$ in the absence of mutation, $\alpha = 2$ if mutation is taken into account, $k = 4$ for autosomes, and $k = 2$ for the X chromosome. In data applications, we observe $r^2$ and, assuming that we know $c$ or have a good estimate thereof, $N_e$ can be estimated for autosomes and the X chromosome.
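As an illustration of the last step, the general relationship can be inverted to give a point estimate of Ne from a single observed r2. This is only a sketch: the function and argument names are hypothetical, and a single SNP pair is far too noisy to be used alone, which is why the analysis pools many pairs by regression.

```python
def estimate_ne(r2_obs, c, n, alpha=1.0, k=4.0):
    """Invert E(r^2) = 1/(alpha + k*Ne*c) + 1/n for Ne.

    r2_obs : observed squared correlation between two SNPs
    c      : recombination rate between them (Morgans), assumed known
    n      : number of sampled chromosomes
    alpha  : 1 without mutation, 2 with mutation
    k      : 4 for autosomes, 2 for the X chromosome
    """
    if c <= 0 or n <= 0:
        raise ValueError("c and n must be positive")
    adjusted = r2_obs - 1.0 / n  # remove chance LD due to finite sample size
    if adjusted <= 0:
        raise ValueError("r2 does not exceed the sampling term 1/n")
    return (1.0 / adjusted - alpha) / (k * c)


if __name__ == "__main__":
    # Round trip on a synthetic value generated with Ne = 3100.
    r2 = 1.0 / (2.0 + 4.0 * 3100 * 0.0005) + 1.0 / 120
    print(estimate_ne(r2, c=0.0005, n=120, alpha=2.0, k=4.0))
```

The same function covers the X chromosome by passing `k=2.0`.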
Data
To compare LD across two samples from approximately the same population, data generated by Perlegen Sciences for European Americans (n = 46) were also obtained (http://genome.perlegen.com/browser/download.html).
Haplotype frequency and r2 estimation
For the Perlegen data, standard EM algorithms were applied to estimate haplotype frequencies, and these were used to estimate r2 for all autosomes. We filtered out markers with a minor allele frequency <0.05 and estimated r2 for all pairs of markers that were between 5 kb and 100 kb apart. A total of 866,949 pairwise r2 estimates were in common with the CEU HapMap sample.
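The text does not specify which EM variant was applied. The following is a minimal sketch of the textbook two-SNP EM algorithm, assuming unphased biallelic genotypes summarized in a 3 x 3 count table; all function and variable names are hypothetical. Only double heterozygotes have ambiguous phase, so the E-step splits them between the two possible phase configurations.

```python
def em_haplotype_freqs(counts, max_iter=500, tol=1e-12):
    """Estimate two-locus haplotype frequencies from unphased genotype counts.

    counts[i][j] is the number of individuals carrying i copies of allele A
    at SNP 1 and j copies of allele B at SNP 2 (i, j in 0..2).
    Returns (p_AB, p_Ab, p_aB, p_ab).
    """
    n_ind = sum(sum(row) for row in counts)
    if n_ind == 0:
        raise ValueError("no genotypes supplied")
    n_hap = 2.0 * n_ind
    # Haplotype counts that are unambiguous given the genotypes.
    k_AB = 2 * counts[2][2] + counts[2][1] + counts[1][2]
    k_Ab = 2 * counts[2][0] + counts[2][1] + counts[1][0]
    k_aB = 2 * counts[0][2] + counts[1][2] + counts[0][1]
    k_ab = 2 * counts[0][0] + counts[1][0] + counts[0][1]
    n_dh = counts[1][1]  # double heterozygotes: AB/ab or Ab/aB
    # Start from linkage equilibrium using the allele frequencies.
    p_a = (k_AB + k_Ab + n_dh) / n_hap
    p_b = (k_AB + k_aB + n_dh) / n_hap
    p = [p_a * p_b, p_a * (1 - p_b), (1 - p_a) * p_b, (1 - p_a) * (1 - p_b)]
    for _ in range(max_iter):
        # E-step: expected share of double heterozygotes that are AB/ab.
        cis, trans = p[0] * p[3], p[1] * p[2]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: update frequencies from expected haplotype counts.
        new_p = [
            (k_AB + w * n_dh) / n_hap,
            (k_Ab + (1 - w) * n_dh) / n_hap,
            (k_aB + (1 - w) * n_dh) / n_hap,
            (k_ab + w * n_dh) / n_hap,
        ]
        delta = max(abs(x - y) for x, y in zip(new_p, p))
        p = new_p
        if delta < tol:
            break
    return tuple(p)


def r_squared(p):
    """Squared allelic correlation from haplotype frequencies."""
    p_AB, p_Ab, p_aB, p_ab = p
    p_a = p_AB + p_Ab
    p_b = p_AB + p_aB
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom <= 0:
        raise ValueError("at least one SNP is monomorphic")
    d = p_AB - p_a * p_b  # disequilibrium coefficient D
    return d * d / denom
```

A typical call would be `r_squared(em_haplotype_freqs(table))` for each SNP pair that passes the minor allele frequency and distance filters.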
Estimation of recombination rates using three methods
Method 1
More specifically, to obtain estimates of the recombination rate for any pair of SNPs, we fitted for each window the nonlinear model yij = 1/(
Given these estimates of local recombination rates, a nonlinear least squares regression method (details below) was subsequently used to estimate Ne from recombination distance between all pairs of markers. For a given pair of markers, the recombination distance was calculated from the estimated recombination rate per unit of physical distance of the window that was the midpoint of the location of the pair and the physical distance between the pair (i.e.,
Method 2
Method 3
Estimation of chromosome effective population size
The model parameters were estimated iteratively using least squares.
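The exact fitting procedure is not recoverable from this section. As a hedged sketch of an iterative least-squares fit of this kind, the code below fits y = 1/(a + b c) by damped Gauss-Newton, where y is assumed to be r2 after subtracting 1/n, a plays the role of the mutation constant, and Ne = b/k. The starting values come from a straight-line fit of 1/y on c; the names `fit_ne`, `a`, and `b` are hypothetical.

```python
def fit_ne(c, y, k=4.0, n_iter=50, tol=1e-12):
    """Fit y_i = 1/(a + b*c_i) by least squares; return (a, Ne) with Ne = b/k."""
    m = len(c)
    if m != len(y) or m < 3:
        raise ValueError("need at least three (c, y) pairs of equal length")
    if any(v <= 0 for v in y):
        raise ValueError("y values must be positive for the starting fit")
    # Starting values: ordinary linear regression of 1/y on c.
    z = [1.0 / v for v in y]
    c_bar, z_bar = sum(c) / m, sum(z) / m
    sxx = sum((ci - c_bar) ** 2 for ci in c)
    b = sum((ci - c_bar) * (zi - z_bar) for ci, zi in zip(c, z)) / sxx
    a = z_bar - b * c_bar

    def sse(a_, b_):
        total = 0.0
        for ci, yi in zip(c, y):
            d = a_ + b_ * ci
            if d <= 0:
                return float("inf")  # outside the valid parameter region
            total += (yi - 1.0 / d) ** 2
        return total

    current = sse(a, b)
    if current == float("inf"):
        raise ValueError("starting values give a non-positive denominator")
    for _ in range(n_iter):
        s_aa = s_ab = s_bb = g_a = g_b = 0.0
        for ci, yi in zip(c, y):
            f = 1.0 / (a + b * ci)
            res = yi - f
            ja, jb = -f * f, -ci * f * f  # derivatives of f w.r.t. a and b
            s_aa += ja * ja
            s_ab += ja * jb
            s_bb += jb * jb
            g_a += ja * res
            g_b += jb * res
        det = s_aa * s_bb - s_ab * s_ab
        if det == 0:
            break
        # Solve the 2x2 normal equations for the Gauss-Newton step.
        da = (s_bb * g_a - s_ab * g_b) / det
        db = (s_aa * g_b - s_ab * g_a) / det
        step = 1.0
        while step > 1e-8:  # halve the step until the fit does not get worse
            trial = sse(a + step * da, b + step * db)
            if trial <= current:
                break
            step /= 2.0
        else:
            break  # no acceptable step found; keep current estimates
        a, b = a + step * da, b + step * db
        converged = current - trial <= tol * (current + tol)
        current = trial
        if converged:
            break
    return a, b / k
```

Fitting on the original scale rather than on 1/y keeps pairs with small, noisy r2 values from dominating the estimate, which is one reason to refine the straight-line starting values.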
Heterozygosity and LD in a population depend on Ne over the history of the population. However, LD between SNPs a large distance apart reflects more recent Ne than LD between SNPs closer together (Hayes et al. 2003).
Estimates of the scaled recombination rate and effective population size per chromosome obtained by method 1 were compared with the number of genes and length of the chromosome (NCBI build 35; http://www.ncbi.nlm.nih.gov/mapview/map_search.cgi?taxid=9606&build=previous) using correlation and linear regression.
The frequency distribution of the SNPs ascertained by the HapMap project is different from the frequency distribution of SNPs that have been completely ascertained (Nielsen et al. 2004).
Figure 1 shows, as expected, a near linear relationship between 1/r2 and physical distance between pairs of SNPs. Linearity of the reciprocal of r2 with physical distance at small values, and a concave relationship at larger values, is clearly shown, consistent with a population that has increased over time.
For the first method of estimating c, the average estimates of effective population size for CEU are similar to those for JPT and HCB, but lower than those for YRI (Table 1). This is expected under the out-of-Africa theory of ancestral human population expansion (Templeton 2002).
Correlations between Ne and chromosome length in Mb were positive and significantly different from zero (P values ranging from 0.002 for HCB to 0.06 for YRI), as were those between Ne and number of genes (P values ranging from 0.02 for CEU to 0.05 for YRI). Correlations between Ne and gene density were not significant (P > 0.6 for all four populations). The significant correlations were driven by the low estimates of Ne for the short chromosomes 21 and 22.
The second method, which estimates recombination rates between each pair of adjacent HapMap markers from a model-free method that detects recombination hotspots from LD (Clarke and Cardon 2005
The third method used estimates of fine-scale recombination rates (rather than using physical distance as a proxy for recombination rate) from coalescent models from either Phase I HapMap (Altshuler et al. 2005
Hence, three different methods of estimating recombination rate, applied to two different samples of individuals of European descent, gave estimates of the effective population size ranging from 1901 (Method 2) to 2843 (Method 3).
From the simulation study we found that the estimation method was not biased when SNPs were simulated as if they had been completely ascertained (data not shown). We then simulated SNPs to mimic the SNP frequency distribution from the HapMap data. For this, SNPs with minor allele frequencies between 0.05 and 0.5 were ascertained with equal probability; i.e., the frequency distribution of the SNPs was uniform. The estimates of Ne obtained from mimicking the HapMap data were biased downward by
For the CEU and YRI samples we estimated effective population size as a function of time in the past. Results for the CEU data support recent dramatic population growth (Fig. 3A). This is in agreement with the likely demographic history of the ancestral population of the non-African samples: a population bottleneck, following an out-of-Africa expansion, followed by rapid growth (Watkins et al. 2001).
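To illustrate how marker distance maps to time in the past, the sketch below assumes the approximation of Hayes et al. (2003) that LD at recombination distance c (in Morgans) mainly reflects Ne about 1/(2c) generations ago. That rule, the uniform rate of 1 cM/Mb, and the 20-year generation time are assumptions of this example and are not stated in this section; the function names are hypothetical.

```python
def generations_ago(c):
    """Approximate number of generations in the past reflected by LD at
    recombination distance c (Morgans), assuming t = 1/(2c)."""
    if c <= 0:
        raise ValueError("c must be positive")
    return 1.0 / (2.0 * c)


def years_ago(distance_kb, cm_per_mb=1.0, years_per_generation=20.0):
    """Convert a physical distance between SNPs into an approximate age.

    cm_per_mb and years_per_generation are illustrative assumptions.
    """
    # 1 cM/Mb equals 1e-8 Morgans per base pair.
    c = distance_kb * 1e3 * cm_per_mb * 1e-8
    return generations_ago(c) * years_per_generation


if __name__ == "__main__":
    for kb in (5, 25, 100):
        print(kb, "kb ->", round(years_ago(kb)), "years ago")
```

Under these assumptions, the 5 kb and 100 kb limits of the marker spacing correspond to roughly 200,000 and 10,000 years ago, respectively, which matches the time window quoted in the abstract.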
Overall, the estimates of Ne appear to be much lower than the usually quoted value of 10,000 (Takahata 1993 200,000 yr ago ( 10,000 generations ago). Erlich et al. (1996) 10,000 from HLA polymorphisms. Sherry et al. (1997) 17,800 during the last one to two million yr from Alu repeats evolution. Our estimates of Ne were reasonably consistent across chromosomes and methods, and similar to estimates of Ne obtained from LD in 10 small genomic regions in a sample of 15 Italians (Frisse et al. 2001 1000 for the founding population from which modern humans derive. Other anthropological and genetic evidence has also suggested that the long-term Ne has been about three times larger in African populations than in non-African populations (Relethford and Harpending 1994
Our estimate of Ne for the X chromosome was 30%–50% larger than that for the autosomes. The X chromosome in humans has a number of unusually long haplotypes (Altshuler et al. 2005).
We determined by simulation how the approximate ascertainment of SNPs in HapMap Phase I could bias our estimates of Ne, and adjusted these accordingly. Recently, Peer et al. (2006)
In populations in which effective population size has changed over time, such as human populations, it is not meaningful to discuss effective population size without reference to a point in time (Hayes et al. 2003).
We have used a relatively small sample of individuals, combined with high-density genome-wide marker genotyping, to infer ancestral population size based upon the observed amounts of LD. Our study has shown that human effective population size estimated from entire human chromosomes is considerably lower than previously suggested, at least during a bottleneck up to
We thank W.G. Hill for helpful discussions and help in the derivation of X-linked Ne, and N. Barton, B. Weir, J. Taylor, G. McVean, T. Johnson, and A. Morris for helpful discussions. A.T. acknowledges Cancer Research UK; P.N. was supported by the Genes to Cognition Project; and P.M.V. acknowledges the UK Biotechnology and Biological Sciences Research Council, the Wellcome Trust, and the Australian National Health and Medical Research Council for funding.
8 Corresponding author.
E-mail peter.visscher{at}qimr.edu.au; fax +61-7-3362-0101. Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.6023607
Altshuler, D., Brooks, L.D., Chakravarti, A., Collins, F.S., Daly, M.J., and Donnelly, P. 2005. A haplotype map of the human genome. Nature 437: 1299–1320.
Barrett, J.C., Fry, B., Maller, J., and Daly, M.J. 2005. Haploview: Analysis and visualization of LD and haplotype maps. Bioinformatics 21: 263–265.
Clarke, G.M. and Cardon, L.R. 2005. Disentangling linkage disequilibrium and linkage from dense single-nucleotide polymorphism trio data. Genetics 171: 2085–2095.
Eller, E. 2001. Estimating relative population sizes from simulated data sets and the question of greater African effective size. Am. J. Phys. Anthropol. 116: 1–12.
Erlich, H.A., Bergstrom, T.F., Stoneking, M., and Gyllensten, U. 1996. HLA sequence polymorphism and the origin of humans. Science 274: 1552–1554.
Frisse, L., Hudson, R.R., Bartoszewicz, A., Wall, J.D., Donfack, J., and Di Rienzo, A. 2001. Gene conversion and different population histories may explain the contrast between polymorphism and linkage disequilibrium levels. Am. J. Hum. Genet. 69: 831–843.
Harpending, H.C., Sherry, S.T., Rogers, A.R., and Stoneking, M. 1993. The genetic structure of ancient human populations. Curr. Anthropol. 34: 483–496.
Hayes, B.J., Visscher, P.M., McPartlan, H.C., and Goddard, M.E. 2003. Novel multilocus measure of linkage disequilibrium to estimate past effective population size. Genome Res. 13: 635–643.
Hill, W.G. 1975. Linkage disequilibrium among multiple neutral alleles produced by mutation in finite population. Theor. Popul. Biol. 8: 117–126.
Hill, W.G. 1981. Estimation of effective population size from data on linkage disequilibrium. Genet. Res. 38: 209–216.
Hill, W.G. and Robertson, A. 1968. Linkage disequilibrium in finite populations. Theor. Appl. Genet. 38: 226–231.
Hudson, R.R. 1983. Properties of a neutral allele model with intragenic recombination. Theor. Popul. Biol. 23: 183–201.
Hudson, R.R. 2002. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18: 337–338.
The International HapMap Consortium. 2003. The International HapMap Project. Nature 426: 789–796.
Kong, X., Murphy, K., Raj, T., He, C., White, P.S., and Matise, T.C. 2004. A combined linkage-physical map of the human genome. Am. J. Hum. Genet. 75: 1143–1148.
Liu, H., Prugnolle, F., Manica, A., and Balloux, F. 2006. A geographically explicit genetic model of worldwide human-settlement history. Am. J. Hum. Genet. 79: 230–237.
Lynch, M. and Walsh, B. 1998. Genetics and analysis of quantitative traits. Sinauer Associates, Sunderland, MA.
McVean, G.A.T. 2002. A genealogical interpretation of linkage disequilibrium. Genetics 162: 987–991.
Myers, S., Bottolo, L., Freeman, C., McVean, G., and Donnelly, P. 2005. A fine-scale map of recombination rates and hotspots across the human genome. Science 310: 321–324.
Nielsen, R., Hubisz, M.J., and Clark, A.G. 2004. Reconstituting the frequency spectrum of ascertained single-nucleotide polymorphism data. Genetics 168: 2373–2382.
Peer, I., Chretien, Y.R., de Bakker, P.I.W., Barrett, J.C., Daly, M.J., and Altshuler, D.M. 2006. Biases and reconciliation in estimates of linkage disequilibrium in the human genome. Am. J. Hum. Genet. 78: 588–603.
Pritchard, J.K., Seielstad, M.T., Perez-Lezaun, A., and Feldman, M.W. 1999. Population growth of human Y chromosomes: A study of Y chromosome microsatellites. Mol. Biol. Evol. 16: 1791–1798.
Reich, D.E. and Lander, E.S. 2001. On the allelic spectrum of human disease. Trends Genet. 17: 502–510.
Reich, D.E., Cargill, M., Bolk, S., Ireland, J., Sabeti, P.C., Richter, D.J., Lavery, T., Kouyoumjian, R., Farhadian, S.F., Ward, R., et al. 2001. Linkage disequilibrium in the human genome. Nature 411: 199–204.
Relethford, J.H. and Harpending, H.C. 1994. Craniometric variation, genetic theory, and modern human origins. Am. J. Phys. Anthropol. 95: 249–270.
Relethford, J.H. and Jorde, L.B. 1999. Genetic evidence for larger African population size during recent human evolution. Am. J. Phys. Anthropol. 108: 251–260.
Rogers, A.R. and Harpending, H. 1992. Population growth makes waves in the distribution of pairwise genetic differences. Mol. Biol. Evol. 9: 552–569.
Service, S., DeYoung, J., Karayiorgou, M., Roos, J.L., Pretorious, H., Bedoya, G., Ospina, J., Ruiz-Linares, A., Macedo, A., Palha, J.A., et al. 2006. Magnitude and distribution of linkage disequilibrium in population isolates and implications for genome-wide association studies. Nat. Genet. 38: 556–560.
Sherry, S.T., Rogers, A.R., Harpending, H., Soodyall, H., Jenkins, T., and Stoneking, M. 1994. Mismatch distributions of mtDNA reveal recent human population expansions. Hum. Biol. 66: 761–775.
Sherry, S.T., Harpending, H.C., Batzer, M.A., and Stoneking, M. 1997. Alu evolution in human populations: Using the coalescent to estimate effective population size. Genetics 147: 1977–1982.
Sved, J.A. 1971. Linkage disequilibrium and homozygosity of chromosome segments in finite populations. Theor. Popul. Biol. 2: 125–141.
Takahata, N. 1993. Allelic genealogy and human evolution. Mol. Biol. Evol. 10: 2–22.
Templeton, A.R. 2002. Out of Africa again and again. Nature 416: 45–51.
Thomson, R., Pritchard, J.K., Shen, P.D., Oefner, P.J., and Feldman, M.W. 2000. Recent common ancestry of human Y chromosomes: Evidence from DNA sequence data. Proc. Natl. Acad. Sci. 97: 7360–7365.
Visscher, P.M. and Hill, W.G. 2006. Estimation of recombination rate and detection of recombination hotspots from dense single-nucleotide polymorphism trio data. Genetics 173: 2415–2417.
Watkins, W.S., Ricker, C.E., Bamshad, M.J., Carroll, M.L., Nguyen, S.V., Batzer, M.A., Harpending, H.C., Rogers, A.R., and Jorde, L.B. 2001. Patterns of ancestral human diversity: An analysis of Alu-insertion and restriction-site polymorphisms. Am. J. Hum. Genet. 68: 738–752.
Weir, B.S. and Hill, W.G. 1980. Effect of mating structure on variation in linkage disequilibrium. Genetics 95: 477–488.
Zhang, W.H., Collins, A., Gibson, J., Tapper, W.J., Hunt, S., Deloukas, P., Bentley, D.R., and Morton, N.E. 2004. Impact of population structure, effective bottleneck time, and allele frequency on linkage disequilibrium maps. Proc. Natl. Acad. Sci. 101: 18075–18080.
Zhivotovsky, L.A., Rosenberg, N.A., and Feldman, M.W. 2003. Features of evolution and expansion of modern humans, inferred from genomewide microsatellite markers. Am. J. Hum. Genet. 72: 1171–1186.
Received October 9, 2006; accepted in revised format January 17, 2007.