Genome Res. 15:1553-1565, 2005
©2005 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/05 $5.00

Letter

Genomic regions exhibiting positive selection identified from dense genotype data

1 Department of Genome Sciences, University of Washington, Seattle, Washington 98195-7730, USA
2 Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California 95064-1099, USA
The allele frequency spectrum of polymorphisms in DNA sequences can be used to test for signatures of natural selection that depart from the expected frequency spectrum under the neutral theory. We observed a significant (P = 0.001) correlation between the Tajima's D test statistic in full resequencing data and Tajima's D in a dense, genome-wide data set of genotyped polymorphisms for a set of 179 genes. Based on this, we used a sliding window analysis of Tajima's D across the human genome to identify regions putatively subject to strong, recent, selective sweeps. This survey identified seven Contiguous Regions of Tajima's D Reduction (CRTRs) in an African-descent population (AD), 23 in a European-descent population (ED), and 29 in a Chinese-descent population (XD). Only four CRTRs overlapped between populations: three between ED and XD and one between AD and ED. Full resequencing of eight genes within six CRTRs demonstrated frequency spectra inconsistent with neutral expectations for at least one gene within each CRTR. Identification of the functional polymorphism (and/or haplotype) responsible for the selective sweeps within each CRTR may provide interesting insights into the strongest selective pressures experienced by the human genome over recent evolutionary history.
According to the theory of neutral molecular evolution (Kimura 1983
A number of statistical tests have been devised that compare an observed SFS against neutral theory predictions. One of the most frequently used tests is Tajima's D (Tajima 1989
The identification of >10 million single nucleotide polymorphisms (SNPs) across the human genome, and the emergence of large-scale data sets of genotypes at a subset of these SNPs in multiple populations are leading to many insights into the patterns of sequence variation across the human genome (Sabeti et al. 2002
To assess the utility of the Perlegen data for estimating Tajima's D, we compared genes resequenced by SeattleSNPs in the same African-descent (AD) and European-descent (ED) individuals. Tajima's D was compared for all autosomal genes where at least five SNPs within 10 kb of the transcript were polymorphic in the Perlegen data set. The final data set consisted of 179 genes meeting this criterion in at least one population, with 178 genes in the AD population and 173 genes in the ED population. The mean value for Tajima's D in the Perlegen data (0.94 for AD, 1.25 for ED) was substantially higher than in the SeattleSNPs data (0.54 for AD, 0.26 for ED), as expected given an ascertainment bias toward high-frequency SNPs in the Perlegen data. A significant correlation was observed for Tajima's D between these data sets in both populations (Fig. 1A,B for AD and ED, respectively). The correlation was stronger in the ED population (R² = 0.59) than in the AD population (R² = 0.28) but was significant in both populations (P = 0.001 by Student's t-test). The observed correlation is based on a comparison of genic regions, and it is unclear how well it extrapolates to large intergenic regions, as selective pressures (and therefore site frequency spectra) could be different in such regions. However, the SeattleSNPs data consist of 6% coding sequence, 4% UTR sequence, 70% intronic sequence, and 20% flanking intergenic sequence. In the SeattleSNPs data, no significant differences were observed in diversity between intronic sequence and the flanking intergenic sequence, so the observed correlation applies at least to proximal intergenic sequences. The correlation between these data sets suggested that the Perlegen data can be used to survey the genome for regions exhibiting extreme values of Tajima's D, so we applied a sliding window analysis to all three populations genotyped by Perlegen.
The distribution of Tajima's D values for a 100-kbp sliding window is shown in Figure 2. The average windowed Tajima's D in the AD population was 1.20 (±0.73 SD) with a range of -2.41 to 4.03. In the ED population, the average Tajima's D was 1.40 (±1.01 SD) with a range of -2.80 to 4.34, while in the Chinese (XD) population the average was 1.45 (±1.13 SD) across a range of -3.11 to 4.42. These distributions were substantially skewed in all three populations, with a heavier tail to the distribution at low values.
Results from the sliding window analysis are depicted across a 50-megabase segment of chromosome 1 (chr1, 1-50,000,000) in Figure 3, and results were similar across the rest of the genome. Tracks displaying the sliding window data for each chromosome are available on the UCSC Genome Browser (Kent et al. 2002
A major assumption in the interpretation of SFS is that ascertainment of the SNPs genotyped is unbiased or at least consistently biased such that the positive shift in Tajima's D is similar across the genome. The SNPs genotyped in the Perlegen map were drawn from three sources: Perlegen's internal resequencing project (class A), dbSNP validated SNPs (class B), and dbSNP un-validated SNPs (class C). Because these three classes show different SFS, with classes A and C enriched for rare variants relative to class B, each class would be expected to show a different bias in the SFS. In theory, it is possible to correct an observed SFS for the ascertainment scheme used to select SNPs (Nielsen et al. 2004 χ² test) and class C SNPs (1.9%, P = 0.006 by χ² test).
To further assess whether SNP ascertainment bias might account for the CRTRs, we examined each CRTR independently (Table 1). The relative rarity of class C meant that fewer than five class C SNPs were expected in most CRTRs, so we merged classes A and C for this comparison. The proportion of class B SNPs ranges from 1.8% to 85.3%, with just six CRTRs >50%. In 28 CRTRs we observed a significant departure in the frequency of class B SNPs from genome-wide averages, after Bonferroni correction for 54 tests. Tellingly, 16 of these departures were toward an excess of class B, and 12 were toward an excess of the non-B classes. Thus, although class B SNPs are modestly enriched in the CRTRs on average, only a minority of CRTRs are enriched for such SNPs, with a nearly even number of CRTRs enriched for class A and C SNPs. We interpret this as evidence that SNP class (and thus SNP ascertainment bias) has not substantially biased CRTR identification.
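The per-CRTR comparison described above can be sketched as a one-degree-of-freedom χ² goodness-of-fit test with a Bonferroni threshold. This is only an illustration under stated assumptions: the function names, the significance level of 0.05, and the genome-wide class B fraction passed in by the caller are hypothetical, and the text does not specify exactly which test was applied to each CRTR. Only the count of 54 tests comes from the text.

```python
from math import erfc, sqrt


def class_b_chi2(obs_b, obs_other, genome_frac_b):
    """Chi-square test (1 df) of the class B count in one region.

    obs_b         -- observed class B SNPs in the region
    obs_other     -- observed class A + C SNPs (merged, as in the text)
    genome_frac_b -- genome-wide fraction of class B SNPs (caller-supplied;
                     a hypothetical input here, not a value from the paper)
    """
    if not 0.0 < genome_frac_b < 1.0:
        raise ValueError("genome_frac_b must lie strictly between 0 and 1")
    total = obs_b + obs_other
    exp_b = total * genome_frac_b
    exp_other = total - exp_b
    chi2 = (obs_b - exp_b) ** 2 / exp_b + (obs_other - exp_other) ** 2 / exp_other
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


def bonferroni_significant(p_value, n_tests=54, alpha=0.05):
    """True if p_value passes a Bonferroni-corrected threshold (alpha assumed)."""
    return p_value < alpha / n_tests
```

The direction of a significant departure (excess of class B versus excess of the other classes) can then be read from whether the observed class B count is above or below its expectation.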
To further examine the possible effects of SNP class SFS bias in identification of CRTRs, we reanalyzed the genome by using only class A SNPs, which comprise It is also possible that low recombination rates biased our survey toward the identification of low diversity regions that coincide with low recombination rates. The average recombination rate for regions flanking the CRTRs was 0.67 cM/Mb (Table 1), which is significantly lower than the genome-wide average of 1.13 cM/Mb, but only four of the regions fell into recombination deserts with an estimated recombination rate of 0.0 cM/Mb. However, the observed correlation between CRTRs and low recombination regions is also expected if the CRTRs are the product of selective pressure, because the region that is swept along with an advantageous allele will be larger in regions of low recombination. Thus, there does appear to be a modest bias in the data toward identification of CRTRs in regions with low recombination rates.
To assess how well CRTRs in the Perlegen data predict Tajima's D in resequencing data, we selected eight targets from six population-specific CRTRs for resequencing (Table 2). In each CRTR, targets were chosen for one or more of the following reasons: dramatic allele frequency differences between populations in the Perlegen data (EDAR, CLSPN), central position within the CRTR (EDAR, CLSPN, SCMH1, FLJ23878), important gene function (GCG), or the target was a spliced EST in a CRTR without any known genes (AW183861, BX115137). CTPS and FLJ23878 were included because these genes, in combination with SCMH1, accounted for all of the known genes within a single CRTR in the AD population. Values of Tajima's D below -2 for resequencing data are significant under the simplest neutral models (Tajima 1989
Although we demonstrate that CRTRs identified from the Perlegen data predict extreme Tajima's D in resequencing data, dramatic departures from the expected SFS might be expected to occur at random in the genome, so we investigated whether the underlying nucleotide diversity of these regions was also consistent with selective pressure. Tajima's D detects departures from the expected SFS under neutral assumptions by comparing two measures of nucleotide diversity, π and θ. The absolute values of these statistics can also provide evidence for selective pressure, as both are reduced as an advantageous allele nears fixation. Estimated from the 179 SeattleSNPs genes previously compared against the Perlegen data, the average π was 9.02 (±3.80) × 10⁻⁴ in AD and 7.17 (±4.00) × 10⁻⁴ in ED populations, and the average θ was 10.44 (±3.14) × 10⁻⁴ in AD and 6.48 (±2.68) × 10⁻⁴ in ED populations. All of the CRTR genes resequenced show trends toward reduced π in the appropriate population, and most also show reduced θ (Table 2), which is consistent with selection, although low absolute nucleotide diversity might also be attributable to reduced mutation rates in these regions.
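For reference, the two diversity measures compared by Tajima's D are conventionally written as the mean pairwise difference (π) and Watterson's estimator (θ_W). The block below restates the standard definitions from Tajima (1989). The notation is an assumption for illustration: n is the number of sampled chromosomes, S the number of segregating sites, and d_ij the number of differences between chromosomes i and j. The paper's per-site values would additionally be divided by the length of the sequenced region.

```latex
\hat{\pi} = \binom{n}{2}^{-1} \sum_{i<j} d_{ij},
\qquad
\hat{\theta}_W = \frac{S}{a_1},
\quad a_1 = \sum_{i=1}^{n-1} \frac{1}{i},
\qquad
D = \frac{\hat{\pi} - \hat{\theta}_W}
         {\sqrt{\widehat{\mathrm{Var}}\left(\hat{\pi} - \hat{\theta}_W\right)}}
```

An excess of rare variants, as expected after a recent sweep, lowers π relative to θ_W and drives D negative.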
We further examined the possibility that positive selective pressure might account for the resequenced CRTRs by calculating Fay and Wu's H statistic for each region (Fay and Wu 2000
Examination of the resequenced region from CLSPN (Fig. 4) revealed a coding SNP (Ser525Asn, position 10710 in Fig. 4B, dbSNP rs7537203) with extremely low serine allele frequency in the ED (4% frequency) population and extremely high serine allele frequency in the XD population (83%). This coding polymorphism was also typed in phase I of the HapMap, with similarly extreme differences in allele frequency between Asian and European populations. Close inspection of the CRTR spanning CLSPN in the ED population identified an extreme recombination hotspot conserved between all three populations at the telomeric end of the CRTR (Fig. 5), with a relative recombination rate >1000-fold above background inferred by LDhat (McVean et al. 2004 In contrast to the CLSPN CRTR, complete resequencing of the coding regions from CTP synthase (CTPS), FLJ23878 (a predicted gene with supporting spliced EST evidence), and Sex Comb on midleg homolog 1 (SCMH1) failed to identify a single coding variant that was significantly enriched in the AD population. Also in contrast to the CLSPN CRTR, very little recombination was observed across the 300 kilobases containing these three genes in any of the three populations (|D'| = 1 in >95% of pairwise comparisons), so this region is likely to be a recombination coldspot. However, the small amount of observed recombination in the AD population revealed a clear trend across the CRTR, with the lowest Tajima's D region spanning SCMH1. Thus, if a selectively advantageous allele exists in this CRTR, it may lie within the SCMH1 transcript region, but it does not appear to be a coding polymorphism.
EDAR was selected for resequencing because it showed reduced diversity in the XD population, as well as strikingly high levels of Fst between ED and XD populations at a few SNPs. This was confirmed by resequencing, where four SNPs were observed with an allele frequency difference of >85% between populations (SNPs 173, 1663, 2531, 93981; details available at the SeattleSNPs Web site), and three other SNPs showed a difference >50% in allele frequency (SNPs 429, 1158, and 62347). Among the four SNPs with the largest allele frequency differences, the 93891 SNP changes an amino acid, but the change is quite conservative (Val370Ala): It substitutes one small, nonpolar amino acid for another and is predicted to have "benign" effects by Polyphen (Sunyaev et al. 2001
The majority of identified CRTRs spanned more than one known gene, although 20% of the CRTRs (11 of 55 CRTRs) did not contain a "Known Gene" in the UCSC Genome Browser track. For example, the CRTR on chromosome 11 in the XD population (chr11, 37,820,000-38,290,000) did not contain any known genes or RefSeq entries and had only one spliced EST with multiple exemplars (GenBank BX115137 at chr11, 37,916,727-37,932,789). Resequencing the region spanning BX115137 confirmed the significantly low diversity in this region (Tajima's D = -2.60 in the XD population) (Table 2), thereby confirming findings from the Perlegen data set. This CRTR might reflect direct selective pressure upon this EST, or it might reflect selection upon a long-range regulatory element affecting expression of genes outside of the CRTR in the XD population. It is worth noting that two of the closest genes to this CRTR are RAG1 and RAG2 (chr11, 36,546,150-36,557,871, and chr11, 36,570,070-36,576,362, respectively), which are essential for adaptive immunity through the rearrangement of T Cell Receptor genes (Fugmann 2001
Identifying regions of the human genome that have experienced substantial selective pressure can provide insights into the location of functionally important polymorphisms and may help prioritize targets for association mapping (Sabeti et al. 2002
The observed distributions of the windowed Tajima's D values were remarkably similar in the ED and XD populations (Fig. 3), suggesting not only that the demographic histories of these populations are similar, as has previously been suggested (Yu et al. 2001
Data from these three major populations showed significant differences in the quantity of CRTRs between populations. Overall, relatively few CRTRs were observed in the AD population, and the observed CRTRs were generally smaller than were those in the ED and XD populations. This could be due to less dramatic selective pressure on African populations in recent evolutionary history, but we consider it unlikely that selective pressure from pathogens or diet is substantially weaker in this population. If anything, the pathogen load might be expected to be highest in regions where humans have lived the longest, although the degree of mortality from pathogens may have been attenuated. Alternatively, admixed European chromosomes might serve to obscure Africa-specific selective sweeps, but this possibility also seems unlikely because Fst is low and few if any polymorphisms have fixated between these populations. Thus, admixture between European and African populations tends to reduce Tajima's D by increasing the number of relatively rare SNPs, and admixture from the European population should actually have enhanced detection of AD-specific CRTRs. Demographic parameters such as a population bottleneck in the Eurasian populations (Marth et al. 2004
Genes within the CRTR regions may provide important targets for genotype/phenotype studies. For example, CYP3A4 and CYP3A5 play a central role in the metabolism of some prescribed drugs, lie within a CRTR in the ED population, and have been shown to have significantly low Tajima's D in European samples (Thompson et al. 2004
Resequencing of candidate genes within a number of the CRTRs provides further support for this hypothesis: In every CRTR selected for resequencing, at least one gene with a dramatic departure from the expected distribution of Tajima's D under neutrality was observed (Table 2), and in most CRTRs, a significant departure from the expected distribution of Fay and Wu's H was also observed (Table 3). In addition to theoretical evidence, the genes resequenced in the CRTRs also fell within the bottom 5% of the empirical distribution of Tajima's D in >170 genes resequenced by SeattleSNPs (Fig. 1). For example, the resequencing-based Tajima's D for SCMH1 was the lowest Tajima's D value that we have ever observed in the AD population, compared with data from 179 resequenced genes. Furthermore, the Tajima's D for genes selected from ED CRTRs was similar to the Tajima's D for genes previously demonstrated to be robustly incompatible with neutrality in several studies (Akey et al. 2004
Within each CRTR, it is apparent that a single common haplotype has recently increased dramatically in frequency, at the expense of all other haplotypes within the CRTR. However, it is not yet clear whether the fitness advantage is attributable to a genotype at a single SNP or a haplotype of multiple SNPs. Haplotype effects are more plausible when a single transcript spans the majority of a CRTR (e.g., PHKB on chromosome 16 in XD) or across regions containing groups of functionally related genes (e.g., the Olfactory Receptor gene cluster contained within the AD CRTR on chromosome 11). However, the majority of CRTRs contain multiple genes without clearly related gene functions, so we have not pursued analyses of gene function within the CRTRs, because it is likely that many if not most of the genes within the CRTRs simply represent hitchhiking events where the advantageous allele within a single gene swept the neighboring genes along with it as it increased in frequency (Fay and Wu 2000 The pattern of concordance for CRTRs between ED and XD populations was also interesting: An overlap of three CRTRs between these populations is quite striking, given that the CRTRs comprise <1% of the genome. Given that Tajima's D values from the resequencing data are in the range of genes previously reported to be inconsistent with neutrality under a range of demographic parameters, we believe that the shared CRTRs probably represent shared selective pressures between these populations. However, not all shared CRTRs necessarily represent a single selective sweep in multiple populations. For example, Tajima's D was significantly low at EDAR in both XD and ED populations, but the genomic extent of the CRTR is substantially greater in XD than ED populations. 
This could represent sweeps that occurred at different times historically, but the extreme Fst at a series of polymorphisms in EDAR is consistent with either a divergent sweep with one haplotype favored in ED and a different haplotype favored in XD, or parallel sweeps favoring an allele shared by both haplotypes but not by other African haplotypes (e.g., site 96563 in the 3' UTR, rs1478517). No CRTRs were observed to be shared between all three populations, but this would be consistent with the ascertainment bias toward high-frequency SNPs: If no high-frequency SNPs exist in any of the three populations, then no SNPs were available for use in the Perlegen data. Therefore, although global CRTRs were not observed, such regions may be present as large regions without genotype data in the Perlegen data set. Considering each of the regions resequenced, identification of the specific SNP or SNPs conferring a selective advantage is not trivial. For example, although the patterns of SFS suggest that SCMH1 is likely to harbor the advantageous allele responsible for the selective sweep that created the CRTR in the AD population, no coding polymorphism was identified in the AD population with significantly enriched allele frequency in either SCMH1 or the other two genes. Given that we resequenced all of the known coding regions in this CRTR, if a polymorphism within SCMH1 drove the sweep, then the function of the polymorphism was probably regulatory rather than structural. In contrast, the CLSPN and EDAR resequencing data identified interesting candidate cSNPs, and selective pressure on these cSNPs could conceivably account for the EDAR and CLSPN CRTRs. More extensive resequencing within each CRTR is required to determine whether other candidate SNPs exist in neighboring genes. 
In conclusion, the availability of adequately dense genotyping data sets clearly facilitates the identification of regions of the human genome with unusual SFS, which may have been subjected to strong positive selective pressure in the recent past. Current data appear to be adequate to identify such regions in ED and XD populations, but denser data will be necessary for analysis of AD populations, probably due to the larger effective population size of this population. Detection of regions subject to balancing selection (e.g., HLA) or with less complete selective sweeps (e.g., FY) will probably require a substantially denser data set than is currently available. Although most CRTRs span multiple genes, within each CRTR the selective sweep favored only one haplotype at the expense of all others, so a single selectively advantageous polymorphism in a single gene could conceivably account for each CRTR, with the reduced diversity in flanking regions representing a hitchhiking event. Dissection of the underlying functional variant (or variants) within each CRTR may require comprehensive resequencing within the CRTR to identify candidate functional variation, but where it is feasible, functional analysis of a priori functional variants (e.g., the CLSPN Ser525Asn cSNP) should substantially accelerate this process.
Samples
Twenty-four individuals from each of three populations were resequenced: 24 African American individuals from the Coriell HD100AA diversity panel (population AD), 24 CEPH individuals (population ED), and 24 Chinese Americans from the Coriell HD100A diversity panel (population XD). All ED individuals overlap with the Perlegen European panel (dbSNP population 1371); all but one of the AD individuals overlap with the Perlegen African American panel (dbSNP population 1372); and all XD individuals overlap with the Perlegen Chinese panel (dbSNP population 1373). Coriell accession numbers are as follows: AD population, NA17101-NA17116, NA17133-NA17140; ED population, NA06990, NA07019, NA07348, NA07349, NA10830, NA10831, NA10842, NA10843, NA10844, NA10845, NA10848, NA10850, NA10851, NA10852, NA10853, NA10854, NA10857, NA10858, NA10860, NA10861, NA12547, NA12548, NA12560, NA17201; and XD population, NA17733-NA17747, NA17749, NA17752-NA17757, NA17759, NA17761.
Perlegen data
Sequencing analysis
Nucleotide diversity analysis
For genic comparisons between SeattleSNPs data and Perlegen data, Tajima's D in the SeattleSNPs data was calculated for all observed polymorphic sites. The median transcript size in the SeattleSNPs data set is 14,649 bp, and on average, an additional 3000 bp of flanking sequence was also resequenced, for a median analyzed region of 17.5 kbp. Given the density of the Perlegen map, the median number of polymorphic SNPs in the resequenced region was only six in the ED population and seven in the AD population, a rather small number of polymorphisms for Tajima's D estimation. One hundred nineteen out of 179 genes analyzed in the AD population had five or more polymorphic Perlegen SNPs within the resequenced region, and in the ED population, 107 genes had five or more polymorphic Perlegen SNPs. Significant linkage disequilibrium routinely extends 10 kb, even in the AD population, so patterns of nucleotide diversity should be conserved over similar distances. We extended the region analyzed from the Perlegen data to include 10 kb upstream and 10 kb downstream of the transcript, under the expectation that this would increase the number of sites per gene and therefore the accuracy of the Tajima's D estimate. Expanding the region raised the median number of polymorphic sites per gene to 16 in ED and 19 in AD. As expected, the larger number of sites per gene improved the correlation between the SeattleSNPs and Perlegen data substantially. In the AD population, R² = 0.29 in the extended region versus R² = 0.04 in the transcript for the 119 genes with five or more polymorphic SNPs in the transcript, and in the ED population, R² = 0.66 in the extended region versus R² = 0.15 in the transcript for the 107 genes with five or more polymorphic SNPs in the transcript. Thus, for the genic comparison in Figure 1, Tajima's D in the Perlegen data was calculated on the basis of all observed polymorphic sites within 10 kb of the longest reported transcript in Entrez Gene.
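As an illustration of the statistic computed for each gene region, the following is a minimal sketch of Tajima's D from per-site allele counts, using the standard constants from Tajima (1989). It is not the authors' code. The function name and the input layout (one allele count per biallelic site, with the same number of chromosomes sampled at every site) are assumptions made for the example.

```python
from math import sqrt


def tajimas_d(allele_counts, n):
    """Tajima's D for one region.

    allele_counts -- count of one allele at each biallelic site
    n             -- number of sampled chromosomes (assumed equal at all sites)
    Returns None when the statistic is undefined (no segregating sites,
    or too few chromosomes for a positive variance).
    """
    # Keep only sites that are polymorphic in this sample.
    seg = [c for c in allele_counts if 0 < c < n]
    s = len(seg)
    if s == 0 or n < 4:
        return None
    # pi: expected number of pairwise differences, summed over sites.
    pi = sum(2.0 * c * (n - c) / (n * (n - 1)) for c in seg)
    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * s + e2 * s * (s - 1)
    if variance <= 0:
        return None
    # Numerator: pi minus Watterson's theta (S / a1).
    return (pi - s / a1) / sqrt(variance)
```

With 24 diploid individuals per population, n would be 48. A region containing only singletons gives a negative value, and a region containing only intermediate-frequency SNPs gives a positive value, which is the direction of the ascertainment shift discussed for the Perlegen data.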
Within each population, only genes with five or more polymorphic SNPs in the Perlegen data were included in the comparison of data sets, which yielded 178 genes in the AD population and 173 genes in the ED population. For the windowed analysis of the genome, Tajima's D was calculated independently in each population. Sliding windows of 100 kb were analyzed across all autosomal regions in the Perlegen data, stepping by 10 kb. Thus, the first window evaluated on chromosome 1 was genome coordinates chr1, 1-100,000; the second window was genome coordinates chr1, 10,001-110,000; and so forth. Because adjacent windows overlap, in the genome browser track, the Tajima's D is reported for each window using the coordinates of the central 10 kb. Thus, the observed Tajima's D for window chr1, 1-100,000, is reported at chr1, 45,001-55,000. These data have been made available as a track in the UCSC genome browser (http://genome.ucsc.edu/).
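The window layout described above can be sketched as follows. The code enumerates 100-kb windows stepping by 10 kb with 1-based inclusive coordinates and derives the central 10-kb interval used for reporting. The function names are hypothetical, and skipping a trailing partial window at the chromosome end is an assumption, since the text does not say how chromosome ends were handled.

```python
from bisect import bisect_left, bisect_right


def sliding_windows(chrom_length, window=100_000, step=10_000):
    """Yield (start, end, report_start, report_end), 1-based inclusive.

    Only full-length windows are produced (an assumption of this sketch).
    """
    start = 1
    while start + window - 1 <= chrom_length:
        end = start + window - 1
        # Central step-sized segment under which the window's value is reported.
        report_start = start + (window - step) // 2
        report_end = report_start + step - 1
        yield start, end, report_start, report_end
        start += step


def snps_in_window(sorted_positions, start, end):
    """Return the index range [lo, hi) of SNP positions inside [start, end]."""
    lo = bisect_left(sorted_positions, start)
    hi = bisect_right(sorted_positions, end)
    return lo, hi
```

For the first window this gives (1, 100000, 45001, 55000), matching the coordinates quoted in the text. The slice returned by `snps_in_window` can be passed to whichever per-window statistic is being computed.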
The empirically determined distribution of Tajima's D within the sliding windows was used to identify CRTRs, defined as a region of
Fay and Wu's H analysis
Recombination rate analysis
This work was supported by a Program for Genomic Applications grant from the National Heart, Lung, and Blood Institute (HL66682 and HL66642 to D.N. and M.R.). D.T. was supported by grants from the National Human Genome Research Institute (IP41HG02371 and HG02238 to David Haussler). We thank Dana Crawford, Alex Reiner, and Eric Torskey for comments on the manuscript, as well as the entire SeattleSNPs resequencing team for their extraordinary efforts on this project.
[Supplemental material is available online at www.genome.org.] Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.4326505. Freely available online through the Genome Research Immediate Open Access option.
3 Corresponding author.
Akey, J.M., Eberle, M.A., Rieder, M.J., Carlson, C.S., Shriver, M.D., Nickerson, D.A., and Kruglyak, L. 2004. Population history and natural selection shape patterns of genetic variation in 132 genes. PLoS Biol. 2: e286.
Bersaglieri, T., Sabeti, P.C., Patterson, N., Vanderploeg, T., Schaffner, S.F., Drake, J.A., Rhodes, M., Reich, D.E., and Hirschhorn, J.N. 2004. Genetic signatures of strong recent positive selection at the lactase gene. Am. J. Hum. Genet. 74: 1111-1120.
Carlson, C.S., Eberle, M.A., Rieder, M.J., Smith, J.D., Kruglyak, L., and Nickerson, D.A. 2003. Additional SNPs and linkage-disequilibrium analyses are necessary for whole-genome association studies in humans. Nat. Genet. 33: 518-521.
Carlson, C.S., Eberle, M.A., Rieder, M.J., Yi, Q., Kruglyak, L., and Nickerson, D.A. 2004. Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. Am. J. Hum. Genet. 74: 106-120.
Clark, A.G., Glanowski, S., Nielsen, R., Thomas, P., Kejariwal, A., Todd, M.J., Tanenbaum, D.M., Civello, D., Lu, F., Murphy, B., et al. 2003a. Positive selection in the human genome inferred from human-chimp-mouse orthologous gene alignments. Cold Spring Harb. Symp. Quant. Biol. 68: 471-477.
Clark, A.G., Glanowski, S., Nielsen, R., Thomas, P., Kejariwal, A., Todd, M.J., Tanenbaum, D.M., Civello, D., Lu, F., Murphy, B., et al. 2003b. Inferring nonneutral evolution from human-chimp-mouse orthologous gene trios. Science 302: 1960-1963.
Ewens, W.J. 1972. The sampling theory of selectively neutral alleles. Theor. Popul. Biol. 3: 87-112.
Ewens, W.J. 1979. Mathematical population genetics. Springer-Verlag, New York.
Ewing, B., Hillier, L., Wendl, M.C., and Green, P. 1998. Base-calling of automated sequencer traces using phred, I: Accuracy assessment. Genome Res. 8: 175-185.
Fay, J.C. and Wu, C.I. 2000. Hitchhiking under positive Darwinian selection. Genetics 155: 1405-1413.
Fu, Y.X. and Li, W.H. 1993. Statistical tests of neutrality of mutations. Genetics 133: 693-709.
Fugmann, S.D. 2001. RAG1 and RAG2 in V(D)J recombination and transposition. Immunol. Res. 23: 23-39.
Fullerton, S.M., Clark, A.G., Weiss, K.M., Taylor, S.L., Stengard, J.H., Salomaa, V., Boerwinkle, E., and Nickerson, D.A. 2002. Sequence polymorphism at the human apolipoprotein AII gene (APOA2): Unexpected deficit of variation in an African-American sample. Hum. Genet. 111: 75-87.
Gibbs, R.A., Belmont, J.W., Hardenbol, P., Willis, T.D., Yu, F., Yang, H., Ch'ang, L.Y., Huang, W., Liu, B., Shen, Y., et al. 2003. The International HapMap Project. Nature 426: 789-796.
Gordon, D., Abajian, C., and Green, P. 1998. Consed: A graphical tool for sequence finishing. Genome Res. 8: 195-202.
Hamblin, M.T. and Di Rienzo, A. 2000. Detection of the signature of natural selection in humans: Evidence from the Duffy blood group locus. Am. J. Hum. Genet. 66: 1669-1679.
Hamblin, M.T., Thompson, E.E., and Di Rienzo, A. 2002. Complex signatures of natural selection at the Duffy blood group locus. Am. J. Hum. Genet. 70: 369-383.
Harpending, H. and Rogers, A. 2000. Genetic perspectives on human origins and differentiation. Annu. Rev. Genomics Hum. Genet. 1: 361-385.
Hinds, D.A., Stuve, L.L., Nilsen, G.B., Halperin, E., Eskin, E., Ballinger, D.G., Frazer, K.A., and Cox, D.R. 2005. Whole-genome patterns of common DNA variation in three human populations. Science 307: 1072-1079.
Hudson, R.R. 2002. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18: 337-338.
Hughes, A.L. and Yeager, M. 1998. Natural selection and the evolutionary history of major histocompatibility complex loci. Front. Biosci. 3: d509-d516.
Kent, W.J., Sugnet, C.W., Furey, T.S., Roskin, K.M., Pringle, T.H., Zahler, A.M., and Haussler, D. 2002. The human genome browser at UCSC. Genome Res. 12: 996-1006.
Kimura, M. 1983. The neutral theory of molecular evolution. Cambridge University Press, Cambridge, UK.
Kong, A., Gudbjartsson, D.F., Sainz, J., Jonsdottir, G.M., Gudjonsson, S.A., Richardsson, B., Sigurdardottir, S., Barnard, J., Hallbeck, B., Masson, G., et al. 2002. A high-resolution recombination map of the human genome. Nat. Genet. 31: 241-247.
Libert, F., Cochaux, P., Beckman, G., Samson, M., Aksenova, M., Cao, A., Czeizel, A., Claustres, M., de la Rua, C., Ferrari, M., et al. 1998. The deltaccr5 mutation conferring protection against HIV-1 in Caucasian populations has a single and recent origin in Northeastern Europe. Hum. Mol. Genet. 7: 399-406.
Livingston, R.J., von Niederhausern, A., Jegga, A.G., Crawford, D.C., Carlson, C.S., Rieder, M.J., Gowrisankar, S., Aronow, B.J., Weiss, R.B., and Nickerson, D.A. 2004. Pattern of sequence variation across 213 environmental response genes. Genome Res. 14: 1821-1831.
Marth, G.T., Czabarka, E., Murvai, J., and Sherry, S.T. 2004. The allele frequency spectrum in genome-wide human variation data reveals signals of differential demographic history in three large world populations. Genetics 166: 351-372.
McVean, G.A., Myers, S.R., Hunt, S., Deloukas, P., Bentley, D.R., and Donnelly, P. 2004. The fine-scale structure of recombination rate variation in the human genome. Science 304: 581-584.
Nickerson, D.A., Tobe, V.O., and Taylor, S.L. 1997. PolyPhred: Automating the detection and genotyping of single nucleotide substitutions using fluorescence-based resequencing. Nucleic Acids Res. 25: 2745-2751.
Nielsen, R., Hubisz, M.J., and Clark, A.G. 2004. Reconstituting the frequency spectrum of ascertained single-nucleotide polymorphism data. Genetics 168: 2373-2382.
Nielsen, R., Bustamante, C., Clark, A.G., Glanowski, S., Sackton, T.B., Hubisz, M.J., Fledel-Alon, A., Tanenbaum, D.M., Civello, D., White, T.J., et al. 2005. A scan for positively selected genes in the genomes of humans and chimpanzees. PLoS Biol. 3: e170.
Rieder, M.J., Reiner, A.P., Gage, B.F., Nickerson, D.A., Eby, C.S., McLeod, H.L., Blough, D.K., Thummel, K.E., Veenstra, D.L., and Rettie, A.E. 2005. Effect of VKORC1 haplotypes on transcriptional regulation and warfarin dose. N. Engl. J. Med. 352: 2285-2293.
Sabeti, P.C., Reich, D.E., Higgins, J.M., Levine, H.Z., Richter, D.J., Schaffner, S.F., Gabriel, S.B., Platko, J.V., Patterson, N.J., McDonald, G.J., et al. 2002. Detecting recent positive selection in the human genome from haplotype structure. Nature 419: 832-837.
Seltsam, A., Hallensleben, M., Kollmann, A., and Blasczyk, R. 2003. The nature of diversity and diversification at the ABO locus. Blood 102: 3035-3042.
Smith, J.M. and Haigh, J. 1974. The hitch-hiking effect of a favourable gene. Genet. Res. 23: 23-35.
Stajich, J.E. and Hahn, M.W. 2005. Disentangling the effects of demography and selection in human history. Mol. Biol. Evol. 22: 63-73.
Straus, D.S. and Taylor, C.E. 1981. Hitchhiking and linkage disequilibrium between hemoglobin S and nearby restr