Genome Research


Vol. 9, Issue 9, 844-852, September 1999

LETTER
Effects of Worldwide Population Subdivision on ALDH2 Linkage Disequilibrium

Raymond J. Peterson,1,2,3 David Goldman,1 and Jeffrey C. Long1

1 Laboratory of Neurogenetics, National Institute on Alcohol Abuse and Alcoholism, National Institutes of Health, Bethesda, Maryland 20892-8110 USA; 2 Department of Anthropology, Pennsylvania State University, University Park, Pennsylvania 16802 USA

    ABSTRACT

The effect of human population subdivision on linkage disequilibrium has previously been studied for unlinked genes. However, no study has focused on closely linked polymorphisms or formally partitioned linkage disequilibrium within and among worldwide populations. With an emphasis on population subdivision, the goal of this paper is to investigate the causes of linkage disequilibrium in ALDH2, the gene that encodes aldehyde dehydrogenase 2. Haplotypes for 756 people from 17 populations across five continents were estimated by maximum-likelihood from genotypes at six closely linked ALDH2 nucleotide substitutions. Linkage disequilibrium was partitioned into three components: within populations, among populations within continents, and among continents. It was found that population subdivision among continents had a larger and more disparate effect on linkage disequilibrium than subdivision among local populations. Further, linkage disequilibrium did not increase with population divergence as predicted by a simple model. Rather, the patterns of linkage disequilibrium were complicated because of the interplay of a near absence of recombination, the linkage disequilibrium that existed prior to the divergence of modern humans, subsequent mutation, population subdivision, random genetic drift, and perhaps natural selection. These results suggest that simple models may not well predict patterns of linkage disequilibrium in human populations.

    INTRODUCTION

Linkage disequilibrium is the nonrandom association of alleles at different loci. In an ideal population at equilibrium, linkage disequilibrium is predicted to approach zero at a rate dependent on the recombination fraction. However, linkage disequilibrium can be generated by genetic drift (Hill and Robertson 1968), population subdivision (Nei and Li 1973), natural selection (Lewontin 1964), and mutation (Ohta 1982a,b). Because of this, it is not surprising that complicated patterns of linkage disequilibrium are observed in human populations (Jorde et al. 1994; Lewontin 1995; Clark et al. 1998).

Despite the complexity of observed patterns, several expectations have emerged. One is that linkage disequilibrium is expected to peak near a disease gene when the disease allele is rare (Ajioka et al. 1997). Application of this principle led to the positional cloning of the genes for cystic fibrosis (Kerem et al. 1989) and diastrophic dysplasia (Hästbacka et al. 1992, 1994). Another expectation is that linkage disequilibrium between a frequent disease allele and alleles at marker loci may be best preserved in a small, constant-sized population (Laan and Pääbo 1997, 1998; Terwilliger et al. 1998). However, recent theoretical work challenges this view for frequent alleles (Lonjou et al. 1999). With respect to population subdivision, while linkage disequilibrium is expected to vary among subdivisions of finite size, the average among subdivisions is expected to be zero (Hill and Robertson 1968). Finally, the variance of linkage disequilibrium is expected to increase with population subdivision and to decrease with migration (Ohta 1982a).

For pairs of unlinked genes the effect of population subdivision on linkage disequilibrium has been studied in the Tecumseh, Michigan population (Sinnock and Sing 1972) and within South American Indian villages (Smouse and Neel 1977; Smouse et al. 1983). In both studies an excess number of statistically significant values were attributed to the populations being recently founded by migrants from source populations that differed in allele frequency. In addition, Smouse and colleagues (Smouse and Neel 1977; Smouse et al. 1983) found that the effect of population subdivision on linkage disequilibrium was greater among clusters of villages than among local villages.

For pairs of closely linked polymorphisms, three studies have examined linkage disequilibrium in worldwide samples. Castiglione et al. (1995) investigated alleles at a dinucleotide short tandem repeat polymorphism (STRP) and two restriction site polymorphisms (RSPs) in DRD2. Tishkoff et al. (1996, 1998) examined alleles at a pentanucleotide STRP and an Alu deletion polymorphism in CD4, and a trinucleotide STRP, an Alu deletion polymorphism, and two RSPs in DM. All three studies found that African populations had many haplotypes and low levels of linkage disequilibrium. In contrast, nonAfrican populations had a subset of the African haplotypes and almost complete linkage disequilibrium. These results were attributed to a founder event at the time modern humans emigrated from Africa.

The goal of this paper is to investigate the causes of worldwide linkage disequilibrium in ALDH2, the gene that encodes aldehyde dehydrogenase 2. ALDH2 is located on chromosome 12q24.2 (Raghunatan et al. 1988) and spans 44 kb (Hsu et al. 1988). Haplotypes were estimated from alleles at six biallelic sites within ALDH2 (Fig. 1) that were genotyped in 756 people from 17 populations across five continents (Peterson et al. 1999). ALDH2 has a dominant deficiency allele that is frequent in, but private to, Asia (Yoshida et al. 1984). The deficiency allele is of interest because natural selection in the form of conferring resistance to parasite infection may have preserved this allele in Asia (Ikuta et al. 1986; Goldman and Enoch 1990; R.J. Peterson, D. Goldman, and J.C. Long, in prep.).


Figure 1   ALDH2 genomic structure and variable sites. Solid segments are exons; open segments are introns. Numbers indicate the variable sites that were genotyped in the worldwide survey. () Nucleotide substitutions.

    RESULTS

Allele and Haplotype Frequency

The allele frequencies at each site and in each population are shown in Table 1. Examination of the multisite homozygotes and single-site heterozygotes yielded seven directly observed haplotype states. The maximum-likelihood frequency estimates of these haplotypes and their jackknife standard errors are tabulated in Table 2. Below, haplotype states are given within brackets. A 1 represents the reference allele, and 2 represents the variant allele. The alleles are ordered by site where the sites are in the order 1, 2, 3, 5, 6, and 12. For brevity, each haplotype state is designated by a number and the letter H. The site and haplotype numbers are from Peterson et al. (1999).

                              
Table 1.   Frequency of the Variant Allele (×1000) at Six Sites in 17 Worldwide Populations

                              
Table 2.   Estimated Frequency of Unique ALDH2 Haplotypes (×1000) and Jackknife SE in 17 Worldwide Populations

Three haplotypes had worldwide distribution: H1 [111111], H2 [211111], and H3 [122121] (Fig. 2). Of note, the frequencies of H1 and H2 were nearly reversed in the African Biaka and in Europeans and the variant alleles at sites 2, 3, and 6 usually co-occurred. Although H4 [111212] was private to Asia, it attained a frequency of 25% in the Chinese, Taiwanese of Chinese descent, and Japanese. H4 carried the deficiency allele as well as the usually co-occurring variant at site 5. The high frequency of H4 in Asia appears to have come about largely at the expense of H1. The combined frequency of H1 and H4 in Asia is 67.9%, almost identical to the 66.7% frequency of H1 in the African Biaka. The remaining haplotypes were observed in single copy only: H6 [111121] in the African Biaka, H8 [111211] in the Chinese and H9 [111112] in the Japanese.


Figure 2   Allele and haplotype frequencies. (A) Frequency of the variant (minor) allele at each site in each population. The alleles are colored to match the corresponding haplotypes depicted in B. The order of populations is the same as in B. (B) Haplotype frequencies in each population. H- = H6 + H8 + H9.

Population Divergence

The variants at the six sites naturally formed three groups of sites: site 1; sites 2, 3, and 6; and sites 5 and 12. Sites within each group yielded nearly identical allele frequency distributions and fixation indices. Fixation indices (Wright 1978), or F-statistics, measure population divergence as the among group proportion of the total allele frequency variance. To avoid redundancy the F-statistics are not reported individually but rather for each group of sites. F-statistics were calculated for local populations relative to continental average, continental average to worldwide average, and local populations to worldwide average. In the following, S indexes local populations, C indexes continental averages, and T indexes the worldwide average.

At site 1, allele frequency differences among local populations resulted in an FSC of 8% (Table 3). Allele frequency differences among continents resulted in an FCT of 37%. The divergence among all sub-populations (FST) was 42%. Because of the almost one-to-one correspondences between haplotype frequency and variant allele frequency, these F-statistics can also be explained in terms of the haplotype frequency variation. At site 1, the F-statistics largely reflect the H1 and H2 frequency reversal in the African Biaka and the Europeans.

                              
Table 3.   Fixation Indices and Jackknife Standard Errors

Sites 2, 3, and 6 contrasted the frequency of H3 with H1, H2, and H4. Reflecting the low frequency variation of H3 among populations FSC was 3%, FCT was 0%, and FST was 3%. Sites 5 and 12 contrasted H4 with H1, H2, and H3. Here, FSC was 12%, FCT was 17%, and FST was 27%. These latter F-statistics were due entirely to the restriction of H4 and the deficiency allele to Asia. Treating the haplotypes as multiple alleles at a single locus, the haplotypic FSC was 5.9%, FCT was 24.4%, and FST was 28.8%. Here too the low frequency variation of H3 among populations contributed little to these values.
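The three indices reported for each group of sites are tied together by the relationship (1 - FST) = (1 - FSC)(1 - FCT) that the Methods use to obtain FST. The short Python sketch below simply recomputes FST from the FSC and FCT values quoted in the text; the function name is illustrative, and small discrepancies from the published values are expected because the inputs are rounded percentages.

```python
def fst_from_levels(f_sc: float, f_ct: float) -> float:
    """Combine two hierarchical fixation indices: (1 - FST) = (1 - FSC)(1 - FCT)."""
    return 1.0 - (1.0 - f_sc) * (1.0 - f_ct)


# (FSC, FCT) values quoted in the Results, written as proportions.
reported = {
    "site 1": (0.08, 0.37),
    "sites 2, 3, 6": (0.03, 0.00),
    "sites 5, 12": (0.12, 0.17),
    "haplotypes": (0.059, 0.244),
}

for label, (f_sc, f_ct) in reported.items():
    print(f"{label}: FST = {fst_from_levels(f_sc, f_ct):.3f}")
```

The recomputed values land close to the reported FST figures of 42%, 3%, 27%, and 28.8%.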

As the jackknife standard errors indicate (Table 3), the confidence intervals for FSC often overlap zero but those of FCT or FST usually do not. This result indicates that subdivision among continents has played the more important role in divergence of allele and haplotype frequencies. As indicated by their standard errors, FST values for site 1 (42%) and for sites 5 and 12 (27%) were significantly larger than the 10%-15% FST values usually reported for RSPs (Bowcock et al. 1991; Jorde et al. 1995). Such large values may be due to random genetic drift, natural selection, or both.

Two-Site Linkage Disequilibrium Analysis

The linkage disequilibrium coefficient (DA1B1) compares haplotype frequency (PA1B1) with the product of the allele frequencies (pA1 and qB1). That is, DA1B1 = PA1B1 - pA1qB1 (Weir 1996). Hereafter D is given without any subscripts when the argument pertains to a pair of alleles at any two sites. Because D depends on allele frequency, it was normalized to the maximum that it could have been given the allele frequencies: D' = D/Dmax (Lewontin 1964). It was also normalized to the following correlation coefficient (Hill and Robertson 1968):
$$r_{A_1B_1} = \frac{D_{A_1B_1}}{\sqrt{p_{A_1}\,p_{A_2}\,q_{B_1}\,q_{B_2}}}$$
Within each population D' was -1.0 or +1.0. This outcome is consistent with the fact that the variability at all six sites was essentially carried on only four haplotypes. The approximate variance of D' is 0 when D' = -1.0 or +1.0 (Zapata et al. 1997), making these D' values statistically significant. The correlation coefficient (r) results are given in Table 4. For pairings that involve site 1, r was relatively low except in the Europeans. The Europeans were unique in that only two haplotypes dominated the frequency spectrum. Low r values were also observed for the 2, 3, 6 versus 5, 12 pairing in Asia. In contrast, and concordant with the usual co-occurrence of alleles at sites 2, 3, and 6, and at sites 5 and 12, the r values for pairings within these groups were at or near unity.
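To make the three measures concrete, the following Python sketch computes D, D', and r for one pair of alleles. It is an illustration only, not the authors' software: the function name and the example frequencies are hypothetical, and D' is taken as D divided by the largest magnitude of D that the allele frequencies allow, keeping the sign of D.

```python
from math import sqrt


def two_site_ld(p_a1b1: float, p_a1: float, q_b1: float):
    """Return (D, D', r) for allele A1 at the first site and B1 at the second.

    p_a1b1 : frequency of the A1-B1 haplotype
    p_a1   : frequency of allele A1 (A2 has frequency 1 - p_a1)
    q_b1   : frequency of allele B1 (B2 has frequency 1 - q_b1)
    """
    p_a2, q_b2 = 1.0 - p_a1, 1.0 - q_b1
    d = p_a1b1 - p_a1 * q_b1
    # Largest |D| permitted by the allele frequencies, given the sign of D.
    if d >= 0:
        d_max = min(p_a1 * q_b2, p_a2 * q_b1)
    else:
        d_max = min(p_a1 * q_b1, p_a2 * q_b2)
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = sqrt(p_a1 * p_a2 * q_b1 * q_b2)
    r = d / denom if denom > 0 else float("nan")
    return d, d_prime, r


# Hypothetical haplotype frequencies (not taken from the paper).
haps = {"A1B1": 0.5, "A1B2": 0.0, "A2B1": 0.3, "A2B2": 0.2}
p_a1 = haps["A1B1"] + haps["A1B2"]
q_b1 = haps["A1B1"] + haps["A2B1"]
d, d_prime, r = two_site_ld(haps["A1B1"], p_a1, q_b1)
print(f"D = {d:.3f}, D' = {d_prime:.2f}, r = {r:.2f}")
```

In this hypothetical case one of the four two-site haplotypes is absent, so D' reaches 1 while r is only 0.5. The same contrast appears in the results above, where D' is at its limit in every population but r varies among pairings.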

                              
Table 4.   Two-Site ALDH2 Linkage Disequilibrium Correlation Coefficients (r) in 17 Worldwide Populations

In a subdivided population, the total linkage disequilibrium (DT) can be partitioned into additive components using a hierarchical model (Nei and Li 1973). Here DT was partitioned into DW + DSC + DCT, where DW is the average linkage disequilibrium within populations, DSC is the linkage disequilibrium among local populations, and DCT is the linkage disequilibrium among continents. DW, DSC, and DCT were then normalized to DT to obtain dW, dSC, and dCT. Interestingly, DSC and DCT depend solely on allele frequency differences among groups (Nei and Li 1973). Consequently, allelic divergence among populations can increase, decrease, or leave unchanged linkage disequilibrium.

The partitioning of worldwide ALDH2 linkage disequilibrium revealed that linkage disequilibrium within populations (dW) usually accounted for most of the total linkage disequilibrium (Table 5). In addition, the effect of population subdivision was greater and more disparate among continents (dCT) than among local populations (dSC). Specifically, dSC ranged from just 1% to 6% whereas dCT ranged from -10% to 70%. The dCT values can be explained by the fact that large among group values require large allele frequency differences at both sites of the two-site haplotype (Sinnock and Sing 1972). As indicated by the jackknife standard errors, all of the dSC and dCT estimates were significantly different from 0. It can be concluded that population subdivision, both among local populations and among continents, had a significant effect on the worldwide linkage disequilibrium.

                              
Table 5.   Partition of Worldwide ALDH2 Linkage Disequilibrium: Estimates and Jackknife Standard Errors

The within-population r² values (computed from Table 4) were plotted against the haplotypic FSC and FCT values (Fig. 3). These values were then compared with Hill and Robertson's (1968) model of population divergence, which predicts that linkage disequilibrium increases with FST (broken line). The ALDH2 r² values ranged from well below to almost as far above the predicted line as was possible. Clearly, the model of Hill and Robertson (1968) did not fit the ALDH2 data well. This result is perhaps not surprising given that this island model assumed a large population initially in linkage equilibrium, equality of population sizes, and the absence of mutation and natural selection. The evolutionary history of ALDH2 likely violates several if not all of these assumptions. Furthermore, Hill and Robertson's model assumed that the product of effective population size multiplied by the recombination rate was large. This is not likely for the closely linked sites surveyed here.


Figure 3   Contrast of population divergence with variance of linkage disequilibrium. FSC (0.06) is contrasted with the within-population r² values, and FCT (0.24) is contrasted with the continental r² values. (Broken line) Relationship between FST and r² predicted from the model of Hill and Robertson (1968). () Predictions for FST = 0.06 and FST = 0.24. (open circle) Site 1 vs. 2, 3, 6 (Africa and Asia), site 1 vs. 5, 12 (Asia); and sites 2, 3, 6 vs. 5, 12 (Asia); () Site 1 vs. 2, 3, 6 (Americas); (triangle) site 1 vs. 2, 3, 6 (Europe); (diamond) sites 2, 3, 6 (Africa, Europe, Asia, Americas); site 5 vs. 12 (Asia) (note that each diamond accounts for several data points; see text for details).

    DISCUSSION

A striking pattern of ALDH2 haplotypic variation was the maximal linkage disequilibrium and corresponding low number of haplotypes. While the number of haplotypes that segregate at a locus depends on historical effective population size and natural selection, combinatorics show that there are 2^s possible haplotype states from s biallelic sites. From a related perspective, the cladistic model of haplotype evolution predicts that s + 1 haplotypes must have existed in evolutionary history to establish variability at each site. These primary haplotypes create a network of haplotypes that differ from each other by single mutational steps (Long et al. 1990). Some or all of the remaining 2^s - s - 1 haplotype states could exist in a population because of recombination.

At ALDH2, 2^6 = 64 haplotype states are possible, but only seven states were observed and only four were frequent. At least 6 + 1 = 7 one-step haplotypes must have existed in evolutionary history. However, it is impossible that the seven observed haplotypes comprise the primary set. H3 differs from H1 at three sites, and the two intermediate one-step haplotypes are completely, or essentially, missing. Whereas H6 provides an intermediate link at one of the steps, only a single copy was observed and it may have arisen by recombination. Similarly, H4 differs from H1 at two sites. Although H8 and H9, each observed in single copy, connect H4 and H1 by single steps, at least one of these haplotypes must have been formed by recombination, and it is possible that both were. Thus, three of the seven one-step haplotypes were essentially, or entirely, missing. In humans, one-step haplotypes are frequently missing, as evidenced by β-globin (Harding et al. 1997) and NF1 (Jorde et al. 1993) haplotype phylogenies.
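The counting argument can be checked directly from the haplotype states listed in the Results. The sketch below, a hedged illustration rather than part of the original analysis, tallies the possible states for six biallelic sites and lists which observed haplotypes are one mutational step apart; the helper name `hamming` is an illustrative choice.

```python
from itertools import combinations

# Observed six-site haplotypes as given in the Results
# (sites ordered 1, 2, 3, 5, 6, 12; 1 = reference allele, 2 = variant allele).
observed = {
    "H1": "111111", "H2": "211111", "H3": "122121", "H4": "111212",
    "H6": "111121", "H8": "111211", "H9": "111112",
}


def hamming(a: str, b: str) -> int:
    """Number of sites at which two haplotype states differ."""
    return sum(x != y for x, y in zip(a, b))


s = 6  # number of biallelic sites
print("possible haplotype states:", 2 ** s)
print("minimum number of primary haplotypes:", s + 1)
print("states that would require recombination:", 2 ** s - s - 1)

# Pairs of observed haplotypes separated by a single mutational step.
one_step = [
    (n1, n2)
    for (n1, h1), (n2, h2) in combinations(observed.items(), 2)
    if hamming(h1, h2) == 1
]
print("one-step pairs:", one_step)
print("H1 vs H3 differences:", hamming(observed["H1"], observed["H3"]))
print("H1 vs H4 differences:", hamming(observed["H1"], observed["H4"]))
```

No observed haplotype is a single step from H3, which is the gap in the network that the paragraph above describes.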

The number of segregating haplotypes at ALDH2 may be due to natural selection at ALDH2 (R.J. Peterson, D. Goldman, and J.C. Long, in prep.) or selection on a closely linked gene. A coalescent analysis of the ALDH2 haplotype phylogeny suggests that, given a neutral model, the age of the deficiency allele is expected to be 149,000 (35,000-416,000) years (R.J. Peterson, D. Goldman, and J.C. Long, in prep.). Such an ancient apparent age rivals the origin of modern humans and predates the colonization of Asia. This suggests that natural selection has increased the frequency of the deficiency allele in Asia faster than expected under a neutral model, and directional selection can reduce the number of haplotypes at a locus. Alternatively, the low number of ALDH2 haplotypes could be the result of a population bottleneck that is recent relative to the mutation rate.

Because only one African population was sampled, a complete understanding of the African versus non-African patterns of ALDH2 haplotype variation awaits the sampling of more African populations. Speculatively, the fact that the African Biaka shared the worldwide pattern of linkage disequilibrium at sites 2, 3, and 6 suggests that the pattern arose before the divergence of modern humans and has not subsequently decayed. An absence of a strong out-of-Africa effect at ALDH2 is hinted at by the similarity of the H1 frequency in the Biaka with the H1 + H4 frequency in Asia. Interestingly, while the variants at sites 5 and 12 were in complete linkage disequilibrium, their presence only in Asia indicates a recent Asian origin and perhaps natural selection. Thus, the African versus non-African pattern at ALDH2 may contrast with DRD2, CD4, and DM. At these latter loci non-Africans had higher linkage disequilibrium and segregated a subset of African haplotypes (Castiglione et al. 1995; Tishkoff et al. 1996, 1998). However, this pattern was not as extreme at DRD2, which suggests that the out-of-Africa founder event had a weaker effect at DRD2 than at CD4 and DM.

These distinct patterns of linkage disequilibrium have several explanations. An out-of-Africa founder event may by chance have had less effect at ALDH2, or natural selection could have been stronger. In contrast to the other loci, the ALDH2 haplotypes did not comprise STRP alleles. The STRP mutation rate is several orders of magnitude higher than the nucleotide substitution rate (Weber and Wong 1993). Because of this, STRPs may better resolve recent human evolution. Whatever the explanation, these divergent results suggest that patterns of linkage disequilibrium vary across the human genome.

In addition, the ALDH2 evidence suggests that the pattern of linkage disequilibrium at any set of closely linked sites may depend on the pattern of linkage disequilibrium that existed in ancestral populations. Because each gene may have had a unique pattern of linkage disequilibrium in ancestral populations, the effect of population subdivision at individual genes may be idiosyncratic. This insight contrasts with the situation of unlinked genes (Smouse and Neel 1977; Smouse et al. 1983). Because unlinked genes have a low covariance of allele and haplotype frequency, a particular pattern of allelic divergence can occur from many different starting haplotype distributions. Thus, while the effect of population divergence on ALDH2 is likely to be locus dependent, the effect of population divergence on unlinked genes is likely to be independent of the particular set of loci studied.

The importance of ancestral linkage disequilibrium to current patterns has recently received theoretical treatment (Lonjou et al. 1999). These investigators showed that for ancient polymorphisms linkage disequilibrium is largely determined by regional founders, whereas subsequent demography has little effect. This theory was supported by data at the MNSs, RHCE, and CD4 loci. Of implication to genetic epidemiologists, and contrary to current belief (Terwilliger et al. 1998), is that isolates may actually be less advantageous than large populations for linkage disequilibrium studies (Lonjou et al. 1999).

Another important insight is that patterns of linkage disequilibrium may vary within a set of closely linked sites. At ALDH2, the effect of population subdivision varied greatly depending on the groups of sites that were compared. This complicated pattern reiterates that linkage disequilibrium among populations is not a simple function of population divergence (Nei and Li 1973). Distinct patterns of linkage disequilibrium were also observed among tightly linked sites in the AI-CIII apolipoprotein gene region (Thompson et al. 1988). Specifically, linkage disequilibrium was found between two flanking RSPs but not with an internal RSP. In this case, the power to detect linkage disequilibrium was low because the major allele of each flanking RSP occurred with the rare allele of the internal RSP (Thompson et al. 1988). These results suggest that patterns of linkage disequilibrium in a gene region may not be fully described by analyzing a single pair of sites. Rather, the proper characterization of linkage disequilibrium may require the examination of alleles at several sites. This same conclusion was reached in a linkage disequilibrium analysis of 88 variable sites in the human lipoprotein lipase gene (Clark et al. 1998).

Population subdivision clearly affected the worldwide pattern of ALDH2 linkage disequilibrium. Linkage disequilibrium among local populations and among continents was significantly different from zero. The magnitude of this effect was greater among continents than among local populations. The present study augments the original one-level model of population subdivision (Sinnock and Sing 1972; Nei and Li 1973) by extending it to a second level. Because it was found that the effects of population subdivision were greater among continents than among local populations, this extension represents an important advance in resolution. Moreover, the fact that linkage disequilibrium among local populations was statistically significant suggests a cautionary note to the genetic epidemiologist considering mixing local populations to fine map disease genes.

Hill and Robertson's (1968) model did not fit the data well. This observation suggests that simple models do not provide a reasonable framework for understanding worldwide linkage disequilibrium at ALDH2. Violating the assumptions of the model, ALDH2 linkage disequilibrium was likely maximal in a finite-sized ancestral population. Further, natural selection may have acted (R.J. Peterson, D. Goldman, and J.C. Long, in prep.), mutation is evident, human population sizes have not been equal (Urbanek et al. 1996), and the hierarchy of human populations is not balanced (Nei and Roychoudhury 1993). This suggests that other simple models, such as Ohta's (1982a,b) partition of the variance of linkage disequilibrium, will also not fit the data well.

The extension of the model of Nei and Li (1973) to two levels provided valuable insights that would have been missed with a one-level partition. However, it can also be concluded that this simple two-level partition was inadequate to fully describe the effects of human demographic history on ALDH2 linkage disequilibrium. Perhaps the crucial improvement is to model unequal rates of evolution and a realistic human population phylogeny (Nei and Roychoudhury 1993; Urbanek et al. 1996). The emergence of fine-scale haplotype data for many genes in many populations is likely to provide continuing impetus to incorporate population subdivision into coalescence models of linkage disequilibrium (Rannala and Slatkin 1998).

    METHODS

Molecular Methods and Population Samples

ALDH2 is defined as the segment of DNA from which aldehyde dehydrogenase 2 is transcribed (Fig. 1). The six sites analyzed here are a subset of the 12 sites reported by Peterson et al. (1999). The six sites not included in this analysis had only a rare variant, and rare variants are uninformative of the effects of population subdivision on linkage disequilibrium. Site 1 was discovered by M. Stewart (pers. comm.). Sites 2, 3, 5, and 6 were discovered by Peterson et al. (1999). Site 12 is the site that defines the well-known Glu-487-Lys deficiency allele (Yoshida et al. 1984). PCR, restriction enzymes, and SSCP methods were used to genotype the variable sites. Genotypes were collected on a worldwide sample consisting of Africa, 51 Biakans; Asia, 24 Cambodians, 47 Han Chinese, 49 Japanese, 40 South Koreans, 43 Taiwanese, and 50 Black Thai; Europe, 32 CEPH, 41 Finns, 45 Swedes; North America, 51 Cheyenne, 50 Maya, 46 Navajo, 45 Pima; and South America, 49 Karitiana, 44 Rondonian Surui, and 49 Ticuna. Samples were provided by a variety of researchers (Peterson et al. 1999).

Statistical Analyses

For each site, the allele with the higher worldwide frequency was assigned to be the reference allele. Because phase-unknown multi-site genotypes were collected, haplotype states and frequencies were estimated by maximum-likelihood using an expectation-maximization (E-M) method (Dempster et al. 1977). Details of this method, and the associated jackknife standard errors, are presented in Long et al. (1995) and Peterson et al. (1999).
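As a rough illustration of how an E-M procedure resolves phase-unknown genotypes, the sketch below handles the simplest case of two biallelic sites, where only the double heterozygote is ambiguous. It is not the six-site method of Long et al. (1995) used in this study; the function name, the count layout, and the example counts are all hypothetical.

```python
def em_two_site(counts, n_iter=200, tol=1e-12):
    """Estimate two-site haplotype frequencies from unphased genotype counts.

    counts[i][j] is the number of people carrying i variant alleles at the
    first site and j variant alleles at the second site (i, j in 0..2).
    Haplotypes are keyed as (a, b) with 0 = reference and 1 = variant.
    """
    haps = [(0, 0), (0, 1), (1, 0), (1, 1)]
    base = {h: 0.0 for h in haps}
    # Every genotype except the double heterozygote has a known phase:
    # its two haplotypes are (i // 2, j // 2) and ((i + 1) // 2, (j + 1) // 2).
    for i in range(3):
        for j in range(3):
            if (i, j) == (1, 1):
                continue
            n = counts[i][j]
            base[(i // 2, j // 2)] += n
            base[((i + 1) // 2, (j + 1) // 2)] += n
    n_dh = counts[1][1]  # double heterozygotes, phase unknown
    total = 2.0 * sum(sum(row) for row in counts)
    freq = {h: 0.25 for h in haps}  # arbitrary starting point
    for _ in range(n_iter):
        # E-step: expected share of double heterozygotes that are 00/11.
        cis = freq[(0, 0)] * freq[(1, 1)]
        trans = freq[(0, 1)] * freq[(1, 0)]
        share = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: recount haplotypes using the expected phase assignments.
        new = dict(base)
        new[(0, 0)] += n_dh * share
        new[(1, 1)] += n_dh * share
        new[(0, 1)] += n_dh * (1.0 - share)
        new[(1, 0)] += n_dh * (1.0 - share)
        new = {h: c / total for h, c in new.items()}
        converged = max(abs(new[h] - freq[h]) for h in haps) < tol
        freq = new
        if converged:
            break
    return freq


# Hypothetical genotype counts (not data from the paper).
example_counts = [[30, 0, 0], [0, 20, 0], [0, 0, 10]]
print(em_two_site(example_counts))
```

With more than two sites the same idea applies, but each multiply heterozygous genotype is compatible with several haplotype pairs, so the E-step must distribute its count over all of them.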

The number of segregating sites in each population ranged from four to six. A contingency table χ² test for departure from single-site Hardy-Weinberg expectation (Weir 1996) was applied to each segregating site in each population. Altogether, 68 tests were performed. Four tests had P-values of <5%. These tests lacked independence due to the correlation of alleles among sites. Despite this result, it is reasonable to conclude that this number of departures from Hardy-Weinberg expectation reflects sampling fluctuation under the null hypothesis.
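The reasoning here can be sketched numerically. The Python example below assumes the familiar one-degree-of-freedom goodness-of-fit form of the Hardy-Weinberg test for a biallelic site, which may differ in detail from the procedure the authors took from Weir (1996); the function name and the genotype counts are hypothetical. It also shows how many of 68 tests would be expected to fall below the 5% threshold by chance if the tests were independent.

```python
from math import erfc, sqrt


def hwe_chi2(n_11: int, n_12: int, n_22: int):
    """Goodness-of-fit chi-square for Hardy-Weinberg proportions at one
    biallelic site, given the three genotype counts. Returns (chi2, P)."""
    n = n_11 + n_12 + n_22
    p = (2 * n_11 + n_12) / (2 * n)  # frequency of allele 1
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_11, n_12, n_22)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper-tail probability of a chi-square variate with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts for one site in one population.
chi2, p_value = hwe_chi2(20, 22, 8)
print(f"chi2 = {chi2:.3f}, P = {p_value:.3f}")

# Tests expected to reach P < 5% by chance among 68 independent tests.
n_tests, alpha = 68, 0.05
print("expected by chance:", n_tests * alpha)
```

About 3.4 nominally significant tests are expected under the null hypothesis, so the four observed are in line with sampling fluctuation.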

For linkage disequilibrium analysis, the two-site haplotype frequencies were obtained from the six-site haplotype frequencies by summing frequencies of all haplotypes with each specific combination of alleles at the two sites (Long et al. 1995). Linkage disequilibrium in the worldwide data set was partitioned as follows, where DT, DW, DSC, and DCT are as defined in the Results. Suppose there are K populations across C continents, and Kc populations on the cth continent. With respect to the total sample, each population has relative size wi, with
$$\sum_{i=1}^{K} w_i = 1.0$$
and each continent has relative size wc, with
$$\sum_{c=1}^{C} w_c = 1.0$$
With respect to a continental pooling, each population has relative size wci, with
$$\sum_{i=1}^{K_c} w_{ci} = 1.0$$
Following Nei and Li (1973),
$$D_W = \sum_{i=1}^{K} w_i D_i$$
$$D_{SC} = \sum_{c=1}^{C} \sum_{i=1}^{K_c} w_{ci}\,(p_{ci} - p_c)(q_{ci} - q_c)$$
$$D_{CT} = \sum_{c=1}^{C} w_c\,(p_c - p_T)(q_c - q_T)$$
Here, pci is the reference allele frequency at the first site in the ith population on the cth continent, and qci is the analogous reference allele frequency at the second site. As these equations show, DSC and DCT reflect allele frequency differences among populations (Sinnock and Sing 1972). Because of this, DW, DSC, and DCT were normalized to the total linkage disequilibrium such that dW = DW/DT, dSC = DSC/DT, and dCT = DCT/DT. Standard errors were calculated by use of a jackknife method.
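A small numerical sketch may help show how the three components behave. The Python function below follows the partition described above, with one labeled assumption: each continent's among-population term is weighted by wc (so that wc times wci equals wi), which makes DW + DSC + DCT equal the disequilibrium in the pooled sample. The function name, the input layout, and all numbers are hypothetical and are not data from this study.

```python
def partition_ld(continents):
    """Partition two-site LD into within-population (D_W), among populations
    within continents (D_SC), and among continents (D_CT).

    `continents` is a list of continents; each continent is a list of
    populations given as (w, p, q, hap), where w is the population's share of
    the total sample, p and q are the reference-allele frequencies at the two
    sites, and hap is the frequency of the haplotype carrying both reference
    alleles.
    """
    d_w = d_sc = d_ct = 0.0
    # Worldwide reference-allele frequencies.
    p_t = sum(w * p for cont in continents for w, p, q, hap in cont)
    q_t = sum(w * q for cont in continents for w, p, q, hap in cont)
    for cont in continents:
        w_c = sum(w for w, _, _, _ in cont)
        p_c = sum(w * p for w, p, _, _ in cont) / w_c
        q_c = sum(w * q for w, _, q, _ in cont) / w_c
        for w, p, q, hap in cont:
            d_w += w * (hap - p * q)            # w_i * D_i
            # Assumption of this sketch: weight by w_i = w_c * w_ci.
            d_sc += w * (p - p_c) * (q - q_c)
        d_ct += w_c * (p_c - p_t) * (q_c - q_t)
    d_t = d_w + d_sc + d_ct
    return d_w, d_sc, d_ct, d_t


# Hypothetical example: two continents, three populations.
example = [
    [(0.25, 0.8, 0.7, 0.6), (0.25, 0.6, 0.5, 0.4)],  # continent A
    [(0.50, 0.3, 0.4, 0.2)],                          # continent B
]
d_w, d_sc, d_ct, d_t = partition_ld(example)
print(f"dW = {d_w / d_t:.2f}, dSC = {d_sc / d_t:.2f}, dCT = {d_ct / d_t:.2f}")
```

In this made-up example the pooled haplotype frequency is 0.35 and both pooled allele frequencies are 0.5, so the total disequilibrium is 0.10, of which 75% lies within populations, 5% among populations within continents, and 20% among continents. Because DSC and DCT are products of allele frequency deviations at the two sites, either can be negative when the deviations have opposite signs.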

The variance of linkage disequilibrium is D² in replicate subpopulations of finite size drawn from a population initially in linkage equilibrium (Hill and Robertson 1968). Because D² depends on allele frequency, it is normalized to the variances of allele frequency to obtain the squared correlation coefficient (r²). In relation to FST,
$$E[r^2] = \frac{6(1 - F_{ST}) - 5(1 - F_{ST})^3 - (1 - F_{ST})^6}{15}$$
(Hill and Robertson 1968). F-statistics for a two-level partition with equal effects were estimated using the method of Urbanek et al. (1996). The relationship (1 - FST) = (1 - FSC)(1 - FCT) (Wright 1978) was used to obtain FST. Standard errors of the estimates were obtained by use of a jackknife procedure (Weir 1996).
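The predicted curve can be evaluated for the two divergence levels contrasted in Figure 3. The sketch below assumes the expectation reads E[r²] = [6(1 - FST) - 5(1 - FST)^3 - (1 - FST)^6]/15; this is a reading of the expression quoted from Hill and Robertson (1968), chosen because it gives zero at FST = 0 as required for a population starting in linkage equilibrium, and it should be checked against the original before being relied upon. The function name is illustrative.

```python
def expected_r2(f_st: float) -> float:
    """Expected r^2 as a function of FST under the assumed form
    [6x - 5x^3 - x^6] / 15 with x = 1 - FST."""
    x = 1.0 - f_st
    return (6.0 * x - 5.0 * x ** 3 - x ** 6) / 15.0


# The two divergence levels used in Figure 3 (haplotypic FSC and FCT).
for f_st in (0.06, 0.24):
    print(f"FST = {f_st:.2f}: expected r^2 = {expected_r2(f_st):.3f}")
```

Under this reading the expected r² is about 0.05 at FST = 0.06 and about 0.14 at FST = 0.24, so the prediction rises with divergence over the range relevant here.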

    ACKNOWLEDGMENTS

We thank Longina Akhtar for maintaining the Laboratory of Neurogenetics cell lines. Ken Kidd, Su-Jen Tsai, Dr. Chandanayingyong, and Dr. Park kindly provided DNA samples. Mark Stewart provided details on the variant at site 1. Margrit Urbanek, Andrew Bergen, and Jaakko Lappalainen contributed invaluable advice on laboratory techniques and useful discussions. Ken Weiss, Andy Clark, Mark Stoneking, and Henry Harpending provided helpful comments on earlier drafts. We are grateful to two anonymous reviewers for insightful comments. This work was supported by a National Institute on Alcohol Abuse and Alcoholism predoctoral Intramural Research Training Award to R.J.P.

The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.

    FOOTNOTES

3 Corresponding author.

E-MAIL peterson{at}ncifcrf.gov; FAX (301)846-1909.

    REFERENCES

  • Ajioka, R.S., L.B. Jorde, J.R. Gruen, P. Yu, D. Dimitrova, J. Barrow, E. Radisky, C.Q. Edwards, L.M. Griffen, and J.P. Kushner. 1997. Haplotype analysis of Hemochromatosis: Evaluation of different linkage-disequilibrium approaches and evolution of disease chromosomes. Am. J. Hum. Genet. 60: 1439-1447.
  • Bowcock, A.M., J.M. Hebert, J.L. Mountain, J.R. Kidd, J. Rogers, K.K. Kidd, and L.L. Cavalli-Sforza. 1991. Study of an additional 58 DNA markers in five human populations from four continents. Gene Geogr. 5: 151-173.
  • Castiglione, C.M., A.S. Deinard, W.C. Speed, G. Sirugo, H.C. Rosenbaum, Y. Zhang, D.K. Grandy, E.L. Grigorenko, B. Bonne-Tamir, A.J. Pakstis, J.R. Kidd, and K.K. Kidd. 1995. Evolution of haplotypes at the DRD2 locus. Am. J. Hum. Genet. 57: 1445-1456.
  • Clark, A.G., K.M. Weiss, D.A. Nickerson, S.L. Taylor, A. Buchanan, J. Stengård, V. Salomaa, E. Vartiainen, M. Perola, E. Boerwinkle, and C.F. Sing. 1998. Haplotype structure and population genetic inferences from nucleotide-sequence variation in human lipoprotein lipase. Am. J. Hum. Genet. 63: 595-612.
  • Dempster, A.P., N.M. Laird, and D.B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc. B. 39: 1-38.
  • Goldman, D. and M.-A. Enoch. 1990. Genetic epidemiology of ethanol metabolic enzymes: A role for selection. World Rev. Nutr. Diet 63: 143-160.
  • Harding, R.M., S.M. Fullerton, R.C. Griffiths, J. Bond, M.J. Cox, J.A. Schneider, D.S. Moulin, and J.B. Clegg. 1997. Archaic African and Asian lineages in the genetic ancestry of modern humans. Am. J. Hum. Genet. 60: 772-789.
  • Hästbacka, J., A. de la Chapelle, I. Kaitila, P. Sistonen, A. Weaver, and E. Lander. 1992. Linkage disequilibrium mapping in isolated founder populations: diastrophic dysplasia in Finland. Nat. Genet. 2: 204-211.
  • Hästbacka, J., A. de la Chapelle, M.M. Mahtani, G. Clines, M.P. Reeve-Daly, M. Daly, B.A. Hamilton, K. Kusumi, B. Trivedi, A. Weaver, A. Coloma, M. Lovett, A. Buckler, I. Kaitila, and E. Lander. 1994. The diastrophic dysplasia gene encodes a novel sulfate transporter: positional cloning by fine-structure linkage disequilibrium mapping. Cell 78: 1073-1087.
  • Hill, W.G. and A. Robertson. 1968. Linkage disequilibrium in finite populations. Theor. Appl. Genet. 33: 54-78.
  • Hsu, L.C., R.E. Bendel, and A. Yoshida. 1988. Genomic structure of the human mitochondrial aldehyde dehydrogenase gene. Genomics 2: 57-65.
  • Ikuta, T., S. Szeto, and A. Yoshida. 1986. Three human alcohol dehydrogenase subunits: cDNA structure and molecular and evolutionary divergence. Proc. Natl. Acad. Sci. 83: 634-638.
  • Jorde, L.B., W.S. Watkins, D. Viskochil, P. O'Connell, and K. Ward. 1993. Linkage disequilibrium in the Neurofibromatosis 1 (NF1) region: Implications for gene mapping. Am. J. Hum. Genet. 53: 1038-1050.
  • Jorde, L.B., W.S. Watkins, M. Carlson, J. Groden, H. Albertsen, A. Thliveris, and M. Leppert. 1994. Linkage disequilibrium predicts physical distance in the Adenomatous Polyposis Coli region. Am. J. Hum. Genet. 54: 884-898.
  • Jorde, L.B., M.J. Bamshad, W.S. Watkins, R. Zenger, A.E. Fraley, P.A. Krakowiak, K.D. Carpenter, H. Soodyall, T. Jenkins, and A.R. Rogers. 1995. Origins and affinities of modern humans: A comparison of mitochondrial and nuclear genetic data. Am. J. Hum. Genet. 57: 523-538.
  • Kerem, B.-S., J.M. Rommens, J.A. Buchanan, D. Markiewicz, T.K. Cox, A. Chakravarti, M. Buchwald, and L.-C. Tsui. 1989. Identification of the Cystic Fibrosis gene: Genetic analysis. Science 245: 1073-1080.
  • Laan, M. and S. Pääbo. 1997. Demographic history and linkage disequilibrium in human populations. Nat. Genet. 17: 435-438.
  • -----. 1998. Mapping genes by drift-generated linkage disequilibrium. Am. J. Hum. Genet. 63: 654-656.
  • Lewontin, R.C. 1964. The interaction of selection and linkage. I. General considerations; heterotic models. Genetics 49: 49-67.
  • -----. 1995. The detection of linkage disequilibrium in molecular sequence data. Genetics 140: 377-388.
  • Long, J.C., A. Chakravarti, C.D. Boehm, S. Antonarakis, and H.H. Kazazian. 1990. Phylogeny of human β-globin haplotypes and its implications for recent human evolution. Am. J. Phys. Anthropol. 81: 113-130.
  • Long, J.C., R.C. Williams, and M. Urbanek. 1995. An E-M algorithm and testing strategy for multiple-locus haplotypes. Am. J. Hum. Genet. 56: 799-810.
  • Lonjou, C., A. Collins, and N.E. Morton. 1999. Allelic association between marker loci. Proc. Natl. Acad. Sci. 96: 1621-1626.
  • Nei, M. and W.-H. Li. 1973. Linkage disequilibrium in subdivided populations. Genetics 75: 213-219.
  • Nei, M. and A.K. Roychoudhury. 1993. Evolutionary relationships of human populations on a global scale. Mol. Biol. Evol. 10: 927-943.
  • Ohta, T. 1982a. Linkage disequilibrium due to random genetic drift in finite subdivided populations. Proc. Natl. Acad. Sci. 79: 1940-1944.
  • -----. 1982b. Linkage disequilibrium with the island model. Genetics 101: 139-155.
  • Peterson, R.J., D. Goldman, and J.C. Long. 1999. Nucleotide sequence diversity in non-coding regions of ALDH2 as revealed by restriction enzyme and SSCP analysis. Hum. Genet. 104: 177-187.
  • Raghunathan, L., L.C. Hsu, I. Klisak, R.S. Sparkes, A. Yoshida, and T. Mohandas. 1988. Regional localization of the human genes for aldehyde dehydrogenase-1 and aldehyde dehydrogenase-2. Genomics 2: 267-269.
  • Rannala, B. and M. Slatkin. 1998. Likelihood analysis of disequilibrium mapping, and related problems. Am. J. Hum. Genet. 62: 459-473.
  • Sinnock, P. and C.F. Sing. 1972. Analysis of multilocus genetic systems in Tecumseh, MI. II. Considerations of the correlation between nonalleles in gametes. Am. J. Hum. Genet. 24: 393-415.
  • Smouse, P.E. and J.V. Neel. 1977. Multivariate analysis of gametic disequilibrium in the Yanomama. Genetics 85: 733-752.
  • Smouse, P.E., J.V. Neel, and W. Liu. 1983. Multiple-locus departures from panmictic equilibrium within and between village gene pools of Amerindian tribes at different stages of agglomeration. Genetics 104: 133-153.
  • Tajima, F. 1983. Evolutionary relationship of DNA sequences in finite populations. Genetics 105: 437-460.
  • Terwilliger, J.D., S. Zollner, M. Laan, and S. Pääbo. 1998. Mapping genes through the use of linkage disequilibrium generated by genetic drift: "Drift mapping" in small populations with no demographic expansion. Hum. Hered. 48: 138-154.
  • Thompson, E.A., S. Deeb, D. Walker, and A.G. Motulsky. 1988. The detection of linkage disequilibrium between closely linked markers: RFLPs at the AI-CIII Apolipoprotein genes. Am. J. Hum. Genet. 42: 113-124.
  • Tishkoff, S.A., E. Dietzsch, W. Speed, A.J. Pakstis, J.R. Kidd, K. Cheung, B. Bonné-Tamir, A.S. Santachiara-Benerecetti, P. Moral, M. Krings, S. Pääbo, E. Watson, N. Risch, T. Jenkins, and K.K. Kidd. 1996. Global patterns of linkage disequilibrium at the CD4 locus and modern human origins. Science 271: 1380-1387.
  • Tishkoff, S.A., A. Goldman, F. Calafell, W.C. Speed, A.S. Deinard, B. Bonne-Tamir, J.R. Kidd, A.J. Pakstis, T. Jenkins, and K.K. Kidd. 1998. A global haplotype analysis of the Myotonic Dystrophy locus: Implications for the evolution of modern humans and for the origin of Myotonic Dystrophy mutations. Am. J. Hum. Genet. 62: 1389-1402.
  • Urbanek, M., D. Goldman, and J.C. Long. 1996. The apportionment of dinucleotide repeat diversity in Native Americans and Europeans: A new approach to measuring gene identity reveals asymmetric patterns of divergence. Mol. Biol. Evol. 13: 943-953.
  • Weber, J.L. and C. Wong. 1993. Mutation of human short tandem repeats. Hum. Mol. Genet. 2: 1123-1128.
  • Weir, B.S. 1996. Genetic data analysis II: Methods for discrete population genetic data. Sinauer, Sunderland, MA.
  • Wright, S. 1978. Evolution and the genetics of populations. Vol. 4, Variability within and among natural populations. The University of Chicago Press, Chicago, IL.
  • Yoshida, A., I.-Y. Huang, and M. Ikawa. 1984. Molecular abnormality of an inactive aldehyde-dehydrogenase variant commonly found in Orientals. Proc. Natl. Acad. Sci. 81: 258-261.
  • Zapata, C., G. Alvarez, and C. Carollo. 1997. Approximate variance of the standardized measure of gametic disequilibrium D'. Am. J. Hum. Genet. 61: 771-774.

Received January 7, 1999; accepted in revised form June 29, 1999.


9:844-852 ©1999 by Cold Spring Harbor Laboratory Press  ISSN 1088-9051/99 $5.00



