Vol. 9, Issue 9, 844-852, September 1999
LETTER
ABSTRACT
The effect of human population subdivision on linkage disequilibrium has previously been studied for unlinked genes. However, no study has focused on closely linked polymorphisms or formally partitioned linkage disequilibrium within and among worldwide populations. With an emphasis on population subdivision, the goal of this paper is to investigate the causes of linkage disequilibrium in ALDH2, the gene that encodes aldehyde dehydrogenase 2. Haplotypes for 756 people from 17 populations across five continents were estimated by maximum likelihood from genotypes at six closely linked ALDH2 nucleotide substitutions. Linkage disequilibrium was partitioned into three components: within populations, among populations within continents, and among continents. Population subdivision among continents had a larger and more variable effect on linkage disequilibrium than subdivision among local populations. Further, linkage disequilibrium did not increase with population divergence as predicted by a simple model. Rather, the patterns of linkage disequilibrium were complicated by the interplay of a near absence of recombination, the linkage disequilibrium that existed prior to the divergence of modern humans, subsequent mutation, population subdivision, random genetic drift, and perhaps natural selection. These results suggest that simple models may not predict patterns of linkage disequilibrium in human populations well.
INTRODUCTION
Linkage disequilibrium is the nonrandom association of alleles at different loci. In an ideal population at equilibrium, linkage disequilibrium is predicted to approach zero at a rate dependent on the recombination fraction. However, linkage disequilibrium can be generated by genetic drift (Hill and Robertson 1968), population subdivision (Nei and Li 1973), natural selection (Lewontin 1964), and mutation (Ohta 1982a,b). It is therefore not surprising that complicated patterns of linkage disequilibrium are observed in human populations (Jorde et al. 1994; Lewontin 1995; Clark et al. 1998).
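As an illustration of the decay mentioned above, the standard result for an infinitely large, randomly mating population without drift, selection, or mutation is $D_t = D_0(1 - c)^t$, where $c$ is the recombination fraction and $t$ counts generations. The Python sketch below (function names are hypothetical) computes that idealized expectation and the number of generations needed to halve $D$:

```python
import math


def ld_after_generations(d0: float, c: float, t: int) -> float:
    """Expected D after t generations, assuming D_t = D_0 * (1 - c) ** t.

    Valid only for an idealized infinite, randomly mating population
    with no drift, selection, or mutation.
    """
    if not 0.0 <= c <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    return d0 * (1.0 - c) ** t


def generations_to_halve(c: float) -> float:
    """Generations for D to fall to half its starting value."""
    if c == 0.0:
        return math.inf  # without recombination D never decays
    return math.log(0.5) / math.log(1.0 - c)


# Unlinked loci (c = 0.5) halve D every generation; tightly linked sites
# (c = 0.001, a hypothetical value) take roughly 693 generations.
print(ld_after_generations(0.25, 0.5, 1))   # 0.125
print(round(generations_to_halve(0.001)))   # 693
```

The sketch only conveys the rate: for sites within a single gene, where $c$ is small, the idealized decay takes many generations.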
Despite the complexity of observed patterns, several expectations have emerged. One is that linkage disequilibrium is expected to peak near a disease gene when the disease allele is rare (Ajioka et al. 1997). Application of this principle led to the positional cloning of the genes for cystic fibrosis (Kerem et al. 1989) and diastrophic dysplasia (Hästbacka et al. 1992, 1994). Another expectation is that linkage disequilibrium between a frequent disease allele and alleles at marker loci may be best preserved in a small, constant-sized population (Laan and Pääbo 1997, 1998; Terwilliger et al. 1998). However, recent theoretical work challenges this view for frequent alleles (Lonjou et al. 1999). With respect to population subdivision, linkage disequilibrium is expected to vary among subdivisions of finite size, but its average among subdivisions is expected to be zero (Hill and Robertson 1968). Finally, the variance of linkage disequilibrium is expected to increase with population subdivision and to decrease with migration (Ohta 1982a).
For pairs of unlinked genes, the effect of population subdivision on linkage disequilibrium has been studied in the Tecumseh, Michigan, population (Sinnock and Sing 1972) and within South American Indian villages (Smouse and Neel 1977; Smouse et al. 1983). In both studies, an excess of statistically significant values was attributed to the populations having been recently founded by migrants from source populations that differed in allele frequency. In addition, Smouse and colleagues (Smouse and Neel 1977; Smouse et al. 1983) found that the effect of population subdivision on linkage disequilibrium was greater among clusters of villages than among local villages.
For pairs of closely linked polymorphisms, three studies have examined linkage disequilibrium in worldwide samples. Castiglione et al. (1995) investigated alleles at a dinucleotide short tandem repeat polymorphism (STRP) and two restriction site polymorphisms (RSPs) in DRD2. Tishkoff et al. (1996, 1998) examined alleles at a pentanucleotide STRP and an Alu deletion polymorphism in CD4, and at a trinucleotide STRP, an Alu deletion polymorphism, and two RSPs in DM. All three studies found that African populations had many haplotypes and low levels of linkage disequilibrium. In contrast, non-African populations had a subset of the African haplotypes and almost complete linkage disequilibrium. These results were attributed to a founder event at the time modern humans emigrated from Africa.
The goal of this paper is to investigate the causes of worldwide linkage disequilibrium in ALDH2, the gene that encodes aldehyde dehydrogenase 2. ALDH2 is located on chromosome 12q24.2 (Raghunatan et al. 1988) and spans 44 kb (Hsu et al. 1988). Haplotypes were estimated from alleles at six biallelic sites within ALDH2 (Fig. 1) that were genotyped in 756 people from 17 populations across five continents (Peterson et al. 1999). ALDH2 has a dominant deficiency allele that is frequent in, but private to, Asia (Yoshida et al. 1984). The deficiency allele is of interest because natural selection, perhaps by conferring resistance to parasite infection, may have preserved it in Asia (Ikuta et al. 1986; Goldman and Enoch 1990; R.J. Peterson, D. Goldman, and J.C. Long, in prep.).
RESULTS
Allele and Haplotype Frequency
The allele frequencies at each site and in each population are shown in Table 1. Individuals homozygous at all six sites or heterozygous at only one site have unambiguous phase, and examination of these multisite homozygotes and single-site heterozygotes yielded seven directly observed haplotype states. The maximum-likelihood frequency estimates of these haplotypes and their jackknife standard errors are given in Table 2. Below, haplotype states are given within brackets: 1 represents the reference allele and 2 the variant allele, and the alleles are listed in site order 1, 2, 3, 5, 6, and 12. For brevity, each haplotype state is designated by the letter H and a number. The site and haplotype numbers are from Peterson et al. (1999).
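The phase reasoning above can be sketched in Python. A genotype's two haplotypes can be read off directly only when it is heterozygous at no more than one site; otherwise several phasings fit. The genotype representation (one pair of allele codes per site, using the 1/2 coding and site order described above) and the function name are assumptions for illustration:

```python
from typing import Optional, Sequence, Tuple

# One (allele, allele) pair per site, coded 1 (reference) or 2 (variant),
# in site order 1, 2, 3, 5, 6, 12.
Genotype = Sequence[Tuple[int, int]]


def resolve_phase(genotype: Genotype) -> Optional[Tuple[str, str]]:
    """Return both haplotypes if phase is unambiguous, otherwise None."""
    het_sites = [i for i, (a, b) in enumerate(genotype) if a != b]
    if len(het_sites) > 1:
        return None  # more than one phasing is possible; needs statistical estimation
    hap_low = "".join(str(min(a, b)) for a, b in genotype)
    hap_high = "".join(str(max(a, b)) for a, b in genotype)
    return hap_low, hap_high


h1_h2 = [(1, 2), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1)]  # H1/H2: het at site 1 only
h1_h4 = [(1, 1), (1, 1), (1, 1), (1, 2), (1, 1), (1, 2)]  # H1/H4: het at sites 5 and 12
print(resolve_phase(h1_h2))  # ('111111', '211111')
print(resolve_phase(h1_h4))  # None
```

The H1/H4 genotype is indistinguishable from H8/H9 ([111211]/[111112]), which is why such genotypes need the maximum-likelihood step rather than direct reading.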
Three haplotypes had a worldwide distribution: H1 [111111], H2 [211111], and H3 [122121] (Fig. 2). Of note, the frequencies of H1 and H2 were nearly reversed between the African Biaka and the Europeans, and the variant alleles at sites 2, 3, and 6 usually co-occurred. Although H4 [111212] was private to Asia, it attained a frequency of 25% in the Chinese, Taiwanese of Chinese descent, and Japanese. H4 carried both the deficiency allele and the site 5 variant that usually accompanies it. The high frequency of H4 in Asia appears to have come about largely at the expense of H1: the combined frequency of H1 and H4 in Asia was 67.9%, almost identical to the 66.7% frequency of H1 in the African Biaka. The remaining haplotypes were observed in single copy only: H6 [111121] in the African Biaka, H8 [111211] in the Chinese, and H9 [111112] in the Japanese.
Population Divergence
The variants at the six sites naturally formed three groups: site 1; sites 2, 3, and 6; and sites 5 and 12. Sites within each group yielded nearly identical allele frequency distributions and fixation indices. Fixation indices (Wright 1978), or F-statistics, measure population divergence as the among-group proportion of the total allele frequency variance. To avoid redundancy, the F-statistics are reported not for individual sites but for each group of sites. F-statistics were calculated for local populations relative to the continental average ($F_{SC}$), continental averages relative to the worldwide average ($F_{CT}$), and local populations relative to the worldwide average ($F_{ST}$). Here S indexes local populations, C indexes continental averages, and T indexes the worldwide average.
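One common way to compute these hierarchical indices for a single biallelic site works through expected heterozygosities. With $H_S$, $H_C$, and $H_T$ the average heterozygosity $2p(1-p)$ of local populations, of continental averages, and of the worldwide average, $F_{SC} = 1 - H_S/H_C$, $F_{CT} = 1 - H_C/H_T$, and $F_{ST} = 1 - H_S/H_T$. With equal weights, $H_C - H_S$ is twice the average within-continent variance of allele frequency and $H_T - H_C$ is twice the variance of continental averages, so these ratios are the variance proportions described above, and they satisfy $(1-F_{ST}) = (1-F_{SC})(1-F_{CT})$ exactly. The Python sketch below weights populations and continents equally; that weighting is an assumption (sample-size weights are another common choice and may be what Table 3 used), and the frequencies are hypothetical:

```python
from statistics import mean
from typing import Dict, List, Tuple


def hierarchical_f(freqs: Dict[str, List[float]]) -> Tuple[float, float, float]:
    """Return (F_SC, F_CT, F_ST) for one biallelic site.

    freqs maps each continent to the variant-allele frequencies of its local
    populations. Populations within a continent, and continents worldwide,
    are weighted equally.
    """
    def het(p: float) -> float:
        return 2.0 * p * (1.0 - p)

    # H_S: heterozygosity of local populations, averaged within then across continents
    h_s = mean(mean(het(p) for p in pops) for pops in freqs.values())
    # H_C: heterozygosity implied by each continent's average frequency
    cont_p = [mean(pops) for pops in freqs.values()]
    h_c = mean(het(p) for p in cont_p)
    # H_T: heterozygosity implied by the worldwide average frequency
    h_t = het(mean(cont_p))
    return 1 - h_s / h_c, 1 - h_c / h_t, 1 - h_s / h_t


# Hypothetical frequencies: little divergence within, more between continents.
toy = {"continent_A": [0.10, 0.20], "continent_B": [0.60, 0.80]}
f_sc, f_ct, f_st = hierarchical_f(toy)
print(f"F_SC={f_sc:.3f} F_CT={f_ct:.3f} F_ST={f_st:.3f}")
```

As a check on the identity with the site 1 values reported in this section, $(1 - 0.08)(1 - 0.37) \approx 0.58$, which gives $F_{ST} \approx 0.42$.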
At site 1, allele frequency differences among local populations resulted in an $F_{SC}$ of 8% (Table 3). Allele frequency differences among continents resulted in an $F_{CT}$ of 37%. The divergence among all subpopulations ($F_{ST}$) was 42%. Because of the almost one-to-one correspondence between haplotype frequency and variant allele frequency, these F-statistics can also be interpreted in terms of haplotype frequency variation. At site 1, the F-statistics largely reflect the reversal of H1 and H2 frequencies between the African Biaka and the Europeans.
Sites 2, 3, and 6 contrasted the frequency of H3 with those of H1, H2, and H4. Reflecting the small variation in H3 frequency among populations, $F_{SC}$ was 3%, $F_{CT}$ was 0%, and $F_{ST}$ was 3%. Sites 5 and 12 contrasted H4 with H1, H2, and H3. Here, $F_{SC}$ was 12%, $F_{CT}$ was 17%, and $F_{ST}$ was 27%. These latter F-statistics were due entirely to the restriction of H4 and the deficiency allele to Asia. Treating the haplotypes as multiple alleles at a single locus, the haplotypic $F_{SC}$ was 5.9%, $F_{CT}$ was 24.4%, and $F_{ST}$ was 28.8%. Here too, the small variation in H3 frequency among populations contributed little to these values.
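Treating haplotypes as alleles of one multiallelic locus, the same heterozygosity-based calculation carries over with gene diversity $1 - \sum_i p_i^2$ in place of $2p(1-p)$. Because gene diversity is concave in the frequencies, averaging before computing diversity can only raise it, so $H_S \le H_C \le H_T$. A Python sketch under an equal-weighting assumption, with hypothetical frequencies for four haplotypes listed in a fixed order (function names are illustrative):

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def gene_diversity(freqs: Sequence[float]) -> float:
    """Expected heterozygosity 1 - sum(p_i^2) for one set of haplotype frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def haplotypic_f(data: Dict[str, List[Sequence[float]]]) -> Tuple[float, float, float]:
    """Return (F_SC, F_CT, F_ST), treating each haplotype as an allele.

    data maps each continent to per-population frequency vectors that list the
    haplotypes in the same order. Populations and continents are weighted equally.
    """
    h_s = mean(mean(gene_diversity(p) for p in pops) for pops in data.values())
    # Continental averages, haplotype by haplotype
    cont = [[mean(col) for col in zip(*pops)] for pops in data.values()]
    h_c = mean(gene_diversity(p) for p in cont)
    world = [mean(col) for col in zip(*cont)]
    h_t = gene_diversity(world)
    return 1 - h_s / h_c, 1 - h_c / h_t, 1 - h_s / h_t


# Hypothetical frequencies of four haplotypes; the fourth occurs on one continent only.
toy = {
    "continent_A": [[0.70, 0.20, 0.10, 0.00], [0.60, 0.30, 0.10, 0.00]],
    "continent_B": [[0.40, 0.20, 0.10, 0.30], [0.50, 0.10, 0.10, 0.30]],
}
f_sc, f_ct, f_st = haplotypic_f(toy)
print(f"F_SC={f_sc:.3f} F_CT={f_ct:.3f} F_ST={f_st:.3f}")
```

As in the data described above, a haplotype confined to one continent (here the fourth) drives $F_{CT}$ well above $F_{SC}$.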
As the jackknife standard errors indicate (Table 3), the confidence intervals for $F_{SC}$ often overlap zero, but those for $F_{CT}$ and $F_{ST}$ usually do not. This result indicates that subdivision among continents has played the more important role in the divergence of allele and haplotype frequencies. As their standard errors indicate, the $F_{ST}$ values for site 1 (42%) and for sites 5 and 12 (27%) were significantly larger than the 10% to 15% $F_{ST}$ values usually reported for RSPs (Bowcock et al. 1991; Jorde et al. 1995). Such large values may be due to random genetic drift, natural selection, or both.
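The standard errors discussed here come from a jackknife. A generic delete-one jackknife is sketched below in Python; this section does not state which units were left out in turn (individuals or populations), so the unit is left as a parameter, and the function name is hypothetical. Assuming approximate normality, a 95% interval is the estimate plus or minus 1.96 standard errors, which is one way to judge whether an interval overlaps zero:

```python
import math
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def jackknife_se(units: Sequence[T], stat: Callable[[List[T]], float]) -> float:
    """Delete-one jackknife standard error of stat computed over units."""
    n = len(units)
    if n < 2:
        raise ValueError("need at least two units")
    # Recompute the statistic with each unit left out in turn
    loo = [stat(list(units[:i]) + list(units[i + 1:])) for i in range(n)]
    loo_mean = sum(loo) / n
    variance = (n - 1) / n * sum((x - loo_mean) ** 2 for x in loo)
    return math.sqrt(variance)


# For the sample mean, the jackknife reproduces the usual s / sqrt(n).
data = [1.0, 2.0, 3.0, 4.0, 5.0]
se = jackknife_se(data, lambda xs: sum(xs) / len(xs))
print(round(se, 4))  # 0.7071
```

In practice `stat` would be a function that recomputes an F-statistic or linkage disequilibrium estimate from the retained units.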
Two-Site Linkage Disequilibrium Analysis
The linkage disequilibrium coefficient ($D_{A_1B_1}$) compares the haplotype frequency ($P_{A_1B_1}$) with the product of the allele frequencies ($p_{A_1}$ and $q_{B_1}$). That is, $D_{A_1B_1} = P_{A_1B_1} - p_{A_1}q_{B_1}$ (Weir 1996). Hereafter, D is written without subscripts when the argument pertains to a pair of alleles at any two sites. Because D depends on allele frequency, it was normalized to the maximum value it could take given the allele frequencies, $D' = D/D_{\max}$ (Lewontin 1964). It was also normalized to the following correlation coefficient (Hill and Robertson 1968):

$$r = \frac{D}{\sqrt{p_{A_1}p_{A_2}q_{B_1}q_{B_2}}}$$

where $p_{A_2} = 1 - p_{A_1}$ and $q_{B_2} = 1 - q_{B_1}$.
[…] −1.0 or +1.0. This outcome is consistent with the fact that the variability at all six sites was carried on essentially only four haplotypes. The approximate variance of $D'$ is 0 when $D' = -1.0$ or $+1.0$ (Zapata et al. 1997).
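The three measures can be computed from the four two-site haplotype frequencies. The Python sketch below (names hypothetical) uses Lewontin's sign-dependent $D_{\max}$: $\min(p_{A_1}q_{B_2}, p_{A_2}q_{B_1})$ when $D > 0$ and $\min(p_{A_1}q_{B_1}, p_{A_2}q_{B_2})$ when $D < 0$. The example frequencies are hypothetical; they mimic a pair of sites where, as with H1, H2, and H4 at sites 1 and 5, only three of the four two-site haplotypes occur, which forces $|D'| = 1$ even though $|r|$ is modest:

```python
import math
from typing import Dict, Tuple

# Keys are (allele at site A, allele at site B), coded 1 or 2.
HapFreqs = Dict[Tuple[int, int], float]


def ld_measures(hap: HapFreqs) -> Tuple[float, float, float]:
    """Return (D, D', r) for alleles A1 and B1; both sites must be polymorphic."""
    p_a1 = hap[(1, 1)] + hap[(1, 2)]
    q_b1 = hap[(1, 1)] + hap[(2, 1)]
    p_a2, q_b2 = 1.0 - p_a1, 1.0 - q_b1
    d = hap[(1, 1)] - p_a1 * q_b1
    # Lewontin's D_max: the largest |D| possible given the allele frequencies
    if d >= 0:
        d_max = min(p_a1 * q_b2, p_a2 * q_b1)
    else:
        d_max = min(p_a1 * q_b1, p_a2 * q_b2)
    d_prime = d / d_max if d_max > 0 else 0.0
    r = d / math.sqrt(p_a1 * p_a2 * q_b1 * q_b2)
    return d, d_prime, r


# Hypothetical frequencies with the (2, 2) haplotype absent.
freqs = {(1, 1): 0.5, (2, 1): 0.3, (1, 2): 0.2, (2, 2): 0.0}
d, d_prime, r = ld_measures(freqs)
print(f"D={d:.3f} D'={d_prime:.3f} r={r:.3f}")
```

Because any missing two-site haplotype pins $|D'|$ at 1, a locus whose variation rides on a few haplotypes will show $D' = \pm 1$ for most site pairs, while $r$ still reflects the allele frequencies.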
In a subdivided population, the total linkage disequilibrium ($D_T$) can be partitioned into additive components using a hierarchical model (Nei and Li 1973). Here $D_T$ was partitioned as

$$D_T = D_W + D_{SC} + D_{CT},$$

where $D_W$ is the average linkage disequilibrium within populations, $D_{SC}$ is the linkage disequilibrium among local populations within continents, and $D_{CT}$ is the linkage disequilibrium among continents. $D_W$, $D_{SC}$, and $D_{CT}$ were then divided by $D_T$ to obtain $d_W$, $d_{SC}$, and $d_{CT}$. Notably, $D_{SC}$ and $D_{CT}$ depend solely on allele frequency differences among groups (Nei and Li 1973). Consequently, allelic divergence among populations can increase, decrease, or leave unchanged the total linkage disequilibrium.
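One way to see why $D_{SC}$ and $D_{CT}$ depend only on allele frequencies: with equal weights, the pooled disequilibrium equals the average within-population $D$ plus the covariance of allele frequencies across subpopulations (Nei and Li 1973), and that covariance splits into a within-continent part and an among-continent part. The Python sketch below assumes equal weights for populations within continents and for continents; each population is described by a hypothetical triple $(P_{A_1B_1}, p_{A_1}, q_{B_1})$, and the function names are illustrative:

```python
from statistics import mean
from typing import Dict, List, Tuple

# One local population: (haplotype freq P_A1B1, allele freq p_A1, allele freq q_B1)
Pop = Tuple[float, float, float]


def covariance(xs: List[float], ys: List[float]) -> float:
    """Population (divide-by-n) covariance of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])


def partition_ld(data: Dict[str, List[Pop]]) -> Dict[str, float]:
    """Split total D into within-population, among-local, and among-continent parts."""
    groups = list(data.values())
    # D_W: average within-population D
    d_w = mean([mean([h - p * q for h, p, q in pops]) for pops in groups])
    # D_SC: average within-continent covariance of local allele frequencies
    d_sc = mean([covariance([p for _, p, _ in pops], [q for _, _, q in pops])
                 for pops in groups])
    # D_CT: covariance of continental average allele frequencies
    cont_p = [mean([p for _, p, _ in pops]) for pops in groups]
    cont_q = [mean([q for _, _, q in pops]) for pops in groups]
    d_ct = covariance(cont_p, cont_q)
    # D_T: D computed from the pooled haplotype and allele frequencies
    pooled_h = mean([mean([h for h, _, _ in pops]) for pops in groups])
    d_t = pooled_h - mean(cont_p) * mean(cont_q)
    return {"D_T": d_t, "D_W": d_w, "D_SC": d_sc, "D_CT": d_ct}


toy = {
    "continent_A": [(0.30, 0.5, 0.4), (0.20, 0.3, 0.6)],
    "continent_B": [(0.60, 0.7, 0.8), (0.50, 0.9, 0.6)],
}
parts = partition_ld(toy)
shares = {k.replace("D_", "d_"): v / parts["D_T"] for k, v in parts.items() if k != "D_T"}
print(parts)
print(shares)
```

In this toy example the within-continent term is negative, illustrating how allelic divergence can reduce rather than inflate the total.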
The partitioning of worldwide ALDH2 linkage disequilibrium revealed that linkage disequilibrium within populations ($d_W$) usually accounted for most of the total (Table 5). In addition, the effect of population subdivision was greater and more variable among continents ($d_{CT}$) than among local populations ($d_{SC}$). Specifically, $d_{SC}$ ranged from just 1% to 6%, whereas $d_{CT}$ ranged from 10% to 70%. The $d_{CT}$ values can be explained by the fact that large among-group values require large allele frequency differences at both sites of the two-site haplotype (Sinnock and Sing 1972). As indicated by the jackknife standard errors, all of the $d_{SC}$ and $d_{CT}$ estimates were significantly different from 0. It can be concluded that population subdivision, both among local populations and among continents, had a significant effect on worldwide linkage disequilibrium.
The within-population $r^2$ values (computed from Table 4) were plotted against the haplotypic $F_{SC}$ and $F_{CT}$ values (Fig. 3). These values were then compared with Hill and Robertson's (1968) model of population divergence, which predicts that linkage disequilibrium increases with $F_{ST}$ (broken line in Fig. 3). The ALDH2 $r^2$ values ranged from well below the predicted line to almost as far above it as was possible. Clearly, the model of Hill and Robertson (1968) did not fit the ALDH2 data well. This result is perhaps not surprising, given that this island model assumes a large population initially in linkage equilibrium, equal population sizes, and the absence of mutation and natural selection. The evolutionary history of ALDH2 likely violates several, if not all, of these assumptions. Furthermore, Hill and Robertson's model assumes that the product of effective population size and recombination rate is large, which is unlikely for the closely linked sites surveyed here.
DISCUSSION
A striking feature of ALDH2 haplotypic variation was the maximal linkage disequilibrium and the correspondingly low number of haplotypes. Although the number of haplotypes that segregate at a locus depends on historical effective population size and natural selection, combinatorics show that there are $2^s$ possible haplotype states for $s$ biallelic sites. From a related perspective, the cladistic model of haplotype evolution predicts that $s + 1$ haplotypes must have existed in evolutionary history to establish variability at each site. These primary haplotypes form a network in which haplotypes differ from each other by single mutational steps (Long et al. 1990). Some or all of the remaining $2^s - s - 1$ haplotype states could exist in a population because of recombination.
At ALDH2, $2^6 = 64$ haplotype states are possible, but only seven states were observed and only four were frequent. At least $6 + 1 = 7$ one-step haplotypes must have existed in evolutionary history. However, the seven observed haplotypes cannot comprise the primary set. H3 differs from H1 at three sites, and the two intermediate one-step haplotypes are completely, or essentially, missing. Although H6 provides an intermediate link at one of the steps, only a single copy was observed, and it may have arisen by recombination. Similarly, H4 differs from H1 at two sites. Although H8 and H9, each observed in a single copy, connect H4 and H1 by single steps, only one intermediate can lie on the mutational path between H1 and H4, so at least one of these haplotypes must have been formed by recombination, and possibly both were. Thus, three of the seven one-step haplotypes were essentially, or entirely, missing. In humans, one-step haplotypes are frequently missing, as evidenced by the β-globin (Harding et al. 1997) and NF1 (Jorde et al. 1993) haplotype phylogenies.
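The counting and one-step reasoning above can be checked mechanically. The Python sketch below encodes the seven observed haplotypes exactly as given in the Results (sites ordered 1, 2, 3, 5, 6, 12), counts the states for $s = 6$ sites, and lists the pairs of observed haplotypes that differ at a single site; the helper names are hypothetical:

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Observed ALDH2 haplotypes; alleles listed for sites 1, 2, 3, 5, 6, 12.
OBSERVED: Dict[str, str] = {
    "H1": "111111", "H2": "211111", "H3": "122121", "H4": "111212",
    "H6": "111121", "H8": "111211", "H9": "111112",
}


def hamming(a: str, b: str) -> int:
    """Number of sites at which two haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def one_step_pairs(haps: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pairs of haplotypes that differ by a single mutational step."""
    return [(n1, n2) for (n1, h1), (n2, h2) in combinations(sorted(haps.items()), 2)
            if hamming(h1, h2) == 1]


s = 6
possible = 2 ** s            # all haplotype states
primary = s + 1              # one-step (primary) haplotypes needed under the cladistic model
remaining = possible - primary
print(possible, primary, remaining)  # 64 7 57
print(one_step_pairs(OBSERVED))
print(hamming(OBSERVED["H1"], OBSERVED["H3"]), hamming(OBSERVED["H1"], OBSERVED["H4"]))  # 3 2
```

H3 has no one-step neighbor among the observed haplotypes, while H8 and H9 each link H1 and H4 by single steps, matching the argument in the text.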
The small number of segregating haplotypes at ALDH2 may be due to natural selection at ALDH2 (R.J. Peterson, D. Goldman, and J.C. Long, in prep.) or to selection on a closely linked gene. A coalescent analysis of the ALDH2 haplotype phylogeny suggests that, under a neutral model, the expected age of the deficiency allele is 149,000 (35,000-416,000) years (R.J. Peterson, D. Goldman, and J.C. Long, in prep.). Such an ancient apparent age rivals the origin of modern humans and predates the colonization of Asia. This suggests that natural selection increased the frequency of the deficiency allele in Asia faster than expected under a neutral model; directional selection can also reduce the number of haplotypes at a locus. Alternatively, the low number of ALDH2 haplotypes could be the result of a population bottleneck that is recent relative to the time scale of mutation.
Because only one African population was sampled, a complete understanding of the African versus non-African patterns of ALDH2 haplotype variation awaits the sampling of more African populations. Speculatively, the fact that the African Biaka shared the worldwide pattern of linkage disequilibrium at sites 2, 3, and 6 suggests that this pattern arose before the divergence of modern humans and has not subsequently decayed. The similarity of the H1 frequency in the Biaka to the combined H1 + H4 frequency in Asia hints at the absence of a strong out-of-Africa effect at ALDH2. Interestingly, although the variants at sites 5 and 12 were in complete linkage disequilibrium, their presence only in Asia indicates a recent Asian origin and perhaps natural selection. Thus, the African versus non-African pattern at ALDH2 may contrast with those at DRD2, CD4, and DM. At these latter loci, non-Africans had higher linkage disequilibrium and segregated a subset of African haplotypes (Castiglione et al. 1995; Tishkoff et al. 1996, 1998). However, this pattern was less extreme at DRD2, which suggests that an out-of-Africa founder event had less effect at DRD2 than at CD4 and DM.
These distinct patterns of linkage disequilibrium have several possible explanations. An out-of-Africa founder event may by chance have had less effect at ALDH2, or natural selection could have been stronger there. In contrast to the other loci, the ALDH2 haplotypes did not include STRP alleles. The STRP mutation rate is several orders of magnitude higher than the nucleotide substitution rate (Weber and Wong 1993), so STRPs may better resolve recent human evolution. Whatever the explanation, these divergent results suggest that patterns of linkage disequilibrium vary across the human genome.
In addition, the ALDH2 evidence suggests that the pattern of linkage disequilibrium at any set of closely linked sites may depend on the pattern of linkage disequilibrium that existed in ancestral populations. Because each gene may have had a unique pattern of linkage disequilibrium in ancestral populations, the effect of population subdivision at individual genes may be idiosyncratic. This insight contrasts with the situation for unlinked genes (Smouse and Neel 1977; Smouse et al. 1983). Because unlinked genes have a low covariance of allele and haplotype frequency, a particular pattern of allelic divergence can arise from many different starting haplotype distributions. Thus, whereas the effect of population divergence on ALDH2 is likely to be locus dependent, the effect of population divergence on unlinked genes is likely to be independent of the particular set of loci studied.
The importance of ancestral linkage disequilibrium to current patterns has recently received theoretical treatment (Lonjou et al. 1999). These investigators showed that for ancient polymorphisms, linkage disequilibrium is largely determined by regional founders, whereas subsequent demography has little effect. This theory was supported by data at the MNSs, RHCE, and CD4 loci. An implication for genetic epidemiologists, contrary to current belief (Terwilliger et al. 1998), is that isolates may actually be less advantageous than large populations for linkage disequilibrium studies (Lonjou et al. 1999).
Another important insight is that patterns of linkage disequilibrium may vary within a set of closely linked sites. At ALDH2, the effect of population subdivision varied greatly depending on the groups of sites that were compared. This complicated pattern reiterates that linkage disequilibrium among populations is not a simple function of population divergence (Nei and Li 1973). Distinct patterns of linkage disequilibrium were also observed among tightly linked sites in the apolipoprotein AI-CIII gene region (Thompson et al. 1988). Specifically, linkage disequilibrium was found between two flanking RSPs but not with an internal RSP. In this case, the power to detect linkage disequilibrium was low because the major allele of each flanking RSP occurred with the rare allele of the internal RSP (Thompson et al. 1988). These results suggest that patterns of linkage disequilibrium in a gene region may not be fully described by analyzing a single pair of sites. Rather, proper characterization of linkage disequilibrium may require the examination of alleles at several sites. The same conclusion was reached in a linkage disequilibrium analysis of 88 variable sites in the human lipoprotein lipase gene (Clark et al. 1998).
Population subdivision clearly affected the worldwide pattern of ALDH2 linkage disequilibrium. Linkage disequilibrium among local populations and among continents was significantly different from zero, and the magnitude of this effect was greater among continents than among local populations. The present study augments the original one-level model of population subdivision (Sinnock and Sing 1972; Nei and Li 1973) by extending it to a second level. Because the effects of population subdivision were found to be greater among continents than among local populations, this extension represents an important advance in resolution. Moreover, the statistically significant linkage disequilibrium among local populations sounds a cautionary note for genetic epidemiologists considering mixing local populations to fine map disease genes.
Hill and Robertson's (1968) model did not fit the data well. This observation suggests that simple models do not provide a reasonable framework for understanding worldwide linkage disequilibrium at ALDH2. In violation of the model's assumptions, ALDH2 linkage disequilibrium was likely maximal in a finite-sized ancestral population. Further, natural selection may have acted (R.J. Peterson, D. Goldman, and J.C. Long, in prep.), mutation is evident, human population sizes have not been equal (Urbanek et al. 1996), and the hierarchy of human populations is not balanced (Nei and Roychoudhury 1993). Other simple models, such as Ohta's (1982a,b) partition of the variance of linkage disequilibrium, are therefore also unlikely to fit the data well.
The extension of the model of Nei and Li (1973) to two levels provided valuable insights that would have been missed with a one-level partition. However, it can also be concluded that this simple two-level partition was inadequate to fully describe the effects of human demographic history on ALDH2 linkage disequilibrium. Perhaps the crucial improvement would be to model unequal rates of evolution and a realistic human population phylogeny (Nei and Roychoudhury 1993; Urbanek et al. 1996). The emergence of fine-scale haplotype data for many genes in many populations is likely to provide continuing impetus to incorporate population subdivision into coalescent models of linkage disequilibrium (Rannala and Slatkin 1998).
METHODS
Molecular Methods and Population Samples
ALDH2 is defined as the segment of DNA from which aldehyde dehydrogenase 2 is transcribed (Fig. 1). The six sites analyzed here are a subset of the 12 sites reported by Peterson et al. (1999). Each of the six sites not included in this analysis had only a rare variant, and rare variants are uninformative about the effects of population subdivision on linkage disequilibrium. Site 1 was discovered by M. Stewart (pers. comm.). Sites 2, 3, 5, and 6 were discovered by Peterson et al. (1999). Site 12 is the site that defines the well-known Glu-487-Lys deficiency allele (Yoshida et al. 1984). PCR, restriction enzyme digestion, and SSCP methods were used to genotype the variable sites. Genotypes were collected on a worldwide sample consisting of Africa, 51 Biakans; Asia, 24 Cambodians, 47 Han Chinese, 49 Japanese, 40 South Koreans, 43 Taiwanese, and 50 Black Thai; Europe, 32 CEPH, 41 Finns, and 45 Swedes; North America, 51 Cheyenne, 50 Maya, 46 Navajo, and 45 Pima; and South America, 49 Karitiana, 44 Rondonian Surui, and 49 Ticuna. Samples were donated by a variety of researchers (Peterson et al. 1999).
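As an arithmetic check, the sample sizes listed above can be tallied by continent; they sum to the 756 people in 17 populations stated in the Introduction. The dictionary below simply transcribes the list (its name is illustrative):

```python
SAMPLES = {
    "Africa": {"Biakans": 51},
    "Asia": {"Cambodians": 24, "Han Chinese": 47, "Japanese": 49,
             "South Koreans": 40, "Taiwanese": 43, "Black Thai": 50},
    "Europe": {"CEPH": 32, "Finns": 41, "Swedes": 45},
    "North America": {"Cheyenne": 51, "Maya": 50, "Navajo": 46, "Pima": 45},
    "South America": {"Karitiana": 49, "Rondonian Surui": 44, "Ticuna": 49},
}

per_continent = {c: sum(pops.values()) for c, pops in SAMPLES.items()}
n_people = sum(per_continent.values())
n_populations = sum(len(pops) for pops in SAMPLES.values())
print(per_continent)
print(n_people, n_populations, 2 * n_people)  # 756 17 1512 (people, populations, chromosomes)
```

Each person carries two haplotypes, so haplotype frequencies are estimated from 1512 chromosomes in total.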
Statistical Analyses
For each site, the allele with the higher worldwide frequency was designated the reference allele. Because phase-unknown multisite genotypes were collected, haplotype states and frequencies were estimated by maximum likelihood using an expectation-maximization (EM) algorithm (Dempster et al. 1977) […] χ² test for departure from single-site Hardy-Weinberg expectation (Weir 1996) […]
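The EM idea can be sketched compactly for a handful of biallelic sites. The Python below is a generic illustration under Hardy-Weinberg proportions of haplotype pairs, not the program used in this study: each unphased genotype is expanded into every compatible haplotype pair, the E-step weights each pair by its probability under the current frequencies, and the M-step sets new frequencies to the expected haplotype counts divided by $2n$. Enumerating phasings grows as $2^{k-1}$ for $k$ heterozygous sites, so this suits only small $k$; function names, the uniform starting point, and the convergence tolerance are assumptions:

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

# One (allele, allele) pair per site, coded 1 or 2.
Genotype = Sequence[Tuple[int, int]]


def compatible_pairs(g: Genotype) -> List[Tuple[str, str]]:
    """Every haplotype pair consistent with an unphased genotype (each pair once)."""
    het = [i for i, (a, b) in enumerate(g) if a != b]
    pairs = []
    # Keep allele 1 on the first haplotype at the first heterozygous site;
    # every later heterozygous site may be swapped or not.
    for swaps in product((False, True), repeat=max(len(het) - 1, 0)):
        h1 = [min(a, b) for a, b in g]
        h2 = [max(a, b) for a, b in g]
        for site, swap in zip(het[1:], swaps):
            if swap:
                h1[site], h2[site] = h2[site], h1[site]
        pairs.append(("".join(map(str, h1)), "".join(map(str, h2))))
    return pairs


def em_haplotypes(genotypes: List[Genotype], max_iter: int = 500,
                  tol: float = 1e-10) -> Dict[str, float]:
    """Maximum-likelihood haplotype frequencies via expectation-maximization."""
    candidates = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for pairs in candidates for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting point
    n = len(genotypes)
    for _ in range(max_iter):
        counts = dict.fromkeys(haps, 0.0)
        for pairs in candidates:
            # E-step: Hardy-Weinberg probability of each phasing, then normalize
            weights = [freq[a] * freq[b] * (1.0 if a == b else 2.0) for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts divided by the 2n sampled chromosomes
        new_freq = {h: c / (2 * n) for h, c in counts.items()}
        converged = max(abs(new_freq[h] - freq[h]) for h in haps) < tol
        freq = new_freq
        if converged:
            break
    return freq


# Hypothetical two-site data: three 11/11 and three 22/22 homozygotes plus one
# double heterozygote whose phase (11/22 or 12/21) is unknown.
sample = [[(1, 1), (1, 1)]] * 3 + [[(2, 2), (2, 2)]] * 3 + [[(1, 2), (1, 2)]]
print({h: round(f, 4) for h, f in em_haplotypes(sample).items()})
```

The unambiguous homozygotes pull the double heterozygote toward the 11/22 phasing, so the estimated frequencies of 12 and 21 shrink toward zero.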
[Equation: expected within-population $r^2$ as a function of $F_{ST}$ under the model of Hill and Robertson (1968).]

[…] $(1 - F_{ST}) = (1 - F_{SC})(1 - F_{CT})$ (Wright 1978).
ACKNOWLEDGMENTS
We thank Longina Akhtar for maintaining the Laboratory of Neurogenetics cell lines. Ken Kidd, Su-Jen Tsai, Dr. Chandanayingyong, and Dr. Park kindly provided DNA samples. Mark Stewart provided details on the variant at site 1. Margrit Urbanek, Andrew Bergen, and Jaakko Lappalainen contributed invaluable advice on laboratory techniques and useful discussions. Ken Weiss, Andy Clark, Mark Stoneking, and Henry Harpending provided helpful comments on earlier drafts. We are grateful to two anonymous reviewers for insightful comments. This work was supported by a National Institute on Alcohol Abuse and Alcoholism predoctoral Intramural Research Training Award to R.J.P.
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
FOOTNOTES
3 Corresponding author.
E-MAIL peterson{at}ncifcrf.gov; FAX (301)846-1909.
REFERENCES
Received January 7, 1999; accepted in revised form June 29, 1999.