Vol. 9, Issue 6, 558-567, June 1999
LETTER
ABSTRACT
The idea that all modern humans share a recent (within the last
150,000 years) African origin has been proposed and supported on the
basis of three observations. Most genetic loci examined to date have
(1) shown greater diversity in African populations than in others, (2)
placed the first branch between African and all non-African populations
in phylogenetic trees, and (3) indicated recent dates for either the
molecular coalescence (with the exception of some autosomal and
X-chromosomal loci) or for the time of separation between African and
non-African populations. We analyze variation at 10 Y chromosome
microsatellite loci typed in 506 males representing 49 populations from every inhabited continent. We find significantly
greater Y chromosome diversity in Africa than elsewhere, find the first
branch in phylogenetic trees of the continental populations to fall
between African and all non-African populations, and date this
branching with the (δµ)² distance measure to
5800-17,400 or 12,800-36,800 years BP, depending on the mutation rate
used. The magnitude of the excess Y chromosome diversity in African
populations appears to result from a greater antiquity of African
populations rather than a greater long-term effective population size.
These observations are most consistent with a recent African origin for
all modern humans.
INTRODUCTION
For the last 10 years, human population genetics
has focused intently on the question of modern human origins. Most
geneticists have considered two opposing hypotheses. Both agree that
Homo erectus was the first species in our lineage to leave
Africa for Europe and Asia, sometime within the last 2 million
years, but the models disagree about what happened next. One (the
multiregional or candelabra theory) posits that modern humans evolved
simultaneously from the descendants of Homo erectus throughout
the Old World, synchronized, perhaps, by some amount of gene flow among
archaic populations (Wolpoff et al. 1984). The other suggests that
anatomically modern humans evolved in Africa within the last 150,000 years before supplanting archaic populations in Europe and Asia. This
has become known as the "out of Africa" hypothesis and is
associated primarily with studies of mitochondrial DNA (mtDNA) and the
so-called "African" or "Mitochondrial Eve." Only recently has
this view of a recent African origin gained full acceptance and
independent verification from paleontologists and archeologists
(Stringer and Andrews 1988; Lahr 1996; Foley 1998).
Three lines of evidence favoring a recent African origin have emerged
from studies of most genetic systems. First is the observation of
greater genetic diversity in Africa than elsewhere. It is reasonable to
assume that older populations have had more time to accumulate genetic
variation, although variation within populations is affected by many
factors in addition to age, most notably fluctuations in population
size. With the exception of classical protein polymorphisms and
restriction fragment length polymorphisms (RFLPs) (Cavalli-Sforza et
al. 1994), elevated genetic diversity in African populations has been
documented for most other genetic systems: mtDNA (Vigilant et al.
1991), autosomal microsatellites (Bowcock et al. 1994; Jorde et al.
1997), an autosomal minisatellite (Armour et al. 1996), and various
other autosomal systems (Batzer et al. 1994; Tishkoff et al. 1996). The
tendency for nuclear RFLPs and classical polymorphisms to display
greater diversity in Europe has been adequately explained by an
ascertainment bias (Mountain and Cavalli-Sforza 1994; Rogers and Jorde
1996): most of these markers were discovered in samples of European
origin, ensuring that they would be maximally polymorphic in Europeans.
A clear picture of the geographic distribution of Y-chromosomal
variation has yet to emerge and be rigorously tested. Jorde et al.
(1998) have observed greater Y-chromosomal variation in Asia. Hammer et
al. (1997) have observed greater Y haplotypic diversity in Africa for
five biallelic polymorphisms, and Hammer et al. (1998) have observed
greater Asian diversity in some Y chromosome lineages. The availability of
microsatellites has allowed the first tests of the significance of
excess Y-chromosomal variation in African populations.
Phylogenetic analyses provide the second line of evidence for an
African origin. Beginning with the study of Cann et al. (1987), nearly
every study of human mtDNA has presented a tree whose first branch
separates African from non-African populations, just as expected if all
non-African populations are descendants of an African one. Similarly,
trees constructed from autosomal markers (classical polymorphisms
[Cavalli-Sforza et al. 1994], autosomal RFLPs [Bowcock et al. 1991],
and autosomal microsatellites [Bowcock et al. 1994]) have been
consistent in placing the first split between African and non-African
populations and generally agree on the placement of subsequent branches
as well.
However, both competing theories of modern human origins posit an
African origin, so estimates of the timing of our descent from Africa
constitute the third and most discriminating line of genetic evidence. Dates
for the mitochondrial coalescence time center around 150,000 years BP.
Two useful estimates have been made for the Y chromosome (Hammer 1995;
Whitfield et al. 1995). Both dates are recent, which would appear to
exclude a multiregional origin.
mtDNA
The most compelling genetic evidence has come from the study of
mtDNA, which has a high mutation rate and does not recombine. Under
most reasonable models of population structure, the molecular coalescence of a nonrecombining molecule should antedate the actual diversification of populations (Cann et al. 1987). Multiregional evolution could therefore be excluded if the mitochondrial coalescence occurred within the last 1,000,000 years or so. Although technical problems afflicted the earliest estimates, many subsequent studies have placed the
human mitochondrial coalescence within the last 250,000 years (Cann
et al. 1987; Vigilant et al. 1991; Ruvolo et al. 1993; Horai et al.
1995; Zischler et al. 1995).
An independent approach to the analysis of mtDNA data has been taken
in the work of Rogers and Harpending (1992) and Harpending et al.
(1993). Their method seeks to extract demographic information from the
distribution of mtDNA mismatches (pairwise sequence differences) within populations. The data suggest
that the major human population groups split from each other
~100,000 years ago but did not begin to expand in size until several
tens of thousands of years later. The reliability of these mismatch
analyses is still a matter of concern (Marjoram and Donnelly 1994).
However, their conclusions correspond closely with archeological evidence indicating
that the modern human anatomy developed in Africa some tens of
thousands of years before the major cultural changes that allowed the
phenomenal expansion and spread of modern humans into the rest of the
Old World only within the last 70,000 years or so (Klein 1995).
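A mismatch distribution is simply the histogram of pairwise differences among the sequences sampled from a population. The Python sketch below shows only that tabulation step, not the demographic model fitted by Rogers and Harpending (1992); the function names and the toy sequences are hypothetical, chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def pairwise_differences(a: str, b: str) -> int:
    """Number of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def mismatch_distribution(seqs: list[str]) -> dict[int, int]:
    """Map each mismatch count k to the number of sequence pairs differing at k sites."""
    counts = Counter(pairwise_differences(a, b) for a, b in combinations(seqs, 2))
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Hypothetical aligned fragments, for illustration only.
    toy = ["ACGTACGT", "ACGTACGA", "ACGAACGA", "TCGAACGA"]
    print(mismatch_distribution(toy))  # {1: 3, 2: 2, 3: 1}
```

Smooth, unimodal distributions are usually read as the signature of a past population expansion, whereas ragged, multimodal ones are more typical of populations that have remained roughly constant in size.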
Further doubt has been cast on the theory of multiregional evolution
(at least in Europe) by the determination of an mtDNA D-loop sequence from
a Neanderthal bone (Krings et al. 1997). The Neanderthal sequence is
quite different from all known human sequences, suggesting that few (if
any) Neanderthal mtDNA lineages will be found in modern European
populations. Given the difficulty of analyzing DNA of this antiquity,
however, we may never be able to exclude entirely the possibility that
some Neanderthal gene lineages survive among modern humans (Nordborg 1998).
Autosomal Evidence
Considerable evidence in favor of a recent African origin has also
been found in genes of the nucleus. However, the autosomes are more
ambiguous in their support of a recent origin for all modern humans,
because recombination complicates coalescent analyses of genetic
variation, coalescence times are expected to be four times greater, on
average, than for the mitochondrial genome, and variation is less
abundant than in mtDNA. At the same time, there are many independent
loci on the autosomes, and a composite view from several loci is likely
to be more informative than the single loci of mtDNA and the Y
chromosome. The earliest studies of nuclear DNA to support a recent
African origin were the surveys of RFLPs performed by Luca
Cavalli-Sforza and his collaborators (Mountain et al. 1993). Like some
studies of classical markers before them, the RFLPs placed the deepest
split between African and non-African populations (consistent with an
African origin). The RFLP data also allowed the assignment of a crude
date to the divergence of African and non-African populations by
regressing genetic distances among populations onto archeologically
derived dates for the first arrival of modern humans on
each continent. The fit of this regression is surprisingly linear and indicates (by extrapolation) a date of ~100,000 years BP for the split between Africans and non-Africans (Bowcock et al. 1991; Mountain et al. 1993).
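The calibration step amounts to an ordinary least-squares line through (arrival date, genetic distance) points, which is then inverted to read off the date implied by the African versus non-African distance. The Python sketch below assumes a straight-line relation through invented numbers; neither the dates nor the distances are those of Bowcock et al. (1991) or Mountain et al. (1993), and the function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def extrapolate_date(distance, intercept, slope):
    """Invert the fitted line: the date at which the line reaches `distance`."""
    return (distance - intercept) / slope


if __name__ == "__main__":
    # Invented calibration points: date of first arrival (kya) and genetic distance.
    arrival_kya = [15.0, 40.0, 60.0]
    distances = [0.03, 0.08, 0.12]
    intercept, slope = fit_line(arrival_kya, distances)
    # Hypothetical African vs. non-African distance, extrapolated beyond the data.
    print(round(extrapolate_date(0.20, intercept, slope), 1))
```

Any date read off beyond the calibration range rests entirely on the assumption that the relation stays linear there.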
The weakness of this approach (as also for dates based on mtDNA) is the
need for an outside reference to calibrate the rate of genetic
divergence (archeological dates in the case of autosomal RFLPs and
estimated human-chimpanzee divergence times for mtDNA). More recent work
with autosomal microsatellites has allowed an independent determination
of the tree topology and the assignment of dates that rely only on
estimates of the microsatellite mutation rate and not on comparisons
with external events (Goldstein et al. 1995b). Using this approach,
Goldstein et al. (1995b) dated the split between African and
non-African populations at 156,000 years BP. Approaches based on
genetic distances, though, may be biased by admixture among
populations, which reduces the genetic distance between them and
leads to underestimates of the divergence times.
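The (δµ)² distance of Goldstein et al. (1995b) is the squared difference between two populations' mean repeat numbers at a locus, averaged over loci; under a single-step stepwise mutation model its expectation grows as 2µt regardless of population size. A minimal Python sketch, assuming each population is given as a mapping from a (hypothetical) locus name to the list of repeat counts observed there:

```python
from statistics import mean


def delta_mu_squared(pop_a, pop_b):
    """(δµ)² averaged over loci.

    pop_a and pop_b map a locus name to the repeat counts observed in that
    population; only loci typed in both populations are used.
    """
    shared = sorted(set(pop_a) & set(pop_b))
    if not shared:
        raise ValueError("no loci in common")
    # Squared difference in mean repeat number, one value per shared locus.
    per_locus = [(mean(pop_a[loc]) - mean(pop_b[loc])) ** 2 for loc in shared]
    return mean(per_locus)


if __name__ == "__main__":
    # Toy repeat counts at two hypothetical loci.
    pop_a = {"L1": [10, 11, 12], "L2": [20, 20]}
    pop_b = {"L1": [12, 13, 14], "L2": [21, 21]}
    print(delta_mu_squared(pop_a, pop_b))  # (2**2 + 1**2) / 2 = 2.5
```

Because only mean allele sizes enter, the measure ignores allele-frequency detail that other distances use; that is what makes its expectation depend on µ and t alone under the stepwise model.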
Another approach has been taken by analyzing the geographic
distribution of diversity at a minisatellite locus (Armour et al.
1996). The non-African populations contained a very restricted subset
of the allelic diversity seen in Africa. Estimates of the age of the
African versus non-African split from this locus, however, were around
15,000 years BP, far more recent than is conceivable. Homoplasy (a problem
with microsatellite and minisatellite loci) as well as mutation rate
heterogeneity (e.g., the potential dependence of mutation rate on
allele size) and genetic admixture can probably explain the
disagreement between this date and others. Tishkoff et al. (1996) took
a similar tack, based on the decay of linkage disequilibrium between
two closely linked markers on chromosome 12. Again, much greater
haplotypic diversity was found in Africa than outside that continent.
By comparing levels of linkage disequilibrium in African and
non-African populations, these authors estimated the age of chromosomes
outside Africa. Their result was 102,000 years BP with an upper limit
of 313 kya, although some details of their approach have been
criticized (Pritchard and Feldman 1996; Slatkin and Rannala 1998).
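The intuition behind dating from linkage disequilibrium is the textbook decay relation D_t = D_0(1 - r)^t, where r is the recombination fraction between the two markers and t the number of generations. Solving for t gives the Python sketch below. It is a simplification under stated assumptions (random mating, no new mutation, a known starting disequilibrium D_0) and not the method actually used by Tishkoff et al. (1996); the function name and numbers are hypothetical.

```python
import math


def generations_of_decay(d_now, d_start, r):
    """Generations for disequilibrium to decay from d_start to d_now.

    d_now and d_start are magnitudes of D; solves d_now = d_start * (1 - r) ** t
    for t, assuming random mating, constant r, and no new mutation.
    """
    if not 0.0 < r < 1.0:
        raise ValueError("recombination fraction must lie strictly between 0 and 1")
    if not 0.0 < d_now <= d_start:
        raise ValueError("expected 0 < d_now <= d_start")
    return math.log(d_now / d_start) / math.log(1.0 - r)


if __name__ == "__main__":
    # Hypothetical: disequilibrium halved between markers with r = 0.001.
    t = generations_of_decay(0.5, 1.0, 0.001)
    print(f"{t:.0f} generations, about {t * 27:,.0f} years at 27 years per generation")
```

For tightly linked markers (small r), the answer is very sensitive to the assumed D_0, which is one reason such dates are contested.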
A few coalescent analyses of autosomal DNA sequence variation are now
appearing. Harding et al. (1997) have chosen to describe an analysis of
349 geographically diverse β-globin gene sequences as contradicting
a recent African origin. Nevertheless, the estimated coalescence date
for the β-globin locus of ~800,000 years ago is in excellent
agreement with mitochondrial coalescence times when we consider that
the average coalescence time for autosomal loci should be about four
times greater. These authors find some evidence for an Asian (as well
as African) contribution to modern human allelic diversity, but this
assertion requires closer examination: natural selection in response to
malaria has had a major impact in shaping β-globin diversity in
Africa and parts of Asia and Europe. Clearly, more sequence data from
many more autosomal genes will be required to evaluate the likelihood
of admixture between populations of modern and archaic humans as the
former spread from Africa.
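The factor of four follows from standard coalescent theory. A sketch, assuming a randomly mating population of constant size with an equal sex ratio (so the number of breeding females is N_f = N_e/2) and a sample of n gene copies:

```latex
\begin{aligned}
E\!\left[T^{\mathrm{auto}}_{\mathrm{MRCA}}\right] &= 4N_e\left(1-\tfrac{1}{n}\right)\ \text{generations},\\
E\!\left[T^{\mathrm{mt}}_{\mathrm{MRCA}}\right] &= 2N_f\left(1-\tfrac{1}{n}\right) = N_e\left(1-\tfrac{1}{n}\right)\ \text{generations},\\
\frac{E\!\left[T^{\mathrm{auto}}_{\mathrm{MRCA}}\right]}{E\!\left[T^{\mathrm{mt}}_{\mathrm{MRCA}}\right]} &= 4.
\end{aligned}
```

A mitochondrial coalescence of ~150,000-250,000 years would therefore predict an average autosomal coalescence of roughly 600,000-1,000,000 years, a range that contains the ~800,000-year β-globin estimate. Individual loci scatter widely around these expectations.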
The analysis of X-chromosomal loci has not greatly clarified the
picture provided by nuclear genetic variation. Conflicting conclusions
have been drawn from two recent studies (Zietkiewicz et al. 1998;
Harris and Hey 1999). Nucleotide polymorphisms in an 8-kb segment
surrounding exon 44 of the dystrophin gene appear to be evolving
neutrally (Zietkiewicz et al. 1998). As a result, their allele
frequencies are, in most cases, a guide to their age. Alleles that
appear >100,000-200,000 years old are generally found at similar
frequencies in African and most non-African populations. Younger
alleles display much more restricted geographic distributions, strongly
supporting the notion that all modern populations descend from an
ancestral population that existed as recently as 100,000 years ago. An
estimate of the long-term effective population size is ~10,000,
consistent with estimates from other loci. Zietkiewicz et al. (1998)
suggest that a population with an effective size anywhere near 10,000 would have difficulty evolving independently (multiregionally) into
modern Homo sapiens over such wide geographic expanses.
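The link between frequency and age for a neutral allele can be made explicit with the classical result of Kimura and Ohta (1973): in a population of constant effective size N_e, a neutral allele now at frequency x has an expected age of -4N_e x ln x / (1 - x) generations. The Python sketch below evaluates it with N_e = 10,000, the estimate quoted above, and a 27-year generation (an assumption borrowed from the Results); populations that have grown in size will carry younger alleles than this constant-size formula implies.

```python
import math


def expected_allele_age(freq, ne):
    """Expected age in generations of a neutral allele at frequency `freq`.

    Kimura and Ohta (1973): t = -4 * ne * x * ln(x) / (1 - x), assuming a
    constant effective population size `ne` and no selection.
    """
    if not 0.0 < freq < 1.0:
        raise ValueError("frequency must lie strictly between 0 and 1")
    return -4.0 * ne * freq * math.log(freq) / (1.0 - freq)


if __name__ == "__main__":
    generation_years = 27  # assumed generation time
    for freq in (0.01, 0.1, 0.5, 0.9):
        years = expected_allele_age(freq, ne=10_000) * generation_years
        print(f"frequency {freq:.2f}: ~{years:,.0f} years")
```

The expected age rises monotonically with frequency, which is why common alleles shared across continents are read as old and rare, geographically restricted alleles as young.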
Furthermore, the dystrophin gene displays greater African diversity,
although much of this diversity appears to be somewhat recent. This
relatively young, low-frequency variation might indicate that the
African population size has been larger or began to expand earlier than
other continental populations.
The geographic patterns of variation in the PDHA1 gene studied
by Harris and Hey (1999) are quite unusual and deserve further analysis, although conclusions based on present data appear seriously premature. A fixed nucleotide difference between African and
non-African populations was observed, which forms the basis for many of
the authors' conclusions. Because only 35 chromosomes were analyzed, however, most assertions are somewhat tentative. The picture of genetic
variation presented by most loci examined to date suggests that modern
human populations living outside Africa descend from a limited number
of African populations that had already begun to diversify (Armour et
al. 1996; Tishkoff et al. 1996). When only 16 African chromosomes have
been sampled, it is hard to exclude the possibility that both alleles
are present in Africa, or on other continents such as Europe, which is
represented by only 6 French X chromosomes. Natural selection (perhaps via
"hitchhiking") could also have swept the non-African allele to apparent
fixation, although the authors attribute the extreme allele distributions
to strongly subdivided ancestral populations. Although they note that
admixture among populations will cause dates based on genetic distance among
populations to underestimate the time of their actual fission,
the authors fail to note that their estimates of coalescence times
(which have very broad confidence intervals) must overestimate
the time of population fission by an unknown amount. The events of
interest are likely to fall between the dates estimated by coalescence
times of alleles and by genetic distances among populations. Because a
fixed nucleotide difference among continents could not be maintained if
admixture among the continents were high, the results of Harris and Hey
(1999) might suggest that the time of population fission is more
reliably estimated by genetic distance measures than by coalescence times.
The Y Chromosome
As the paternally transmitted counterpart to mtDNA, the Y chromosome
has attracted great interest. It has provided a unique challenge as
well. Unlike mtDNA, its mutation rate is very low, and variation has
been exceedingly hard to find. Conventional searches for RFLPs
encountered little success (Casanova et al. 1985; Lucotte and Ngo
1985). The first single nucleotide polymorphism to be identified on the
Y chromosome was described only in 1994 (Seielstad et al. 1994). One
study found no variation in a 729-bp intron of the ZFY gene in 38 human
samples from throughout the world (Dorit et al. 1995). Other work has
allowed estimates of the Y chromosome coalescence time to be made.
Hammer's (1995) study, based on three polymorphic sites assayed in 18 individuals, indicates a coalescence time of 188,000 years BP with a
confidence interval stretching from 51 kya to 411 kya. Whitfield et al.
(1995) assayed a different set of three polymorphisms in five
individuals and calculated a coalescence time of between 37,000 years
and 49,000 years BP. As indicated by the large confidence interval and
the discordance between the two Y chromosome studies, the number of polymorphic sites and individuals needs to be increased substantially. Since these studies were published, more efficient techniques for
detecting polymorphisms have been developed (Underhill et al. 1997),
and more definitive estimates of the Y chromosome coalescence time
derived from nucleotide sequence information will soon be available.
In this paper we examine the ability of Y chromosome microsatellites to
discriminate between the predictions of the two rival theories of
modern human origins. The statistical significance of the observed
excess of Y chromosome microsatellite diversity in African populations
is tested following the approach of Jorde et al. (1997).
Significantly greater diversity in African populations is observed.
The magnitude of this excess is greater than reported for autosomal
microsatellites using many of the same population samples (Jorde et al.
1997). As explained below, this is more likely to result from an
early population expansion among African populations than from
substantial differences in Ne among populations. A
phylogenetic analysis based on the (δµ)² distance
(among others) also supports an African origin by placing the first
split between African and all non-African populations. However,
dating the split between African and non-African populations,
or estimating the length of time over which the observed levels of microsatellite diversity have been accumulating, is more
difficult. (δµ)² is a linear distance
measure, and knowledge of the microsatellite mutation rate allows
the date of any split in the tree to be estimated (Goldstein et al.
1995b). Applying this method to the Y chromosome microsatellites
results in very recent dates for the split between African and
non-African populations. On the whole, Y chromosome microsatellites
are consistent with a recent African origin for modern humans and tend
to agree with the results of studies of mtDNA and the autosomes.
RESULTS
Table 2, A, B, and C (below) reports the average gene diversities and the results of the test for excess genetic diversity (described in Methods). The tests were performed on three different continental groupings. Initially, six regional groups were identified: Africa, Asia, Pakistan, Europe, Oceania, and America (Table 1). This scheme resulted in some groups with very few chromosomes and might have been biased toward dividing non-African populations into inappropriate subpopulations (e.g., considering Pakistan separately from Europe or Asia). For this reason, the test was repeated on an alternative grouping that was designed to minimize the differences in sample size, while increasing the variance of non-African populations in a way that made some geographic and historical sense. Ethiopians, who have experienced admixture with populations in Western Asia, were split off from sub-Saharan Africans along with the Beja from Sudan and Tuareg from Mali. These populations were grouped instead with Europeans and Pakistanis. A third group encompassing populations from Asia, the Americas, and Oceania was also formed. The results for this grouping are reported in Table 2B. The final arrangement (Table 2C) was designed to be extremely conservative and identified only two groups: sub-Saharan Africans versus all other populations (including Ethiopians). Raw data are available at http://www.stats.ox.ac.uk/~pritch/ydata.html.
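Gene diversity in the sense of Nei (1987) is usually the unbiased heterozygosity h = n/(n - 1) × (1 - Σ p_i²), where the p_i are allele frequencies among n sampled chromosomes, averaged over loci. A minimal Python sketch, assuming each Y chromosome contributes one allele per locus; it does not reproduce the standard-error calculation, and the function names and toy repeat counts are hypothetical.

```python
from collections import Counter
from statistics import mean


def gene_diversity(alleles):
    """Unbiased gene diversity for one locus: n/(n-1) * (1 - sum(p_i**2))."""
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    freqs = [count / n for count in Counter(alleles).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


def average_gene_diversity(loci):
    """Mean single-locus diversity; `loci` maps a locus name to its allele list."""
    return mean(gene_diversity(alleles) for alleles in loci.values())


if __name__ == "__main__":
    # Toy repeat counts at two hypothetical loci.
    sample = {"L1": [10, 10, 11, 12], "L2": [20, 21, 21, 21]}
    print(round(average_gene_diversity(sample), 3))
```

With only 10 loci the average can move a great deal from one locus to the next, which is the source of the large S.E.s discussed next.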
S.E.s surrounding estimates of gene diversity are fairly
large [calculated according to Nei (1987)], reflecting the small number of loci currently available. The results of the first grouping reported in Table 2A indicate higher gene diversity (H) in
Africans than in the other continents, although this difference is not
significant given the large sampling variance. The observed S
was 1.76, suggesting an excess of genetic diversity in Africa of 76%
relative to the other continents (see Methods). As shown in Table 2B,
sub-Saharan African populations continued to exhibit greater variance
in repeat score in a more conservative classification scheme. Not
unexpectedly, the African excess was lower (39%). The largest value
for S in >100,000 random replicates was 1.24, indicating
P < 10⁻⁵. Finally, Africans continue to show
a significant excess of diversity in even the most conservative
classification reported in Table 2C. A similar excess in gene diversity
is not observed in either of the latter two classifications, although
none of these differences is significant.
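The exact definition of S is given in Methods and follows Jorde et al. (1997). As a rough illustration only, the Python sketch below assumes S is the within-group variance in repeat number (averaged over loci) in the focal group divided by the mean of the same quantity across the other groups, so that S = 1.76 reads as a 76% excess. Significance is assessed by shuffling whole chromosomes among groups, which keeps group sizes fixed, and counting how often the shuffled S reaches the observed value. Function names and toy data are hypothetical.

```python
import random
from statistics import mean, pvariance


def mean_locus_variance(haplotypes):
    """Average, across loci, of the within-group variance in repeat count."""
    return mean(pvariance(locus) for locus in zip(*haplotypes))


def excess_statistic(haplotypes, labels, focal):
    """Assumed form of S: focal-group variance over the mean of the other groups."""
    groups = {}
    for hap, label in zip(haplotypes, labels):
        groups.setdefault(label, []).append(hap)
    focal_var = mean_locus_variance(groups[focal])
    other_var = mean(mean_locus_variance(h) for lab, h in groups.items() if lab != focal)
    return focal_var / other_var if other_var > 0 else float("inf")


def permutation_test(haplotypes, labels, focal, replicates=10_000, seed=1):
    """Observed S and a one-sided permutation P value (add-one estimator)."""
    rng = random.Random(seed)
    observed = excess_statistic(haplotypes, labels, focal)
    shuffled = list(labels)
    hits = 0
    for _ in range(replicates):
        rng.shuffle(shuffled)  # reassign whole haplotypes to groups
        if excess_statistic(haplotypes, shuffled, focal) >= observed:
            hits += 1
    return observed, (hits + 1) / (replicates + 1)


if __name__ == "__main__":
    # Toy two-locus haplotypes (repeat counts); not data from this study.
    haps = [(10, 20), (14, 24), (8, 18), (16, 26),
            (12, 22), (12, 22), (13, 22), (11, 21),
            (12, 23), (11, 22), (12, 22)]
    labels = ["AFR"] * 4 + ["EUR"] * 4 + ["ASN"] * 3
    s, p = permutation_test(haps, labels, "AFR", replicates=2_000)
    print(f"S = {s:.2f}, P ~ {p:.4f}")
```

Shuffling whole haplotypes rather than single-locus alleles respects the fact that the nonrecombining Y chromosome behaves as one linked unit.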
The UPGMA tree depicted in Figure 1 places the root
between African and non-African populations, like a great number of
trees before it. Although it is frequently criticized, average linkage has a
well-established record of reliability in reconstructing evolutionary relationships
(Cavalli-Sforza et al. 1994), particularly when
evolutionary rates are approximately constant among populations. One
reason for its good performance may be that it calculates
average distances between each taxon added to the tree and those that
have already been added, an approach that may minimize the errors
associated with genetic distances between particular pairs of populations.
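For readers unfamiliar with average linkage, the Python sketch below builds a UPGMA topology from a symmetric distance matrix: it repeatedly joins the closest pair of clusters and sets the distance from the new cluster to every other cluster to the size-weighted average of the two old distances. Branch lengths are omitted, ties are broken arbitrarily, and the distances in the example are invented rather than taken from Table 3.

```python
def upgma(names, dist):
    """Return a UPGMA topology as nested tuples.

    names: taxon labels (strings).
    dist: dict mapping frozenset({a, b}) to the distance between taxa a and b.
    """
    clusters = {name: (name, 1) for name in names}  # label -> (subtree, size)
    d = dict(dist)
    while len(clusters) > 1:
        # Join the closest pair of current clusters.
        a, b = min(
            ((x, y) for x in clusters for y in clusters if x < y),
            key=lambda pair: d[frozenset(pair)],
        )
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        merged = f"({a},{b})"
        for other in clusters:
            # Average linkage: size-weighted mean of the two old distances.
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[merged] = ((tree_a, tree_b), size_a + size_b)
    tree, _size = next(iter(clusters.values()))
    return tree


if __name__ == "__main__":
    names = ["AFR", "EUR", "ASN", "AMR"]
    pairs = {("EUR", "ASN"): 0.20, ("ASN", "AMR"): 0.10, ("EUR", "AMR"): 0.25,
             ("AFR", "EUR"): 1.00, ("AFR", "ASN"): 1.10, ("AFR", "AMR"): 1.00}
    dist = {frozenset(k): v for k, v in pairs.items()}
    print(upgma(names, dist))  # ((('AMR', 'ASN'), 'EUR'), 'AFR')
```

With these toy distances the root separates AFR from the three other groups, the kind of topology the text describes.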
The matrix of (δµ)² distances and S.E.s
estimated from 10,000 bootstrap replicates is reproduced in Table 3.
The average distance of all non-African populations to the African
population is 1.029 with a 95% confidence interval of 0.515-1.546.
Applying equation 1 (in Methods) and a microsatellite mutation rate of
1.2 × 10⁻³ mutations per generation (Heyer et al.
1997; Bianchi et al. 1998) yields an estimate of 429 generations since
the split of all non-African populations from the African population.
If the human generation time is 27 years (Weiss 1973), this corresponds
to a date of 11,600 years BP. The 95% confidence interval is
5800-17,400 years BP. If a lower mutation rate of
5.6 × 10⁻⁴ is used, as suggested by Weber and Wong
(1993) (and still within the confidence interval of Heyer et al.'s
estimate), a date of 24,800 ± 12,000 years BP is derived.
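Equation 1 itself lies in Methods. Assuming it is the usual inversion of E[(δµ)²] = 2µt from Goldstein et al. (1995b), that is t = (δµ)²/(2µ) generations, the Python sketch below reproduces the figures quoted here: 1.029 / (2 × 1.2 × 10⁻³) ≈ 429 generations, or about 11,600 years at 27 years per generation. The function names are hypothetical.

```python
def generations_since_split(delta_mu_sq, mu):
    """Assumed equation 1: t = (δµ)² / (2µ), in generations."""
    return delta_mu_sq / (2.0 * mu)


def years_since_split(delta_mu_sq, mu, generation_years=27.0):
    """Convert the generation estimate to years BP."""
    return generations_since_split(delta_mu_sq, mu) * generation_years


if __name__ == "__main__":
    # Average African vs. non-African distance and its 95% interval, from the text.
    estimate, low, high = 1.029, 0.515, 1.546
    for mu in (1.2e-3, 5.6e-4):
        years = [years_since_split(d, mu) for d in (estimate, low, high)]
        print(f"mu = {mu:g}: {years[0]:,.0f} years BP "
              f"(interval {years[1]:,.0f}-{years[2]:,.0f})")
```

Swapping in the lower rate of 5.6 × 10⁻⁴ gives a point estimate of about 24,800 years, matching the second date above. Because t is inversely proportional to µ, any uncertainty in the mutation rate carries straight through to the date.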
DISCUSSION
There are several possible reasons for an excess of genetic diversity in African populations: African populations are older and have been accumulating genetic variation for a longer period of time, African populations have maintained a larger long-term effective population size, gene flow into Africa has been higher than into other continents, or population subdivision has been greater outside Africa. The last two possibilities are easily discounted. If gene flow into Africa from the other continents were sufficient to elevate African genetic diversity, then the smallest genetic distances should be found between African and non-African populations. We observe the opposite, with the greatest genetic distances occurring between African and all non-African populations. We would also anticipate a very short branch leading to the African population in a neighbor-joining tree, which is not the case (Fig. 2). Extensive population subdivision outside Africa can be excluded by noting that, although genetic diversity within any subpopulation may be reduced, the genetic variation across the entire collection of subdivided populations should be very high. The data of Table 2C, grouping 177 African chromosomes against 329 chromosomes from all the other continents combined, demonstrate a significantly higher variance in repeat number for African populations than for all others. Non-African populations appear to be merely a sample of African genetic diversity. Even if genetic variation had been eliminated in non-African populations through a demographic crisis, one would not expect the same variants to be eliminated from populations as widely separated as Basques in the Pyrenees and Quechua in the Andes. Natural selection could remove particular variants from populations, but why would it leave Africa alone unaffected?
It is more difficult to distinguish between an older African population and a larger long-term effective population size. Doing so depends on whether the microsatellites have reached mutation-drift equilibrium. At mutation-drift equilibrium, the variation within a population is proportional only to n (the effective population size) and µ (the mutation rate, which is probably the same in all populations), so differences in levels of variation should depend only on the relative effective population sizes. Before mutation-drift equilibrium is reached, however, a population's age will influence the amount of genetic variation it displays. Variation will accumulate in direct proportion to time and the mutation rate.
Equilibrium cannot be rejected with a small number of loci. The
variance of the variance of repeat number among loci is high, requiring
very large numbers of loci (or loci with low variances) to reject
mutation-drift equilibrium (Goldstein et al. 1996). However, even if we
cannot formally reject mutation-drift equilibrium with the limited data
we have today, we should note that the attainment of equilibrium is
very slow, taking roughly the reciprocal of the mutation rate in
generations. For the estimated mutation rates we have used, this would
range from 22,500 years to 48,000 years, although this should be
regarded as a minimum estimate because the time necessary to reach
equilibrium would be lengthened by population subdivision and
fluctuations in population size. Most non-African populations appear
unlikely to be much older than 48,000 years, so we might be justified
in assuming that differences in the age of populations have not been
completely erased by the attainment of equilibrium.
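The 22,500- and 48,000-year figures follow from taking about 1/µ generations to approach equilibrium and converting with the 27-year generation time used in the Results. A short Python check, with a hypothetical function name:

```python
def equilibrium_time_years(mu, generation_years=27.0):
    """Approximate time to approach mutation-drift equilibrium: ~1/mu generations, in years."""
    return (1.0 / mu) * generation_years


if __name__ == "__main__":
    for mu in (1.2e-3, 5.6e-4):  # the two mutation rates used in the Results
        print(f"mu = {mu:g}: ~{equilibrium_time_years(mu):,.0f} years")
```

As the text notes, this is a lower bound; subdivision and fluctuating population size lengthen the approach to equilibrium.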
Two observations might lead us to conclude that the greater
genetic diversity in Africa is not purely the result of a larger
long-term effective population size. First, the smaller a population, the greater the effect of genetic drift in changing gene frequencies. Drift would tend to differentiate populations, increasing genetic distances between them and lengthening branches in trees relating the populations. This is not what we observe (Figs. 1 and 2). If
non-African populations were smaller and drifting to a greater extent, they would appear more dissimilar from each other rather than displaying the similarity that is a feature of most
trees, most notably of trees constructed from mitochondrial DNA, which
should experience increased drift as a result of its smaller
effective population size relative to the autosomes.
The second argument, suggested by Jorde et al. (1997), receives further
support from the study of Y chromosome microsatellites. Citing Li
(1977) and Slatkin (1995), Jorde et al. (1997) note that diversity for
an expanding population (and analyses of mitochondrial and
Y-chromosomal DNA indicate that most major populations have been
expanding) is proportional to n + 2t for mtDNA and
the Y chromosome, whereas it is proportional to
2n + t for autosomal loci with a fourfold greater
effective population size. Here t is the time elapsed since the
population expansion. The relative excess of African diversity appears
to be much greater for mtDNA and the Y chromosome than for autosomal
microsatellites. Given these expressions, this observation suggests
that African populations began to expand before the others and
that African populations have not been larger than non-African ones.
This is exactly the model of modern human origins for which Klein
(1995) finds the strongest support. The modern human morphology appeared
in Southern and Eastern Africa ~150,000 years ago, whereas
evidence for the behavioral transition to modern humans is not found
until ~50,000 years ago, when modern humans appear to have begun
leaving Africa. Thus, African populations may have been expanding and
accumulating genetic variation for tens of thousands of years before a
subset of that continent's population began to expand and colonize the rest of the world.
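The inference can be checked with the two proportionalities quoted from Jorde et al. (1997), taken at face value: haploid (mtDNA or Y) diversity proportional to n + 2t and autosomal diversity proportional to 2n + t. In the Python sketch below, n and t are in arbitrary, hypothetical common units; only the direction of the comparison matters.

```python
def haploid_diversity(n, t):
    """Relative diversity for mtDNA or the Y chromosome, proportional to n + 2t."""
    return n + 2 * t


def autosomal_diversity(n, t):
    """Relative diversity for an autosomal locus, proportional to 2n + t."""
    return 2 * n + t


def african_excess(n_afr, t_afr, n_non, t_non):
    """(haploid ratio, autosomal ratio) of African to non-African diversity."""
    return (
        haploid_diversity(n_afr, t_afr) / haploid_diversity(n_non, t_non),
        autosomal_diversity(n_afr, t_afr) / autosomal_diversity(n_non, t_non),
    )


if __name__ == "__main__":
    # Scenario 1: equal sizes, but the African expansion began earlier.
    print(african_excess(n_afr=1, t_afr=2, n_non=1, t_non=1))
    # Scenario 2: equal expansion times, but a larger African population.
    print(african_excess(n_afr=2, t_afr=1, n_non=1, t_non=1))
```

Expanding the two ratios shows that the haploid excess exceeds the autosomal excess exactly when t/n is larger in Africa than elsewhere, so an earlier African expansion, not a larger African population, yields the pattern described above.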
This view is supported by the application of (δµ)² to
date the separation time between African and non-African populations. The dates calculated using this method would certainly seem to preclude
multiregional evolution or an ancient ancestry for modern humans.
However, the dates are too recent to be trusted entirely. Recent
archeological results suggest that modern humans may not have attained
behavioral modernity and left Africa until as recently as 50,000 years
ago (Klein 1995), but the dates derived from Y chromosome
microsatellites do not go beyond 40,000 years.
It is possible to imagine several explanations for the extreme recency
of these dates. The small number of loci available (n = 10)
and the fact that the Y chromosome itself is effectively a single locus
may produce large stochastic errors around the estimate. Admixture
among populations will reduce the genetic distance between them. Some
such admixture has undoubtedly occurred, but it is probably not of
sufficient magnitude to produce the entire effect. The application of
(δµ)² to 30 autosomal microsatellites resulted in
a date of 156,000 years, which is in very good agreement with other data
and does not indicate the effect of significant admixture (Goldstein et al. 1995b).
(δµ)² is linear and proportional to time regardless
of population size, but only if the populations are at mutation-drift
equilibrium. Because some of the populations used in the calculations
may not have reached equilibrium (as noted in the previous discussion), this too may affect the accuracy of estimated dates. Homoplasy is
another limitation with loci that have a high mutation rate, and it may
affect the linearity of (δµ)². An average of six
repeat units separated the extreme alleles for single-copy loci. This
suggests that (δµ)² will remain linear up to a maximum
distance of 5.8 (Goldstein et al. 1995a).
Failure of these microsatellites to adhere to the stepwise mutation
model assumed by the (δµ)² distance could also affect
the estimates. We have direct evidence of a mutation rate dependence on
repeat length and a constraint on maximal allele size (X. Xu, M.T.
Seielstad, and X. Xu, unpubl.). With these caveats in mind, we should
emphasize that dates estimated by (δµ)² are not
coalescent dates. Coalescence dates are estimates of the time to a
molecular ancestor, which must precede the diversification of
populations. Estimates from (δµ)² come closer to
estimating the actual divergence times, unless admixture among
populations is extensive, in which case the date will underestimate
the divergence time.
A recent African origin for all humans is supported by a robust body of evidence. This evidence follows three major tracks: (1) increased genetic diversity in Africa versus the rest of the world, (2) phylogenetic analyses placing the deepest branches between African and non-African populations, and (3) indications that the age of the "genetic most recent common ancestor" is very young. The Y chromosome, like mtDNA and the autosomes, appears to support a recent African origin. Although no locus taken alone would be sufficient to establish the out of Africa theory, the consistency with which so many loci support a recent African origin makes a compelling case. Perhaps we can begin to use the genetic data that are so rapidly accumulating to answer more interesting questions about the forces that have generated and maintained the geographic patterns of genetic diversity we observe today.
METHODS
Populations
Samples from the Hadendowah and Beni Amer tribes of the Beja were
collected in and around Kassala, Sudan, in January 1994, and samples from
the Dinka were collected from settlements near Khartoum. Eleven populations
from Ethiopia were sampled in March and April 1995: Konso from Konso town,
Tsamako and Ongota near Weyto, Hamar near Turmi, Dasenech in Omorate, Dizi
and Surma near Maji, Bume (or Nyangatom) at Kibish, Bench in Mizan Teferi,
Majangir near Tepi, and Berta near Kurmuk. Collections from five
populations in Mali were made in February 1996: Dogon and Peulh near
Bandiagara (14.361°N, 3.647°W), Bozo in Mopti, Tuareg near
Timbuktu (16.661°N, 3.240°W), and Songhai near Gao (16.287°N,
0.044°W). Australian Aborigine and New Guinean DNA samples are
described by Stoneking et al. (1990). Samples from Nasioi Melanesians
were collected by Dr. Jonathan Friedlaender in Bougainville, Solomon
Islands. Dr. Judy Kidd and her collaborators provided DNA from several
Taiwanese Aboriginal and Native American populations (primarily from
South America): Karitiana, Mayan, Moskoke, Quechua, Surui, and Ticuna.
DNA samples from South African populations (Khoisan, Pedi, Sotho,
Swazi, Tswana, Xhosa, and Zulu) are described by Spurdle and Jenkins
(1992), and those from Pakistan were provided by Dr. Qasim Mehdi [Baluchi
(30.5°N, 66.5°E), Brahui, Hunza (Burushaski speaking), Pathan
(33.5°N, 70.5°E), and Sindhi (25°N, 69°E)]. The remaining
DNA samples were extracted from EBV-transformed B cell lines as
described in Bowcock et al. (1987). Samples from northern Italians are
described in Matullo et al. (1994). Pygmy and Lissongo samples were
collected from the Central African Republic and Zaire. Chinese,
Japanese, and Northern European (mostly German) samples were collected
from immigrants to the San Francisco Bay Area. Cambodian samples are
from Khmer living in Santa Ana, California. Y chromosome microsatellite
genotype data for the Basque and Catalan populations were published by
Perez-Lezaun et al. (1997).
DNA Extraction
DNA for samples collected in Ethiopia, Sudan, and Mali was
extracted from 5 ml of whole blood drawn in EDTA anticoagulant. Extraction was begun in the field as quickly as possible (always
<2 days after collection) using a "salting out" procedure modified from Miller et al. (1988). Initially, whole blood was centrifuged at 1200g for 10 min to separate the plasma, which was
discarded before 10 ml of chilled (where possible) red cell
lysis buffer (RCLB: 1 mM NH₄HCO₃ and 115 mM NH₄Cl) was added. In subsequent procedures, 10 ml of RCLB
was added directly to 5 ml of whole blood without first discarding
the plasma. After gentle mixing, white cells were collected by
centrifugation at 1200g for 10 min and washed as necessary
with additional RCLB followed by centrifugation. When clean, the cell
pellet was lysed in 3 ml of white cell lysis buffer [WCLB: 100 mM Tris-Cl at pH 7.6, 40 mM EDTA at pH 8.0, 50 mM NaCl, 0.2% SDS, and 0.05% NaN₃ (to inhibit
microbial growth)]. At this stage, DNA is sufficiently stabilized to
allow storage at room temperature with little additional loss. Final
extraction and purification in the laboratory continued with an
optional proteinase K digestion (10 µl of 20 mg/ml ProK).
Proteins were precipitated by the addition of one-third
volume of saturated NaCl (~6 M) followed by centrifugation at
12,000 rpm for 10 min. The supernatant was collected, and DNA was
precipitated by adding two volumes of absolute ethanol and centrifuging
at >12,000 rpm for 15 min. The DNA pellet was washed in 70%
ethanol, dried, and resuspended in 100 µl of TE before fluorimetric
or spectrophotometric quantitation and dilution.
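The volumes in the salting-out steps follow from simple proportions. As a rough aid, the sketch below computes them for one sample from the quantities given above; the function name is hypothetical, and it assumes, as a simplification, that the supernatant volume equals the lysate plus the added NaCl (the protein pellet is ignored).

```python
def salting_out_volumes(lysate_ml=3.0, prok_ul=10.0, prok_mg_per_ml=20.0,
                        wclb_nacl_mM=50.0, sat_nacl_M=6.0):
    """Approximate volumes and concentrations for one 5-ml blood sample.

    Simplification (ours): the supernatant is taken to equal the whole
    lysate plus added NaCl, i.e. the volume of the protein pellet is ignored.
    """
    digest_ml = lysate_ml + prok_ul / 1000.0               # WCLB lysate + ProK
    prok_ug_per_ml = prok_ul * prok_mg_per_ml / digest_ml  # µl x mg/ml = µg
    nacl_ml = digest_ml / 3.0                              # one-third volume of saturated NaCl
    supernatant_ml = digest_ml + nacl_ml
    nacl_final_M = (sat_nacl_M * nacl_ml
                    + wclb_nacl_mM / 1000.0 * lysate_ml) / supernatant_ml
    ethanol_ml = 2.0 * supernatant_ml                      # two volumes of absolute ethanol
    return {"prok_ug_per_ml": prok_ug_per_ml, "nacl_ml": nacl_ml,
            "nacl_final_M": nacl_final_M, "ethanol_ml": ethanol_ml}


if __name__ == "__main__":
    for key, value in salting_out_volumes().items():
        print(f"{key:>15}: {value:.2f}")
```

With the defaults it gives roughly 66 µg/ml of proteinase K, about 1 ml of saturated NaCl, roughly 1.5 M NaCl after mixing, and about 8 ml of ethanol for the precipitation step.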
Y Microsatellites
Eight sets of primers were used to amplify 10 microsatellite loci
on the Y chromosome. A total of 506 chromosomes were analyzed in 49 populations from every inhabited continent. With the exception of
DYS19, all primers are available with the indicated dye labels from
Research Genetics (Huntsville, AL): DYS385-FAM, DYS388-FAM, DYS389-FAM,
DYS390-FAM, DYS391-FAM, DYS392-HEX, and DYS393-HEX. Many of the primers
can be coamplified, although no consistent multiplexing scheme was used
in this study. Two of the primer sets typically produced two bands
(DYS385 and DYS389). It is known that the two bands of DYS389
correspond to two discrete repeat units, whereas DYS385 produces two
bands of similar size. In many of the analyses, the two bands of DYS385
and DYS389 were treated as if from separate loci. A complete
description of all loci can be found in Kayser et al. (1997).
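Because two primer sets yield two bands each, the 8 primer sets give the 10 loci analyzed. A minimal sketch of that bookkeeping follows. The locus labels DYS385a/b and DYS389I/II follow common usage, the rule of giving the smaller band the first label is our assumption, and the allele calls in the usage block are hypothetical.

```python
# Primer sets named in the text. DYS385 and DYS389 give two bands each,
# which are scored as separate loci, so 8 primer sets yield 10 loci.
PRIMER_SETS = ["DYS19", "DYS385", "DYS388", "DYS389", "DYS390",
               "DYS391", "DYS392", "DYS393"]

# Locus labels for the two-band sets. The a/b and I/II suffixes follow
# common usage; assigning the smaller band to the first label is our
# assumption for illustration.
TWO_BAND = {"DYS385": ("DYS385a", "DYS385b"),
            "DYS389": ("DYS389I", "DYS389II")}


def expand_haplotype(calls):
    """Turn per-primer-set allele calls (an int, or a pair of ints for a
    two-band set) into a flat {locus: repeat count} dict."""
    loci = {}
    for marker in PRIMER_SETS:
        value = calls[marker]
        if marker in TWO_BAND:
            small, large = sorted(value)
            first, second = TWO_BAND[marker]
            loci[first], loci[second] = small, large
        else:
            loci[marker] = value
    return loci


if __name__ == "__main__":
    # Hypothetical repeat counts for one individual (not study data).
    calls = {"DYS19": 15, "DYS385": (14, 11), "DYS388": 12, "DYS389": (10, 29),
             "DYS390": 24, "DYS391": 10, "DYS392": 11, "DYS393": 13}
    haplotype = expand_haplotype(calls)
    print(len(haplotype), haplotype)
```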
PCR Amplification and Allele Size Determination
Microsatellite loci were amplified by PCR in 5-µl reactions. Reactions were hot-started using the Taqstart (Clontech) anti-Taq antibody. Each reaction consisted of 0.5 µl of 10× PCR buffer (Boehringer Mannheim); 0.1 µl of each labeled primer (8.5-10 µM); 0.1 µl of each unlabeled primer (8.5-10 µM); 0.8 µl of a dNTP mix (1.25 mM dATP, 1.25 mM dCTP, 1.25 mM dTTP, and 1.25 mM dGTP); 0.05 µl of Taq DNA polymerase (5 U/µl) (Boehringer Mannheim); 0.05 µl of Taqstart anti-Taq polymerase antibody (5 U/µl) (Clontech); 0.2 µl of Taqstart antibody buffer (Clontech); 0.3 µl of MgCl₂ (25 mM; for a final concentration of 1.5 mM); 1.0 µl of template DNA (10-50 ng/ml); and distilled water to a final volume of 5 µl.
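The final concentrations implied by this recipe can be checked with C_final = C_stock × V_added / V_total. The sketch below does that and scales the mix for several reactions; the 10 µM primer stock (the text gives 8.5-10 µM) and the 10% master-mix overage are assumptions, and the function names are hypothetical.

```python
# Per-reaction volumes (µl) and stock concentrations (mM) from the text.
# None marks components given in units other than molarity. The text gives
# primer stocks as 8.5-10 µM; 10 µM (0.010 mM) is assumed here.
REACTION_UL = 5.0
COMPONENTS = {
    "10x PCR buffer": (0.5, None),
    "labeled primer": (0.1, 0.010),
    "unlabeled primer": (0.1, 0.010),
    "dNTP mix (each dNTP)": (0.8, 1.25),
    "Taq polymerase": (0.05, None),
    "Taqstart antibody": (0.05, None),
    "antibody buffer": (0.2, None),
    "MgCl2": (0.3, 25.0),
    "template DNA": (1.0, None),
}


def final_concentrations_mM(components, total_ul=REACTION_UL):
    """C_final = C_stock * V_added / V_total for each molar stock."""
    return {name: conc * vol / total_ul
            for name, (vol, conc) in components.items() if conc is not None}


def water_ul(components, total_ul=REACTION_UL):
    """Distilled water needed to bring one reaction to total_ul."""
    return total_ul - sum(vol for vol, _ in components.values())


def master_mix_ul(components, n_reactions, overage=0.1):
    """Volumes for n reactions plus an assumed 10% overage. Template is
    left out because it is added to each reaction separately."""
    scale = n_reactions * (1 + overage)
    mix = {name: vol * scale for name, (vol, _) in components.items()
           if name != "template DNA"}
    mix["water"] = water_ul(components) * scale
    return mix


if __name__ == "__main__":
    for name, mM in final_concentrations_mM(COMPONENTS).items():
        print(f"{name:>22}: {mM:.4f} mM")
    print(f"water per reaction: {water_ul(COMPONENTS):.2f} µl")
```

The MgCl₂ entry reproduces the 1.5 mM stated above; the same arithmetic gives 0.2 mM of each dNTP, 0.2 µM of each primer under the 10 µM assumption, and 1.9 µl of water per reaction.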
A "touchdown" cycling regime was used, beginning with 14 cycles in
which the annealing temperature decreased by 0.5°C per cycle
from 63°C to 56.5°C. Twenty cycles with a constant 56°C annealing
temperature followed. Each step lasted 30 sec, and a denaturing step of 94°C
and an extension step of 72°C were used for all cycles. Samples were
incubated at 72°C for 4 min following completion of the final cycle.
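The cycling regime can be written out cycle by cycle. In this sketch (helper names hypothetical), the 14 touchdown cycles are read as starting at 63°C and ending at 56.5°C, i.e., 13 decrements of 0.5°C; no initial denaturation step is stated in the text, so none is added.

```python
def touchdown_schedule(start=63.0, step=0.5, td_cycles=14,
                       hold_temp=56.0, hold_cycles=20):
    """Annealing temperature (°C) for every cycle: td_cycles cycles that
    drop by `step` each cycle, then hold_cycles at hold_temp."""
    temps = [start - step * i for i in range(td_cycles)]
    return temps + [hold_temp] * hold_cycles


def cycle_program(anneal_temps, denature=94.0, extend=72.0,
                  step_sec=30, final_extension_sec=240):
    """Expand annealing temperatures into (°C, seconds) steps: one
    denature/anneal/extend triplet per cycle, then a final extension."""
    program = []
    for anneal in anneal_temps:
        program += [(denature, step_sec), (anneal, step_sec), (extend, step_sec)]
    program.append((extend, final_extension_sec))
    return program


if __name__ == "__main__":
    temps = touchdown_schedule()
    program = cycle_program(temps)
    print(len(temps), "cycles; touchdown:", temps[:14])
    print("time at temperature:", sum(sec for _, sec in program) / 60, "min")
```

This gives 34 cycles and 55 min of time at temperature, not counting ramping between steps.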
PCR products were diluted with water and run on an ABI373a DNA sequencer with GS-350 or GS-500 dye-labeled size standard. Allele sizes were determined with ABI's GS Analysis software. Data are available at http://www.stats.ox.ac.uk/~pritch/ydata.html.
Diversity Among Populations
Several measures of genetic variation are available. Gene diversity
(average heterozygosity) is one, although a more suitable quantity for
microsatellites obeying a stepwise mutation model is the variance in
repeat number. Gene diversity and its S.E. were calculated
according to the method of Nei (1987). Jorde et al. (1997) have devised
a test, based on the variance in repeat scores, to identify differences
in genetic diversity among populations. A resampling strategy is used
to assess statistical significance. Jorde et al.'s test
first calculates the variance within continental groupings,
Vij, for each locus i and continental grouping j. For each locus, Vij
is divided by the mean of Vij across groupings; the mean of this ratio
over loci is the value Rj, reported in Table 2, A, B, and C. S is
the ratio of the largest Rj to the mean of the
others and indicates the degree to which diversity is higher in that grouping relative to the others. The statistical significance of
S was determined by taking random replicates of the
haplotypes, equal in size to the original continental groupings, and
determining the value of S in each replicate. The fraction of
replicates that exceeds the value calculated from the original data
estimates the statistical significance (P value) of the observed effect.
Calculations and replicates were performed using a program
("strvar") provided by Alan Rogers.
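The quantities above can be sketched as follows. The sketch computes Nei's unbiased gene diversity (its S.E. is not reproduced), the Rj values, S, and a resampling P value. It is not the strvar program: we read "random replicates of the haplotypes, equal in size to the original continental groupings" as reshuffling the pooled haplotypes into groupings of the original sizes without replacement, and loci that are monomorphic in every grouping are skipped to avoid dividing by zero; both choices are assumptions.

```python
import random
from statistics import mean, variance


def gene_diversity(alleles):
    """Nei's unbiased gene diversity at one locus: n/(n-1) * (1 - sum p^2)."""
    n = len(alleles)
    counts = {}
    for allele in alleles:
        counts[allele] = counts.get(allele, 0) + 1
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


def r_values(groups):
    """groups maps a grouping name to a list of haplotypes (tuples of repeat
    counts, one per locus). V_ij is the variance of locus i in grouping j;
    R_j is the mean over loci of V_ij / mean_j(V_ij)."""
    names = list(groups)
    n_loci = len(groups[names[0]][0])
    ratios = {name: [] for name in names}
    for i in range(n_loci):
        v = {name: variance([h[i] for h in groups[name]]) for name in names}
        v_bar = mean(v.values())
        if v_bar == 0:  # monomorphic everywhere: uninformative, skip
            continue
        for name in names:
            ratios[name].append(v[name] / v_bar)
    return {name: mean(r) for name, r in ratios.items()}


def s_statistic(r):
    """Largest R_j divided by the mean of the remaining R_j."""
    ordered = sorted(r.values(), reverse=True)
    rest = mean(ordered[1:])
    return float("inf") if rest == 0 else ordered[0] / rest


def s_test(groups, replicates=1000, seed=0):
    """Return (observed S, fraction of replicates whose S exceeds it).
    Replicates reshuffle the pooled haplotypes into groupings of the
    original sizes (our reading of the resampling scheme)."""
    rng = random.Random(seed)
    observed = s_statistic(r_values(groups))
    pooled = [h for hs in groups.values() for h in hs]
    sizes = [(name, len(hs)) for name, hs in groups.items()]
    exceed = 0
    for _ in range(replicates):
        rng.shuffle(pooled)
        shuffled, start = {}, 0
        for name, k in sizes:
            shuffled[name] = pooled[start:start + k]
            start += k
        if s_statistic(r_values(shuffled)) > observed:
            exceed += 1
    return observed, exceed / replicates
```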
Phylogenetic Analysis
Figure 1 shows a tree relating the continental groupings listed
in Table 1. (δμ)² distances were calculated with the
Microsat program written by Eric Minch (available at
http://lotka.stanford.edu/microsat.html). Average linkage (UPGMA) trees
(Sokal and Michener 1958) were constructed using PHYLIP (Felsenstein
1993). Average linkage was chosen because a suitable outgroup for the
rapidly evolving Y chromosome microsatellites is not available; UPGMA
places the root under the assumption of a constant rate of change, so no
outgroup is required. Figure 2 depicts an unrooted neighbor-joining tree
constructed from the same distance matrix (Table 3). Trees were also constructed
using several other measures of genetic distance (e.g.,
DSW, proportion of shared alleles, absolute
difference, FST, and GST), and
most agreed with the topology of the tree in Figure 1, although the
placement of the American population group sometimes differed. As
suggested by the low values of genetic diversity in the Americas (Table
2A), genetic drift appears to have profoundly affected Y chromosome
variation in Native American populations, contributing at least in part
to their increased distance from the other populations.
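The two steps named above, computing (δμ)² and building an average-linkage tree, can be sketched in a few lines. Here (δμ)² is the squared difference in mean repeat number averaged over loci (the standard definition of this distance), and UPGMA is the textbook size-weighted average-linkage algorithm. This is not the Microsat or PHYLIP code; the function names and the data in the usage block are hypothetical.

```python
from itertools import combinations
from statistics import mean


def delta_mu_sq(pop_x, pop_y):
    """(δμ)²: squared difference in mean repeat number, averaged over loci.
    Each population is a list of haplotypes (tuples of repeat counts)."""
    n_loci = len(pop_x[0])
    return mean((mean(h[i] for h in pop_x) - mean(h[i] for h in pop_y)) ** 2
                for i in range(n_loci))


def pairwise(pops):
    """Distance for every unordered pair of named populations."""
    return {frozenset((x, y)): delta_mu_sq(pops[x], pops[y])
            for x, y in combinations(pops, 2)}


def upgma(dist, names):
    """Size-weighted average linkage (UPGMA). Returns nested tuples;
    branch lengths are not tracked in this sketch."""
    clusters = {name: (name, 1) for name in names}  # label -> (subtree, size)
    d = dict(dist)
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        tree_a, n_a = clusters.pop(a)
        tree_b, n_b = clusters.pop(b)
        merged = f"({a},{b})"
        for c in clusters:  # distance from the new cluster to every other one
            d[frozenset((merged, c))] = (n_a * d[frozenset((a, c))]
                                         + n_b * d[frozenset((b, c))]) / (n_a + n_b)
        clusters[merged] = ((tree_a, tree_b), n_a + n_b)
    return next(iter(clusters.values()))[0]


if __name__ == "__main__":
    # Hypothetical one-locus data, chosen only to show the mechanics.
    pops = {"Africa": [(10,), (14,)], "Europe": [(20,), (20,)],
            "Asia": [(21,), (21,)]}
    print(upgma(pairwise(pops), list(pops)))
```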
The performance of a distance measure is primarily a function of its
linearity (the duration over which it increases linearly with the time
since population fission) and its coefficient of variation (Goldstein
and Pollock 1997). Often, optimizing linearity results in increased
variance, so that, in practice, the various distance measures perform
differently in different circumstances. Some genetic distance measures
work best only with closely related populations (displaying a low
coefficient of variation but maintaining linearity over only short
distances). (δμ)² is among the more linear distance
measures; as a result, it appears to perform well in a wide range of circumstances.
Estimating Separation Times Among Populations
The same (δμ)² distances used to construct the
trees in Figures 1 and 2 were also used to estimate the time of the
split between African and non-African populations. Following the
approach of Goldstein et al. (1995b), the average of the distances
between each non-African population and the African population was
calculated. The following formula, derived by Goldstein et al. (1995b),
was used to estimate the number of generations since the split of African and non-African populations:
$$T = \frac{(\delta\mu)^2}{2\mu} \qquad (1)$$
where $\mu$ is the mutation rate per locus per generation, and $T$ is time
in generations. Confidence intervals (95%) for the
(δμ)² distance were estimated from 10,000 bootstrap replications as implemented in Microsat.
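Equation (1) and the bootstrap can be sketched as follows. The sketch averages (δμ)² over the non-African populations' distances to the African population, divides by 2μ, and forms a 95% percentile interval by resampling loci with replacement; because T is (δμ)² divided by a constant, an interval for one rescales directly to the other. Resampling loci is our assumption about what Microsat's bootstrap does, the function names are hypothetical, and no mutation rate or generation time from the study is filled in.

```python
import random
from statistics import mean


def per_locus_dmu2(pop_x, pop_y):
    """Squared difference in mean repeat number at each locus."""
    return [(mean(h[i] for h in pop_x) - mean(h[i] for h in pop_y)) ** 2
            for i in range(len(pop_x[0]))]


def split_time(table, mu, loci=None):
    """Equation (1), T = (δμ)² / (2μ), with (δμ)² taken as the average over
    non-African populations of their distance to the African population.
    `table` holds one per-locus distance list per non-African population;
    `loci` optionally lists (possibly repeated) locus indices to use."""
    if loci is None:
        loci = range(len(table[0]))
    return mean(mean(row[i] for i in loci) for row in table) / (2 * mu)


def bootstrap_ci(table, mu, reps=10_000, seed=0):
    """95% percentile interval from resampling loci with replacement."""
    rng = random.Random(seed)
    n_loci = len(table[0])
    estimates = sorted(
        split_time(table, mu, [rng.randrange(n_loci) for _ in range(n_loci)])
        for _ in range(reps))
    return estimates[int(0.025 * reps)], estimates[int(0.975 * reps) - 1]
```

A table would be built as `[per_locus_dmu2(africa, pop) for pop in non_african]`; multiplying T by an assumed generation time converts generations to years.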
ACKNOWLEDGMENTS
We wish to thank all of the DNA donors who participated in this project. Laboratory work was funded by National Institutes of Health grant GM28428 to Luca Cavalli-Sforza, who also provided much helpful discussion. The collection of DNA samples in Ethiopia, Sudan, and Mali was supported by grants from the Arthur Green Fund of Harvard University and the L.S.B. Leakey Foundation to M.T.S. We thank David Goldstein for helpful advice and discussion and Trefor Jenkins and S. Qasim Mehdi for DNA samples from South African and Pakistani populations, respectively.
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
FOOTNOTES
5 Corresponding author.
E-MAIL mark{at}ppg.harvard.edu; FAX (617) 432-2956.
REFERENCES
Received October 26, 1998; accepted in revised form April 22, 1999.
This article has been cited by other articles:
S. MacEachern. Where in Africa does Africa start?: Identity, genetics and African studies from the Sahara to Darfur. Journal of Social Archaeology, October 1, 2007; 7(3): 393-412.