Vol. 9, Issue 8, 711-719, August 1999
RESEARCH
ABSTRACT
From the historically prevalent social structure of Indian populations it may be predicted that there has been very little male gene flow across ethnic boundaries. To test this prediction, we have analyzed DNA samples of individuals belonging to 10 ethnic groups, speaking Indo-European or Austroasiatic languages and inhabiting the eastern and northern regions of India. Eight Y-chromosomal markers, two biallelic and six microsatellite, were studied. All populations were monomorphic for the deletion allele at the YAP (DYS287) locus and for the 119-bp allele at the DYS288 locus. Y-chromosomal haplotypes were constructed on the basis of one RFLP locus and five microsatellite loci. The haplotype distribution among the groups showed that different ethnic groups harbor nearly disjoint sets of haplotypes. This indicates that there has been virtually no male gene flow among ethnic groups. Analysis of molecular variance revealed that there was significant haplotypic variation between castes and tribes, but nonsignificant variation among ranked caste clusters. Haplotypic variation attributable to differences in geographical regions of habitat was also nonsignificant.
INTRODUCTION
Although several early studies (Jakubiczka et al. 1989; Malaspina et al. 1990;
Spurdle and Jenkins 1992; Dorit et al. 1995; Hammer 1995; Whitfield et al. 1995)
pointed to a low level of variation in the Y chromosome, it has now been
established beyond doubt that there are many Y-chromosomal markers that are
highly polymorphic in all global populations (Deka et al. 1996; Ruiz-Linares
et al. 1996; Santos and Pena 1996; Hammer et al. 1997; Karafet et al. 1997;
Rodriguez-Delfin et al. 1997; Zerjal et al. 1997). Because the Y
chromosome, except for its telomeric regions, is transmitted
uniparentally (paternally) as a linkage group, it has turned out to be
extremely useful in population genetic studies for establishing
paternal lineages (Deka et al. 1996). Studies on Y-chromosomal
variation permit the interesting possibility of contrasting
male-specific histories of populations to female-specific ones, which
are revealed by mitochondrial DNA (mtDNA) studies.
Population differentiation with respect to the Y chromosome has been
studied in many regions of the world, and India represents one of the
most ethnically and genetically diverse regions (Majumder 1998).
Socially, the vast majority (~80%) of the Indian population belong
to the Hindu religious fold and are organized into ~2000 caste
groups, each of which belongs to a socially ranked (broadly, upper,
middle, and lower) caste cluster. The social rank is dependent on
occupation, certain beliefs of purity and pollution, and continued settlement in a particular geographical location (Thapar 1992). The
tribal populations of India are organized into clan groups; there are
~400 tribes in India. Additionally, there are several religious
communities, such as Sikhs, Muslims, Christians, Jews, etc. Marriages
between different religious groups are extremely infrequent. The caste
structure is also fairly rigid, and each caste remains as an endogamous
unit, although the levels of endogamy can vary substantially (Malhotra
and Vasulu 1993). The extent of admixture among caste groups of the
same social rank is higher than among those belonging to different
social ranks. Boundaries of middle caste groups have been the most
fluid; these groups have admixed with both upper and lower caste
groups. Despite the admixture between caste groups, the genetic
implication of the approved social rule of hypergamy, by which a man
can marry a woman belonging to a caste of lower social rank and
continue to retain his caste affiliation (the woman is absorbed in her
husband's caste subsequent to marriage), is that crossings of Y
chromosomes across ranked caste-cluster boundaries have been negligible
in historical times. We note that the converse union of a woman
marrying a man of a lower social status and retaining her caste
affiliation (hypogamy) has been discouraged, historically. In this
extremely infrequent type of marriage, the woman moves to the
husband's caste, which is of a lower social rank. Although the rules
governing marriage, that is, admixture of genes among castes, are
clearly delineated, the practices leading to intermixture of genes
between castes and tribes or other religious groups have not been
consistent. Often such marriages result in social ostracization and
excommunication, forcing the spouses to move to other geographical
areas; thereafter, they become absorbed by a local group, generally of
a low social rank. In view of the interesting social norms governing
marriage, it is of considerable interest to examine the degree of
differentiation of population groups (including castes, tribes, and
other religious groups) of India with respect to male lineages. While
the present study was in progress, a similar study, comprising 12 caste
groups inhabiting a restricted geographical area (Andhra Pradesh)
within India, was completed (Bamshad et al. 1998). That study indicated a relative lack of male gene flow among castes compared with female gene flow. In this paper we report the results of a
study conducted among 10 population groups (8 caste and 2 tribal)
inhabiting a much wider geographical area (three different states of the
Republic of India: West Bengal and Orissa in the eastern region and
Uttar Pradesh in the northern region) and covering two linguistic
families (Indo-European and Austroasiatic), with respect to eight
Y-chromosomal markers (two biallelic and six short-tandem repeat
markers). Ethnic descriptions, sampling locations, and sample sizes of
the 10 study populations are given in Table 1. Our
study, in addition to permitting estimation of male gene flow across
caste boundaries, also permitted estimation of such gene flow across
caste and tribal boundaries and has a wider geographical coverage than
the study of Bamshad et al. (1998).
RESULTS
All populations were monomorphic for the Y Alu polymorphic [YAP− (deletion)] allele at the DYS287 locus and for the 119-bp allele at the DYS288 locus. [Correspondences between repeat numbers and allele sizes (bp) at all STR loci were obtained from Kayser et al. (1997).] The remaining six loci were polymorphic in all populations.
Haplotypes constructed on the basis of data of the six polymorphic loci
and their frequencies observed in the study populations are presented
in Table 2. The 125 sampled individuals harbored 81 distinct haplotypes, indicating extensive Y-chromosomal diversity. It
was also observed that the haplotypes were generally
population-specific; that is, the sets of haplotypes observed in the
different study populations were largely disjoint. Only 12 (15%) of
the 81 distinct haplotypes were shared between populations. Among the
northern Indian populations of Uttar Pradesh (Brahmin, Chamar, and
Rajput), the total number of distinct haplotypes was 35, of which only 3 (8.6%) were shared among the populations (1 haplotype was shared between the upper caste Brahmin and middle caste Rajput; 2 were shared between Rajput and lower caste Chamar). Among the eastern Indian
populations of West Bengal and Orissa (Brahmin, Agharia, Bagdi,
Mahishya, Tanti, Lodha, and Santal), the total number of distinct
haplotypes observed was 60, of which only 7 (11.7%) were shared among
the populations. The upper caste Brahmin shared one haplotype with
middle caste Agharia and another with Santal tribals; the Agharia also
shared one other haplotype with lower caste Mahishya and two other
haplotypes with Lodha tribals; the Mahishyas shared a haplotype with
the Santal tribals; and one haplotype was shared by three groups: the
lower caste Mahishya and Tanti and the two tribal groups of Lodha and
Santal. Therefore, there was minimal sharing of haplotypes among the
ethnic groups studied.
However, because the sample sizes of the individual ethnic populations were small and because our hypothesis largely pertained to ranked caste cluster boundaries, we decided to carry out further analyses by pooling data of ethnic groups belonging to separate ranked caste clusters (upper, middle, and lower) and also of the two tribal groups. The sample sizes of these pooled categories were upper caste = 27, middle caste = 29, lower caste = 37, and tribal cluster = 32. The numbers of distinct haplotypes observed among upper, middle, and lower castes were, respectively, 18, 27, and 30. This number among the tribes was 20. The upper castes shared two haplotypes with the middle castes and three with the lower castes, but none with the tribals. The middle castes additionally shared five haplotypes with lower castes and three with tribes. The lower castes also shared three haplotypes with tribes. We noted that of these shared haplotypes, one haplotype was shared by all three caste clusters and another was shared by the middle and lower caste groups with the tribals. The haplotypes shared by these four clusters of populations are presented in Table 3. Therefore, even when individual ethnic populations are grouped in ranked clusters, there is very little haplotype sharing among clusters, indicating that there has been minimal gene flow even across these ethnic clusters. It is, however, noteworthy that the upper castes, while sharing haplotypes with middle and lower castes, do not share any haplotype with the tribes.
We sought to examine whether the observed extent of haplotype sharing among these ranked clusters of populations was significantly greater than expected by chance. Because the frequencies of distinct haplotypes vary among clusters, it is not sufficient for this test to consider only the number of distinct haplotypes shared; instead, one must count the numbers of individuals between pairs of clusters that have identical haplotypes. For the six pairs of clusters, the numbers of individuals with identical haplotypes were upper caste and middle caste = 5, upper caste and lower caste = 3, upper caste and tribal cluster = 0, middle caste and lower caste = 8, middle caste and tribal cluster = 11, and lower caste and tribal cluster = 8. Based on 500 simulation runs (details provided in Methods), the upper 95% cutoff values of the distribution of the numbers of individuals who possess identical haplotypes by chance were, for the six pairs of clusters listed above, 14, 17, 16, 19, 16, and 19, respectively. Because all of the observed values are smaller than these cutoff values, the observed sharing of haplotypes between clusters of populations is not statistically significant.
The allele frequencies and haplotype diversities are presented in Figures 1 and 2, respectively. Considerable variation in allele frequencies has been observed at many loci among these clusters of populations. For example, at DYS391, not all alleles are observed in all the clusters. Although all clusters harbor very high levels of haplotype diversity (range, 0.9658-0.9975), the highest level of haplotype diversity (0.9975 ± 0.0099) is observed among middle caste populations.
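As an illustration of how a haplotype diversity value of this kind can be computed, the following minimal Python sketch assumes the commonly used unbiased estimator h = n/(n − 1) × (1 − Σ p_i²), where p_i is the sample frequency of the i-th haplotype among n chromosomes. The choice of estimator, the function name, and the toy sample are assumptions made for illustration only; the values reported here were obtained with Arlequin (see Methods).

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies).

    `haplotypes` is a list with one hashable entry per sampled chromosome,
    for example a tuple of repeat numbers at the microsatellite loci.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    counts = Counter(haplotypes)
    sum_sq_freq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq_freq)


# Hypothetical toy sample (not data from this study): four chromosomes,
# each described by repeat numbers at five microsatellite loci.
toy_sample = [
    (15, 10, 24, 10, 13),
    (15, 10, 24, 10, 13),
    (16, 10, 23, 11, 13),
    (14, 9, 25, 10, 12),
]
print(haplotype_diversity(toy_sample))
```

Under this estimator, a sample in which every chromosome carries a different haplotype has a diversity of 1, which is why clusters with many distinct haplotypes relative to their sample size show values close to 1.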
We have performed analyses of molecular variance (Excoffier et al. 1992)
to quantitatively establish that there is much greater haplotypic
variation within clusters of populations than between them. Because the
analysis of molecular variance (AMOVA) takes into account not only the
relative frequencies of haplotypes but also the number of mutational
steps separating pairs of haplotypes, we excluded the
hHindIII RFLP locus and analyzed haplotypes defined by
the five polymorphic microsatellite loci only. The individual ethnic
populations were suitably grouped to enable examination of the effects
of linguistic affiliation, geographical location, and social rank on Y-chromosomal haplotype differentiation.
It may be noted from Table 1 that there is a complete confounding of linguistic and caste/tribal affiliations of populations. All caste populations speak languages that belong to the Indo-European family, whereas both tribal populations are Austroasiatic speakers. AMOVA results showed that although 95.75% of the Y-chromosomal microsatellite haplotypic variation was within populations belonging to the two language families (or within populations belonging to caste or tribal clusters), the extent of variation between these groups of populations was 4.25% [F(ST) = 0.0425]. This F(ST) value was statistically significant, indicating significant Y-chromosomal structuring due to differences in language or, equivalently for the present data set, due to caste-tribal differences. When we subdivided our data further by ranked caste categories (i.e., when the caste populations were grouped into the three separate ranked clusters), the additional variance thus explained was not statistically significant.
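The variance partition described above can be sketched for the simplest, single-level case (groups within a total sample). The following Python sketch assumes that the distance between two haplotypes is the sum over loci of squared differences in repeat number, and it tests the fixation index by permuting haplotypes among groups. It is a simplified illustration with hypothetical function names and toy data, not the hierarchical analysis carried out with Arlequin for this study.

```python
import random


def sq_dist(a, b):
    """Sum over loci of squared differences in repeat number."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ssd(items):
    """Sum of squared deviations, computed from pairwise distances."""
    n = len(items)
    total = sum(sq_dist(items[i], items[j])
                for i in range(n) for j in range(i + 1, n))
    return total / n


def phi_st(groups):
    """Single-level fixation index from a list of groups of haplotypes.

    Raises ZeroDivisionError if every haplotype in the sample is identical.
    """
    pooled = [h for g in groups for h in g]
    n_total, n_groups = len(pooled), len(groups)
    ssd_within = sum(ssd(g) for g in groups)
    ssd_among = ssd(pooled) - ssd_within
    msd_among = ssd_among / (n_groups - 1)
    msd_within = ssd_within / (n_total - n_groups)
    # Average group size, corrected for unequal group sizes.
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (n_groups - 1)
    var_among = (msd_among - msd_within) / n0
    return var_among / (var_among + msd_within)


def permutation_p_value(groups, n_perm=1000, seed=0):
    """Proportion of random regroupings with an index >= the observed one."""
    rng = random.Random(seed)
    observed = phi_st(groups)
    pooled = [h for g in groups for h in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        start, permuted = 0, []
        for s in sizes:
            permuted.append(pooled[start:start + s])
            start += s
        if phi_st(permuted) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy groups (not data from this study).
group_a = [(14, 10, 23, 10, 13), (14, 10, 24, 10, 13), (15, 10, 23, 10, 13)]
group_b = [(17, 12, 21, 11, 12), (17, 12, 22, 11, 12), (16, 12, 21, 11, 12)]
print(permutation_p_value([group_a, group_b], n_perm=200))
```

In this simplified form the index is the share of the total variance that lies between groups, so a value of 0.0425 corresponds to 4.25% of the variation lying between groups and the remainder within them.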
To examine whether Y-chromosomal variation was significantly structured
because of differences in geographical locations of habitat of the
populations, we grouped the populations as "northern Indian
inhabitants" and "eastern Indian inhabitants". Only 8 (13.3%) of
the 60 distinct haplotypes (defined by five polymorphic microsatellite loci) were shared between populations inhabiting these two geographical regions. (It may be noted that of the 81 observed haplotypes defined on
the basis of six polymorphic loci, several haplotypes defined by five
microsatellite loci occurred on both hHindIII+ and hHindIII−
backgrounds; hence, the number of
distinct haplotypes dropped from 81 to 60.) AMOVA results indicated
that haplotypic variation attributable to geographical differences was
not statistically significant.
Because microsatellite loci are known to have higher mutation rates
than biallelic RFLP loci, we sought to examine whether insights into
the evolutionary histories of these populations can be obtained by
examining variations at the microsatellite loci separately for
chromosomes that possess the
hHindIII restriction site
and for chromosomes that do not possess this site. In the pooled set of
chromosomes from all populations, there were 59 (47.2%) chromosomes
that possessed the HindIII site and 66 (52.8%) that did not.
This difference was not statistically significant at the 5% level.
Although these two groups of chromosomes shared only 5 (8.33%)
haplotypes out of 60 distinct haplotypes (defined by the 5 polymorphic
microsatellite loci), the ranges and frequency distributions of repeat
numbers at the microsatellite loci in these two groups of
chromosomes were strikingly similar (Table 4). The variances among
individuals of repeat numbers at the microsatellite loci in these two
groups of chromosomes (hHindIII+, hHindIII−) were
DYS19 = (0.74, 0.72), DYS389I = (0.46, 0.36), DYS390 = (1.45,
1.48), DYS391 = (0.69, 0.26), and DYS393 = (0.90, 0.72). Therefore,
except for the DYS391 locus, at which the hHindIII+
chromosomes showed greater variability, the extent of variability at
all the other microsatellite loci was similar in both groups of
chromosomes. Furthermore, the numbers of distinct microsatellite
haplotypes in hHindIII+ and hHindIII−
chromosomes were 39 and 42, respectively. The haplotype diversities in
these two groups were, respectively, 0.97 ± 0.01 and
0.98 ± 0.01. Therefore, it appears that the antiquities of both of
these groups of chromosomes are roughly equal. All populations harbor microsatellite haplotypes on both
hHindIII+ and hHindIII− backgrounds. Differentiation of the
populations, therefore, seems to have taken place after this locus
became polymorphic.
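The statement above that the split of 59 versus 66 chromosomes does not differ significantly from equality can be checked with a short calculation. The test used for this comparison is not named in the text, so the sketch below assumes a chi-square goodness-of-fit test against a 1:1 expectation with one degree of freedom; the function name is hypothetical.

```python
import math


def chi_square_equal_split(count_a, count_b):
    """Chi-square statistic and p-value for a 1:1 expectation (1 df)."""
    total = count_a + count_b
    if total == 0:
        raise ValueError("counts must not both be zero")
    expected = total / 2
    stat = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    # For one degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


stat, p_value = chi_square_equal_split(59, 66)
print(f"chi-square = {stat:.3f}, p = {p_value:.3f}")
```

With an expected count of 62.5 in each class, the statistic is 2 × 3.5² / 62.5 = 0.392, well below the 5% critical value of 3.841 for one degree of freedom, which agrees with the conclusion stated in the text.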
We have also examined the relationships among the haplotypes defined by the five microsatellite loci. The haplotype tree (not shown) comprised four major clusters of haplotypes; one haplotype (Table 2, haplotype 43) formed a single-point cluster. [This haplotype, observed among the tribal Santals, possessed a 15-repeat allele at the DYS391 locus, which was not observed in any other population.] Contrary to our expectations, however, the clusters of haplotypes did not correspond to the ethnic clusters.
DISCUSSION
The prevailing social customs of hypergamy and hypogamy in India are expected to restrict male gene flow among ethnic groups. We have tested this prediction using several biallelic and microsatellite Y-chromosomal DNA markers.
The YAP element, which is found in varying frequencies in most global
populations, is absent in all the populations included in the present
study. The YAP+ frequency is very high among most African groups and
low among European populations (see Table 7 of Passarino et al. 1998).
The absence of this element among Indian populations confirms that
Indians show relatively more genetic similarities with the Caucasoids
than with the Negroids (Majumder 1998). It is, however, noteworthy that
our earlier studies have revealed that the Austroasiatic tribal
populations of India have some of the human-specific Alu
elements in the nuclear genome at frequencies that are similar to those
found in many African populations (Majumder et al. 1999). The ranges of
repeat numbers and allele frequencies at the polymorphic microsatellite
loci in the study populations are consistent with global estimates
(Kayser et al. 1997). At most of these loci, the most frequent allele
is not the same across populations. This may indicate different origins
of the study populations or may be due to effects of genetic drift. The
locus DYS288 was found to be monomorphic. Comparable data at this locus
are not available from many other populations (Kayser et al. 1997).
High levels of haplotype diversity were noted in all clusters of
populations. Consistent with our earlier findings, based on serum
protein and enzyme polymorphisms, the ethnic populations of India
harbor higher levels of genetic diversity than most comparable global
regions (Majumder 1998). The highest level of haplotype diversity was
found among the middle castes. This finding is not unexpected in view
of the fact that the social boundaries of the middle caste groups have,
historically, been the most fluid.
Nearly disjoint sets of haplotypes were found among the study
populations. Because the effective population size with respect to the
Y chromosome is only one-quarter of the autosomal effective population
size, this phenomenon may be due to drift effects but is consistent
with the prevailing norms governing marriage that severely restrict
male gene flow across ethnic boundaries. In the caste hierarchy, the
middle caste groups are expected to be the most fluid genetically. The
data on haplotype sharing presented in this paper are largely
consistent with this expectation. However, although within a restricted
geographical region (eastern or northern India in the present study)
the caste groups belonging to the upper social rank do not share
haplotypes with groups belonging to the lower social rank, there is
such sharing of haplotypes across geographical regions. This may
indicate that when there are unions, within or outside of marriage,
between an upper caste man and a lower caste woman inhabiting a
geographical area, there is a tendency for them to move away to distant
geographical areas and then affiliate with a lower, not upper, caste in
the new location. The fact that tribal clusters share haplotypes with
middle and lower castes, but not with upper castes, is also
interesting. There are documented instances of tribal groups that after
relinquishing the hunter-gatherer life style and adopting agriculture,
were converted to castes, mostly lower castes (Bose 1953; Mandelbaum 1970). Sharing of haplotypes among clusters of populations was, however, not significantly higher than chance expectation.
The extent of molecular variance attributable to differences among
socially ranked clusters was found to be statistically nonsignificant.
This is striking and indicates that Y-chromosomal variation is not
structured by social rank, consistent with the anthropological finding
that there has been social mobility and variations in ranks of ethnic
groups in India (Thapar 1992). However, our analysis has revealed that
the extent of molecular variation at the Y-chromosomal microsatellite
loci between castes, who are all Indo-European speakers in our sample,
and tribes, who are all Austroasiatic speakers in our sample, is
significant. No such significant variation was observed between
geographical regions.
The comparison of chromosomes on hHindIII+ and hHindIII−
backgrounds has also revealed an interesting
aspect of the population differentiation in India. If one makes the
reasonable assumption that the loss of a restriction site is more
probable than its gain, it is clear that the hHindIII
site loss was a very ancient mutation and that the two alleles at this
locus had reached nearly equal frequencies before the differentiation
of the study populations into separate ethnic groups. The finding that
the most frequent alleles at all microsatellite loci, except at DYS393,
are the same on chromosomes with hHindIII+ and hHindIII−
backgrounds further corroborates this view.
However, the fact that the clustering of haplotypes did not correspond
to the ethnic clusters is contrary to the simple expectation arising
from the finding of highly disjoint sets of haplotypes among the
populations. We are unable to provide a clear explanation of this lack
of correspondence. We hypothesize that there was large Y-chromosomal
haplotype diversity even before the people of India became organized
into distinct social groups and that each social group was formed by a
restricted number of male lineages. Collection of data on more
biallelic loci and further study of variation at microsatellite loci on
the backgrounds defined by multiple biallelic loci in these populations will
contribute to a deeper understanding of the population history of India.
METHODS
Populations
One hundred twenty-five males belonging to 10 ethnic populations of India were studied. Their anthropological details and sample sizes are presented in Table 1. No two sampled individuals were related to each other, at least up to the first-cousin level.
DNA Isolation and Genotyping
From each selected individual, 5-10 ml of blood was drawn with
consent. DNA was isolated following the protocol of Miller et al. (1988).
PCR primers and conditions used for screening the DYS19, DYS287,
DYS288, DYS389I, DYS390, DYS391, and DYS393 loci were essentially the
same as those given in Jobling and Tyler-Smith (1995). However, for the
DYS19 and DYS391 loci, PCR-amplified products were run on 6%
sequencing gels, transferred to Hybond N+ membranes, probed with one of
the end-labeled primers, blotted, and autoradiographed. Band sizes were
determined by comparing against locus-specific allelic ladders. For the
DYS288, DYS389I, DYS390, and DYS393 loci, product sizes were determined
from electrophoretographs, using GeneScan Analysis version 2.02 in an
ABI-377 automated DNA sequencer. The hHindIII site was
screened, using the primers and protocols given in Santos et al. (1995).
Statistical Analysis
For estimating haplotype diversities and performing AMOVA, Arlequin
version 1.1 (Schneider et al. 1997) was used. Significance of variance
components was tested, using the nonparametric permutation procedure
described in Excoffier et al. (1992).
For testing the significance of the observed numbers of shared
haplotypes among individuals between pairs of the four ranked ethnic
clusters, we carried out a permutation test as follows. The total
number of sampled individuals in our study was 125; the numbers of
individuals belonging to the four ranked clusters (upper caste, middle
caste, lower caste, and tribal clusters) were, respectively, 27, 29, 37, and 32. We first randomly permuted the haplotype data of the 125 individuals and then partitioned the data into four subsets of sizes
27, 29, 37, and 32, respectively. To obtain the number of individuals
who shared haplotypes between any two subsets I and J
(I < J; I, J = 1, 2, 3, 4), we
compared individuals i and j (i ∈ I, j ∈ J; i < j) and checked whether they had the same haplotypes. For all pairs of the four subsets, numbers of shared haplotypes were counted. This procedure was repeated 500 times. The
upper 95% cutoff point of the frequency distribution (based on the 500 replications) of the number of haplotypes shared between subsets
I and J was then calculated. If the actual number of
shared haplotypes between the corresponding pair of clusters was less than this cutoff point, then the observed number of shared haplotypes was declared to be statistically nonsignificant at the 5% level.
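The permutation procedure described above can be sketched in Python as follows. The sketch rests on two assumptions that go beyond the text: sharing between two subsets is counted as the number of between-subset pairs of individuals with identical haplotypes, and the cutoff is taken as the empirical 95th percentile of the permuted counts. The function names and the randomly generated toy labels are hypothetical; the subset sizes are the cluster sample sizes reported in Results.

```python
import math
import random
from collections import Counter
from itertools import combinations


def shared_pairs(subset_a, subset_b):
    """Number of (i, j) pairs, i in A and j in B, with identical haplotypes."""
    counts_a, counts_b = Counter(subset_a), Counter(subset_b)
    return sum(counts_a[h] * counts_b[h] for h in counts_a.keys() & counts_b.keys())


def permutation_cutoffs(haplotypes, sizes, n_runs=500, quantile=0.95, seed=0):
    """Upper cutoff of chance sharing for every pair of subsets."""
    if sum(sizes) != len(haplotypes):
        raise ValueError("subset sizes must add up to the number of individuals")
    rng = random.Random(seed)
    pool = list(haplotypes)
    pairs = list(combinations(range(len(sizes)), 2))
    null_counts = {pair: [] for pair in pairs}
    for _ in range(n_runs):
        rng.shuffle(pool)  # random reassignment of haplotypes to individuals
        subsets, start = [], 0
        for size in sizes:
            subsets.append(pool[start:start + size])
            start += size
        for i, j in pairs:
            null_counts[(i, j)].append(shared_pairs(subsets[i], subsets[j]))
    cutoffs = {}
    for pair, values in null_counts.items():
        values.sort()
        index = max(0, math.ceil(quantile * len(values)) - 1)
        cutoffs[pair] = values[index]
    return cutoffs


# Hypothetical toy input: 125 randomly drawn haplotype labels, not study data.
toy_rng = random.Random(1)
toy_haplotypes = [toy_rng.randrange(81) for _ in range(125)]
cutoffs = permutation_cutoffs(toy_haplotypes, sizes=(27, 29, 37, 32))
print(cutoffs)
```

An observed count for a pair of clusters would then be compared with the corresponding entry of `cutoffs`; a count below the cutoff is treated as no greater than expected by chance.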
For constructing a tree of observed haplotypes defined by the five polymorphic microsatellite loci, we computed pairwise distances between haplotypes, using the squared Euclidean distance measure, and then applied the UPGMA and neighbor-joining clustering algorithms.
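A minimal sketch of the distance and clustering step is given below, assuming that each haplotype is represented as a tuple of repeat numbers and that the squared Euclidean distance is summed over loci. Only UPGMA (average linkage) is sketched; neighbor-joining is omitted. The function names and toy haplotypes are hypothetical, and the software actually used for the tree is not stated in the text.

```python
def sq_euclidean(a, b):
    """Squared Euclidean distance between two repeat-number vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def upgma(haplotypes):
    """Return a nested-tuple tree over haplotype indices (average linkage).

    Expects at least one haplotype.
    """
    # Each cluster is a pair: (subtree, list of member indices).
    clusters = [(i, [i]) for i in range(len(haplotypes))]

    def average_distance(members_a, members_b):
        total = sum(sq_euclidean(haplotypes[i], haplotypes[j])
                    for i in members_a for j in members_b)
        return total / (len(members_a) * len(members_b))

    while len(clusters) > 1:
        # Find the closest pair of clusters; ties go to the earliest pair.
        _, a, b = min(
            (average_distance(clusters[a][1], clusters[b][1]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        merged = ((clusters[a][0], clusters[b][0]),
                  clusters[a][1] + clusters[b][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters[0][0]


# Hypothetical toy haplotypes (repeat numbers at five loci), not study data.
toy = [
    (14, 10, 23, 10, 13),
    (14, 10, 24, 10, 13),
    (17, 12, 21, 11, 12),
    (17, 12, 22, 11, 12),
]
print(upgma(toy))
```

The average distance between clusters is recomputed from the original pairwise distances at every step, which keeps the sketch short at the cost of speed; for a few dozen haplotypes this is not a practical concern.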
ACKNOWLEDGMENTS
This work was supported by a grant from the Department of Biotechnology, Government of India. We are grateful to Badal Dey, Monami Roy, Madan Chakraborty, R.S. Balgir, and B.P. Dash for their participation in fieldwork for collection of samples. We are also grateful to Chris Tyler-Smith and Andres Ruiz-Linares for information and advice, and to Lynn Jorde and W. Scott Watkins for contributing some labeled primers for initiation of this work. Suggestions provided by an anonymous reviewer were extremely helpful.
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
FOOTNOTES
4 Corresponding author.
E-MAIL ppm{at}isical.ac.in; FAX 91-33-577 6680.
Received February 16, 1999; accepted in revised form June 8, 1999.