Genome Res. 13:2101-2111, 2003
©2003 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/03 $5.00
Letter
Global Haplotype Diversity in the Human Insulin Gene Region
John D.H. Stead1,3,4,
Matthew E. Hurles2 and
Alec J. Jeffreys1
1 Department of Genetics, University of Leicester, Leicester LE1 7RH,
UK
2 McDonald Institute for Archaeological Research, University of Cambridge,
Cambridge CB2 3ER, UK
 |
ABSTRACT
|
|---|
The insulin minisatellite (INS VNTR) has been intensively analyzed
due to its associations with diseases including diabetes. We have previously
used patterns of variant repeat distribution in the minisatellite to
demonstrate that genetic diversity is unusually great in Africans compared to
non-Africans. Here we analyzed variation at 56 single nucleotide polymorphisms
(SNPs) flanking the minisatellite in individuals from six populations, and we
show that over 40% of the total genetic variance near the minisatellite is due
to differences between Africans and non-Africans, far higher than seen in most
genomic regions and consistent with differential selection acting on the
insulin gene region, most likely in the non-African ancestral population.
Linkage disequilibrium was lower in African populations, with evidence of
clustering of historical recombination events. Analysis of haplotypes from the
relatively nonrecombining region around the minisatellite revealed a
star-shaped phylogeny with lineages radiating from an ancestral
African-specific haplotype. These haplotypes confirmed that minisatellite
lineages defined by variant repeat distributions are monophyletic in origin.
These analyses provide a framework for a cladistic approach to future disease
association studies of the insulin region within both African and non-African
populations, and they identify SNPs which can be rapidly analyzed as surrogate
markers for minisatellite lineage.
The insulin minisatellite, located within the promoter of the human insulin
gene, has been intensely investigated for nearly two decades due to its
associations with diseases such as diabetes
(Bell et al. 1984 ;
Bennett and Todd 1996 ). Most
studies have analyzed populations of European descent where low diversity at
the minisatellite combined with strong linkage disequilibrium in flanking
regions make it difficult to distinguish between etiological and associated
variants (Bennett and Todd
1996 ; Doria et al.
1996 ). The identification of etiological polymorphisms in the
insulin region may therefore require analysis of a range of different
populations, in particular those showing a greater range of haplotype
diversity than that seen in Europeans.
As a first step in these studies, we used a system of
minisatellite variant repeat mapping by PCR
(MVR-PCR; Jeffreys et al.
1991 ) to analyze variation at the insulin minisatellite both in
allele size (number of tandem repeats) and structure (distribution of
different variant repeats) in a range of populations from Africa, Asia, and
Europe (Stead and Jeffreys
2000 ,
2002 ). Pairwise comparisons
between all alleles showed that structures were either very different from
each other, indicating that they had diverged through multiple complex
mutational rearrangements, or displayed very similar patterns of variant
repeat distribution, resulting in these closely related structures having
similar overall sizes. In this way, alleles could be readily divided into
lineage groups with low levels of variation within each lineage both in allele
size and structure (Stead and Jeffreys
2000 ,
2002 ). Analysis of three
African populations from the Ivory Coast (156 alleles), Zimbabwe (138), and
Kenya (84) revealed substantial structural diversity with minisatellite
alleles assigned to one of 22 different lineages. In contrast, all alleles
within non-African populations from the U.K. (1080 alleles), Japan (118), and
Kazakhstan (80) belong to just three lineages. These lineages, termed I, IIIA,
and IIIB, are all present in Africa. This reduction in diversity from 22
lineages in Africans to a subset of just three lineages in non-Africans is
consistent with a wealth of genetic evidence from mtDNA, Y chromosome
polymorphisms, and autosomal markers which demonstrates a greater diversity in
Africans than non-Africans and supports an African origin for anatomically
modern humans (for review, see Tishkoff
and Williams 2002 ). Although other explanations are possible, an
African origin is further supported by the presence at multiple loci of
ancestral sequences seen exclusively in Africa
(Takahata et al. 2001 ).
Although most studies of genetic variation have identified greater
diversity in Africans compared with non-Africans, the difference in lineage
frequencies at the insulin minisatellite between these population groups is
unusually large, with 28% of the total genetic variance being due to
differences between Africans and non-Africans
(Stead and Jeffreys 2002 ).
This is substantially greater than that at most other loci (average 11%;
Barbujani et al. 1997 ).
Although this increased population differentiation is consistent with
differential selection acting in the past between Africans and non-Africans,
it is unclear whether the differentiation is simply an artifact of using
minisatellite lineages rather than haplotypes defined by SNPs.
We now extend these minisatellite structural studies by analyzing SNP
variation flanking the minisatellite in the three African and three
non-African populations previously analyzed. If the 22 minisatellite lineages
are truly monophyletic and deeply diverged from each other, then the unusually
great differentiation between Africans and non-Africans should be reflected by
elevated haplotype differentiation. SNP-based haplotypes could also give
further clues about the nature, timing, and intensity of selective pressures
in this region of DNA.
Haplotype data are also highly relevant to disease-association studies.
Lineage diversity is too great in Africans for each lineage to be tested
independently for association, and the use of MVR-PCR is too labor-intensive
for lineage identification in large cohorts. Haplotype analysis should
identify SNPs that can act as surrogate markers for minisatellite lineages,
allowing lineage identification in large cohorts. Furthermore, the
identification of closely related haplotypes will allow different but related
lineages to be pooled, thus providing a framework for a hierarchical cladistic
approach to disease-association studies
(Templeton et al. 2000 ).
 |
RESULTS
|
|---|
Global SNP Diversity in the Insulin Gene Region
Previous analysis of variant repeat structures of 1278 non-African and 378
African alleles at the insulin minisatellite identified 22 lineages in
Africans, with a subset of just three lineages present in non-Africans (Stead
and Jeffreys 2000 ,
2002 ). We used this
information to select three non-Africans and 13 Africans for SNP discovery who
between them contained 21 of these 22 lineages; this strategy helped maximize
the chance of detecting SNPs and gaining a global view of DNA diversity.
Analysis of a 7.8-kb region including the minisatellite identified 53
polymorphisms, with an additional three polymorphisms determined from the
literature (Table 1). To the
best of our knowledge, only 22 of these 56 SNPs have been described previously
(Elbein et al. 1985 ;
Kelsoe et al. 1988 ;
Julier et al. 1991 ;
Lucassen et al. 1993 ;
Owerbach and Gabbay 1993 ;
Johnson et al. 2001 ).
Nucleotide diversity across the 32 sequences was estimated as = 0.00124.
Although this estimate will be biased upwards, as individuals were selected
for sequencing based on maximal diversity at the minisatellite, similar
estimates were derived from haplotype data (described below) on unselected
chromosomes, which will be biased downwards because only known variants were
typed on most chromosomes (data not shown). These measures of the average
difference per nucleotide between two randomly chosen sequences selected from
a population are similar to estimates from elsewhere in the genome
(Jorde et al. 2001 ). The 56
polymorphisms include 40 transitions and 14 transversions, plus a 4-bp
insertion and a 5-bp deletion. Twenty-seven polymorphisms (48%) were at CpG
doublets (24 transitions and three transversions), four were within 3 bp of
polymerase -arrest sites (TGRRGA), and three were within mononucleotide
runs of 5 bp (Templeton et al.
2000 ). The higher frequency of polymorphism at CpG doublets
compared with estimates from elsewhere in the genome
(Templeton et al. 2000 ) is
perhaps unsurprising given the high GC content of the region (65% GC excluding
the minisatellite) and evidence of imprinting in this region
(Moore et al. 2001 ). Indeed,
levels of polymorphism are known to increase with GC content, at least over
short genomic regions (Sachidanandam et
al. 2001 ; Smith and Lercher
2002 ).
The ancestral states of 49 of the 56 polymorphisms could be determined
unambiguously by comparison with chimpanzee, gorilla, and orangutan sequences.
The remaining seven ambiguous sites showed evidence of recurrent mutation in
primates, with five occurring at CpG doublets. One CpG doublet was polymorphic
in humans for both TpG and CpA variants (SNPs INS-66 and INS-67) with the
gorilla showing the likely ancestral CpG form, the orangutan the TpG form, and
the chimpanzee the CpA form. The only amino acid substitutional changes among
the four species were within the signal peptide of preproinsulin, with a
conservative Ala Val switch at position 13 in chimpanzees
(Seino et al. 1992 ) plus an
Ala/Gln/Ser three-way substitution at position 2 which is also
polymorphic in humans, resulting in an Ala Thr substitution
(Oda et al. 2001 ).
Sequence comparisons between humans and chimpanzees generated a sequence
divergence estimate of 2.11%. By random concatenation of human and chimpanzee
sequence data from 53 noncoding autosomal loci described by Chen and Li
(2001 ), we generated a
distribution of divergences between these sequences. Divergence estimates as
high as 2.11% were not observed in this distribution, allowing us to show that
sequence divergence at the insulin region was significantly (P <
0.05) higher than at most other loci (see Methods). Furthermore, sequence
comparisons with other great apes revealed high divergence between all species
analyzed (Supplemental Information Table S1, available online at
www.genome.org;
Chen and Li 2001 ). This
elevated divergence may reflect the high GC content near the insulin gene
(65%) and the abundance of segregating sites at CpG dinucleotides. A relative
rates test performed on these sequence data showed that the insulin gene
region was evolving in a clock-like fashion (P > 0.05).
Genotype and Haplotype Diversity
To determine the level of diversity around the insulin gene, we typed all
of the 56 polymorphisms in 189 non-Africans from the U.K. (102 individuals),
Kazakhstan (28), and Japan (59), plus 189 Africans from the Ivory Coast (78
individuals), Zimbabwe (69), and Kenya (42). All of these individuals had been
previously characterized for insulin minisatellite structures using MVR-PCR
(Stead and Jeffreys 2002 ).
Fifty-two of these SNPs were polymorphic in Africans, compared with only 21 in
non-Africans (Table 1).
To investigate haplotype diversity around the insulin gene, we inferred
haplotypes from diploid SNP data in silico using the PHASE algorithm
(Table 2;
Stephens et al. 2001 ). A total
of 77 different haplotypes were identified (70 with >95% confidence).
Africans contained 66 of these haplotypes (61 with >95% confidence),
compared to only 20 in Europeans and Asians (18 with >95% confidence).
These differences in haplotype diversity prompted further investigations of
population differentiation by Fst and analysis of molecular
variance (AMOVA) analyses (Table
3; data not shown). Both approaches revealed unusually high
differentiation between Africans and non-Africans, compared to modest
differentiation between Europe and Asia and very little differentiation within
Asia or between the three African populations tested. This picture is very
similar quantitatively to the levels of population differentiation seen using
minisatellite lineage data (Table
3). This demonstrates that the unexpectedly large reduction in
minisatellite lineage diversity seen in non-Africans
(Stead and Jeffreys 2002 ) is
not an artifact of using data from groups of related minisatellite alleles but
instead reflects a genuine major shift in lineage frequencies not only at the
minisatellite but also at flanking SNPs and haplotypes.
As noted previously (Stead and Jeffreys
2002 ), this extreme population differentiation between Africans
and non-Africans is consistent with differential and directional selection
acting to increase genetic differentiation between population groups
(Cavalli-Sforza et al. 1994 ),
whereas the similar lineage composition in the three non-African populations
suggests that selection acted on a population which was ancestral to all
three. If selection is acting at or near the minisatellite, then markers
closest to the minisatellite should show the highest levels of population
differentiation, with recombination releasing more distal markers from these
selective pressures and reducing the differentiation between populations. This
prediction was tested using a sliding window AMOVA analysis on blocks of 16
consecutive SNPs across the insulin region
(Fig. 1). This analysis showed
that the component of variance attributed to differences between Africans and
non-Africans rose to a peak of 43.5% near the minisatellite, far higher than
the genome average of 10.8% (Barbujani et
al. 1997 ) and fell to 4.2% towards the 5' end of
IGF2. In contrast, neither the HKA test
(Hudson et al. 1987 ; comparing
the insulin region with the LPL locus;
Clark et al. 1998 ) nor
estimates of Tajima's D (1.027;
Tajima 1989 ) found evidence of
selection at the insulin locus (P > 0.05; data not shown).
Although this appears to contradict the Fst estimates, the high
Fst values reflect the very low lineage and haplotype diversity in
non-Africans, suggesting that selection may have occurred since the MRCA and
specifically in these populations. If so, then the HKA test, which we used to
compare human/great ape sequence divergence with human sequence diversity,
would have little power to detect this recent selection. Similarly, the Tajima
test, which compares nucleotide diversity with the number of segregating sites
in humans, would have little power to detect selection in our sequence data
set established primarily (81%) from Africans.
Patterns of Linkage Disequilibrium
D' is a measure of complete disequilibrium where
|D'| = 1 if at most three of the four possible haplotypes
are present (Lewontin 1964 ).
To investigate patterns of linkage disequilibrium (LD) at the insulin gene
region, D' was estimated between all pairwise combinations of markers in
each of the six populations tested (Fig.
2). High LD was observed in each of the three non-African
populations across the 7.8-kb region. The only exception is the 5' most
distal SNP INS-1, though LD has been reported to extend over 6 kb 5' of
this position (Doria et al.
1996 ), suggesting that LD breakdown at INS-1 may instead have
resulted from recurrent mutation or localized gene conversion at this marker
rather than from crossover. The main difference between the non-African
populations was reduced statistical power supporting LD in the Japanese, a
consequence of 95% of chromosomes within this population belonging to lineage
I.
To investigate haplotype structure, we analyzed levels of absolute
association in these populations using the LD measure for which
| | = 1 where at most only two of the possible four
haplotypes are present (Hill and Robertson
1968 ). Two blocks of absolute association were apparent in
non-Africans, indicating the presence of only two major haplotypes per block
(Table 2A). These blocks of
very low haplotype diversity, one covering the INS gene and 5'
flanking region and the other extending over the first exon of the
IGF2 gene, correlate strongly with the heavily reduced minisatellite
lineage diversity seen in Europeans and Asians. Thus, the two main haplotypes
in the INS block, which corresponds to the region of disequilibrium
previously described by Lucassen et al.
(1993 ), are associated with
minisatellite lineages I and IIIA in all three non-African populations
(Table 2A). Similarly, one
haplotype in the IGF2 block corresponds to sublineage IC, with the
other haplotype being associated with ID and IIIA lineages. Haplotypes
associated with lineage IIIB differ from the two main haplotypes in each
block, but are at low frequency in these populations and so do not
dramatically reduce levels of absolute association. We found no evidence of
historical recombination between the two blocks of association.
The degree of LD in the insulin gene region is lower in Africans compared
to non-Africans (Fig. 2).
Patterns of complete disequilibrium (D') do show evidence for possible
LD blocks that broadly correspond to the two domains of absolute
disequilibrium ( ) identified in non-Africans
(Table 2B). Application of the
four-gamete test to these populations found evidence of recombination between
the two blocks of disequilibrium. Furthermore, there was evidence of homoplasy
at four markers within the INS LD block (INS-21, INS-24, INS-69, and
INS-28) which cluster in an interval from 50 bp 5' of the minisatellite
to 800 bp 3' of the minisatellite
(Fig. 2B,
Table 2B). This clustering,
plus the observation that two of the four markers are not located at sites
prone to increased mutation, argue against recurrent mutation and instead
provide evidence of a minisatellite-associated increase in local recombination
frequency. Interestingly there is little evidence that linkage phase has been
disrupted between markers distal to these apparent recombinants, suggesting
that either crossing-over occurred between similar haplotypes or that
recombination may be generating conversions in this region. The lack of
observable recombinants in this region in non-Africans may simply reflect the
fact that these markers are monomorphic in Europeans and Asians.
Historical recombination rates in the insulin gene region were investigated
by analyzing the rate of decay of absolute disequilibrium with distance
(Fig. 2C). LD decays more
rapidly in Africans than non-Africans, with extended LD greatest in the
Japanese population, followed by Kazakhstan, then the U.K. Assuming an
effective population size of 10,000 (Jorde
et al. 1998 ), the decay of LD suggests a mean recombination
activity of 5.3 cM/Mb across the region (ranging from 0.4 to 12.8 cM/Mb
depending on the population considered), somewhat higher than the genome
average rate of 1 cM/Mb (Yu et al.
2001 ). This is likely to be an overestimate given the evidence of
gene conversion and recurrent mutation in this region, both of which will
contribute to LD decay. Assuming that the recombination activity is the same
in all populations, the LD decay data suggest a ratio of effective population
sizes among Africa, Asia, and Europe of 10:1:3. This is similar to estimates
of 7:1:2 based on simple tandem repeat (STR) data
(Relethford and Jorde 1999 )
and 3:1:1 based on craniometric and genetic data sets
(Relethford and Harpending
1994 ).
Phylogenetic Network Analysis
We used median joining network analysis
(Bandelt et al. 1999 ) of the
insulin haplotypes to explore the phylogenetic relationships between
minisatellite lineages (Fig.
3A). To minimize reticulation arising from historical
recombination events, we focused on a 3.6-kb region spanning the
minisatellite, from markers INS-11 to INS-40, that is two SNPs (512 bp)
shorter than the largest block of complete disequilibrium found in African
populations. This analysis revealed a star-like phylogeny in which the
residual reticulation involved only markers INS-21, INS-24, INS-69, and INS-28
near the minisatellite which had been previously identified as being prone to
recombination.

View larger version (20K):
[in this window]
[in a new window]
|
Figure 3 Median-joining (MJ) network analysis of the insulin gene region.
(A) MJ network analysis of all haplotypes. MJ networks were based on
26 polymorphisms within a 3.6-kb region around the insulin minisatellite and
including SNPs INS-11 to INS-40. Haplotypes are represented by circles, with
areas reflecting the frequency of each haplotype group within the total
African plus non-African data set. A reference circle representing n = 5
chromosomes is shown. African haplotypes are marked in black and non-African
in white. The lengths of lines connecting haplotype nodes are proportional to
the number of mutation steps separating lineages, with the identity of the
SNPs mutated along each branch indicated. SNPs displaying homoplasy are
underlined. The network was rooted by comparison with primate sequences at the
African-specific haplotypes S, V, and H (indicated by stars). (B) MJ
network analysis of sublineages of lineage I chromosomes. Relationships
between sublineages were resolved by network analysis using all markers across
the 7.8-kb haplotype.
|
|
Most nodes on the haplotype phylogeny contain only one minisatellite
lineage, confirming that these lineages defined by repeat DNA structure are
genuinely monophyletic and that many of them are highly diverged, in
particular lineages I and IIIA that account for the majority of non-African
chromosomes. Discounting singletons, only three lineages shared the same
flanking haplotype (lineages S/V, lineages IIIB/X, and lineage K with a subset
of lineage J). Minisatellite alleles can therefore on occasion become highly
diverged in structure without accumulating flanking mutations. Conversely,
three lineages were spread over similar haplotypes related by mutation
(lineages IIIA and Y) or by either mutation or recombination (lineage J),
indicating that major rearrangements in the minisatellite must occur
sufficiently rarely to allow flanking mutations to sometimes accumulate within
a minisatellite lineage. It therefore follows that major rearrangements must
occur at a rate similar to base substitution over this 3.6-kb region
(c.7x105 per gamete). Interestingly, this
rate is similar to the frequency of major complex rearrangements at the
minisatellite detected in human sperm
(c.2x105;
Stead and Jeffreys 2000 ).
To investigate whether closely related haplotypes had minisatellite
lineages of similar sizes, pairwise comparisons were performed between the
most common haplotype from each of the 22 lineages. There was no correlation
between the distance between haplotypes and the size difference between
minisatellite allele lineages, either with the shortened haplotype used for
the phylogenetic studies (Kendall = 0.069, P = 0.12) or with
full haplotypes (Kendall = 0.026, P = 0.56). This implies that
major rearrangements occurring in the minisatellite must involve radical
changes in allele lengths, as seen in complex sperm mutants
(Stead and Jeffreys 2000 ).
Comparison of human and primate sequences located the root of the network
at or near the central node of the star phylogeny (lineages S/V or H,
respectively). Haplotypes near the root were uncommon in Africans and absent
from non-Africans. Two of the three lineages found outside Africa (lineages I
and IIIB) lie on the same haplotypes in Africans and non-Africans
(Fig. 3A,B). In contrast,
lineage IIIA alleles in Europeans and Asians have diverged from their African
counterparts by mutation at INS-11 and/or INS-40, suggesting diversification
following the founding of non-African populations.
 |
DISCUSSION
|
|---|
Previous studies of lineage diversity at the insulin minisatellite revealed
an unusually high level of genetic differentiation between Africans and
non-Africans, considerably greater than that seen at most other genomic
regions analyzed using SNPs or microsatellites
(Barbujani et al. 1997 ;
Jorde et al. 1998 ). We now
show that this hyperdifferentiation is not an artifact of using minisatellite
lineages as markers, but instead correlates with a remarkable degree of
population differentiation in SNP-based haplotypes near the minisatellite.
Furthermore, this differentiation does not appear to affect the whole insulin
gene region uniformly; rather, it extends across a region previously
implicated in susceptibility to type 1 diabetes
(Lucassen et al. 1993 ),
reaching a peak at and near the minisatellite. These data are consistent with
directional selection acting on the minisatellite or a nearby polymorphism,
with more distal markers being released from selective pressures by historical
recombination. As previously noted (Stead
and Jeffreys 2002 ), the substantial over-representation of lineage
I chromosomes in Europeans and Asians raises the possibility that positive
selection has acted to increase the frequency of class I chromosomes
specifically in non-African populations, as a result for example of adaptation
to different environments. The finding of the same three lineages in all
non-African populations suggests that this selection acted early during the
migration out of Africa, and that shared evolutionary history accounts for the
similar lineage composition of the three non-African populations. Finally, we
found no evidence of differences in selective environments between Europe and
Japan, as Fst values between these populations are no higher at the
insulin locus than at other autosomal loci
(Cavalli-Sforza et al. 1994 ).
We are not aware of comparative data for Kazakhstan.
It will be of considerable interest to extend these analyses more broadly
in the INS region, to see whether other peaks of population
hyperdifferentiation can be detected that may mark regions that have undergone
prehistoric selection. If selective environments have remained sufficiently
stable over time, such sites may colocate with regions containing variants
underlying current disease etiologies. These studies might also prove more
informative as to the nature of the putative selection that has acted on this
locus. For example, analyses of extended haplotypes have recently been used to
find signatures of recent selection at two genes implicated in resistance to
malaria, G6PD, and the CD40 ligand
(Sabeti et al. 2002 ). However,
this approach may not be fruitful at the insulin locus, where selection is
likely to be much older and therefore harder to detect; also, neutral `core
haplotypes' at the same locus in the same populations are not available for
comparison.
There is substantial linkage disequilibrium across the insulin gene region,
most noticeably in non-Africans but still present in African populations; for
example, 57% of comparisons between SNPs with minor allele frequencies
>0.15 yielded D' values >0.9 in the Ivory Coast sample. Despite
these associations, there is evidence that historical recombination events
have contributed to haplotype diversity. The indirectly estimated
recombination activity of this region is roughly fivefold higher than the mean
rate of 1 cM/Mb in the human genome (Yu et
al. 2001 ). However, historical recombination events appear not to
be randomly scattered but rather cluster into two regions, one between
INS and IGF2, and another spanning a region from just
upstream of the insulin minisatellite to 800 bp 3' of the minisatellite.
There is growing evidence that minisatellites are generated from errors in
meiotic recombination centered at recombination hotspots
(Jeffreys et al. 1998 ). It is
therefore possible that the insulin minisatellite may also be associated with
a weak hotspot perhaps 3' to the minisatellite. Consistent with this,
sperm mutation studies have revealed a low frequency recombination-based mode
of repeat DNA instability in the germline, with some evidence that these
events occur preferentially towards the 3' end of the minisatellite
(Stead and Jeffreys 2000 ).
Variant repeat analysis at the insulin minisatellite identified 22 lineages
with highly diverged structures, most of which are apparently present only in
Africa (Stead and Jeffreys
2002 ). Network analysis of haplotype data has now confirmed that
most minisatellite lineages are indeed monophyletic in origin
(Fig. 3). Although the
structural divergence between minisatellite lineages prevents the construction
of phylogenies from MVR data, haplotype networks show that some closely
related lineages do share common motifs within the minisatellite. For example,
lineages IIIB, X, and Y all share a motif of consecutive F, A, C, and E
variant repeats, whereas lineages L, M, and W display similarities at both
5' and 3' ends. This indicates that motif sharing between lineages
can be due to a relatively recent common ancestry as opposed to recombination
or mutational convergence. However, neither these motifs nor minisatellite
allele lengths can serve as reliable phylogenetic markers for identifying
related lineages, and lineage phylogeny can only be established by analysis of
flanking haplotypes.
These data provide a framework for future disease association studies at
the insulin region in a range of populations. Most previous studies analyzed
populations of European descent in which low minisatellite and haplotype
diversity combined with high LD near the minisatellite make it difficult to
distinguish etiological from associated variants
(Bennett and Todd 1996 ;
Stead et al. 2000 ). Our data
suggest that this limitation would apply to the analysis of any non-African
population, given that Asians and Europeans share the same few lineages and
associated haplotypes. However, increased lineage diversity and reduced LD in
African populations open up the possibility of a second phase of
higher-resolution studies with the potential to be able to dissect out
etiological variants which in non-African populations cannot be uniquely
identified due to strong associations with other markers. The present study
allowed us to identify either lineage-specific SNPs or combinations of SNPs
for every known minisatellite lineage; these could be rapidly analyzed as
surrogate markers of minisatellite lineage in every population
(Table 2, Suppl. Information
S2). Furthermore, although many African lineages are found at too low a
frequency to allow their effects on disease susceptibility to be independently
analyzed, our haplotype phylogenetic analysis describes relationships between
different lineages, allowing different closely related lineages to be combined
into a single group of sufficient size for association analysis. This allows
for a hierarchical cladistic approach to association studies in Africans in
which analysis of a small number of SNPs will allow closely related lineages
to be initially analyzed as a single group
(Templeton et al. 2000 ). The
effects of specific lineages within individual groups could then be further
investigated by analyzing additional polymorphisms.
Finally, there is considerable interest in defining haplotype blocks in the
human genome and identifying haplotype-tag (ht) SNPs that can be used to
capture efficiently most or all of the haplotype information within a block
(Johnson et al. 2001 ;
Gabriel et al. 2002 ). There is
evidence that common haplotypes within a block can to some extent at least be
global, shared by most or all major population groups, and raising the
possibility of defining a set of universal ht-SNPs for an LD block applicable
to any population (Gabriel et al.
2002 ). The present data show that this approach may face
substantial difficulties. For example, the haplotypes carrying lineage IIIA
alleles in non-Africans (IIIAa,b) are absent from Africans, with allele
INS-40T which uniquely identifies this lineage in Europeans and Asians
(Table 2) being absent from
Africans. Similarly, marker INS-74 used elsewhere to define one of the most
common haplotypes within the U.K. population
(Johnson et al. 2001 ) was not
detectably polymorphic in other non-African populations. The implications are
that, even if LD blocks are shared across populations, the underlying
haplotypes and ht-SNPs may show substantial population specificity.
 |
METHODS
|
|---|
DNA Source
Genomic DNA was extracted from blood or sperm from individuals sampled at
random with respect to disease status from six populations: U.K. (102
individuals), Kazakhstan (28), Japan (59), Ivory Coast (78), Zimbabwe (69),
and Kenya (42). The use of human samples was approved by the Leicestershire
Local Research Ethics Committee.
Nomenclature
This study describes variation both at the insulin minisatellite and in the
flanking haplotype. We reserve the terms "lineage" and
"sublineage" exclusively for minisatellite diversity. Briefly,
structures of all alleles within a lineage can be aligned to each other,
whereas different lineages cannot be aligned. Some lineages contained
subgroups of alleles that were more similar to each other than to other
alleles within that lineage. These subgroups are termed `sublineages.' Our
nomenclature incorporates terminology used in previous publications
(Bell et al. 1984 ;
Stead and Jeffreys 2000 ).
Originally, minisatellite alleles were divided into three classes on the basis
of size: class I (small), II (intermediate), and III (large)
(Bell et al. 1984 ;
Rotwein et al. 1986 ). In
Europeans, class I alleles form a single lineage called lineage I, class II
alleles are rare, and class III alleles divide into two lineages termed IIIA
and IIIB (Stead and Jeffreys
2000 ). Other lineages are denoted with capital letters, and
sublineages are indicated by Roman numerals; thus IIIAii is the second
sublineage of lineage IIIA. The only exceptions are sublineages IC, ID, and IE
of lineage I, named prior to this study (Stead and Jeffreys
2000 ,
2002 ). The names of specific
minisatellite alleles reflect minisatellite lineage/sublineage, allele size
(number of repeats), and a further discriminator; for example, allele ID42.4
is the fourth allele of 42 repeats identified from the ID sublineage of
lineage I.
Haplotypes were inferred from combined SNP and minisatellite lineage data
and named according to minisatellite lineage, with different haplotype
variants indicated in lowercase letters; thus Wa and Wb are the first and
second haplotypes identified in chromosomes carrying the minisatellite from
lineage W.
SNP Discovery and Genotyping
PCR primers designed from the genomic sequence of the insulin region
(accession number L15440
[GenBank]
; Lucassen et al.
1993 ) were used to amplify a 3.163-kb region 5' of the
minisatellite (primers 5F1 to 5R1) and two overlapping amplicons covering a
4.237-kb region 3' of the minisatellite (primers 3F13R3 and
3F63R1). To screen for polymorphisms, these amplicons were resequenced
using BigDye terminators (ABI) on an ABI 377 Automated Sequencer, and SNPs
were identified using ABI AutoAssembler software. Diploid PCR products from 16
individuals were analyzed. Their sample references (italics) and genotypes at
the insulin minisatellite are as follows: 11 individuals from Zimbabwe
(Z-10 IC35.8/F17.1; Z-14 N82.2/U119.1; Z-15
Q86.1/X144.1; Z-16 T107.1/Y422.1; Z-20 G30.1/K44.1;
Z-21 J43.1/W145.1; Z-46 L51.1/K52.2; Z-51
IIIB142.2/W148.1; Z-58 M86.1/V216.1; Z-67 K44.2/Z191.1;
Z-86 IC31.8/IE39.1), two from Kenya (K-3I H29.1/S110.1;
K-5B P79.1/IIIA211.1), and three from the U.K. (UK-101
ID40.2/ID44.1; UK-102 IIID144.1/IIIA147.2; UK-103
IC30.1/IIIA159.2). Orthologous regions from one chimpanzee, one gorilla, and
one orangutan were also amplified and sequenced using the same primers
(GenBank acc. nos. for human AY138589
[GenBank]
, AY138590
[GenBank]
; chimpanzee AY137496
[GenBank]
,
AY137497
[GenBank]
; gorilla AY137498
[GenBank]
, AY137499
[GenBank]
, AY137500
[GenBank]
; orangutan AY137501
[GenBank]
, AY137502
[GenBank]
,
AY137503
[GenBank]
). We identified 53 polymorphisms in the 16 human samples, with one
additional SNP (INS-72) from dbSNP plus two (INS-73 and INS-74) described by
Johnson et al. (2001 ). We
genotyped all 56 SNPs using genomic DNA from our panel of unrelated
individuals from six populations by allele-specific oligonucleotide (ASO)
hybridization to dotblots of the three PCR-generated amplicons described above
(Wood et al. 1985 ;
Jeffreys et al. 1998 ). Each
dotblot included amplicons used for sequence-based SNP-discovery resulting in
genotypes being determined by both ASO hybridization and sequence analysis for
16 subjects (896 genotypes). No genotyping discrepancies between the two
techniques were observed. Furthermore, there was no evidence for ambiguity at
any of the resulting 21,168 genotypes. After Bonferroni correction, none of
the markers showed deviation from Hardy-Weinberg equilibrium in any of the
populations tested. Details of SNPs, ASOs, PCR primers, primate sequences,
genotypes, and allele frequencies, plus the full MVR data from 1985 alleles
analyzed in this and other studies (Stead
et al. 2000 ; Stead and Jeffreys
2000 ,
2002 ; J.D.H. Stead, M.I.
McCarthy, and A.J. Jeffreys, unpubl.) can be found on our Web site
(http://www.le.ac.uk/ge/ajj/insulin).
In Silico Analyses
Haplotypes were inferred from genotype data in silico using PHASE software
(Stephens et al. 2001 )
available at
http://www.stats.ox.ac.uk/mathgen/software.html
which can analyze multi-allelic loci including minisatellite lineages and
which assigns a probability of the correct inference of haplotype phase at
every heterozygous position. Each of the 22 minisatellite lineages were given
a unique identifier prior to analysis except for lineage I alleles, which were
subdivided into the three main sublineages IC, ID, and IE. PHASE simulations
were repeated five times. Differences between outputs were minimal (<2%)
and only involved haplotypes determined with <95% confidence. In these
cases, the most common output was accepted. PHASE inferred haplotypes with
>95% confidence for 186 of the 189 non-Africans genotyped, and 171 of the
189 Africans. Incorporation of data from specific alleles at the minisatellite
(as opposed to minisatellite lineage) allowed haplotypes to be inferred with
confidence from a further two non-Africans and eight Africans. PHASE-inferred
haplotypes for all 378 individuals can be found on our Web site
(http://www.le.ac.uk/ge/ajj/insulin).
Sequence divergence between hominoids was estimated using Jukes-Cantor
distances (Jukes and Cantor
1969 ) from an alignment of a 6774-bp region surrounding the
insulin locus for which full sequence data were available. Human values were
averaged over the 32 complete sequences, inferred using PHASE software from
the 16 diploid sequences described above. To compare our estimates of sequence
divergence between humans and chimpanzees at the insulin locus (2.11%) with
other autosomal loci, we randomly (with replacement) concatenated sequence
data from 53 noncoding autosomal loci described by Chen and Li
(2001 ) until total sequence
length matched that which was available for the insulin region, after which
sequence divergence was determined. Replication of this process 10,000 times
generated a divergence distribution of 0.96%1.51% (between 2.5 and 97.5
centiles). A sequence divergence as high as 2.11% was never observed, despite
the insulin region containing coding regions under selective constraint.
To determine whether these great ape and human sequences were evolving in a
clock-like fashion, a relative rates test was conducted on the aligned
sequences using TREE-PUZZLE software
(Strimmer and von Haeseler
1996 ) available at
http://www.tree-puzzle.de/.
This generated a likelihood test statistic of 4.67 which fails to reject the
molecular clock for this data (P > 0.05 assuming a
2 distribution with two degrees of freedom).
We applied two tests of neutrality to the sequence data. Tajima's D
statistic (Tajima 1989 )
compares estimates of derived either from the average pairwise
difference between sequences, or from the number of segregating sites. Under
neutrality and mutation/drift equilibrium these measures should be equal,
giving a Tajima's D value of 0. The Hudson/Kreitman/Aguade (HKA) test
(Hudson et al. 1987 ) compares
levels of intraspecific polymorphism with interspecific divergence between the
locus being investigated and a neutral reference locus. The ratio of
polymorphism against divergence should be equal between loci if both are
evolving neutrally, as both measures depend on the underlying neutral mutation
rate. In contrast, directional selection will cause a reduction in this ratio
in the locus under selection. The lipoprotein lipase gene (LPL [MIM
238600
[OMIM]
]; Clark et al. 1998 ),
which has been used elsewhere as a neutral control for HKA analysis
(Fullerton et al. 2000 ), was
selected as our reference locus. Tajima's D and the HKA test were both
analyzed using the program DnaSP (Rozas
and Rozas 1999 ) available at
http://www.ub.es/dnasp.
Fst calculations were performed on all SNP genotype data
excluding the insulin minisatellite, or on minisatellite lineage data
excluding flanking SNPs, from each population using Arlequin ver. 2.000
(Schneider et al. 2000 )
available at
http://lgb.unige.ch/arlequin.
Analysis of molecular variance (AMOVA;
Excoffier et al. 1992 ) was
calculated from pairwise distances between SNP haplotypes and was performed
using Arlequin ver. 2.000 (Schneider et
al. 2000 ). To investigate variation in the degree of population
subdivision across the insulin region, we used a program which performs a
sliding window analysis of population subdivision (SWAPS), written by M.E.
Hurles, which will be described in detail elsewhere. Here, the program
performs an AMOVA analysis (Excoffier et
al. 1992 ) on 41 overlapping windows, each composed of 16
informative sites, shifting in position by a single SNP across the region
analyzed. Larger windows reduce resolution, whereas smaller windows produce
more peaks and troughs by exaggerating the effects of individual
population-specific SNPs.
Linkage disequilibrium (LD) was analyzed from diploid genotype data
(excluding the minisatellite) and plotted using software described elsewhere
(Jeffreys et al. 2001 ) which
employs LD measures of both complete association (D') and absolute
association ( ), and estimates the likelihood ratio (LR) in favor of
significant linkage disequilibrium. Markers with minor allele frequencies
below 0.15 were excluded from LD analysis; these markers tend to lack
statistical power to detect LD and are also likely to be the result of
relatively recent mutations and will not therefore report on ancient
recombination events.
Indirect estimates of recombination activity were derived from the LD data
using markers with minor allele frequencies 0.15. Assuming that crossovers
are randomly distributed and that haplotypes are selectively neutral and at
crossover/drift equilibrium, the expected rate of decay of absolute
disequilibrium (| |) with physical distance is given by
where
Ne is the effective population size, r is the
recombination frequency per Mb, and d is the intermarker distance in
Mb (Sved 1971 ).
Ancestral relationships between inferred haplotypes were investigated using
the Median-Joining (MJ) network algorithm
(Bandelt et al. 1999 ) within
NETWORK 2.0 software available at
http://www.fluxus-engineering.com/sharenet.htm.
Haplotypes which could not be inferred with confidence were excluded.
Inclusion of all full-length haplotypes generated highly reticulated networks
due to historical recombination events. To reduce reticulation, singleton
haplotypes were excluded except for haplotypes of minisatellite lineages F and
Z, as these lineages were each identified only once in our data set. Haplotype
lengths were restricted to a 3.6-kb block of linkage disequilibrium
surrounding the minisatellite and including all 26 SNPs between INS-11 and
INS-40. Although networks were based on SNP data and not minisatellite
lineage, lineage data were included in haplotype inference. SNP and
minisatellite lineage data are therefore both represented within the network.
The network was rooted at lineages H, S, and V by comparison with the
ancestral states of these 26 SNPs determined from primate sequence analysis.
The ancestral state of INS-14 was ambiguous, so this site was discounted.
Minisatellite structures in primates bear no similarity to human minisatellite
lineages and are uninformative regarding ancestral state
(Stead and Jeffreys 2002 ).
 |
Acknowledgements
|
|---|
We thank J. Clegg and Y. Dubrova for kindly providing DNA samples and M.
Jobling and colleagues for invaluable discussions. This work was supported by
grants to A.J.J. from the Wellcome Trust (ref. 061869/Z/00/Z) and the Royal
Society, and by the McDonald Institute for Archaeological Research.
The publication costs of this article were defrayed in part by payment of
page charges. This article must therefore be hereby marked
"advertisement" in accordance with 18 USC section 1734 solely to
indicate this fact.
 |
Footnotes
|
|---|
[Supplemental material is available online at www.genome.org and at the
authors' Web site; http://www. leicester.ac.uk/genetics/ajj/insulin. The
sequence data from this study have been submitted to GenBank under accession
nos.: human AY138589
[GenBank]
, AY138590
[GenBank]
; chimpanzee AY137496
[GenBank]
, AY137497
[GenBank]
; gorilla
AY137498
[GenBank]
, AY137499
[GenBank]
, AY137500
[GenBank]
; orangutan AY137501
[GenBank]
, AY137502
[GenBank]
, AY137503
[GenBank]
. The SNP
data from this study have been submitted to dbSNP. The following individuals
kindly provided reagents, samples, or unpublished information as indicated in
the paper: J.Clegg and Y.Dubrova.]
Article and publication are at
http://www.genome.org/cgi/doi/10.1101/gr.948003.
3 Corresponding author. E-MAIL
jdstead{at}umich.edu;
FAX (734) 647-4130. 
4 Present address: Mental Health Research Institute, University of
Michigan, Ann Arbor, MI 48109, USA. 
 |
REFERENCES
|
|---|
Bandelt, H.J., Forster, P., and Rohl, A. 1999.
Median-joining networks for inferring intraspecific phylogenies.
Mol. Biol. Evol. 16:
3748.[Abstract]
Barbujani, G., Magagni, A., Minch, E., and Cavalli-Sforza, L.L.
1997. An apportionment of human DNA diversity. Proc.
Natl. Acad. Sci. 94:
45164519.[Abstract/Free Full Text]
Bell, G.I., Horita, S., and Karam, J.H. 1984. A
polymorphic locus near the human insulin gene is associated with
insulin-dependent diabetes mellitus. Diabetes
33:
176183.[Abstract]
Bennett, S.T. and Todd, J.A. 1996. Human type 1
diabetes and the insulin gene: Principles of mapping polygenes.
Annu. Rev. Genet. 30:
343370.[CrossRef][Medline]
Cavalli-Sforza, L., Menozzi, P., and Piazza, A. 1994.
History and geography of human genes, chapter 2.
Princeton University Press, Princeton, NJ.
Chen, F.C. and Li, W.H. 2001. Genomic divergences
between humans and other hominoids and the effective population size of the
common ancestor of humans and chimpanzees. Am. J. Hum.
Genet. 68:
444456.[CrossRef][Medline]
Clark, A.G., Weiss, K.M., Nickerson, D.A., Taylor, S.L., Buchanan,
A., Stengard, J., Salomaa, V., Vartiainen, E., Perola, M., Boerwinkle, E., et
al. 1998. Haplotype structure and population genetic inferences
from nucleotide-sequence variation in human lipoprotein lipase. Am.
J. Hum. Genet. 63:
595612.[CrossRef][Medline]
Doria, A., Lee, J., Warram, J.H., and Krolewski, A.S.
1996. Diabetes susceptibility at IDDM2 cannot be positively
mapped to the VNTR locus of the insulin gene.
Diabetologia 39:
594599.[Medline]
Elbein, S.C., Corsetti, L., and Permutt, M.A. 1985.
New polymorphisms at the insulin locus increase its usefulness as a genetic
marker. Diabetes 34:
11391144.[Abstract]
Excoffier, L., Smouse, P.E., and Quattro, J.M. 1992.
Analysis of molecular variance inferred from metric distances among DNA
haplotypes: Application to human mitochondrial DNA restriction data.
Genetics 131:
479491.[Abstract]
Fullerton, S.M., Clark, A.G., Weiss, K.M., Nickerson, D.A., Taylor,
S.L., Stengard, J.H., Salomaa, V., Vartiainen, E., Perola, M., Boerwinkle, E.,
et al. 2000. Apolipoprotein E variation at the sequence haplotype
level: Implications for the origin and maintenance of a major human
polymorphism. Am. J. Hum. Genet.
67:
881900.[CrossRef][Medline]
Gabriel, S.B., Schaffner, S.F., Nguyen, H., Moore, J.M., Roy, J.,
Blumenstiel, B., Higgins, J., DeFelice, M., Lochner, A., Faggart, M., et al.
2002. The structure of haplotype blocks in the human genome.
Science 296:
22252229.[Abstract/Free Full Text]
Hill, W. and Robertson, A. 1968. Linkage
disequilibrium in finite populations. Theor. Appl.
Genet. 38:
226231.[CrossRef]
Hudson, R.R., Kreitman, M., and Aguade, M. 1987. A
test of neutral molecular evolution based on nucleotide data.
Genetics 116:
153159.[Abstract/Free Full Text]
Jeffreys, A.J., MacLeod, A., Tamaki, K., Neil, D.L., and Monckton,
D.G. 1991. Minisatellite repeat coding as a digital approach to
DNA typing. Nature 354:
204209.[CrossRef][Medline]
Jeffreys, A.J., Murray, J., and Neumann, R. 1998.
High-resolution mapping of crossovers in human sperm defines a
minisatellite-associated recombination hotspot. Mol.
Cell 2:
267273.[CrossRef][Medline]
Jeffreys, A.J., Kauppi, L., and Neumann, R. 2001.
Intensely punctate meiotic recombination in the class II region of the major
histocompatibility complex. Nat. Genet.
29:
217222.[CrossRef][Medline]
Johnson, G.C., Esposito, L., Barratt, B.J., Smith, A.N., Heward,
J., Di Genova, G., Ueda, H., Cordell, H.J., Eaves, I.A., Dudbridge, F., et al.
2001. Haplotype tagging for the identification of common disease
genes. Nat. Genet. 29:
233237.[CrossRef][Medline]
Jorde, L.B., Bamshad, M., and Rogers, A.R. 1998. Using
mitochondrial and nuclear DNA markers to reconstruct human evolution.
Bioessays 20:
126136.[CrossRef][Medline]
Jorde, L.B., Watkins, W.S., and Bamshad, M.J. 2001.
Population genomics: A bridge from evolutionary history to genetic medicine.
Hum. Mol. Genet. 10:
21992207.[Abstract/Free Full Text]
Jukes, T. and Cantor, C. 1969. Evolution of protein
molecules. In Mammalian protein metabolism (ed. H.
Munro), pp. 21132. Academic Press, New
York.
Julier, C., Hyer, R.N., Davies, J., Merlin, F., Soularue, P.,
Briant, L., Cathelineau, G., Deschamps, I., Rotter, J.I., Froguel, P., et al.
1991. Insulin-IGF2 region on chromosome 11p encodes a gene
implicated in HLA-DR4-dependent diabetes susceptibility.
Nature 354:
155159.[CrossRef][Medline]
Kelsoe, J.R., Stubblefield, B.K., and Ginns, E.I.
1988. Human tyrosine hydroxylase (TH) genomic fragment (pHGTH4)
identifies a PstI polymorphism. Nucleic Acids Res.
16: 7760.[Free Full Text]
Lewontin, R. 1964. The interaction of selection and
linkage. I. General considerations; heterotic models.
Genetics 49:
4967.[Free Full Text]
Lucassen, A.M., Julier, C., Beressi, J.P., Boitard, C., Froguel,
P., Lathrop, M., and Bell, J.I. 1993. Susceptibility to insulin
dependent diabetes mellitus maps to a 4.1 kb segment of DNA spanning the
insulin gene and associated VNTR. Nat. Genet.
4:
305310.[CrossRef][Medline]
Moore, G.E., Abu-Amero, S.N., Bell, G., Wakeling, E.L., Kingsnorth,
A., Stanier, P., Jauniaux, E., and Bennett, S.T. 2001. Evidence
that insulin is imprinted in the human yolk sac.
Diabetes 50:
199203.[Abstract/Free Full Text]
Oda, N., Nakai, A., Fujiwara, K., Imamura, S., Fujita, T.,
Hamagishi, M., Kato, T., Kobayashi, T., Himeno, Y., Yamamoto, K., et al.
2001. Polymorphisms of the insulin gene among Japanese subjects.
Metabolism 50:
631634.[CrossRef][Medline]
Owerbach, D. and Gabbay, K.H. 1993. Localization of a
type I diabetes susceptibility locus to the variable tandem repeat region
flanking the insulin gene. Diabetes
42:
17081714.[Abstract]
Relethford, J.H. and Harpending, H.C. 1994.
Craniometric variation, genetic theory, and modern human origins.
Am. J. Phys. Anthropol.
95:
249270.[CrossRef][Medline]
Relethford, J.H. and Jorde, L.B. 1999. Genetic
evidence for larger African population size during recent human evolution.
Am. J. Phys. Anthropol.
108:
251260.[CrossRef][Medline]
Rotwein, P., Yokoyama, S., Didier, D.K., and Chirgwin, J.M.
1986. Genetic analysis of the hypervariable region flanking the
human insulin gene. Am. J. Hum. Genet.
39:
291299.[Medline]
Rozas, J. and Rozas, R. 1999. DnaSP version 3: An
integrated program for molecular population genetics and molecular evolution
analysis. Bioinformatics
15:
174175.[Abstract/Free Full Text]
Sabeti, P.C., Reich, D.E., Higgins, J.M., Levine, H.Z., Richter,
D.J., Schaffner, S.F., Gabriel, S.B., Platko, J.V., Patterson, N.J., McDonald,
G.J., et al. 2002. Detecting recent positive selection in the
human genome from haplotype structure. Nature
419:
832837.[CrossRef][Medline]
Sachidanandam, R., Weissman, D., Schmidt, S.C., Kakol, J.M., Stein,
L.D., Marth, G., Sherry, S., Mullikin, J.C., Mortimore, B.J., Willey, D.L., et
al. 2001. A map of human genome sequence variation containing
1.42 million single nucleotide polymorphisms. Nature
409:
928933.[CrossRef][Medline]
Schneider, S., Roessli, D., and Excoffier, L. 2000.
Arlequin. Genetics and Biometry Laboratory, University of
Geneva, Geneva, Switzerland.
Seino, S., Bell, G.I., and Li, W.H. 1992. Sequences of
primate insulin genes support the hypothesis of a slower rate of molecular
evolution in humans and apes than in monkeys. Mol. Biol.
Evol. 9:
193203.[Abstract]
Smith, N.G. and Lercher, M.J. 2002. Regional
similarities in polymorphism in the human genome extend over many megabases.
Trends Genet. 18:
281283.[CrossRef][Medline]
Stead, J.D. and Jeffreys, A.J. 2000. Allele diversity
and germline mutation at the insulin minisatellite. Hum. Mol.
Genet. 9:
713723.[Abstract/Free Full Text]
Stead, J.D. and Jeffreys, A.J. 2002. Structural
analysis of insulin minisatellite alleles reveals unusually large differences
in diversity between Africans and non-Africans. Am. J. Hum.
Genet. 71:
12731284.[CrossRef][Medline]
Stead, J.D., Buard, J., Todd, J.A., and Jeffreys, A.J.
2000. Influence of allele lineage on the role of the insulin
minisatellite in susceptibility to type 1 diabetes. Hum. Mol.
Genet. 9:
29292935.[Abstract/Free Full Text]
Stephens, M., Smith, N.J., and Donnelly, P. 2001. A
new statistical method for haplotype reconstruction from population data.
Am. J. Hum. Genet. 68:
978989.[CrossRef][Medline]
Strimmer, K. and von Haeseler, A. 1996. Quartet
puzzling: A quartet maximum likelihood method for reconstructing tree
topologies. Mol. Biol. Evol.
13:
964969.
Sved, J.A. 1971. Linkage disequilibrium and
homozygosity of chromosome segments in finite populations. Theor.
Popul. Biol. 2:
125141.[CrossRef][Medline]
Tajima, F. 1989. Statistical method for testing the
neutral mutation hypothesis by DNA polymorphism.
Genetics 123:
585595.[Abstract/Free Full Text] |