Vol. 11, Issue 7, 1221-1226, July 2001
LETTER
ABSTRACT
Linkage disequilibrium (LD) is a proven tool for evaluating
population structure and localizing genes for monogenic disorders. LD-based methods may also help localize genes for complex traits. We
evaluated marker-marker LD using 43 microsatellite markers spanning
chromosome 20 with an average density of 2.3 cM. We studied 837 individuals affected with type 2 diabetes and 386 mostly unaffected spouse controls. A test of homogeneity between the affected individuals and their spouses showed no difference, allowing the 1223 individuals to be analyzed together. Significant (P < 0.01) LD was
observed using a likelihood ratio test in all (11/11) marker pairs
within 1 cM, 78% (25/32) of pairs 1-3 cM apart, and 39% (7/18) of
pairs 3-4 cM apart, but for only 12 of 842 pairs more than 4 cM apart. We used the human genome project working draft sequence to estimate kilobase (kb) intermarker distances, and observed highly significant LD
(P < $10^{-10}$) for all six marker pairs up to 350 kb
apart, although the correlation of LD with cM is slightly better than
the correlation with megabases. These data suggest that microsatellites
present at 1-cM density are sufficient to observe marker-marker LD in
the Finnish population.
INTRODUCTION
Linkage disequilibrium (LD), the nonrandom association
between alleles of linked markers, reflects the size
of chromosomal segments remaining intact in a population. LD analysis
has proved powerful for high-resolution mapping of disease genes for
monogenic disorders, including cystic fibrosis (Rommens et al. 1989)
and diastrophic dysplasia (Hastbacka et al. 1994). In the Finnish population, LD has narrowed the candidate gene interval for many recessive disorders that have one or a few predisposing alleles at a
single locus (Peltonen et al. 1999).
In principle, LD also may aid in positional cloning of genes for complex traits in founder populations, particularly if one or a small number of founder alleles contributes substantially to disease risk, and if the density of markers is sufficiently high. Marker-marker LD may provide an upper bound for the needed marker density by demonstrating the presence of chromosomal regions inherited together since the population was founded or subjected to a bottleneck.
The limited available experimental data report varying evidence of LD
over cM distances. In one study of 50 Finns, up to one half of the
marker pairs within 2 cM showed LD at P < 0.05 (Peterson et al. 1995). However, a study of the X chromosome in 80 Finnish males
reported that only 2 of 16 marker pairs up to 2 cM apart showed
significant LD (P < 0.05) (Laan and Paabo 1997). On
chromosomes 5, 6, and 8, in an average of 986 haplotypes from Irish
schizophrenia pedigrees, significant LD (P < 0.05) was
observed in 81% of marker pairs within 1 cM and 35% of pairs 1 to 2 cM apart (Kendler et al. 1999). A genome-wide survey of LD performed
with 5048 microsatellites in 54 independent chromosomes from European,
Utah, and Amish CEPH families detected significant (P < 0.01)
LD in ~4% of marker pairs within 4 cM of one another (Huttley et al. 1999). Recently, Eaves et al. (2000) assessed LD in a 6.5-cM region of
chromosome 18q21 in samples of 800 chromosomes each from Finland,
Sardinia, the United States, and the United Kingdom. They found LD extending up to 1 cM, with somewhat greater LD in
the Finns and Sardinians. Additional data are needed in various
populations and genomic regions to estimate the extent and variability
of LD and to assess whether the presence of LD will be useful for
fine-mapping complex disease genes.
Type 2 diabetes is a common disorder that causes considerable morbidity
and mortality throughout the world. Evidence for a genetic component in
type 2 diabetes has been obtained using twin and family history studies
(Newman et al. 1987; Rich 1990; Kaprio et al. 1992). Several groups
have identified linkage signals for type 2 diabetes affection status
(Ehm et al. 2000), and one underlying gene has been identified
(Horikawa et al. 2000). The Finland-United States Investigation of
NIDDM Genetics (FUSION) study aims to identify susceptibility genes for
type 2 diabetes and related quantitative traits in the Finnish
population. In this study, we ascertained type 2 diabetes-affected
sibling pairs and additional relatives (Valle et al. 1998). A 10-cM
genome scan on >2000 individuals revealed several regions of suggested
affected sib-pair linkage (Ghosh et al. 2000), including portions of
both arms of chromosome 20 (Ghosh et al. 1999). To better assess the
importance of these regions, we genotyped a total of 43 markers at 2.3 cM average density on this chromosome. These data provide the
opportunity to assess the degree of marker-marker LD and the distances
over which LD extends. We present here microsatellite marker-to-marker LD across chromosome 20 in 1223 individuals. We show that significant intermarker LD is virtually always observed for marker pairs up to 1 cM
apart, and is generally absent for marker pairs at distances greater
than 4 cM apart.
RESULTS
Forty-three microsatellite markers spanning chromosome 20 were
genotyped on 837 unrelated Finns affected with type 2 diabetes and 386 of their spouses. Estimated genetic positions of the markers are shown
in Table 1. The 43 markers have an average
estimated heterozygosity of 0.77 and average spacing of 2.3 cM across
the 99.2-cM chromosome. Table 1 also shows the number of alleles we
observed for each marker and the number of alleles we analyzed when
low-frequency alleles were pooled.
To determine if the affected individuals and their spouses showed different haplotype frequencies, a permutation test of heterogeneity was performed for all 903 possible pairs of the 43 markers. Among these pairs, 12 were significant at the 0.01 level, compared with about nine expected by chance, consistent with randomness. Further, the distribution of the 903 P-values appeared to be uniform across the interval 0 to 1 (data not shown). These results suggested that no important differences in haplotype frequencies exist between the affected individuals and controls, allowing us to pool data across the two samples.
After pooling the diabetic individuals and their spouses to generate a
sample of 1223 individuals, we analyzed all 903 possible marker pairs
using a likelihood ratio test for LD. We plotted the
$-\log_{10}$ P-values of all 903 comparisons in the combined sample as a
function of estimated intermarker distance (Fig. 1a). A total of 55 marker pairs are
significant at the 0.01 level ($-\log_{10}$ P-value greater than 2), compared to nine significant results expected by chance in the
absence of LD. LD between marker pairs less than 10 cM apart is shown
in greater detail (Fig. 1b). We observed significant LD between all
marker pairs within 1 cM, 78% of pairs between 1 and 2 cM, 79% of pairs between 2 and 3 cM, and 39% of pairs between 3 and 4 cM (Table 2). Only 1.4% (12/842) of marker pairs greater than 4 cM apart show significant LD at the 0.01 level, slightly greater than that expected by chance alone.
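As a quick arithmetic check, the sketch below recomputes the per-bin proportions and the number of marker pairs expected to reach P < 0.01 by chance. The counts are those given in the abstract and in this section (the 1-3 cM bin combines the 1-2 and 2-3 cM bins); the variable names and bin labels are illustrative only.

```python
# Significant / total marker pairs per genetic-distance bin, as reported in the text.
bins = {
    "<1 cM": (11, 11),
    "1-3 cM": (25, 32),
    "3-4 cM": (7, 18),
    ">4 cM": (12, 842),
}

ALPHA = 0.01  # significance level used for the LD tests

total_pairs = sum(total for _, total in bins.values())
total_significant = sum(sig for sig, _ in bins.values())
# Number of pairs expected to pass the threshold if no pair were in LD.
expected_by_chance = ALPHA * total_pairs

for label, (sig, total) in bins.items():
    print(f"{label:>7}: {sig}/{total} = {100 * sig / total:.1f}% significant")

print(f"all pairs: {total_significant}/{total_pairs} significant; "
      f"about {expected_by_chance:.0f} expected by chance at P < {ALPHA}")
```

The bin totals add up to the 903 pairs analyzed, and the significant pairs add up to the 55 reported.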
We also measured LD using a D' statistic modified for multiallelic
markers (Hedrick 1987). The rise in D' values for intermarker distances
up to 4 cM is similar to the likelihood ratio P-values (Fig. 2). Importantly, this modified D' statistic
is inversely related to sample size, as can be observed by comparing
samples of 386 spouse controls and 837 affected individuals to the combined
sample. For distant marker pairs not expected to exhibit LD, the
multiallelic D' statistic approaches asymptotes that are larger for
smaller samples (Fig. 2).
The extent of LD depends on recombination rather than physical
intermarker distance, although the presence of LD over kilobase distances can be used during positional cloning studies to determine the marker density needed to detect LD. We used finished and unfinished genomic sequence to estimate the physical distances between our markers. We evaluated the presence of significant LD for all 903 possible pairs of 43 markers, and show marker pairs up to 10 megabases (Mb) apart (Fig. 3). Significant LD at the
0.01 level is observed for 69% (20/29) of marker pairs up to 1 Mb apart,
51% (18/35) between 1 and 2 Mb apart, 12% (4/33) between 2 and 3 Mb
apart, and 6% (2/36) between 3 and 4 Mb apart. Only 1.4% (11/770) of
marker pairs greater than 4 Mb apart showed significant LD. Strongly
significant LD (P < $10^{-10}$) was observed for all
six marker pairs up to 350 kb apart, and for 75% (9/12) of pairs up to
500 kb apart. As expected, the correlation between LD and physical
distance (Spearman rank correlation r = -0.56) is less strong
than with genetic distance (r = -0.63) for markers up to 10 cM apart.
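For readers who want to reproduce this kind of comparison, the following is a minimal sketch of a Spearman rank correlation between intermarker distance and a per-pair LD score. The function names are hypothetical, the input values are toy numbers rather than data from this study, and the choice of LD score (for example a $-\log_{10}$ P-value) is an assumption.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy values only (NOT data from the study): intermarker distance and an
# LD score for each marker pair.
distance = [0.2, 0.5, 1.1, 2.4, 3.0, 6.5, 9.0]
ld_score = [14.0, 11.5, 6.0, 7.2, 2.1, 0.4, 0.9]
rho = spearman(distance, ld_score)
print(f"Spearman r = {rho:.2f}")
```

A value near -1 indicates that the LD score falls almost monotonically as distance grows. Running the same function once with genetic and once with physical distances would give the two coefficients being compared.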
DISCUSSION
We analyzed marker-marker LD across all of chromosome 20 using a
large sample of 1223 unrelated Finnish subjects. We observed significant (P < 0.01) marker-marker LD for all marker pairs
within 1 cM of each other and continued elevated LD for distances up to
4 cM (Fig. 1). We obtained similar conclusions when analyzing the data
based on significance level 0.001 (Table 2) or 0.05 (Fig. 1). We
observed significant LD (P < 0.01) at estimated physical distances up to 4 Mb, with strongly significant LD (P < $10^{-10}$) at distances up to 350 kb. We did not detect regions
of chromosome 20 that exhibited obviously higher or lower LD (data not
shown), although the variable spacing of our markers did not provide
ideal data to address this issue.
Our findings of significant LD cannot be compared directly to results obtained from most other studies using microsatellite markers because the likelihood ratio test is strongly affected by sample size. In particular, our large sample size enables quite sensitive detection of LD. The expected value of the likelihood ratio statistic is proportional to sample size, while P-values are affected even more strongly by sample size. This characteristic is consistent with the smaller number of significant marker pairs observed in the smaller sample of 386 spouses (Table 2).
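The dependence on sample size can be illustrated with a deliberately simplified sketch. It assumes two biallelic markers, known haplotype phase, and hypothetical haplotype frequencies with weak LD, none of which come from this study. For fixed frequencies, the likelihood ratio (G) statistic computed from expected counts is exactly proportional to the number of haplotypes.

```python
import math


def g_statistic(hap_freqs, n_haplotypes):
    """Likelihood ratio (G) statistic for a 2x2 haplotype table whose
    observed counts equal n_haplotypes * hap_freqs (phase assumed known)."""
    p_a = hap_freqs["AB"] + hap_freqs["Ab"]  # frequency of allele A
    p_b = hap_freqs["AB"] + hap_freqs["aB"]  # frequency of allele B
    # Haplotype frequencies expected under linkage equilibrium.
    expected = {"AB": p_a * p_b, "Ab": p_a * (1 - p_b),
                "aB": (1 - p_a) * p_b, "ab": (1 - p_a) * (1 - p_b)}
    return 2.0 * n_haplotypes * sum(
        f * math.log(f / expected[h]) for h, f in hap_freqs.items() if f > 0)


def p_value_1df(g):
    """Upper-tail chi-square probability for 1 degree of freedom."""
    return math.erfc(math.sqrt(g / 2.0))


# Hypothetical haplotype frequencies with weak LD (D = 0.02).
freqs = {"AB": 0.27, "Ab": 0.23, "aB": 0.23, "ab": 0.27}
for n_people in (386, 1223):
    g = g_statistic(freqs, 2 * n_people)  # two haplotypes per individual
    print(f"n = {n_people}: G = {g:.2f}, P = {p_value_1df(g):.2g}")
```

With these toy frequencies the same underlying LD falls short of the 0.01 threshold in a sample of 386 individuals but passes it in a sample of 1223, which is the pattern described above.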
The likelihood ratio statistic shown here detects the presence but does
not evaluate the strength of LD. To measure the strength of LD and to
compare our data with a recent report (Eaves et al. 2000), we computed
a multiallelic extension of the D' statistic (Fig. 2). Although this
statistic is valid for the published comparison of samples of the
same size that were typed using the same markers, we realized that the
statistic is not valid for comparison with our data because it is
strongly affected by sample size and allele frequencies. In general,
smaller sample sizes show larger multiallelic D' values. At intermarker
distances well beyond those where we would reasonably observe LD, we
would expect the multiallelic D' values of different sized samples to
approach the same asymptote. However, multiallelic D' values in our
sample of 386 individuals approached an asymptote almost twice as large
as that observed for 1223 individuals. We also investigated a recently
reported statistic (Zhao et al. 1999) that has the advantage of
providing a quantity with mean near zero under the null
hypothesis of linkage equilibrium; however, the variance of this
statistic still depends on sample size. We are not aware of a good
summary measure that enables accurate comparison of LD between studies that use different sample sizes and markers with differing numbers of
alleles and differing allele frequencies.
We have previously reported evidence for linkage to type 2 diabetes on
chromosome 20 in our Finnish sample (Ghosh et al. 1999). For LD
analysis, one might expect heterogeneity when combining case and
control samples. This would likely be true for a simple Mendelian
disease in the region of the disease locus. However, for a complex
disorder such as type 2 diabetes where a particular susceptibility
variant alone likely has only limited independent impact on disease
risk, we would expect little effect of susceptibility variants on LD
along the chromosome. Thus, the similar extent of LD among individuals
affected with diabetes compared to spouse controls (Table 2) and the
homogeneity of haplotype frequency estimates between these groups are
not surprising.
LD may depend on population substructure, which may be advantageous in
the search for genes conferring susceptibility to a complex trait such
as diabetes. Rare disease alleles in families from the north and east
of Finland travel on haplotypes as large as 15 cM (de la Chapelle and
Wright 1998), suggesting that it may be possible to identify
diabetes-associated haplotypes in similar geographic subsets. Yet a
recent study found no difference between LD in 50 males from
southwestern and eastern regions of Finland, founded ~2000 and ~400
years ago, respectively (Jorde et al. 2000). Two recent reports
detected only modestly increased LD for the isolated Finnish and
Sardinian populations compared to other European or American samples
for microsatellites (Eaves et al. 2000) and single nucleotide
polymorphisms (SNPs) (Taillon-Miller et al. 2000), respectively.
Additional comparative data will help clarify whether LD is increased
in various founder populations, enabling LD to be detected using
smaller sample sizes or a lower density of markers.
Although methods to genotype SNPs are becoming easier and more
accessible (Chen et al. 1998; Ryan et al. 1999; Griffin and Smith 2000), a high-density map of microsatellite markers remains extremely
useful for studies of complex traits. The presence of multiple alleles
allows haplotypes to be inferred more easily, generating more power to
detect LD, although microsatellite marker mutation breaks down
conserved haplotypes, and can also reduce the extent of LD. A
significant challenge for assessing LD with microsatellite markers is
the difficulty in generalizing LD statistics to multiple alleles, as
described above.
We evaluated marker-to-marker LD as part of our investigation of disease variant-to-marker LD. Ideally, a threshold level of LD obtained between neighboring markers would indicate that the interval had been effectively scanned for an associated disease variant, although whether such a threshold can be found remains unclear. It seems highly likely that an interval lacking significant marker-to-marker LD has not yet been scanned thoroughly. Thus, to proceed with fine-mapping of a type 2 diabetes susceptibility gene on chromosome 20, we will add markers to regions where less marker-to-marker LD is observed while searching for alleles or haplotypes associated with disease. Additional empirical data are needed to determine the marker density required for association studies of complex diseases.
METHODS
Sample
The FUSION study design and first phase of sample recruitment have
been described previously (Valle et al. 1998). Briefly, we sampled 580 families ascertained through a type 2 diabetes-affected sibling pair,
and collected additional affected and unaffected relatives. A second
cohort includes additional family members as well as a separate set of
275 families. All individuals studied are believed to be of Finnish
heritage based on their birthplaces and their grandparents'
birthplaces within Finland. In this report we used genotype data from
837 unrelated affected individuals and 386 unrelated spouses. Two
hundred six of these spouses tested unaffected by oral glucose
tolerance test (OGTT) in our study, 160 reported themselves to be
unaffected, and 20 had an unknown phenotype.
Markers
We selected 43 microsatellite markers, including 39 dinucleotide
and 4 tetranucleotide repeats, from chromosome 20 maps and genotyped
them as previously described (Ghosh et al. 1997). We estimated
heterozygosities and the number of alleles from an average of 1142 unrelated individuals per marker. These heterozygosities are very
similar to those we reported previously (Ghosh et al. 1999), but differ
slightly because we analyzed an expanded sample. Data for all markers
are consistent with Hardy-Weinberg equilibrium.
Maps
We estimated sex-averaged genetic maps from combined data on 983 individuals in 205 FUSION-extended families and from cleaned genotypes
of CEPH pedigrees (Broman et al. 1998). The genetic map used in these
analyses differs slightly from a previously reported map (Ghosh et al. 1999) because additional FUSION data have now been incorporated. This
marker order is consistent with the available mapping data produced by
the Chromosome 20 Mapping Group at the Sanger Centre, which were obtained
from the World Wide Web at http://www.sanger.ac.uk/HGP/Chr20/.
We estimated a physical map using chromosome 20 clone sequences available from the Sanger Centre and GenBank as of October 30, 2000. Most sequence data were produced by the human Chromosome 20 Sequencing Group at the Sanger Centre, and can be obtained from ftp://ftp.sanger.ac.uk/pub/human/chr20/. Other data were obtained from GenBank (ftp://ncbi.nlm.nih.gov/genbank/genomes/H_sapiens/CHR_20/). We determined the nucleotide positions of our markers on individual clones using electronic PCR and BLAST (http://www.ncbi.nlm.nih.gov/), and used an unpublished clone order (P. Deloukas, pers. comm.) to generate a contiguous map. We assumed the size of six contig gaps to be 100 kb each, and the size of 12 unfinished BACs to be their current sequence length. We also assumed that 100 nucleotides of each clone overlap the adjacent clone, and that all finished clones are numbered according to their orientation on the chromosome. The resulting physical map, which starts at marker D20S103 and ends at marker D20S173, is 59 Mb long.
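The map-building assumptions above (a 100-nucleotide overlap between adjacent clones and 100 kb for each contig gap) can be sketched as follows. The clone names, lengths, and marker hits are hypothetical placeholders, and the helper functions are illustrative rather than the procedure actually used to build the map.

```python
OVERLAP_NT = 100   # assumed overlap between adjacent clones (from the text)
GAP_NT = 100_000   # assumed size of each contig gap (from the text)


def clone_offsets(clones):
    """Return {clone_name: start position} for an ordered list of
    (name, length_nt, gap_before) tuples. gap_before is True when a contig
    gap separates the clone from the previous one."""
    offsets = {}
    position = 0
    previous_length = None
    for name, length, gap_before in clones:
        if previous_length is not None:
            position += previous_length
            # Across a gap, add the assumed gap size; otherwise subtract
            # the assumed overlap with the previous clone.
            position += GAP_NT if gap_before else -OVERLAP_NT
        offsets[name] = position
        previous_length = length
    return offsets


def marker_position(offsets, clone, position_in_clone):
    """Map-level coordinate of a marker found at position_in_clone on a clone."""
    return offsets[clone] + position_in_clone


# Hypothetical clones and marker hits, for illustration only.
clones = [("cloneA", 150_000, False),
          ("cloneB", 120_000, False),
          ("cloneC", 90_000, True)]
offsets = clone_offsets(clones)
m1 = marker_position(offsets, "cloneA", 40_000)
m2 = marker_position(offsets, "cloneC", 10_000)
print(f"intermarker distance: {(m2 - m1) / 1000:.1f} kb")
```

The sketch assumes every clone is oriented along the chromosome, as the text assumes for finished clones. Unfinished clones would enter with their current sequence length.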
Tests of Linkage Equilibrium between Pairs of Markers
We formed 903 = (43*42)/2 marker pairs by considering all
possible pairs of the 43 microsatellite markers. For each marker pair,
we performed joint analyses for the 837 diabetic individuals and their
386 spouse controls. Maximum likelihood estimates for allele
frequencies were obtained for each of the 43 markers by allele
counting. Maximum likelihood estimates of haplotype frequencies for the
903 marker pairs were obtained by employing the
expectation-maximization (E-M) algorithm. We used a likelihood ratio
statistic to test for LD. This statistic compared the maximum
probability of the observed sample of genotypes allowing for LD (based
on estimates of haplotype frequencies) to the maximum probability of
the observed sample assuming linkage equilibrium (based on estimates of
haplotype frequencies calculated as products of the relevant allele
frequency estimates). The statistic $-2$ times the natural
logarithm of this likelihood ratio is asymptotically distributed as
$\chi^2$, with degrees of freedom equal to $(n_1 - 1)(n_2 - 1)$, where
$n_1$ was the number of alleles used for the first
marker, and $n_2$ was the number of alleles used for
the second marker. To reduce the effects of rare alleles and to
preserve the power to reject the null hypothesis of linkage equilibrium
by limiting the degrees of freedom of the test statistic, we pooled
alleles with frequencies less than 0.05.
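The procedure described in this paragraph can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the software used for the analyses: all function names are hypothetical, the E-M iteration starts from uniform haplotype frequencies, both markers are assumed to be polymorphic, and pooling of alleles with frequency below 0.05 is assumed to have been done beforehand.

```python
import math
from collections import Counter
from itertools import product


def phase_resolutions(genotype):
    """The two ways to pair the alleles of an unphased two-marker genotype."""
    (a1, a2), (b1, b2) = genotype
    return [((a1, b1), (a2, b2)), ((a1, b2), (a2, b1))]


def log_likelihood(genotypes, freqs):
    """Log-likelihood up to an additive constant that does not depend on freqs."""
    return sum(
        math.log(sum(freqs[h1] * freqs[h2] for h1, h2 in phase_resolutions(g)))
        for g in genotypes)


def em_haplotype_freqs(genotypes, max_iter=1000, tol=1e-12):
    """E-M estimates of two-marker haplotype frequencies from unphased genotypes."""
    alleles_a = sorted({a for ga, _ in genotypes for a in ga})
    alleles_b = sorted({b for _, gb in genotypes for b in gb})
    haplotypes = list(product(alleles_a, alleles_b))
    freqs = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform start
    for _ in range(max_iter):
        counts = Counter()
        for g in genotypes:  # E-step: expected haplotype counts per individual
            resolutions = phase_resolutions(g)
            weights = [freqs[h1] * freqs[h2] for h1, h2 in resolutions]
            total = sum(weights)
            for (h1, h2), w in zip(resolutions, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        new = {h: counts[h] / (2 * len(genotypes)) for h in haplotypes}  # M-step
        change = max(abs(new[h] - freqs[h]) for h in haplotypes)
        freqs = new
        if change < tol:
            break
    return freqs, alleles_a, alleles_b


def equilibrium_freqs(genotypes, alleles_a, alleles_b):
    """Haplotype frequencies under linkage equilibrium: products of
    allele frequencies obtained by allele counting."""
    n = 2 * len(genotypes)
    count_a = Counter(a for ga, _ in genotypes for a in ga)
    count_b = Counter(b for _, gb in genotypes for b in gb)
    return {(a, b): (count_a[a] / n) * (count_b[b] / n)
            for a in alleles_a for b in alleles_b}


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df >= 1, built from the
    recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    while a < df / 2:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1))
        a += 1
    return q


def ld_test(genotypes):
    """Likelihood ratio statistic, degrees of freedom, and asymptotic P-value."""
    freqs, alleles_a, alleles_b = em_haplotype_freqs(genotypes)
    null = equilibrium_freqs(genotypes, alleles_a, alleles_b)
    stat = 2.0 * (log_likelihood(genotypes, freqs) - log_likelihood(genotypes, null))
    df = (len(alleles_a) - 1) * (len(alleles_b) - 1)
    return max(stat, 0.0), df, chi2_sf(stat, df)


# Toy sample (not study data): each genotype is ((a1, a2), (b1, b2)).
toy = [((1, 1), (1, 1))] * 10 + [((2, 2), (2, 2))] * 10 + [((1, 2), (1, 2))] * 20
stat, df, p_value = ld_test(toy)
print(f"LR statistic = {stat:.2f}, df = {df}, P = {p_value:.2g}")
```

Each individual contributes the two possible phase resolutions of the genotype. For anyone who is not a double heterozygote the two resolutions coincide, so the likelihood is only rescaled by a constant that cancels in the ratio. A uniform start can stall when every individual is a double heterozygote, so a production implementation would likely try several starting points.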
Prior to pooling the data from the affected individuals and their spouses, we carried out a likelihood ratio test of homogeneity of haplotype frequencies for each pair of markers. For this test we estimated the haplotype frequencies separately for the affected individuals and their spouses using the E-M algorithm. The likelihood ratio statistic compared the product of the maximum likelihoods for the two samples to the maximum likelihood for the two samples analyzed jointly. Again, alleles with frequencies <0.05 in the joint sample were pooled. Because of small estimated haplotype frequencies, we assessed significance levels using a permutation test. For each of the 903 marker pairs, we generated 100 replicate samples by permutation. To construct these permuted samples, we randomly permuted affection status of the 1223 individuals, keeping the marker data the same. For each permuted sample, we calculated the likelihood ratio test for homogeneity of haplotype frequencies. Based on the 100 permuted samples for a marker pair, we estimated the P-value in the test for homogeneity of haplotype frequencies as the proportion of permuted-data statistics greater than the observed-data statistic.
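The permutation scheme described above can be sketched generically. The assumptions are stated loudly: the function names are hypothetical, the data are toy values, and the statistic shown is a simple allele-count G statistic standing in for the haplotype-frequency likelihood ratio statistic, which would be plugged in the same way.

```python
import math
import random
from collections import Counter


def g_homogeneity(alleles, labels):
    """2 x K likelihood ratio (G) statistic comparing allele counts between
    groups. A stand-in for the haplotype-frequency statistic in the text."""
    groups = sorted(set(labels))
    table = {g: Counter() for g in groups}
    for allele, group in zip(alleles, labels):
        table[group][allele] += 1
    n = len(alleles)
    column_totals = Counter(alleles)
    g_stat = 0.0
    for group in groups:
        row_total = sum(table[group].values())
        for allele, observed in table[group].items():
            expected = row_total * column_totals[allele] / n
            g_stat += 2.0 * observed * math.log(observed / expected)
    return g_stat


def permutation_p_value(statistic, data, labels, n_perm=100, seed=0):
    """Proportion of label-permuted statistics greater than the observed one.
    The marker data stay fixed; only group labels are shuffled."""
    rng = random.Random(seed)
    observed = statistic(data, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if statistic(data, shuffled) > observed:
            exceed += 1
    return exceed / n_perm


# Toy example (not study data): alleles completely separated by group.
alleles = ["A"] * 20 + ["B"] * 20
labels = ["affected"] * 20 + ["spouse"] * 20
print(permutation_p_value(g_homogeneity, alleles, labels))
```

With 100 replicates the estimated P-value has a resolution of 0.01, so an estimate of zero means only that no permuted statistic exceeded the observed one.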
To measure the strength of disequilibrium, we used Lewontin's D'
(Lewontin 1964) modified for multiple alleles (Hedrick 1987). For
two-allele markers, D' is the standardized disequilibrium value that
takes the usual disequilibrium coefficient $P(A_iB_j) - P(A_i)P(B_j)$ and divides it by its maximal
possible value. Given multiple alleles, we calculate the weighted
average of the D' values where the weights are the products of the
corresponding allele frequencies. That is,
$$D' = \sum_i \sum_j p_i q_j \, |D'_{ij}|,$$
where $p_i$ and $q_j$ are allele frequencies at the two loci of interest, and $D'_{ij}$ is the standardized disequilibrium coefficient based on alleles $A_i$ and $B_j$.
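The weighted average defined above can be computed as in the following sketch. It assumes phased haplotypes are available as a list of allele pairs, whereas the analyses in this paper relied on estimated haplotype frequencies, and the function names are hypothetical. The short simulation draws alleles independently at the two markers, so any nonzero D' it reports arises from sampling alone.

```python
import random
from collections import Counter


def multiallelic_d_prime(haplotypes):
    """Weighted multiallelic D' (Hedrick 1987) from phased
    (allele_a, allele_b) haplotypes."""
    n = len(haplotypes)
    hap_counts = Counter(haplotypes)
    p = Counter(a for a, _ in haplotypes)
    q = Counter(b for _, b in haplotypes)
    total = 0.0
    for a, count_a in p.items():
        for b, count_b in q.items():
            pi, qj = count_a / n, count_b / n
            d = hap_counts[(a, b)] / n - pi * qj
            # Maximal possible |D_ij| given the allele frequencies.
            if d < 0:
                d_max = min(pi * qj, (1 - pi) * (1 - qj))
            else:
                d_max = min(pi * (1 - qj), (1 - pi) * qj)
            if d_max > 0:
                total += pi * qj * abs(d) / d_max
    return total


def mean_null_d_prime(n_haplotypes, n_alleles=4, n_reps=50, seed=1):
    """Average D' over simulated samples with no LD (independent alleles)."""
    rng = random.Random(seed)
    alleles = range(n_alleles)
    values = []
    for _ in range(n_reps):
        sample = [(rng.choice(alleles), rng.choice(alleles))
                  for _ in range(n_haplotypes)]
        values.append(multiallelic_d_prime(sample))
    return sum(values) / n_reps


# Two haplotypes per individual for samples of 386 and 1223 people.
small = mean_null_d_prime(2 * 386)
large = mean_null_d_prime(2 * 1223)
print(f"mean D' without LD: n=386 -> {small:.3f}, n=1223 -> {large:.3f}")
```

In this simplified setting with four equally frequent alleles, the average D' in the absence of LD is larger for the smaller sample, in line with the sample-size dependence discussed in the text.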
ACKNOWLEDGMENTS
We thank the Finnish citizens who volunteered to participate in the FUSION study. This project was made possible by intramural funds from the National Human Genome Research Institute (project no. OH95-C-N030). The work in Finland was partially supported by the Finnish Academy (38387 and 46558). K.L.M. is a recipient of a Burroughs Wellcome Fund Career Award in the Biomedical Sciences. E.M.L. was supported previously by NIH training grant HG00040. R.M.W. was supported previously by an individual NRSA from the NIH (DK09525), and is now supported by a Career Development Award from the ADA. M.B. is supported by NIH grant HG00376.
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
FOOTNOTES
6 These authors contributed equally to this work.
Present addresses: 7Department of Public Health Sciences, Wake Forest University School of Medicine, Winston-Salem, NC 27157-1063, USA; 8 The Max McGee National Research Center for Juvenile Diabetes, Children's Hospital of Wisconsin, and Department of Pediatrics, Medical College of Wisconsin, Milwaukee, WI 53226, USA; 9Science and Technology Division, Corning Incorporated, Corning, NY 14831, USA; 10Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90089, USA.
11 Corresponding author.
E-MAIL boehnke{at}umich.edu; FAX (734) 763-2215.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.173201.
REFERENCES
Received November 29, 2000; accepted in revised form April 17, 2001.