Published online before print
May 12, 2004, 10.1101/gr.2165904 Genome Res. 14:1076-1084, 2004 ©2004 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/04 $5.00
Letter
Putative Ancestral Origins of Chromosomal Segments in Individual African Americans: Implications for Admixture Mapping
1 Rowe Program in Human Genetics, Departments of Biological Chemistry and Medicine, University of California at Davis, Davis, California 95616-8669, USA
2 National Human Genome Center at Howard University, Washington, District of Columbia 20060, USA
3 Rosalind Russell Medical Research Center for Arthritis, University of California at San Francisco, San Francisco, California 94143, USA
Theoretically, markers that distinguish European from West African ancestry can be used to examine the origin of chromosomal segments in individual African Americans. In this study, putative ancestral origin was examined by using haplotypes estimated from genotyping 268 African Americans for 29 ancestry informative markers spaced over a 60-cM segment of chromosome 5. Analyses using a Bayesian algorithm (STRUCTURE) provided evidence that blocks of individual chromosomes derive from one or the other parental population. In addition, modeling studies were performed by using hidden real marker data to simulate patient and control populations under different genotypic risk ratios. Ancestry analysis showed significant results for a genotypic risk ratio of 2.5 in the African American population for modeled susceptibility genes derived from either putative parental population. These studies suggest that admixture mapping in the African American population can provide a powerful approach to defining genetic factors for some disease phenotypes.
Admixture mapping methods have the potential power to map susceptibility genes in complex genetic diseases and phenotypes (Briscoe et al. 1994
Previous studies have demonstrated that strong linkage disequilibrium (LD) in the admixed African American (AA) populations can be detected between markers separated by >15 cM (Lautenberger et al. 2000
Furthermore, the ability to define ancestry of chromosomal segments in admixed individuals may provide additional power over directly examining linkage disequilibrium. Conceptually, this approach examines the probability of linkage of a trait with the ancestry of a chromosomal location. Theoretically, the ancestral derivation of a particular chromosomal location can be inferred from combining information from multiple loci in the surrounding genome. Because markers separated by >50 kb are unlikely to be in linkage disequilibrium in the parental populations, the ancestry information can be combined by algorithms that condition on linkage disequilibrium created by admixture. Recent studies have developed such computational algorithms and have provided preliminary confirmation that the implementation of this approach in the program STRUCTURE can uncover ancestry relationships (Falush et al. 2003
The current study was initiated to further investigate the ability to assign ancestry for a chromosomal segment in the AA population by using a dense set of ancestry informative markers (AIMs) similar to what can be reasonably achieved in genome-wide studies. For this investigation, we applied the program STRUCTURE to examine the probability of ancestry in unrelated individuals using estimated haplotype data. This differs from the initial STRUCTURE studies of Falush et al. (2003
Estimating Haplotypes in European American (EA), African (AF), and African American (AA) Subjects
Previous studies in our laboratory identified and characterized 14 diallelic EA/AF AIMs that were included within a 60-cM segment of human chromosome 5. We reasoned that haplotypes and chromosomal segment structure analysis would be enhanced by the inclusion of additional AIMs. Review of The SNP Consortium (TSC; http://snp.cshl.org/) and Applied Biosystems (https://myscience.appliedbiosystems.com/) databases suggested that additional SNP AIMs in this region could be easily confirmed by using Assay on Demand SNPs and reagents. As shown in Table 1, an additional 15 AIMs were validated by typing >90 AF and >90 EA subjects. Together, 29 markers (mean EA/AF δ = 0.57; mean EA/AF f = 0.37) spanning a 61-cM distance from 100 to 160.6 cM were selected for the subsequent haplotype and structure studies. The median distance between adjacent markers was 1.2 cM and 1.5 Mb.
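The informativeness of an individual AIM can be illustrated with a short sketch. It assumes that the two summary statistics quoted for the marker set are the absolute allele frequency difference between the parental populations and the f value in the sense of McKeigue (1998), f = (pEA - pAF)^2 / [4 p(1 - p)], where p is the mean of the two parental frequencies. The function names and the example frequencies are hypothetical and are not taken from Table 1.

```python
def allele_frequency_difference(p_ea: float, p_af: float) -> float:
    """Absolute difference in allele frequency between the two parental populations."""
    return abs(p_ea - p_af)


def f_value(p_ea: float, p_af: float) -> float:
    """Assumed f value: squared frequency difference scaled by 4 * p_bar * (1 - p_bar).

    p_bar is the mean of the two parental allele frequencies. The value is 1.0
    for a marker fixed for different alleles in the two populations and 0.0
    for a marker with identical frequencies.
    """
    p_bar = (p_ea + p_af) / 2.0
    if p_bar <= 0.0 or p_bar >= 1.0:
        # Allele absent or fixed in both populations: no ancestry information.
        return 0.0
    return (p_ea - p_af) ** 2 / (4.0 * p_bar * (1.0 - p_bar))


if __name__ == "__main__":
    # Hypothetical marker: allele frequency 0.90 in EA and 0.20 in AF.
    print(allele_frequency_difference(0.90, 0.20))
    print(f_value(0.90, 0.20))
```

Summing f over the markers in a window gives one possible measure of cumulative information per block, but the windowing used in the study is not specified here.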
AF, EA, and AA subjects were genotyped for these markers, and the haplotypes were estimated for each individual in each population separately by using the PHASE program (Stephens et al. 2001
Examining Ethnic Ancestry Across a Chromosomal Region
For the parental populations, the STRUCTURE analysis showed an overwhelming predominance of chromosomal segments derived from population 1 (positive Ln ratios) for the AF subjects and population 2 (negative Ln ratios) for the EA subjects (Fig. 1). For the AF subjects, only four of 90 individuals showed any chromosomal blocks with LnPR < -2.0. Conversely, there were only two chromosomal blocks from the 180 EA chromosomes that appeared to derive from population 1. For AF subjects, the LnPR exceeded 2.0 for 95% of the loci of individual chromosomes and exceeded 5.0 for 70%. Similarly, in EA subjects, >95% of the loci showed LnPR < -2.0, and 65% showed LnPR < -5.0.
In contrast to the results for the putative parental populations, the AA haplotypes showed substantial contributions from both populations (Fig. 1). As expected, the contribution from population 1 (corresponding to the putative AF population) was much greater than that from population 2 (corresponding to the putative EA population). The mean contribution of ancestry from population 1 was 78%; from population 2, 22%. For many of the chromosomal loci, the probability of correct ancestry determination was high: LnPR > 5.0 for 33% of the loci and LnPR < -5.0 for 5% of the loci. For the majority of loci, the LnPR was either > 2.0 (66%) or < -2.0 (13%).
Because there were many segments for which ancestry was uncertain, it is difficult to precisely define the length of the segments in AA derived from each population. The vast majority of chromosomal blocks that were derived from each population were >15 cM: For the 268 AA subjects, there was a total of 29 segments derived from population 2 (EA) that were <15 cM (distance defined as the length of continuous LnPR of < -2.0). A recombination frequency between ancestry assignments was estimated at 0.0685 in the AA population, suggesting that, on average, an admixture event took place 6.9 generations ago, assuming a hybrid isolation model (see Discussion).
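The thresholds used in this section can be made concrete with a minimal sketch. It assumes that the LnPR at a locus is the natural log of the ratio of the posterior probability of population 1 ancestry to that of population 2 ancestry, and that the recombination frequency between ancestry assignments is expressed per centimorgan, so that under a hybrid isolation model the number of generations is approximately that rate per Morgan. Both are assumptions made for illustration, and all function names are hypothetical.

```python
import math


def ln_probability_ratio(p_pop1: float, eps: float = 1e-12) -> float:
    """Assumed LnPR: ln[P(population 1) / P(population 2)] at one locus."""
    p = min(max(p_pop1, eps), 1.0 - eps)  # clamp to avoid division by zero
    return math.log(p / (1.0 - p))


def classify_locus(lnpr: float, threshold: float = 2.0) -> str:
    """Assign ancestry only when the LnPR is beyond +/- threshold."""
    if lnpr > threshold:
        return "population 1 (AF)"
    if lnpr < -threshold:
        return "population 2 (EA)"
    return "uncertain"


def generations_since_admixture(rate_per_cm: float) -> float:
    """Hybrid isolation approximation: ancestry switches per Morgan ~ generations."""
    return rate_per_cm * 100.0


if __name__ == "__main__":
    for p in (0.99, 0.60, 0.02):
        lnpr = ln_probability_ratio(p)
        print(p, round(lnpr, 2), classify_locus(lnpr))
    print(generations_since_admixture(0.0685))
```

With a threshold of 2.0, a locus is assigned only when one ancestry is roughly 7.4 times (e^2) more probable than the other, which is why a sizable fraction of loci can remain unassigned.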
The AA haplotypes were also examined without the putative parental populations by using the same STRUCTURE parameters. The results were nearly the same, with a similar distribution and ancestry probability of the chromosomal segments. However, without putative parental information, the specific ancestry (African or European) of each chromosomal segment cannot be assigned without additional analyses (inspection of specific genotypes in the context of parental allele frequency information). In contrast, when these analyses were performed with fewer AIMs (20 rather than 29), the number of ambiguous segments was substantially greater when AA haplotypes were examined without the EA and AF haplotypes (data not shown). When the number of AIMs was decreased even further (<15), the segment assignments for the AA haplotypes were much more ambiguous, with >30% of the chromosomal segments showing absolute LnPRs <2.0 even when the EA and AF parental haplotypes were included.
Admixture Mapping Using Simulated Cases and Controls
For both the EA and AF susceptibility models, there were peaks in the respective LnPRs (cases minus controls) at the chromosomal location of the modeled markers (Fig. 2). Evidence for the 2.5 RR models was not as strong as for the 4.0 RR models but was still detectable. The peaks were close to the modeled loci: For model 1, the hidden locus was at 137.1 cM and the peak probability ratio was at 136 cM; for model 2, the hidden locus was at 131.5 cM and the peak probability ratio was at 131.2 cM.
For these models, the P values were assessed by a comparison of the Ln odds ratio (OR) score between cases and controls using the Wilcoxon rank sum test. Median P values were determined from 100 random samplings of 300 cases and 300 controls from the 500 simulated cases and 500 simulated controls for each model. This assessment of the P value was chosen to minimize aberrant results from sampling variation. For model 1 (AF susceptibility gene) and a sample size of 300 cases and 300 controls, the P values at the best location for RR 4.0 and RR 2.5 (136 cM; Fig. 2) were 2.1 x 10^-14 and 2.8 x 10^-5, respectively. For model 2 (EA susceptibility gene), the P values for RR 4.0 and RR 2.5 were 2.3 x 10^-10 and 3.7 x 10^-5 at the best location (131.2 cM). These P values are still highly significant after conservative Bonferroni adjustment for the 29 loci examined. Examination of flanking markers showed that the significance of the P values, as expected, decreased with the distance from the maximum case-control LnPR. An interval defined by a two-order-of-magnitude decrease in the confidence limit was 12 cM for the AF model RR 2.5 and 21 cM for the EA model RR 2.5. As noted in the Discussion, the P values provided here may be substantially lower than those from a true data set because of the limited original sample size (268 typed AA subjects) and the resampling of multiple subjects in the simulations.
Finally, we examined the results obtained for the same models when LD is examined rather than putative assignment of ancestry. For these analyses, the log odds ratio was determined by comparing the alleles of each of the 29 markers in the 500 simulated cases with the 500 simulated controls (Fig. 3). In contrast to the results obtained by using the ancestry estimations, the analysis showed inconsistent results. For example, although a strong signal was observed for model 1 RR = 2.5 at the correct location, the peak signal for model 1 RR = 4.0 was >10 cM from the simulated susceptibility gene.
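The resampling scheme described for the P values can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each subject contributes one Ln odds ratio score at the locus being tested, that the 300 subjects in each draw are sampled without replacement, and that a two-sided Wilcoxon rank sum test with a normal approximation and tie correction is adequate at this sample size. All function names are hypothetical.

```python
import math
import random
import statistics


def rank_sum_p_value(x, y):
    """Two-sided Wilcoxon rank sum P value (normal approximation, tie-corrected)."""
    pooled = sorted((value, group) for group, sample in enumerate((x, y))
                    for value in sample)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values occupy ranks i+1 .. j and all receive the average rank.
        average_rank = (i + 1 + j) / 2.0
        ties = j - i
        tie_term += ties ** 3 - ties
        rank_sum_x += average_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    mean = n1 * (n + 1) / 2.0
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return 1.0
    z = (rank_sum_x - mean) / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2.0))


def median_resampled_p(cases, controls, n_draw=300, n_rep=100, seed=0):
    """Median P value over repeated random subsamples of cases and controls."""
    rng = random.Random(seed)
    p_values = [
        rank_sum_p_value(rng.sample(cases, n_draw), rng.sample(controls, n_draw))
        for _ in range(n_rep)
    ]
    return statistics.median(p_values)


def bonferroni(p_value, n_tests=29):
    """Conservative adjustment for the number of loci examined."""
    return min(1.0, p_value * n_tests)
```

Taking the median over the 100 subsamples damps the effect of any single unusual draw, and the Bonferroni step simply multiplies each P value by the 29 loci tested.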
In the current study, we show that the West African or European ancestry of chromosomal blocks can be assigned with high probability in the admixed AA population. By using unrelated AA individuals with or without parental information, haplotypes estimated from AIMs typing data were translated into chromosome blocks of ancestral inheritance by using a program, STRUCTURE, which uses clustering algorithms. This is the first demonstration of such chromosomal blocks, although their theoretical existence has been the basis of significant effort in developing admixture mapping tools and techniques. Simulations of cases and controls based on hidden markers provided strong evidence of the accuracy of ancestry assignment as well as the applicability of these methods to admixture mapping of ancestry-linked traits. The ability to assign ancestry to chromosomal segments in an admixed population is dependent on (1) markers with large frequency differences between the contributing ancestral populations, (2) approximation of the number of generations and/or model of admixture, and (3) a sufficiently dense map of informative markers. These conditions are not independent because, for example, the number of generations since admixture and the information content of each marker will determine the density of markers required to accurately assign ancestry to chromosomal blocks.
With respect to the first condition, previous studies as well as the current study have demonstrated that markers that distinguish very large ancestry differences can be readily identified and characterized (Shriver 1997; Smith 2001; Collins-Schramm et al. 2002a
Second, the history of admixture including the number of events over multiple generations and the number of generations since admixture is a critical variable in defining the ancestry of chromosomal blocks. A previous study has provided support for a continuous gene flow model to explain admixture in AA when this model is compared with hybrid isolation for an admixture event occurring 15 generations ago (Pfaff et al. 2001
Third, the current study provides some empirical guidelines with respect to the practical requirements for AIM density and informativeness for initial characterization of chromosomal blocks in the AA population. For the studies herein, with the exception of the extreme ends of the chromosomal segment examined, for each 10-cM or 10-Mb block a cumulative f > 1.5 was achieved. When this density was reduced by
It is also worth noting that the best map position of AIMs in a specific admixed population may not be as simple as defining the megabase position or the interpolated genetic map position within the commonly used Genethon or Marshfield map. Hot and cold regions for meiotic recombination favor the use of genetic maps. However, these maps are largely or totally based on analyses of European or EA families and may not accurately reflect the meiotic recombination in other ethnic groups. For the current study, the results obtained using centimorgan or megabase positions were nearly the same, suggesting this may not be critical (data not shown). However, it is not clear whether this may be a critical factor in other genomic regions.
As part of the current study, we used the assignment of ancestry blocks in simulated cases and controls both to support the validity of these assignments and to test a potential method for admixture mapping of traits. In these analyses, a genotypic risk ratio (GRR) of 2.5 in the AA population could be linked to chromosomal blocks both for a locus modeled on a risk gene originating in AF (present in 25% of AF and 0% of EA) and for a locus modeled on a risk gene originating in EA (present in 4% of AF and 31% of EA). It is, of course, uncertain what the actual frequency of risk genes will be in the two populations for different diseases or traits. However, current evidence suggests that many regions of the genome contain sequence variation between major ethnicities that may be the result of selection in one or both populations (Akey et al. 2002
In the current study, very similar results could be obtained without using putative parental population haplotypes. At first, this finding appears surprising given the difficulty in assessing admixture proportions in mixed populations without parental genotypic information. However, in the current study the markers were all tightly linked, presumably allowing the STRUCTURE clustering algorithm to more correctly assign ancestry under the linkage model conditions used. This linkage model, by grouping the linked alleles that must come from the same population, can provide more accurate estimates of the ancestry vector (Falush et al. 2003
This study also had several limitations. First, the study examined only a single genomic region. Second, the uncertainty in haplotype assignment was not accounted for in the STRUCTURE analysis. Notably, the simulated model examined also showed similar results when genotype data rather than the estimated haplotypes were analyzed. However, the genotype data do not provide an explicit examination of ancestry of both chromosomes for a given chromosomal segment but rather estimate the probability for the population of origin of the maternal and paternal alleles. Under the genotype model, the ancestry haplotype of the segment cannot be derived. At present, it is unclear whether estimation of haplotypes will improve the power of admixture mapping, and this issue, including an approach to account for uncertainty in marker haplotypes, is under further investigation.
The third, and perhaps most important, limitation is that the simulations were based on the typing results of only 268 subjects and thus required multiple sampling of the same subjects to obtain the sample sizes used for the 500 simulated cases and 500 simulated controls for each model. For the 2.5 GRR models, a total of 209 and 222 of the 268 subjects were represented in the cases for the two models, respectively, compared with 267 and 268 of the 268 subjects that were represented in the simulated controls for these models. Of more potential concern, due to chance, 10 of the subjects were represented >5 times (maximum, nine times) in the cases for model 1, and 14 of the subjects were represented >5 times (maximum, 10 times) in the cases for model 2. Because of this limitation, the P values presented in this article are likely to be lower (more significant) than those that would be obtained with real data sets. Additional and more extensive studies of multiple genomic regions will therefore be needed to prove the general applicability of the current results.
In the current study, the linkage of modeled susceptibility genes to the estimated ancestry of chromosomal segments provided a much more powerful approach than did examination of LD to specific markers. The latter was inconsistent, presumably based on the limited information provided by each marker and the chance that some individual markers will not be in strong LD with the modeled susceptibility gene. For example, in model 2, GRR = 4.0, the marker closest to the modeled susceptibility gene showed a log OR of 0.25, whereas a marker 10 cM proximal to the modeled susceptibility gene showed a log OR of
Previous studies have examined the potential power of admixture with respect to population risk ratios (McKeigue 1998
GRRs in the admixed population and disease allele frequencies in the parental populations can also be related to disease prevalence that can be attributed to the putative susceptibility genes, provided that a probability of disease in noncarriers is estimated (Fig. 4C,D). If we assume a disease probability in noncarriers of 5 x 10^-4, the simulated disease loci for model 1 GRR 2.5 and model 1 GRR 4.0 correspond to disease prevalences of 9 x 10^-4 and 1.5 x 10^-3, respectively. For model 2 GRR 2.5 and GRR 4.0, the corresponding disease prevalences are 6 x 10^-4 and 8 x 10^-4, respectively. Thus, under certain disease allele frequencies, the current study supports the ability of the admixture mapping approach to define susceptibility genes that can account for relatively small differences in disease prevalence.
In this study, a possible method for admixture mapping in a case/control study is illustrated. Other methods of admixture mapping are currently under development in several laboratories. These include (1) a method in which linkage is examined by conditioning on the estimated admixture of both parents by using a multipoint analysis of the marker data informative for ancestry, even when parental data is missing (McKeigue 1998
Populations and Samples
Blood- or buccal-cell samples were obtained from all individuals, according to protocols and informed-consent procedures approved by institutional review boards, and were labeled with an anonymous code number. DNA samples were prepared from blood or buccal cells as previously described (Bali et al. 1999
Markers and Conditions
The indel markers were amplified by using a standard PCR protocol previously reported (Collins-Schramm et al. 2002b
Fourteen of the SNPs were genotyped by TaqMan assays scanned on an ABI 7900 Analyzer (PE Applied Biosystems). For 12 of the SNPs, the manufacturer's conditions and reagents for Assays-on-Demand were used (PE Applied Biosystems, CV numbers provided in Table 1). For TSC0569173, ABI Primers-by-Design (PE Applied Biosystems) were obtained and assayed by using the same conditions as the Assays-on-Demand SNPs. The TaqMan assays were analyzed by using software (SDS v.2.0) provided by the manufacturer (PE Applied Biosystems).
The TSC0232289 and TSC0237153 SNPs were assayed by a primer extension method using the ABI Prism SNaPshot Multiplex System kit (PE Applied Biosystems) and the ABI 3700 DNA Analyzer. The PCR primers for TSC0232289 were as follows: forward, 5'-CCAACCCCTTTACTAGGCACAT-3'; reverse, 5'-GGGAATCCCAGGGAATACGTTA-3'. The primer used for the extension reaction was AAACTAACAAAACACACCCTAAATGCATCTAA. The PCR primers for TSC0237153 were as follows: forward, 5'-GACCAAAGACAGCAGGTTTGC-3'; reverse, 5'-TAGCCCTGCTAAGTAGTCCATTCC-3'. The primer used for the extension reaction was AAAAAAAAAAAAAAGCCTTTCCAGAATCTCTGAGGTCA. The primer extension data were analyzed by using the GeneScan and Genotyper software from ABI.
Estimation of Haplotypes
Statistical Analyses of Chromosomal Segment Structure
Statistical Analyses of Population Variances
Disease Model Simulations
Wilcoxon Rank Test
Support for this research was provided by National Institutes of Health grants U01-DK57249, AR44804, and AR20684. The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.2165904. Article published online before print in May 2004.
4 Corresponding author.
Akey, J.M., Zhang, G., Zhang, K., Jin, L., and Shriver, M.D. 2002. Interrogating a high-density SNP map for signatures of natural selection. Genome Res. 12: 1805-1814.
Bali, D., Gourley, I.S., Kostyu, D.D., Goel, N., Bruce, I., Bell, A., Walker, D.J., Tran, K., Zhu, D.K., Costello, T.J., et al. 1999. Genetic analysis of multiplex rheumatoid arthritis families. Genes Immunol. 1: 28-36.
Briscoe, D., Stephens, J.C., and O'Brien, S.J. 1994. Linkage disequilibrium in admixed populations: Applications in gene mapping. J. Hered. 85: 59-63.
Chakraborty, R., Kamboh, M.I., Nwankwo, M., and Ferrell, R.E. 1992. Caucasian genes in American blacks: New data. Am. J. Hum. Genet. 50: 145-155.
Collins-Schramm, H.E., Kittles, R.A., Operario, D.J., Weber, J.L., Criswell, L.A., Cooper, R., and Seldin, M.F. 2002a. Markers that discriminate between European and African ancestry show limited variation within Africa. Hum. Genet. 111: 566-569.
Collins-Schramm, H.E., Phillips, C.M., Operario, D.J., Lee, J.S., Weber, J.L., Hanson, R.L., Knowler, W.C., Cooper, R., Li, H., and Seldin, M.F. 2002b. Ethnic difference markers for use in mapping by admixture linkage disequilibrium. Am. J. Hum. Genet. 70: 737-750.
Collins-Schramm, H.E., Chima, B., Operario, D.J., Criswell, L.A., and Seldin, M.F. 2003. Markers informative for ancestry demonstrate consistent megabase-length linkage disequilibrium in the African American population. Hum. Genet. 113: 211-219.
Falush, D., Stephens, M., and Pritchard, J.K. 2003. Inference of population structure using multilocus genotype data: Linked loci and correlated allele frequencies. Genetics 164: 1567-1587.
Hoggart, C.J., Parra, E.J., Shriver, M.D., Bonilla, C., Kittles, R.A., Clayton, D.G., and McKeigue, P.M. 2003. Control of confounding of genetic associations in stratified populations. Am. J. Hum. Genet. 72: 1492-1504.
Lander, E. and Kruglyak, L. 1995. Genetic dissection of complex traits: Guidelines for interpreting and reporting linkage results. Nat. Genet. 11: 241-247.
Lautenberger, J.A., Stephens, J.C., O'Brien, S.J., and Smith, M.W. 2000. Significant admixture linkage disequilibrium across 30 cM around the FY locus in African Americans. Am. J. Hum. Genet. 66: 969-978.
McKeigue, P.M. 1998. Mapping genes that underlie ethnic differences in disease risk: Methods for detecting linkage in admixed populations, by conditioning on parental admixture. Am. J. Hum. Genet. 63: 241-251.
McKeigue, P.M., Carpenter, J.R., Parra, E.J., and Shriver, M.D. 2000. Estimation of admixture and detection of linkage in admixed populations by a Bayesian approach: Application to African-American populations. Ann. Hum. Genet. 64: 171-186.
Parra, E.J., Kittles, R.A., Argyropoulos, G., Pfaff, C.L., Hiester, K., Bonilla, C., Sylvester, N., Parrish-Gause, D., Garvey, W.T., Jin, L., et al. 2001. Ancestral proportions and admixture dynamics in geographically defined African Americans living in South Carolina. Am. J. Phys. Anthro. 114: 18-29.
Pfaff, C.L., Parra, E.J., Bonilla, C., Hiester, K., McKeigue, P.M., Kamboh, M.I., Hutchinson, R.G., Ferrell, R.E., Boerwinkle, E., and Shriver, M.D. 2001. Population structure in admixed populations: Effect of admixture dynamics on the pattern of linkage disequilibrium. Am. J. Hum. Genet. 68: 198-207.
Pritchard, J.K., Stephens, M., and Donnelly, P. 2000. Inference of population structure using multilocus genotype data. Genetics 155: 945-959.
Rosenberg, N.A., Pritchard, J.K., Weber, J.L., Cann, H.M., Kidd, K.K., Zhivotovsky, L.A., and Feldman, M.W. 2002. Genetic structure of human populations. Science 298: 2381-2385.
Rybicki, B.A., Iyengar, S.K., Harris, T., Liptak, R., Elston, R.C., Sheffer, R., Chen, K.M., Major, M., Maliarik, M.J., and Iannuzzi, M.C. 2002. The distribution of long range admixture linkage disequilibrium in an African-American population. Hum. Hered. 53: 187-196.
Shriver, M.D., Smith, M.W., Jin, L., Marcini, A., Akey, J.M., Deka, R., and Ferrell, R.E. 1997. Ethnic-affiliation estimation by use of population-specific DNA markers. Am. J. Hum. Genet. 60: 957-964.
Smith, M.W., Lautenberger, J.A., Doo Shin, H., Chretien, J., Shrestha, S., Gilbert, D.A., and O'Brien, S.J. 2001. Markers for mapping by admixture linkage disequilibrium in African American and Hispanic populations. Am. J. Hum. Genet. 69: 1080-1094.
Stephens, J.C., Briscoe, D., and O'Brien, S.J. 1994. Mapping by admixture linkage disequilibrium in human populations: Limits and guidelines. Am. J. Hum. Genet. 55: 809-824.
Stephens, M., Smith, M.J., and Donnelly, P. 2001. A new statistical method for haplotype reconstruction from population data. Am. J. Hum. Genet. 68: 978-989.
Wahlund, S. 1928. Zusammensetzung von Populationen und Korrelationserscheinungen von Standpunkt der Vererbungslehre aus betrachtet. Hereditas 11: 65-106.
Weber, J.L., David, D., Heil, J., Fan, Y., Zhao, C., and Marth, G. 2002. Human diallelic insertion/deletion polymorphisms. Am. J. Hum. Genet. 71: 854-862.
Zheng, C. and Elston, R.C. 1999. Multipoint linkage disequilibrium mapping with particular reference to the African-American population. Genet. Epidemiol. 17: 79-101.
https://myscience.appliedbiosystems.com/; Applied Biosystems databases.
http://pritch.bsd.uchicago.edu/software/readme/readme.html; documentation for Structure Software, version 2.
http://snp.cshl.org/; The SNP Consortium Web site, for initial screening information on the SNPs utilized in this study and Asian typing results.
http://research.marshfieldclinic.org/genetics; the Marshfield Center for Medical Genetics, for initial screening information on the MIDs used in this study, including allele frequencies in several populations, and for cM positions.
http://genome.ucsc.edu/; UCSC Human Genome Project Working Draft, for megabase positions of MIDs.
Received November 12, 2003; accepted in revised form February 19, 2004.