Published online before print
July 15, 2004, 10.1101/gr.2204604 Genome Res. 14:1624-1632, 2004 ©2004 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/04 $5.00
Methods
Haplotype and Missing Data Inference in Nuclear Families
McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, Maryland 21205, USA
Determining linkage phase from population samples with statistical methods is accurate only within regions of high linkage disequilibrium (LD). Yet, affected individuals in a genetic mapping study, including those involving cases and controls, may share sequences identical-by-descent stretching on the order of tens to hundreds of kilobases, quite possibly over regions of low LD in the population. At the same time, inferring phase from nuclear families may be hampered by missing family members, missing genotypes, and the noninformativity of certain genotype patterns. In this study, we reformulate our previous haplotype reconstruction algorithm, and its associated computer program, to phase parents with information derived from population samples as well as from their offspring. In applications of our algorithm to 100-kb stretches, simulated in accordance with a Wright-Fisher model with typical levels of LD in humans, we find that phase reconstruction for 160 trios with 10% missing data is highly accurate (>90%) over the entire length. Furthermore, our algorithm can estimate allelic status for missing data at high accuracy (>95%). Finally, the input capacity of the program is vast, easily handling thousands of segregating sites in 1000 chromosomes.
The field of elucidating complex diseases by using genetic methods is still without a generalized methodology. Association studies hold great promise in overcoming remaining challenges (Lander 1996
Single nucleotide polymorphisms (SNPs) have come to the fore as the marker of choice for association studies. Recent articles have clarified their various properties in the context of gene mapping. Individually, a SNP has the greatest power for detection of association to disease when it is the actual susceptibility mutation. Otherwise, even when high linkage disequilibrium (LD) exists between the two, the power of a SNP to detect a disease variant diminishes as its allele frequency differs from that of the disease mutation (Muller-Myhsok and Abel 1997
Unfortunately, phase information is not readily discernible from standard DNA molecular methodologies for diploid organisms. Three methods exist by which this information can be obtained: experiments, statistical algorithms exploiting population LD, and inference in pedigrees.
Experimental methods of haplotyping include diploid-to-haploid conversion (Papadopoulos et al. 1995
With regard to computational approaches, there exist three main methods of reconstructing the haplotypes of population samples. First, Clark's parsimony method (Clark 1990
In Lin et al. (2002
The final method of discerning phase is to genotype related individuals and deduce haplotypes within families. This approach is not without its own complexities, as evidenced by the plethora of strategies put forth to construct haplotypes in the context of linkage mapping (Kruglyak et al. 1996
As pointed out by Schaid et al. (2002
However, because the Rohde-Fuerst program (http://www.bioinf.mdc-berlin.de/
There are also front-end programs to existing phasing algorithms, for example, PHamily (http://archimedes.well.ox.ac.uk/pise). This program determines parental sequences rendered completely unambiguous by their children's haplotypes. Users then input these haplotypes with remaining ambiguous parental diplotypes into PHASE. Unfortunately, because the downstream analysis program (PHASE) is unaware of the upstream family structure (PHamily), this two-step approach, when applied to large data sets, often results in reconstructions that require inordinate numbers of double cross-over events within individual families. As a result their utility in post-haplotype reconstruction analyses, such as in transmission disequilibrium tests (Spielman et al. 1993
In this study, we reformulate our previous algorithm and its associated computer program (Lin et al. 2002
The performance of our previous and current methods on the eight X-linked genomic regions is shown in Table 1 (of note, males in these simulations were given two X-chromosomes). Comparison of our previous and current programs on single individuals reveals a modest but statistically significant improvement of 3.0% (P < 0.0001). As our current program has the capacity to handle nuclear family data, it was tested on data in which zero to three children were assigned randomly to couples. This accuracy increased more markedly by 12.1% (P < 0.0001) to 95.3% over the original result.
Comparing switch accuracy of our program to that of Rohde and Fuerst (2001 The real boon of incorporating family information is the ability to reconstruct accurately haplotypes over long distances, namely, over multiple regions of low LD. This feature is, of course, unappreciable by metrics such as switch accuracy. To evaluate phase reconstruction beyond neighboring phase relations, pairwise accuracies of reconstructed haplotypes were plotted against bins of 5 kb. Figure 1 shows the results derived from 100 random pairings of the eight X-linked regions, one curve with no children assigned, the other with one child. Clearly, the accuracy of phase relations drops off much more rapidly when offspring information is excluded.
Given that parents and children were simulated from the same empirically derived, X-linked sequences with missing data, the resultant correlations of dropped calls between parents and children could possibly confound the benefit observed upon including children's diplotypes. Moreover, only 40 haplotypes were available for each locus. Clearly, real-life studies requiring phase reconstruction will include many more chromosomes. As such, we turned to simulated haplotypes. Each of the 50 groups of 40, 640, and 1280 haplotypes was randomly paired twice to form 10, 160, and 320 couples, respectively (see Methods). These data were input into our program with all couples having no children, one child, and so on up to three children. Ten percent of genotypes were randomly dropped to simulate missing data. From phase reconstructions of parents only, we constructed the pairwise accuracy versus distance curves shown in Figure 2, plots a through c. It is apparent that inclusion of diplotypes of more than one child does not greatly improve phase reconstruction. Also, when no children are used, the decay of accuracy over distance is less steep when 320 chromosomes are phased as opposed to 40, although no further improvement is observed for 1280. In general, these trends hold as well when only common sites (minor allele frequency >0.10) are considered (Fig. 2, plots e,f). When no children are used, the decay of accuracy over distance is less steep for sequences with only common sites compared with sequences with all sites.
Because Figure 2 was generated from input with 10% dropped calls and the simulated sequences had no missing data originally, we were able to calculate the accuracy of missing data calls as well (Table 2). Theoretically (and in data not shown), if single individuals are phased randomly, the baseline switch accuracy should be 0.5. An analogous baseline for missing data calls is not so obvious given the different frequencies of alleles at each site. Thus, the baselines for missing data calls were computed by filling dropped positions at random in accordance with the site's allelic frequencies. We see in Table 2 that accuracies of missing data inferences increase as more chromosomes are phased and as more children's diplotypes are included, but decrease when only common sites are considered.
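The random-fill baseline described above can be illustrated with a minimal Python sketch. It assumes biallelic sites and assumes that the true allele at a dropped position is itself drawn with the site's allele frequency; the function names are hypothetical and are not taken from the authors' program. Under these assumptions the expected baseline accuracy at a site with allele frequency p is p^2 + (1 - p)^2, which shows why rare sites give a high baseline by chance alone.

```python
import random


def expected_baseline_accuracy(p):
    """Expected accuracy of a random fill at a biallelic site.

    The true allele is allele 1 with probability p, and the guess is drawn
    independently with the same probability, so they agree with
    probability p^2 + (1 - p)^2.
    """
    return p * p + (1.0 - p) * (1.0 - p)


def simulated_baseline_accuracy(p, n_calls=100_000, seed=0):
    """Monte Carlo check of the closed form above (illustrative only)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_calls):
        truth = rng.random() < p   # true allele at the dropped position
        guess = rng.random() < p   # allele filled in at random
        hits += (truth == guess)
    return hits / n_calls


if __name__ == "__main__":
    for freq in (0.5, 0.1, 0.01):
        print(freq, expected_baseline_accuracy(freq), simulated_baseline_accuracy(freq))
```

A site with equal allele frequencies gives a baseline of 0.5, whereas a site with a minor allele frequency of 0.01 gives a baseline close to 0.98.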
To measure the performance of our program in the face of large amounts of missing information, we input data in which all the fathers were completely composed of dropped calls. The rate of dropped calls for all other input sequences was maintained at 10%. We constructed pairwise accuracy versus distance curves (Fig. 3) and measured missing call accuracy (Table 3) as before for the mothers only. From Figure 3, it can be seen that phase accuracy plateaus when two children are included as opposed to one child. Otherwise, the various trends observed previously for input in which fathers' diplotypes are included hold just as well, although both phase and missing call accuracies are in general higher for the previous data set.
To demonstrate the input capacity of our new program, we input 20 sets of data simulating 1 Mb of diplotypes from 250 couples, some having as many as nine children. Moreover, we varied the amount of missing data and simulated recombination events in the input data. The results of these runs are shown in Table 4. Running these files using the same parameters as before on Pentium 4 3.06-GHz processors took less than half an hour each.
To improve haplotype reconstruction, other groups have accounted for LD in their algorithms. Li and Stephens (2003
To improve phasing, we instead took the route of including familial information because individuals are, for many study designs, genotyped within familial contexts. It has been demonstrated both theoretically and by simulation that inclusion of family information significantly increases the accuracy of haplotype reconstruction (Rohde and Fuerst 2001
Our work represents an implementation that increases the accuracy of phase reconstructions for random population samples and greatly expands the input capacity over the current alternative methodology.
Another distinction of our computer program is its ability to handle most instances of recombination. The two cases it cannot handle, that is, families with more than one child in which every child inherits at least one recombined haplotype or in which a child inherits two recombined haplotypes, occur with probability one in 10,000 or less when regions of 1 Mb are considered (a rate of 30 recombination events/3 × 10^9 bp is assumed). In our experience applying the program to real data sets, the recombination feature is most commonly invoked due to unexpected and most likely nonbiological scenarios. For instance, in a data set comprising 21 segregating sites covering 120 kb from 33 families, the haplotypes of one family with five children were reconstructed such that one child inherited a haplotype with two recombination events, another with six. Recombination events were not inferred in the other children. Clearly, such a result cannot be taken at face value, biologically speaking. Much more probable explanations are genotyping error, data mishandling, or nonpaternity. Thus, our method can uncover such anomalies, which warrant further inspection.
In the analysis of our program's capability to fill in missing genotype calls, we find that our program performs with a high degree of accuracy. However, in contrast to phase accuracy, missing call accuracy decreases when the input consists of only common sites.
Both these phenomena are more easily understood when rare sites are considered. In terms of phasing, such sites are difficult to reconstruct because haplotypes with the minor allele appear rarely in the sample. In other words, there is little information in the sample with which to correctly reconstruct such haplotypes. On the other hand, dropped calls with corresponding sites that have low minor allele frequency are correctly filled in at a higher rate merely by dint of chance. Indeed, this explains the pronounced increase in baseline missing call accuracy when the input data consist of 640 and 1280 chromosomes as opposed to 40, all sites considered. Samples with greater numbers of chromosomes are sure to turn up more segregating sites, the vast majority of which are rare. By excluding sites in which these two influences are operating, the observed results follow. For the tested data sets in which all the fathers' genotypes were assigned missing data, the program did output haplotype reconstructions for these individuals. However, a cautionary word should be given to those who may be tempted to use such sequences for downstream analysis. In examinations of such output, it was observed that the number of inferred heterozygous sites was markedly undercalled (data not shown). The extent of this phenomenon gradually diminished as couples were assigned more children, until it all but disappeared at five children. The same was observed in examinations of haplotype reconstructions for parents in simulated input of sibship data. Clearly, haplotyping parents for whom diplotype data are unavailable is difficult. Apparently, the probability that all four parental haplotypes will be represented in children is quite high when families have five children, thus fully determining at least parental diplotypes if not haplotypes.
With regard to phasing sequences without children, Stephens and Donnelly (2003
The development of a haplotyping program that is capable of handling data on the order of megabases and incorporates family information is timely. Every month, a plethora of positive findings from genome-wide linkage analyses are entered into the literature. Unless promising candidates are immediately implicated within these linkage loci, follow-up fine-mapping of regions spanning megabases will be required. Tests of association with SNPs are the mainstay of such endeavors, and procuring phase information may potentially increase the power for detection. Simply using test statistics that expend degrees of freedom to test all different haplotypes depresses power (Chapman et al. 2003
Of late, several groups have published methods in which EM algorithm-based procedures are used to infer haplotypes to be used for transmission disequilibrium tests (Zhao et al. 2000
Data Analyzed
Five sets of haplotypes were used in the various simulations. The first was comprised of eight X-linked genomic regions derived from 40 male subjects from the National Institutes of Health Polymorphism Discovery Resource at the Coriell Institute for Medical Research (Collins et al. 1998
The second, third, and fourth sets of haplotypes were generated from a computer program similar to Hudson's (2002
The final set of haplotypes was similarly generated from the aforementioned program. Twenty groups of 1000 haplotypes were produced with
Infinite-Alleles Algorithm
With nuclear family data, our program reconstructs the haplotypes of parents with children's genotypes used to constrain the former's haplotype space. The iterative stochastic sampling process underlying both SSD and our previous program is retained. In all, the program still captures the strategy set forth in Stephens et al. (2001
The algorithm implemented in our program is as follows. First, the constraints placed on couples with children are taken into account. For a given family i, let element a represent the four parental haplotypes and am1, am2, af1, and af2 be the two haplotypes of the mother and father, respectively. At the start, certain sites of a will be ambiguous due to heterozygosity and missing data. The children are queried in the following manner to fill in these ambiguities. am1 and af1 are arbitrarily chosen to be the inherited haplotypes of the first child considered, whereupon certain sites in a may be definitively filled in. However, other arrangements only allow a constraint to be placed. For example, if the mother, father, and child are all heterozygous at a particular site, then whatever am1 is designated to be for that site, af1 must have the other allele. A list of all constraints is stored.
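The first-child query described above can be sketched for a single biallelic site. This is an illustrative simplification, not the authors' code: it assumes no missing data at the site, Mendelian-consistent genotypes written as two-character strings, and a hypothetical function name and return format.

```python
def query_site(mother, father, child):
    """Query one site of the first child in a trio.

    am1 and af1 are taken to be the haplotypes transmitted to this child.
    Returns (am1_allele, af1_allele, constraint); an allele is None when it
    cannot be filled in definitively, in which case a constraint is returned.
    Genotypes are strings such as 'AG' (hypothetical encoding).
    """
    m, f, c = set(mother), set(father), set(child)
    if len(c) == 1:
        # Homozygous child: both transmitted alleles are known.
        allele = child[0]
        if allele not in m or allele not in f:
            raise ValueError("Mendelian inconsistency")
        return allele, allele, None
    if len(m) == 1:
        # Homozygous mother: she transmits her only allele,
        # so the father transmits the child's other allele.
        am1 = mother[0]
        if am1 not in c:
            raise ValueError("Mendelian inconsistency")
        af1 = (c - {am1}).pop()
        if af1 not in f:
            raise ValueError("Mendelian inconsistency")
        return am1, af1, None
    if len(f) == 1:
        # Homozygous father: symmetric to the case above.
        af1 = father[0]
        if af1 not in c:
            raise ValueError("Mendelian inconsistency")
        am1 = (c - {af1}).pop()
        if am1 not in m:
            raise ValueError("Mendelian inconsistency")
        return am1, af1, None
    # Mother, father, and child all heterozygous: nothing can be filled in,
    # but am1 and af1 must carry different alleles at this site.
    return None, None, "am1 != af1"


if __name__ == "__main__":
    print(query_site("AG", "AG", "AG"))  # constraint only
    print(query_site("AA", "AG", "AG"))  # both alleles determined
```

In the all-heterozygous case only the constraint is recorded, which corresponds to the example given in the text.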
If family i has more than one child, the same process is repeated. However, instead of am1 and af1 being necessarily chosen, in certain cases enough information has already been filled in such that the inherited haplotypes of the second child are apparent. If more than one out of the four possible pairings of the parental haplotypes are consistent with the second child, each one of these possibilities, designated ab, where b = 1...c, c
Within the framework of the above process, inconsistencies from recombination events manifest as distinct regions of a child's genotype appearing to come from two different pairings of parental haplotypes. In this situation, the child's genotype is replaced by two genotypes. The original genotype is split into two sequences at the site where a recombination event is posited. The resultant partial sequences are then padded with missing data. The process of filling in ambiguous sites and enumerating constraints occurs as before, except for the replacement. For example, in a particular family, after having extracted information from the first child's genotype, suppose the first half of the second child's genotype is consistent with the pairing of the first halves of haplotypes am1 and af1, and the second half with those of am1 and af2. Say the second child's sequence is GGGG. This child's genotype is then replaced with GGNN and NNGG.
If family i has more than two children, the process of filling in ambiguous sites and enumerating constraints described above is repeated with the third (and subsequent) child(ren) for each of the ab. Some ab may spawn yet more possibilities, whereupon c will be incremented. On the other hand, all four pairings of the parental haplotypes of certain ab may be inconsistent with the third child. These ab are discarded, and c is decremented. If at any time in the process c reaches zero, a recombination event is posited for that child.
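The splitting step in the GGGG example above can be written as a short Python sketch. The one-symbol-per-site string encoding follows the example in the text; the function name, the breakpoint argument, and the use of "N" for missing data are assumptions made for illustration.

```python
def split_at_recombination(genotype, breakpoint, missing="N"):
    """Replace a recombinant child's genotype by two padded partial genotypes.

    `breakpoint` is the index of the first site after the posited
    recombination event. Each partial sequence keeps one side of the
    breakpoint and is padded with missing-data symbols on the other side.
    """
    if not 0 < breakpoint < len(genotype):
        raise ValueError("breakpoint must fall strictly inside the sequence")
    left = genotype[:breakpoint] + missing * (len(genotype) - breakpoint)
    right = missing * breakpoint + genotype[breakpoint:]
    return left, right


if __name__ == "__main__":
    print(split_at_recombination("GGGG", 2))
```

Each partial genotype can then be queried as if it came from a separate, nonrecombinant child.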
The process for handling recombination is order dependent in that different numbers of recombinations will be inferred in different places depending on the order in which the children are processed. As such, the order of the children is permuted, and the arrangement (possibly multiple ones) with the least number of recombinations is retained as ab. It should be noted that this program can handle multiple recombinations along the transmitted haplotype from one parent. However, because in our scheme a recombinant child's genotype is split in two (and not more), families with multiple children in which every child inherits a recombined chromosome or in which a child inherits two recombined chromosomes cannot be analyzed. In these exceedingly rare cases, the program exits.
At the end of this process, family i will have a corresponding set {ab}i. Each element in set {ab}i has a different list of constraints and residual ambiguous sites. For single individuals or, equivalently, couples without children, each individual will have an analogous set {ab} with necessarily one element consisting of two haplotypes and a list of ambiguous sites.
Second, phase reconstruction is first carried out in blocks of high LD. Restricting all operations to the portions of {ab} relevant to the block at hand for family i, all four haplotype elements consistent with any element in {ab}i are enumerated and collected into a set, say H. The MCMC chain is then started. Let hi denote the four parental haplotypes of family i (or two haplotypes of single individual i) at a given point in the MCMC chain and h = (h1,..., hn), the collection of such families and individuals. h is initially constructed by randomly choosing elements from the corresponding sets H before the first iteration. The algorithm then proceeds as follows.
An analogous procedure is followed if i corresponds to a single individual without children. More specifically, instead of four haplotypes being considered in the various steps, two are. As the chain is run, realizations h are stored periodically in accordance with parameters specified by the user. For each saved realization, the haplotypes of hi are consistent with some elements of {ab}i. A running tabulation of these occurrences is kept for each element of {ab}i. After the chain is stopped, the element in {ab}i with the highest tabulation is saved and the rest are discarded. The haplotype space for each family is trimmed in this way.
Third, the whole process in the prior step is repeated with each set {ab}i containing only one element. At the termination of the chain, the stored realizations are used to make final reconstructions of haplotypes within the blocks. For families, the most frequently saved haplotype reconstruction is chosen. For individuals, sites with missing data are filled in with the most common nucleotide found at corresponding sites among the stored haplotypes. The first heterozygous position is called arbitrarily. For the second heterozygous position, the most common nucleotide is chosen from those haplotypes with the specific nucleotide of the first call. Likewise, for all subsequent positions, the most common nucleotide is chosen conditional upon the specific call made at the immediate prior heterozygous site.
Fourth, phasing is next performed between blocks, again by iterating an MCMC chain.
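The rule for making final calls at heterozygous sites in single individuals (the third step above) can be sketched as follows. This is one possible reading of that rule, not the authors' implementation: stored haplotypes are assumed to be plain strings, ties are broken in favor of the first listed allele, missing-data filling is omitted, and all names are hypothetical.

```python
from collections import Counter


def call_heterozygous_sites(stored, het_sites, alleles):
    """Call one haplotype of an individual at its heterozygous sites.

    stored    : list of haplotype strings saved from the chain
    het_sites : sorted indices of the individual's heterozygous sites
    alleles   : dict mapping each heterozygous site to its two alleles
    Returns a dict site -> allele for one haplotype; the other haplotype
    carries the complementary allele at every heterozygous site.
    """
    calls = {}
    first = het_sites[0]
    calls[first] = alleles[first][0]  # the first position is called arbitrarily
    prev = first
    for site in het_sites[1:]:
        # Condition on the call made at the immediately preceding
        # heterozygous site and count alleles at the current site.
        counts = Counter(h[site] for h in stored if h[prev] == calls[prev])
        a, b = alleles[site]
        calls[site] = a if counts[a] >= counts[b] else b
        prev = site
    return calls


def complement(calls, alleles):
    """Alleles carried by the individual's other haplotype."""
    return {s: (alleles[s][1] if calls[s] == alleles[s][0] else alleles[s][0])
            for s in calls}


if __name__ == "__main__":
    stored = ["AC", "AC", "GT", "GT", "AT"]
    alleles = {0: ("A", "G"), 1: ("C", "T")}
    one = call_heterozygous_sites(stored, [0, 1], alleles)
    print(one, complement(one, alleles))
```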
Measures of Accuracy
In all simulations in this article, the following input parameters were specified for the execution of our program. For phasing within regions of high LD and between blocks, our computer program was run at 10,000 iterations at each stage, with the first 5000 discarded as "burn-in" and the remainder thinned by storing every 20th iteration.
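As a quick check of what these run parameters imply, the following sketch enumerates which iterations would be stored. It assumes 1-based iteration numbering and thinning counted from the end of the burn-in; these conventions, and the function name, are assumptions rather than documented behavior of the program.

```python
def stored_iterations(n_iter=10_000, burn_in=5_000, thin=20):
    """Return the iteration numbers kept after burn-in and thinning."""
    return [it for it in range(burn_in + 1, n_iter + 1)
            if (it - burn_in) % thin == 0]


if __name__ == "__main__":
    kept = stored_iterations()
    print(len(kept), kept[0], kept[-1])
```

Under these assumptions, each stage contributes (10,000 - 5000)/20 = 250 stored realizations to the final tabulation.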
To measure the accuracy of phase relations, the outputs of the parental and single individual haplotypes from the various programs tested in this study were compared with the original haploid sequences, and for genotypes with more than one heterozygous site, the accuracy was scored by switch accuracies (Lin et al. 2002
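A minimal sketch of a switch-accuracy calculation is given here for orientation. It assumes the commonly used definition, namely the proportion of neighboring pairs of heterozygous sites whose relative phase is reconstructed correctly, which may differ in detail from the definition in Lin et al. (2002); the function name and string encoding are hypothetical.

```python
def switch_accuracy(true_hap, inferred_hap, het_sites):
    """Switch accuracy for one individual.

    true_hap, inferred_hap : one true and one inferred haplotype (strings)
    het_sites              : sorted indices of heterozygous sites
    A switch is counted whenever the inferred haplotype changes from
    matching to mismatching the true haplotype (or vice versa) between
    neighboring heterozygous sites.
    """
    if len(het_sites) < 2:
        raise ValueError("need at least two heterozygous sites")
    agree = [true_hap[s] == inferred_hap[s] for s in het_sites]
    switches = sum(1 for x, y in zip(agree, agree[1:]) if x != y)
    n_pairs = len(het_sites) - 1
    return (n_pairs - switches) / n_pairs


if __name__ == "__main__":
    print(switch_accuracy("ACGT", "ACCA", [0, 1, 2, 3]))
```

Because only the change in agreement between neighboring sites is counted, comparing against either of the individual's two true haplotypes gives the same value.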
To compare our previous and current algorithms, 100 simulations were performed for each X-linked region. For the latter program, the process was repeated except the diplotypes were randomly paired to form couples, and zero to three children were randomly assigned to each family (thus, fathers were assigned two X-chromosome haplotypes). Of note, a family without children was treated as two single individuals in the algorithm. The Wilcoxon rank-sum test was used to test for statistical significance (Rosner 2000
In comparisons of our current program to that of Rohde and Fuerst (2001
To test the input capacity of our new program, each of the 20 groups of 1000 haplotypes was randomly paired twice to form 250 couples. These couples were assigned numbers of children based on the Poisson distribution with the average set to one. Given that the sequences were supposed to simulate 1 Mb, recombinations were chosen to occur at a rate of 0.01 per meiosis (30 recombination events/3 × 10^9 bp/meiosis was assumed). To simulate input with missing data, the individual sites of the diplotypes of parents and children, if assigned, were dropped at rates of 0, 0.05, 0.10, or 0.15.
To determine the baseline rates of accuracy for missing data, dropped calls were filled in randomly in proportion to the binomial frequencies of corresponding sites and scored for accuracy. (In the same spirit, baseline switch accuracy is switch accuracy if phase is determined at random, i.e., 0.5.) To measure the accuracy of missing data handling in parents and single individuals, the closest heterozygous sites were located to haplotype sites for which the corresponding diplotype calls were dropped. This step allowed the two reconstructed haplotypes of each diplotype to be matched with the two corresponding true haplotypes in a local manner.
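The settings used for the input-capacity simulations (recombination rate, Poisson-distributed numbers of children, and randomly dropped calls) can be reproduced in outline with the sketch below. It is not the simulation program used in the study: the function names, the string encoding with "N" for a dropped call, and the use of Knuth's method for Poisson sampling are assumptions.

```python
import math
import random

EVENTS_PER_MEIOSIS = 30   # assumed genome-wide recombination events per meiosis
GENOME_BP = 3e9           # assumed genome length in bp


def recombination_probability(region_bp):
    """Per-meiosis recombination rate for a region of the given length."""
    return EVENTS_PER_MEIOSIS / GENOME_BP * region_bp


def poisson_children(rng, mean=1.0):
    """Draw a number of children from a Poisson distribution (Knuth's method)."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        k += 1
        p *= rng.random()
        if p <= limit:
            return k - 1


def drop_calls(genotype, rate, rng, missing="N"):
    """Independently replace each site by a missing call with probability `rate`."""
    return "".join(missing if rng.random() < rate else g for g in genotype)


if __name__ == "__main__":
    rng = random.Random(1)
    r = recombination_probability(1_000_000)
    print("rate per meiosis for 1 Mb:", r)
    print("both transmitted haplotypes recombined:", r * r)
    print("children per couple:", [poisson_children(rng) for _ in range(10)])
    print(drop_calls("GGGGGGGGGG", 0.10, rng))
```

For a 1-Mb region the assumed rate works out to 0.01 per meiosis, so the chance that a child inherits two recombined haplotypes is about 0.01 × 0.01 = 1 in 10,000.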
We would like to thank Y. Lin for his technical assistance. This research was supported by National Institutes of Health grants HG02757 and HL54466 to A.C. and D.J.C. The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.2204604. Article published online ahead of print in July 2004.
1 Corresponding authors.
Akey, J., Jin, L., and Xiong, M. 2001. Haplotypes vs. single marker linkage disequilibrium tests: What do we gain? Eur. J. Hum. Genet. 9: 291-300.
Becker, T. and Knapp, M. 2002. Efficiency of haplotype frequency estimation when nuclear family information is included. Hum. Hered. 54: 45-53.
Chapman, J.M., Cooper, J.D., Todd, J.A., and Clayton, D.G. 2003. Detecting disease associations due to linkage disequilibrium using haplotype tags: A class of tests and the determinants of statistical power. Hum. Hered. 56: 18-31.
Cheng, R., Ma, J.Z., Wright, F.A., Lin, S., Gao, X., Wang, D., Elston, R.C., and Li, M.D. 2003. Nonparametric disequilibrium mapping of functional sites using haplotypes of multiple tightly linked single-nucleotide polymorphism markers. Genetics 164: 1175-1187.
Clark, A.G. 1990. Inference of haplotypes from PCR-amplified samples of diploid populations. Mol. Biol. Evol. 7: 111-122.
Collins, F.S., Guyer, M.S., and Chakravarti, A. 1997. Variations on a theme: Cataloging human DNA sequence variation. Science 278: 1580-1581.
Collins, F.S., Brooks, L.D., and Chakravarti, A. 1998. A DNA polymorphism discovery resource for research on human genetic variation. Genome Res. 8: 1229-1231.
Cutler, D.J., Zwick, M.E., Carrasquillo, M.M., Yohn, C.T., Tobin, K.P., Kashuk, C., Mathews, D.J., Shah, N.A., Eichler, E.E., Warrington, J.A., et al. 2001. High-throughput variation detection and genotyping using microarrays. Genome Res. 11: 1913-1925.
Dawson, E., Abecasis, G.R., Bumpstead, S., Chen, Y., Hunt, S., Beare, D.M., Pabial, J., Dibling, T., Tinsley, E., Kirby, S., et al. 2002. A first-generation linkage disequilibrium map of human chromosome 22. Nature 418: 544-548.
Douglas, J.A., Boehnke, M., Gillanders, E., Trent, J.M., and Gruber, S.B. 2001. Experimentally-derived haplotypes substantially increase the efficiency of linkage disequilibrium studies. Nat. Genet. 28: 361-364.
Ewens, W.J. 1972. The sampling theory of selectively neutral alleles. Theor. Popul. Biol. 3: 87-112.
Ewens, W. 1979. Mathematical population genetics. Springer-Verlag, New York.
Excoffier, L. and Slatkin, M. 1995. Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. Mol. Biol. Evol. 12: 921-927.
Excoffier, L., Laval, G., and Balding, D. 2003. Gametic phase estimation over large genomic regions using an adaptive window approach. Hum. Genom. 1: 7-19.
Halushka, M.K., Fan, J.B., Bentley, K., Hsie, L., Shen, N., Weder, A., Cooper, R., Lipshutz, R., and Chakravarti, A. 1999. Patterns of single-nucleotide polymorphisms in candidate genes for blood-pressure homeostasis. Nat. Genet. 22: 239-247.
Hoppe, F.M. 1987. The sampling theory of neutral alleles and an urn model in population genetics. J. Math. Biol. 25: 123-159.
Hudson, R.R. 2002. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18: 337-338.
Hudson, R.R. and Kaplan, N.L. 1985. Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 111: 147-164.
Kruglyak, L., Daly, M.J., Reeve-Daly, M.P., and Lander, E.S. 1996. Parametric and nonparametric linkage analysis: A unified multipoint approach. Am. J. Hum. Genet. 58: 1347-1363.
Lander, E.S. 1996. The new genomics: Global views of biology. Science 274: 536-539.
Lewontin, R.C. 1988. On measures of gametic disequilibrium. Genetics 120: 849-852.
Li, N. and Stephens, M. 2003. Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics 165: 2213-2233.
Lin, S., Cutler, D.J., Zwick, M.E., and Chakravarti, A. 2002. Haplotype inference in random population samples. Am. J. Hum. Genet. 71: 1129-1137.
Morris, R.W. and Kaplan, N.L. 2002. On the advantage of haplotype analysis in the presence of multiple disease susceptibility alleles. Genet. Epidemiol. 23: 221-233.
Muller-Myhsok, B. and Abel, L. 1997. Genetic analysis of complex diseases. Science 275: 1328-1329.
Niu, T., Qin, Z.S., Xu, X., and Liu, J.S. 2002. Bayesian haplotype inference for multiple linked single-nucleotide polymorphisms. Am. J. Hum. Genet. 70: 157-169.
Papadopoulos, N., Leach, F.S., Kinzler, K.W., and Vogelstein, B. 1995. Monoallelic mutation analysis (MAMA) for identifying germline mutations. Nat. Genet. 11: 99-102.
Pritchard, J.K. 2001. Are rare variants responsible for susceptibility to complex diseases? Am. J. Hum. Genet. 69: 124-137.
Risch, N. and Merikangas, K. 1996. The future of genetic studies of complex human diseases. Science 273: 1516-1517.
Rohde, K. and Fuerst, R. 2001. Haplotyping and estimation of haplotype frequencies for closely linked biallelic multilocus genetic phenotypes including nuclear family information. Hum. Mutat. 17: 289-295.
Rosner, B. 2000. Fundamentals of biostatistics. Duxbury Press, Pacific Grove, CA.
Schaid, D.J. 2002. Relative efficiency of ambiguous vs. directly measured haplotype frequencies. Genet. Epidemiol. 23: 426-443.
Schaid, D.J., McDonnell, S.K., Wang, L., Cunningham, J.M., and Thibodeau, S.N. 2002. Caution on pedigree haplotype inference with software that assumes linkage equilibrium. Am. J. Hum. Genet. 71: 992-995.
Slager, S.L., Huang, J., and Vieland, V.J. 2000. Effect of allelic heterogeneity on the power of the transmission disequilibrium test. Genet. Epidemiol. 18: 143-156.
Sobel, E. and Lange, K. 1996. Descent graphs in pedigree analysis: Applications to haplotyping, location scores, and marker-sharing statistics. Am. J. Hum. Genet. 58: 1323-1337.
Sobel, E., Lange, K., O'Connel, J.R., and Weeks, D.E. 1996. Haplotyping algorithms. In Genetic mapping and DNA sequencing (eds. T. Speed and M.S. Waterman), pp. 89-110. Springer-Verlag, New York.
Spielman, R.S., McGinnis, R.E., and Ewens, W.J. 1993. Transmission test for linkage disequilibrium: The insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am. J. Hum. Genet. 52: 506-516.
Stephens, M. and Donnelly, P. 2003. A comparison of Bayesian methods for haplotype reconstruction from population genotype data. Am. J. Hum. Genet. 73: 1162-1169.
Stephens, M., Smith, N.J., and Donnelly, P. 2001. A new statistical method for haplotype reconstruction from population data. Am. J. Hum. Genet. 68: 978-989.
Wall, J.D. and Pritchard, J.K. 2003. Haplotype blocks and linkage disequilibrium in the human genome. Nat. Rev. Genet. 4: 587-597.
Weir, B.S. and Cockerham, C.C. 1989. Complete characterization of disequilibrium at two loci. In Mathematical evolutionary theory (ed. M.W. Feldman), pp. 86-110. Princeton University Press, Princeton, NJ.
Zhao, H., Zhang, S., Merikangas, K.R., Trixler, M., Wildenauer, D.B., Sun, F., and Kidd, K.K. 2000. Transmission/disequilibrium tests using multiple tightly linked markers. Am. J. Hum. Genet. 67: 936-946.
Zwick, M.E., Cutler, D.J., and Chakravarti, A. 2001. A genetic variation analysis of neuropsychiatric traits. In Methods in genomic neuroscience, pp. 289-302. CRC Press, Boca Raton, FL.
http://www.bioinf.mdc-berlin.de/ http://archimedes.well.ox.ac.uk/pise; PHamily.
Received November 24, 2003; accepted in revised format April 30, 2004.