Published online before print
September 4, 2007, 10.1101/gr.6665407 Genome Res. 17:1520-1528, 2007 ©2007 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/07 $5.00
Methods
Prediction of individual genetic risk to disease from genome-wide association studies
1 Genetic Epidemiology, Queensland Institute of Medical Research, Queensland 4029, Brisbane, Australia; 2 Faculty of Land and Food Resources, University of Melbourne, Victoria 3010, Australia; 3 Department of Primary Industries, Victoria 3049, Australia
Empirical studies suggest that the effect sizes of individual causal risk alleles underlying complex genetic diseases are small, with most genotype relative risks in the range of 1.1–2.0. Although the increased risk of disease for a carrier is small for any single locus, knowledge of multiple risk alleles throughout the genome could allow the identification of individuals who are at high risk. In this study, we investigate the number and effect size of risk loci that underlie complex disease, constrained by the disease parameters of prevalence and heritability. We then quantify the value of prediction of genetic risk to disease using a range of realistic combinations of the number, size, and distribution of risk effects that underlie complex diseases. We propose an approach to assess the genetic risk of a disease in healthy individuals, based on dense genome-wide SNP panels, and we test this approach using simulation. When the number of loci contributing to the disease is >50, a large case-control study is needed to identify a set of risk loci for use in predicting the disease risk of healthy people not included in the case-control study. For diseases controlled by 1000 loci of mean relative risk of only 1.04, a case-control study with 10,000 cases and controls can lead to selection of 75 loci that explain >50% of the genetic variance. The 5% of people with the highest predicted risk are three to seven times more likely to suffer the disease than the population average, depending on heritability and disease prevalence. Whether an individual with known genetic risk develops the disease depends on known and unknown environmental factors.
An important benefit from the study of the genetics of human disease is to predict the risk that individuals may have of succumbing to a particular disease. Knowledge of this risk can then be used by the clinician in prevention, diagnosis, prognosis, and treatment. Currently, clinicians use the family history of a patient to help assess their risk of a disease with a known genetic component, with family history formally included in standard international disease classification systems. With modern molecular tools, can we improve on the use of family history to assess genetic risk of disease? For diseases caused by single genes, the answer is obviously "yes," but for diseases with complex inheritance, the best method to use and the success that might be expected are unclear. The dominant paradigm in human complex-trait genetics has been to map loci affecting disease risk and then to identify the causative mutations. Complex traits are likely to be affected by many genes and mutations, most of which have a small effect on disease risk. The relative risk of disease due to one allele is typically of the order of 1.1 to 2.0 (Ioannidis et al. 2006
Identification of causal variants and elucidating disease pathways through genetic and functional studies is difficult and time-consuming, particularly if there are many risk loci with small effects. However, knowledge of all risk loci or knowledge of causal variants at any one risk locus is not necessary for the prediction of the risk to disease of individuals in the population. The recent advances in high-density single-nucleotide polymorphism (SNP) technology (Kennedy et al. 2003
The objective of this study is to quantify the accuracy of risk prediction from genome-wide association studies and to quantify the true disease risk faced by the people predicted to be most at risk in subsequent samples from the population. To do this we consider models of the underlying genetic architecture assuming realistic distributions of the frequencies and effect sizes of risk loci, constraining the number of risk loci and the mean effect size to be consistent with disease prevalence and heritability. Using these models, we estimate the genetic risk of individuals based on a simulated genome-wide association study (GWAS).
The success of association studies, and also of genomic profiling, depends on the genetic architecture underlying complex diseases. First, we investigate the relationship between the relative risk (RR) of genetic loci and the number of loci that contribute to risk of a disease under constraints of known disease prevalence and heritability. We model the genetic architecture of complex disease by allowing the effect size and frequency of risk allele to vary across loci. We go on to use these results to investigate the possibilities of using multiple risk loci identified in a GWAS to predict risk of disease in a new population cohort.
We consider four disease scenarios based on realistic combinations of disease prevalence, K = 0.05 or 0.10, and heritabilities of the disease on the observed scale, h2 = 0.1 or 0.2. We consider two distributions of frequency of risk alleles underlying the disease (Fig. 1): a uniform distribution of allele frequencies that broadly corresponds to the common-disease common-variant (CDCV) hypothesis, in which the frequency of the risk-increasing allele was simulated as pi ~ Uniform(0.01, 0.99), and a U-shaped distribution that broadly corresponds to the neutral allele hypothesis (Pritchard 2001).
Number of loci underlying complex diseases
For a given number of disease loci we force the effect sizes to be consistent with the disease prevalence and heritability parameters. The average relative risks for fixed numbers of disease loci for the four disease scenarios are given in Figure 2. Summary statistics of the mean and maximum RR, the maximum percentage of genetic variance explained by a single locus, and the percentage of genetic variance explained by extreme frequency risk variants describe the properties and differences of the models (Table 1). For the CDCV model, an average RR of 1.2 corresponds to 40 or more loci. As expected, fewer loci imply larger average RR, and the average risk of loci for the approximate neutral model of evolution is always larger than that for the CDCV model for the same number of loci. Similarly, the maximum percentage of genetic variance explained by a single locus is always larger for the neutral model compared to the CDCV model when the number of risk loci is the same. However, the relationship between the number of disease loci and their average RR is broadly similar for the two models. When 1000 risk loci influence a disease, the maximum contribution to genetic variance of any single locus is only 3%–4%. As expected, as the sibling risk increases, the average RR increases if the number of risk loci is fixed; or, the number of risk loci required increases if the mean RR is held constant.
We derived an analytic expression for number of loci when RR and allele frequency are fixed for all loci (Equation 3), which agrees well with the results of the CDCV model when p = 0.5 and with the neutral model when allele frequency p = 0.1; e.g., for K = 0.05 and h2 = 0.2, then for p = 0.5, the number of loci for fixed relative risks of 1.1, 1.2, 1.4, and 1.6 are 346, 95, 29, and 15, respectively; and for p = 0.1, the corresponding number of loci are 889, 227, 59, and 28. Different combinations of K and h2 can lead to the same sibling (sib) relative risk (Fig. 2); it is this combined parameter that drives the results. Equation 3 can be used to investigate the impact of K or h2 on the number of loci underlying complex diseases (Fig. 3).
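Equation 3 itself is not reproduced in this text, so the following is only a sketch of one closed form that reproduces the numbers quoted above. It assumes multiplicative risk across independent loci in Hardy-Weinberg equilibrium, so that the count of risk alleles is Binomial(2n, p), and it assumes the variance of risk on the observed scale is h2·K·(1 – K). The function name `n_loci_fixed` is illustrative and does not come from the paper.

```python
import math


def n_loci_fixed(K, h2, p, rr):
    """Number of loci when every locus has allele frequency p and relative risk rr.

    Assumption (not taken from the source): risk = f0 * rr**x with
    x ~ Binomial(2n, p), so Var(risk) / K**2 = per_allele**(2n) - 1.
    """
    # 1 + Var(risk)/K^2, with Var(risk) = h2 * K * (1 - K) on the observed scale
    target = 1.0 + h2 * (1.0 - K) / K
    # E[rr^(2a)] / E[rr^a]^2 for a single allele a ~ Bernoulli(p)
    per_allele = (1.0 - p + p * rr ** 2) / (1.0 - p + p * rr) ** 2
    return math.log(target) / (2.0 * math.log(per_allele))


if __name__ == "__main__":
    for p in (0.5, 0.1):
        counts = [round(n_loci_fixed(0.05, 0.2, p, rr)) for rr in (1.1, 1.2, 1.4, 1.6)]
        print(f"p = {p}: {counts}")
```

Under these assumptions, K and h2 enter only through h2(1 – K)/K, which is in line with the statement that the sibling relative risk is the combined parameter driving the results.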
Use of GWAS to predict disease risk
Using our models for the genetic architecture of complex diseases, we go on to investigate prediction of genetic risk to disease from multiple risk loci identified in a GWAS. To do this we simulated a case-control study assuming a single-stage genome-wide association screen with 500,000 SNPs. The number of disease risk loci was fixed at 10, 20, 50, 100, 300, or 1000, and allele frequencies were simulated from either the U-shaped (neutral) or uniform distribution (CDCV). Table 2 summarizes the number of loci selected for prediction of genetic risk and the proportion of variance in log risk that they explain in an independent sample of people.
For all simulated scenarios, when 10,000 cases and controls were used, the accuracy with which the genetic risk of disease was predicted in a new random sample of the population of 1000 individuals was very high (Fig. 4 for CDCV model; results for the neutral model were similar but less conservative). For example, for the CDCV model of a disease with prevalence 0.05 and heritability 0.1 caused by 100 risk loci with average RR of 1.15 (Table 1), the accuracy of prediction was 0.97 (Fig. 4). The prediction equation used 45 loci that explained 94% of the genetic variance (Table 2). As the number of risk loci increases from 100 to 300 to 1000, the accuracy remains above 0.70, even though the average genotype relative risk falls below 1.1 (Fig. 4). The number of loci included in the prediction profile continues to increase as the total number of risk loci increases (Table 1), although the percentage of genetic variation they explain decreases. Even when only 1000 cases and controls were used, the accuracy of prediction was high (>0.7) unless the number of disease loci was >50, corresponding to average RR of disease alleles of <1.2. A GWAS of this size does not have sufficient power to detect risk loci with low average RR, and hence the number of loci selected for inclusion in the prediction profiles and the percentage of genetic variation they explain drop off (Table 1). The results are broadly similar for the CDCV and neutral disease models. The power of our approach is demonstrated in Figure 5 for CDCV, where the true RR of disease for the individuals with the highest 5% of predicted risk in a new sample is shown relative to the mean empirical population risk (0.05 or 0.10 when population prevalence K = 0.05 or 0.10, respectively).
Case-control samples of 1000 can generate SNP risk profile sets that identify individuals who have risk of disease three times higher than the population average when the number of disease loci is <50. When the case-control sample is 10,000, individuals in the population who have a three to seven times increased risk of disease can be identified even when the number of disease loci is very large (1000). That is, individuals who have an absolute risk of disease of 15%–70% can be identified. The results are broadly similar for the CDCV and neutral disease models; the accuracy was slightly higher under the neutral model except when the number of risk loci was small. The high accuracy of prediction is not explained by the presence of a few loci of very large effect (the mean maxima RR are listed in Table 1). Under the null hypothesis, one marker is expected, by chance, to have a test statistic that exceeds the threshold of 22.59. Selection of, on average, one false positive was confirmed in the simulations. When the number of true risk loci is small, their mean RR is higher for the same heritability of the disease, so that even with only 1000 cases and 1000 controls in the association study, most of the true disease loci are selected. When the number of risk loci is high, the mean RR is low, but the distribution of RR means that almost all the genetic variance is explained by a fraction of the risk loci.
We have quantified the number of disease loci underlying common disease using realistic parameters and have shown that results from GWAS can be used to identify healthy individuals in the population who are at a substantially increased risk of developing disease, even when individual risk loci confer small relative risks. From our model we first determined the relationship between the number of susceptibility loci underlying a complex disease and their average RR, given the allele frequency distribution of risk alleles, the population prevalence, and the heritability. Our results are robust to the distribution of risk allele frequencies assumed (approximating the CDCV or neutral model). We assume additive gene action on the log risk scale (multiplicative gene action on the risk scale), that loci act independently, and that there is no linkage disequilibrium between disease predisposition loci. Four disease scenarios were considered that are representative of complex diseases, such as major depression, hypertension, heart disease, or type II diabetes: a population prevalence of 5% or 10%, and heritability on the observed disease scale of 10% or 20%. These choices of parameters translate to diseases with relative risks for full-sibs of affected probands (λs) ranging from 1.45 to 2.90. The analytic formula for the number of loci, assuming all loci to have the same effect and the same allele frequency, was found to be a robust predictor of the number of loci estimated by simulation when frequencies and effect sizes were sampled from a distribution; using Equation 3 with allele frequency p = 0.5 or 0.1 gave results that agreed well with the CDCV or neutral model simulation, respectively. The analytic result is a convenient way to investigate the impact of disease prevalence and heritability on the number of loci underlying disease (Fig. 3). The number of disease risk loci that underlie complex disease has previously been investigated (Yang et al. 2005).
In addition to the simulations reported here, we also simulated a disease with prevalence K = 0.01 and heritability h2 = 0.05 (corresponding to a sibling RR of 3.48) and found results similar to those for a disease with similar relative sibling RR, e.g., K = 0.05, h2 = 0.25. As with the other disease scenarios, we were able to identify individuals who had an increased risk of disease that was three to five times higher than average, but when prevalence is so low this still translates to a small absolute risk of disease, and so genomic profiling may be less useful for rare diseases. However, we note that low-prevalence diseases often show evidence for nonadditive genetic effects (monozygotic twin concordance rates severalfold higher than dizygotic twin concordance rates, e.g., schizophrenia, type 1 diabetes, Crohn's disease), implying that models that include nonadditive genetic effects may be more relevant to these disorders.
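The sibling relative risks quoted for the disease scenarios can be checked with a few lines of arithmetic. The sketch below assumes purely additive genetic variance on the observed risk scale, so that full sibs share half of h2·K·(1 – K) and the sibling RR is 1 + h2(1 – K)/(2K). The helper names `sibling_rr` and `absolute_risk` are illustrative only.

```python
def sibling_rr(K, h2):
    """Relative risk to full sibs of affected probands.

    Assumes additive variance only: Cov(sibs) = 0.5 * h2 * K * (1 - K),
    and sibling RR = 1 + Cov(sibs) / K**2.
    """
    return 1.0 + 0.5 * h2 * (1.0 - K) / K


def absolute_risk(K, fold_increase):
    """Absolute risk for people whose risk is fold_increase times the prevalence K."""
    return K * fold_increase


if __name__ == "__main__":
    for K, h2 in [(0.10, 0.1), (0.05, 0.2), (0.01, 0.05), (0.05, 0.25)]:
        print(f"K = {K}, h2 = {h2}: sibling RR = {sibling_rr(K, h2):.3f}")
    # Threefold risk at K = 0.05 and sevenfold risk at K = 0.10
    print(absolute_risk(0.05, 3), absolute_risk(0.10, 7))
    # Three- to fivefold risk at K = 0.01 remains a small absolute risk
    print(absolute_risk(0.01, 3), absolute_risk(0.01, 5))
```

With these inputs the low-prevalence scenario (K = 0.01, h2 = 0.05) and the K = 0.05, h2 = 0.25 scenario give sibling RRs close to each other, and a three- to fivefold increase at K = 0.01 corresponds to an absolute risk of only 3% to 5%.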
Our results show that, even for diseases controlled by 1000 loci with mean RR of only 1.04, a case-control study with 10,000 cases and controls can lead to selection of
The accuracy of prediction
To investigate the use of high-density genome-wide genetic markers for prediction of genetic risk of disease, we have made some simplifying assumptions. We assumed that the true causal SNPs were always included in the GWAS, and we ignored linkage disequilibrium (LD) between simulated SNPs. If all of our SNPs are viewed as "tag-SNPs" (Carlson et al. 2004
In the past, lack of replication has been a recurring problem for genetic association studies, which must, in part at least, be attributable to lack of power resulting from small sample sizes. In contrast, GWAS and their subsequent replication studies are characterized by large study samples. Time will tell if nonreplication of results and identification of large numbers of false positives is a characteristic of large-scale GWAS. Nonreplication of results may remain a problem if there are, as yet, undetermined methodological problems in genotyping, subtle population stratification effects, or important gene x environment interaction effects. If such problems exist, then our predictions for genetic risk provide an upper bound on the potential for prediction of genetic risk. GWAS for type 2 diabetes have just been published (Saxena et al. 2007
We considered two models for the distribution of risk effects, and we assumed that all genetic variance was attributable to variants of frequency 0.01 to 0.99. If the true genetic architecture underlying complex diseases means that the majority of genetic variance is explained by variants with minor allele frequency <0.01 (the rare variants model; Pritchard 2001
Our simulation model allows direct investigation of the most important underlying factors that drive whether genomic profiling is feasible. Our results provide a foundation stone upon which further layers of complexity can be added, but such an exercise is only worthwhile if the foundation is sufficiently solid. All the caveats that apply to GWAS and their replication apply to the derivation of a SNP set that together predict genetic risk and its validation, ensuring that discovery, validation, and application populations are the same.
The need for new methodology for prediction of genetic risk has been recognized (Collins et al. 2003
The success of association studies and also of genomic profiling depends on the genetic architecture underlying complex diseases. Our first aim is to investigate the relationship between the RR of genetic loci and the number of loci that contribute to risk of a disease under constraints of known disease prevalence and heritability. Ultimately, we will model the genetic architecture of complex disease by allowing the effect size and frequency of risk allele to vary across loci. However, to give insight into our results we first derive an analytical expression for the number of loci that contribute to a disease when the RR (λ) and the allele frequency (p) of the risk alleles are both fixed. We will go on to use these results to investigate the possibilities of using multiple risk loci identified in a genome-wide association study to predict risk of disease in a new population cohort. We introduce the following notations:
Number of loci underlying complex disease when frequency and RR of risk loci are fixed across all loci
Let g = Prob(affected | genotype) = f0
Number of loci underlying a complex disease when frequency and RR of risk alleles vary across loci
Allele frequencies of the risk alleles were simulated either as pi ~ Uniform(0.01, 0.99), or from a U-shaped distribution that broadly corresponds to the neutral allele hypothesis (Pritchard 2001), f(p) = C/[p(1 – p)] for p from δ to 1 – δ, with δ a small number. Solving for the constant that makes f(p) integrate to 1 gives C = 1/{2 ln[(1 – δ)/δ]}, and the cumulative density function is F(p) = 1/2 + C ln[p/(1 – p)]. To simulate an allele frequency from this distribution, we first draw a random number r ~ Uniform(0,1), which is a draw from the cumulative density function F(p), and then solve for p, as p = 1/{1 + exp[–(r – 1/2)/C]}. To avoid the simulation of many allele frequencies that are close to 0 or 1 (with resulting finite samples that would be monomorphic), we truncated the allele frequencies at 0.01 and 0.99. To truncate the allele frequencies at pt and 1 – pt, δ satisfies the relationship 1/pt – 1 = exp[(1/2 – δ)/C], which can be solved iteratively. For pt = 0.01, δ = 0.009183.
For each of the n risk loci, RR was simulated as λi = 1 + x(λ0 – 1), with x ~ Exponential(1). This results in λi always being larger than 1.0, provided that λ0, an arbitrary input parameter, is >1.0. The mean of the simulated RR is E(λ) = λ0. The λi are then transformed so that all the genetic variance is explained by the n loci. This procedure of simulating and transforming relative risks for a fixed number of loci was implemented to force the set of simulated risk alleles to be consistent with a given heritability and disease prevalence, while keeping the distribution of the effects exponential.
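The sampling steps described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code. The small constant of the U-shaped distribution is called `delta` here, the constant C is taken to be 1/{2 ln[(1 – delta)/delta]} (the value that makes C/[p(1 – p)] integrate to 1 between delta and 1 – delta), and draws falling outside the truncation bounds are simply redrawn, which the text does not specify. All function names are hypothetical.

```python
import math
import random


def c_const(delta):
    """Normalising constant C of f(p) = C / (p (1 - p)) on [delta, 1 - delta] (assumed form)."""
    return 1.0 / (2.0 * math.log((1.0 - delta) / delta))


def freq_from_r(r, delta):
    """Inverse CDF quoted in the text: p = 1 / (1 + exp(-(r - 1/2) / C))."""
    return 1.0 / (1.0 + math.exp(-(r - 0.5) / c_const(delta)))


def solve_delta(p_t, tol=1e-12):
    """Bisection for delta satisfying 1/p_t - 1 = exp((1/2 - delta) / C)."""
    lo, hi = 1e-9, p_t
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # freq_from_r(mid, mid) == p_t is the same condition, rearranged
        if freq_from_r(mid, mid) < p_t:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sample_freq(delta, p_t, rng):
    """One neutral-model allele frequency in [p_t, 1 - p_t]; out-of-range draws are redrawn (assumption)."""
    while True:
        p = freq_from_r(rng.random(), delta)
        if p_t <= p <= 1.0 - p_t:
            return p


def sample_rr(rr0, rng):
    """Relative risk 1 + x (rr0 - 1) with x ~ Exponential(1), before any rescaling."""
    return 1.0 + rng.expovariate(1.0) * (rr0 - 1.0)


if __name__ == "__main__":
    rng = random.Random(1)
    delta = solve_delta(0.01)
    print(f"delta = {delta:.6f}")
    print([round(sample_freq(delta, 0.01, rng), 3) for _ in range(5)])
    print([round(sample_rr(1.2, rng), 3) for _ in range(5)])
```

Solving for `delta` with `p_t = 0.01` gives a value close to the 0.009183 quoted in the text. The rescaling step that forces the simulated relative risks to match a given heritability and prevalence is not included, because its formula is not recoverable from the text.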
Analysis of case-control data for identification of multiple risk loci
SNPs were selected if their test statistic (χ² test) for association was above a predetermined threshold of 22.59, which corresponds to an expected number of one false positive from 500,000 tests and a nominal P-value of 2 × 10⁻⁶.
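The link between the threshold of 22.59, the nominal P-value, and the expected number of false positives can be checked numerically. The sketch below assumes the association test has 1 degree of freedom, which the text does not state explicitly. The function name `chi2_sf_1df` is illustrative.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    Uses P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)) for standard normal Z.
    """
    return math.erfc(math.sqrt(x / 2.0))


if __name__ == "__main__":
    threshold = 22.59      # test-statistic threshold from the text
    n_tests = 500_000      # number of SNPs tested
    p_nominal = chi2_sf_1df(threshold)
    print(f"nominal P-value: {p_nominal:.3e}")
    print(f"expected false positives under the null: {n_tests * p_nominal:.2f}")
```

Under the 1-degree-of-freedom assumption the nominal P-value comes out near 2 × 10⁻⁶, so about one null SNP in 500,000 is expected to pass the threshold.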
Risk prediction in a new sample from the same population
Disease parameters
Closed solutions for number of risk loci for parameterization of Yang et al. (2005)
λg is the RR of the risk genotype. Using the expected value of the number of risk genotypes in the population of npg, this equation reduces to a closed form of K = f0[1 + npg(λg – 1)]. Defining population-attributable fraction (PAF) as PAF = (K – f0)/K, a closed solution for the number of risk loci is n = PAF/[(1 – PAF)pg(λg – 1)].
λg – 1)], with x = 1 with probability pg and x = 0 with probability (1 – pg), reduces to a closed form of K = f0[1 + pg(λg – 1)]^n; solving for n gives n = –ln(1 – PAF)/ln[1 + pg(λg – 1)].
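The two closed forms above, K = f0[1 + n·pg(λg – 1)] and K = f0[1 + pg(λg – 1)]^n, can each be rearranged for n once PAF = (K – f0)/K is substituted. The sketch below carries out that algebra and checks it by substituting back. The function names and the example values (K = 0.05, f0 = 0.025, pg = 0.1, λg = 2) are illustrative and are not taken from the paper or from Yang et al. (2005).

```python
import math


def n_loci_additive(paf, pg, rr_g):
    """Solve K = f0 * (1 + n * pg * (rr_g - 1)) for n, using K / f0 = 1 / (1 - PAF)."""
    return paf / ((1.0 - paf) * pg * (rr_g - 1.0))


def n_loci_multiplicative(paf, pg, rr_g):
    """Solve K = f0 * (1 + pg * (rr_g - 1)) ** n for n, using K / f0 = 1 / (1 - PAF)."""
    return -math.log(1.0 - paf) / math.log(1.0 + pg * (rr_g - 1.0))


if __name__ == "__main__":
    K, f0 = 0.05, 0.025            # hypothetical prevalence and baseline risk
    pg, rr_g = 0.1, 2.0            # hypothetical risk-genotype frequency and RR
    paf = (K - f0) / K

    n_add = n_loci_additive(paf, pg, rr_g)
    n_mul = n_loci_multiplicative(paf, pg, rr_g)
    print(f"additive form: n = {n_add:.2f}")
    print(f"multiplicative form: n = {n_mul:.2f}")

    # Substituting n back should recover the prevalence K in both cases
    print(f0 * (1.0 + n_add * pg * (rr_g - 1.0)))
    print(f0 * (1.0 + pg * (rr_g - 1.0)) ** n_mul)
```

For the same inputs the multiplicative form needs fewer loci than the additive form, because per-locus effects compound rather than add.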
We thank Stuart Macgregor and Bill Hill for commenting on the manuscript and three reviewers for many helpful comments and suggestions. This work was supported by the Australian National Health and Medical Research Council grants 389892, 442915, 339450, and 443011 and Australian Research Council grant DP0770096.
4 Corresponding author.
E-mail Naomi.Wray{at}qimr.edu.au; fax 61-7-3362-0101. Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.6665407
Barrett, J.C. and Cardon, L.R. 2006. Evaluating coverage of genome-wide association studies. Nat. Genet. 38: 659–662.
Barton, N.H. and Keightley, P.D. 2002. Understanding quantitative genetic variation. Nat. Rev. Genet. 3: 11–21.
Bell, J. 2004. Predicting disease using genomics. Nature 429: 453–456.
Bertram, L., McQueen, M.B., Mullin, K., Blacker, D., and Tanzi, R.E. 2007. Systematic meta-analyses of Alzheimer disease genetic association studies: The AlzGene database. Nat. Genet. 39: 17–23.
Carlson, C.S., Eberle, M.A., Kruglyak, L., and Nickerson, D.A. 2004. Mapping complex disease loci in whole-genome association studies. Nature 429: 446–452.
Chakravarti, A. 1999. Population genetics – Making sense out of sequence. Nat. Genet. 21: 56–60.
Collins, F.S., Green, E.D., Guttmacher, A.E., and Guyer, M.S. 2003. A vision for the future of genomics research. Nature 422: 835–847.
Cox, A., Dunning, A.M., Garcia-Closas, M., Balasubramanian, S., Reed, M.W., Pooley, K.A., Scollen, S., Baynes, C., Ponder, B.A., Chanock, S., et al. 2007. A common coding variant in CASP8 is associated with breast cancer risk. Nat. Genet. 39: 352–358.
Falconer, D. and Mackay, T. 1996. Introduction to quantitative genetics. Longman, London.
Grosse, S.D. and Khoury, M.J. 2006. What is the clinical utility of genetic testing? Genet. Med. 8: 448–450.
Henderson, N.D., Turri, M.G., DeFries, J.C., and Flint, J. 2004. QTL analysis of multiple behavioral measures of anxiety in mice. Behav. Genet. 34: 267–293.
Hirschhorn, J.N. and Daly, M.J. 2005. Genome-wide association studies for common diseases and complex traits. Nat. Rev. Genet. 6: 95–108.
Ioannidis, J.P., Trikalinos, T.A., and Khoury, M.J. 2006. Implications of small effect sizes of individual genetic variants on the design and interpretation of genetic association studies of complex diseases. Am. J. Epidemiol. 164: 609–614.
Jacobsson, L., Park, H.B., Wahlberg, P., Fredriksson, R., Perez-Enciso, M., Siegel, P.B., and Andersson, L. 2005. Many QTLs with minor additive effects are associated with a large difference in growth between two selection lines in chickens. Genet. Res. 86: 115–125.
Janssens, A.C., Aulchenko, Y.S., Elefante, S., Borsboom, G.J., Steyerberg, E.W., and van Duijn, C.M. 2006. Predictive testing for complex diseases using multiple genes: Fact or fiction? Genet. Med. 8: 395–400.
Kennedy, G.C., Matsuzaki, H., Dong, S., Liu, W.M., Huang, J., Liu, G., Su, X., Cao, M., Chen, W., Zhang, J., et al. 2003. Large-scale genotyping of complex DNA. Nat. Biotechnol. 21: 1233–1237.
Khoury, M.J., Jones, K., and Grosse, S.D. 2006. Quantifying the health benefits of genetic tests: The importance of a population perspective. Genet. Med. 8: 191–195.
Khoury, M.J., Yang, Q., Gwinn, M., Little, J., and Dana Flanders, W. 2004. An epidemiologic assessment of genomic profiling for measuring susceptibility to common diseases and targeting interventions. Genet. Med. 6: 38–47.
Lynch, M. and Walsh, B. 1998. Genetics and analysis of quantitative traits. Sinauer Associates, Inc., Sunderland, MA.
Lyssenko, V., Almgren, P., Anevski, D., Orho-Melander, M., Sjögren, M., Saloranta, C., Tuomi, T., and Groop, L. 2005. Genetic prediction of future type 2 diabetes. PLoS Med. 2: e345. doi: 10.1371/journal.pmed.0020345.
Lyssenko, V., Anevski, D., Almgren, P., and Groop, L. 2006. Authors' reply. PLoS Med. 3: e127. doi: 10.1371/journal.pmed.0030127.
Mackay, T.F. 2004. The genetic architecture of quantitative traits: Lessons from Drosophila. Curr. Opin. Genet. Dev. 14: 253–257.
Maller, J., George, S., Purcell, S., Fagerness, J., Altshuler, D., Daly, M.J., and Seddon, J.M. 2006. Common variation in three genes, including a noncoding variant in CFH, strongly influences risk of age-related macular degeneration. Nat. Genet. 38: 1055–1059.
Meuwissen, T.H., Hayes, B.J., and Goddard, M.E. 2001. Prediction of total genetic value using genome-wide dense marker maps. Genetics 157: 1819–1829.
Pharoah, P.D.P., Antoniou, A., Bobrow, M., Zimmern, R.L., Easton, D.F., and Ponder, B.A.J. 2002. Polygenic susceptibility to breast cancer and implications for prevention. Nat. Genet. 31: 33–36.
Pritchard, J.K. 2001. Are rare variants responsible for susceptibility to complex diseases? Am. J. Hum. Genet. 69: 124–137.
Reich, D.E. and Lander, E.S. 2001. On the allelic spectrum of human disease. Trends Genet. 17: 502–510.
Risch, N. 1990. Linkage strategies for genetically complex traits. I. Multilocus models. Am. J. Hum. Genet. 46: 222–228.
Risch, N. and Merikangas, K. 1996. The future of genetic studies of complex human diseases. Science 273: 1516–1517.
Saxena, R., Voight, B.F., Lyssenko, V., Burtt, N.P., de Bakker, P.I., Chen, H., Roix, J.J., Kathiresan, S., Hirschhorn, J.N., Daly, M.J., et al. 2007. Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science 316: 1331–1336.
Scott, L.J., Mohlke, K.L., Bonnycastle, L.L., Willer, C.J., Li, Y., Duren, W.L., Erdos, M.R., Stringham, H.M., Chines, P.S., Jackson, A.U., et al. 2007. A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science 316: 1341–1345.
Sladek, R., Rocheleau, G., Rung, J., Dina, C., Shen, L., Serre, D., Boutin, P., Vincent, D., Belisle, A., Hadjadj, S., et al. 2007. A genome-wide association study identifies novel risk loci for type 2 diabetes. Nature 445: 881–885.
Storey, J.D. and Tibshirani, R. 2003. Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. 100: 9440–9445.
Tibshirani, R., Hastie, T., Narasimhan, B., and Chu, G. 2002. Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc. Natl. Acad. Sci. 99: 6567–6572.
Valdar, W., Solberg, L.C., Gauguier, D., Burnett, S., Klenerman, P., Cookson, W.O., Taylor, M.S., Rawlins, J.N., Mott, R., and Flint, J. 2006. Genome-wide genetic association of complex traits in heterogeneous stock mice. Nat. Genet. 38: 879–887.
Wellcome Trust Case Control Consortium. 2007. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447: 661–678.
Yang, Q., Khoury, M.J., Friedman, J., Little, J., and Flanders, W.D. 2005. How many genes underlie the occurrence of common complex diseases in the population? Int. J. Epidemiol. 34: 1129–1137.
Zeggini, E., Weedon, M.N., Lindgren, C.M., Frayling, T.M., Elliott, K.S., Lango, H., Timpson, N.J., Perry, J.R., Rayner, N.W., and Freathy, R.M. 2007. Replication of genome-wide association signals in U.K. samples reveals risk loci for type 2 diabetes. Science 316: 1336–1341.
Zollner, S. and Pritchard, J.K. 2007. Overcoming the winner's curse: Estimating penetrance parameters from case-control data. Am. J. Hum. Genet. 80: 605–615.
Received May 2, 2007; accepted in revised format July 19, 2007.