Genome Research

Vol. 9, Issue 3, 234-241, March 1999

The Relative Power of Family-Based and Case-Control Designs for Linkage Disequilibrium Studies of Complex Human Diseases. II. Individual Genotyping

Jun Teng,1 and Neil Risch1,2,3,4

1 Department of Statistics, Stanford University and Departments of 2 Genetics and 3 Health Research and Policy, Stanford University School of Medicine, Stanford, California 94305 USA

    ABSTRACT

In this paper we consider test statistics based on individual genotyping. For sibships without parents, but with unaffected as well as affected sibs, we introduce a new test statistic (referred to as TDS), which contrasts the allele frequency in affected sibs versus that estimated for the parents from the entire sibship. For sibships without parents, this test is analogous to the TDT and is completely robust to nonrandom mating patterns. The efficiency of the TDS test is comparable to that of the THS test (which compares affected vs. unaffected sibs and was based on DNA pooling), for sibships with one affected child. However, as the number of affected sibs in the sibship grows, the relative efficiency of the TDS test versus the THS test also increases. For example, for sibships with three affected, one-third fewer families are required; for families with four affected, nearly half as many are required. Thus, when sibships contain multiple affected individuals, the TDS test provides both an increase in power and robustness to nonrandom mating.

    INTRODUCTION

In the first paper in this series, Risch and Teng (1998), we considered statistics based on data derived from DNA pooling. Only overall allele frequency estimates for a pool are available from such experiments; hence, only statistics based on pooled allele frequencies are possible, such as the haplotype-based haplotype relative risk (HHRR) (Falk and Rubinstein 1987; Terwilliger and Ott 1992). Such statistics are not automatically robust to nonrandom mating, although they are conservative under population stratification. Furthermore, such statistics may not extract all the available information in some study designs if individual genotyping is performed. Therefore, in this paper we consider analyses of data obtained from individual genotyping of all study subjects. We compare the same family constellations as described in Risch and Teng (1998). As individual genotyping provides more information than DNA pooling, it enables us to improve the statistical treatment in two ways: by increasing robustness and power.

We consider statistics of the form (p1 - p2)/σ, in which the numerator contrasts the estimated allele frequencies in two groups (affected sibs vs. parents) and the denominator is the estimated standard deviation of the numerator. Typically, the variance of (p1 - p2) is a function of genotype frequencies in the parents. When DNA pooling has been performed, this variance has to be estimated based on the assumption of Hardy-Weinberg equilibrium. On the other hand, individual genotyping allows us to get an unbiased estimate of the variance under more general conditions and thus provides further robustness to nonrandom mating. More importantly, in the case where parents are unavailable, individual genotyping gives us a greater choice of the contrast we can make in the numerator, which potentially can improve the power of the test.

Study designs that include affected offspring with parents lend themselves to the calculation of a TDT statistic, provided individual genotyping is performed. Although the TDT offers additional robustness to nonrandom mating in this case, the power of this test statistic is generally comparable to that of the HHRR statistic, at least when mating is nearly at random. This is because the Hardy-Weinberg estimator of parental heterozygosity, used in the denominator of the HHRR statistic, is close to the directly counted parental heterozygosity estimate used in the TDT (Risch and Teng 1998, formula 4). Thus, sample size requirements using individual genotyping for designs involving affected offspring with parents, based on TDT, are essentially identical to those we have presented previously (Risch and Teng 1998) for the same designs based on DNA pooling and HHRR statistics (calculations performed but not presented). Therefore, we use the sample size requirements for affected sibships with parents derived in Risch and Teng (1998) for comparison with individually genotyped sibships without parents.

In the classic TDT, p1 is the allele frequency in the affected child (or children) and p2 the allele frequency in the parents. For sibships without parents, the test described in Risch and Teng (1998) proposes p1 to be the allele frequency in the affected sibs, and p2 the allele frequency in the unaffected sibs. When the locus-related penetrance is low, the allele frequency p2 in unaffected sibs can also be viewed as providing a nearly unbiased estimate of the allele frequency in the parents (in this sense, it is similar to the TDT, in which p2 is the observed allele frequency in the parents). When more than one child has been individually genotyped, however, it is possible to obtain a more efficient estimate of the parent allele frequency p2, as well as an estimate of the variance of p1 - p2 that is robust to nonrandom mating. We derive such a statistic below and describe its properties.

We use the same notation as given in Risch and Teng (1998); namely, mij denotes the conditional probability of mating type (i,j) given an affected child (and similarly m(r)ij for r affected children), in which i and j are the number of A alleles in the two parents (we use parentheses in subscripts to denote unordered genotypes); fk is the ratio of penetrance in individuals with k D alleles compared with dd individuals; hats over letters (circumflexes) denote sample estimates. To simplify some formulas, we also introduce the following notation:
$$c_{21} = \frac{f_2}{f_2 + f_1}, \qquad c_{20} = 1 - c_{21} = \frac{f_1}{f_2 + f_1},$$
$$c_{01} = \frac{f_1}{f_1 + 1}, \qquad c_{00} = 1 - c_{01} = \frac{1}{f_1 + 1},$$
$$c_{12} = \frac{f_2}{f_2 + 2f_1 + 1}, \qquad c_{11} = \frac{2f_1}{f_2 + 2f_1 + 1}, \qquad c_{10} = \frac{1}{f_2 + 2f_1 + 1}.$$
We assume, as in Risch and Teng (1998), that unaffected sibs have a random genotype distribution (low penetrance) given the parental mating type.

Affected-Unaffected Sib Pairs

We first examine the case of one affected and one unaffected sib, without parents. For this case, there are nine possible marker genotype outcomes for the sib pair, as listed in Table 1, along with their probabilities of occurrence. To estimate the frequency of allele A in the parents (p2), we notice that under the null hypothesis, f2 = f1 = 1 and the affected and unaffected sibs become symmetric; so Table 1 can be simplified to six possible outcomes: (1) Both sibs are AA; (2) both sibs are aa; (3) both sibs are Aa; (4) one is AA, the other is Aa; (5) one is Aa, the other is aa; and (6) one is AA, the other is aa. There are also the same six possible genotype combinations (mating types) for the parents with respective probability m(ij). Because there is an equal number of parameters and independent observations, maximum likelihood estimates of the parental mating type frequencies m(ij) can be calculated by equating the sample frequency of each sib-pair outcome with its respective probability, namely
$$\frac{n_{22}}{n} = \hat{m}_{22} + \frac{\hat{m}_{(21)}}{4} + \frac{\hat{m}_{11}}{16}$$
$$\frac{n_{00}}{n} = \hat{m}_{00} + \frac{\hat{m}_{(10)}}{4} + \frac{\hat{m}_{11}}{16}$$
$$\frac{n_{11}}{n} = \hat{m}_{(20)} + \frac{\hat{m}_{(21)}}{4} + \frac{\hat{m}_{(10)}}{4} + \frac{\hat{m}_{11}}{4}$$
$$\frac{n_{21}}{n} + \frac{n_{12}}{n} = \frac{\hat{m}_{(21)}}{2} + \frac{\hat{m}_{11}}{4}$$
$$\frac{n_{10}}{n} + \frac{n_{01}}{n} = \frac{\hat{m}_{(10)}}{2} + \frac{\hat{m}_{11}}{4}$$
$$\frac{n_{20}}{n} + \frac{n_{02}}{n} = \frac{\hat{m}_{11}}{8}$$
Solving these equations, we get the unbiased maximum likelihood estimators $\hat{m}_{(ij)}$. These are given by
$$\hat{m}_{11} = \frac{8(n_{20} + n_{02})}{n}$$
$$\hat{m}_{(10)} = \frac{2(n_{10} + n_{01}) - 4(n_{20} + n_{02})}{n}$$
$$\hat{m}_{(21)} = \frac{2(n_{21} + n_{12}) - 4(n_{20} + n_{02})}{n}$$
$$\hat{m}_{00} = \frac{2n_{00} - (n_{10} + n_{01}) + (n_{20} + n_{02})}{2n}$$
$$\hat{m}_{22} = \frac{2n_{22} - (n_{21} + n_{12}) + (n_{20} + n_{02})}{2n}$$
$$\hat{m}_{(20)} = \frac{2n_{11} - (n_{21} + n_{12}) - (n_{10} + n_{01})}{2n}$$
Then the frequency of A in the parents can be estimated by
$$\hat{p}_2 = \hat{m}_{22} + \tfrac{3}{4}\hat{m}_{(21)} + \tfrac{1}{2}\hat{m}_{11} + \tfrac{1}{2}\hat{m}_{(20)} + \tfrac{1}{4}\hat{m}_{(10)} = \frac{n_{22} + \tfrac{3}{4}(n_{21} + n_{12}) + \tfrac{1}{2}(n_{11} + n_{20} + n_{02}) + \tfrac{1}{4}(n_{10} + n_{01})}{n}$$
which, in this case, is the same as the A allele frequency in the combined sibling sample. Because
$$\hat{p}_1 = \frac{n_{22} + n_{21} + n_{20} + \tfrac{1}{2}(n_{12} + n_{11} + n_{10})}{n}$$
we have
$$\hat{p}_1 - \hat{p}_2 = \frac{(n_{21} - n_{12}) + (n_{10} - n_{01}) + 2(n_{20} - n_{02})}{4n}.$$
The variance of p1 - p2 is a function of h, the frequency of heterozygosity in the parents. Whereas DNA pooling required us to use the Hardy-Weinberg assumption in the estimation of h (formula 5 of Risch and Teng 1998), individual genotyping allows us to obtain a more direct estimate, robust to nonrandom mating. Specifically,
$$\hat{h} = \hat{m}_{11} + \tfrac{1}{2}\hat{m}_{(21)} + \tfrac{1}{2}\hat{m}_{(10)} = \frac{n_{21} + n_{12} + n_{10} + n_{01} + 4(n_{20} + n_{02})}{n}$$
In this case, under the null hypothesis, var(p1 - p2) = h/16n (e.g., this can be calculated from the variance of S in Table 1 using f2 = f1 = 1). Therefore, we can construct the statistic
$$T_{DS} = \frac{(n_{21} - n_{12} + n_{10} - n_{01} + 2n_{20} - 2n_{02})/4n}{\sqrt{(n_{21} + n_{12} + n_{10} + n_{01} + 4n_{20} + 4n_{02})/16n^2}} \tag{1}$$
The subscripts on T denote that we do not assume Hardy-Weinberg equilibrium and that sibs are used to construct the parent allele frequency.
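The sib-pair calculation can be traced end to end in a few lines. The following is a minimal sketch rather than the authors' software: the function name `sib_pair_tds` and the dictionary layout of the counts are assumptions made for illustration. It back-substitutes through the six estimating equations, then forms the estimates of p2, p1, h, and the statistic in formula 1.

```python
from fractions import Fraction
from math import sqrt


def sib_pair_tds(counts):
    """Sib-pair T_DS from counts[(i, j)], where i is the number of A alleles
    in the affected sib and j the number in the unaffected sib.
    (Hypothetical input layout, chosen for this sketch.)"""
    c = {(i, j): counts.get((i, j), 0) for i in range(3) for j in range(3)}
    n = sum(c.values())
    a = c[2, 1] + c[1, 2]  # pairs with one AA sib and one Aa sib
    b = c[1, 0] + c[0, 1]  # pairs with one Aa sib and one aa sib
    d = c[2, 0] + c[0, 2]  # pairs with one AA sib and one aa sib

    # Back-substitute through the six estimating equations, starting from m11.
    m11 = Fraction(8 * d, n)
    m21 = 2 * (Fraction(a, n) - m11 / 4)
    m10 = 2 * (Fraction(b, n) - m11 / 4)
    m22 = Fraction(c[2, 2], n) - m21 / 4 - m11 / 16
    m00 = Fraction(c[0, 0], n) - m10 / 4 - m11 / 16
    m20 = Fraction(c[1, 1], n) - (m21 + m10 + m11) / 4

    p2 = m22 + Fraction(3, 4) * m21 + (m11 + m20) / 2 + m10 / 4
    p1 = Fraction(
        2 * (c[2, 2] + c[2, 1] + c[2, 0]) + c[1, 2] + c[1, 1] + c[1, 0], 2 * n
    )
    h = m11 + (m21 + m10) / 2  # estimated parental heterozygosity
    if h == 0:
        raise ValueError("no genotype-discordant sib pairs; T_DS is undefined")
    t_ds = float(p1 - p2) / sqrt(float(h) / (16 * n))
    m_hat = {"22": m22, "21": m21, "20": m20, "11": m11, "10": m10, "00": m00}
    return {"m_hat": m_hat, "p1": p1, "p2": p2, "h": h, "t_ds": t_ds}
```

Exact fractions are used so that the identity between the estimate of p2 and the pooled sibling allele frequency can be checked without rounding error. Note that the individual mating-type estimates are linear in the counts and can come out negative in small samples.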

                              
Table 1.   Genotype Outcomes, Scores, and Probabilities for Affected-Unaffected Sib Pair

To calculate the power of statistic 1, we rewrite TDS as
$$T_{DS} = \frac{(n_{21} - n_{12} + n_{10} - n_{01} + 2n_{20} - 2n_{02})/\sqrt{16n}}{\sqrt{(n_{21} + n_{12} + n_{10} + n_{01} + 4n_{20} + 4n_{02})/16n}} \tag{2}$$
We assume the denominator converges to its expected value (by the Law of Large Numbers), and thus, we need only calculate this expectation along with the mean and variance of the numerator under the alternative hypothesis. We denote the expectation of the square of the denominator as $E(\sigma_0^2)$ and the mean and variance of the numerator as $\sqrt{n}\,\nu$ and $\sigma_a^2$. From Table 1,
$$E(\sigma_0^2) = \frac{1}{32}\left[m_{(21)} + m_{(10)} + m_{11}\,\frac{3f_2 + 2f_1 + 3}{f_2 + 2f_1 + 1}\right]$$
$$\nu = \tfrac{1}{2}\left(m_{(21)}\pi_{21} + m_{(10)}\pi_{10} + m_{11}\pi_{11}\right)$$
and
$$\sigma_a^2 = E(\sigma_0^2) - \nu^2$$
Then, the power is given by
$$\Phi\left(\frac{\sqrt{E(\sigma_0^2)}\,Z_\alpha + \nu\sqrt{n}}{\sigma_a}\right) \tag{3}$$
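As a rough illustration of how formula 3 is evaluated, the sketch below computes the normal-approximation power from the three ingredients. It rests on assumptions not stated in the text: the test is treated as one-sided, and Z_α in formula 3 is read as the lower α quantile of the standard normal (so it equals minus the upper critical value used in the code). The function names are hypothetical, and ν is taken as an input.

```python
from math import sqrt
from statistics import NormalDist


def e_sigma0_sq_pair(m21, m10, m11, f1, f2):
    """Expected squared denominator for the affected-unaffected sib-pair
    design, from the mating-type frequencies and penetrance ratios."""
    return (m21 + m10 + m11 * (3 * f2 + 2 * f1 + 3) / (f2 + 2 * f1 + 1)) / 32.0


def power_normal_approx(n, nu, e_sigma0_sq, sigma_a_sq, alpha):
    """Power of the test with n families under the normal approximation.
    Assumes a one-sided test at level alpha (an assumption of this sketch)."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha)  # upper critical value
    shift = sqrt(n) * nu - z_crit * sqrt(e_sigma0_sq)
    return NormalDist().cdf(shift / sqrt(sigma_a_sq))
```

With ν = 0 and σ_a² equal to E(σ₀²), the function returns the significance level itself, which is a convenient sanity check on the sign convention.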

r Affected and s Unaffected Sibs

By using the same logic described above for one affected and one unaffected sib, we can construct a sibship-based disequilibrium test statistic for the general case of r affected and s unaffected sibs. We classify the various outcomes into six groups based on the possible matings that could have produced them: (I) All sibs are AA; (II) all sibs are aa; (III) all sibs are Aa; (IV) all sibs are either AA or Aa; (V) all sibs are either Aa or aa; and (VI) the genotypes AA and aa (and possibly Aa) appear among the sibs. These categories are meant to be mutually exclusive, so that, for example, group IV excludes the case of all sibs being AA. In theory, it may be possible to obtain additional information by subdividing groups IV and V by the number of Aa individuals; however, by the above grouping scheme, we are able to obtain analytic formulas for power and sample size, as described below. We can characterize each possible outcome as a vector with the six elements (j2, j1, j0, k2, k1, k0) where ji is the number of affected sibs with i A alleles, and ki is the number of unaffected sibs with i A alleles. Note that j2 + j1 + j0 = r, and k2 + k1 + k0 = s, and we define t = r + s. The possible outcomes, by group, are listed in Table 2, along with their probabilities under the alternative hypothesis. Under the null hypothesis, the corresponding probabilities can be obtained by using the population mating-type frequencies instead of the conditional (on having r affected children) mating-type frequencies and substituting in f2 = f1 = 1.

To derive the TDS statistic, we first sum up the probabilities across all possible outcomes within each group under the null hypothesis. We obtain the following totals:
$$\begin{aligned}
\text{I:}\quad & m_{22} + m_{(21)}\left(\tfrac{1}{2}\right)^t + m_{11}\left(\tfrac{1}{4}\right)^t \\
\text{II:}\quad & m_{11}\left(\tfrac{1}{4}\right)^t + m_{(10)}\left(\tfrac{1}{2}\right)^t + m_{00} \\
\text{III:}\quad & m_{(21)}\left(\tfrac{1}{2}\right)^t + m_{(20)} + m_{11}\left(\tfrac{1}{2}\right)^t + m_{(10)}\left(\tfrac{1}{2}\right)^t \\
\text{IV:}\quad & m_{(21)}\left[1 - \left(\tfrac{1}{2}\right)^{t-1}\right] + m_{11}\left[\left(\tfrac{3}{4}\right)^t - \left(\tfrac{1}{2}\right)^t - \left(\tfrac{1}{4}\right)^t\right] \\
\text{V:}\quad & m_{11}\left[\left(\tfrac{3}{4}\right)^t - \left(\tfrac{1}{2}\right)^t - \left(\tfrac{1}{4}\right)^t\right] + m_{(10)}\left[1 - \left(\tfrac{1}{2}\right)^{t-1}\right] \\
\text{VI:}\quad & m_{11}\left[1 + \left(\tfrac{1}{2}\right)^t - 2\left(\tfrac{3}{4}\right)^t\right]
\end{aligned} \tag{4}$$
We denote by nI the number of observations that fall into group I and similarly for the other groups. By equating the sample frequencies of each group, that is, nI/n, nII/n, etc., with their respective probabilities, and solving the six equations, we can get unbiased maximum likelihood estimates of the m(ij)'s under the null hypothesis, which are denoted by $\hat{m}_{(ij)}$. Recalling that p2 = m22 + 3/4 m(21) + 1/2 m(20) + 1/2 m11 + 1/4 m(10), and using the maximum likelihood estimates of the mij based on the simplified classification scheme given above, we can estimate p2 by
$$\hat{p}_2 = \frac{n_{\mathrm{I}} + \tfrac{1}{2}n_{\mathrm{III}} + \tfrac{3}{4}n_{\mathrm{IV}} + \tfrac{1}{4}n_{\mathrm{V}} + \tfrac{1}{2}n_{\mathrm{VI}}}{n} \tag{5}$$
This formula can be easily derived by taking the linear combination in equation 5 applied to the formulas in equation 4. Then, to obtain p2, we can simply assign a score S(p2) of 1, 3/4, 1/2, 1/4, or 0 depending on the group membership of the outcome; these scores are given in Table 2.
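The claim that the group scores recover p2 can be checked numerically. The sketch below (function names and the dictionary keys for mating types are our own choices) evaluates the six null group probabilities for a sibship of size t and compares the score-weighted sum with the direct expression for p2, using exact fractions.

```python
from fractions import Fraction as F

# Scores S(p2) assigned to groups I..VI.
GROUP_SCORES = {"I": F(1), "II": F(0), "III": F(1, 2),
                "IV": F(3, 4), "V": F(1, 4), "VI": F(1, 2)}


def group_probs_null(m, t):
    """Null probabilities of groups I..VI for a sibship of total size t.
    m maps mating-type labels '22','21','20','11','10','00' to frequencies."""
    h, q, k = F(1, 2) ** t, F(3, 4) ** t, F(1, 4) ** t
    return {
        "I": m["22"] + m["21"] * h + m["11"] * k,
        "II": m["11"] * k + m["10"] * h + m["00"],
        "III": m["21"] * h + m["20"] + m["11"] * h + m["10"] * h,
        "IV": m["21"] * (1 - 2 * h) + m["11"] * (q - h - k),
        "V": m["11"] * (q - h - k) + m["10"] * (1 - 2 * h),
        "VI": m["11"] * (1 + h - 2 * q),
    }


def p2_from_groups(m, t):
    """Expected score S(p2): the score-weighted sum of group probabilities."""
    probs = group_probs_null(m, t)
    return sum(GROUP_SCORES[g] * probs[g] for g in probs)


def p2_direct(m):
    """Parental A allele frequency written in terms of mating types."""
    return (m["22"] + F(3, 4) * m["21"] + F(1, 2) * m["20"]
            + F(1, 2) * m["11"] + F(1, 4) * m["10"])
```

For any mating-type frequencies that sum to 1, the six group probabilities also sum to 1 and `p2_from_groups(m, t)` equals `p2_direct(m)` for every sibship size t of 2 or more, because all the terms in t cancel.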

This derivation is similar to the approach we took for the simple case of one affected and one unaffected sib. However, in this general case, collapsing all possible sibship outcomes (ignoring affection status) into the six groups defined above, although unbiased, does not use all of the information available. Specifically, within group IV there is additional information about parental mating type based on the frequency of sibships defined by the number of AA and Aa sibs. For example, in sibships of size 3, this would correspond to the relative frequency of sibships with two AA and one Aa sib versus those with one AA and two Aa sibs, which provides some information on the relative frequency of the parental mating type AA × Aa versus Aa × Aa. A similar comment applies to group V (for matings Aa × aa and Aa × Aa). For the four other sibship groups, further subdivision is either not possible (groups I, II, and III) or provides no additional information about mating type (group VI, in which the parental mating type is automatically Aa × Aa). By not further subgrouping groups IV and V, we are able to derive formulas for the estimate of p2 and Var(p1 - p2) that are simple and robust and can therefore also perform all power calculations and sample estimates analytically. Presumably, there is also some loss of efficiency in doing so, although much of the information about parental-mating type frequencies is contained in the relative frequency of groups I to VI. A maximum likelihood solution to estimate the parental mating type frequencies allowing for subgrouping of groups IV and V may be possible by numerical means; however, no simple formulas for parameter estimation, power calculations, or sample size estimates are possible in this case. Furthermore, we demonstrate below in numerical examples that our simple statistic is more efficient than one based on comparing the frequency of allele A in affected versus unaffected sibs, for sibships of size 3 or greater.

Scores can also be assigned for the estimate of p1. To do so, we simply take (j2 + 1/2j1) / r, independent of which group contains the outcome. These scores [S(p1)] are also given in Table 2. To estimate p1 - p2, we can then assign scores based on the difference in the scores S(p1) and S(p2); these scores, S(p1 - p2), are also given in Table 2. As can be seen there, the score is (j2 - j1) / 4r in sibships with only AA and Aa sibs, (j1 - j0) / 4r in sibships with only Aa and aa sibs, and (j2 - j0) / 2r in sibships with AA and aa sibs.
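The scoring rule is mechanical enough to write down directly. In this sketch, genotypes are coded as the number of A alleles (2, 1, or 0); the function names and that coding are assumptions for illustration, and Table 2 itself is not reproduced here.

```python
from fractions import Fraction

# Scores S(p2) by group, as listed in the text.
S_P2 = {"I": Fraction(1), "II": Fraction(0), "III": Fraction(1, 2),
        "IV": Fraction(3, 4), "V": Fraction(1, 4), "VI": Fraction(1, 2)}


def classify_group(affected, unaffected):
    """Assign a sibship to group I..VI from the genotypes present.
    Each genotype is the number of A alleles: 2 = AA, 1 = Aa, 0 = aa."""
    present = set(affected) | set(unaffected)
    if present == {2}:
        return "I"
    if present == {0}:
        return "II"
    if present == {1}:
        return "III"
    if present == {2, 1}:
        return "IV"
    if present == {1, 0}:
        return "V"
    return "VI"  # both AA and aa occur among the sibs


def tds_score(affected, unaffected):
    """Score S(p1 - p2) for one sibship with at least one affected sib."""
    r = len(affected)
    s_p1 = Fraction(sum(affected), 2 * r)  # equals (j2 + j1/2) / r
    return s_p1 - S_P2[classify_group(affected, unaffected)]
```

For instance, `tds_score([2, 1], [2])` and `tds_score([2, 0], [1])` both return 0: in the first case the sibship is in group IV with j2 = j1, and in the second it is in group VI with j2 = j0.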

In some sense, some of the scoring of sibships, as given in Table 2, may seem counterintuitive. Consider a sibship of two affected and one unaffected. For groups I to III, the uniform scoring of 0 is straightforward, as all sibs (affected and unaffected) have the same genotype. Now, suppose the two affected sibs have genotypes AA and Aa. This sibship will be scored the same (0) if the unaffected sib has genotype AA or Aa. This is because, in either case, the sibship belongs to group IV, and the unaffected child does not change the possible mating types of the parents. On the other hand, if the unaffected sib is genotype aa, the sibship now belongs to group VI and gets a score of (j2 - j0) / 2r = +1/4 because the parental mating type is Aa × Aa. As another example, suppose the two affected sibs have genotypes AA and aa. Then the sibship will be scored 0 whatever the genotype of the unaffected sib (i.e., AA, Aa, or aa) because the sibship automatically belongs to group VI. A scoring routine based on the frequency of the A allele in the affected sibs versus the unaffected sib would score this family differently based on whether the unaffected sib was AA, Aa, or aa (e.g., -1/2 if the unaffected sib is AA, 0 if Aa, and +1/2 if aa). However, it is clear that in the creation of a TDT-type statistic (comparing offspring with parents' allele frequency), in this case the unaffected child provides no additional information.

Under the null hypothesis, E(p1 - p2) = 0. To calculate Var(p1 - p2), we note that p1 - p2 = [Σ Si(p1 - p2)] / n is the average of n independent, identically distributed scores, so that Var(p1 - p2) = (1/n) Var[S(p1 - p2)], where the subscript i has been suppressed. Because E[S(p1 - p2)] = 0, we simply calculate Var[S(p1 - p2)] = E{[S(p1 - p2)]²}. After some lengthy algebra, we obtain
$$\mathrm{Var}[S(p_1 - p_2)] = \frac{1}{16}\left(m_{(21)} + m_{(10)}\right)\left[\frac{1}{r} - \left(\tfrac{1}{2}\right)^{t-1}\right] + \frac{1}{8}\,m_{11}\left[\frac{1}{r} - \tfrac{1}{3}\left(\tfrac{3}{4}\right)^t - \left(\tfrac{1}{2}\right)^t - \left(\tfrac{1}{4}\right)^t\right]$$
By using logic similar to that used in the derivation of p2 and using the maximum likelihood estimates of the mij, we can estimate this variance by
$$\hat{V}[S(p_1 - p_2)] = \hat{\sigma}_0^2 = \frac{n_{\mathrm{IV}} + n_{\mathrm{V}}}{16n}\cdot\frac{\frac{1}{r} - \left(\tfrac{1}{2}\right)^{t-1}}{1 - \left(\tfrac{1}{2}\right)^{t-1}} + \frac{n_{\mathrm{VI}}}{8n}\cdot\frac{\frac{1}{r}\left[1 - \left(\tfrac{1}{2}\right)^t - \left(\tfrac{3}{4}\right)^t + \left(\tfrac{1}{4}\right)^t\right] + \left(\tfrac{3}{8}\right)^{t-1} - \tfrac{1}{3}\left(\tfrac{3}{4}\right)^t - \left(\tfrac{1}{2}\right)^t - \left(\tfrac{1}{4}\right)^t}{\left[1 - \left(\tfrac{1}{2}\right)^{t-1}\right]\left[1 + \left(\tfrac{1}{2}\right)^t - 2\left(\tfrac{3}{4}\right)^t\right]} \tag{6}$$

Thus, the TDS statistic, for the general case of r affected and s unaffected sibs, is given by
$$T_{DS} = \frac{\sum_i S_i(p_1 - p_2)}{\hat{\sigma}_0\sqrt{n}}$$
in which the scores are given in Table 2 and $\hat{\sigma}_0$ by the square root of formula 6. Under the null hypothesis, the TDS statistic is approximately normally distributed with mean 0 and variance 1.
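A minimal sketch of the general statistic follows, assuming the per-sibship scores S(p1 - p2) and the counts of sibships in groups IV, V, and VI have already been tabulated for a sample in which every sibship has the same r and s. The function names are hypothetical.

```python
from math import sqrt


def sigma0_sq_hat(n_iv, n_v, n_vi, n, r, s):
    """Null variance estimate of the score (formula 6) for n sibships,
    each with r affected and s unaffected sibs (t = r + s >= 2)."""
    t = r + s
    h, q, k = 0.5 ** t, 0.75 ** t, 0.25 ** t
    a = 1.0 / r - 0.5 ** (t - 1)
    b = 1.0 - 0.5 ** (t - 1)
    d = 1.0 + h - 2.0 * q
    x = (1.0 - h - q + k) / r + 0.375 ** (t - 1) - q / 3.0 - h - k
    return (n_iv + n_v) * a / (16.0 * n * b) + n_vi * x / (8.0 * n * b * d)


def tds_general(scores, n_iv, n_v, n_vi, r, s):
    """T_DS = sum of scores / (sigma0_hat * sqrt(n)), n = number of sibships."""
    n = len(scores)
    var_hat = sigma0_sq_hat(n_iv, n_v, n_vi, n, r, s)
    if var_hat <= 0:
        raise ValueError("variance estimate is not positive")
    return sum(scores) / (sqrt(var_hat) * sqrt(n))
```

For r = s = 1 the two ratios in formula 6 reduce to 1 and 2, so the estimate becomes (n_IV + n_V + 4 n_VI)/16n, which is the sib-pair quantity h/16 estimated from the discordant pairs.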

To calculate the power of this test, we need to determine $\nu = E[S(p_1 - p_2)]$, $E(\hat{\sigma}_0^2)$, and $\mathrm{Var}[S(p_1 - p_2)]$ under the alternative hypothesis. Then, using the formulas in Table 2, and after some tedious algebra, we obtain the following results:
$$\begin{aligned}
\nu = E[S(p_1 - p_2)] ={}& \tfrac{1}{4}\,m^{(r)}_{(21)}\left[c_{21} - c_{20} - \left(\tfrac{1}{2}\right)^s\left(c_{21}^r - c_{20}^r\right)\right] \\
& + \tfrac{1}{4}\,m^{(r)}_{(10)}\left[c_{01} - c_{00} - \left(\tfrac{1}{2}\right)^s\left(c_{01}^r - c_{00}^r\right)\right] \\
& + \tfrac{1}{4}\,m^{(r)}_{11}\left[2(c_{12} - c_{10}) - \left(\tfrac{3}{4}\right)^s(c_{12} + c_{11})^r + \left(\tfrac{3}{4}\right)^s(c_{11} + c_{10})^r - \left(\tfrac{1}{4}\right)^s c_{12}^r + \left(\tfrac{1}{4}\right)^s c_{10}^r\right]
\end{aligned} \tag{7}$$
$$\begin{aligned}
E(\hat{\sigma}_0^2) ={}& \frac{\frac{1}{r} - \left(\tfrac{1}{2}\right)^{t-1}}{16\left[1 - \left(\tfrac{1}{2}\right)^{t-1}\right]}\Big\{ m^{(r)}_{(21)}\left[1 - \left(\tfrac{1}{2}\right)^s\left(c_{21}^r + c_{20}^r\right)\right] + m^{(r)}_{(10)}\left[1 - \left(\tfrac{1}{2}\right)^s\left(c_{01}^r + c_{00}^r\right)\right] \\
& \quad + m^{(r)}_{11}\left[\left(\tfrac{3}{4}\right)^s(c_{12} + c_{11})^r + \left(\tfrac{3}{4}\right)^s(c_{11} + c_{10})^r - \left(\tfrac{1}{4}\right)^s c_{12}^r - \left(\tfrac{1}{4}\right)^s c_{10}^r - 2\left(\tfrac{1}{2}\right)^s c_{11}^r\right]\Big\} \\
& + m^{(r)}_{11}\Big\{\frac{1}{r}\left[1 - \left(\tfrac{1}{2}\right)^t - \left(\tfrac{3}{4}\right)^t + \left(\tfrac{1}{4}\right)^t\right] + \left(\tfrac{3}{8}\right)^{t-1} - \tfrac{1}{3}\left(\tfrac{3}{4}\right)^t - \left(\tfrac{1}{2}\right)^t - \left(\tfrac{1}{4}\right)^t\Big\} \\
& \quad \times \frac{1 - \left(\tfrac{3}{4}\right)^s(c_{11} + c_{10})^r - \left(\tfrac{3}{4}\right)^s(c_{12} + c_{11})^r + \left(\tfrac{1}{2}\right)^s c_{11}^r}{8\left[1 - \left(\tfrac{1}{2}\right)^{t-1}\right]\left[1 + \left(\tfrac{1}{2}\right)^t - 2\left(\tfrac{3}{4}\right)^t\right]}
\end{aligned} \tag{8}$$
and
$$\begin{aligned}
\sigma_a^2 = \mathrm{Var}[S(p_1 - p_2)] ={}& \frac{1}{16r}\,m^{(r)}_{(21)}\left[r - 4(r - 1)c_{21}c_{20} - r\left(\tfrac{1}{2}\right)^s\left(c_{21}^r + c_{20}^r\right)\right] \\
& + \frac{1}{16r}\,m^{(r)}_{(10)}\left[r - 4(r - 1)c_{01}c_{00} - r\left(\tfrac{1}{2}\right)^s\left(c_{01}^r + c_{00}^r\right)\right] \\
& + \frac{1}{16r}\,m^{(r)}_{11}\Big\{\left(\tfrac{3}{4}\right)^s\big[r(c_{12} + c_{11})^r - 4(r - 1)c_{12}c_{11}(c_{12} + c_{11})^{r-2} \\
& \quad + r(c_{11} + c_{10})^r - 4(r - 1)c_{11}c_{10}(c_{11} + c_{10})^{r-2} \\
& \quad - 4(2f_1 + r)c_{10}^2(c_{11} + c_{10})^{r-2} - 4(rf_2 + 2f_1)c_{12}c_{10}(c_{12} + c_{11})^{r-2}\big] \\
& \quad - r\left(\tfrac{1}{2}\right)^{s-1}c_{11}^r - r\left(\tfrac{1}{4}\right)^s\left(c_{12}^r + c_{10}^r\right) + 4(r - 1)(c_{12} - c_{10})^2 + 4(c_{12} + c_{10})\Big\}
\end{aligned} \tag{9}$$
The power can then be calculated using formula 3, substituting formulas 7, 8, and 9 for $\nu$, $E(\sigma_0^2)$, and $\sigma_a^2$, respectively.
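Formula 7 is the easiest of the three to evaluate by hand, and it doubles as a check on the notation. The sketch below is an illustration only: the function names are ours, and the conditional mating-type frequencies m(r)(21), m(r)(10), and m(r)11 are supplied as plain arguments rather than derived from a genetic model.

```python
def penetrance_ratios(f1, f2):
    """The c coefficients defined from the penetrance ratios f1 and f2."""
    tot = f2 + 2.0 * f1 + 1.0
    return {
        "c21": f2 / (f2 + f1), "c20": f1 / (f2 + f1),
        "c01": f1 / (f1 + 1.0), "c00": 1.0 / (f1 + 1.0),
        "c12": f2 / tot, "c11": 2.0 * f1 / tot, "c10": 1.0 / tot,
    }


def nu_general(m21, m10, m11, f1, f2, r, s):
    """Expected score E[S(p1 - p2)] under the alternative (formula 7)."""
    c = penetrance_ratios(f1, f2)
    half_s, threeq_s, quarter_s = 0.5 ** s, 0.75 ** s, 0.25 ** s
    term21 = c["c21"] - c["c20"] - half_s * (c["c21"] ** r - c["c20"] ** r)
    term10 = c["c01"] - c["c00"] - half_s * (c["c01"] ** r - c["c00"] ** r)
    term11 = (2.0 * (c["c12"] - c["c10"])
              - threeq_s * (c["c12"] + c["c11"]) ** r
              + threeq_s * (c["c11"] + c["c10"]) ** r
              - quarter_s * c["c12"] ** r
              + quarter_s * c["c10"] ** r)
    return 0.25 * (m21 * term21 + m10 * term10 + m11 * term11)
```

Setting f1 = f2 = 1 makes every bracket vanish, so the expected score is 0 under the null hypothesis whatever the mating-type frequencies, as it should be.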

Numerical Results: Individual Genotyping vs. Pooling

Using the power formulas described above, we can calculate required sample sizes to detect linkage disequilibrium. The logic is the same as described in Risch and Teng (1998) for sample pooling; again, we use a significance level of 5 × 10⁻⁸ and 80% power. The required sample sizes are given in Table 3. Using the TDS test for sibships without parents with individual genotyping can produce a significant advantage over the pooled statistic (THS), depending on the family structure (compare with Table 4 in Risch and Teng 1998). For families with one affected sib, the sample sizes are roughly comparable, with low allele frequencies slightly favoring the TDS statistic but high allele frequencies slightly favoring the THS statistic. As the number of affected sibs increases, however, the advantage of the TDS statistic increases. For two affected sibs, on average (across genetic models), 25% fewer families are required; for three affected sibs, 35% fewer are needed, whereas for four affected sibs, nearly half as many families are necessary using individual genotyping and the TDS statistic. As in the case for one affected child, the ratios are highest at low allele frequencies. The only exception is the high frequency dominant situation, in which the THS test may retain a slight advantage. We note also that these conclusions are reasonably independent of the number of unaffected sibs used.
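Inverting the power formula for n gives the required number of families. The sketch below assumes a one-sided test and the usual normal-approximation inversion; neither the sidedness nor the function name `required_families` comes from the text, and the inputs ν, E(σ₀²), and σ_a² must be computed separately for the genetic model of interest.

```python
from math import ceil, sqrt
from statistics import NormalDist


def required_families(nu, e_sigma0_sq, sigma_a_sq, alpha=5e-8, power=0.80):
    """Smallest n with normal-approximation power >= `power` at level alpha.
    Assumes a one-sided test (an assumption of this sketch) and nu > 0."""
    if nu <= 0:
        raise ValueError("nu must be positive under the alternative")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # critical value
    z_beta = NormalDist().inv_cdf(power)         # power quantile
    root_n = (z_alpha * sqrt(e_sigma0_sq) + z_beta * sqrt(sigma_a_sq)) / nu
    return ceil(root_n ** 2)
```

Because n scales with 1/ν², halving the expected score roughly quadruples the number of families needed, which is why the comparisons between statistics in Table 3 are expressed as ratios of sample sizes.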

                              
Table 2.   Probabilities of Different Outcomes for r Affected and s Unaffected Sibs and Scores for the TDS Statistic

                              
Table 3.   Number of Sibships Without Parents Required to Detect LD Using Individual Genotyping

From Table 2 and Table 3 of Risch and Teng (1998), we can also contrast the number of families required under individual genotyping when both parents are available versus using two unaffected sibs when they are not (giving an identical number of family members). Using two unaffected sibs requires ~50% more families, roughly independent of the number of affected sibs and genetic model. This number can be substantially higher, however, for a very common dominant allele.

Combining Families of Different Structure

As described previously in Risch and Teng (1998), it is typical that an investigator will have families of different structure, including different numbers of affected sibs and possibly unaffected sibs. As in the case for pooled samples, we suggest taking a weighted sum of allele frequency differences (p1 - p2) for the various family structures, in which the weight is according to the number affected in the family and the number of families of that structure. Thus, for families with r affected sibs, we multiply (p1 - p2) by $r n_{ri}$ before summing, in which $n_{ri}$ is the number of families with r affected of structure i, and then divide the total by $N = \sum r n_{ri}$. To obtain the denominator, we simply sum $r^2 n_{ri}^2 \mathrm{Var}(p_1 - p_2)$, in which the variance of p1 - p2 for a given family structure under the null hypothesis is given in the formulas above, divide by $N^2$, and then take the square root.
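The weighting scheme can be sketched as follows, reading N as the sum of r times the number of families over all structures. The tuple layout and the function name are assumptions made for illustration; each family structure contributes its observed difference p1 - p2 and the null variance of that difference.

```python
from math import sqrt


def combined_statistic(strata):
    """Combine family structures into one standardized statistic.
    strata: iterable of (r, n_fam, diff, var_diff), where r is the number of
    affected sibs, n_fam the number of families of that structure, diff the
    observed p1 - p2, and var_diff its null variance (hypothetical layout)."""
    strata = list(strata)
    big_n = sum(r * n_fam for r, n_fam, _, _ in strata)
    numerator = sum(r * n_fam * diff for r, n_fam, diff, _ in strata) / big_n
    denominator = sqrt(
        sum((r * n_fam) ** 2 * var for r, n_fam, _, var in strata)
    ) / big_n
    return numerator / denominator
```

With a single family structure the weights cancel and the function returns diff divided by the square root of var_diff, so the combined statistic reduces to the single-structure test.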

    DISCUSSION

We have considered test statistics that can be created when individual genotyping is performed in nuclear families containing affected and unaffected sibs without parents. We have shown previously that to calculate the TDT for families with parents, individual genotyping is only required for the parents, to obtain a direct estimate of h. The child allele frequencies can still be obtained by DNA pooling, which could lead to a significant reduction in genotyping effort, especially for larger sibships.

Because it is possible to estimate the variance in the allele frequency difference between the affected and unaffected sibs without the Hardy-Weinberg assumption in families without parents, estimators that are immune to population stratification artifacts can be constructed. The statistic we have described, the TDS test, is analogous to the TDT because it contrasts allele frequencies between parents and affected offspring, as in the TDT, and uses a variance estimate independent of the Hardy-Weinberg assumption. In this case, the parent allele frequencies are estimated from the total offspring sibship, including both the affected and unaffected offspring.

When the tested sibship contains only a single affected, the power of the TDS statistic is quite close to the pooled THS statistic, so the primary advantage of the TDS statistic is its robustness. However, as the number of affected in the sibship increases, the power of the TDS test increases relative to the THS test, providing an additional advantage. We also note that the TDS statistic is easily calculated using the scores given in Table 2 and its variance by formula 6 above.

When families with multiple affected sibs are used, neither the pooled statistic THS described in Risch and Teng (1998) nor the TDS test described here compares favorably in terms of power with tests that use unrelated controls instead of unaffected sibs. Thus, strategies involving both family-based and unrelated controls may be preferable.

It may be tempting to use the same group of affecteds in a two-stage process: first comparing them to unrelated controls to increase power to identify candidate loci, and then comparing these same affected individuals to family-based controls (parents or unaffected sibs) for robustness. However, in this approach, the two tests will be positively correlated under the null hypothesis, and so the threshold for significance for the second test needs to be constructed taking this correlation into account.

Other tests of linkage disequilibrium based on sibships without parents and individual genotyping have been proposed. Penrose first suggested the use of unaffected sibs as controls in association studies to protect against artifactual results owing to population stratification (Clarke et al. 1956). The method of C.A.B. Smith (Smith 1961), as also described in Clarke et al. (1956), is essentially based on a comparison of genotypes in affected children with their unaffected sibs. The proposal of Curtis (1997) is similar in this regard. Since our paper was submitted, two additional papers (Boehnke and Langefeld 1998; Spielman and Ewens 1998) have appeared describing sibship-based statistics. These tests are also based on allele (or genotype) frequency difference between affected and unaffected sibs, similar to the original Smith test. For sibships with one affected and one unaffected sib, all of these tests (including ours) are equivalent. However, for larger sibships the tests diverge.

We have chosen to focus on a TDT-like statistic, estimating parental allele and heterozygosity frequency, as this approach yields a more efficient test for sibships with multiple affecteds. However, a critical assumption underlying this advantage is that unaffected sibs reflect a random distribution of parental alleles. This will certainly be nearly true whenever the "locus-specific" penetrance for the tested locus is low and the unaffected sibs are selected randomly. However, this statistic would not necessarily be more efficient than a statistic based on comparison of allele frequencies in affected versus unaffected sibs, when the locus-specific penetrance is high or when the unaffected sibs are chosen from the opposite extreme of a continuous distribution from which the affecteds are chosen (e.g., lean sibs of obese sib pairs) (Eaves and Meyer 1994; Risch and Zhang 1995). In this case, the allele frequency in unaffected sibs is also expected to deviate from the parental allele frequency. The relative efficiency of the two types of tests, in this case, will depend on the degree to which the allele frequency in affected sibs is expected to deviate from that in unaffected sibs relative to that in the parents, and on the number of unaffected sibs.

At first glance, it may seem mysterious as to why the TDS statistic has increased efficiency over other sibship-based statistics that compare affected and unaffected sibs. These latter statistics are based solely on comparisons of genotypes within sibships. However, there is additional information available in the sample that our statistic incorporates, namely, the relative frequency of the different sibship genotype constellations (ignoring affection status in the sibship). For example, for sibships of size 3, we also use the frequency of sibships with three AA sibs, two AA and one Aa sib, two AA and one aa sib, and so on (for all possible genotype combinations). This distribution of sibship genotypes provides information regarding the frequency of the six possible parent mating types. Because the mating-type frequencies are estimated without assuming random mating, the estimation procedure is robust to any deviation from random mating including population stratification. For example, in the extreme stratification case in which half the sibships have three AA sibs and the other half three aa sibs, our procedure estimates half the parent mating types to be AA × AA and the other half to be aa × aa, a complete deviation from random mating and Hardy-Weinberg genotype frequencies.
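The link between parent mating types and sibship genotype constellations can be made concrete with a short Python sketch. It only enumerates the forward Mendelian model (the probability of each unordered sibship constellation given a mating type); it is not the estimation procedure used for the TDS statistic, and all function and variable names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations_with_replacement, product

GENOTYPES = ("AA", "Aa", "aa")

# Six unordered parent mating types and ten constellations for sibships of size 3.
MATING_TYPES = list(combinations_with_replacement(GENOTYPES, 2))
CONSTELLATIONS_3 = list(combinations_with_replacement(GENOTYPES, 3))


def offspring_dist(g1, g2):
    """Mendelian genotype probabilities for one child of parents g1 x g2."""
    dist = Counter()
    for a1, a2 in product(g1, g2):
        # Sorting puts 'A' before 'a', so heterozygotes are always written "Aa".
        dist["".join(sorted((a1, a2)))] += Fraction(1, 4)
    return dist


def constellation_dist(g1, g2, size):
    """Probability of each unordered sibship genotype constellation."""
    child = offspring_dist(g1, g2)
    dist = Counter()
    for sibs in product(child, repeat=size):
        prob = Fraction(1)
        for g in sibs:
            prob *= child[g]
        # Ignore birth order (and affection status) within the sibship.
        dist[tuple(sorted(sibs))] += prob
    return dist


# Probability that all three sibs are AA under three different mating types.
all_AA = ("AA", "AA", "AA")
p_hom = constellation_dist("AA", "AA", 3)[all_AA]
p_mixed = constellation_dist("AA", "Aa", 3)[all_AA]
p_het = constellation_dist("Aa", "Aa", 3)[all_AA]
```

Because each mating type produces a different distribution over constellations, the observed constellation frequencies carry information about the mating-type frequencies without any random-mating assumption.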

The analogy of the TDS statistic to the TDT statistic may also seem mysterious if the latter is viewed as a statistic derivable only from intact nuclear families. As we showed in Risch and Teng (1998), however, the TDT is calculated from three components: (1) the frequency of allele A in the offspring (p1); (2) the frequency of allele A in the parents (p2); and (3) the frequency of heterozygous parents (h). It is entirely unnecessary to have intact families to derive these statistics. For example, p1 and p2 can be obtained, in theory, by DNA pooling, whereby all children are pooled together and all parents are pooled together. Even if parent DNA samples are separated from their offspring's, a TDT can still be calculated. All that is required is knowing that a sample is from a child or a parent. Thus, it is obviously unnecessary to know which child genotypes are associated with which parent genotypes to construct a TDT.
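The claim that the TDT depends only on p1, p2, and h can be checked numerically. The sketch below assumes n families with two genotyped parents and one child each; under that assumption, counting alleles gives b - c = 4n(p1 - p2) and b + c = 2nh, so the usual (b - c)^2/(b + c) equals 8n(p1 - p2)^2/h. This identity is worked out here for illustration and is not quoted from Risch and Teng (1998); the function names and the toy data are hypothetical.

```python
def summarize_trios(trios):
    """Summarize trios given as (father, mother, child) counts of allele A (0, 1, 2)."""
    n = len(trios)
    b = c = 0                      # A / a transmissions from heterozygous parents
    child_A = parent_A = het_parents = 0
    for father, mother, child in trios:
        parents = (father, mother)
        k = sum(1 for g in parents if g == 1)         # heterozygous parents
        from_hom = sum(1 for g in parents if g == 2)  # A alleles from AA parents
        t = child - from_hom       # A alleles transmitted by heterozygous parents
        b += t
        c += k - t
        child_A += child
        parent_A += father + mother
        het_parents += k
    p1 = child_A / (2 * n)         # frequency of A in offspring
    p2 = parent_A / (4 * n)        # frequency of A in parents
    h = het_parents / (2 * n)      # frequency of heterozygous parents
    return b, c, p1, p2, h, n


def tdt_from_transmissions(b, c):
    return (b - c) ** 2 / (b + c)


def tdt_from_frequencies(p1, p2, h, n):
    return 8 * n * (p1 - p2) ** 2 / h


# Toy, Mendelian-consistent trios (hypothetical data).
trios = [(1, 1, 2), (1, 0, 1), (1, 2, 2), (1, 1, 0), (2, 0, 1), (1, 0, 0)]
b, c, p1, p2, h, n = summarize_trios(trios)
chi2_family = tdt_from_transmissions(b, c)
chi2_pooled = tdt_from_frequencies(p1, p2, h, n)
```

The two values agree even though `tdt_from_frequencies` never sees which child belongs to which parents.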

In the TDS statistic, we are effectively recreating a TDT-type statistic. In this case, however, parental allele frequencies and heterozygosity are not estimated directly from the parents, who are missing, but from the offspring. That this can be done without bias derives from the fact that there are at least as many different possible sibship genotype constellations as parent mating types.

    ACKNOWLEDGMENTS

This work was supported, in part, by grants from the National Human Genome Research Institute (HG00348) and the Nancy Pritzker Foundation. We are grateful to Dr. Michael Boehnke for many helpful comments and suggestions on this manuscript and to Drs. David Curtis and Cedric Clarke for pointing out the Clarke et al. reference.

The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.

    FOOTNOTES

4 Corresponding author.

EMAIL risch{at}lahmed.stanford.edu; FAX (650) 725-1534.

Received January 7, 1998; accepted in revised form January 20, 1999.

    REFERENCES

  • Boehnke, M. and C.D. Langefeld. 1998. Genetic association mapping based on discordant sib pairs: The discordant-alleles test. Am. J. Hum. Genet. 62: 950-961.
  • Clarke, C.A., J. Wyn Edwards, D.R.W. Haddock, A.W. Howel-Evans, R.B. McConnell, and P.M. Sheppard. 1956. ABO blood groups and secretor character in duodenal ulcer. Br. Med. J. 2: 725-731.
  • Curtis, D. 1997. Use of siblings as controls in case-control association studies. Ann. Hum. Genet. 61: 319-333.
  • Eaves, L. and J. Meyer. 1994. Locating human quantitative trait loci: Guidelines for the selection of sibling pairs for genotyping. Behav. Genet. 24: 443-455.
  • Falk, C.T. and P. Rubinstein. 1987. Haplotype relative risks: An easy reliable way to construct a proper control sample for risk calculations. Ann. Hum. Genet. 51: 227-233.
  • Risch, N. and H. Zhang. 1995. Extreme discordant sib pairs for mapping quantitative trait loci in humans. Science 268: 1584-1589.
  • Risch, N. and J. Teng. 1998. The relative power of family-based and case-control designs for linkage disequilibrium studies of complex human diseases. I. DNA pooling. Genome Res. 8: 1273-1288.
  • Smith, C.A.B. 1961. Statistical methods and theory. In Recent advances in human genetics (ed. L.S. Penrose), pp. 148-149. J.&A. Churchill, Ltd., London, UK.
  • Spielman, R.S., R.E. McGinnis, and W.J. Ewens. 1993. Transmission test for linkage disequilibrium: The insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am. J. Hum. Genet. 52: 506-516.
  • Spielman, R.S. and W.J. Ewens. 1998. A sibship based test for linkage in the presence of association: The sib transmission/disequilibrium test. Am. J. Hum. Genet. 62: 450-458.
  • Terwilliger, J.D. and J. Ott. 1992. A haplotype-based "haplotype-relative risk" approach to detecting allelic associations. Hum. Hered. 42: 337-346.

Received November 9, 1998; accepted in revised form January 20, 1999.


9:234-241 ©1999 by Cold Spring Harbor Laboratory Press  ISSN 1088-9051/99 $5.00
