Vol. 9, Issue 10, 936-949, October 1999
LETTER
ABSTRACT
A number of chronic diseases, including cardiovascular disease, appear to have a multifactorial genetic risk component. Consequently, techniques are needed to facilitate evaluation of complex genetic risk factors in large cohorts. We have designed a prototype assay for genotyping a panel of 35 biallelic sites that represent variation within 15 genes from biochemical pathways implicated in the development and progression of cardiovascular disease. Each DNA sample is amplified using two multiplex polymerase chain reactions, and the alleles are genotyped simultaneously using an array of immobilized, sequence-specific oligonucleotide probes. This multilocus assay was applied to two types of cohorts. Population frequencies for the markers were estimated using 496 unrelated individuals from a family-based cohort, and the observed values were consistent with previous reports. Linkage disequilibrium between consecutive pairs of markers within the apoCIII, LPL, and ELAM genes was also estimated. A preliminary analysis of single and pairwise locus associations with severity of atherosclerosis was performed using a composite cohort of 142 individuals for whom quantitative angiography data were available; evaluation of the potentially interesting associations observed will require analysis of an independent and larger cohort. This assay format provides a research tool for studies of multilocus genetic risk factors in large cardiovascular disease cohorts, and for the subsequent development of diagnostic tests.
INTRODUCTION
Multiple genetic and environmental risk factors appear to contribute to common diseases such as cardiovascular disease and cancer. Although specific genetic causes have been identified among certain families with a history of disease, the association of these genes with disease in the general population is not fully understood. One reason is that genetic predisposition to such diseases can result from the cumulative effect of common allelic variants, variants that individually confer only a modest increased risk. Thus, one challenge is to identify the common multilocus profiles that confer a high risk for disease; a second challenge is to understand how environmental factors modulate expression of a genetic predisposition to disease.
A growing number of genetic variants have been implicated in the development of complex diseases. As these candidate genes are identified, there is an increasing need for assays capable of simultaneously genotyping multiple loci. Studies focused on single markers can be used to assign relative risk values, but this approach provides only a limited context for evaluating genetic risk factors. Studies encompassing multiple markers provide a broader context that is critical to assess information on candidate markers for multifactorial diseases, and multilocus assays can greatly facilitate the necessary genotyping process. Multilocus results can provide insight into mechanisms of disease susceptibility and identify key subsets of predictive markers that are clinically informative. These informative genetic markers can then be used to supplement routine biochemical assays for patient care, for example, in lieu of protein activity or concentration measurements that are difficult to make or show significant intra- and interindividual variability independent of disease state.
In developing a prototype multilocus genotyping assay, we focused on cardiovascular disease (CVD), a leading cause of death worldwide. Monogenic disorders, such as familial hypercholesterolemia and hypertrophic cardiomyopathy (for reviews, see Hobbs et al. 1992; Day et al. 1997; Bonne et al. 1998), have been identified among some families. Established risk factors for disease in the general population include age, gender, diabetes mellitus, obesity, high serum cholesterol levels, and hypertension, as well as cigarette smoking and physical inactivity (Pasternak et al. 1996). These factors, however, do not explain all premature CVD cases (Hoeg 1997). A number of these established factors have genetic components, and as yet unknown risk factors may be primarily genetic. In addition, recent evidence indicates that genetic factors influence patient responsiveness to therapeutic intervention, both dietary (Humphries et al. 1996) and pharmaceutical (Kuivenhoven et al. 1998).
The "CVD35" assay described here comprises 35 biallelic sites within 15 genes representing pathways implicated in the development and progression of atherosclerotic plaques: lipid metabolism, homocysteine metabolism, blood pressure regulation, thrombosis, and leukocyte adhesion (Table 1). The panel includes well-known polymorphisms in the apolipoprotein E (apoE; for review, see Mahley 1988) and angiotensinogen (AGT; Jeunemaitre et al. 1992) genes, mutations such as apoB Gln-3500 (Soria et al. 1989) and factor V Leiden (Bertina et al. 1994), and more recently identified sequence variations in the methylene tetrahydrofolate reductase (MTHFR; Frosst et al. 1995; Goyette et al. 1995) and E-selectin (ELAM; Wenzel et al. 1996) genes. The CVD35 assay uses pooled polymerase chain reaction (PCR; Mullis and Faloona 1987; Saiki et al. 1988) primer pairs to coamplify 27 targets from genomic DNA in two reactions. Amplified fragments within each PCR product pool are then detected colorimetrically with sequence-specific oligonucleotide probes immobilized in a linear array on nylon membranes (Saiki et al. 1989). Probe sequences have been optimized carefully to permit genotyping of all sites under a single assay condition. Therefore, large cohorts can be typed rapidly at all 35 sites, providing an extended database for evaluating the disease association of these markers, and a multilocus context for evaluating new candidate markers. We have applied this multilocus assay to a population-based cohort to estimate allele frequencies and intragenic haplotypes, and to a lipid clinic-based cohort to model a case-control study.
RESULTS
CVD35 Assay
A three-primer, two-probe system for apoE was used as the basis for the CVD35 assay; the specificity of this system has been described previously (Cheng et al. 1998). No Arg-112/Cys-158 alleles were detected among any of the 1400 control or cohort samples genotyped, consistent with previous studies of the apoE gene (Houlston et al. 1989). In future versions of this assay, the allele-specific primers for codon 112 will be replaced by probes to simplify genotyping of the apoE marker (data not shown).
The PCR products range from 95 to 535 bp in size. As shown in Figure 1, nearly all of the PCR products in each of the final multiplexes (14 in multiplex A, 13 in multiplex B) could be clearly distinguished by gel electrophoresis. Although the largest product bands appeared relatively weak in fluorescence intensity, these yields were sufficient for detection by the immobilized probes.
Figure 2 illustrates detection of the amplified alleles, as well as the specificity of the probe panel; for example, the results distinguish each of the three possible genotypes at LPL(-93) (Fig. 2A: strips three to five), PON192 (Fig. 2A: strips one to three), AGT235 (Fig. 2B: strips one to three), and factor V 506 (Fig. 2B: strips one, two, and four). Figure 2C shows the third strip (strip B2) for two different individuals; of the ~720 unrelated individuals genotyped from all sources, only one variant allele was detected among these four candidate markers (data not shown).
Samples representing a subset of 303 families from the Stanislas cohort (Siest et al. 1998) were used to assess the performance of the CVD35 assay in a large-scale genotyping effort. Eight families were excluded from the haplotype analysis because of inconsistencies of genotypes between parents and offspring, although the unrelated parents were included in the analysis of allele frequencies. In addition, four samples were omitted as a result of weak second allele signals for several markers. Because the assay was designed to yield comparable signals for both alleles in heterozygotes, these weak signals may have resulted from sample contamination. When available, all questionable samples will be retyped using new DNA preparations. A total of 1190 samples were used for subsequent analyses of the allele frequencies for all markers and haplotypes within the apoCIII, ELAM, and LPL genes.
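The parent-offspring consistency check that led to the exclusion of eight families can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the function name `mendelian_consistent` and the example genotypes are hypothetical, and the check assumes a single biallelic site typed in a father-mother-child trio.

```python
from itertools import product


def mendelian_consistent(father, mother, child):
    """Return True if the child's genotype can be formed from one allele
    of each parent. Genotypes are 2-tuples of allele labels; allele order
    within a genotype is ignored."""
    possible = {tuple(sorted(pair)) for pair in product(father, mother)}
    return tuple(sorted(child)) in possible


# Hypothetical genotypes at one site (not data from the study).
trio_ok = mendelian_consistent(("A", "A"), ("A", "G"), ("A", "G"))
trio_bad = mendelian_consistent(("A", "A"), ("A", "A"), ("A", "G"))
print(trio_ok, trio_bad)
```

A family would be flagged if any typed site returned False for any child; how the study handled such families beyond exclusion from the haplotype analysis is not described here.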
Genotyping data for apoE, ACE, and LPL447, which had been obtained previously through independent methods, were used to evaluate the accuracy of the CVD35 assay for these three markers. No discordant results were noted for LPL447; those few samples yielding discordant genotype results for apoE and ACE were investigated further. For apoE, detection of the e4 allele by the CVD35 assay was problematic if insufficient template DNA was used for amplification; this difficulty should be corrected in future versions of the assay that no longer rely on codon 112-specific primers (data not shown). Three initially discordant results were traced to the specific aliquots tested.
For the ACE I/D marker, one D allele and two I alleles that had been identified by capillary electrophoresis were not detected by the CVD35 assay. This particular ACE-D allele was detectable if an alternative primer pair was used, suggesting the presence of a novel sequence variation within the default priming sites. One of the undetected ACE-I alleles was also amplifiable with an alternative upstream primer, although with poor efficiency, and gel electrophoresis revealed a truncated insertion of ~50 bp in size (data not shown); this ACE-I allele had been detected previously based on the 3' insertion junction sequence. The second undetected ACE-I allele was identified correctly after single-target amplification with the original primer pair, or using a multiplex amplification with an alternative ACE-I-specific primer spanning the 3' insertion junction, as in Evans et al. (1994; data not shown). Failure of the CVD35 assay to amplify the full-length Alu element from a wild-type sequence in a multiplex reaction appeared to be unique to this sample. With the 3' junction-specific primer, the ACE-I target was reduced from 533 to 155 bp; this primer was used to confirm all D/D genotypes within the UCSF cohort (data not shown). Use of the ACE-I-specific primer spanning the 3' insertion junction in future assays should address these few difficulties.
Allele Frequencies
No variant alleles were observed within either cohort for 6 of the 35 sites: LPL nucleotide (-39); CETP codon 442; CBS codons 125, 131, and 307; and MTHFR nucleotide 692. Within the Stanislas cohort, a single CBS Val-114 allele and no apoB codon Gln-3500 alleles were detected. The genotypes of 496 unrelated parents from this group were used to estimate population frequencies for each of the remaining 28 markers; these data (Table 2) extend the previous report based on 455 individuals (Cheng et al. 1998). The observed frequencies are consistent with previous reports for caucasian populations (as cited in Table 2). Four markers (CBS Ile/Thr-278, CBS del/ins, ELAM 98, and ELAM 128) appeared to differ (χ2 > 3.84) from expectations based on Hardy-Weinberg equilibrium, and complete concordance in genotypes between the two CBS markers and between the two ELAM markers was also noted. This may be type I error, given the number of loci tested; in general, the observed allele frequencies did predict the observed genotype frequencies.
Within the UCSF cohort of 142 individuals, no variant CBS Val-114 alleles were observed. Four carriers of the apoB Gln-3500 mutation were noted within this clinic-based cohort, in contrast to the population-based Stanislas cohort. One carrier of the CBS Thr-278 allele without the 68-bp insertion was also detected; this genotype was confirmed by sequencing (data not shown). One sample initially yielded a null result for the ACE target, but subsequently was assigned the I/I genotype with use of the insertion-junction-specific primer. All ACE D/D genotypes were confirmed by reanalysis with the junction-specific primer (data not shown). Allele frequencies are given in Table 2. One marker (ELAM554) appeared to differ (χ2 = 4.14) from expectations based on Hardy-Weinberg equilibrium, but given the small sample size and number of loci, this may reflect type I error; in general, the observed allele frequencies predicted the observed genotype frequencies.
Linkage Disequilibrium
Using 1100 chromosomes from 275 families, three loci were examined for haplotypes defined by multiple sites typed within each gene: ELAM (three sites), LPL (four sites), and apoCIII (six sites). Linkage disequilibrium estimates between proximal marker sites are presented in Figure 3; the haplotype data will be explored in greater detail elsewhere (W. Klitz et al., in prep.). Allele frequencies among this subset of the Stanislas cohort were comparable to those listed in Table 2.
Maximum disequilibrium (D' = 1.0) was observed among the three sites genotyped within the ELAM gene. The lower statistical significance associated with codon 554 was due to the low frequency of the Phe-554 allele. Codon Phe-554 was observed only on chromosomes carrying the G98 and Ser-128 alleles. Although complete concordance in genotypes between the ELAM 98 and 128 sites was noted in this study, chromosomes bearing only one of the two variants have been reported (Wenzel et al. 1996).
Maximum linkage disequilibrium was also observed between the LPL promoter site (-93) and codon 9. Three chromosomes were observed with the (-93)G (less frequent allele) and Asp-9 (more frequent allele) haplotype; all others were concordant in genotype between the two sites. Codons 291 and 447 appeared to be in modest linkage disequilibrium, although statistical significance was not achieved due to the low frequency of the Ser-291 allele (0.016). The strong disequilibrium between promoter site (-93) and codon 9 among caucasians has been reported previously (Hall et al. 1997). The suggestion of greater recombination rates upstream of codon 291 and between codons 291 and 447 (exons 6 and 9), as inferred from the linkage disequilibrium results, is consistent with recently reported data (Clark et al. 1998; Nickerson et al. 1998).
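The reason a few discordant chromosomes are still compatible with maximum disequilibrium can be seen from the definition of D'. The sketch below assumes Lewontin's normalized measure computed from phased two-site haplotype counts; the method used to produce Figure 3 is not specified in this section, and the function name and counts are hypothetical. When one of the four possible haplotypes is absent, |D'| equals 1 even if the two sites are not fully concordant.

```python
def lewontin_d_prime(n11, n12, n21, n22):
    """Return (D, D') for two biallelic sites from haplotype counts.

    n11: allele 1 at both sites      n12: allele 1 at site A, 2 at site B
    n21: allele 2 at site A, 1 at B  n22: allele 2 at both sites
    """
    total = n11 + n12 + n21 + n22
    p_a1 = (n11 + n12) / total
    p_b1 = (n11 + n21) / total
    d = n11 / total - p_a1 * p_b1
    if d >= 0:
        d_max = min(p_a1 * (1 - p_b1), (1 - p_a1) * p_b1)
    else:
        d_max = min(p_a1 * p_b1, (1 - p_a1) * (1 - p_b1))
    if d_max == 0:
        raise ValueError("D' is undefined when a site is monomorphic")
    return d, d / d_max


# Hypothetical counts: haplotype n12 is absent, n21 is rare but present.
d, d_prime = lewontin_d_prime(90, 0, 3, 7)
print(f"D = {d:.3f}, D' = {d_prime:.2f}")
```

Because D' is scaled by the largest value D could take given the allele frequencies, it reaches 1 whenever only three of the four haplotypes are observed, which matches the pattern described for the LPL promoter site and codon 9.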
For the apoCIII gene, linkage disequilibrium was greatest within the promoter and exon 4 (markers 3175, 3206), regions with marker sites separated by fewer than 150 bp. Within the promoter, interestingly, linkage disequilibrium was not maximal between the (-482) and (-455) sites, separated by only 27 bp. Disequilibrium was low between the promoter region and site 1100, which are separated by 1.5 kb, yet strong disequilibrium was observed between sites 1100 and 3175 (exons 3 and 4), a 2-kb separation. The relatively infrequent 3175-G variant did appear to be in strong disequilibrium with the promoter variants (data not shown), as had been reported previously (Dammerman et al. 1993). Strong linkage disequilibrium between sites 1100 and 3206 has also been previously noted (Xu et al. 1994).
Although the likelihood of detecting association with disease is expected to be greatest with allelic variants of demonstrated functional significance, this information may not be readily available. In the absence of clear evidence as to the most functionally significant variations within a single gene, linkage disequilibrium data assist in determining the most informative sites to genotype for disease association studies. Redundant sites may then be replaced by new candidate markers. Linkage disequilibrium data may also lead to hypotheses tracing the evolution of haplotypes that may be associated with disease.
Disease Association
The variant allele frequencies observed within the lipid clinic-based UCSF cohort are listed alongside those of the population-based Stanislas cohort in Table 2. Although the Stanislas and UCSF cohorts were not specifically matched for population substructure, higher frequencies of apoE e2 and e4, apoB Gln-3500, and apoCIII 1100T and 3206G alleles (P < 0.05) were observed within the clinic-based UCSF cohort, consistent with previous reports associating these variants with elevated lipid levels; none of the nonlipid-related markers showed a nominally significant difference in frequencies between the two cohorts. The overall trend of increased frequencies for reportedly disease-associated alleles that was observed in the UCSF cohort might be expected among these individuals who are at higher risk for coronary events than the general population.
The UCSF cohort comprised 142 unrelated caucasians for whom angiograms had been quantitated and scored by the Gensini method (Gensini 1975). These scores were used to subdivide the cohort into quintiles that represented differing severities of coronary arterial occlusion. No significant deviations from Hardy-Weinberg equilibrium were noted within these quintiles (data not shown). The Gensini-based quintiles did not show significant correlation with total, low-density lipoprotein (LDL), or high-density lipoprotein (HDL) cholesterol levels, although there was an unexpected, suggestive trend toward lower average very low-density lipoprotein (VLDL)-triglyceride (TG) and VLDL-cholesterol levels with increasing Gensini score (data not shown).
Disease association with the allelic variants of 15 markers was explored among female-only (FQ1 vs. FQ5) and the combined gender (Q1 vs. Q5) quintiles, as described under Methods. Although the small size of this UCSF cohort limited the statistical power to detect one- and two-locus effects on risk for CVD, the intent of this analysis was to demonstrate how the CVD35 assay could be used to evaluate disease association and genotype interactions with a case-control study design. Given the exploratory nature of these preliminary analyses, no formal statistical correction for multiple testing was applied.
The markers were first considered individually for association with disease. The test for apoB codon 71 among women yielded a nominally significant difference in frequency between the extreme Gensini quintiles (12 carriers of the Ile-71 allele among 20 individuals in FQ1, 4 carriers among 18 in FQ5; uncorrected, two-tailed P < 0.03). These results are potentially interesting, but no conclusions can be drawn in light of the small sample size available. Previous evidence for association of the apoB Ile-71 site with plasma lipoprotein levels has been mixed (Young et al. 1987; Tikkanen et al. 1988).
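The carrier counts quoted above form a 2 x 2 table that can be tested directly. The sketch below assumes a two-sided Fisher exact test, which is a common choice for small counts; this section does not state which test produced the reported P value, so the choice of test and the function name are assumptions.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x column-1 members in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Rows: FQ1 and FQ5; columns: Ile-71 carriers and noncarriers.
p_value = fisher_exact_two_sided(12, 8, 4, 14)
print(f"two-sided P = {p_value:.4f}")
```

Under this assumption the result falls below 0.03, in line with the uncorrected value reported in the text.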
Multilocus data are of particular value in enabling evaluation of combinations of markers for their association with complex disease. Although only large effects would be expected to yield statistically significant results with this limited sample size, we sought to explore this opportunity by considering two-locus effects. As shown in Table 3, analysis for two-locus effects within the UCSF cohort yielded 14 pairs of variant alleles that showed nominally significant associations (uncorrected P < 0.05) with angiographic scores in the combined gender or female-only quintile comparisons. One marker pair, GPIIIa Pro-33 with ATIIR1 1166C, yielded a possibly predisposing association in both the combined gender and female-only quintile comparisons; the small cohort size did not permit direct evaluation of the role of gender. This preliminary study did suggest a number of potentially interesting two-locus effects, such as increased risk for disease associated with having two hypertension gene variants, ATIIR1 1166C and AGT Thr-235. A particularly high relative risk was estimated if CBS Thr-278 (associated with hyperhomocysteinemia; Hu et al. 1993) was paired with either apoE e4 or apoCIII promoter variants (associated with hypertriglyceridemia; Dammerman et al. 1993). In contrast, when this CBS variant was paired with ATIIR1 1166C (hypertension pathway; Bonnardeaux et al. 1994), the effect appeared to be protective. Overall, the number of nominally significant (uncorrected P < 0.05) marker pairs just exceeded expectations given a type I error rate of 5%. Analysis of larger cohorts would provide the necessary power to detect true effects of clinical relevance, leading to hypotheses that could be tested subsequently in independent cohorts.
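The comparison with chance expectation can be made concrete under a stated assumption. The text does not give the number of two-locus tests performed; the sketch below supposes that every pair among the 15 markers was tested once in each of the two quintile comparisons (combined gender and female-only). Both that supposition and the function name are hypothetical.

```python
from math import comb


def expected_false_positives(n_markers, n_comparisons, alpha=0.05):
    """Return (number of pairwise tests, expected number of nominally
    significant results under the null hypothesis)."""
    n_tests = comb(n_markers, 2) * n_comparisons
    return n_tests, n_tests * alpha


# Assumption: all pairs of 15 markers, tested in 2 comparisons.
n_tests, expected = expected_false_positives(15, 2)
print(f"{n_tests} tests, about {expected:.1f} expected by chance")
```

If the assumption holds, roughly ten nominally significant pairs would be expected by chance, so the 14 pairs observed would only modestly exceed that figure. A different test count would change the comparison accordingly.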
DISCUSSION
With this immobilized probe assay format, large cohorts can be assayed rapidly for multiple biallelic sites, providing the necessary epidemiological data for evaluation of these markers in association with disease or therapeutic response. An additional advantage of this technology is the relative ease with which the panel of targets can be modified to include new markers of interest. The assay described here is currently being expanded to type >60 sites in 36 genes, and could be expanded even further. One limitation of this approach is that sequence-specific probes will not identify new mutations or polymorphisms; only those new sequence variations resulting in unusually weak signal intensities would be detected. Furthermore, this format does not detect variable number tandem repeat polymorphisms, and higher density probe arrays are more appropriate for detection of specific mutations in genes such as the LDL receptor gene, for which >600 mutations, including large deletions, have been reported (University College of London 1999). Our multilocus assay can be adopted more readily by individual laboratories, however, particularly for candidate gene evaluations similar to those described here, as compared with genome-wide scanning efforts using high-density arrays. The use of minisequencing on primer arrays with 33P incorporation to simultaneously genotype 12 variable sites was reported recently (Pastinen et al. 1998); this approach is also promising for rapid analysis of large cohorts.
With larger cohorts, multiple regression or logistic regression methods have greater power to identify those combinations of genotypes that are most clinically informative with regard to disease phenotypes and endpoints. Alternative analytical approaches for multilocus genotype data sets may also reveal interesting associations. Extensive (n > 10,000) epidemiological and intervention studies such as the Framingham Heart Study (Dawber et al. 1951), Women's Health Initiative (The Women's Health Initiative Study Group 1998), and Multiple Risk Factor Intervention Trial (The MRFIT Research Group 1982) may offer the greatest power to detect multilocus risk factors, but smaller cohorts of carefully characterized individuals should also be informative for factors having significant impact on disease. Even with relatively large cohorts, direct evaluation of disease risk associated with combinations of four or more genotypes may be difficult, and inferences may need to be drawn from analyses of smaller subsets of markers. As increasing numbers of markers are analyzed, the issue of multiple testing must also be addressed to provide appropriate statistical interpretation of the results. This issue arises whether samples are genotyped using a multilocus assay, as described here, or through a series of single-locus studies.
Understanding the molecular basis of genetic predisposition to common multifactorial diseases such as cardiovascular disease will depend on the joint efforts of those performing genome-wide scans to identify candidate loci in regions detected through linkage studies and those studying specific mutations and polymorphisms through association studies. Functional studies will also be critical to identify the genetic variations contributing to disease development. The assay described here was designed to provide multilocus genotype information for CVD, but this format can be applied to other diseases such as asthma, bipolar disorder, and osteoporosis. Given the complexity of these diseases, well-defined cases and phenotypes will be essential components of studies seeking to provide insight into disease development from the complex genetic data. Genotype data can then guide the development of algorithms incorporating genetic contributions to calculate aggregate scores of risk, expanding on the approach developed by the Framingham Heart Study investigators for coronary heart disease (Wilson et al.), for example. Clinically informative subsets of these research markers may then form the basis of panels for diagnostic or prognostic use in patient care.
METHODS
Primers
The sites targeted for PCR amplification are listed in Table 1. Primers were synthesized with 5' biotinylation using the cyanoethoxyphosphoramidite method (1-µmole scale) on an Applied Biosystems 394 DNA Synthesizer (Perkin-Elmer, Foster City, CA). The use of allele-specific primers at codon 112 combined with probes for codon 158 to genotype apoE alleles has been described (Cheng et al. 1998). Primers for the CBS exon 8, factor V Leiden, and MTHFR targets were published previously (Hu et al. 1993; Goyette et al. 1995; Ridker et al. 1995); the forward primer for CBS exon 8 was later relocated further upstream, to eliminate duplication of the 68-bp insertion sequence (Tsai et al. 1996). The remaining primer sequences were selected with the assistance of two software packages, Oligo (v. 5.0, National Biosciences, Plymouth, MN) and Amplify (v. 1.2, W. Engels, University of Wisconsin, Madison).
Two PCR pools were developed: Multiplex A consisted of 14 biotinylated primer pairs designed to amplify the e2 and e3 alleles of apoE, and targets within the apoB, apoCIII, CETP, LPL, and PON genes. Multiplex B consisted of 13 biotinylated primer pairs designed to amplify the e4 allele of apoE, and targets within the ACE, ATIIR1, AGT, CBS, MTHFR, GPIIIa, fibrinogen, factor V, and ELAM genes. To the extent possible, PCR targets were chosen to be within the 100- to 400-bp size range and to permit resolution of all products by agarose gel electrophoresis. Gel analysis was then used to guide the optimization of PCR conditions. Primer concentrations were adjusted for generally comparable yields of all targets, and ranged from 0.04 to 0.75 µM.
As others have reported (Houlston et al. 1989; Hixson and Vernier 1990), amplification of the apoE region, which is relatively high in GC-bp content, was most efficient in the presence of DMSO. Even in the presence of DMSO, however, the apoCIII promoter target was amplified most effectively if divided into two separate amplicons of 163 and 165 bp.
Unexpectedly weak probe signal intensities necessitated primer redesign for the AGT and GPIIIa targets. Reducing the size of each amplicon resulted in much stronger probe intensities, suggesting the possibility that the longer amplicons were able to form stable secondary structures that inhibited probe binding (data not shown). The AGT target was reduced from 360 to 171 bp; the GPIIIa target was reduced from 312 to 131 bp.
Oligonucleotide Probes
Two probes were designed for each biallelic site, to detect and distinguish between the variant sequences. Most of the markers required discrimination of single base differences. To confirm successful amplification for the two largest PCR targets, probes were also designed for invariant regions of apoE and ACE.
Candidate probe sequences were selected initially using published guidelines (Thein and Wallace 1986), with the assistance of the MELT program by J. Wetmur (Mt. Sinai School of Medicine, New York, NY; see also Wetmur 1991) for calculation of dissociation temperatures. Sequences were then modified to meet sensitivity and specificity requirements under the assay temperature and buffer conditions. Concentrations of the final 70 probes were chosen to achieve signal balance between alleles at each variable site, and for generally comparable intensities among all of the loci. Probes were conjugated at their 5' ends to bovine serum albumin (BSA) by methods similar to Tung et al. (1991), then applied in a linear array to sheets of backed nylon membrane using a Linear Striper and Multispense2000 controller (IVEK, N. Springfield, VT). Each sheet was cut into strips between 0.35 and 0.5 cm in width. The probes on "Probe Strip A" corresponded to the targets amplified by the multiplex A primer pool; "Probe Strips B and B2" corresponded to the targets amplified by the multiplex B primer pool.
Control DNA Templates
Total genomic DNA from three cell lines was used for preliminary experiments: Molt-4 (GM02219C from the Human Genetic Mutant Cell Repository, Coriell Institute, Camden, NJ), KASO11 (no. 9009 from the 10th International Histocompatibility Workshop; Dupont 1987), and CRK (kindly provided by the Clinical Immunogenetics Laboratory, Fred Hutchinson Cancer Center, Seattle, WA). Genomic DNA samples previously characterized at individual sites by other methods were generously provided by G. Assmann and H. Funke (Westfälisches Wilhelms-Universität, Münster, Germany) for ACE, apoB3500, apoE, CETP405, LPL9, LPL291, LPL447, and MTHFR677; P.F. Bray (Johns Hopkins University, Baltimore, MD) for GPIIIa; F. Chehab (University of California, San Francisco, CA) for factor V Leiden; R.M. Krauss and P. Blanche (Lawrence Berkeley Laboratory, Berkeley, CA) for apoB3500, apoCIII3206, and apoE; and B. Shane (University of California, Berkeley, CA) for MTHFR677. Single-stranded templates containing the point mutations in CBS exons 3 and 8 were prepared on an Applied Biosystems 394 DNA Synthesizer, then converted to double-stranded templates by PCR, using the appropriate primer pairs from the multiplex primer pools. For the remaining markers, variant alleles that were identified during development of the assay itself were confirmed by sequencing using Dye Terminator and dRhodamine Terminator Cycle Sequencing Kits with an ABI Prism DNA Sequencer (Perkin-Elmer). All of these samples were used as controls to guide the optimization of probe sequences and concentrations for specificity and sensitivity.
Additional Reagents
MicroAmp tubes for PCR, dNTPs (N = A, G, C, U), and AmpliTaq Gold DNA polymerase were obtained from Perkin-Elmer. Deaza-dGTP was obtained from Boehringer Mannheim Biochemicals (Indianapolis, IN; now Roche Molecular Biochemicals). For higher volume assays, PCRs were performed in 96-well Thermowell Polypropylene Plates with Sealing Mats (Corning Costar, Cambridge, MA). Typing Trays (20-well capacity, amber lid), denaturation solution (1.6% NaOH), SSPE concentrate (20× sodium phosphate solution with NaCl, EDTA), SDS concentrate (20%), streptavidin-horseradish peroxidase conjugate (SA-HRP), substrates A (0.01% H2O2 in citrate solution) and B (0.1% 3,3',5,5'-tetramethylbenzidine in 40% dimethylformamide) for color development, and citrate concentrate (40×) were obtained from Roche Diagnostic Systems (Branchburg, NJ). For manual assays, color development reagent was prepared by mixing five volumes of substrate A per volume of substrate B; for automated assays, substrate A was reduced to four volumes per volume of substrate B.
PCR Amplifications
Approximately 50 ng of total genomic DNA was used for each assay,
25 ng for each multiplex A and multiplex B reaction. In addition to the
primer pools, each 50-µl reaction contained 20 mM
Tris-HCl [0.2 M stock (pH 8.3) at 25°C], 50 mM
KCl, 8.5% DMSO (vol/vol), 0.1 mM dATP, 0.1 mM
dCTP, 0.07 mM dGTP, 0.03 mM 7-deaza-dGTP, 0.2 mM dUTP, 1.7 mM MgCl2 or
MgOAc2, and 7 units of AmpliTaq Gold. The final concentration
of 8.5% DMSO was chosen to enable reliable amplification of the
apoE alleles with minimal adverse impact on the yields of
other products such as the
-fibrinogen promoter region, which is
relatively high in AT-bp content. Deaza-dGTP was also incorporated to
facilitate amplification of regions high in GC content. Deoxy-UTP was
included for compatibility with the use of uracil N-glycosylase to
eliminate PCR product contamination (Longo et al. 1990). Samples were
amplified in a Perkin-Elmer GeneAmp PCR System 9600 using a 2.4-hr
thermal cycling profile: an initial hold of 94°C for 12.5 min; then
33 cycles of 96°C for 15 sec, 60°C for 1 min, and 72°C for
1.25 min; and a final extension step of 68°C for 5 min.
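As a rough consistency check on this profile, the programmed hold times can be summed and compared with the stated 2.4-hr run time. The sketch below is illustrative only: the constant and function names are hypothetical, and attributing the remaining time to temperature ramping between steps is an assumption, not something stated in the protocol.

```python
# Hold times taken from the cycling profile described in the text.
INITIAL_HOLD_MIN = 12.5
CYCLES = 33
CYCLE_STEPS_SEC = (15, 60, 75)   # 96°C for 15 sec, 60°C for 1 min, 72°C for 1.25 min
FINAL_EXTENSION_MIN = 5.0
STATED_TOTAL_HR = 2.4            # total run time quoted in the text


def programmed_hold_minutes():
    """Sum of all programmed holds, excluding any ramping between temperatures."""
    cycling_min = CYCLES * sum(CYCLE_STEPS_SEC) / 60
    return INITIAL_HOLD_MIN + cycling_min + FINAL_EXTENSION_MIN


def unaccounted_minutes():
    """Stated run time minus programmed holds (assumed here to be ramp time)."""
    return STATED_TOTAL_HR * 60 - programmed_hold_minutes()


if __name__ == "__main__":
    print(f"programmed holds: {programmed_hold_minutes():.1f} min")
    print(f"remainder (assumed ramping): {unaccounted_minutes():.1f} min")
```

Under these assumptions the holds account for 100 min of the roughly 144-min run, leaving about 44 min for transitions between temperatures.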
During assay development, 3- to 5-µl aliquots were run on
horizontal agarose gels using 3% NuSieve, 1% SeaKem GTG agarose (FMC
BioProducts, Rockland, ME) in TBE (89 mM Tris-borate, 1 mM EDTA) with ethidium bromide.
φX174 RF DNA/HaeIII fragments and 123-bp DNA ladder (GIBCO BRL,
Gaithersburg, MD) were used as molecular weight standards.
Allele-Specific Detection
The assay was initially developed at 50°C, and the final 52°C
assay temperature reflects a compromise made to improve specificity at
the apoCIII (−625) site. This marker is the presence (more frequent allele) or absence (less frequent allele) of an A:T bp between
a G:C bp doublet and quartet within a generally GC-rich region.
Sufficient discrimination between these alleles was achieved only by
introducing G:T mismatches into the deletion-specific probe sequence;
these relatively stable mismatches (Thein and Wallace 1986)
destabilized the region sufficiently to reduce cross-hybridization with
wild-type PCR product while maintaining sensitivity for the variant
allele. Improved discrimination between apoCIII (−625) alleles was also observed at assay temperatures >52°C, but the signal intensities from probes for other markers were adversely affected by these higher temperatures (data not shown).
Therefore, detection of amplified alleles was performed at 52°C using a water bath rotating at 50-60 rpm (Hot Shaker Plus; Bellco, Vineland, NJ). Probe strips were first washed to remove unbound probe in 2× SSPE (0.36 M NaCl, 0.02 M Na2HPO4, 2 mM EDTA, adjusted to pH 7.4 with NaOH), 0.5% SDS. Twenty-microliter aliquots of the biotinylated PCR product pools from multiplex A and B reactions were denatured with equal volumes of denaturation solution, then added to Typing Tray wells containing 3 ml of hybridization buffer (4× SSPE, 0.5% SDS) and a correspondingly labeled probe strip A or probe strip B. Probe strip B2 was included with strip B. After 20 min at 52°C, the hybridization solution was replaced with fresh buffer containing 10 µl of SA-HRP and the strips were returned to the water bath for 5 min. This enzyme conjugate solution was then replaced with the stringent wash buffer (2× SSPE, 0.5% SDS), and the strips were returned to the water bath for 12 min. The washed strips were equilibrated in 50 mM Na-citrate at room temperature on a rotating (50-60 rpm) platform (Gyrotory Shaker Model G2; New Brunswick Scientific, Edison, NJ), then agitated in color development reagent for 8-10 min at room temperature. Developed strips were rinsed with distilled water, aligned on a flat surface next to a guide identifying the allele detected by each probe line, and photographed using type 559 or 55 film from Polaroid (Cambridge, MA). Genotype interpretations were made manually and independently by two individuals. Given this protocol, at least 40 DNA samples per day can be genotyped by one individual. An SLT ProfiBlot IIT (Tecan US, Research Triangle Park, NC) can also be used to automate the hybridization, stringent wash, and color development steps for 12 samples (24 wells) at a time. This level of automation can be used to increase the throughput of one individual to at least 75 samples per day.
Test Cohorts
To estimate population frequencies, the assay was used to genotype
a subset of 1190 samples from 286 families of the Stanislas cohort
recruited from families within eastern France (Siest et al. 1998). DNA
was prepared from whole blood by the salting-out method (Miller et al. 1988). These samples had been genotyped for apoE, LPL447,
and ACE by methods described previously (Hixson and Vernier 1990; Evans et al. 1994; Salah et al. 1997).
The UCSF cohort was a composite cohort of 142 unrelated Caucasian
individuals recruited from clinics within the San Francisco Bay area
(California, USA). These individuals had been recruited on the basis of
a family history of disease, hyperlipidemia, or a treadmill test
indication for angiography. Total cholesterol levels ranged from 162 to
548 mg/dl, with an average of 323 ± 69 mg/dl. DNA was prepared
from whole blood either by the method of Bell (Bell et al. 1981) or
using the Puregene DNA Isolation Kit (Gentra Systems, Inc.,
Minneapolis, MN). Each DNA sample was associated with a Gensini score,
which assigns greater weight to proximal lesions identified through
quantitative angiography (Gensini 1975). Fifty of the samples were from
men with an average age of 45.5 ± 8.5 years at the time of
angiography, and Gensini scores ranging from 5 to 135. Ninety-two
samples were from women with an average age of 54.0 ± 11.4 years,
and Gensini scores ranging from 0 to 120. Some of these individuals had
already been genotyped for apoE and apoB3500 by
methods described previously (Hixson and Vernier 1990; Pullinger et al. 1995).
Allele Frequencies and Linkage Disequilibrium Analysis
Population frequencies were estimated from allele counts among 496 unrelated parents from the Stanislas cohort for whom all 35 sites had
been genotyped. Allele frequencies were also calculated for the UCSF
cohort. Deviation from Hardy-Weinberg equilibrium was assessed using
the χ² statistic.
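For a single biallelic site, this test can be sketched as follows. It is a minimal illustration, assuming a goodness-of-fit χ² with 1 degree of freedom and no continuity correction; the text does not specify these details, and the function names and genotype counts are hypothetical.

```python
import math


def hardy_weinberg_chi2(n_AA, n_Aa, n_aa):
    """Return (frequency of allele A, chi-square statistic) for one biallelic site."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)      # allele frequency from allele counts
    q = 1.0 - p
    observed = (n_AA, n_Aa, n_aa)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Classes with zero expected count (monomorphic site) contribute nothing.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return p, chi2


def chi2_p_value_1df(chi2):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


if __name__ == "__main__":
    # Hypothetical genotype counts for 496 individuals (not data from the study).
    p, chi2 = hardy_weinberg_chi2(320, 150, 26)
    print(f"p(A) = {p:.3f}, chi2 = {chi2:.2f}, P = {chi2_p_value_1df(chi2):.3f}")
```

One degree of freedom is used because a biallelic site has three genotype classes and one allele frequency is estimated from the data.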
Intragenic haplotypes for multiple markers within the apoCIII,
LPL, and ELAM loci were estimated using the Family
Analysis Program (v. PL1; M. Neugebauer and M.P. Baur; Neugebauer et al. 1984) from 275 families (1100 chromosomes) within the Stanislas cohort. This data set included families for whom markers on probe strip
B2 had not been genotyped because of the rarity of variation at these
four markers; therefore, this data set included samples that were not
counted in the estimation of population frequencies. The intragenic
haplotypes were used to estimate pairwise linkage disequilibrium values
(D'; Lewontin 1964; Klitz et al. 1995) between consecutive sites
within each locus.
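Given estimated haplotype counts for a pair of biallelic sites, the normalized disequilibrium coefficient can be computed as sketched below. This is an illustration of the standard D' definition, not the Family Analysis Program itself; the function name and example counts are hypothetical, and the sign of D is retained here although many reports quote only its magnitude.

```python
def d_prime(n_AB, n_Ab, n_aB, n_ab):
    """Lewontin's D' from counts of the four two-site haplotypes.

    A/a are the alleles at the first site and B/b those at the second.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n

    d = p_AB - p_A * p_B                     # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    if d_max == 0:
        raise ValueError("D' is undefined when either site is monomorphic")
    return d / d_max                         # signed value in [-1, 1]


if __name__ == "__main__":
    # Hypothetical haplotype counts summing to 1100 chromosomes.
    print(f"D' = {d_prime(700, 50, 30, 320):.3f}")
```

Dividing by the maximum value D can take for the observed allele frequencies makes the measure comparable across site pairs with different allele frequencies.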
Evaluation of Disease Association
The UCSF cohort was divided into quintiles based on the Gensini scores. The first combined gender quintile (Q1) contained the 28 lowest scores (0-8), and the fifth quintile (Q5) contained the 28 highest scores (35-135). The male subset was deemed insufficient in size for separate analysis, but the female subset was considered separately: female-only quintile 1 (FQ1) contained the 20 lowest scores (0-7); FQ5 contained the 18 highest scores (36-120).
A preliminary analysis of disease association was undertaken for a
subset of 15 markers: apoE, apoB71, all apoCIII sites
except 3175, CETP405, PON192, ACE, ATIIR, AGT,
CBS278, MTHFR677, GPIIIa, and
β-fibrinogen. For the remaining markers, the observed variant allele frequencies were deemed too low for such an analysis. In
addition, nearly complete linkage disequilibrium was observed between
apoCIII sites (−625) and (−455); therefore, these two were subsequently treated as one marker. Disease association was examined by comparing the extreme quintiles, interpreting Q1 and FQ1 as
individuals having little or no disease, and Q5 and FQ5 as individuals
having the most severe disease. Heterozygous and homozygous carriers of
the variant alleles were counted together, with the exception of
ACE. The ACE alleles were grouped in two ways,
consistent with combining either carriers of the reported risk allele
(D/D, I/D) or carriers of the less frequent
allele (I/I, I/D). For apoE, e3/e4 and
e4/e4 were counted together; no e2/e4 genotypes were
observed in any of the quintiles used for this analysis. For single
sites (more frequent allele A, less frequent allele
a), the odds ratios corresponded to the risk associated with
carriers of the less frequent allele (Aa or aa
genotypes) relative to the AA genotype. For pairwise
combinations of sites, odds ratios were calculated for the risk
associated with having variant alleles at two sites compared with just
one site. Odds ratios were calculated with Haldane's correction when
necessary (Haldane 1955). P values were calculated using
Fisher's two-tailed exact test (Sokal and Rohlf 1995). These analyses
were intended to be exploratory; therefore, no formal correction for
multiple testing was applied.
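The single-site comparison can be illustrated with a 2 × 2 table of carrier status by extreme quintile. The sketch below makes two assumptions that the text does not spell out: Haldane's correction is taken to mean adding 0.5 to every cell when any cell is zero, and the two-tailed P value sums the probabilities of all tables with the observed margins that are no more likely than the observed table. Function names and example counts are hypothetical.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio for the table [[a, b], [c, d]].

    a, b: carriers (Aa or aa) in Q5 and Q1; c, d: AA individuals in Q5 and Q1.
    If any cell is zero, 0.5 is added to all cells (assumed Haldane correction).
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return (a * d) / (b * c)


def fisher_two_tailed(a, b, c, d):
    """Two-tailed Fisher exact P value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_obs = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lowest, highest + 1)
            if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


if __name__ == "__main__":
    # Hypothetical counts: 12 of 28 carriers in Q5 versus 5 of 28 in Q1.
    table = (12, 5, 16, 23)
    print(f"OR = {odds_ratio(*table):.2f}, P = {fisher_two_tailed(*table):.3f}")
```

The same two functions apply to the pairwise analysis if the rows are redefined as individuals carrying variant alleles at two sites versus one site.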
| |
ACKNOWLEDGMENTS |
|---|
We are indebted to our collaborators for their expert advice and generous gifts of characterized DNA samples: G. Assmann and H. Funke (Westfälische Wilhelms-Universität), S. Humphries and I. Day (University College of London Medical School), R. Krauss and P. Blanche (Lawrence Berkeley National Laboratory), P. Bray (Johns Hopkins Medical School), F. Chehab (University of California, San Francisco), and B. Shane (University of California, Berkeley). This work would not have been possible without the support of the Oligo Synthesis and Sequencing Groups at Roche Molecular Systems, and we thank A. Turck and J. Novotny for their technical advice. We also thank R. Higuchi, J. Sninsky, and T. White for their enthusiastic support.
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
| |
FOOTNOTES |
|---|
4 Present address: School of Public Health, St. Louis University, St. Louis, Missouri 63123 USA.
9 Corresponding author.
E-MAIL suzanne.cheng{at}roche.com; FAX (510) 522-1285.
| |
REFERENCES |
|---|
|
|
|---|
−93t→g promoter variant in the lipoprotein lipase gene. Arterioscler. Thromb. Vasc. Biol. 17: 2672-2678
β-fibrinogen genotype in determining plasma fibrinogen levels in young survivors of myocardial infarction and healthy controls from Sweden. Thromb. Haemost. 70: 915-920
β-blocking agents with a beneficial influence on lipoprotein lipase activity, HDL cholesterol, and triglyceride levels in coronary artery disease patients. Circulation 95: 2628-2635
−93T/G, is associated with lower plasma triglyceride levels and increased promoter activity in vitro. Arterioscler. Thromb. Vasc. Biol. 17: 1969-1976
β-synthase gene deficiency in pyridoxine responsive and nonresponsive homocystinuria. Hum. Mol. Genet. 2: 1857-1860
−455-A β-gene) is associated with differences in plasma fibrinogen levels in young men and women from different regions in Europe. Arterioscler. Thromb. Vasc. Biol. 15: 96-104
β-synthase deficiency. Hum. Mutat. 1: 113-123
Asn). Arterioscler. Thromb. Vasc. Biol. 15: 468-478
β-synthase allele with