Published online before print
February 8, 2006, 10.1101/gr.4559106 Genome Res. 16:331-339, 2006 ©2006 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/06 $5.00
Letter
Analysis of allelic differential expression in human white blood cells
Perlegen Sciences, Mountain View, California 94043, USA
Allelic variation of gene expression is common in humans, and is of interest because of its potential contribution to variation in heritable traits. To identify human genes with allelic expression differences, we genotype DNA and examine mRNA isolated from the white blood cells of 12 unrelated individuals using oligonucleotide arrays containing 8406 exonic SNPs. Of the exonic SNPs, 1983, located in 1389 genes, are both expressed in the white blood cells and heterozygous in at least one of the 12 individuals, and thus can be examined for differential allelic expression. Of the 1389 genes, 731 (53%) show allele expression differences in at least one individual. To gain insight into the regulatory mechanisms governing allelic expression differences, we analyze a set of 60 genes containing exonic SNPs that are heterozygous in three or more samples, and for which all heterozygotes display differential expression. We find three patterns of allelic expression, suggesting different underlying regulatory mechanisms. Exonic SNPs in three of the 60 genes are monoallelically expressed in the human white blood cells, and when examined in families show expression of only the maternal copy, consistent with regulation by imprinting. Approximately one-third of the genes have the same allele expressed more highly in all heterozygotes, suggesting that their regulation is predominantly influenced by cis-elements in strong linkage disequilibrium with the assayed exonic SNP. The remaining two-thirds of the genes have different alleles expressed more highly in different heterozygotes, suggesting that their expression differences are influenced by factors not in strong linkage disequilibrium with the assayed exonic SNP.
The correlations between DNA variation and human phenotypic differences, such as height, weight, and susceptibility to certain diseases, are not well understood. While there is evidence that both coding (Koschinsky et al. 2001
Oligonucleotide arrays have been used previously to screen genes for allele-specific expression in yeast (Ronald et al. 2005
Genome-wide allelic expression analysis in human white blood cells
We performed a genome-wide analysis to determine the prevalence and characteristics of allele-specific expression in human white blood cells. DNA and RNA were extracted from the white blood cells of 12 unrelated individuals chosen at random from the Stanford Blood Center. High-density oligonucleotide arrays were designed to assay the allele-specific expression of 8406 exonic SNPs, in 4102 genes, in each of the individuals in a high-throughput manner. The arrays are generated by the tiling of 25-bp oligonucleotide probes, such that each SNP is queried by 80 distinct 25-bp probes (Fig. 1). Genomic DNA and cDNA samples from the same individual were amplified with PCR primers specific for intervals surrounding each SNP. The PCR products were then labeled, and hybridized to the high-density oligonucleotide arrays. We extracted the fluorescence intensities for all 80 probes corresponding to each SNP allele, and estimated the concentration of each allele in the DNA and cDNA samples. We then used the estimates to genotype the SNPs in each genomic DNA sample and to quantify the ratio of reference to alternate SNP alleles in the cDNA samples. Each experiment was performed in duplicate, with a total of four arrays being hybridized for each individual (two hybridized with cDNA and two with genomic DNA).
Exonic SNPs were considered to be expressed in white blood cells if transcripts were detected in at least nine of the 12 individuals examined, and were considered to be differentially expressed if the allele frequency fold ratio (reference allele/alternate allele) in heterozygotes was ≥1.5 or ≤0.67 (i.e., the apparent reference allele frequency in the RNA, P, was ≥0.6 or ≤0.4) (Fig. 2A). Of the 8406 exonic SNPs examined, 3349 were expressed in the white blood cells, and 1983 of these were heterozygous in at least one individual and could therefore be examined for differential expression (Table 1). The 1983 heterozygous exonic SNPs are located in 1389 genes, with 401 of the genes containing multiple exonic SNPs and five of the exonic SNPs located in multiple RefSeq gene transcripts (http://www.ncbi.nlm.nih.gov/RefSeq). More than 50% of the 1389 assayable genes showed differential allelic expression in at least one individual. The false-positive and false-discovery rates are dependent on the fold-ratio threshold used for defining alleles as differentially expressed. For the fold ratio used in this study, 1.5, we estimate the rate of false-positive differential expression in the heterozygote data as 2.5%, and the false-discovery rate as 11.6%. Increasing the fold-ratio threshold from 1.5 to 2.0 would decrease the estimated false-discovery rate by 50%, whereas decreasing the fold-ratio threshold to 1.2 would substantially increase the estimated false-discovery rate.
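The correspondence between the fold-ratio thresholds and the apparent reference allele frequencies quoted above can be illustrated with a short sketch. It assumes the fold ratio is p/(1 - p), where p is the reference allele frequency in the RNA of a heterozygote; this is consistent with the paired values in the text (1.5 with 0.6, 0.67 with 0.4) but is not taken from the study's own software, and the function names are hypothetical.

```python
def fold_ratio(p_ref):
    """Reference/alternate fold ratio for a reference allele frequency p_ref.

    Assumption for this sketch: fold ratio = p_ref / (1 - p_ref).
    """
    if not 0.0 < p_ref < 1.0:
        raise ValueError("p_ref must lie strictly between 0 and 1")
    return p_ref / (1.0 - p_ref)


def is_differentially_expressed(p_ref, threshold=1.5):
    """Return True if the fold ratio is >= threshold or <= 1/threshold.

    Values lying exactly on a threshold are subject to floating-point
    rounding, so this sketch should not be relied on at the boundary.
    """
    ratio = fold_ratio(p_ref)
    return ratio >= threshold or ratio <= 1.0 / threshold


if __name__ == "__main__":
    for p in (0.30, 0.45, 0.50, 0.55, 0.70):
        print(p, round(fold_ratio(p), 3), is_differentially_expressed(p))
```

Under this assumption, raising the threshold to 2.0 corresponds to requiring a reference allele frequency of at least two-thirds or at most one-third.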
The allelic expression data for each of the 1983 exonic SNPs are shown in Supplemental Table 1, with data for 13 exonic SNPs specifically discussed in this manuscript shown in Table 2. In these tables we provide the allele frequency fold ratios for each of the heterozygotes. On average, each individual had 502 heterozygous exonic SNPs, and of these, 22% were differentially expressed (Table 3). We report fold ratios that fall between 0.1 and 10, but because of limitations on the technology's ability to reliably determine extreme fold ratios, we report the rest as either 10 or 0.1. As an example of the distribution of allele frequencies for expressed genes in an individual, Figure 2A shows the RNA reference allele frequencies plotted against DNA reference allele frequencies for all the exonic SNPs for individual #9.
Validation
To validate our approach for studying allelic expression differences, we first examined the reproducibility of the observed differences between RNA preparations isolated from the same cells at different times as well as the effect of varying input cDNA concentration in the PCR reaction. Independently isolated RNA preparations were assayed using the high-density oligonucleotide arrays, and a regression of the resulting SNP data had an R² of 0.98. Additionally, a regression of the SNP data obtained by varying input cDNA concentrations between 0.4 ng/µL and 2 ng/µL into the PCR reaction had an R² of 0.99. These data suggest that our sample preparation methodology contributes surprisingly little to the observed allelic differences, and that the data obtained for a given SNP are highly reproducible.
We next examined the consistency of allelic expression estimates across multiple informative SNPs within the same gene and individual. There were 1321 such pairwise comparisons, and when the 1.5-fold allele frequency ratio threshold was used to define differentially expressed alleles, 1001 (75.8%) of them agreed. Given that 19.5% (22% observed minus the 2.5% false-positive rate) of the exonic SNPs are estimated to be differentially expressed, 68.6% of the SNP pairs are expected to agree by chance. Thus, the observed number of SNP pairs in agreement is greater than that expected by chance but low considering the high reproducibility of the allelic expression results observed for a given SNP. We decided to analyze the concordance of SNP pairs as a function of distance to determine if SNP pairs in close proximity to each other on the mRNA transcript were more likely to agree with each other than those spaced farther apart. This analysis was performed using
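The 68.6% chance-agreement figure quoted in the paragraph above can be reproduced with a small calculation. The sketch assumes that the two SNPs in a pair are called independently, so that a pair agrees when both are differential or both are not; the independence assumption and the function name are ours, and the text does not state the exact formula the authors used.

```python
def expected_chance_agreement(p_diff):
    """Probability that two independent calls agree.

    A pair agrees if both SNPs are called differential (p_diff ** 2)
    or neither is ((1 - p_diff) ** 2).
    """
    if not 0.0 <= p_diff <= 1.0:
        raise ValueError("p_diff must be a probability")
    return p_diff ** 2 + (1.0 - p_diff) ** 2


observed_rate = 0.22        # fraction of heterozygous SNPs called differential
false_positive_rate = 0.025  # estimated false-positive rate at the 1.5-fold cutoff
p_diff = observed_rate - false_positive_rate   # 0.195, as in the text

chance_agreement = expected_chance_agreement(p_diff)
observed_agreement = 1001 / 1321               # agreeing pairs / all pairs

if __name__ == "__main__":
    print(f"expected by chance: {chance_agreement:.1%}")
    print(f"observed:           {observed_agreement:.1%}")
```

With these inputs the expected value rounds to 68.6% and the observed value to 75.8%, matching the numbers in the text.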
As a final validation of the array methodology, we compared our allelic expression results with those obtained by real-time PCR analysis for seven randomly chosen exonic SNPs, for a total of 22 comparisons (Table 4). When using the 1.5-fold allele frequency ratio cutoff to define differentially expressed alleles, the results of the two technologies agreed 82% of the time. In 13 of the comparisons the exonic SNP alleles differentially expressed in the array analysis also showed differential expression by real-time PCR, in five comparisons exonic SNP alleles showed nearly equal expression in both techniques, and in four comparisons the techniques disagreed. For two of the four comparisons with results that disagreed, the fold ratios were in the correct direction and close in value, but the 1.5-fold threshold for differential expression was only reached using one of the technologies. Thus, the two technologies were significantly discrepant in only two of the 22 comparisons. Linear regression on the log fold ratios from the two techniques gave a correlation coefficient R² of 0.707 (P = 9.3 × 10⁻⁷). Thus, while they correlated well in terms of their ability to identify differentially expressed genes, the fold ratios provided by the two technologies matched less closely.
These validation data show that when we determine that exonic SNP alleles are differentially expressed, those results are reproducible both between replicates on the array platform and across different platforms. The exact fold ratios of differential expression for exonic SNPs are not consistent across platforms, suggesting that they are not accurately determined by our assay. Additionally, our assay appears to detect differential expression of different exonic SNPs with varying sensitivity.
Allelic expression patterns reveal underlying molecular regulatory mechanisms
Examining the differential expression of the 60 exonic SNPs, we observed three distinct patterns: (1) monoallelic expression (defined here as a fold ratio of
Exonic SNPs in three of the 60 genes (5%) showed monoallelic expression in each of the expressing heterozygotes, ss23480954 in FLJ33071, ss38338836 in PRIM2A, and ss24225694 in ZNF463 (Fig. 2B), with data from three, four, and five heterozygotes respectively (Table 2). Monoallelic expression is consistent with genomic imprinting, an epigenetic phenomenon in which the expression of alleles is dependent on their parental origin, and generally results in the silencing of one allele (Wrzeska and Rejduch 2004
Assuming that exonic SNP alleles in mRNA isolated from the white blood cells of a single individual have been exposed to the same trans-acting factors, any expression variation seen between alleles using our approach must involve cis-acting factor(s), whether or not trans-factors are also involved. We propose that nonmonoallelically expressed genes that consistently express a particular allele at a higher level than the other are likely to be regulated primarily by cis-factors in strong linkage disequilibrium with the assayed exonic SNP. Of the 57 exonic SNPs that did not show monoallelic expression, 31 are in genes that were differentially expressed with the same allele favored in each of the expressing heterozygotes. These include genes such as C1orf38 and FLJ21069, which were differentially expressed in each of eight and six expressing heterozygotes, respectively (Table 2; Fig. 2B).
The number of exonic SNP alleles expected to have the same allele consistently expressed more highly by chance alone varies with the number of heterozygotes expressing the exonic SNP. For example, the chances of having five or more heterozygotes favoring the same allele are substantially lower than the chances of having only three heterozygotes favoring the same allele. The results show that the 31 exonic SNPs observed with this allele-specific expression pattern is much higher than the
Genes with allelic expression differences influenced by regulatory factors not in strong linkage disequilibrium with the assayed exonic SNP would be expected to have different alleles expressed at higher levels in different heterozygotes. An example of this is exonic SNP ss24515622 in the D21S2056E gene, which was expressed in five heterozygotes (Table 2; Fig. 2B). For this exonic SNP, all five heterozygotes met the 1.5-fold threshold for differential expression, with one allele favored in three of the heterozygotes and the other allele favored in the remaining two. A total of 26 of the 60 genes examined displayed similarly inconsistent favoring of alleles, and 12 of the 31 genes in which the same allele was consistently favored are expected to show that pattern by chance. Thus, of the 60 examined genes, the observed allelic expression differences of 38 (26 + 12; 63%) are likely to be influenced by factors not in strong linkage disequilibrium with the assayed exonic SNP.
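The dependence of the chance expectation on the number of heterozygotes can be illustrated with a simple null model. The sketch assumes that, in the absence of a linked cis-element, each differentially expressing heterozygote independently favors either allele with probability 0.5; this null model and the function name are our assumptions for illustration, since the authors' exact calculation is not given in this text.

```python
def prob_same_allele_by_chance(n_heterozygotes):
    """Probability that all n heterozygotes favor the same allele by chance.

    Assumed null model: each heterozygote independently favors the reference
    or the alternate allele with probability 0.5. Either allele may be the
    favored one, so the probability is 2 * 0.5 ** n = 0.5 ** (n - 1).
    """
    if n_heterozygotes < 1:
        raise ValueError("need at least one heterozygote")
    return 0.5 ** (n_heterozygotes - 1)


if __name__ == "__main__":
    for n in (3, 4, 5, 6, 8):
        print(n, prob_same_allele_by_chance(n))
```

Under this model the probability falls from 0.25 with three heterozygotes to 0.0625 with five, which is the qualitative trend described in the text.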
Candidate genes for regulation by genomic imprinting
For FLJ33071, the maternally inherited allele of exonic SNP ss24480254 was predominantly expressed over the other allele in all heterozygous children in both pedigrees: the G SNP allele in pedigree 1344 and the A SNP allele in pedigree 1362. Additionally, there is a second exonic SNP (ss24480254) in FLJ33071 that was monoallelically expressed in two of the 12 unrelated white blood cell samples (Table 2). These data are consistent with the regulation of FLJ33071 by imprinting, with the expressed allele being inherited maternally. For the PRIM2A exonic SNP ss38338836, the maternally derived allele (A) was monoallelically expressed in all heterozygous children in both pedigrees. Thus, the gene is monoallelically expressed in both the 12 original white blood cell samples and the two CEPH pedigrees. Consistent with imprinting as the regulatory mechanism governing expression, the exonic SNP alleles in the PRIM2A gene are randomly favored: in the two CEPH pedigrees, the A SNP allele is expressed (Table 6), and in the white blood cell samples, the T SNP allele is expressed. For ZNF463, the maternally derived allele for exonic SNP ss24225694 was monoallelically expressed in all five heterozygous children in pedigree 1362. Pedigree 1344 had no heterozygous children and thus provided no information. In the 12 unrelated individuals, monoallelic expression of this SNP allele is randomly favored (Fig. 2B), which is consistent with imprinting. Two additional ZNF463 SNPs (ss23813114 and ss23813115), in the same exon as SNP ss24225694, also display monoallelic expression in heterozygous individuals (Table 2). However, unlike these three monoallelically expressed SNPs, which are all in the 3'-exon of the gene, three SNPs (ss24225691, ss24719563, and ss38338978) in the 5'-untranslated region of ZNF463 have biallelic expression in the white blood cell samples (Table 2). Determining the reason for this discrepancy would require further investigation. 
However, plausible explanations include the presence of alternative or multiple transcripts in the ZNF463 genomic interval that have not yet been identified and annotated. There are no previous reports of imprinting for FLJ33071, PRIM2A, and ZNF463. Although our data strongly suggest that the expression of these three genes is regulated by imprinting in white blood cells, it is important to note that definitive validation would require the observation of parental inheritance of allele expression in at least three generations in large families, with switching of expressed alleles in different generations, dependent on the parental origin.
We have analyzed the genetic basis of allele-specific expression differences in human white blood cells by comparing the relative levels of exonic SNP alleles within mRNA samples isolated from unrelated individuals. Of the 60 genes classified on the basis of their differential allelic expression patterns, approximately one-third are likely to be regulated predominantly by cis-elements in strong linkage disequilibrium with the assayed exonic SNP, and two-thirds are likely to have their regulation strongly influenced by elements not in linkage disequilibrium with the assayed exonic SNP. Our expression data suggesting that three out of the 60 genes are regulated by imprinting in human white blood cells is surprising, given that there are only 50 human genes with evidence of imprinting and parent-of-origin effects in the Imprinted Gene Catalogue (http://igc.otago.ac.nz/home.html), and it has generally been thought that the number of imprinted genes in mammals is low. Our results suggest that experiments using exonic SNPs for genotyping and expression analysis across multiple tissues at different developmental stages may result in the identification of many more genes regulated by genomic imprinting.
Exonic SNP selection and primer design
From a genome-wide collection of human single nucleotide polymorphisms (SNPs) discovered in an independent study by Perlegen Sciences (Hinds et al. 2005
Calculation of
The
was used to determine genotypes in the DNA samples and differential allelic expression in the RNA samples, as discussed below.
Quality control filters for SNP assays
The conformance for a particular allele was defined as the fraction of feature groups in which the perfect-match feature was brighter than the three corresponding mismatch features. For each SNP allele, there are 10 such feature groups, five for the forward strand and five for the reverse strand. Conformance was computed independently for the reference and alternate SNP allele feature sets, and the larger of the two values was used. SNP measurements having conformance <0.9 were discarded from further evaluations.
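The conformance filter described above can be sketched as follows. The data layout (each feature group as a perfect-match intensity paired with three mismatch intensities) and all function names are hypothetical choices for this illustration, not the study's actual data structures.

```python
def allele_conformance(feature_groups):
    """Fraction of feature groups whose perfect-match feature is brightest.

    feature_groups: list of (pm, [mm1, mm2, mm3]) intensity tuples,
    a hypothetical layout with 10 groups per allele (5 per strand).
    """
    if not feature_groups:
        raise ValueError("at least one feature group is required")
    passing = sum(
        1 for pm, mismatches in feature_groups
        if all(pm > mm for mm in mismatches)
    )
    return passing / len(feature_groups)


def snp_conformance(ref_groups, alt_groups):
    """Larger of the reference-allele and alternate-allele conformances."""
    return max(allele_conformance(ref_groups), allele_conformance(alt_groups))


def passes_conformance(ref_groups, alt_groups, cutoff=0.9):
    """Keep a SNP measurement unless its conformance is below the cutoff."""
    return snp_conformance(ref_groups, alt_groups) >= cutoff


if __name__ == "__main__":
    good = (500.0, [120.0, 90.0, 110.0])   # perfect match brighter than all mismatches
    bad = (100.0, [120.0, 90.0, 110.0])    # one mismatch is brighter
    ref = [good] * 9 + [bad]               # conformance 0.9
    alt = [good] * 5 + [bad] * 5           # conformance 0.5
    print(snp_conformance(ref, alt), passes_conformance(ref, alt))
```

Taking the larger of the two allele conformances means a homozygous sample, in which only one allele's probes are expected to hybridize well, is not penalized for the absent allele.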
The signal S, the background B, and the signal-to-background ratio R were calculated from intensity measurements for both alleles in the following manner:
SNP measurements having R < 1.5 were discarded from further evaluations.
Determination of genotypes in DNA samples by clustering intensities
Determination of differential allelic expression using arrays
Thus, when the
cDNA value for a heterozygous SNP lay between A,DNA and H,DNA, the frequency of the reference allele transcript in the cDNA sample, pcDNA, was determined as:
p) in the reference allele frequency between the cDNA (pcDNA) and the DNA (0.5) for heterozygotes is:
10 or 0.1.
Only transcripts for which the exonic SNPs passed the quality thresholds for conformance and signal-to-background ratios in at least 75% of samples (nine of the 12 individuals) were included in the study. The requirement for expression in 75% of the samples was chosen arbitrarily, to ensure that we focused on SNPs expressed in a preponderance of samples. The standard error, SE, in the estimate of
> 2.5 or < 0.4. These thresholds in the SE and were used to exclude spurious signals, and their particular values were picked by examining the data for homozygous SNPs, in which it was found that measurements failing these criteria accounted for only 2.4% of the data, but included 8.2% of the cases where |Δp| > 0.2. Supplemental Table 2 provides the raw data for all SNPs that passed conformance and signal-to-background quality control filters, were expressed in at least nine samples, and were genotyped as heterozygous in at least one sample.
Estimation of false-positive and false-discovery rates
Rates of differential expression and false discovery were also estimated by comparing the distribution of |
Effect of cDNA concentration in PCR step on allelic expression data
Correlation of allelic expression data from multiple sample preparations
We thank Joe P. Karbowski, Patrick Chu, and Rhode S. Vergara for high-throughput PCR and array hybridization; Geoff B. Nilsen, Wade A. Barrett, and Michael Jen for designing the high-density arrays and for excellent assistance with data analysis; Andrew P. Kloek, David A. Hinds, Nila Patil, Karel Konvicka, and Katherine S. Pollard for helpful discussions; and Jerry Meek for assistance with creating figures. This publication was made possible by Grant Number 5 R44 HG002638-03 from NHGRI (to K.A.F.). Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the NHGRI.
[Supplemental material is available online at www.genome.org.] Article published online ahead of print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.4559106.
1 These authors contributed equally to this work.
2 Corresponding author.
Bray, N.J., Buckland, P.R., Owen, M.J., and O'Donovan, M.C. 2003. Cis-acting variation in the expression of a high proportion of genes in human brain. Hum. Genet. 113: 149-153.
Brem, R.B., Yvert, G., Clinton, R., and Kruglyak, L. 2002. Genetic dissection of transcriptional regulation in budding yeast. Science 296: 752-755.
Cheung, V.G., Conlin, L.K., Weber, T.M., Arcaro, M., Jen, K.Y., Morley, M., and Spielman, R.S. 2003. Natural variation in human gene expression assessed in lymphoblastoid cells. Nat. Genet. 33: 422-425.
Cowles, C.R., Hirschhorn, J.N., Altshuler, D., and Lander, E.S. 2002. Detection of regulatory variation in mouse genes. Nat. Genet. 32: 432-437.
Doss, S., Schadt, E.E., Drake, T.A., and Lusis, A.J. 2005. Cis-acting expression quantitative trait loci in mice. Genome Res. 15: 681-691.
Fondon III, J.W. and Garner, H.R. 2004. Molecular origins of rapid and continuous morphological evolution. Proc. Natl. Acad. Sci. 101: 18058-18063.
Germer, S., Holland, M.J., and Higuchi, R. 2000. High-throughput SNP allele-frequency determination in pooled DNA samples by kinetic PCR. Genome Res. 10: 258-266.
Hinds, D.A., Seymour, A.B., Durham, K., Banerjee, P., Ballinger, D.G., Milos, P.M., Cox, D.R., Thompson, J.F., and Frazer, K.A. 2004a. Application of pooled genotyping to scan candidate regions for association with HDL cholesterol levels. Hum. Genomics 1: 421-434.
Hinds, D.A., Stokowski, R.P., Patil, N., Konvicka, K., Kershenobich, D., Cox, D.R., and Ballinger, D.G. 2004b. Matching strategies for genetic association studies in structured populations. Am. J. Hum. Genet. 74: 317-325.
Hinds, D.A., Stuve, L.L., Nilsen, G.B., Halperin, E., Eskin, E., Ballinger, D.G., Frazer, K.A., and Cox, D.R. 2005. Whole-genome patterns of common DNA variation in three human populations. Science 307: 1072-1079.
Hubner, N., Wallace, C.A., Zimdahl, H., Petretto, E., Schulz, H., Maciver, F., Mueller, M., Hummel, O., Monti, J., Zidek, V., et al. 2005. Integrated transcriptional profiling and linkage analysis for identification of genes underlying disease. Nat. Genet. 37: 243-253.
International Human Genome Sequencing Consortium. 2004. Finishing the euchromatic sequence of the human genome. Nature 431: 931-945.
Kim, U.K., Jorgenson, E., Coon, H., Leppert, M., Risch, N., and Drayna, D. 2003. Positional cloning of the human quantitative trait locus underlying taste sensitivity to phenylthiocarbamide. Science 299: 1221-1225.
Knight, J.C., Keating, B.J., and Kwiatkowski, D.P. 2004. Allele-specific repression of lymphotoxin-
Koschinsky, M.L., Boffa, M.B., Nesheim, M.E., Zinman, B., Hanley, A.J., Harris, S.B., Cao, H., and Hegele, R.A. 2001. Association of a single nucleotide polymorphism in CPB2 encoding the thrombin-activable fibrinolysis inhibitor (TAF1) with blood pressure. Clin. Genet. 60: 345-349.
Lo, H.S., Wang, Z., Hu, Y., Yang, H.H., Gere, S., Buetow, K.H., and Lee, M.P. 2003. Allelic variation in gene expression is common in the human genome. Genome Res. 13: 1855-1862.
Morley, M., Molony, C.M., Weber, T.M., Devlin, J.L., Ewens, K.G., Spielman, R.S., and Cheung, V.G. 2004. Genetic analysis of genome-wide variation in human gene expression. Nature 430: 743-747.
Oliver, F., Christians, J.K., Liu, X., Rhind, S., Verma, V., Davison, C., Brown, S.D., Denny, P., and Keightley, P.D. 2005. Regulatory variation at glypican-3 underlies a major growth QTL in mice. PLoS Biol. 3: e135.
Pastinen, T., Sladek, R., Gurd, S., Sammak, A., Ge, B., Lepage, P., Lavergne, K., Villeneuve, A., Gaudin, T., Brandstrom, H., et al. 2004. A survey of genetic and epigenetic variation affecting human gene expression. Physiol. Genomics 16: 184-193.
Prokunina, L., Castillejo-Lopez, C., Oberg, F., Gunnarsson, I., Berg, L., Magnusson, V., Brookes, A.J., Tentler, D., Kristjansdottir, H., Grondal, G., et al. 2002. A regulatory polymorphism in PDCD1 is associated with susceptibility to systemic lupus erythematosus in humans. Nat. Genet. 32: 666-669.
Ronald, J., Akey, J.M., Whittle, J., Smith, E.N., Yvert, G., and Kruglyak, L. 2005. Simultaneous genotyping, gene-expression measurement, and detection of allele-specific expression with oligonucleotide arrays. Genome Res. 15: 284-291.
Schadt, E.E., Monks, S.A., Drake, T.A., Lusis, A.J., Che, N., Colinayo, V., Ruff, T.G., Milligan, S.B., Lamb, J.R., Cavet, G., et al. 2003. Genetics of gene expression surveyed in maize, mouse and man. Nature 422: 297-302.
Tokuhiro, S., Yamada, R., Chang, X., Suzuki, A., Kochi, Y., Sawada, T., Suzuki, M., Nagasaki, M., Ohtsuki, M., Ono, M., et al. 2003. An intronic SNP in a RUNX1 binding site of SLC22A4, encoding an organic cation transporter, is associated with rheumatoid arthritis. Nat. Genet. 35: 341-348.
Wilkins, J.F. 2005. Genomic imprinting and methylation: Epigenetic canalization and conflict. Trends Genet. 21: 356-365.
Wrzeska, M. and Rejduch, B. 2004. Genomic imprinting in mammals. J. Appl. Genet. 45: 427-433.
Yan, H., Yuan, W., Velculescu, V.E., Vogelstein, B., and Kinzler, K.W. 2002. Allelic variation in human gene expression. Science 297: 1143.
Received August 11, 2005; accepted in revised form December 9, 2005.