Published online before print
November 29, 2006, 10.1101/gr.5663007 Genome Res. 17:82-87, 2007 ©2007 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/07 $5.00
Methods
Localization of a long-range cis-regulatory element of IL13 by allelic transcript ratio mapping
1 Department of Paediatrics, Oxford University, Oxford OX3 7BN, United Kingdom; 2 Wellcome Trust Centre for Human Genetics, Oxford University, Oxford OX3 7BN, United Kingdom
It appears that, for many genes, the two alleles possessed by an individual may produce different amounts of transcript. When such allelic differences in transcription are observed for some individuals but not others, a plausible explanation is genetic variation in the cis-acting elements that regulate the gene in question. Here we describe a novel analytical approach that uses such observations, combined with genotyping data from the HapMap project, to define the genomic location of cis-acting regulatory elements. When applied to the human 5q31 chromosomal region, where complex regulatory mechanisms are known to exist, we demonstrate the sensitivity of this approach by locating a highly significant cis-regulatory element operating on IL13 at long range from a position 250 kb upstream from the gene (P = 2 × 10⁻⁶). As this method is unaffected by other sources of variation, such as environmental and trans-acting genetic factors, it provides a tractable approach for dissecting the complexities of genetic variation in gene regulation.
It is now possible to identify regions of the human genome that determine individual variation in gene expression, by combining the powerful techniques of genome-wide expression profiling, genome-wide SNP genotyping, and genetic association analysis (Cheung et al. 2005
Imagine a functional polymorphism (F) that affects the expression of gene X. The aim of genetic association analysis is to detect a significant difference in gene X expression between individuals of different F genotypes. However, if the expression of gene X is affected by multiple genetic or environmental factors, these are potential confounders that might obscure or distort the association of gene X expression with F genotypes.
The number of potential confounders can be reduced by using allelic transcript quantification to focus on cis-regulatory effects, where F acts only on the copy of gene X that lies on the same chromosome (Goldsborough and Kornberg 1994
We use as an example chromosome 5q31, a region that is rich in immune genes and has been implicated in several common diseases (Marquet et al. 1996
The above findings are typical of allelic transcript quantification studies that have been reported for a growing number of genes (Pastinen et al. 2004
We propose a simple method of allelic transcript ratio mapping. For any SNP marker M, with alleles M1 and M2, in the genomic region of T we can measure ATR in a group of individuals who are heterozygous for T, and determine pairwise haplotypes between M and T in each individual. In individuals who are heterozygous for M, this haplotypic information allows us to establish the phase of the ATR with respect to M, and to derive a marker-specific transcript ratio (MTR), that is, the relative abundance of transcripts derived from the M1 chromosome compared to the M2 chromosome (Fig. 2B). In individuals who are homozygous for M, the distribution of ATR values provides a measure of experimental variation that cannot be attributed to the allele-specific effects of M. We call this the nonspecific transcript ratio (NTR). We then derive some measure of the statistical significance of the MTR taking the NTR into account; we call this the ATR mapping metric. The P-value of a t-test comparison of the MTR and NTR distributions is used here as a simple example of a mapping metric. By charting the ATR mapping metric for a set of SNP markers across the genomic region surrounding T, we can build up a picture of the location of F.
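The MTR/NTR comparison described above can be sketched in a few lines of Python. This is an illustrative outline, not the code used in the study: the record fields (`atr`, `hap1`, `hap2`), the allele labels, and the example values are hypothetical, and every individual is assumed to be heterozygous for T with phase already resolved. The sketch returns Welch's t statistic and its approximate degrees of freedom on log10-transformed ratios; converting these into a P-value requires a t-distribution, for example via `scipy.stats.ttest_ind` with `equal_var=False`.

```python
import math
import statistics


def marker_ratios(individuals, m1="M1", t1="T1"):
    """Split ATR values into marker-specific (MTR) and nonspecific (NTR) sets.

    Each individual is a dict (hypothetical layout) with:
      atr  : transcript ratio, T1 transcripts divided by T2 transcripts
      hap1 : (marker allele, transcript allele) on one chromosome
      hap2 : (marker allele, transcript allele) on the other chromosome
    """
    mtr, ntr = [], []
    for ind in individuals:
        (m_a, t_a), (m_b, t_b) = ind["hap1"], ind["hap2"]
        if m_a == m_b:
            # Homozygous for M: the ATR cannot reflect an allelic effect of M.
            ntr.append(ind["atr"])
            continue
        # Heterozygous for M: orient the ratio as M1 chromosome / M2 chromosome.
        t_on_m1 = t_a if m_a == m1 else t_b
        mtr.append(ind["atr"] if t_on_m1 == t1 else 1.0 / ind["atr"])
    return mtr, ntr


def welch_t(xs, ys):
    """Welch's unequal-variance t statistic and its degrees of freedom."""
    vx = statistics.variance(xs) / len(xs)
    vy = statistics.variance(ys) / len(ys)
    t = (statistics.fmean(xs) - statistics.fmean(ys)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(xs) - 1) + vy ** 2 / (len(ys) - 1))
    return t, df


# Made-up example: three M heterozygotes and three M homozygotes.
individuals = [
    {"atr": 1.30, "hap1": ("M1", "T1"), "hap2": ("M2", "T2")},
    {"atr": 0.80, "hap1": ("M2", "T1"), "hap2": ("M1", "T2")},
    {"atr": 1.20, "hap1": ("M1", "T1"), "hap2": ("M2", "T2")},
    {"atr": 1.02, "hap1": ("M1", "T1"), "hap2": ("M1", "T2")},
    {"atr": 0.97, "hap1": ("M2", "T1"), "hap2": ("M2", "T2")},
    {"atr": 1.05, "hap1": ("M1", "T2"), "hap2": ("M1", "T1")},
]

mtr, ntr = marker_ratios(individuals)
t_stat, dof = welch_t([math.log10(r) for r in mtr],
                      [math.log10(r) for r in ntr])
print(f"MTR={mtr}, NTR={ntr}, t={t_stat:.2f}, df={dof:.1f}")
```

Note that the second individual carries M1 on the T2 chromosome, so its ATR of 0.80 is inverted to give an MTR of 1.25. Repeating the calculation for each marker in the region would give one metric value per SNP, which is what the mapping plot charts.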
To test this method on the IL13 locus in a model system, we used immortalized lymphoblastoid B-cell lines derived from 90 individuals of European ancestry that form part of the HapMap project (Altshuler et al. 2005
We examined 300 SNPs across an
These nine SNPs fall within a genomic block of high linkage disequilibrium (Fig. 3B), and their minor alleles appear to be specific for a long-range haplotype spanning the genes APXL2 (currently known as SHROOM1), GDF1, QP-C (currently known as UQCRQ), LEAP2, and AFF4 (haplotype A4 in Fig. 3C). By analogy with MTR, we can use individuals who are heterozygous for a given haplotype to estimate the allelic expression ratio between that haplotype and all other haplotypes, which we call the haplotype-specific transcript ratio (HTR). Haplotype A4 has an average HTR of 1.28 (95% CI, 1.19–1.37) in non-activated cells and 1.26 (95% CI, 1.15–1.37) in activated cells. The t-test for average HTR in A4 haplotype carriers versus non-A4-haplotype carriers was significant both in resting cells (P = 2 × 10⁻⁵) and in activated cells (P = 8 × 10⁻⁴).
These IL13 data highlight a crucial feature of ATR mapping, namely, that this method does not require F to be in linkage disequilibrium with T. If F is in strong linkage disequilibrium with T, then ATR values observed in different individuals will tend to be consistently different from 1. If F is not in linkage disequilibrium with T, then ATR values observed in different individuals will range above and below 1, depending on the haplotypic relationship of F and T in the individual being tested, and there may be many ATR values close to 1 arising from individuals who are heterozygous for T but homozygous for F. This has interesting implications for large-scale efforts to use allelic transcript quantification to screen for cis-regulatory factors across the whole genome (Pastinen et al. 2004
Since ATR data are determined by cis-acting regulation and are potentially refractory to trans-acting effects (which will affect absolute expression but not the allelic transcript ratio), genetic diversity on a different chromosome should not influence the variation observed in ATR data for IL13. We therefore applied the ATR mapping metric for the IL13 data (chromosome 5) to 10,000 consecutive SNPs on chromosome 20 (available for the same cell lines from the HapMap resource), to gain further insight into the potential false discovery rate of the test. Because the ATR mapping metric relies on phasing of the transcript SNP with each marker SNP being tested, we "transferred" the IL13 transcript SNP to chromosome 20 "in silico" for this simulation, so that it could be phased with each SNP in the analysis. Of the 10,000 SNPs on chromosome 20, only eight independent loci reached significance at or above the level of that seen for the nine SNPs identified 250 kb upstream of IL13 (Fig. 4), giving a false discovery rate for the ATR mapping metric, at a designated significance level of P < 2 × 10⁻⁶, of 1 in 1250 SNPs.
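The quoted rate follows directly from the two counts given in this paragraph. As a quick check of the arithmetic (the variable names are illustrative only):

```python
# Counts taken from the chromosome 20 simulation described in the text.
n_snps_tested = 10_000   # consecutive chromosome 20 SNPs
n_loci_passing = 8       # independent loci at or beyond the IL13 threshold

rate_per_snp = n_loci_passing / n_snps_tested
snps_per_false_hit = n_snps_tested / n_loci_passing

print(f"{rate_per_snp:.4f} per SNP, i.e. 1 in {snps_per_false_hit:.0f} SNPs")
# prints "0.0008 per SNP, i.e. 1 in 1250 SNPs"
```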
To corroborate the finding of a potential long-range cis-regulator for IL13, we interrogated publicly available human cell line expression data at Gene Expression Omnibus, the NCBI gene expression and hybridization array data repository (Edgar et al. 2002
In addition to the 200–300-kb upstream region, there may exist other cis-regulatory polymorphisms that determine IL13 expression. For example, the region 400–500 kb downstream from IL13 shows several potentially interesting differences between MTR and NTR (Fig. 3A). When we focused on the proximal 100-kb region containing IL13 and its flanking genes RAD50 and IL4, single-locus analysis revealed no striking effects, but there were potentially interesting haplotypic effects (Fig. 3D). We analyzed common haplotypic groups across this region in 22 unrelated individuals, after removing the potential confounding effect of the distal haplotype A4 (Fig. 3C) by excluding four individuals known to be A4 heterozygotes. One haplotype (B6) (see Fig. 3D) emerged from this analysis as having a possible effect on gene expression that was much more apparent in activated cells (HTR = 0.78, 95% CI, 0.65–0.91) than in non-activated cells (HTR = 0.93, 95% CI, 0.76–1.10). The t-test for average HTR in B6 haplotype carriers versus non-B6-haplotype carriers was significant in activated cells (P = 9 × 10⁻³) but not in resting cells. The direction of HTR distortion in B6 haplotype carriers upon activation was consistent in all lines (paired t-test, P < 5 × 10⁻³).
We demonstrate a method of ATR mapping that is capable of detecting functional polymorphism acting on a gene, irrespective of whether it lies proximally to and in high LD with the gene, or operates at long range and is in low LD with the gene. A pattern of unidirectional ATR distortion in the majority of the individuals assayed would be consistent with the presence of cis-regulatory polymorphism in high LD with the gene, whereas bidirectional ATR distortion in a subgroup of the individuals assayed would be consistent with cis-regulatory polymorphism in low LD with the gene.
For IL13 we identify distal cis-regulatory polymorphism highlighted by nine SNPs all common to a single haplotype spanning the genes from APXL2 to AFF4. Complex mechanisms of gene regulation have been described in the 5q31 gene region. A three-dimensional chromatin configuration that approximates regulatory elements and target genes has been proposed at 5q31 (Ansel et al. 2003
Single-locus analysis revealed no effects in the region proximal to IL13 despite the presence of a positive haplotype effect. The ATR mapping metric proposed here has the greatest power to detect a cis-regulatory effect at a marker SNP (M) where there are equal numbers of heterozygotes and homozygotes. As a consequence, this metric is less well powered for the detection of proximal cis-polymorphism existing in high LD with the transcript marker (T); nevertheless, it is likely that single-locus analysis would have been successful in identifying the proximal effect seen for IL13 with a greater depth of SNP ascertainment. In the context of complex gene regulation, it is possible that those cell lines with observed ATR distortion may carry one of a number of independent or interdependent cis-acting components. The two regulatory elements identified with these data appear to be independent effects that show different characteristics, with the distal haplotype affording a constitutive down-regulating effect and the proximal haplotype affording an inducible up-regulating effect. The context specificity of these findings will need to be addressed in primary T-cell experiments. If the complexity of cis-regulatory mechanisms is high, the observed ATR for an individual may be a composite reflection of multiple cis-acting elements. In this situation, the observed ATRs for a group of cell lines are likely to cover a broad spectrum of values, and the data may prove more difficult to resolve.
For all modes of quantifying cell line expression (Cheung et al. 2003
Nevertheless, the complexity of regulatory mechanisms that are assayed using the ATR approach is significantly less than that using traditional expression profiling as trans-acting regulation and environmental confounders will not influence ATR measurements. By focusing only on the cis-acting framework of gene regulation, ATR measurement and mapping (Knight et al. 2003
Cell lines
Lymphoblastoid B-cell lines from the CEPH collection (Centre d'Etude du Polymorphisme Humain) were sourced from the Coriell Repository. All 30 CEPH HapMap families and a further eight CEPH family trios were used in analysis. Only unrelated individuals were used in expression assays.
Cell culture and activation
cDNA preparation
Allele-specific transcript quantification
For a single cDNA assay, we performed four PCR replicates. We performed four extension reaction replicates on each PCR product, therefore producing 16 nested technical replicates per assay. Each replicate was run using a 2-µL aliquot of cDNA (derived from 200 ng of total RNA). Genomic DNA in 16 equivalent nested technical replicates (4 µg per replicate) was assayed as a control for each assay, using the same allelotype chip and primer mix. RT-negative cDNA controls for all assays were run in parallel. For a given cDNA assay, the allelic transcript ratio was calculated on each of the 16 technical replicates from the relative quantity of the two allele-specific cDNA species. The mean allelic transcript ratio for the whole assay was then calculated and normalized to the mean allelic transcript ratio for the genomic controls. An assay was accepted for further analysis if the standard error of the mean for the 16 technical replicates was <10%.
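The normalization and acceptance steps can be illustrated with a short Python sketch. It is not the original analysis code: the function name and the replicate values are invented, and the 10% criterion is interpreted here as the standard error of the mean expressed as a fraction of the mean cDNA ratio, which is an assumption about how the threshold was applied.

```python
import statistics


def assay_atr(cdna_ratios, gdna_ratios, max_sem_fraction=0.10):
    """Return (normalized ATR, accepted) for one assay.

    cdna_ratios : allelic ratios from the 16 cDNA technical replicates
    gdna_ratios : allelic ratios from the 16 genomic DNA control replicates
    """
    mean_cdna = statistics.fmean(cdna_ratios)
    mean_gdna = statistics.fmean(gdna_ratios)
    # Standard error of the mean across the cDNA technical replicates.
    sem = statistics.stdev(cdna_ratios) / len(cdna_ratios) ** 0.5
    # Assumption: "<10%" is read as SEM relative to the mean ratio.
    accepted = (sem / mean_cdna) < max_sem_fraction
    # Genomic DNA carries both alleles in equal amounts, so dividing by its
    # mean ratio removes assay bias between the two alleles.
    return mean_cdna / mean_gdna, accepted


# Invented values: 4 PCR replicates x 4 extension replicates each.
cdna = [1.30, 1.34, 1.28, 1.32] * 4
gdna = [1.04, 1.06, 1.03, 1.07] * 4

atr, ok = assay_atr(cdna, gdna)
print(f"normalized ATR = {atr:.3f}, accepted = {ok}")
```

A noisy assay, for example one whose replicate ratios alternate between 0.2 and 3.0, would fail the same check and be excluded.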
SNP and haplotype analysis
Statistical analysis
To generate a P-value corrected for multiple testing, permutation analysis for SNPs of interest was performed by randomly reassigning the observed ATRs among individuals. For the SNP of interest, a two-tailed t-test with unequal variance comparing the groups bracketed as MTR and as NTR was performed for each of 10,000 permutations to generate a distribution of P-values. (All ATR values were log10-transformed before the t-test.) The corrected P-value was assigned from the position of the observed P-value within this distribution of random P-values. All statistics were programmed in Visual Basic (Microsoft Excel) or obtained using SPSS.
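A minimal Python sketch of this permutation scheme is given below. It differs from the procedure described in one respect, which should be kept in mind: to stay within the standard library it ranks permutations by the absolute Welch t statistic rather than by the t-test P-value, so the two rankings may differ slightly when the Welch degrees of freedom vary between permutations. The phase coding (+1 when M1 lies on the T1 chromosome, -1 when it lies on the T2 chromosome, 0 for M homozygotes), the function names, and the data are all hypothetical.

```python
import math
import random
import statistics


def welch_abs_t(xs, ys):
    """Absolute Welch (unequal-variance) t statistic."""
    vx = statistics.variance(xs) / len(xs)
    vy = statistics.variance(ys) / len(ys)
    return abs(statistics.fmean(xs) - statistics.fmean(ys)) / math.sqrt(vx + vy)


def mapping_metric(phases, atrs):
    """Compare log10 MTR (phase +1/-1) with log10 NTR (phase 0)."""
    logs = [math.log10(a) for a in atrs]
    # In log space, inverting a ratio is a sign flip.
    mtr = [p * x for p, x in zip(phases, logs) if p != 0]
    ntr = [x for p, x in zip(phases, logs) if p == 0]
    return welch_abs_t(mtr, ntr)


def permutation_p(phases, atrs, n_perm=10_000, seed=0):
    """Fraction of permutations at least as extreme as the observed data."""
    rng = random.Random(seed)
    observed = mapping_metric(phases, atrs)
    shuffled = list(atrs)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign ATRs among individuals
        if mapping_metric(phases, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate above zero.
    return (hits + 1) / (n_perm + 1)


# Invented data: five M heterozygotes followed by five M homozygotes.
phases = [1, -1, 1, 1, -1, 0, 0, 0, 0, 0]
atrs = [1.30, 0.78, 1.25, 1.35, 0.80, 1.02, 0.97, 1.05, 0.99, 1.01]

print(permutation_p(phases, atrs, n_perm=2000))
```

Genotypes and phase stay fixed while the ATR values are shuffled, so the permuted data preserve the numbers of heterozygotes and homozygotes at the marker.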
We thank Evelyn Harvey for her contribution to initial ATR experiments in the 5q31 region. This work was funded by a Wellcome Trust Clinical Research Training Fellowship (to J.T.F.) and an MRC Program Grant (to D.P.K.).
3 Present address: Kennedy Institute of Rheumatology, Imperial College, London, UK.
E-mail Julian.forton{at}paediatrics.ox.ac.uk; fax 44-1865-220479. [Supplemental material is available online at www.genome.org.] Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.5663007
Altshuler, D., Brooks, L.D., Chakravarti, A., Collins, F.S., Daly, M.J., and Donnelly, P. 2005. A haplotype map of the human genome. Nature 437: 1299–1320.
Ansel, K.M., Lee, D.U., and Rao, A. 2003. An epigenetic view of helper T cell differentiation. Nat. Immunol. 4: 616–623.
Buckland, P.R. 2004. Allele-specific gene expression differences in humans. Hum. Mol. Genet. 13: R255–R260.
Cheung, V.G., Jen, K.Y., Weber, T., Morley, M., Devlin, J.L., Ewens, K.G., and Spielman, R.S. 2003. Genetics of quantitative variation in human gene expression. Cold Spring Harb. Symp. Quant. Biol. 68: 403–407.
Cheung, V.G., Spielman, R.S., Ewens, K.G., Weber, T.M., Morley, M., and Burdick, J.T. 2005. Mapping determinants of human gene expression by regional and genome-wide association. Nature 437: 1365–1369.
Edgar, R., Domrachev, M., and Lash, A.E. 2002. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 30: 207–210.
Elvidge, G.P., Price, T.S., Glenny, L., and Ragoussis, J. 2005. Development and evaluation of real competitive PCR for high-throughput quantitative applications. Anal. Biochem. 339: 231–241.
Fields, P.E., Lee, G.R., Kim, S.T., Bartsevich, V.V., and Flavell, R.A. 2004. Th2-specific chromatin remodeling and enhancer activity in the Th2 cytokine locus control region. Immunity 21: 865–876.
Forton, J.T. and Kwiatkowski, D. 2006. Searching for the regulators of gene expression. Bioessays 28: 1–5.
Goldsborough, A.S. and Kornberg, T.B. 1994. Allele-specific quantification of Drosophila engrailed and invected transcripts. Proc. Natl. Acad. Sci. 91: 12696–12700.
Hacking, D., Knight, J.C., Rockett, K., Brown, H., Frampton, J., Kwiatkowski, D.P., Hull, J., and Udalova, I.A. 2004. Increased in vivo transcription of an IL-8 haplotype associated with respiratory syncytial virus disease-susceptibility. Genes Immun. 5: 274–282.
Knight, J.C., Keating, B.J., Rockett, K.A., and Kwiatkowski, D.P. 2003. In vivo characterization of regulatory polymorphisms by allele-specific quantification of RNA polymerase loading. Nat. Genet. 33: 469–475.
Lettice, L.A., Horikoshi, T., Heaney, S.J., van Baren, M.J., van der Linde, H.C., Breedveld, G.J., Joosse, M., Akarsu, N., Oostra, B.A., and Endo, N., et al. 2002. Disruption of a long-range cis-acting regulator for Shh causes preaxial polydactyly. Proc. Natl. Acad. Sci. 99: 7548–7553.
Marquet, S., Abel, L., Hillaire, D., Dessein, H., Kalil, J., Feingold, J., Weissenbach, J., and Dessein, A.J. 1996. Genetic localization of a locus controlling the intensity of infection by Schistosoma mansoni on chromosome 5q31–q33. Nat. Genet. 14: 181–184.
Monks, S.A., Leonardson, A., Zhu, H., Cundiff, P., Pietrusiak, P., Edwards, S., Phillips, J.W., Sachs, A., and Schadt, E.E. 2004. Genetic inheritance of gene expression in human cell lines. Am. J. Hum. Genet. 75: 1094–1105.
Pastinen, T. and Hudson, T.J. 2004. Cis-acting regulatory variation in the human genome. Science 306: 647–650.
Pastinen, T., Sladek, R., Gurd, S., Sammak, A., Ge, B., Lepage, P., Lavergne, K., Villeneuve, A., Gaudin, T., and Brandstrom, H., et al. 2004. A survey of genetic and epigenetic variation affecting human gene expression. Physiol. Genomics 16: 184–193.
Pastinen, T., Ge, B., Gurd, S., Gaudin, T., Dore, C., Lemire, M., Lepage, P., Harmsen, E., and Hudson, T.J. 2005. Mapping common regulatory variants to human haplotypes. Hum. Mol. Genet. 14: 3963–3971.
Rihet, P., Traore, Y., Abel, L., Aucan, C., Traore-Leroux, T., and Fumoux, F. 1998. Malaria in humans: Plasmodium falciparum blood infection levels are linked to chromosome 5q31–q33. Am. J. Hum. Genet. 63: 498–505.
Rioux, J.D., Daly, M.J., Silverberg, M.S., Lindblad, K., Steinhart, H., Cohen, Z., Delmonte, T., Kocher, K., Miller, K., and Guschwan, S., et al. 2001. Genetic variation in the 5q31 cytokine gene cluster confers susceptibility to Crohn disease. Nat. Genet. 29: 223–228.
Spilianakis, C.G. and Flavell, R.A. 2004. Long-range intrachromosomal interactions in the T helper type 2 cytokine locus. Nat. Immunol. 5: 1017–1027.
Stephens, M., Smith, N.J., and Donnelly, P. 2001. A new statistical method for haplotype reconstruction from population data. Am. J. Hum. Genet. 68: 978–989.
Stranger, B.E., Forrest, M.S., Clark, A.G., Minichiello, M.J., Deutsch, S., Lyle, R., Hunt, S., Kahl, B., Antonarakis, S.E., and Tavare, S., et al. 2005. Genome-wide associations of gene expression variation in humans. PLoS Genet. 1: e78.
West, A.G. and Fraser, P. 2005. Remote control of gene transcription. Hum. Mol. Genet. 14: R101–R111.
Received June 19, 2006; accepted in revised format October 18, 2006.