Genome Res. 17:1210-1218, 2007 ©2007 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/07 $5.00
Methods
Heritability of alternative splicing in the human genome
1 Department of Human Genetics, McGill University, Montréal, Québec, H3A 1A4, Canada; 2 McGill University and Génome Québec Innovation Centre, Montréal, Québec, H3A 1A4, Canada; 3 Affymetrix Inc., Santa Clara, California 95051, USA; 4 Ontario Institute for Cancer Research, Toronto, Ontario M5G 1L7, Canada
Alternative pre-mRNA splicing increases proteomic diversity and provides a potential mechanism underlying both phenotypic diversity and susceptibility to genetic disorders in human populations. To investigate variation in splicing among humans on a genome-wide scale, we use a comprehensive exon-targeted microarray to examine alternative splicing in lymphoblastoid cell lines (LCLs) derived from the CEPH HapMap population. We identify transcripts with sequence-verified exon skipping, intron retention, and cryptic splice site usage that differ between individuals. A number of novel alternative splicing events with no previous annotation in either the RefSeq or EST databases were identified, indicating that we are able to discover splicing events de novo. Using family-based linkage analysis, we demonstrate Mendelian inheritance and segregation of specific splice isoforms with regulatory haplotypes for three genes: OAS1, CAST, and CRTAP. Allelic association was further used to identify individual SNPs or regulatory haplotype blocks linked to the alternative splicing event, taking advantage of the high-resolution genotype information available for the CEPH HapMap population. In one candidate, we identified a regulatory polymorphism that disrupts the 5' splice site of an exon in the CAST gene, resulting in exclusion of that exon in the mutant allele. This report illustrates that our approach can detect both annotated and novel alternatively spliced variants, and that such variation among individuals is heritable and genetically controlled.
The human genome is estimated to contain 20,000–25,000 genes, and recent studies suggest that 50%–75% of multi-exon genes undergo alternative splicing (AS), generating multiple mRNA isoforms and greatly increasing human proteomic diversity (Lander et al. 2001
Recent advances in microarray technology hold great promise for the genome-wide detection of AS events (Lee and Roy 2004
We show that (1) the Exon Array is able to detect AS with a sensitivity comparable to that of other microarray methods, and (2) we can identify quantitative and qualitative variation in splicing among individuals. Preliminary analysis estimates that up to 5% of all RefSeq exons are differentially spliced between individuals. Our approach for establishing a genetic basis for the variation in splicing uses lymphoblasts derived from individuals of the CEPH population (Cohen et al. 1993
Examination of splicing differences between two CEPH HapMap individuals
We investigated differences in exon-level expression in lymphoblastoid cell lines (LCLs; three biological and five technical replicates, for a total of 15 replicates per individual) from two unrelated individuals from the CEPH HapMap population (GM12750 and GM12751). We defined the splicing index (SI) as the expression level of a given probe set (representing one exon) divided by the expression of the corresponding meta-probe set (representing the gene), to control for differences in gene expression levels between samples (Clark et al. 2002
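The splicing index definition can be illustrated with a minimal sketch. It assumes that probe set and meta-probe set signals have already been summarized and normalized; the function names and values are hypothetical, and the log base 2 is an assumption (the Methods say only that SI values were log-transformed before testing). The example shows why dividing by the gene-level signal matters: a gene expressed twice as highly in one sample gives the same SI if the exon's share of the signal is unchanged.

```python
import math


def splicing_index(probe_set_signal: float, meta_probe_set_signal: float) -> float:
    """Splicing index: exon-level (probe set) signal / gene-level (meta-probe set) signal.

    Dividing by the gene-level signal removes differences in overall gene
    expression between samples, so a change in SI reflects a change in the
    relative inclusion of the exon.
    """
    if probe_set_signal <= 0 or meta_probe_set_signal <= 0:
        raise ValueError("signals must be positive")
    return probe_set_signal / meta_probe_set_signal


def log2_si(probe_set_signal: float, meta_probe_set_signal: float) -> float:
    # Log base 2 is an assumption; the text only says "log-transformed".
    return math.log2(splicing_index(probe_set_signal, meta_probe_set_signal))


# Hypothetical signals: the gene is expressed 2x higher in sample B, but the
# exon signal scales with it, so SI is unchanged.
si_a = splicing_index(200.0, 400.0)
si_b = splicing_index(400.0, 800.0)
print(si_a, si_b)               # 0.5 0.5
print(log2_si(200.0, 400.0))    # -1.0
```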
The array contains sequences from two main sources: high confidence mRNAs from RefSeq and GenBank databases and ESTs from dbEST, and a lower confidence set of speculative gene structures predicted using software such as GENSCAN (Burge and Karlin 1997
One potential issue with using microarrays, particularly in a study of splicing differences between individuals, is that polymorphisms within the probe sequences may affect binding affinities. Single nucleotide polymorphisms (SNPs) are very common genetic variations, occurring at a frequency of roughly one per 1000 bp in the human genome (Sachidanandam et al. 2001
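A back-of-the-envelope sketch gives the scale of this issue. If SNPs occurred independently at a uniform rate of one per 1000 bp (the figure quoted above), the probability that a probe of length L contains at least one SNP would be 1 - (1 - 1/1000)^L. The 25-nt probe length and four probes per probe set used below are illustrative assumptions typical of short-oligonucleotide arrays, not values reported in this excerpt. The independence model is likewise a simplification, because real SNP density varies along the genome.

```python
def p_probe_has_snp(probe_len: int, snp_rate: float = 1 / 1000) -> float:
    """P(a probe overlaps >= 1 SNP), assuming independent SNPs at a uniform per-base rate."""
    return 1.0 - (1.0 - snp_rate) ** probe_len


def p_probe_set_has_snp(n_probes: int, probe_len: int, snp_rate: float = 1 / 1000) -> float:
    """P(>= 1 probe in a probe set overlaps a SNP), treating probes as non-overlapping."""
    return 1.0 - (1.0 - p_probe_has_snp(probe_len, snp_rate)) ** n_probes


# Assumed 25-nt probes and 4 probes per probe set (illustrative values only).
print(round(p_probe_has_snp(25), 4))          # 0.0247
print(round(p_probe_set_has_snp(4, 25), 4))   # 0.0952
```

Under these assumptions, roughly one probe set in ten would carry at least one SNP somewhere in its probes. This suggests that probe polymorphisms need to be considered, although a SNP affects the SI only if it is polymorphic between the individuals being compared.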
After summarizing probe set scores,
We applied additional biological and statistical criteria to the data set (see Methods), reducing the number of candidate probe sets to 1028. From this list, we tested a random selection of probe sets ranging from the highest significance level to those near the FDR cutoff. A subset of 20 candidates was subjected to validation by reverse transcriptase–polymerase chain reaction (RT-PCR) using a pair of primers in two distinct exons flanking a third exon containing the predicted probe set. The presence of alternative isoforms for nine transcripts was confirmed by RT-PCR (Table 1; Supplemental Fig. 1), a 45% validation rate (nine of 20). Note, however, that our study evaluates the ability of this microarray technology to identify AS events de novo in genetically diverse populations, so this rate includes candidates with no prior annotation. Restricting our candidates to those showing EST and cDNA evidence of AS in sequence databases reduces the number of cases from 20 to 12 and increases our success rate to 58% (seven of 12). This is similar to the rate observed in a genome-wide junction array study (73/153 = 48%) (Johnson et al. 2003
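This excerpt does not state which false discovery rate (FDR) procedure produced the cutoff mentioned above; the reference list cites both Benjamini and Hochberg (1995) and Storey and Tibshirani (2003). The following minimal sketch *assumes* the Benjamini-Hochberg step-up procedure and shows how adjusted P-values, and a pass/fail flag at a chosen FDR level, can be computed from per-probe-set P-values. The function names, the example P-values, and the 0.05 level are hypothetical.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values, returned in input order.

    For the i-th smallest of m P-values, the adjusted value is
    min over j >= i of (m / j) * p_(j), capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])  # ascending P-values
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def passes_fdr(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Flag probe sets whose adjusted P-value is <= alpha (alpha is hypothetical)."""
    return [q <= alpha for q in benjamini_hochberg(p_values)]


pvals = [0.01, 0.04, 0.03, 0.2]
print(benjamini_hochberg(pvals))  # approx [0.04, 0.0533, 0.0533, 0.2]
print(passes_fdr(pvals))          # [True, False, False, False]
```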
Analysis of validated AS events
Based on EST and RefSeq evidence, seven of the nine probe sets with confirmed AS are predicted to represent exon-skipping events; the exceptions are those in the OAS1 and SFRS5 genes. Two OAS1 splice variants (RefSeq accession nos. NM_016816 and NM_002534) are predicted to encode isoforms that differ by alternative 3' splice site (ss) usage in the last coding exon. The probe set identified in the SFRS5 gene is located within the intron between exons 4 and 5 and represents an intron-retention event. In total, seven of the nine probe sets identified in this study have annotated evidence of AS in the EST and RefSeq databases. Probe sets corresponding to exons from the PPFIA1 and SIDT1 genes show no previous evidence of AS, demonstrating that the array can detect novel splicing events. In three (CAST, PPFIA1, OAS1) of the four validated splicing events with the largest fold change in SI between individuals, one isoform clearly predominates in one individual and the alternative variant predominates in the other. Most candidates with smaller fold changes show both splice variants in each individual. From a biological perspective, the presence or absence of one of two splice variants between individuals is more likely to have a functional consequence than cases where both variants are expressed in all individuals with subtle differences in their relative ratios. Loss of one variant, without compensating expression of the alternative isoform, may have drastic downstream effects. However, until all candidate probe sets have been validated, we cannot estimate how many of these "all-or-none" splicing events occur relative to cases in which both isoforms are observed in each individual.
In one of our candidate genes, sequence analysis of the RT-PCR products identified a variant that uses a cryptic splice site within the predicted exon: two OAS1 transcripts show alternative 3' ss usage in the predicted last exon of the gene, resulting in different stop codons and a longer 3' UTR in one transcript. In the future, sequence analysis of all validated probe sets will be necessary to determine cryptic splice site usage accurately, especially for sites close to the annotated splice site, where the size difference may be beyond the resolution of standard gel electrophoresis. The available EST and mRNA evidence of AS in most of our candidate genes supports our array-based discovery of known alternatively spliced transcripts. More importantly, the identification of new PPFIA1 and SIDT1 splice variants gives us confidence that we can discover novel AS events and expand the catalog of the human transcriptome.
Association of splicing to cis-regulatory haplotypes
The association between alternatively spliced isoforms and genetic variation was examined further by testing our nine candidates on a larger panel of 60 unrelated HapMap CEU individuals. In many cases, both splice variants are expressed in different ratios in various individuals, but the RT-PCR approach that was used here was not sensitive enough to quantify the relative isoform levels and establish a statistical association with a regulatory haplotype. Other methods based on the use of fluorescent dyes such as TaqMan PCR (Gibson et al. 1996
The most interesting example of allelic association was identified in the CAST gene, which encodes calpastatin, a calpain protease inhibitor. There are at least 11 known isoforms of calpastatin, all differing in their N-terminal regions (Fig. 4B) (Lee et al. 1992
We also examined the remaining eight AS events for functional domains encoded within the respective exons and for putative cis-acting SNPs that may control the splicing patterns. We did not identify any domains in these exons except for a putative transmembrane domain within the HHAT exon. In most cases, the closest SNPs polymorphic between individuals GM12750 and GM12751 were located in the 5' or 3' flanking introns, at considerable distances (>100 bp) from the splice site. We identified SNPs within or close to (<100 bp) the putative AS exon for the SIDT1 and OAS1 genes, and within the retained SFRS5 intron. SNP rs2271494 is located 25 bp upstream of the SIDT1 exon, within the polypyrimidine tract. Mutations within this region may alter binding of the large subunit of the U2 small nuclear ribonucleoprotein particle (snRNP) auxiliary factor, U2AF, to this motif (Singh et al. 1995
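As a crude illustration of why a substitution inside the polypyrimidine tract could matter, the following sketch measures the pyrimidine (C/T) content of a window upstream of a 3' splice site before and after a single-base change. This is only a proxy: it does not model U2AF binding. Both the 20-nt sequence and the T-to-A substitution are hypothetical, not the actual SIDT1 sequence or the rs2271494 alleles.

```python
def pyrimidine_fraction(seq: str) -> float:
    """Fraction of pyrimidines (C or T) in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "CT" for base in seq) / len(seq)


def substitute(seq: str, pos: int, alt: str) -> str:
    """Return seq with the base at 0-based position pos replaced by alt."""
    if not 0 <= pos < len(seq):
        raise IndexError("position outside sequence")
    return seq[:pos] + alt + seq[pos + 1:]


# Hypothetical 20-nt intronic window ending in the AG of a 3' splice site.
# This is NOT the real SIDT1 sequence, and the T->A change is invented.
ref_tract = "TTCTTCCTTCTTTCCCTCAG"
alt_tract = substitute(ref_tract, 10, "A")

print(pyrimidine_fraction(ref_tract))  # 0.9
print(pyrimidine_fraction(alt_tract))  # 0.85
```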
Identifying AS events is important for understanding the diversity and complexity of the human genome, and here we report the use of a comprehensive exon-tiling array to discover such events between individuals. The same microarray design has also recently been used for a comprehensive analysis of tissue-specific differences in splicing (Gardina et al. 2006
Exactly how much differential splicing occurs between any two individuals is still unknown. After factoring in our current validation rate, we estimate that up to 2.5% of all RefSeq exons expressed in lymphoblasts may be differentially spliced between the two samples tested, although a more accurate determination of the number of differential splicing events will require a proper ROC-type analysis. Moreover, because this study examines only lymphoblasts, the estimate may differ in other tissues. Alternative splice variants of the same gene can be expressed in multiple cell types to exert different functional and regulatory effects, which may also be individual specific. Neuronal tissues are known to have high levels of splicing (Yeo et al. 2004
The large amount of genotype information available for defined populations from the HapMap project provides a tremendous resource for associating known SNPs or regions of linkage disequilibrium with genetic differences such as copy number variation, allelic imbalance, and AS, or with phenotypic traits that may confer an increased risk of disease. Here, we have shown that this approach can identify one or more SNPs associated with some of the splicing events identified. Further examination of the polymorphisms and their location relative to the spliced exon can indicate whether a SNP is part of a larger cis-regulatory haplotype or is in fact the causative variant disrupting a splice site consensus sequence, an exonic splicing enhancer (ESE) or silencer (ESS), an intronic splicing enhancer (ISE) or silencer (ISS), or another splice regulatory motif such as the branch point or the polypyrimidine tract. Establishing a definitive causal effect of a SNP will require further experimental validation in vitro, such as monitoring splicing activity in cells using splice reporter constructs (Mayeda and Krainer 1999
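A first-pass check on whether a SNP disrupts a splice site consensus can be sketched by counting how many positions of a 9-nt 5' splice site match the textbook consensus MAG|GTRAGT (exon positions -3 to -1, intron positions +1 to +6; M = A/C, R = A/G), and whether the near-invariant GT at intron positions +1/+2 is intact. This is a deliberately simple stand-in for position weight matrix or maximum-entropy scoring, not the method used in this study. For example, a GC donor is tolerated at a small minority of human 5' splice sites, which a plain match count cannot capture. The two alleles below are hypothetical, not the actual CAST sequence.

```python
# IUPAC codes needed for the 5' splice site consensus: M = A/C, R = A/G.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC", "R": "AG"}

# Textbook 5' splice site consensus (DNA sense strand): exon positions
# -3..-1 (MAG) followed by intron positions +1..+6 (GTRAGT).
CONSENSUS_5SS = "MAGGTRAGT"


def consensus_matches(site: str, consensus: str = CONSENSUS_5SS) -> int:
    """Count positions of a 9-nt site that agree with the consensus."""
    site = site.upper()
    if len(site) != len(consensus):
        raise ValueError(f"expected {len(consensus)} nt, got {len(site)}")
    return sum(base in IUPAC[code] for base, code in zip(site, consensus))


def has_gt_donor(site: str) -> bool:
    """True if intron positions +1/+2 (indices 3 and 4) are the canonical GT."""
    return site[3:5].upper() == "GT"


# Hypothetical alleles (not the actual CAST sequence): T->C at intron +2.
ref_site = "CAGGTAAGT"
alt_site = "CAGGCAAGT"
for name, site in [("ref", ref_site), ("alt", alt_site)]:
    print(name, consensus_matches(site), has_gt_donor(site))
# ref 9 True
# alt 8 False
```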
Although we identified a candidate exon in the CAST gene showing genetic association with expression level changes, we do not know how often such effects occur in human populations on a genome-wide scale. One way to assess how common inherited variation in splicing is would be to perform a whole-genome association study with more individuals from the HapMap population, using the SI scores as a quantitative trait. This is very similar to recent whole-genome association studies suggesting that common genetic variation explains much of the gene expression differences among individuals (Stranger et al. 2005
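Treating SI as a quantitative trait can be sketched for a single SNP under an additive genetic model, where genotype is coded as allele dosage (0, 1, or 2 copies). The association is then summarized by the least-squares slope and the Pearson correlation between dosage and log SI. This is an illustrative, swapped-in test, not the procedure used in this study. A real genome-wide scan would also need P-values, empirical thresholds such as permutation-based ones (Churchill and Doerge 1994), and correction for multiple testing. All names and data are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def additive_association(dosage: Sequence[int], log_si: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and Pearson r of log SI on allele dosage.

    dosage: copies of one allele per individual (0, 1 or 2), i.e. an
            additive genetic model.
    log_si: log-transformed splicing index for the same individuals.
    """
    if len(dosage) != len(log_si) or len(dosage) < 3:
        raise ValueError("need >= 3 paired observations")
    mx, my = mean(dosage), mean(log_si)
    sxx = sum((x - mx) ** 2 for x in dosage)
    syy = sum((y - my) ** 2 for y in log_si)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, log_si))
    if sxx == 0 or syy == 0:
        raise ValueError("genotype or phenotype has no variation")
    slope = sxy / sxx             # change in log SI per extra allele copy
    r = sxy / (sxx * syy) ** 0.5  # Pearson correlation coefficient
    return slope, r


# Hypothetical six individuals: each extra allele copy lowers log SI,
# as would be expected if the allele weakens inclusion of the exon.
dosage = [0, 0, 1, 1, 2, 2]
log_si = [0.0, 0.2, -0.9, -1.1, -2.0, -2.2]
slope, r = additive_association(dosage, log_si)
print(round(slope, 3), round(r, 3))
```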
The identification of SNPs that affect splicing in specific individuals in a population is an important issue to address when considering its relevance to possible resistance or susceptibility to disease. An estimated 20%–30% of disease-causing mutations are believed to affect pre-mRNA splicing (Faustino and Cooper 2003
Cell line preparation
RNA samples were obtained from 74 Epstein-Barr virus-transformed LCLs belonging to the CEPH (Centre d'Étude du Polymorphisme Humain) reference individuals from the state of Utah in the United States (CEU). For this study, we used DNA samples from 60 unrelated individuals who have been genotyped for approximately four million SNPs by the International HapMap Project (Altshuler et al. 2005
Affymetrix exon arrays
For the initial study, three separate passages of two unrelated individuals from the CEPH 1444 pedigree, GM12750 and GM12751, were used, with five technical replicates of each passage, for a total of 15 arrays hybridized for each sample. Multiple replicates were used to assess the relative contributions of biological and technical noise to the observed exon and transcript levels. In particular, because this array uses probe cells with a feature size only one-quarter of that in previous expression array designs, we aimed to determine whether they showed greater technical variability or higher background noise, and to identify the minimum number of biological and technical replicates required for an acceptable signal-to-noise ratio. For the linkage studies of the CEPH 1444 pedigree, three passages for each of GM12739, GM12740, GM12750, and GM12751 were used, along with single replicates for the remaining 10 individuals.
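The replicate design described above uses three passages, each hybridized five times, which is a balanced nested layout. For any single probe set, biological (between-passage) and technical (within-passage) variance can therefore be separated with a standard one-way random-effects ANOVA. The following sketch illustrates that idea under the assumption of a balanced design; it is a swapped-in technique, since the study itself used PCA across all chips (see the next section). The function name and data are hypothetical.

```python
from statistics import mean
from typing import List, Tuple


def variance_components(groups: List[List[float]]) -> Tuple[float, float]:
    """Method-of-moments variance components for a balanced one-way design.

    groups: one list per biological replicate (cell passage), each holding
    the technical-replicate values (e.g., log SI of one probe set).
    Returns (biological_variance, technical_variance).
    """
    a = len(groups)       # number of biological replicates (passages)
    n = len(groups[0])    # technical replicates per passage
    if a < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need a balanced design with >= 2 levels each")
    grand = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    ss_between = n * sum((m - grand) ** 2 for m in group_means)
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ms_between = ss_between / (a - 1)
    ms_within = ss_within / (a * (n - 1))
    # E[MS_between] = technical + n * biological; truncate negatives at 0.
    biological = max(0.0, (ms_between - ms_within) / n)
    return biological, ms_within


# Hypothetical log SI values: 3 passages x 5 technical replicates.
passages = [
    [0.9, 1.0, 1.1, 1.0, 1.0],
    [1.9, 2.0, 2.1, 2.0, 2.0],
    [2.9, 3.0, 3.1, 3.0, 3.0],
]
bio, tech = variance_components(passages)
print(round(bio, 4), round(tech, 4))  # 0.999 0.005
```

Here the between-passage variance dominates. On real data, the same comparison indicates whether adding biological or technical replicates would do more to improve the signal-to-noise ratio.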
Analysis of array hybridization data
Principal component analysis (PCA) was performed on the SI scores from all chips using the Partek Genomics Suite software package (Partek) to partition the variance, averaged over all exons, among sources of variability and to assess the consistency of expression profiles across biological and technical replicates. Comparison of expression data from individuals GM12750 and GM12751 identified three outlier replicates of GM12750 (Fig. 2), which were excluded from all subsequent analyses.
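The text does not describe the PCA settings used in Partek Genomics Suite. As a minimal, library-free sketch of the underlying idea (not Partek's implementation), the following code computes each chip's score on the first principal component by power iteration. Replicates whose scores sit far from those of the other replicates are candidate outliers. In practice, several components and many more probe sets would be examined. The function name, iteration count, and data are hypothetical.

```python
import math
from typing import List


def pc1_scores(matrix: List[List[float]], iters: int = 500) -> List[float]:
    """First principal-component score for each row (chip).

    matrix: rows are chips, columns are probe sets (e.g., log SI values).
    Power iteration on the chip-by-chip Gram matrix of the column-centered
    data; its leading eigenvector u and eigenvalue s^2 give scores s * u.
    The overall sign of the component is arbitrary.
    """
    n, p = len(matrix), len(matrix[0])
    col_means = [sum(row[j] for row in matrix) / n for j in range(p)]
    xc = [[row[j] - col_means[j] for j in range(p)] for row in matrix]
    gram = [[sum(a * b for a, b in zip(xc[i], xc[k])) for k in range(n)]
            for i in range(n)]
    # A constant start vector lies in the null space of the centered
    # Gram matrix, so start from a non-constant one.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [0.0] * n  # no variance at all
        v = [x / norm for x in w]
    # Rayleigh quotient: leading eigenvalue s^2 of the Gram matrix.
    eigval = sum(v[i] * sum(gram[i][k] * v[k] for k in range(n)) for i in range(n))
    s = math.sqrt(max(eigval, 0.0))
    return [s * x for x in v]


# Hypothetical log SI values: three concordant replicates and one outlier.
chips = [
    [1.0, 2.0, 3.0],
    [1.1, 2.0, 2.9],
    [0.9, 2.1, 3.0],
    [3.0, 0.0, 1.0],
]
scores = pc1_scores(chips)
print([round(x, 2) for x in scores])
```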
To analyze splicing differences between the two samples for each probe set, an unpaired Student's t-test was performed on the log-transformed SI values for all remaining replicates of each individual (12 of GM12750 and 15 of GM12751) (R statistical package, version 2.3.0). Probe sets showing significantly different SI scores were ranked by P-value. Linkage analysis tests of SI scores cosegregating with chromosomal regions in the CEPH 1444 family were carried out using MERLIN (version 1.0.1) with default settings (Abecasis et al. 2002
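The per-probe-set test can be sketched in plain Python. In R, `t.test()` applies the Welch correction by default and gives the pooled-variance (Student's) form only with `var.equal = TRUE`; the text does not say which setting was used, so this sketch assumes the pooled form named in the text. Because every probe set is compared with the same replicates (12 vs. 15 arrays, so df = 25), the two-sided P-value decreases monotonically with |t|. Ranking by |t| therefore gives the same order as ranking by P-value, and the P-values themselves can come from any t distribution routine. Function names and data are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Dict, List, Sequence, Tuple


def students_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int]:
    """Unpaired two-sample Student's t statistic with pooled variance.

    x, y: log-transformed SI values for one probe set in the two individuals.
    Returns (t, degrees_of_freedom).
    """
    nx, ny = len(x), len(y)
    if nx < 2 or ny < 2:
        raise ValueError("each group needs at least two replicates")
    df = nx + ny - 2
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    if se == 0.0:
        raise ValueError("zero variance in both groups")
    return (mean(x) - mean(y)) / se, df


def rank_probe_sets(data: Dict[str, Tuple[Sequence[float], Sequence[float]]]) -> List[str]:
    """Order probe sets from largest to smallest |t| (equivalently, by P-value
    when all probe sets share the same degrees of freedom)."""
    abs_t = {name: abs(students_t(x, y)[0]) for name, (x, y) in data.items()}
    return sorted(abs_t, key=abs_t.get, reverse=True)


# Hypothetical log2 SI values for two probe sets (3 replicates per individual).
t, df = students_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(round(t, 3), df)  # -3.674 4
print(rank_probe_sets({
    "exon_a": ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]),
    "exon_b": ([1.0, 2.0, 3.0], [1.5, 2.5, 3.5]),
}))  # ['exon_a', 'exon_b']
```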
Differentially spliced probe sets were filtered using a number of criteria, including: (1) detection above background (DABG P < 0.05) for both the probe set and the meta-probe set to which it belongs; (2) a minimum normalized meta-probe set intensity score of 50; (3) a transcript defined by at least three exons; and (4) a length of the exon corresponding to the probe set that is divisible by three. This last criterion was added to ensure that changes resulting from exon inclusion or exclusion would preserve the reading frame, which has been observed for a high percentage of conserved and species-specific alternative exons (Magen and Ast 2005
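The four filtering criteria listed above can be expressed as a single predicate. This minimal sketch assumes that DABG P-values, normalized meta-probe set scores, exon counts, and exon lengths have already been computed upstream. The record type and field names are hypothetical, and because the text says "including", the actual pipeline may have applied further filters.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProbeSetSummary:
    """Hypothetical per-probe-set record; field names are illustrative."""
    probe_set_id: str
    dabg_p: float               # DABG P-value of the probe set (exon)
    meta_dabg_p: float          # DABG P-value of its meta-probe set (gene)
    meta_signal: float          # normalized meta-probe set intensity score
    exons_in_transcript: int    # number of exons defining the transcript
    exon_length_nt: int         # length of the exon targeted by the probe set


def passes_filters(s: ProbeSetSummary) -> bool:
    """Apply the four filtering criteria listed in the text."""
    detected = s.dabg_p < 0.05 and s.meta_dabg_p < 0.05   # (1) above background
    expressed = s.meta_signal >= 50                        # (2) gene-level intensity
    multi_exon = s.exons_in_transcript >= 3                # (3) at least 3 exons
    in_frame = s.exon_length_nt % 3 == 0                   # (4) length divisible by 3
    return detected and expressed and multi_exon and in_frame


def filter_candidates(records: List[ProbeSetSummary]) -> List[str]:
    return [r.probe_set_id for r in records if passes_filters(r)]


candidates = [
    ProbeSetSummary("ps1", 0.001, 0.001, 120.0, 8, 96),   # passes all four
    ProbeSetSummary("ps2", 0.001, 0.001, 120.0, 8, 100),  # 100 nt: frameshift
    ProbeSetSummary("ps3", 0.200, 0.001, 120.0, 8, 96),   # exon not detected
    ProbeSetSummary("ps4", 0.001, 0.001, 30.0, 8, 96),    # gene signal < 50
]
print(filter_candidates(candidates))  # ['ps1']
```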
RT-PCR and sequence analysis
We thank Eef Harmsen for helpful discussions. This work is supported by Genome Canada and Genome Québec. T.J.H. is the recipient of a Clinician-Scientist Award in Translational Research by the Burroughs Wellcome Fund and an Investigator Award from CIHR. J.M. is a Canada Research Chair holder.
5 Corresponding author.
E-mail jacek.majewski@mcgill.ca; fax (514) 398-1790. [The microarray data from this study have been submitted to GEO under accession no. GSE7952.] Article is online at http://www.genome.org/cgi/doi/10.1101/gr.6281007
Abecasis, G.R., Cherny, S.S., Cookson, W.O., and Cardon, L.R. 2002. Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat. Genet. 30: 97–101.
Altshuler, D., Brooks, L.D., Chakravarti, A., Collins, F.S., Daly, M.J., Donnelly, P., and the International HapMap Consortium. 2005. A haplotype map of the human genome. Nature 437: 1299–1320.
Ben-Ari, S., Toiber, D., Sas, A.S., Soreq, H., and Ben-Shaul, Y. 2006. Modulated splicing-associated gene expression in P19 cells expressing distinct acetylcholinesterase splice variants. J. Neurochem. 97 (Suppl 1): 24–34.
Benjamini, Y. and Hochberg, Y. 1995. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B Methodol. 57: 289–300.
Black, D.L. and Graveley, B.R. 2006. Splicing bioinformatics to biology. Genome Biol. 7: 317.
Bonnevie-Nielsen, V., Field, L.L., Lu, S., Zheng, D.J., Li, M., Martensen, P.M., Nielsen, T.B., Beck-Nielsen, H., Lau, Y.L., and Pociot, F. 2005. Variation in antiviral 2',5'-oligoadenylate synthetase (2'5'AS) enzyme activity is controlled by a single-nucleotide polymorphism at a splice-acceptor site in the OAS1 gene. Am. J. Hum. Genet. 76: 623–633.
Burge, C. and Karlin, S. 1997. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268: 78–94.
Churchill, G.A. and Doerge, R.W. 1994. Empirical threshold values for quantitative trait mapping. Genetics 138: 963–971.
Clark, T.A., Sugnet, C.W., and Ares, M. 2002. Genomewide analysis of mRNA processing in yeast using splicing-specific microarrays. Science 296: 907–910.
Clark, T.A., Schweitzer, A.C., Chen, T.X., Staples, M.K., Lu, G., Wang, H., Williams, A., and Blume, J.E. 2007. Discovery of tissue-specific exons using comprehensive human exon microarrays. Genome Biol. 8: R64.
Cohen, D., Chumakov, I., and Weissenbach, J. 1993. A first-generation physical map of the human genome. Nature 366: 698–701.
Faustino, N.A. and Cooper, T.A. 2003. Pre-mRNA splicing and human disease. Genes & Dev. 17: 419–437.
Field, L.L., Bonnevie-Nielsen, V., Pociot, F., Lu, S., Nielsen, T.B., and Beck-Nielsen, H. 2005. OAS1 splice site polymorphism controlling antiviral enzyme activity influences susceptibility to type 1 diabetes. Diabetes 54: 1588–1591.
Frey, B.J., Mohammad, N., Morris, Q.D., Zhang, W., Robinson, M.D., Mnaimneh, S., Chang, R., Pan, Q., Sat, E., Rossant, J., et al. 2005. Genome-wide analysis of mouse transcripts using exon microarrays and factor graphs. Nat. Genet. 37: 991–996.
Gardina, P.J., Clark, T.A., Shimada, B., Staples, M.K., Yang, Q., Veitch, J., Schweitzer, A., Awad, T., Sugnet, C., Dee, S., et al. 2006. Alternative splicing and differential gene expression in colon cancer detected by a whole genome exon array. BMC Genomics 7: 325.
Gibson, U.E., Heid, C.A., and Williams, P.M. 1996. A novel method for real time quantitative RT-PCR. Genome Res. 6: 995–1001.
Johnson, J.M., Castle, J., Garrett-Engele, P., Kan, Z., Loerch, P.M., Armour, C.D., Santos, R., Schadt, E.E., Stoughton, R., and Shoemaker, D.D. 2003. Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays. Science 302: 2141–2144.
Korf, I., Flicek, P., Duan, D., and Brent, M.R. 2001. Integrating genomic homology into gene structure prediction. Bioinformatics 17: S140–S148.
Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., et al. 2001. Initial sequencing and analysis of the human genome. Nature 409: 860–921.
Le, K., Mitsouras, K., Roy, M., Wang, Q., Xu, Q., Nelson, S.F., and Lee, C. 2004. Detecting tissue-specific regulation of alternative splicing as a qualitative change in microarray data. Nucleic Acids Res. 32: e180. doi: 10.1093/nar/gnh173.
Lee, C. and Roy, M. 2004. Analysis of alternative splicing with microarrays: successes and challenges. Genome Biol. 5: 231. doi: 10.1186/gb-2004-5-7-231.
Lee, W.J., Ma, H., Takano, E., Yang, H.Q., Hatanaka, M., and Maki, M. 1992. Molecular diversity in amino-terminal domains of human calpastatin by exon skipping. J. Biol. Chem. 267: 8437–8442.
Magen, A. and Ast, G. 2005. The importance of being divisible by three in alternative splicing. Nucleic Acids Res. 33: 5574–5582.
Mayeda, A. and Krainer, A.R. 1999. Preparation of HeLa cell nuclear and cytosolic S100 extracts for in vitro splicing. Methods Mol. Biol. 118: 309–314.
Modrek, B., Resch, A., Grasso, C., and Lee, C. 2001. Genome-wide detection of alternative splicing in expressed sequences of human genes. Nucleic Acids Res. 29: 2850–2859.
Nissim-Rafinia, M. and Kerem, B. 2005. The splicing machinery is a genetic modifier of disease severity. Trends Genet. 21: 480–483.
Rozen, S. and Skaletsky, H. 2000. Primer3 on the WWW for general users and for biologist programmers. Methods Mol. Biol. 132: 365–386.
Sachidanandam, R., Weissman, D., Schmidt, S.C., Kakol, J.M., Stein, L.D., Marth, G., Sherry, S., Mullikin, J.C., Mortimore, B.J., Willey, D.L., et al. 2001. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409: 928–933.
Siepel, A. and Haussler, D. 2004. Computational identification of evolutionarily conserved exons. In Proceedings of the eighth annual international conference on Research in computational molecular biology. ACM Press, San Diego, CA.
Singh, R., Valcarcel, J., and Green, M.R. 1995. Distinct binding specificities and functions of higher eukaryotic polypyrimidine tract-binding proteins. Science 268: 1173–1176.
Srinivasan, K., Shiue, L., Hayes, J.D., Centers, R., Fitzwater, S., Loewen, R., Edmondson, L.R., Bryant, J., Smith, M., Rommelfanger, C., et al. 2005. Detection and measurement of alternative splicing using splicing-sensitive microarrays. Methods 37: 345–359.
Storey, J.D. and Tibshirani, R. 2003. Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. 100: 9440–9445.
Stranger, B.E., Forrest, M.S., Clark, A.G., Minichiello, M.J., Deutsch, S., Lyle, R., Hunt, S., Kahl, B., Antonarakis, S.E., Tavare, S., et al. 2005. Genome-wide associations of gene expression variation in humans. PLoS Genet. 1: e78. doi: 10.1371/journal.pgen.0010078.
Stranger, B.E., Forrest, M.S., Dunning, M., Ingle, C.E., Beazley, C., Thorne, N., Redon, R., Bird, C.P., de Grassi, A., Lee, C., et al. 2007. Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science 315: 848–853.
Sugnet, C.W., Srinivasan, K., Clark, T.A., O'Brien, G., Cline, M.S., Wang, H., Williams, A., Kulp, D., Blume, J.E., Haussler, D., et al. 2006. Unusual intron conservation near tissue-regulated exons found by splicing microarrays. PLoS Comput. Biol. 2: e4. doi: 10.1371/journal.pcbi.0020004.
Takano, E., Nosaka, T., Lee, W.J., Nakamura, K., Takahashi, T., Funaki, M., Okada, H., Hatanaka, M., and Maki, M. 1993. Molecular diversity of calpastatin in human erythroid cells. Arch. Biochem. Biophys. 303: 349–354.
Thomas, D.C., Haile, R.W., and Duggan, D. 2005. Recent developments in genomewide association scans: A workshop summary and review. Am. J. Hum. Genet. 77: 337–345.
Ule, J., Ule, A., Spencer, J., Williams, A., Hu, J.S., Cline, M., Wang, H., Clark, T., Fraser, C., Ruggiu, M., et al. 2005. Nova regulates brain-specific splicing to shape the synapse. Nat. Genet. 37: 844–852.
Valverde, D., Riveiro-Alvarez, R., Bernal, S., Jaakson, K., Baiget, M., Navarro, R., and Ayuso, C. 2006. Microarray-based mutation analysis of the ABCA4 gene in Spanish patients with Stargardt disease: Evidence of a prevalent mutated allele. Mol. Vis. 12: 902–908.
Yeo, G., Holste, D., Kreiman, G., and Burge, C.B. 2004. Variation in alternative splicing across human tissues. Genome Biol. 5: R74. doi: 10.1186/gb-2004-5-10-r74.
Zhang, C., Li, H.R., Fan, J.B., Wang-Rodriguez, J., Downs, T., Fu, X.D., and Zhang, M.Q. 2006. Profiling alternatively spliced mRNA isoforms for prostate cancer classification. BMC Bioinformatics 7: 202. doi: 10.1186/1471-2105-7-202.
Received January 15, 2007; accepted in revised form May 31, 2007.