Published online before print
May 24, 2007, 10.1101/gr.6017807 Genome Res. 17:1023-1033, 2007 ©2007 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/07 $5.00
Letter
A comprehensive computational characterization of conserved mammalian intronic sequences reveals conserved motifs associated with constitutive and alternative splicing
Institute of Molecular Biology, University of Oregon, Eugene, Oregon 97403, USA
Orthologous mammalian introns contain many highly conserved sequences. Of these sequences, many are likely to represent protein binding sites that are under strong positive selection. In order to identify conserved protein binding sites that are important for splicing, we analyzed the composition of intronic sequences that are conserved between human and six eutherian mammals. We focused on all completely conserved sequences of seven or more nucleotides located in the regions adjacent to splice-junctions. We found that these conserved intronic sequences are enriched in specific motifs, and that many of these motifs are statistically associated with either alternative or constitutive splicing. In validation of our methods, we identified several motifs that are known to play important roles in alternative splicing. In addition, we identified several novel motifs containing GCT that are abundant and are associated with alternative splicing. Furthermore, we demonstrate that, for some of these motifs, conservation is a strong indicator of potential functionality since conserved instances are associated with alternative splicing while nonconserved instances are not. A surprising outcome of this analysis was the identification of a large number of AT-rich motifs that are strongly associated with constitutive splicing. Many of these appear to be novel and may represent conserved intronic splicing enhancers (ISEs). Together these data show that conservation provides important insights into the identification and possible roles of cis-acting intronic sequences important for alternative and constitutive splicing.
The majority of mammalian mRNAs are interrupted by multiple noncoding intronic sequences that must be removed before translation. A large ribonucleoprotein complex known as the spliceosome carries out the recognition and removal of introns (for reviews, see Burge et al. 1999
In addition to experimental approaches, several computational approaches for identifying exonic and intronic cis-elements have been described (for review, see Zhang et al. 2005
The increasing wealth of genomic sequence information has made possible the development of computational methods that rely upon comparative genomic methods. The rationale for comparative approaches is founded in the well-established observation that informationally important sequences tend to be evolutionarily conserved. Several studies using conservation of sequence as a criterion for identifying potential intronic cis-splicing elements have been published (Yeo et al. 2005
Although alternative splice-junctions often display higher levels of conservation than constitutively spliced junctions (Yeo et al. 2005
We present the results of a comprehensive characterization of the composition of sequences that are conserved among orthologous mammalian introns. Our ultimate goal was to characterize conserved cis-splicing elements. Since these are likely to represent protein binding sites and since many RNA binding proteins recognize short (6-10 nt) patterns, we included all conserved sequences (CSs) that are at least 7 nt in length. It should be noted that this analysis therefore differs from studies of much longer conserved nongenic sequences (known as CNGs) (for reviews, see Dermitzakis et al. 2005 This analysis produced several interesting observations regarding conserved intronic sequences and splicing. (1) We show that CSs are enriched in specific motifs. This demonstrates that many CSs are the result of positive selective pressures on a limited set of putative cis-acting sequences and are not simply due to chance conservation. (2) Many conserved intronic motifs are associated with either alternative or constitutive splicing. This suggests that these motifs play important roles in splicing. (3) As validation for our methods, we found that several of the motifs associated with alternative splicing resemble motifs previously demonstrated to play important roles in regulated splicing. (4) We identified several novel motifs containing GCT that are associated with alternative splicing. (5) We identified a large number of conserved motifs that are associated with constitutive splicing, most of which have not been previously computationally identified. (6) Lastly, we demonstrate that conservation is an important indicator of functionality by showing that conserved instances of some n-mers are highly associated with alternative splicing, but nonconserved instances are not.
Extraction of conserved intronic sequences In order to identify conserved motifs representing putative trans-factor binding sites, we wanted to extract all intronic sequences conserved between human and several closely related mammals. We created a database of U2-dependent intronic sequences based upon the RefSeq annotation (Pruitt et al. 2005 100 to >400,000 bp), and the signals that govern splicing of shorter (<200 bases) introns may differ from those governing splicing of longer introns (Fox-Walsh et al. 2005
We defined a CS to be a contiguous run of at least 7 nt of identity in a multiple sequence alignment between human and six eutherian mammals: chimp, rhesus monkey, mouse, rat, dog, and cow (see Methods). We chose 7 nt as the lower length cutoff since many RNA binding proteins have binding site sizes of 6-10 nt. In order to emphasize the most significant portion of the cis signals and to reduce noise, we chose not to allow any sequence mismatches. For sequence and motif analysis, we extracted and categorized CSs from the donor intronic (DI) and acceptor intronic (AI) regions (see Fig. 2). Details concerning the identification and extraction of CSs are presented in the Methods. Many thousands of introns containing CSs were identified. Specifically, 16,548 introns (11%) contained CSs in the donor side, and 20,342 introns (14%) contained CSs in the acceptor side. Since CSs were extracted from a multiple-sequence alignment, which requires that the relative positions of a sequence be somewhat conserved, it is possible that these numbers underestimate the number of functionally relevant conserved intronic sequences. Nevertheless, we believe this approach produces a conservative sampling that will allow us to identify functionally relevant sequences. The lengths of CSs varied from the minimum of 7 bases to the full length of 100 bases. The actual distributions are shown in Figure 3, A and B.
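The CS definition above (a contiguous run of at least 7 identical alignment columns, with no mismatches allowed) can be illustrated with a minimal Python sketch. This is not the custom software used in the study; the function name `conserved_runs`, the list-of-strings alignment representation, and the rule that gap or ambiguous columns break a run are assumptions made for illustration only.

```python
def conserved_runs(aligned_seqs, min_len=7):
    """Return (start, end, sequence) for every maximal run of columns that
    are identical across all aligned rows; `end` is exclusive.

    Assumption: rows are equal-length strings from one alignment block, and
    any column containing a gap or a non-ACGT symbol interrupts a run.
    """
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("aligned sequences must have equal length")

    runs = []
    run_start = None
    # Iterate one position past the end so a run touching the end is closed.
    for i in range(length + 1):
        identical = False
        if i < length:
            column = {s[i].upper() for s in aligned_seqs}
            identical = len(column) == 1 and column <= set("ACGT")
        if identical:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_len:
                runs.append((run_start, i, aligned_seqs[0][run_start:i].upper()))
            run_start = None
    return runs


# Hypothetical three-species toy alignment (the study used seven species).
toy_alignment = [
    "ACGTGCATGAAT",
    "TCGTGCATGAAC",
    "GCGTGCATGATT",
]
print(conserved_runs(toy_alignment))
```

Raising `min_len` trades sensitivity for specificity in the same way as the 7-nt cutoff discussed above.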
Conserved intronic sequences are found adjacent to both constitutive and alternatively spliced junctions In order to explore the relationships between alternative splicing and CSs, splice-junctions containing a CS were cross-referenced against the alternative splice events annotated in the UCSC ExonWalk database (http://hgdownload.cse.ucsc.edu/goldenPath/hg17/database/). A splice-junction was annotated as alternative if it was involved in either a skipped-exon or alternative adjacent splice event. Using these data, we found that 3% of introns lacking any CS were annotated as being alternative. In contrast, we found that 8% of introns containing a CS were involved in an alternative event. Since the lengths of CSs varied greatly, we wanted to establish the relationship between CS length and degree of alternative splicing. This analysis (Fig. 3C) revealed that the degree of alternative splicing increases with the total length of CS found within the intron. In particular, we observed that intron flanks containing between 7 and 25 nt of CS are twice as likely to be involved in alternative events as introns without CSs, and flanking regions containing >50 nt of CS are eight times as likely to be involved in alternative events. These observations are consistent with earlier studies showing that intronic regions flanking alternatively spliced junctions tend to be highly conserved (Sorek and Ast 2003
To explore the distribution of CSs between alternatively or constitutively spliced junctions, we determined the percentage of either category that contains one or more CSs (Fig. 3D). Consistent with previous studies and with our analysis above, we found that a higher proportion of alternatively spliced junctions contain a CS than constitutively spliced junctions. We did not explore the more complex associations between CSs across exons so we do not know the percentage of exons that have CSs in just one or in both intronic flanks. However, these data demonstrate that, although CSs are enriched in alternatively spliced junctions, the majority of human alternatively spliced junctions do not contain a CS in the immediate vicinity of the alternative junction. This observation is consistent with studies suggesting that the majority of human alternatively spliced events are not conserved within mammals (Sorek et al. 2006 Though intronic CSs are enriched in introns flanking alternatively spliced junctions, the great majority are located within junctions that are constitutively spliced (Fig. 3D). Cis-splicing elements that are involved in constitutive splicing have generally received less attention than those involved in alternative splicing. In order to identify putative cis-splicing elements that may play important roles in alternative and/or constitutive splicing, we wanted to identify n-mers that are enriched in CS sequences.
Conserved intronic sequences are enriched in specific n-mers
We counted n-mers in the conserved donor (DI-CS) and conserved acceptor (AI-CS) samples using a sliding-window with an overlapping word count. In order to determine the n-mers that are enriched within the CS sequences, we had to establish the background probability for random occurrence for each n-mer within the region. This is complicated by the fact that the sequence composition of introns is nonhomogeneous as one moves away from the splice-junction; therefore, the probability of occurrence for an n-mer may vary at each position within the region. To account for this, we implemented a random sampling strategy that incorporates position as a factor. For each CS identified, we also extracted 100 additional analogous (e.g., having the same splice-junction relative starting position and the same length) sequences as the CS from introns randomly chosen from the original data set of all human introns. These sequences made up the random sequence (RS) pool. Background frequencies were calculated using the entire RS sample. Enrichment was determined using a confidence interval for the binomial distribution (Agresti and Coull 1998
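As a rough illustration of the counting and enrichment steps described above, the following Python sketch counts overlapping n-mers with a sliding window and computes an Agresti-Coull interval for a binomial proportion. The exact decision rule used in the study is not given in this passage, so the criterion in `enriched_nmers` (lower interval bound of the CS frequency exceeding the RS background frequency) is an assumption, as are all function names and the default `z = 1.96`.

```python
from collections import Counter
from math import sqrt


def count_nmers(seqs, n):
    """Count every overlapping n-mer using a sliding window of width n."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - n + 1):
            counts[s[i:i + n]] += 1
    return counts


def agresti_coull_interval(successes, trials, z=1.96):
    """Agresti-Coull approximate confidence interval for a proportion."""
    n_adj = trials + z * z
    p_adj = (successes + z * z / 2) / n_adj
    half_width = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


def enriched_nmers(cs_seqs, rs_seqs, n, z=1.96):
    """Return {n-mer: (cs_frequency, rs_frequency)} for n-mers whose CS
    frequency is confidently above background.

    Assumed rule (not taken from the paper): an n-mer is called enriched when
    the lower Agresti-Coull bound of its CS frequency exceeds its frequency
    in the position-matched random sequence (RS) pool.
    """
    cs_counts = count_nmers(cs_seqs, n)
    rs_counts = count_nmers(rs_seqs, n)
    cs_total = sum(cs_counts.values())
    rs_total = sum(rs_counts.values())
    if cs_total == 0 or rs_total == 0:
        return {}
    hits = {}
    for nmer, k in cs_counts.items():
        background = rs_counts[nmer] / rs_total
        lower, _ = agresti_coull_interval(k, cs_total, z)
        if lower > background:
            hits[nmer] = (k / cs_total, background)
    return hits
```

Because windows overlap, a single conserved motif contributes to several related n-mers, which is why related enriched n-mers tend to share substrings.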
Figure 4, A and C, demonstrates scatter-plots for the counts of all n-mers found in the CS samples relative to the counts obtained from the RS samples. n-Mers that are significantly enriched in the CS samples (
Visual inspection of the CS-enriched n-mers revealed that many contain common substrings. This would be expected if CSs were enriched in specific motifs since when using a sliding-window enumeration, a single conserved motif would spawn many related n-mers containing portions of the motif in different frames. Examples of the distributions of two n-mers containing substrings found by visual inspection to be common to CS-enriched n-mers are shown in Figure 4, B and D. Shown are the distributions for n-mers containing the substring GCATG (as found in the DI sample) and the substring TTCTG (as found in the AI sample). In both cases, it is clear that these substrings confer a distributional bias to n-mers containing these substrings. It is interesting to note that the substring GCATG is identical to the binding site for the Fox family of splicing factors that are known to play important roles in alternative splicing (discussed in greater detail below). The TTCTG substring does not exactly match any described binding sites and is discussed in greater detail below.
Graph based clustering of similar n-mers and construction of CS motifs
We refer to the CS motifs as CSMs. An example of two clusters and the corresponding CSMs obtained from the DI region is shown in Figure 5. The CSMs in this example closely resemble binding sites for two known splicing factors, Fox-1/Fox-2 and QKI (both discussed in greater detail below). Since each of these proteins has a well-characterized binding site and is a known splicing factor, the fact that we identified CSMs matching these sites helps validate our methods. In addition, it is especially interesting to note that the GCS for the putative QKI motif centers over the high affinity portion of the QKI site identified biochemically (Galarneau and Richard 2005
Table 1 details the clustering results for the CS and pseudo-CS n-mers. The DI-CS sample yielded 63 clusters, while the AI-CS sample yielded 85 clusters (available in Supplemental Tables 3, 4). Meanwhile only three clusters were obtained from the DI-pseudo sample, and only one was obtained from the AI-pseudo sample. In both cases the percentage of n-mers that clustered was significantly higher in the CS-derived samples compared with the pseudo samples. This demonstrates that GCCS clustering successfully filters out n-mers that, despite showing enrichment, are likely to be due to chance.
Many CSMs are statistically associated with constitutive or alternative splicing events Using the same database of alternatively spliced junctions that we used above, we counted the occurrences of all n-mers (4-7 nt) within CSs and categorized them according to whether they were located adjacent to an alternatively spliced or a constitutively spliced junction. The G-test was used to determine significant associations (see Supplemental Materials and Methods). The probabilities for the association were transformed to a value that we refer to as a TA-score (see Supplemental Materials and Methods). A positive TA-score indicates an association with alternative splicing, and a negative TA-score indicates an association with constitutive splicing. Since each CSM is composed of several n-mers, the association for the CSM was determined by comparing the mean of the TA-scores for the CSM with the mean for all n-mers (using Student's t-test). An example of the distribution of TA-scores for several CSMs is shown in Figure 6 (for the complete analysis, see Supplemental Figs. 1, 2). This analysis revealed that some motifs were significantly associated with alternative splicing (P < 0.01, t-test) and some with constitutive splicing, and some showed no significant association either way. It is important to point out that the clustering procedure we used does not incorporate any knowledge regarding alternative splicing; yet many of the motifs are clearly enriched in n-mers that show similar biases, which demonstrates that the common substrings are responsible for the observed bias. After removing redundant examples of compositionally similar motifs, we found that five DI-CSMs and five AI-CSMs are significantly associated with alternative splicing, while 18 DI-CSMs and 18 AI-CSMs are significantly associated with constitutive splicing. The CSMs showing a statistically significant association with alternative or constitutive splicing are shown in Figure 7.
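To make the association test concrete, here is a minimal Python sketch of a G-test of independence on a 2x2 table (one n-mer versus all other n-mers, alternative versus constitutive junctions). The table layout and the function name `g_test_2x2` are assumptions for illustration; the TA-score transform is defined only in the Supplemental Materials and is deliberately not reproduced here.

```python
from math import erfc, log, sqrt


def g_test_2x2(nmer_alt, nmer_const, other_alt, other_const):
    """G-test of independence for a 2x2 contingency table.

    Rows: the n-mer of interest vs. all other n-mers (assumed layout).
    Columns: occurrences next to alternative vs. constitutive junctions.
    Returns (G statistic, p-value for 1 degree of freedom).
    """
    observed = [[nmer_alt, nmer_const], [other_alt, other_const]]
    total = nmer_alt + nmer_const + other_alt + other_const
    if total == 0:
        raise ValueError("contingency table is empty")
    row_sums = [sum(row) for row in observed]
    col_sums = [observed[0][j] + observed[1][j] for j in range(2)]

    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to G
                expected = row_sums[i] * col_sums[j] / total
                g += 2.0 * o * log(o / expected)

    # For 1 degree of freedom the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(max(g, 0.0) / 2.0))
    return g, p_value
```

The direction of an association (toward alternative or constitutive splicing) can be read from whether `nmer_alt / (nmer_alt + nmer_const)` is above or below the same ratio for the other n-mers.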
We also determined the number of splice-junction flanks that contain at least one instance of a conserved n-mer matching each of these CSMs. We found that most CSMs are found in hundreds to more than 1000 individual introns (Supplemental Fig. 3). The combined observation that DI-CSs and AI-CSs are enriched in specific n-mers and that many of these n-mers are statistically associated with alternative or constitutive splicing strongly suggests that they represent motifs that are under positive selective pressures because they play important roles in splicing.
There are many more CSMs that are not as significantly associated with either alternative or constitutive splicing but display a bias toward either category, and there are others that show no bias at all (see Supplemental Figs. 1, 2). Some of these may represent motifs that are important for splicing but are utilized in a context independent of regulated or constitutive splicing. It is also possible that some of these represent motifs that are under selective pressures but play roles in other processes such as mRNA trafficking, maturation, degradation, or poly-adenylation. For further characterizations, we chose to focus only on those motifs with the strongest biases.
Identification of motifs known to be associated with alternative splicing
Meanwhile, the motifs DI-2 and AI-2 are both matches to the Quaking protein (QKI) binding site CTAAC (Wu et al. 2002
The acceptor side motifs, AI-3 (TTCTG) and AI-5 (TGTT), are abundant (Supplemental Fig. 3) and may represent conserved targets for members of the CELF/BRUNO-like family (for review, see Barreau et al. 2006
A previous computational analysis of alternative events conserved between mouse and human identified 4-5 base n-mers enriched in intronic sequences flanking skipped-exons (Yeo et al. 2005
Conserved cryptic splice-junctions are associated with alternative splicing
Identification of putative novel GCT motifs associated with alternative splicing
These three motifs all contain a core substring of GCT. The most likely candidate trans-acting factors for these motifs are members of the MBNL family of RNA binding proteins (Pascual et al. 2006
The donor side motif DI-4 (GCTTG) (Fig. 7) is similar to a motif, TGYTTTC, enriched in introns flanking included alternative exons in brain (Sugnet et al. 2006
Conserved motifs associated with constitutive splicing are abundant and AT-rich
Several families of RNA binding proteins are known to bind sequences similar to motifs in this group. Two well-characterized proteins, TIA1 and TIAL1 (also known as TIAR), are known to bind T-rich sequences and play important roles in splicing (Dember et al. 1996
The RNA binding protein Sam68 has been shown to be involved in splicing and has been shown to preferentially bind to the sequence TAAA (Lin et al. 1997
Conservation of certain CSMs is associated with functionality It is likely that conservation would increase the likelihood that a particular motif is functionally relevant. If this were true, we would expect to see a stronger correlation between alternative splicing, for instance, and conserved instances of an n-mer than between alternative splicing and instances of the n-mer that are not conserved. In order to test this hypothesis, we examined the relationship between alternative splicing and conservation for several pentamers matching CSMs associated with either alternative splicing or constitutive splicing. We also included one pentamer that was not enriched in CSs and was not associated strongly with either alternative or constitutive splicing. We found that the pentamers that are highly enriched in CSs and are associated with alternative splicing are much more likely to be associated with an alternative event when they occur in CSs (Fig. 8, pentamers 1-4). Meanwhile, when the same pentamers occur in non-CSs, they are no more likely than background to be associated with alternative splicing. Importantly, it should be noted that this observation is not simply due to there being an enrichment of these pentamers in CSs. In fact, the great majority of these pentamers occurred in non-CSs (e.g., the pentamer GCATG occurred 330 times in CSs but occurred 12,449 times in non-CSs). In contrast and in agreement with their TA-scores, two pentamers that are enriched in CSs but are significantly associated with constitutive splicing (Fig. 8, pentamers 5 and 6) are much less likely to be associated with alternative splicing when they occur in either CS or non-CS sequences (in either context the association is less than average). Interestingly, these pentamers still have a higher association with alternative splicing when they occur in CSs than in non-CSs. This suggests that although they are generally associated with constitutive splicing, they may play important roles in alternative splicing in some introns.
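The comparison described above amounts to tallying, separately for conserved and nonconserved instances of a pentamer, the fraction that lie next to an alternatively spliced junction. A minimal Python sketch follows; the `(is_conserved, is_alternative)` record format and the function name are hypothetical, and the toy records are invented for illustration rather than taken from the study.

```python
def alternative_fraction_by_context(instances):
    """Fraction of n-mer instances adjacent to alternative junctions,
    split by whether the instance falls inside a conserved sequence (CS).

    `instances` is an iterable of (is_conserved, is_alternative) pairs
    (a hypothetical record format). A context with no instances maps to None.
    """
    tallies = {True: [0, 0], False: [0, 0]}  # context -> [total, alternative]
    for is_conserved, is_alternative in instances:
        bucket = tallies[bool(is_conserved)]
        bucket[0] += 1
        bucket[1] += 1 if is_alternative else 0
    return {
        ("conserved" if context else "nonconserved"):
            (alt / total if total else None)
        for context, (total, alt) in tallies.items()
    }


# Invented toy records, purely to show the call shape.
toy_instances = [
    (True, True), (True, False),
    (False, False), (False, False), (False, True), (False, False),
]
print(alternative_fraction_by_context(toy_instances))
```

Each fraction would then be compared against the background rate of alternative junctions across all introns.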
Last, a pentamer that showed no enrichment in CSs nor association with alternative splicing is no more likely to be associated with alternative splicing than predicted by chance in either context (Fig. 8, pentamer 7).
These results demonstrate that conservation of an n-mer near an alternatively spliced junction is likely to be an important predictor of functionality, and also suggest that the mere presence of a sequence matching a particular binding site does not indicate that the sequence represents a functional site. A likely explanation for this phenomenon is that the local context (i.e., surrounding sequences) of conserved instances differs from that of nonconserved instances, which implies that additional cis- and trans-elements are required for functionality. A more comprehensive analysis of these relationships merits future attention.
Evolutionary conservation is a well-established metric for distinguishing signals from noise in genomic sequences (Cooper and Sidow 2003 In order to identify such motifs, we carried out an analysis of conserved intronic sequences flanking the splice-junctions. Comparative analysis of mammalian genomes has revealed that many mammalian introns contain stretches of CS. These islands of conservation can be readily visualized by comparing aligned orthologous sequences (see Fig. 1). Prior to this analysis, it was unclear whether the many, typically short, CSs simply represent noninformative regions that have not been subject to mutational divergence since the last common ancestor. If this were true, we would expect the sequence composition of CSs to be equivalent to the composition of introns in general. However, we found this not to be the case. Instead, the population of conserved intronic sequences is clearly enriched in specific n-mers. Using a novel graph-clustering algorithm, we showed that these n-mers can be clustered into distinct sequence motifs (CSMs). Furthermore, we showed that many of the CSMs show a marked association with either alternative or constitutive splicing. This linkage between splice-type and conservation supports the notion that the selective pressures responsible for conservation of many of the CSMs are likely to be related to splicing.
A variety of auxiliary splicing factors have been identified; however, for the majority of these proteins, the optimal binding sites have either not been well characterized or the observed binding sites are not discrete enough to be distinguishable by sequence composition alone. Thus, we can only speculate about which splicing factors are likely to be the binding partners for many of the mammalian CSMs. Future experimental studies will be required to identify trans-factors for many of the CSMs identified in this analysis. The splicing factors Fox-1/Fox-2 and QKI have well-characterized and distinctive binding sites, and their connections with alternative splicing have been well documented. Motifs matching the binding sites for these proteins were found to be both highly enriched in CSMs and highly associated with alternative splicing. Interestingly, a comparative analysis to define n-mers that are enriched in conserved alternatively spliced introns in the nematodes C. elegans and C. briggsae also revealed these same motifs (Kabat et al. 2006
Our analysis revealed several GCT-containing motifs that are associated with alternative splicing. To our knowledge these motifs have not been previously predicted using computational methods. These motifs are as abundant as the Fox and QKI motifs, suggesting that they play important roles in alternative splicing of many exons. We are not aware of any known splicing factors that are obvious candidates for binding these motifs. However, these motifs are a close match to the proposed model of the MBNL binding site (Ho et al. 2004
An interesting outcome of this analysis was the large number of previously unrecognized conserved motifs that are strongly associated with constitutive splicing. These motifs are largely A and T rich. Among these motifs are sequences that resemble Sam68, TIA1/TIAL1, and Hu protein binding sites. Whether or not these motifs represent conserved binding sites for any of these proteins remains to be determined. These proteins have been typically studied in the context of alternative splicing. Considering that TIA1/TIAL1 have been shown to promote splicing via interaction with U1 snRNP (Forch et al. 2000
Although we identified several motifs similar to known splicing factor binding sites, several well-known splicing factor sites were not found. Notably absent, for instance, are CSMs matching Nova protein binding sites. Nova has been shown to be involved in alternative splicing and appears to bind clusters of YCAY motifs (Jensen et al. 2000a Lastly, we demonstrated that there is a strong association between conservation of specific n-mers and apparent functionality since conserved occurrences of these n-mers are statistically associated with alternative splicing while nonconserved occurrences are not. This strongly suggests that there is a fundamental difference between random occurrences of n-mers and functional occurrences. The most likely explanation for this phenomenon is that higher order associations exist between functionally relevant instances of a potential binding site and the local context (e.g., other cis-elements or RNA secondary structure). Future analysis to elucidate such associations may be important for uncovering the higher order language of splice-site definition.
Extraction of conserved and RS populations A database of human introns was constructed using sequences obtained from the May 2004 GenBank release build 35 and gene predictions from the NCBI RefSeq project (http://hgdownload.cse.ucsc.edu/goldenPath/hg17/database/). Predicted introns that did not begin with GY and end with AG were discarded. We should note that this would not exclude the relatively small population of GT-AG U12-dependent introns (Sharp and Burge 1997 Custom software (available upon request) was used to extract CSs from the intronic sequences flanking splice-junctions (Fig. 2). A CS was defined to be a contiguous run of at least 7 nt of identity from an alignment between human and six eutherian mammals: Pan troglodytes (chimp), Macaca mulatta (rhesus monkey), Mus musculus (house mouse), Rattus norvegicus (house rat), Canis lupus familiaris (domestic dog), and Bos taurus (domestic cow). The alignment used was the UCSC alignment of 17 vertebrate genomes (hg17, March 2004, http://hgdownload.cse.ucsc.edu/goldenPath/hg17/multiz17way/). Intronic sequences were extracted from the first 7-100 nt for the donor (DI) region and from the last 100 to last 4 nt for the acceptor (AI) region (see Fig. 2). Since we included only introns that were >199 bases in length, this eliminated overlap between the donor and acceptor sides of the intron. Extracted CSs were categorized according to the region from which they were recovered. The splice-junction database and CS sequences are available upon request. For each CS identified, we also extracted 100 additional analogous sequences (i.e., having the same starting position relative to the splice-junction and the same length as the CS) from introns randomly chosen from the original data set of all human introns. These sequences (equivalent to 100 times the size of the CS samples) made up the RS pool.
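The flank windows and the position-matched random sampling described above can be sketched in Python as follows. This is an illustrative reconstruction, not the authors' software: it assumes the donor window covers intron positions 7 to 100 (1-based) and the acceptor window runs from the 100th-to-last through the 4th-to-last nucleotide, that introns are plain strings of at least 200 nt, and that random introns are drawn with replacement. All function names are hypothetical.

```python
import random


def donor_flank(intron):
    """Donor intronic (DI) window: assumed to be 1-based positions 7..100."""
    return intron[6:100]


def acceptor_flank(intron):
    """Acceptor intronic (AI) window: assumed 100th-to-last through
    4th-to-last nucleotide of the intron."""
    return intron[-100:-3]


def sample_matched_sequences(introns, region, offset, length, k=100, rng=None):
    """Draw k random sequences that match a CS in region, offset, and length.

    `region` is "DI" or "AI"; `offset` is the 0-based start of the CS within
    that flank window. Introns are chosen with replacement (an assumption).
    """
    if region not in ("DI", "AI"):
        raise ValueError("region must be 'DI' or 'AI'")
    if any(len(intron) < 200 for intron in introns):
        raise ValueError("introns must be at least 200 nt long")
    flank = donor_flank if region == "DI" else acceptor_flank
    rng = rng or random.Random()
    samples = []
    for _ in range(k):
        window = flank(rng.choice(introns))
        if offset + length > len(window):
            raise ValueError("CS coordinates fall outside the flank window")
        samples.append(window[offset:offset + length])
    return samples
```

Matching each random sequence to the CS by position keeps the background composition comparable, since intronic base composition changes with distance from the splice-junction.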
We thank those who gave valuable feedback during preparation of this manuscript: Stephen M. Garrey, Marcus J. Lanskey, Amy Mahady, Scott Mahady, Kristy Henscheid, Jill I. Murray, Emily Goers, Pascale M. Voelker, and Alice Barkan. This work was partially supported by the American Heart Association grant 0420073Z to R.B.V. and grants from NSF (MCB-0616264) and NIH (AR053903) to J.A.B.
1 Corresponding author.
E-mail aberglund{at}molbio.uoregon.edu; fax (541) 346-5891. [Supplemental material is available online at www.genome.org.] Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.6017807
Agresti, A. and Coull, B.A. 1998. Approximate is better than exact for interval estimation of binomial proportions. Am. Stat. 52: 119-126.
Baraniak, A.P., Chen, J.R., and Garcia-Blanco, M.A. 2006. Fox-2 mediates epithelial cell-specific fibroblast growth factor receptor 2 exon choice. Mol. Cell. Biol. 26: 1209-1222.
Barreau, C., Paillard, L., and Osborne, H.B. 2005. AU-rich elements and associated factors: Are there unifying principles? Nucleic Acids Res. 33: 7138-7150.
Barreau, C., Paillard, L., Mereau, A., and Osborne, H.B. 2006. Mammalian CELF/Bruno-like RNA-binding proteins: Molecular characteristics and biological functions. Biochimie 88: 515-525.
Berglund, J.A., Chua, K., Abovich, N., Reed, R., and Rosbash, M. 1997. The splicing factor BBP interacts specifically with the pre-mRNA branchpoint sequence UACUAAC. Cell 89: 781-787.
Berglund, J.A., Abovich, N., and Rosbash, M. 1998a. A cooperative interaction between U2AF65 and mBBP/SF1 facilitates branchpoint region recognition. Genes & Dev. 12: 858-867.
Berglund, J.A., Fleming, M.L., and Rosbash, M. 1998b. The KH domain of the branchpoint sequence binding protein determines specificity for the pre-mRNA branchpoint sequence. RNA 4: 998-1006.
Bird, C.P., Stranger, B.E., and Dermitzakis, E.T. 2006. Functional variation and evolution of non-coding DNA. Curr. Opin. Genet. Dev. 16: 559-564.
Blencowe, B.J. 2000. Exonic splicing enhancers: Mechanism of action, diversity and role in human genetic diseases. Trends Biochem. Sci. 25: 106-110.
Brow, D.A. 2002. Allosteric cascade of spliceosome activation. Annu. Rev. Genet. 36: 333-360.
Brudno, M., Gelfand, M.S., Spengler, S., Zorn, M., Dubchak, I., and Conboy, J.G. 2001. Computational analysis of candidate intron regulatory elements for tissue-specific alternative pre-mRNA splicing. Nucleic Acids Res. 29: 2338-2348.
Burge, C.B., Tuschl, T., and Sharp, P.A. 1999. Splicing of precursors to mRNAs by the spliceosomes. In The RNA world, 2nd ed. (eds. R.F. Gesteland et al.), pp. 525-560. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York.
Cartegni, L., Chew, S.L., and Krainer, A.R. 2002. Listening to silence and understanding nonsense: Exonic mutations that affect splicing. Nat. Rev. Genet. 3: 285-298.
Cooper, G.M. and Sidow, A. 2003. Genomic regulatory regions: Insights from comparative sequence analysis. Curr. Opin. Genet. Dev. 13: 604-610.
Del Gatto-Konczak, F., Bourgeois, C.F., Le Guiner, C., Kister, L., Gesnel, M.C., Stevenin, J., and Breathnach, R. 2000. The RNA-binding protein TIA-1 is a novel mammalian splicing regulator acting through intron sequences adjacent to a 5' splice site. Mol. Cell. Biol. 20: 6287-6299.
Dember, L.M., Kim, N.D., Liu, K.Q., and Anderson, P. 1996. Individual RNA recognition motifs of TIA-1 and TIAR have different RNA binding specificities. J. Biol. Chem. 271: 2783-2788.
Dermitzakis, E.T., Reymond, A., and Antonarakis, S.E. 2005. Conserved non-genic sequences - an unexpected feature of mammalian genomes. Nat. Rev. Genet. 6: 151-157.
Dietrich, R.C., Fuller, J.D., and Padgett, R.A. 2005. A mutational analysis of U12-dependent splice site dinucleotides. RNA 11: 1430-1440.
Dredge, B.K. and Darnell, R.B. 2003. Nova regulates GABA(A) receptor
Dredge, B.K., Stefani, G., Engelhard, C.C., and Darnell, R.B. 2005. Nova autoregulation reveals dual functions in neuronal splicing. EMBO J. 24: 1608-1620.
Fairbrother, W.G., Yeh, R.F., Sharp, P.A., and Burge, C.B. 2002. Predictive identification of exonic splicing enhancers in human genes. Science 297: 1007-1013.
Faustino, N.A. and Cooper, T.A. 2003. Pre-mRNA splicing and human disease. Genes & Dev. 17: 419-437.
Fedorov, A., Saxonov, S., Fedorova, L., and Daizadeh, I. 2001. Comparison of intron-containing and intron-lacking human genes elucidates putative exonic splicing enhancers. Nucleic Acids Res. 29: 1464-1469.