Genome Res. 13:2637-2650, 2003
©2003 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/03 $5.00
Letter
Sequence Information for the Splicing of Human Pre-mRNA Identified by Support Vector Machine Classification
Xiang H-F. Zhang¹, Katherine A. Heller², Ilana Hefter², Christina S. Leslie², and Lawrence A. Chasin¹,³
1 Department of Biological Sciences, Columbia University, New York, New York 10027, USA
2 Department of Computer Science, Columbia University, New York, New York 10027, USA
Vertebrate pre-mRNA transcripts contain many sequences that resemble splice sites on the basis of agreement with the consensus, yet these more numerous false splice sites are usually completely ignored by the cellular splicing machinery. Even at the level of exon definition, pseudo exons defined by such false splice sites outnumber real exons by an order of magnitude. We used a support vector machine to discover sequence information that could be used to distinguish real exons from pseudo exons. This machine learning tool led to the definition of potential branch points, an extended polypyrimidine tract, and C-rich and TG-rich motifs in a region limited to 50 nt upstream of constitutively spliced exons. C-rich sequences were also found in a region extending to 80 nt downstream of exons, along with G-triplet motifs. In addition, combinations of three bases within the splice donor consensus sequence were more effective than consensus values in distinguishing real from pseudo splice sites; two-way base combinations were optimal for distinguishing 3' splice sites. These data also suggest that interactions between two or more of these elements may contribute to exon recognition, and they provide candidate sequences for assessment as intronic splicing enhancers.
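The abstract contrasts two ways of judging a splice donor: its agreement with the consensus, and combinations of bases at several positions used as features for an SVM. The sketch below illustrates both representations and trains a small linear SVM on the combination features. It is not the authors' pipeline. The 9-nt donor window (-3 to +6), the single-sequence consensus CAGGTAAGT, the match-count consensus score, the toy sequences, and the use of the Pegasos stochastic subgradient solver as the SVM trainer are all assumptions made for illustration.

```python
import random
from itertools import combinations

# ASSUMPTION: a 9-nt donor window (3 exonic + 6 intronic positions, -3..+6)
# and a single-sequence stand-in for the 5' splice site consensus.
DONOR_CONSENSUS = "CAGGTAAGT"


def consensus_score(site, consensus=DONOR_CONSENSUS):
    """Count the positions at which a site agrees with the consensus."""
    return sum(a == b for a, b in zip(site, consensus))


def combination_features(site, order):
    """Sparse binary features: one per (position tuple, base tuple) present.

    order=3 encodes three-way base combinations (as for 5' sites in the
    abstract); order=2 encodes two-way combinations (as for 3' sites).
    A constant "bias" feature gives the linear model an intercept.
    """
    feats = {"bias"}
    for pos in combinations(range(len(site)), order):
        feats.add((pos, "".join(site[p] for p in pos)))
    return feats


def score(w, feats):
    """Linear decision value <w, x> for a binary feature set x."""
    return sum(w.get(f, 0.0) for f in feats)


def train_linear_svm(examples, lam=0.01, steps=2000, seed=0):
    """Pegasos: stochastic subgradient descent on the L2-regularized hinge loss.

    examples: list of (feature_set, label) pairs with label in {+1, -1}.
    ASSUMPTION: a swapped-in linear SVM trainer, not the paper's software.
    """
    rng = random.Random(seed)
    w = {}
    for t in range(1, steps + 1):
        feats, y = rng.choice(examples)
        margin = y * score(w, feats)   # evaluated before this step's update
        eta = 1.0 / (lam * t)          # Pegasos step size
        shrink = 1.0 - 1.0 / t         # equals 1 - eta * lam (regularizer)
        w = {f: v * shrink for f, v in w.items()}
        if margin < 1.0:               # hinge loss active: move toward y * x
            for f in feats:
                w[f] = w.get(f, 0.0) + eta * y
    return w


if __name__ == "__main__":
    # Made-up toy sites; the "pseudo" ones keep the GT but little else.
    real = ["CAGGTAAGT", "AAGGTAAGT"]
    pseudo = ["TTTGTTTTT", "TTCGTTTTT"]
    data = [(combination_features(s, 3), +1) for s in real]
    data += [(combination_features(s, 3), -1) for s in pseudo]
    w = train_linear_svm(data)
    for s in real + pseudo:
        print(s, consensus_score(s), round(score(w, combination_features(s, 3)), 3))
```

On these cleanly separable toy sites the trained weights give positive decision values for the "real" sites and negative ones for the "pseudo" sites. On real data one would instead train on many real and pseudo sites and compare held-out accuracy against a classifier that uses consensus_score alone, which is roughly the kind of comparison the abstract summarizes.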
Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.1679003.
3 Corresponding author. E-MAIL lac2{at}columbia.edu; FAX (212) 532-0425.
[Supplemental material is available online at www.genome.org.]
