Published online before print May 12, 2003, doi:10.1101/gr.1008203. Genome Res. 13:1155–1157, 2003. ©2003 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/03 $5.00
Letter
Phylogenetically Older Introns Strongly Correlate With Module Boundaries in Ancient Proteins
1 Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts 02138, USA
The hypothesis that some (but not all) introns were used to construct ancient genes by exon shuffling of modules at the earliest stages of evolution is supported by the finding of an excess of phase-zero intron positions in the boundary regions of such modules in 276 ancient proteins (defined as common to eukaryotes and prokaryotes). Here we show further that as phase-zero intron positions are shared by distant taxa, and thus are truly phylogenetically ancient, their excess in the boundaries becomes greater, rising to an 80% excess if shared by four out of the five taxa: vertebrates, invertebrates, fungi, plants, and protists.
We recently studied the distribution of introns in homologs of 276 ancient unrelated proteins of known three-dimensional structure and found a significant, but small, excess of phase-zero intron positions in the boundary regions of modules 15–35 Å in diameter (Fedorov et al. 2001
Modules are compact subregions of the peptide chain, described as lying within a maximum diameter (Go 1981
The "ancient" proteins have homologs in both prokaryotes and eukaryotes. They have no introns in the prokaryotes, but have introns in the complex eukaryotes. In an introns-late model (Palmer and Logsdon 1991
Our large sample of 3328 phase-zero introns (Fedorov et al. 2001
Figure 1A, 1B shows the correlations of these sets with the module boundary regions of the 276 ancient proteins. Introns have been shown to correlate with boundaries of modules defined for a large range of diameters, from 10–40 Å (de Souza et al. 1996
Are the differences between the patterns for the various subsets (sets 2–4) and the full set (set 1) significant? To determine this, we asked how often one would expect to draw, at random from the full set, a subset with an average excess equal to or larger than that of a given real subset. For each subset, we generated 10,000 random subsets (100,000 for set 4), each containing the same number of intron positions as the real subset, and we calculated their average excess over the range 15–35 Å. As Table 1 shows, each subset exhibits a correlation with module boundaries significantly stronger than would be expected from such a random sample.
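The resampling question posed above can be written as an empirical probability. This is a sketch in notation that does not appear in the original: $\bar{E}_{\mathrm{obs}}$ is the average excess of the real subset over the 15–35 Å range, $\bar{E}_i$ is the average excess of the $i$-th random subset of the same size, and $N$ is the number of random subsets drawn (10,000, or 100,000 for set 4).

$$
\hat{p} \;=\; \frac{\#\{\, i : \bar{E}_i \ge \bar{E}_{\mathrm{obs}},\; 1 \le i \le N \,\}}{N}
$$

Under this definition the smallest nonzero value the estimate can take is $1/N$, so a larger $N$ resolves smaller probabilities. The text does not state why set 4 was given more draws.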
For the phase-one and phase-two introns, we did not find any significant correlation or excess of the corresponding matched sets with module boundary regions of ancient genes. Neither did we find any effects with the matched introns of different phases with module boundaries of eukaryote-specific genes (see our Web page, www.mcb.harvard.edu/gilbert/intron_subsets).
This greater and greater excess in module boundary regions is exactly what one expects if the excess is due to a population of ancient introns that define module boundaries, as would be the case if these genes had been assembled by exon shuffling between modules using phase-zero introns. These findings are not explicable within the framework of an introns-late theory, which claims that all introns are inserted into genes at the latest stages of evolution. If one were to argue that perhaps there is a weak selective advantage to introns, once inserted, being maintained at module boundaries, so that old introns would be more likely to lie at module boundaries, one must provide reasons for that not being the case for phase-one and phase-two introns. It is far more likely that phase-zero introns were used before the branching of the eukaryotes to create proteins by exon shuffling. Since the proteins we examine are shared by both eukaryotes and prokaryotes, their homology and colinearity suggest that they were created by exon shuffling at the early stages of evolution, predating the major divergences of eukaryotes and prokaryotes. This hypothesis was recently supported by Kaessmann et al. (2002
We analyze intron patterns with a computer program, INTERMODULE, that divides each three-dimensional structure into modules of a given diameter and defines boundary regions between the module cores. The program then calculates the excess of introns in the boundary regions, compared to a random expectation, for the entire set of introns and proteins (de Souza et al. 1998
We used Monte Carlo methods to determine whether our sets 2–4 are significantly more correlated with module boundaries than is the phase-zero set as a whole. For each set of phylogenetically conserved positions, we generated 10,000 random subsets of the full set of 3328 phase-zero positions, each subset containing the same number of positions as the real set (550, 118, and 29 intron positions for sets 2, 3, and 4, respectively). Then we studied the correlation of the introns in these random subsets with module boundaries. For each real subset (2–4), the fraction of the corresponding 10,000 random subsets that have correlations at least as strong as that of the real subset is the probability of seeing the observed correlation at random.
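The Monte Carlo procedure described above can be illustrated with a minimal Python sketch. It is not the INTERMODULE program or the authors' PERL/C code. The function names, the data layout (one lookup table per module diameter), and the definition of the expected count as the sum of per-position boundary fractions are all assumptions made for illustration, and the data are a toy example rather than the 3328 real positions.

```python
import random


def excess(positions, in_boundary, boundary_fraction):
    """Percent excess of intron positions in boundary regions over expectation.

    in_boundary[p] is True if position p falls in a boundary region;
    boundary_fraction[p] is the (assumed) chance that a randomly placed
    intron in the same protein would fall in a boundary region.
    """
    observed = sum(1 for p in positions if in_boundary[p])
    expected = sum(boundary_fraction[p] for p in positions)
    return 100.0 * (observed - expected) / expected


def average_excess(positions, tables):
    """Average the excess over several module diameters (one table pair each)."""
    return sum(excess(positions, ib, bf) for ib, bf in tables) / len(tables)


def monte_carlo_p(full_set, real_subset, statistic, n_draws=10_000, seed=0):
    """Fraction of random same-size subsets scoring at least as high as the real one."""
    rng = random.Random(seed)
    observed = statistic(real_subset)
    hits = 0
    for _ in range(n_draws):
        draw = rng.sample(full_set, len(real_subset))  # sample without replacement
        if statistic(draw) >= observed:
            hits += 1
    return hits / n_draws


# Toy data (hypothetical): 100 positions, 40 of which lie in boundary regions.
full_set = list(range(100))
in_boundary = {p: p < 40 for p in full_set}
boundary_fraction = {p: 0.4 for p in full_set}
tables = [(in_boundary, boundary_fraction)]  # a single diameter for brevity

real_subset = list(range(10))  # a "conserved" subset lying entirely in boundaries
stat = lambda positions: average_excess(positions, tables)
p_value = monte_carlo_p(full_set, real_subset, stat, n_draws=2000)
print(average_excess(real_subset, tables), p_value)
```

Passing the statistic as a callable keeps the resampling loop independent of how the excess is defined, so an average over the full 15–35 Å range could be substituted by supplying one table pair per diameter.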
All calculations were performed with computer programs written in PERL and C. A full set of our results is available on our Web page: www.mcb.harvard.edu/gilbert/intron_subsets.
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.1008203. Article published online before print in May 2003.
2 Present address: Dept. of Medicine, Medical College of Ohio, Toledo, OH 43614-5809, USA.
3 Present address: Genzyme Corp., Framingham, MA 01701, USA.
4 Corresponding author.
de Souza, S.J., Long, M., Schoenbach, L., Roy, S.W., and Gilbert, W. 1996. Intron positions correlate with module boundaries in ancient proteins. Proc. Natl. Acad. Sci. 93: 14632–14636.
de Souza, S.J., Long, M., Klein, R.J., Roy, S., Lin, S., and Gilbert, W. 1998. Toward a resolution of the introns early/late debate: Only phase zero introns are correlated with the structure of ancient proteins. Proc. Natl. Acad. Sci. 95: 5094–5099.
Fedorov, A., Cao, X., Saxonov, S., de Souza, S., Roy, S.W., and Gilbert, W. 2001. Intron distribution difference for 276 ancient and 131 modern genes suggests the existence of ancient introns. Proc. Natl. Acad. Sci. 98: 13177–13182.
Gilbert, W. 1987. The exon theory of genes. Cold Spring Harbor Symp. Quant. Biol. 52: 901–905.
Go, M. 1981. Correlation of DNA exonic regions with protein structural units in haemoglobin. Nature 291: 90–92.
Kaessmann, H., Zollner, S., Nekrutenko, A., and Li, W.-H. 2002. Signatures of domain shuffling in the human genome. Genome Res. 12: 1642–1650.
Logsdon, J.M. 1998. The recent origins of spliceosomal introns revisited. Curr. Opin. Genet. Dev. 8: 637–648.
Palmer, J.D. and Logsdon, J.M. 1991. The recent origin of introns. Curr. Opin. Genet. Dev. 1: 470–477.
Panchenko, A.R., Luthey-Schulten, Z., and Wolynes, P.G. 1996. Foldons, protein structural modules, and exons. Proc. Natl. Acad. Sci. 93: 2008–2013.
Patthy, L. 1999. Genome evolution and the evolution of exon-shuffling--A review. Gene 238: 103–114.
Received November 18, 2002; accepted in revised form March 21, 2003.