Published online before print July 10, 2007, doi: 10.1101/gr.6320607
Genome Res. 17:1139-1145, 2007 ©2007 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/07 $5.00
Letter
Functional persistence of exonized mammalian-wide interspersed repeat elements (MIRs)
1 Institute of Experimental Pathology (ZMBE), University of Münster, Münster, Germany; 2 Institute of Molecular Evolutionary Genetics and Department of Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA; 3 Institute of Bioinformatics, University of Münster, Münster, Germany
Exonization of retroposed mobile elements, a process whereby new exons are generated following changes in non-protein-coding regions of a gene, is thought to have great potential for generating proteins with novel domains. Our previous analysis of primate-specific Alu-short interspersed elements (SINEs) showed, however, that during their 60 million years of evolution, SINE exonizations occurred in some primates, only to be lost again in some of the descendent lineages. This dynamic gain and loss makes it difficult to ascertain the contribution of exonization to genomic novelty. It was speculated that Alu-SINEs are too young to reveal persistent protein exaptation. In the present study we examined older mobile elements, mammalian-wide interspersed repeats (MIRs) that underwent active retroposition prior to the placental mammalian radiation 130 million years ago, to determine their contribution to protein-coding sequences. Of 107 potential cases of MIR exonizations in human, an analysis of splice sites substantiates a mechanism that benefits from 3' splice site selection in MIR sequences. We retraced in detail the evolution of five MIR elements that exonized at different times during mammalian evolution. Four of these are expressed as alternatively spliced transcripts; three in species throughout the mammalian phylogenetic tree and one solely in primates. The fifth is the first experimentally verified, constitutively expressed retroposed SINE element in mammals. This pattern of highly conserved, alternatively and constitutively spliced MIR sequences evinces the potential of exonized transposed elements to evolve beyond the transient state found in Alu-SINEs and persist as important parts of functional proteins.
Genomic plasticity has contributed significantly to the dynamic generation of novel features in evolution. In this context, retroposed genetic elements, which are sequences of DNA that amplify via RNA to different positions within the genome, play a decisive role as inducers or substrates of novel evolutionary building blocks (Brosius and Gould 1992
Following the path of Alu exonizations along a phylogenetic tree of primates indicates a dynamic gain and loss of exonizations, processes that may embrace >60 million years (Myr) of evolution (Krull et al. 2005
Good candidates for illuminating these older exonization processes are the retroposed mammalian-wide interspersed repeat (MIR) elements. MIR elements amplified
In this paper, we focus on MIR exonizations in a phylogenetic context by characterizing novel gene modules in representatives of all mammalian clades (Kriegs et al. 2006
Selection of the data set and examples
To identify MIR elements in protein-coding sequences of human, mouse, and rat, we screened a compilation of mammalian mRNAs presumably harboring transposable element-cassettes assembled by Makalowski and co-workers (Genomic ScrapYard; http://warta.bio.psu.edu/SYDB/database.html), performing separate searches for the listed species. Of the 372 ScrapYard records with potential MIR element-cassettes identified in this initial search, 126 MIR sequences were present in protein-coding sequences (CDS); the remainder were either redundant entries or otherwise artifactual (Supplemental Table S1; Supplemental Fig. S1). Of these 126 loci, 107 were found in human (one of which was initially identified in rat) and were used to investigate the distribution and orientation of exonized MIR sequences (Fig. 1). Five of the above 126 cases were (1) supported by ESTs or other indications of expression, (2) flanked by conserved sequence regions facilitating mammalian-wide PCR amplification of fragments not exceeding 2 kb, (3) expressed in available tissue, and (4) embedded in introns, and thereby suitable for intensive phylogenetic reconstruction (Fig. 2; Supplemental Data Set S1). The five MIR cassettes were found in the following genes: (1) neurotrophic tyrosine kinase receptor type 3 gene (NTRK3; GenBank accession no. BT007291), (2) zinc finger protein 639 gene (ZNF639; NM_016331), (3) LAS1-like gene (LAS1L; NM_031206), (4) zinc finger protein 384 gene (Zfp384; AF216807), and (5) cholinergic receptor nicotinic alpha 1 gene (CHRNA1; NM_001039523). The mammalian-wide evolutionary distribution and state of expression of these exonized sequences are summarized in Figure 3.
A natural splice site in MIR elements
Because the presence or acquisition of alternative splice sites recognized by the splicing machinery is crucial for intronic elements to be exonized, we examined the nature of the splice sites flanking the 107 MIR exonizations identified in human by compiling and comparing their potential protein-coding sequences (Fig. 1). Significantly more of the potential exonized MIR elements (64 of 107; χ² test, P < 0.05) were inserted in the antisense orientation, presumably favored by an internal oligopyrimidine tract at the appropriate distance, an additional component of splice sites. Surprisingly, although one might argue that there is a general insertion preference for antisense MIRs, just the opposite is true. We analyzed MIR elements in all human introns and found significantly more MIR elements inserted in the sense orientation (36,120) relative to the transcription of the host gene than in the antisense orientation (34,414; χ² test, P < 0.001). By comparison, of 492,344 intronic human Alu sequences, 222,597 were located in the sense and 269,747 in the antisense orientation. Thus, contrary to MIR intronic insertions, significantly more Alu insertions were located in the antisense orientation (χ² test, P < 0.001).
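The orientation counts above can be checked with a short calculation. The following is a minimal sketch that assumes each comparison is a one-degree-of-freedom goodness-of-fit test against an equal (50:50) split of sense and antisense insertions; the text does not state the null model, so this expectation and the function name are illustrative assumptions.

```python
import math


def chi_square_equal_split(count_a: int, count_b: int):
    """Chi-square goodness-of-fit test against an assumed 50:50 expectation.

    Returns (chi2, p_value) for one degree of freedom.
    """
    total = count_a + count_b
    expected = total / 2
    chi2 = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    # For 1 degree of freedom, the upper-tail probability of a chi-square
    # statistic x equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts quoted in the text.
counts = {
    "exonized MIRs (antisense vs. sense)": (64, 107 - 64),
    "intronic MIRs (sense vs. antisense)": (36120, 34414),
    "intronic Alus (sense vs. antisense)": (222597, 269747),
}

for label, (a, b) in counts.items():
    chi2, p = chi_square_equal_split(a, b)
    print(f"{label}: chi2 = {chi2:.2f}, P = {p:.3g}")
```

Under this assumed null model, all three comparisons fall below the significance thresholds quoted in the text.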
Twenty of the 64 antisense exonized MIR elements feature a MIR-contributed AG splice site that is preceded by a MIR-contributed oligopyrimidine tract (Fig. 1). This configuration is similar to the prevalent 3' splice site in the right arm of antisense-oriented Alu elements (Makalowski et al. 1994
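The configuration described here, an AG dinucleotide preceded by an oligopyrimidine tract, can be illustrated with a simple sequence scan. This is a minimal sketch, not the procedure used in the study: the window length, the pyrimidine threshold, and the function name are hypothetical choices, and real splice-site prediction weighs additional signals such as the branch point.

```python
def find_candidate_3prime_sites(seq: str, tract_len: int = 12, min_pyrimidines: int = 10):
    """Return 0-based positions of AG dinucleotides whose immediately
    upstream window of `tract_len` bases is pyrimidine-rich.

    `tract_len` and `min_pyrimidines` are illustrative thresholds only.
    """
    seq = seq.upper()
    hits = []
    for i in range(tract_len, len(seq) - 1):
        if seq[i:i + 2] != "AG":
            continue
        window = seq[i - tract_len:i]
        pyrimidines = sum(base in "CT" for base in window)
        if pyrimidines >= min_pyrimidines:
            hits.append(i)
    return hits


# Toy sequence (not taken from any MIR element): a 12-nt pyrimidine
# tract directly followed by AG.
toy = "GAGA" + "TTTCTCTTCCTT" + "AG" + "GTAA"
print(find_candidate_3prime_sites(toy))
```

For an antisense-oriented element, the scan would be run on the strand that is transcribed with the host gene, that is, on the reverse complement of the element's consensus orientation.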
Alternative expression of MIR sequences
Constitutive expression of a MIR sequence
For ZNF639, all necessary events from insertion of the MIR element to recruitment of parts of the MIR as a novel protein-coding exon occurred on the phylogenetic branch leading from the common ancestor of amniotes (mammals, reptiles, dinosaurs, and birds) to the mammalian ancestor. However, there are no living animals that diverged in this time period that would enable us to reconstruct, step by step, the successive evolution of molecular changes that were necessary to facilitate the 5' functional splice site and an intact ORF, or to show possible intermediate alternative splice variants.
Mammalian-wide detection of MIR sequence exonizations
From a total of 372 potential exonized MIR elements, we investigated 126 elements with strong indications of exonization in human, mouse, or rat (Supplemental Fig. S1). On five of these, we performed an extensive retrospective analysis reconstructing >100 Myr of mammalian history. To establish a complete mammalian exonization pattern, we limited our analyses to those loci that could be amplified in the greatest number of species, sampling for both DNA and RNA analyses (introns ≤2 kb). From our extensive experience analyzing 160,000 genomic loci in the evolutionary history of mammalian species (Kriegs et al. 2006
Expression of MIR sequences
An intriguing observation is that the natural, MIR-specific splice site is just 3' adjacent to the MIR conserved core region. However, there are hundreds of thousands of MIR elements that are not associated with splice sites but still bear the conserved core region. Given that these elements originated at least 130 Mya, it is still unclear why, in contrast to the rest of the MIR sequence, the core sequences remain so highly conserved. In coding sequences the 5' domain of MIR elements appears to be preferentially exonized (Fig. 1). This bias may be due to the naturally occurring splice site in the consensus sequence, but it might also reflect the fact that RepeatMasker recognizes conserved exonized 5' MIR parts more efficiently than more diverged, but possibly also exonized, 3' parts. The documented persistence of MIR element exonizations appears to be different from that of Alu-SINEs. In our detailed analysis of selected Alu exonizations, we found examples in which apparently functional splice sites in ancestral species were not conserved in all related primate lineages, indicating a loss of functional splice forms (Krull et al. 2005
Time point of MIR exonization
Nearly 30 years ago, Walter Gilbert recognized in alternative splicing a process that allows evolution to try out new solutions without destroying the old (Gilbert 1978
Evolutionary time seems to be a critical factor in establishing essential key mutations required for exonization. Three examples endorse this assumption (Fig. 4): (1) Alu exonizations that are present in specific primate lineages but not in others (RPE2–1, C-rel-2, MTO1–3, survivin) (Krull et al. 2005
The low Ka/Ks value emphasizes the moderate selection pressure acting on the exonized MIR sequence of ZNF639. Interestingly, although we did not detect a MIR element in the corresponding locus in chicken, we found an exonized intronic sequence of the same size in this locus. The same sequence region (one additional triplet with respect to chicken) is also exonized in the ostrich, which represents another major branch of the bird phylogenetic tree. A DNA sequence alignment shows no apparent relationship between the two independently exonized sequences in mammals and birds and only 50% random similarity (Supplemental Fig. S4A). However, although the additional sequences of the proteins do display some similarities in charges and hydrophobicity (Supplemental Fig. S4B,C), protein structural information is necessary to understand if the exonized sequences might play any beneficial role at all in separating neighboring protein domains. The orthologous gene in Xenopus lacks this additional exon, and consequently the protein lacks the extra 45-amino acid segment.
Although Ka/Ks values indicate that the exonized part of the MIR element itself is under moderate selection pressure, the 291-nt adjacent protein coding flanks (parts of exons 5 and 7) show even lower Ka/Ks values (0.17 vs. 0.19 for the exonized MIR). This, at most, suggests a possibly lower selection pressure on the MIR exonized sequence than on the flanks. This difference is even greater when compared to the nine functional zinc finger domains of ZNF639 in exon 7 (582 nt; Ka/Ks = 0.03; data not shown). However, more information about functional domains of the N-terminal region of the protein is necessary to present more conclusive information about a potential spacer function of the exonized MIR sequence. There is also another report of two independent exonizations, although of different lengths and origin, in the same intron of the ADARB1 gene in different taxonomic groups (human and mouse; Slavov and Gardiner 2002
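The comparison of Ka/Ks values reported for ZNF639 can be laid out in a few lines. This is a minimal sketch using only the three ratios quoted in the text; the conventional reading (ratios well below 1 indicate purifying selection, ratios near 1 neutrality, ratios above 1 positive selection) is standard, while the tolerance around 1 and the function name are illustrative assumptions.

```python
# Ka/Ks ratios quoted in the text for ZNF639.
ka_ks = {
    "exonized MIR sequence": 0.19,
    "adjacent coding flanks (291 nt)": 0.17,
    "zinc finger domains (582 nt)": 0.03,
}


def selection_regime(omega: float, tolerance: float = 0.05) -> str:
    """Classify a Ka/Ks ratio; `tolerance` is an arbitrary illustrative band around 1."""
    if omega < 1 - tolerance:
        return "purifying"
    if omega > 1 + tolerance:
        return "positive"
    return "near neutral"


# Rank regions from most to least constrained (lowest ratio first).
ranked = sorted(ka_ks.items(), key=lambda item: item[1])
for region, omega in ranked:
    print(f"{region}: Ka/Ks = {omega:.2f} ({selection_regime(omega)})")
```

All three regions fall in the purifying range; the exonized MIR sequence is the least constrained of the three, and its ratio is close to that of the flanks but several-fold higher than that of the zinc finger domains.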
In theory, the first transition from random insertion to alternative splicing is reversible and is either not at all under purifying selection or under relaxed negative selection (Xing and Lee 2006
The contribution of transposed elements to gene structures is more or less coincidental. Their persistence is usually transient. If not deleted, they fade beyond recognition over longer evolutionary periods. However, a notable fraction of transposed elements escapes transience, for example, by integrating into protein-coding parts of genes, facilitated by internal components providing splice sites and oligopyrimidine tracts and "reprogramming" the splicing system of a targeted gene. Once proven worthy in the struggle for survival, they endure recognizably over hundreds of millions of years and contribute to significant tasks. We have identified and analyzed some of these candidates, thus shedding light on their >100 million-year-old evolutionary histories that show they have clearly stood the test of evolutionary time and persisted in mammalian lineages. We showed that 3' splice site selection in exonized transposed elements is not restricted to Alu elements but seems to be an older and significant mechanism for MIR exonization as well. Alternative splicing of exonized MIRs is exemplarily shown to be a stable process retained >100 Myr in all major groups of mammals. Functional persistence shown by constitutive splicing of an exonized MIR sequence was evidenced for the first time and demonstrates that this evolutionary pathway is not necessarily correlated with genetic disorder, as has been suggested for Alu exonizations by Lev-Maor et al. (2003).
Data selection
To identify MIR elements in protein-coding sequences, we screened a compilation of mammalian mRNAs with transposable element cassettes (Genomic ScrapYard; http://warta.bio.psu.edu/SYDB/database.html), performing independent searches for human, mouse, and rat. Out of 1091, 472, and 201 matching records we selected 314, 38, and 20 cases that included MIR elements in human, mouse, and rat, respectively. These 372 cases were then scrutinized to filter out duplications and other artifacts (246 cases; Supplemental Fig. S1). To search for cases that were suitable for experimental evaluation of their phylogeny, we screened the remaining 126 potential exonizations by applying the following criteria: (1) The MIR-derived sequence should not be located in the first or last protein-coding exon as they usually lack highly conserved flanking regions and hence are difficult to amplify by polymerase chain reaction (PCR). (2) To enable manageable PCRs of up to 2 kb in highly diverged mammalian species, the MIR-derived sequences should be flanked by conserved sequences. Conservation was determined by comparison of available genomic sequences (e.g., of human, mouse, and dog). (3) Indication for exonization should be supplied by such available information as EST data or other published information. (4) Relevant tissues should be available for specific alternative transcripts. Five of the 126 potential exonizations fulfilled all four criteria and were subjected to further phylogenetic examination using PCR, RT-PCR, and sequence analyses. Note that the ScrapYard database consists of GenBank entries from January 15, 2002. It is expected that an updated database would facilitate the recovery of additional cases of MIR exonizations. Available sequence information was obtained from the UCSC Genome Browser (http://genome.ucsc.edu/cgi-bin/hgBlat) and the NCBI trace archive (http://www.ncbi.nlm.nih.gov/blast/tracemb.shtml).
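The selection funnel described above can be sketched as a record tally followed by a four-criteria filter. The counts are those quoted in the text; the `Candidate` class, its field names, and the two example records are hypothetical and serve only to illustrate how the criteria combine.

```python
from dataclasses import dataclass

# Record counts quoted in the text (ScrapYard searches).
mir_records = {"human": 314, "mouse": 38, "rat": 20}
artifacts = 246  # duplications and other artifacts removed

total_mir = sum(mir_records.values())  # MIR-containing records before cleanup
remaining = total_mir - artifacts      # potential exonizations after cleanup


@dataclass
class Candidate:
    """Hypothetical record; one boolean per selection criterion."""
    name: str
    in_internal_coding_exon: bool   # (1) not in the first or last coding exon
    has_conserved_flanks: bool      # (2) conserved flanks, PCR product up to 2 kb
    has_expression_evidence: bool   # (3) ESTs or other published evidence
    tissue_available: bool          # (4) relevant tissue obtainable


def passes_all_criteria(c: Candidate) -> bool:
    return (c.in_internal_coding_exon and c.has_conserved_flanks
            and c.has_expression_evidence and c.tissue_available)


# Made-up records for illustration only, not taken from the data set.
examples = [
    Candidate("candidate_A", True, True, True, True),
    Candidate("candidate_B", True, False, True, True),
]
selected = [c.name for c in examples if passes_all_criteria(c)]

print(total_mir, remaining, selected)
```

The tally reproduces the 372 and 126 figures given in the text; in the study, five of the 126 candidates met all four criteria.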
Experimental procedures for DNA and RNA extraction, PCR amplification, and reverse transcription are given in Supplemental Protocol S1. PCR primers are listed in Supplemental Table S2 and illustrated in Supplemental Fig. S5.
Sequence analyses
Ka/Ks values
We thank Frank Grützner, Rodney L. Honeycutt, Uwe Joite, Jan Ole Kriegs, Jörg Molten, Bernhard Neurohr, Christian Roos, Gertrud Scheele, Heike Weber, and Anja Zemann for providing us with tissue samples and Marsha Bundman for editorial assistance. We thank Valer Gotea for his help in selecting data from the Genomic ScrapYard database and Michael Haberl for his comments. We thank Django Sussman for introducing us to methods for analyzing structural features of the ZNF639 protein. J.S. thanks Matthias Schmitz for all his personal support. This work was supported by the Nationales Genomforschungsnetz (NGFN) (0313358A to J.B. and J.S.), the European Union (EU) (LSHG-CT-2003-503022 to J.B.), and the Deutsche Forschungsgemeinschaft (DFG) (SCHM1469 to J.S. and J.B.).
4 Corresponding authors.
E-mail jueschm@uni-muenster.de; fax 49-251-8352134.
E-mail RNA.world@uni-muenster.de; fax 49-251-8358512.
[Supplemental material is available online at www.genome.org. The sequence data from this study have been submitted to NCBI under accession nos: DQ323592–DQ323661, DQ507223–DQ507235, DQ855908–DQ855913, EF418572, EF422277, EF520683, and EF520684.]
Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.6320607
Bejerano, G., Lowe, C.B., Ahituv, N., King, B., Siepel, A., Salama, S.R., Rubin, E.M., Kent, W.J., and Haussler, D. 2006. A distal enhancer and an ultraconserved exon are derived from a novel retroposon. Nature 441: 87–90.
Bogaerts, S., Vanlandschoot, A., van Hengel, J., and van Roy, F. 2005. Nuclear translocation of
Bridges, C.B. 1936. Genes and chromosomes. Teaching Biol. Nov: 17–23.
Brosius, J. 2005. Echoes from the past—Are we still in an RNP world? Cytogenet. Genome Res. 110: 8–24.
Brosius, J. and Gould, S.J. 1992. On "genomenclature": a comprehensive (and respectful) taxonomy for pseudogenes and other "junk DNA". Proc. Natl. Acad. Sci. 89: 10706–10710.
Gilbert, W. 1978. Why genes in pieces? Nature 271: 501.
Gilbert, N. and Labuda, D. 1999. CORE-SINEs: Eukaryotic short interspersed retroposing elements with common sequence motifs. Proc. Natl. Acad. Sci. 96: 2869–2874.
Gotea, V. and Makalowski, W. 2006. Do transposable elements really contribute to proteomes? Trends Genet. 22: 260–267.
Greenberg, A.J., Moran, J.R., Fang, S., and Wu, C.-I. 2006. Adaptive loss of an old duplicated gene during incipient speciation. Mol. Biol. Evol. 23: 401–410.
Hillman, R.T., Green, R.E., and Brenner, S.E. 2004. An unappreciated role for RNA surveillance. Genome Biol. 5: R8. doi: 10.1186/gb-2004-5-2-r8.
Imoto, I., Yuki, Y., Sonoda, I., Ito, T., Shimada, Y., Imamura, M., and Inazawa, J. 2003. Identification of ZASC1 encoding a Krüppel-like zinc finger protein as a novel target for 3q26 amplification in esophageal squamous cell carcinomas. Cancer Res. 63: 5691–5696.
Kapitonov, V.V., Pavlicek, A., and Jurka, J. 2004. Anthology of human repetitive DNA. In Encyclopedia of molecular cell biology and molecular medicine (ed. R.A. Meyers), pp. 251–305. Wiley-VCH, Weinheim.
Kriegs, J.O., Churakov, G., Kiefmann, M., Jordan, U., Brosius, J., and Schmitz, J. 2006. Retroposed elements as archives for the evolutionary history of placental mammals. PLoS Biol. 4: e91. doi: 10.1371/journal.pbio.0040091.
Krull, M., Brosius, J., and Schmitz, J. 2005. Alu-SINE exonization: En route to protein-coding function. Mol. Biol. Evol. 22: 1702–1711.
Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., et al. 2001. Initial sequencing and analysis of the human genome. Nature 409: 860–921.
Lev-Maor, G., Sorek, R., Shomron, N., and Ast, G. 2003. The birth of an alternatively spliced exon: 3' Splice-site selection in Alu exons. Science 300: 1288–1291.
Makalowski, W., Mitchell, G.A., and Labuda, D. 1994. Alu sequences in the coding regions of mRNA: A source of protein variability. Trends Genet. 10: 188–193.
Mola, G., Vela, E., Fernández-Figueras, M.T., Isamat, M., and Munoz-Mármol, A.M. 2007. Exonization of Alu-generated splice variants in the Survivin gene of human and non-human primates. J. Mol. Biol. 366: 1055–1063.
Nishihara, H., Smit, A.F.A., and Okada, N. 2006. Functional noncoding sequences derived from SINEs in the mammalian genome. Genome Res. 16: 864–874.
Ohno, S. 1970. Evolution by gene duplication. Springer, New York.
Saini, S.S., Tüzün, E., and Christadoss, P. 2005. The cDNA of mouse skeletal muscle transcribe for both isoforms 1 and 2 of acetylcholine receptor
Singer, S.S., Maennel, D.N., Hehlgans, T., Brosius, J., and Schmitz, J. 2004. From "junk" to gene: Curriculum vitae of a primate receptor isoform gene. J. Mol. Biol. 341: 883–886.
Slavov, D. and Gardiner, K. 2002. Phylogenetic comparison of the pre-mRNA adenosine deaminase ADAR2 genes and transcripts: Conservation and diversity in editing site sequence and alternative splicing patterns. Gene 299: 83–94.
Smit, A.F. and Riggs, A.D. 1995. MIRs are classic, tRNA-derived SINEs that amplified before the mammalian radiation. Nucleic Acids Res. 23: 98–102.
Sorek, R., Ast, G., and Graur, D. 2002. Alu-containing exons are alternatively spliced. Genome Res. 12: 1060–1067.
Wagner, E. and Lykke-Andersen, J. 2002. mRNA surveillance: The perfect persist. J. Cell Sci. 115: 3033–3038.
Xing, Y. and Lee, C. 2005. Evidence of functional selection pressure for alternative splicing events that accelerate evolution of protein subsequences. Proc. Natl. Acad. Sci. 102: 13526–13531.
Xing, Y. and Lee, C. 2006. Alternative splicing and RNA selection pressure—Evolutionary consequences for eukaryotic genomes. Nat. Rev. Genet. 7: 499–509.
Yang, Z. 1997. PAML: A program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13: 555–556.
Yang, Z. and Nielsen, R. 2000. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol. Biol. Evol. 17: 32–43.
Zhang, X.H.-F. and Chasin, L.A. 2006. Comparison of multiple vertebrate genomes reveals the birth and evolution of human exons. Proc. Natl. Acad. Sci. 103: 13427–13432.
Received January 24, 2007; accepted in revised form May 8, 2007.