Published online before print
July 15, 2005, 10.1101/gr.3688905 Genome Res. 15:1073-1078, 2005 ©2005 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/05 $5.00
Letter
Gene-breaking: A new paradigm for human retrotransposon-mediated gene evolution¹
Department of Molecular Biology and Genetics, The Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA
The L1 retrotransposon is the most highly successful autonomous retrotransposon in mammals. This prolific genome parasite may on occasion benefit its host through genome rearrangements or adjustments of host gene expression. In examining possible effects of L1 elements on host gene expression, we investigated whether a full-length L1 element inserted in the antisense orientation into an intron of a cellular gene may actually split the gene's transcript into two smaller transcripts: (1) a transcript containing the upstream exons and terminating in the major antisense polyadenylation site (MAPS) of the L1, and (2) a transcript derived from the L1 antisense promoter (ASP) that includes the downstream exons of the gene. Bioinformatic analysis and experimental follow-up provide evidence for this L1 "gene-breaking" hypothesis. We identified three human genes apparently "broken" by L1 elements, as well as 12 more candidate genes. Most of the inserted L1 elements in our 15 candidate genes predate the human/chimp divergence. If indeed split, the transcripts of these genes may in at least one case encode potentially interacting proteins, and in another case may encode novel proteins. Gene-breaking represents a new mechanism through which L1 elements remodel mammalian genomes.
Transposable elements are neither "junk DNA" nor mere curiosities to be categorized taxonomically and relegated to dusty catalogs; rather, they can affect gene expression in important ways and are a dynamic and significant part of our evolutionary history (Boissinot et al. 2000
L1 elements, retrotransposons comprising nearly 17% of the human genome (Smit 1996
Han et al. (2004
The "gene-breaking" model
Bioinformatic evidence for the predicted transcripts
Next, we identified upstream transcripts that terminate in the MAPS. The last 485 residues of L1 were used as a query in an unfiltered BLAST search against the human EST database. Over 4900 ESTs with BLAST alignment e-values <1e-10 were identified this way. Those for which the alignments terminated 3' of the MAPS (between residues 5551 and 5562) were further analyzed. We identified eight transcripts (six EST sequences plus one gene record and a predicted gene record) that terminate exactly in the MAPS (Fig. 2). Six of these are clearly polyadenylated and thus unambiguously represent transcripts produced by termination at the MAPS. The other two come from libraries obtained by using an oligo dT primer and presumably also contained poly(A) tails that were truncated in the database records. Our search for 3' antisense L1 sequence in the RefSeq database also revealed a cellular gene, kinetochore protein Spc25, that is normally polyadenylated at the MAPS of an L1 sequence lying distal to the last exon; its last 457 nucleotides are identical to the first 457 3' antisense nucleotides of L1rp. The L1 element partially contained in Spc25 is 95% identical to L1rp. The existence of multiple Spc25 transcripts with this structure in the database provides additional evidence that the MAPS activity is present in endogenous L1 elements. We performed another search of the EST database with the entire L1 sequence in order to detect any other mRNA termination sites. L1 positions 5170, 5172, and 5173 form a noticeable cluster of antisense EST termination sites (359 ESTs on the antisense strand). These positions are situated 68 bases downstream of a potential poly(A) signal, on the antisense strand. None of these ESTs contains any non-L1 sequence, so none can be assigned to a particular genomic location. However, these data indicate that there may be more than one cryptic polyadenylation signal in the antisense L1 for premature termination of a cellular transcript.
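The filtering step described above can be illustrated with a minimal sketch. It assumes that the e-value threshold acts as an upper bound, that each alignment has already been converted to full-length L1 coordinates, and that a hit counts as MAPS-terminated when either alignment endpoint falls within residues 5551 to 5562. The `Hit` record and the function names are hypothetical and are not taken from the original analysis.

```python
from typing import Iterable, List, NamedTuple

MAPS_WINDOW = (5551, 5562)  # L1 residues bounding the MAPS, as quoted in the text
EVALUE_CUTOFF = 1e-10       # assumed to be an upper bound on acceptable e-values


class Hit(NamedTuple):
    """One EST alignment to L1 (hypothetical record layout)."""
    est_id: str
    l1_start: int  # alignment endpoints, assumed already in full-length L1 coordinates
    l1_end: int
    evalue: float


def ends_in_maps(hit: Hit) -> bool:
    """True if either alignment endpoint falls inside the MAPS window."""
    low, high = MAPS_WINDOW
    return any(low <= pos <= high for pos in (hit.l1_start, hit.l1_end))


def maps_terminated(hits: Iterable[Hit]) -> List[Hit]:
    """Keep significant hits whose alignment stops within the MAPS window."""
    return [h for h in hits if h.evalue < EVALUE_CUTOFF and ends_in_maps(h)]


# Made-up example records, for illustration only.
example = [
    Hit("est_a", 5900, 5555, 1e-50),  # significant, ends in the window
    Hit("est_b", 5900, 5700, 1e-50),  # significant, ends elsewhere
    Hit("est_c", 5900, 5555, 1e-3),   # ends in the window, weak alignment
]
print([h.est_id for h in maps_terminated(example)])
```

Both endpoints are tested because which one marks the transcript 3' end depends on strand conventions that the text does not specify. Candidates passing such a filter would still need manual inspection for poly(A) tails, as was done for the eight transcripts above.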
Although the data presented thus far confirm that the L1 ASP and the L1 MAPS are used in human genes, they do not provide complete evidence for a true gene-breaking event; i.e., a single gene which gives rise to both an ASP transcript and a MAPS-terminated transcript. To address this, we searched for potential MAPS-terminated ESTs upstream of the 15 already identified ESTs originating from intronic L1 ASPs. To be considered a potential MAPS-terminated transcript, an EST would ideally extend from the adjacent exon upstream of the antisense L1, into intronic sequence, and terminate in the L1 MAPS. Realistically, it would be unlikely to find such an EST in the database since the sequencing reads generally are not long enough to contain all of this information.
The 15 genes identified, many of them named and characterized transcripts, were analyzed in detail (Table 1). Each gene contains a presumably young L1, nearly identical to L1rp, in the antisense orientation. For each gene we have identified one EST clearly initiating in the 5' antisense L1 sequence and containing downstream exons. ESTs terminating upstream of the intron containing the L1 are present for all genes, but these may or may not represent MAPS-terminated sequences, as the recorded sequences do not extend into the intronic or antisense L1 sequences. All L1s in the 15 examples are full-length (just over 6 kb), have over 96% identity to L1rp, and are flanked by target site duplications, features indicating relatively recent insertion. All ESTs derived from the L1 ASP match canonical splice site sequences at the junction of the L1 sequence and the sequence from the joined downstream exon, consistent with the examples described by Nigumann et al. (2002
MAPS-terminated transcripts
Thus, ASP-directed sequences were in hand for all 15 candidate genes, but none had an unambiguous upstream transcript clearly demonstrating termination at the intronic antisense L1s. We designed and performed an RT-PCR screen to search for such transcripts (Fig. 3A). Total RNA extracts from four different human cell lines (HeLa, HCT116, 293, and NCI-H69) were first subjected to cDNA synthesis by using a polyT-L1 chimeric primer capable of priming only on mRNAs polyadenylated at the MAPS. The resultant cDNA was amplified with a common intronic primer designed to hybridize to L1 sequences just upstream of the MAPS, and a gene-specific primer, which anneals to an exon upstream of the exon just 5' to the L1-containing intron. The primers were designed to detect products containing antisense L1 sequence from the end of L1 up to the MAPS, the intronic segment upstream of the L1, and transcript sequence up to two exons upstream, so that sequencing will reveal whether the transcript has been spliced. This primer design, therefore, enables discrimination of PCR amplifications of the upstream mRNA of interest from genomic DNA that might contaminate the extracted RNA samples, as well as full-length mRNA. Using this strategy, we successfully amplified upstream transcripts for three of the 15 L1-split candidate genes; the strategy was successful in these three cases most likely because the short distance from the MAPS to the nearest 5' exon-intron junction facilitates an efficient RT-PCR reaction. Secernin 3 (NM_024583 [GenBank]) has the shortest distance between the cellular exon and the L1 element. A full-length antisense L1 is located in intron 5 of this gene. In HeLa and HCT116 cells, RT-PCR with the secernin 3-specific primer annealing to exon 4 provided a single major band migrating at the expected position (Fig. 3B).
Sequencing clones of the PCR product showed that intron 4 had been removed by splicing but exon 4 and exon 5 sequences, as well as the 5' segment of the intron upstream of the L1, were all present in the expected configuration (Fig. 3C). This, together with the previously found ASP-derived transcript (AA226814 [GenBank] ), demonstrates gene-breaking in the human secernin 3 gene (Fig. 3D).
RefSeq NM_004866 [GenBank] contains a full-length L1 element in intron 7; this element is
RefSeq NM_014960 [GenBank] is split by an antisense L1 in intron 10. This gene actually contains two L1 elements in intron 10: a 5'-truncated L1 (containing nucleotides 5393 onward) in the antisense orientation and, downstream of it, a full-length L1 element also in the antisense orientation. We identified an upstream transcript containing part of exon 9, all of exon 10 (and none of intron 9), and intron 10 up to the MAPS in the truncated L1. The ASP-derived transcript identified by database searches commences in the full-length L1 downstream (Fig. 3F). The existence of the transcript terminating at the MAPS of the truncated L1 demonstrates that even partial antisense L1 elements can truncate cellular mRNAs as predicted from transfection experiments. The other 12 candidates contain L1 elements that are further away from the nearest upstream intron-exon boundary (7-100 kb). For technical reasons, these lengthy mRNAs are more difficult to amplify.
Confirming the ASP product
Predicted protein products
We find strong evidence for gene-breaking in a variety of cell lines and tissues with both bioinformatics and experimental assays. The phenomenon clearly produces separate mRNA transcripts from at least three genes, and it is possible that the transcripts have functional significance apart from their RNA forms. Protein products from such split genes could in principle maintain separate structurally stable and functional domains, and it is possible that such proteins survive in the cellular environment long enough to play roles in the functions of their parental genes. The interactions that normally stabilize the folded structure of peptides could well facilitate the interaction of the two protein fragments to form a protein of normal or slightly altered function. In one of our examples, BCAS3 (NM_017679 [GenBank]), gene-breaking has produced a subfragment of the original protein. In another example, MET (NM_000245 [GenBank]), gene-breaking could produce functionally interacting proteins. The hepatocyte growth factor receptor gene, MET, contains an L1 in the antisense direction in its second intron, shown in Figure 5A. The EST, CB98851, confirms that this L1 does express the downstream exons of MET from its antisense promoter.
As seen in Figure 5B, the MET product is synthesized as a single protein which is predicted to be proteolytically cleaved into the
We sought to identify more examples in which an antisense L1 element might affect the protein products of the host gene. This can happen because the mature transcripts terminated by L1 MAPS and produced by L1 ASP contain intronic sequence at the 3' end and 5' end, respectively. In both cases, this can cause the open reading frame to extend into intronic sequence, creating an altered protein isoform. We looked at all upstream transcripts to determine whether extension of the transcript into intronic sequence could affect the encoded amino acid sequence. This analysis revealed that all transcripts potentially encode cellular proteins truncated at their carboxy termini, containing the sequence from the upstream exons attached to adjacent intron-derived sequences; these extensions added 1-85 intron-encoded amino acids to the C-terminus of the encoded upstream proteins. The presence of part of an L1 element in either the upstream or downstream transcripts may alter splicing of the transcript to produce translation products not otherwise seen.
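The read-through analysis described above can be sketched as follows: starting from the coding sequence of the upstream exons, continue reading in frame into the retained intronic sequence and count the codons translated before the first stop codon. This is an illustrative sketch only; the function name and the toy sequences are hypothetical, and it assumes the supplied exonic sequence begins in frame and contains no in-frame stop.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def intron_extension_length(exon_cds: str, intron: str) -> Optional[int]:
    """Count codons that include intronic bases before the first in-frame stop.

    exon_cds is assumed to start at codon position 1 and to lack in-frame stops.
    Returns None when no in-frame stop occurs within the supplied intron sequence.
    """
    seq = (exon_cds + intron).upper()
    # Start of the first codon that contains at least one intronic base.
    first = len(exon_cds) - len(exon_cds) % 3
    count = 0
    for i in range(first, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return count
        count += 1
    return None


# Toy sequences, not taken from any gene discussed here.
print(intron_extension_length("ATGGCC", "GTAAGTTAAGG"))
```

A codon that straddles the exon-intron junction is counted as intron-encoded here; a stricter definition would count only codons lying wholly within the intron.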
More novel protein structures produced by the L1 antisense promoter
In addition to the products described above, other novel protein segments could also be encoded by the downstream transcripts derived from the L1 ASP. Most of the L1 ASP-derived ESTs have ORFs encoding products ranging in size from four to 88 amino acids, most consisting entirely of L1 sequence. Alternatively, novel proteins could potentially result from the fusion of translated L1 sequences with cellular proteins. They could also be translated from ATG codons internal to the target gene, which would create a shorter gene product. For example, the transcript AJ518836 [GenBank], driven by the L1 ASP in the BCAS3 gene (NM_017679 [GenBank]), is the product of splicing L1 antisense sequence to exon 3 of BCAS3. In the frame defined by the first ATG, it encodes a 113-amino acid protein derived partly from L1 and partly from BCAS3, but in the frame defined by the second ATG, it is in the same frame as the full gene transcript and encodes a C-terminal subfragment of the BCAS3 protein. Remarkably, this smaller protein is identical to the previously identified Maab3 protein (CAD57724 [GenBank]), or "metastasis-associated antigen of breast cancer," from an unpublished screen of overexpressed proteins in the metastatic breast cancer cell line MCF7. Both ATG sequences are embedded in reasonable Kozak sequences and could potentially be used in vivo. In this example, the L1 ASP-derived transcript, formed by splicing L1 sequence to internal exons of a cellular gene, encodes the Maab3 protein.
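To make the two-frame reading of such a transcript concrete, the sketch below translates a sequence from its first and second ATG using the standard genetic code, stopping at the first in-frame stop codon in each case. The function names and the toy sequence are hypothetical; the sketch does not evaluate Kozak context and is not the procedure used to analyze AJ518836.

```python
from itertools import product
from typing import List, Tuple

# Standard genetic code, codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate_from(seq: str, start: int) -> str:
    """Translate from `start` until the first in-frame stop or the end of seq."""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X marks an unrecognized codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def orfs_from_first_two_atgs(seq: str) -> List[Tuple[int, str]]:
    """Return (ATG position, translation) for the first two ATGs in seq."""
    seq = seq.upper()
    results = []
    pos = seq.find("ATG")
    while pos != -1 and len(results) < 2:
        results.append((pos, translate_from(seq, pos)))
        pos = seq.find("ATG", pos + 1)
    return results


# Toy sequence in which the two ATGs lie in different reading frames.
print(orfs_from_first_two_atgs("CCATGGCATGAAATAGTT"))
```

In the toy sequence the two start codons define different frames and therefore unrelated peptides, which mirrors the situation described for the BCAS3 transcript, where only the second ATG is in frame with the full gene transcript.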
The antisense promoter produces a variety of mRNA types
To address whether L1 ASP or MAPS activities are tissue-specific, we noted the tissue distribution of the cells generating ESTs described here. The ESTs are derived from a wide range of cell types, spanning nearly every tissue as well as both cancerous and noncancerous cells.
Comparison of human and chimpanzee genes
Conclusions
Cell culture
HeLa and 293 cells were gifts from J. Moran (The University of Michigan Medical School) and S. Blackshaw (The Johns Hopkins University School of Medicine), respectively, and were maintained in DMEM supplemented with 10% FBS and 0.5 mg/mL Normocin (InvivoGen). HCT116 and NCI-H69 cells were purchased from ATCC and maintained in McCoy's 5A and RPMI (Invitrogen) supplemented with 10% FBS and 0.5 mg/mL Normocin (InvivoGen).
RT-PCR cloning
The authors wish to thank Dave Valle and Jeremy Nathans for helpful comments on the manuscript and Brian Greenlee for help with the figures. This work was supported in part by NIH grant CA16519 to J.D.B. and training grant 5 T32 CA09139 to S.J.W.
[Supplemental material is available online at www.genome.org.] Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.3688905. Article published online before print in July 2005.
2 These authors contributed equally to this work.
3 Corresponding author.
Altschul, S., Madden, T., Schaffer, A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389-3402.
Boissinot, S., Chevret, P., and Furano, A. 2000. LINE-1 retrotransposon evolution and amplification in recent human history. Mol. Biol. Evol. 17: 915-928.
Brouha, B., Schustak, J., Badge, R., Lutz-Prigge, S., Farley, A., Moran, J., and Kazazian Jr., H. 2003. Hot L1s account for the bulk of retrotransposition in the human population. Proc. Natl. Acad. Sci. 100: 5280-5285.
Druker, R., Bruxner, T., Lehrbach, N., and Whitelaw, E. 2004. Complex patterns of transcription at the insertion site of a retrotransposon in the mouse. Nucleic Acids Res. 32: 5800-5808.
Gilbert, N., Lutz-Prigge, S., and Moran, J. 2002. Genomic deletions created upon LINE-1 retrotransposition. Cell 110: 315-325.
Han, J. and Boeke, J. 2004. A highly active synthetic mammalian retrotransposon. Nature 429: 314-318.
Han, J., Szak, S., and Boeke, J. 2004. Transcriptional disruption by the L1 retrotransposon and implications for mammalian transcriptomes. Nature 429: 268-274.
International Human Genome Sequencing Consortium. 2001. Initial sequencing and analysis of the human genome. Nature 409: 860-921.
Kazazian Jr., H. 2004. Mobile elements: Drivers of genome evolution. Science 303: 1626-1632.
Landry, J., Mager, D., and Wilhelm, B. 2003. Complex controls: The role of alternative promoters in mammalian genomes. Trends Genet. 19: 640-648.
Meischl, C., Boer, M., Ahlin, A., and Roos, D. 2000. A new exon created by intronic insertion of a rearranged LINE-1 element as the cause of chronic granulomatous disease. Eur. J. Hum. Genet. 8: 697-703.
Myers, J., Vincent, B., Udall, H., Watkins, W., Morrish, T., Kilroy, G., Swergold, G., Henke, J., Henke, L., Moran, J., et al. 2002. A comprehensive analysis of recently integrated human Ta L1 elements. Am. J. Hum. Genet. 71: 312-326.
Nigumann, P., Redik, K., Mätlik, K., and Speek, M. 2002. Many human genes are transcribed from the antisense promoter of L1 retrotransposon. Genomics 79: 628-634.
Salem, A., Myers, J., Otieno, A., Watkins, W., Jorde, L., and Batzer, M. 2003. LINE-1 preTa elements in the human genome. J. Mol. Biol. 326: 1127-1146.
Schwahn, U., Lenzner, S., Dong, J., Feil, S., Hinzmann, B., van Duijnhoven, G., Kirschner, R., Hemberger, M., Bergen, A., Rosenberg, T., et al. 1998. Positional cloning of the gene for X-linked retinitis pigmentosa 2. Nat. Genet. 19: 327-332.
Smit, A. 1996. The origin of interspersed repeats in the human genome. Curr. Opin. Genet. Dev. 6: 743-748.
Speek, M. 2001. Antisense promoter of human L1 retrotransposon drives transcription of adjacent cellular genes. Mol. Cell. Biol. 21: 1973-1985.
Swergold, G. 1990. Identification, characterization, and cell specificity of a human LINE-1 promoter. Mol. Cell. Biol. 10: 6718-6729.
Symer, D., Connelly, C., Szak, S., Caputo, E., Cost, G., Parmigiani, G., and Boeke, J. 2002. Human L1 retrotransposition is associated with genetic instability in vivo. Cell 110: 327-338.
van de Lagemaat, L., Landry, J., Mager, D., and Medstrand, P. 2003. Transposable elements in mammals promote regulatory variation and diversification of genes with specialized functions. Trends Genet. 19: 530-536.
Wheelan, S., Church, D., and Ostell, J. 2001. Spidey: A tool for mRNA-to-genomic alignments. Genome Res. 11: 1952-1957.
Received January 12, 2005; accepted in revised form May 18, 2005.