Published online before print
December 6, 2006, 10.1101/gr.5542607 Genome Res. 17:33-41, 2007 ©2007 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/07 $5.00
Letter
Novel retrotransposon analysis reveals multiple mobility pathways dictated by hosts
Department of Biological Sciences, Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, Midori-ku, Yokohama 226-8501, Japan
Autonomous non-long-terminal-repeat retrotransposons (NLRs) proliferate by retrotransposition via coordinated reactions of target DNA cleavage and reverse transcription, a mechanism called target-primed reverse transcription (TPRT). Whereas this mechanism guarantees the covalent attachment of the NLR and its target site at the 3' junction, the mechanisms for joining at the 5' junction have been conjectural. To better understand the retrotransposition pathways, we analyzed target-NLR junctions of zebrafish NLRs with a new method of identifying genomic copies that reside within other transposons, termed "target analysis of nested transposons" (TANT). Application of the TANT method revealed various features of the zebrafish NLR integrants; for example, half of the integrants carry extra nucleotides at the 5' junction, which is in stark contrast to the major human NLR, LINE-1. Interestingly, in a cell culture assay, integrants produced by retrotransposition of the zebrafish NLR in heterologous human cells did not bear extra 5' nucleotides, indicating that the choice of the 5' joining pathway is affected by the host. Our results suggest that several pathways exist for NLR retrotransposition and argue in favor of host protein involvement. With genomic sequence information accumulating exponentially, our data demonstrate the general applicability of the TANT method for the analysis of a wide variety of retrotransposons.
Non-long-terminal-repeat retrotransposons (NLRs), including long interspersed nuclear elements (LINEs), comprise a substantial portion of many eukaryotic genomes (Arkhipova and Meselson 2000).
Upon retrotransposition of the major mammalian NLRs, L1s, a target-site sequence of 8–20 base pairs (bp) is duplicated at each L1 end (i.e., target-site duplication, TSD) (Moran et al. 1996). In contrast to L1s, some NLRs are not associated with such obvious TSDs. Thus, reliable identification of the NLR-target junctions of their genomic copies remains difficult, although genomic sequence information has been accumulating in recent years. We reasoned that this difficulty could be overcome by collecting genomic NLR copies that reside within other transposons, because the preintegration sequence can then be inferred from the consensus sequence of the host transposon. Hereafter, we refer to this collection strategy as target analysis of nested transposons, or the TANT method.
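The collection step described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the pipeline used in the study: the `Hit` record, its field names, and the `find_nested` helper are hypothetical, and the 200-bp adjacency limit is borrowed from the selection criteria listed in the Methods. The idea is simply to scan repeat annotations in genomic order and keep every NLR copy whose two neighbors are fragments of the same host transposon in the same orientation.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Hit:
    """One repeat annotation. The layout is hypothetical, loosely modeled
    on the kind of information a repeat-annotation table provides."""
    chrom: str
    start: int       # genomic start (0-based)
    end: int         # genomic end (exclusive)
    strand: str      # '+' or '-'
    name: str        # repeat family name
    cons_start: int  # first consensus position covered
    cons_end: int    # last consensus position covered


def find_nested(hits: List[Hit], nlr_names: Set[str],
                max_gap: int = 200) -> List[Tuple[Hit, Hit, Hit]]:
    """Return (left_host, nlr, right_host) triples in which an NLR copy is
    flanked on both sides by fragments of one host transposon family."""
    ordered = sorted(hits, key=lambda h: (h.chrom, h.start))
    found = []
    # Slide a window of three consecutive annotations along the genome.
    for left, mid, right in zip(ordered, ordered[1:], ordered[2:]):
        if mid.name not in nlr_names:
            continue
        same_host = (left.chrom == mid.chrom == right.chrom
                     and left.name == right.name
                     and left.strand == right.strand
                     and left.name not in nlr_names)
        # Host and NLR ends must lie close together at both junctions.
        close = (mid.start - left.end < max_gap
                 and right.start - mid.end < max_gap)
        if same_host and close:
            found.append((left, mid, right))
    return found
```

Each returned triple gives the two host fragments whose consensus coordinates can then be compared to infer the preintegration sequence around the NLR copy.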
The L2 clade of NLRs is represented in mammals by the now-extinct LINE-2 (L2), whereas the dominant active NLR clade in mammals is L1. The zebrafish genome harbors at least three active NLRs of the L2 clade: CR1-1_DR, CR1-2_DR, and CR1-3_DR, which are also called ZfL2-1, ZfL2-2, and ZfL2-3, respectively (Kapitonov and Jurka 2003). In this report, we used the TANT method to characterize genomic copies of these L2-clade NLRs. Bioinformatic analyses of their target-NLR junctions, in combination with the analysis of CR1-2_DR experimentally retrotransposed in HeLa cells, revealed previously unrecognized consequences of their retrotransposition and suggest the involvement of host functions in joining the NLR and target DNAs. Our data thus demonstrate the general applicability of the TANT method to the study of the mobility pathways of a wide variety of transposons.
Collection of human L1s within transposons
To test the effectiveness of the TANT method, we first collected genomic copies of human L1 that reside within other transposons, because the target-site junctions of this NLR have been well characterized. Among the 47 5'-truncated and 18 full-length genomic copies analyzed, L1 copies are predominantly associated with TSDs of 8–20 bp and share a microhomology (MH) stretch with their target sites at the 3' junctions (Table 1; Fig. 2A). At the 5' junction, many of the 5'-truncated copies (66%) have MH stretches and only a few copies (9%) are associated with an insertion of nucleotides of unknown origin, whereas many full-length L1s (72%) contain 2–16 extra nucleotides (Table 1). The length distributions of TSDs and MH stretches (Fig. 2B,C,D), the target preference for 5'-TTAAAA-3' (data not shown), and the distinct 5' features of 5'-truncated and full-length L1s are consistent with previous observations of L1 copies collected and chosen based on the presence of an obvious TSD (Szak et al. 2002).
Target sites of CR1_DRs within transposons
We next applied the TANT method to analyze CR1_DRs in the zebrafish genome sequence, because they are vertebrate non-L1 NLRs whose retrotransposition has been studied experimentally (Sugano et al. 2006).
A very small fraction (1%–6%) is blunt inserted, whereas a larger fraction (18%–28%) is associated with target-site truncations (TSTs) (Table 2; Supplemental Fig. S1). The truncations range from 1 to 549 bp, as estimated by using the consensus sequences of the host transposons as a guide (Fig. 3D). Interestingly, they show a bimodal distribution, discriminating short and long TSTs (≤12 bp and ≥13 bp, respectively), suggesting that two different mechanisms underlie the target truncation upon retrotransposition (see Discussion). For all CR1_DRs, compilation of target sequences around the insertion sites does not indicate any strong nucleotide preference at any position, although a downstream sequence of several base pairs is somewhat AT-rich in targets of CR1-1_DR and CR1-2_DR (Supplemental Fig. S2). This very weak preference could be explained either by some degree of cleavage specificity of the NLR-encoded ENs or by selection for cleavage products that can anneal with the NLR RNAs to start TPRT (see below). In any event, any nucleotide is allowed at almost all positions, leading to our conclusion that all CR1_DRs have very little sequence specificity for their integration targets.
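The way a target-site alteration can be read off the host consensus may be illustrated as follows. This is a minimal sketch under assumptions, not the authors' code: the function name and arguments are hypothetical, both flanking host fragments are assumed to be mapped onto the host consensus in the same orientation, and indels within the host copy itself are ignored. The 12-bp boundary between short and long TSTs follows the bimodal distribution described above.

```python
from typing import Tuple


def classify_target_site(left_cons_end: int, right_cons_start: int,
                         short_max: int = 12) -> Tuple[str, int]:
    """Classify the target-site alteration of a nested insertion.

    left_cons_end    -- host-consensus position of the last host base 5' of the NLR
    right_cons_start -- host-consensus position of the first host base 3' of the NLR
    Returns (label, length in bp).
    """
    # Number of consensus positions missing between the two flanks
    # (negative when the flanks overlap, i.e., the target is duplicated).
    delta = right_cons_start - left_cons_end - 1
    if delta == 0:
        return ("blunt", 0)
    if delta < 0:
        return ("TSD", -delta)
    label = "short TST" if delta <= short_max else "long TST"
    return (label, delta)


# Flanks overlapping by consensus positions 195-200 imply a 6-bp TSD.
print(classify_target_site(200, 195))
# Flanks separated by consensus positions 201-205 imply a 5-bp truncation.
print(classify_target_site(200, 206))
```

Because the preintegration sequence is inferred from a consensus, lengths obtained this way are estimates, as noted for the truncation lengths above.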
Features of the 3' junctions of CR1_DRs
On the other hand, we also found that some copies (14%–24%) are associated with an insertion of nucleotides (1–91 bp) at the 3' junction (Fig. 3F; Table 2; Supplemental Fig. S1). Because we could not find putative original NLR copies carrying the extra 3' nucleotides in the current database, these extra 3' nucleotides do not seem to be products of 3' transduction, a retrotransposition event in which the new copy is accompanied by the 3'-flanking region of the original copy. Thus, the extra 3' nucleotides were presumably added after transcription of the NLR RNA. For one integrant, we found a genomic region (on a different chromosome) that is 87% identical to its extra 3' nucleotides (61 bp). For seven examples of extra 3' nucleotides (15–22 bp), we found homologous EST sequences (with 93%–100% identity). These extra nucleotides may have been generated by use of DNA or RNA templates. However, we could not find potential templates for seven of the 15 examples of extra 3' nucleotides that are 15 bp or longer. It is therefore likely that, in general, the extra 3' nucleotides were generated by nontemplated DNA synthesis or by use of very short template regions. It has been reported that R2- and L1-encoded RTs can add nucleotides without a template or with a very short region(s) of a template before initiating the canonical RNA-templated TPRT (Luan and Eickbush 1995).
Features of the 5' junctions of CR1_DRs
Some of the integrants (29%–45%) had MH at the 5' junctions (Fig. 3A; Table 2; Supplemental Fig. S1). The length distributions of these MHs differ significantly from that expected for two random sequences (Fig. 3G). Thus, the MHs may not have been generated by chance, but by a mechanism integral to certain retrotransposition pathways (see Discussion). In summary, CR1_DRs carry either extra nucleotides or MH at their 5' junctions.
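To make the comparison with "two random sequences" concrete, one simple null model can be written down. The model and the function names below are assumptions for illustration; the text does not state how the expected distribution in Fig. 3G was calculated. If bases are independent and match with probability 1/4, a junction shows exactly k bp of MH when k consecutive positions match and the positions on both sides mismatch, and the k matching bases can be split between the two sides of the breakpoint in k + 1 ways, giving P(k) = (k + 1)(1/4)^k(3/4)^2.

```python
from typing import List


def expected_mh_fraction(k: int, p_match: float = 0.25) -> float:
    """Probability of exactly k bp of junctional microhomology between two
    random sequences (assumed null model: independent bases, match
    probability p_match, a mismatch required on each side of the stretch)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return (k + 1) * p_match ** k * (1 - p_match) ** 2


def expected_counts(n_junctions: int, max_k: int) -> List[float]:
    """Expected number of junctions with 0..max_k bp of MH among n_junctions."""
    return [n_junctions * expected_mh_fraction(k) for k in range(max_k + 1)]


# Expected numbers of junctions with 0, 1, 2, ... bp of MH among 100 junctions.
for k, n in enumerate(expected_counts(100, 5)):
    print(k, round(n, 2))
```

Under this model, most chance junctions have 0 or 1 bp of MH, so an excess of longer stretches in the observed data would argue against chance alone.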
Insertion of extra 3' nucleotides and truncation of target sites are inter-related
Copies of the dual-ORF CR1_DRs with long TSTs are biased toward insertion of extra 5' nucleotides
Finally, we analyzed the relationship between the 5' junction and target-site alterations (Fig. 4C; Supplemental Table S2). We found some interrelation: Most long TST-associated integrants of CR1-1_DR and CR1-3_DR (82% and 77%, respectively) carry the extra 5' nucleotides. On the other hand, we did not find any interrelation for the CR1-2_DR integrants. It is worth noting that CR1-2_DR has a single ORF, whereas CR1-1_DR and CR1-3_DR have two ORFs (Fig. 1). Thus, although the exact mechanism(s) for generation of the extra 5' nucleotides is unknown at present, the ORF1 proteins may be involved in the pathways for joining the 5' junction during a type of retrotransposition in which target sites suffer extensive truncation (see Discussion).
Retrotransposition of CR1-2_DR in HeLa cells does not create integrants with extra 5' nucleotides
The TANT method for genome-wide analysis of transposon integrants
We developed the TANT method for large-scale analysis of boundaries of genomic NLR copies. This method takes advantage of the fact that we can mine genomic databases for NLR copies residing in other transposons and for consensus sequences of transposons; we can then use the information to determine the junction sites of nested NLRs. It is formally possible that secondary DNA rearrangements could have occurred at these junctions after retrotransposition, leading to misinterpretation. This possibility can, however, be minimized by selecting younger elements. Indeed, the statistics for the L1 elements collected and analyzed by the TANT method (Table 1) are very consistent with previous reports, thereby validating the method. Given that genomic sequence information for many kinds of higher eukaryotes has been expanding steadily, the TANT method will be generally applicable to investigation of many kinds of transposons, as we have shown here for L1 and CR1_DRs.
Retrotransposition of CR1_DRs predominantly generates a short TSD
Target-site truncations as the outcome of noncanonical TPRT reactions
A fraction (18%–28%) of the integrants we identified is associated with TSTs (Table 2). Careful analysis revealed bimodal distributions of the TST lengths (Fig. 3D). For short TSTs, it has been proposed that nicking of the second strand at several base pairs upstream of the first nicking site and subsequent use of that nick to prime the sense-strand synthesis results in loss of the segment between the two nicks (Gilbert et al. 2002).
Implications for NLR-mediated DNA end joining
Pathways for joining the 5' junction are dictated by the host environments
The most surprising finding in this study is that half or more of the CR1_DR integrants carry extra 5' nucleotides (Table 2). The nucleotides were likely added without a template during retrotransposition. The nucleotides could be synthesized from the 3' end of nascent NLR cDNA generated by incomplete reverse transcription, as proposed by Babushok et al. (2006).
Formally, the extra 5' nucleotides could be synthesized by either NLR RTs or host DNA polymerases. If they were synthesized by NLR RTs, the tendency of the 5'-junction features would not be altered by changing the host. Unlike copies in the zebrafish genome, however, no CR1-2_DR integrant experimentally retrotransposed in human cells carried extra 5' nucleotides (Fig. 5). Rather, all integrants were associated with an MH stretch or direct joining, resembling the statistics of L1 retrotransposed in human cells. These results suggest that alternative pathways account for the integrants associated with MH at the 5' junction and those with extra 5' nucleotides, and that the pathway utilization is directed by the host. It is possible that endogenous L1 proteins affected the CR1-2_DR retrotransposition pathways in HeLa cells, but this is less likely because it would not explain why the 5' features differ between full-length and 5'-truncated L1 insertions in the human genome. Rather, host factors are likely involved in the processes for joining the target and NLR DNAs at the 5' junction. Consistent with this idea, we have recently shown that zebrafish L1-clade NLRs are approximately three times more frequently associated with extra 5' nucleotides than human L1s are (Ichiyanagi and Okada 2006).
Collection of transposon-harboring NLR copies and identification of target-NLR junctions
We downloaded RepeatMasker tables of interspersed repeats for human (hg17, May 2004) and zebrafish (danRer2, June 2004) genomes from the UCSC Genome Browser (Hinrichs et al. 2006). [...] χ² test), (5) the insertions and deletions of the host transposons in comparison with the consensus sequences are <10%, (6) both fragments of the host transposon are >50 bp, and (7) the ends of the host transposon and NLR sequences are located <200 bp apart at both junctions. For 5'-truncated L1s, we selected integrants where L1 copies showed ≤3.8% divergence from the consensus sequences. For full-length L1s, we selected integrants where L1 copies showed ≤5.1% divergence. When duplicated fragments were collected, we used and counted only one of them. Complete data sets are available in the Supplemental files (L1.txt, CR11_DR.txt, CR12_DR.txt, and CR13_DR.txt).
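The three selection criteria that survive in the text above, (5) through (7), translate directly into a filter. The sketch below is illustrative only: the `Candidate` record and its field names are hypothetical, the earlier criteria are not reproduced because they are not available here, and how each quantity was actually measured is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Summary of one nested NLR copy (hypothetical field names)."""
    host_indel_fraction: float  # indels in host fragments vs. consensus, 0-1
    left_host_len: int          # length of the 5'-flanking host fragment (bp)
    right_host_len: int         # length of the 3'-flanking host fragment (bp)
    gap_5prime: int             # distance between host and NLR ends, 5' junction (bp)
    gap_3prime: int             # distance between host and NLR ends, 3' junction (bp)


def passes_criteria_5_to_7(c: Candidate) -> bool:
    """Apply criteria (5)-(7) exactly as stated in the text."""
    few_indels = c.host_indel_fraction < 0.10                        # (5)
    long_flanks = c.left_host_len > 50 and c.right_host_len > 50     # (6)
    adjacent = c.gap_5prime < 200 and c.gap_3prime < 200             # (7)
    return few_indels and long_flanks and adjacent


candidates = [
    Candidate(0.03, 180, 240, 0, 12),   # kept
    Candidate(0.15, 180, 240, 0, 12),   # rejected: too many host indels
    Candidate(0.03, 40, 240, 0, 12),    # rejected: 5' host fragment too short
]
kept = [c for c in candidates if passes_criteria_5_to_7(c)]
print(len(kept))
```

Keeping each criterion as a named boolean makes it easy to report which filter removed a given candidate.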
The genomic sequence files were downloaded from the UCSC Browser, and consensus sequences of NLRs and host transposons were obtained from RepBase (Jurka et al. 2005).
Construction of the CR1-2_DR vector, retrotransposition in HeLa cells, and sequence analysis
We thank Dr. John V. Moran for providing the vector, pCEP4/L1.3mneoI/ColE1, and for critical reading of the manuscript. We also thank Drs. Daria Babushok, Marlene Belfort, Jose L. Garcia-Perez, Haig H. Kazazian Jr., Mitsuhiro Nakamura, and Jun Suzuki for helpful comments on the manuscript. This work was supported by a Grant-in-Aid to N.O. from the Ministry of Education, Culture, Sports, Science and Technology of Japan and by the 21st Century Center of Excellence (COE) program of the ministry.
1 Corresponding author.
E-mail nokada{at}bio.titech.ac.jp fax: 81-45-924-5835. [Supplemental material is available online at www.genome.org.] Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.5542607
Arkhipova, I. and Meselson, M. 2000. Transposable elements in sexual and ancient asexual taxa. Proc. Natl. Acad. Sci. 97: 14473–14477.
Babushok, D.V., Ostertag, E.M., Courtney, C.E., Choi, J.M., and Kazazian Jr., H.H. 2006. L1 integration in a transgenic mouse model. Genome Res. 16: 240–250.
Christensen, S.M. and Eickbush, T.H. 2005. R2 target-primed reverse transcription: Ordered cleavage and polymerization steps by protein subunits asymmetrically bound to the target DNA. Mol. Cell. Biol. 25: 6617–6628.
Coros, C.J., Landthaler, M., Piazza, C.L., Beauregard, A., Esposito, D., Perutka, J., Lambowitz, A.M., and Belfort, M. 2005. Retrotransposition strategies of the Lactococcus lactis Ll.LtrB group II intron are dictated by host identity and cellular environment. Mol. Microbiol. 56: 509–524.
Cost, G.J., Feng, Q., Jacquier, A., and Boeke, J.D. 2002. Human L1 element target-primed reverse transcription in vitro. EMBO J. 21: 5899–5910.
Edgell, M.H., Hardies, S.C., Loeb, D.D., Shehee, W.R., Padgett, R.W., Burton, F.H., Comer, M.B., Casavant, N.C., Funk, F.D., and Hutchison III, C.A. 1987. The L1 family in mice. Prog. Clin. Biol. Res. 251: 107–129.
Feng, Q., Schumann, G., and Boeke, J.D. 1998. Retrotransposon R1Bm endonuclease cleaves the target sequence. Proc. Natl. Acad. Sci. 95: 2083–2088.
Gasior, S.L., Wakeman, T.P., Xu, B., and Deininger, P.L. 2006. The human LINE-1 retrotransposon creates DNA double-strand breaks. J. Mol. Biol. 375: 1383–1393.
George, J.A., Burke, W.D., and Eickbush, T.H. 1996. Analysis of the 5' junctions of R2 insertions with the 28S gene: Implications for non-LTR retrotransposition. Genetics 142: 853–863.
Gilbert, N., Lutz-Prigge, S., and Moran, J.V. 2002. Genomic deletions created upon LINE-1 retrotransposition. Cell 110: 315–325.
Gilbert, N., Lutz, S., Morrish, T.A., and Moran, J.V. 2005. Multiple fates of L1 retrotransposition intermediates in cultured human cells. Mol. Cell. Biol. 25: 7780–7795.
Gottlich, B., Reichenberger, S., Feldmann, E., and Pfeiffer, P. 1998. Rejoining of DNA double-strand breaks in vitro by single-strand annealing. Eur. J. Biochem. 258: 387–395.
Hinrichs, A.S., Karolchik, D., Baertsch, R., Barber, G.P., Bejerano, G., Clawson, H., Diekhans, M., Furey, T.S., Harte, R.A., Hsu, F., et al. 2006. The UCSC Genome Browser Database: Update 2006. Nucleic Acids Res. 34: D590–D598.
Hohjoh, H. and Singer, M.F. 1996. Cytoplasmic ribonucleoprotein complexes containing human LINE-1 protein and RNA. EMBO J. 15: 630–639.
Ichiyanagi, K. and Okada, N. 2006. Genomic alterations upon integration of zebrafish L1 elements revealed by the TANT method. Gene 383: 108–116.
Jurka, J., Kapitonov, V.V., Pavlicek, A., Klonowski, P., Kohany, O., and Walichiewicz, J. 2005. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 110: 462–467.
Kabotyanski, E.B., Gomelsky, L., Han, J.O., Stamato, T.D., and Roth, D.B. 1998. Double-strand break repair in Ku86- and XRCC4-deficient cells. Nucleic Acids Res. 26: 5333–5342.
Kapitonov, V.V. and Jurka, J. 2003. The esterase and PHD domains in CR1-like non-LTR retrotransposons. Mol. Biol. Evol. 20: 38–46.
Kazazian Jr., H.H. 2004. Mobile elements: Drivers of genome evolution. Science 303: 1626–1632.
Kolosha, V.O. and Martin, S.L. 1997. In vitro properties of the first ORF protein from mouse LINE-1 support its role in ribonucleoprotein particle formation during retrotransposition. Proc. Natl. Acad. Sci. 94: 10155–10160.
Kulpa, D.A. and Moran, J.V. 2006. Cis-preferential LINE-1 reverse transcriptase activity in ribonucleoprotein particles. Nat. Struct. Mol. Biol. 13: 655–660.
Lieber, M.R., Ma, Y., Pannicke, U., and Schwarz, K. 2003. Mechanism and regulation of human non-homologous DNA end-joining. Nat. Rev. Mol. Cell Biol. 4: 712–720.
Lin, Y. and Waldman, A.S. 2001. Capture of DNA sequences at double-strand breaks in mammalian chromosomes. Genetics 158: 1665–1674.
Luan, D.D. and Eickbush, T.H. 1995. RNA template requirements for target DNA-primed reverse transcription by the R2 retrotransposable element. Mol. Cell. Biol. 15: 3882–3891.
Luan, D.D., Korman, M.H., Jakubczak, J.L., and Eickbush, T.H. 1993. Reverse transcription of R2Bm RNA is primed by a nick at the chromosomal target site: A mechanism for non-LTR retrotransposition. Cell 72: 595–605.
Malik, H.S., Burke, W.D., and Eickbush, T.H. 1999. The age and evolution of non-LTR retrotransposable elements. Mol. Biol. Evol. 16: 793–805.
Martin, S.L. and Bushman, F.D. 2001. Nucleic acid chaperone activity of the ORF1 protein from the mouse LINE-1 retrotransposon. Mol. Cell. Biol. 21: 467–475.
Martin, S.L., Cruceanu, M., Branciforte, D., Wai-Lun Li, P., Kwok, S.C., Hodges, R.S., and Williams, M.C. 2005a. LINE-1 retrotransposition requires the nucleic acid chaperone activity of the ORF1 protein. J. Mol. Biol. 348: 549–561.
Martin, S.L., Li, W.L., Furano, A.V., and Boissinot, S. 2005b. The structures of mouse and human L1 elements reflect their insertion mechanism. Cytogenet. Genome Res. 110: 223–228.
Moran, J.V., Holmes, S.E., Naas, T.P., DeBerardinis, R.J., Boeke, J.D., and Kazazian Jr., H.H. 1996. High frequency retrotransposition in cultured mammalian cells. Cell 87: 917–927.
Morrish, T.A., Gilbert, N., Myers, J.S., Vincent, B.J., Stamato, T.D., Taccioli, G.E., Batzer, M.A., and Moran, J.V. 2002. DNA repair mediated by endonuclease-independent LINE-1 retrotransposition. Nat. Genet. 31: 159–165.
Ostertag, E.M. and Kazazian Jr., H.H. 2001. Twin priming: A proposed mechanism for the creation of inversions in L1 retrotransposition. Genome Res. 11: 2059–2065.
Roth, D.B., Porter, T.N., and Wilson, J.H. 1985. Mechanisms of nonhomologous recombination in mammalian cells. Mol. Cell. Biol. 5: 2599–2607.
Roth, D.B., Chang, X.B., and Wilson, J.H. 1989. Comparison of filler DNA at immune, nonimmune, and oncogenic rearrangements suggests multiple mechanisms of formation. Mol. Cell. Biol. 9: 3049–3057.
Saldanha, R., Chen, B., Wank, H., Matsuura, M., Edwards, J., and Lambowitz, A.M. 1999. RNA and protein catalysis in group II intron splicing and mobility reactions using purified components. Biochemistry 38: 9069–9083.
Shiloh, Y. and Kastan, M.B. 2001. ATM: Genome stability, neuronal development, and cancer cross paths. Adv. Cancer Res. 83: 209–254.
Smith, J., Riballo, E., Kysela, B., Baldeyron, C., Manolis, K., Masson, C., Lieber, M.R., Papadopoulo, D., and Jeggo, P. 2003. Impact of DNA ligase IV on the fidelity of end joining in human cells. Nucleic Acids Res. 31: 2157–2167.
Smith, D., Zhong, J., Matsuura, M., Lambowitz, A.M., and Belfort, M. 2005. Recruitment of host functions suggests a repair pathway for late steps in group II intron retrohoming. Genes & Dev. 19: 2477–2487.
Sugano, T., Kajikawa, M., and Okada, N. 2006. Isolation and characterization of retrotransposition-competent LINEs from zebrafish. Gene 365: 74–82.
Symer, D.E., Connelly, C., Szak, S.T., Caputo, E.M., Cost, G.J., Parmigiani, G., and Boeke, J.D. 2002. Human L1 retrotransposition is associated with genetic instability in vivo. Cell 110: 327–338.
Szak, S.T., Pickeral, O.K., Makalowski, W., Boguski, M.S., Landsman, D., and Boeke, J.D. 2002. Molecular archeology of L1 insertions in the human genome. Genome Biol. 3: research0052.
Zingler, N., Willhoeft, U., Brose, H.P., Schoder, V., Jahns, T., Hanschmann, K.M., Morrish, T.A., Lower, J., and Schumann, G.G. 2005. Analysis of 5' junctions of human LINE-1 and Alu retrotransposons suggests an alternative model for 5'-end attachment requiring microhomology-mediated end-joining. Genome Res. 15: 780–789.
Received May 24, 2006; accepted in revised format October 3, 2006.