Vol. 10, Issue 4, 411-415, April 2000
REPORT
ABSTRACT
Human L1 retrotransposons can produce DNA transduction events in which unique DNA segments downstream of L1 elements are mobilized as part of aberrant retrotransposition events. That L1s are capable of carrying out such a reaction in tissue culture cells was elegantly demonstrated. Using bioinformatic approaches to analyze the structures of L1 element target site duplications and flanking sequence features, we provide evidence suggesting that ~15% of full-length L1 elements bear evidence of flanking DNA segment transduction. Extrapolating these findings to the 600,000 copies of L1 in the genome, we predict that the amount of DNA transduced by L1 represents ~1% of the genome, a fraction comparable with that occupied by exons.
INTRODUCTION
The LINE-1 (L1) retrotransposon family is estimated to contain 600,000 copies, accounting for at least 15% of human genomic DNA (Smit 1996). L1's second ORF (ORF2) encodes endonuclease and reverse transcriptase activities (Mathias et al. 1991; Feng et al. 1996), and is the most abundant ORF in the genome. As a major source of reverse transcriptase, L1 is likely to be indirectly responsible for the spread of other retrotranscripts, such as Alu sequences and processed pseudogenes (Maestre et al. 1995; Boeke and Stoye 1997; Dhellin et al. 1997; Jurka 1997). A novel feature of L1 propagation and function was described recently by Moran et al. (1999), who showed that L1 can efficiently comobilize a 3'-flanking segment of non-L1 DNA to new genomic locations in tissue culture cells. This makes L1 a potential player in such genomic events as exon shuffling and regulatory region combinatorics (Boeke and Pickeral 1999; Eickbush 1999). We have studied 129 full-length L1 elements with high similarity to L1.2 (an active element). Computational analysis shows that at least 10% of these L1s have an associated putative 3'-transduced segment, and on this basis, the total amount of DNA transduced by L1 can be extrapolated to represent approximately 1% of the human genome. This finding indicates that L1s are often involved in shuffling genomic DNA, and are thus important contributors to genome plasticity.
Several examples of naturally occurring 3'-transduction events were identified previously as mutagenic L1 insertions, in which additional (non-L1) sequences were incorporated downstream from each newly transposed L1 (Miki et al. 1992; Holmes et al. 1994; McNaughton et al. 1997). Transduction of 3'-flanking sequence by engineered L1 elements also readily occurs in HeLa cells, and can be driven by either the cytomegalovirus promoter or the native L1 promoter; notably, transposition efficiency is higher when a strong polyadenylation signal is introduced downstream from the L1 (Moran et al. 1999). An important open question remains: how efficiently does L1-driven 3' transduction occur naturally in the human genome? Opportunities to observe abnormal human phenotypes caused by 3'-transducing L1 insertions may be extremely limited because only a small fraction of the human genome is currently attributed to genes and upstream regulatory regions. In this study, we took advantage of the tremendous sequence production by the Human Genome Project to computationally estimate the extent of naturally occurring L1-driven 3' transduction (Fig. 1).
Full-length L1 elements are ~6000 bp long; the majority of L1s in the human genome, however, are severely 5' truncated or rearranged, including 5'-inverted and deleted-inverted forms (Hutchison et al. 1989). Newly inserted L1 sequences are frequently flanked by short direct repeats, which have been shown to represent target site duplications (TSDs) created upon L1 integration (Kazazian et al. 1988; Holmes et al. 1994; Moran et al. 1996). With these sequence features in mind, we designed a TSD-based strategy to look for potential 3'-transduced segments associated with full-length L1s (see Methods). If a pair of TSDs is found immediately flanking the L1 and its poly(A) tail, this represents a standard L1 insertion, with no additional sequences transposed. In contrast, in cases of 3' transduction, the 3' TSD is found further downstream from the L1 (Miki et al. 1992; Holmes et al. 1994; McNaughton et al. 1997).
RESULTS AND DISCUSSION
We limited our present study to full-size L1s with high similarity (>94% identity) to L1.2. Studying whole insertion events is critical for the TSD-based algorithm for 3'-transduction detection, and it allows us to avoid the extra ambiguity about the precise 5' boundary of each L1, which is a prominent factor when truncated elements are analyzed. We studied 129 full-length L1 elements; of these, 16 were uninformative because of insufficient flanking sequence in the GenBank records. An additional 16 examples lacked TSDs >6 bp in length. Another 76 cases represented standard insertions. Finally, 21 qualified as 3'-transduction candidates. These 21 elements could be divided into three classes on the basis of sequence characteristics of the transduced DNA segment (Fig. 2; Table 1). Class 1 elements had downstream segments 89-975 bp long and contained a consensus polyadenylation signal (AATAAA or ATTAAA) (Tabaska and Zhang 1999) 10-35 bp upstream from the poly(A) tail immediately preceding the 3' TSD. The 10 elements in this class are the most likely candidates for 3' transduction. Class 2 elements had 3' segments 52-356 bp long and lacked a consensus polyadenylation signal. Class 3 segments are shorter than those of the other two classes (7-26 bp) and could represent aberrant poly(A) tails. Even though class 3 segments may well have been formed by the same mechanism, further sequence analysis was not pursued because the results would not be statistically significant.
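The class definitions above can be illustrated with a minimal Python sketch. It is not the procedure used in this study. The 26-bp cutoff separating class 3 from longer segments is an assumption taken from the observed class 3 range, the distance to the poly(A) tail is measured from the 3' end of the hexamer (also an assumption), and the names `has_polya_signal` and `classify_segment` are hypothetical.

```python
import re

# Consensus polyadenylation signals named in the text; the lookahead lets
# overlapping hexamers be reported as well.
POLYA_SIGNAL = re.compile(r"(?=(AATAAA|ATTAAA))")

SHORT_MAX = 26  # assumption: longest class 3 segment observed (7-26 bp)


def has_polya_signal(segment: str, lo: int = 10, hi: int = 35) -> bool:
    """Report whether a signal ends lo..hi bp before the end of `segment`.

    `segment` is the putative transduced DNA, excluding the poly(A) tail
    that precedes the 3' TSD.
    """
    seq = segment.upper()
    for match in POLYA_SIGNAL.finditer(seq):
        distance = len(seq) - match.end(1)  # bp between hexamer and tail
        if lo <= distance <= hi:
            return True
    return False


def classify_segment(segment: str) -> int:
    """Assign a putative transduced segment to class 1, 2, or 3."""
    if len(segment) <= SHORT_MAX:
        return 3  # short segment, possibly an aberrant poly(A) tail
    return 1 if has_polya_signal(segment) else 2


if __name__ == "__main__":
    example = "C" * 100 + "AATAAA" + "G" * 20  # signal ends 20 bp from the tail
    print(classify_segment(example))
```

Segment lengths between 27 and 51 bp were not observed in the data set, so how this sketch treats them is a design choice of the sketch, not a result.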
We then searched for a possible origin of the 14 elements of classes 1 and 2 and their 3' transduced segments. Each of the potential transduced segments was masked by RepeatMasker (A. Smit and P. Green, unpubl.) to suppress highly repetitive matches, and the masked sequences were then used as queries in BLASTN and BLASTX similarity searches. A high-scoring matching segment elsewhere in the genome, if found immediately downstream from another L1 element, would represent a potential master element if no poly(A) tail followed the segment of interest, or a related 3' transduction event if another poly(A) tail with target site duplications were found.
Of the fourteen 3'-transduced segments, three were completely masked by RepeatMasker, and four were partially masked. Three of the four L1s with partially masked 3'-transduced segments represent a previous integration into an L1 3' UTR, and one represents integration into an Alu repeat. Interestingly, all of these L1 elements inserted in the same orientation as the target element (Fig. 3), which may be indicative of an L1 targeting preference. The TSD-based algorithm allows one to distinguish a transduction event with inclusion of repetitive sequence present at the master locus from a new L1 insertion into a pre-existing L1 3' UTR: In each case in Figure 3, a full-length, uninterrupted L1 is followed by the poly(A) tail and the sequence of interest (including other transposon sequence), followed by another poly(A) tail. All of this inserted material is flanked by TSDs. This sequence signature is consistent with a 3'-transduction event in which the master L1 that produced each insertion had inserted previously into another transposon (see Fig. 3 legend for details). These events also suggest a simple mechanism for the rapid evolution of L1 (and other retrotransposon) 3' ends, which are structurally quite diverse.
Several of the unique sequences present in 3'-transduced segments produced significant nucleotide matches, whereas no protein matches (representing potential coding exons) were found. Twelve of the fourteen segments did, however, contain at least one ORF >50 bp long: two were <80 bp long, four between 80 and 100 bp long, and six were >100 bp long. Six of the fourteen segments associated with L1s of classes 1 and 2 (identified in the text by the gi numbers of their GenBank records; see Table 1) produced significant non-self nucleotide matches in the human nonredundant database. In three cases, gi2076718, 2853183, and 3522964, no L1 was found upstream from the matches. Sequence gi2275172 represents an example of 3' transduction we expected to find as a positive control; this case is a previously described L1-driven 3'-transduction event in the human dystrophin gene (McNaughton et al. 1997). The last two of the query segments, gi2588627 and 3288437, shared a unique 31-bp sequence immediately following L1. Further analysis of the DNA flanking each of these two elements suggests that they are several transposition events removed from the same master element, which acquired additional flanking sequences in several steps (Fig. 4).
Of 113 informative L1 elements in the data set studied here, 97 (86%) had TSDs consistent with either a standard insertion or a 3'-transduction event. Of these 97 L1s, 10% (class 1 elements) are excellent candidates for 3' transduction, whereas 14%-22% have a 3'-transduced segment by more relaxed criteria (including classes 2 and 3). Importantly, known and new examples of related 3'-transduction events were identified. To estimate the total fraction of the human genome that can be accounted for by 3' transduction, we assumed ~600,000 copies of L1 in the human genome (Smit 1996) and average transduced-segment lengths of 420 bp for class 1 elements, 340 bp for classes 1 and 2 combined, and 231 bp for all three classes of transducing L1s. By use of these values, the total amount of 3'-transduced sequence extrapolated to the entire human genome is between 25.2 and 30.5 Mb, or ~1% of the human genome, a fraction comparable with that occupied by exons.
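The arithmetic behind this extrapolation can be retraced with a short Python sketch. The copy number, fractions, and mean lengths are the values quoted above. The pairing of 14% with the 340-bp mean is inferred from the text, and the 3000-Mb genome size is an assumption introduced only to express the result as a percentage.

```python
L1_COPIES = 600_000  # copy number assumed in the text (Smit 1996)
GENOME_MB = 3_000    # assumption: approximate genome size, not stated in the text

# (fraction of L1s with a transduced segment, mean segment length in bp)
SCENARIOS = {
    "class 1": (0.10, 420),
    "classes 1 and 2": (0.14, 340),
    "classes 1-3": (0.22, 231),
}


def transduced_mb(copies: int, fraction: float, mean_len_bp: int) -> float:
    """Total transduced DNA in Mb: copies x fraction x mean length."""
    return copies * fraction * mean_len_bp / 1_000_000


ESTIMATES = {
    name: transduced_mb(L1_COPIES, fraction, mean_len)
    for name, (fraction, mean_len) in SCENARIOS.items()
}

for name, mb in ESTIMATES.items():
    print(f"{name}: {mb:.1f} Mb ({100 * mb / GENOME_MB:.2f}% of genome)")
```

Under these assumptions the three scenarios give about 25.2, 28.6, and 30.5 Mb, which brackets the range reported above.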
The estimate of L1 numbers on which the above calculation is based was extrapolated from the amount of repeat DNA found in various large DNA clones selected from the human genome for sequencing. There is some ambiguity in the number of L1s in the human genome, mainly due to the difficulty of recognizing old elements in the sequence. In addition, our current knowledge of highly heterochromatic and potentially difficult-to-clone regions is limited; these are likely to be enriched in transposon sequences. Balancing this, the L1s without TSDs may result from either 3'-transduction events of very large segments or from very short TSDs; thus, the actual frequency of transduction events may be higher than our estimate. Finally, because L1s tend to localize to less gene-rich regions, the chances of carrying an exon may be decreased.
A final issue regarding the above estimate is the validity of extrapolating our numbers to the very large class of 5'-truncated L1 elements. We expect that they will have as high a number of associated transduced segments as full-length elements, if not higher. We carried out a pilot survey of 25 randomly selected 5'-truncated L1 elements, 100-1000 bp long. These had characteristics similar to the full-length L1 elements in this study. The truncated elements analyzed were 85%-99% identical to the L1.2 sequence. Of the 25 truncated L1 elements, 1 was uninformative (insufficient flanking sequence), 9 had no TSDs >7 bp long, and 15 had TSDs. Of the elements with TSDs, 11 were classified as standard insertions, and 4 showed a 3'-transduction signature. Of these four candidates, two were completely masked (L1 3' UTR sequence, in the same orientation as the L1 of interest). This preliminary survey supports our expectation of finding an equivalent or slightly higher fraction of 3' transduction in truncated elements as opposed to full-length L1s.
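The comparison between the two surveys reduces to two proportions, which the following Python sketch computes from the counts given in the text. The function name `transduction_fraction` is hypothetical, and no statistical test is implied.

```python
def transduction_fraction(standard: int, transduced: int) -> float:
    """Fraction of TSD-bearing elements with a 3'-transduction signature."""
    return transduced / (standard + transduced)


# Counts quoted in the text for elements with recognizable TSDs.
FULL_LENGTH = transduction_fraction(standard=76, transduced=21)  # 21 of 97
TRUNCATED = transduction_fraction(standard=11, transduced=4)     # 4 of 15

print(f"full-length: {FULL_LENGTH:.1%}, truncated: {TRUNCATED:.1%}")
```

With only 15 TSD-bearing truncated elements, the roughly 27% versus 22% difference is best read as indicative, in keeping with the preliminary nature of the survey.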
A possible expansion of this study would be the analysis of older L1 subfamilies. However, this would bring a higher level of ambiguity to the signals seen at the DNA sequence level. Elements of the young subfamilies are more likely to have inserted relatively recently, and thus would have a higher chance of preserving intact TSDs. The older an insertion, the lower the likelihood of finding authentic TSDs, because of the accumulation of random mutations.
Thus, L1-driven transduction of flanking DNA is likely to be an important mechanism of genome evolution via increasing genome plasticity and facilitating new combinations of coding and regulatory sequences. Although most observed cases of 3' transduction represent relatively small gene segments, it is possible that entire genes are included occasionally into transduced segments. This would lead to the introduction of a processed (no introns) copy of a gene into a new genomic locus. Thus, 3' transduction is potentially a mechanism for outright gene duplication as well as gene scrambling. A 3' transduction might represent a relatively noninvasive mechanism by which an organism can test novel sequence combinations, because transposon-plus-flanking-sequence integration could be significantly less disruptive to genome organization than larger-scale genome rearrangements such as inversions or translocations.
METHODS
Data set
A total of 129 L1 elements, at least 6000-bp long, were collected as high-scoring BLAST matches to L1.2 (accession no. M80343), one of the currently active LINE-1 transposons. The search was performed against GenBank release 113.0 plus daily updates as of September 1, 1999. All of these elements are at least 94% identical to L1.2.
Target Site Duplication Determination
One hundred base pairs upstream of each L1 element studied were compared with 3000 bp downstream of the same element, in search of short direct repeats that are the putative TSDs. The BLAST2SEQUENCES (Tatusova and Madden 1999) and DOTTER (Sonnhammer and Durbin 1995) computer programs were used to find matching substrings at least 6 bp long. One mismatch was allowed for repeats >11 bp long. A pair of short direct repeats was considered a TSD pair if the 5' TSD adjoined the 5' end of the L1 element and the 3' TSD immediately followed a poly(A) tail.
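The TSD criteria above can be sketched in Python as a plain string search. This is an illustration only: the study used BLAST2SEQUENCES and DOTTER, not this code. The minimum poly(A) run of 6 A residues, the requirement that the tail be a pure A run, and the names `find_tsd` and `insertion_type` are all assumptions.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_tsd(upstream: str, downstream: str, min_len: int = 6, min_polya: int = 6):
    """Return (5' TSD, offset of 3' TSD in `downstream`) or None.

    `upstream` ends at the 5' end of the L1 (100 bp in the text);
    `downstream` starts just after the L1 3' end (3000 bp in the text).
    """
    for k in range(len(upstream), min_len - 1, -1):  # longest repeat first
        candidate = upstream[-k:]            # 5' TSD must adjoin the L1
        allowed = 1 if k > 11 else 0         # one mismatch if repeat >11 bp
        for i in range(min_polya, len(downstream) - k + 1):
            # 3' TSD must immediately follow a poly(A) tail (assumed pure A run)
            if downstream[i - min_polya:i] != "A" * min_polya:
                continue
            if hamming(candidate, downstream[i:i + k]) <= allowed:
                return candidate, i
    return None


def insertion_type(downstream: str, offset: int) -> str:
    """Standard insertion if only poly(A) separates the L1 from the 3' TSD."""
    between = downstream[:offset]
    if set(between) <= {"A"}:
        return "standard"
    return "3' transduction candidate"
```

A real poly(A) tail is rarely a perfect run of A residues, so a practical version would need a tolerant tail detector; that refinement is left out here to keep the logic visible.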
DNA Sequence Analysis
The BLASTN program (Altschul et al. 1997) was used to search human-specific nucleotide databases for sequences similar to the putative transduced regions; the BLASTX program (Altschul et al. 1997) was used to search for protein matches. We used an E-value cutoff of 0.1 in both BLASTN and BLASTX searches. REPEATMASKER (A. Smit and P. Green, unpubl.) was used to mask and characterize the transduced segments.
Databases Used
The nonredundant (nr) protein database was used in BLASTX searches. In BLASTN searches, four human-specific databases were used: nr, EST, GSS, and HTGS.
ACKNOWLEDGMENTS
We thank John Moran and Greg Cost for helpful discussions. Our work was supported in part by NIH grant CA16519 to J.D.B.
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
FOOTNOTES
1 Corresponding author.
E-MAIL jboeke{at}jhmi.edu; FAX (410) 614-2987.
REFERENCES
the polyA connection. Nat. Genet. 16: 6-7.
Received December 14, 1999; accepted in revised form February 25, 2000.