Published online before print
March 26, 2007, 10.1101/gr.6036807 Genome Res. 17:556-565, 2007 ©2007 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/07 $5.00
Functionality or transcriptional noise? Evidence for selection within long noncoding RNAs
MRC Functional Genetics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford OX1 3QX, United Kingdom
Long transcripts that do not encode protein have only rarely been the subject of experimental scrutiny. Presumably, this is owing to the current lack of evidence of their functionality, thereby leaving an impression that, instead, they represent "transcriptional noise." Here, we describe an analysis of 3122 long and full-length, noncoding RNAs ("macroRNAs") from the mouse, and compare their sequences and their promoters with orthologous sequence from human and from rat. We considered three independent signatures of purifying selection related to substitutions, sequence insertions and deletions, and splicing. We find that the evolution of the set of noncoding RNAs is not consistent with neutralist explanations. Rather, our results indicate that purifying selection has acted on the macroRNAs' promoters, primary sequence, and consensus splice site motifs. Promoters have experienced the greatest elimination of nucleotide substitutions, insertions, and deletions. The proportion of conserved sequence (4.1%–5.5%) in these macroRNAs is comparable to the density of exons within protein-coding transcripts (5.2%). These macroRNAs, taken together, thus possess the imprint of purifying selection, thereby indicating their functionality. Our findings should now provide an incentive for the experimental investigation of these macroRNAs' functions.
Whether it is 2.5% (Lunter et al. 2006
Evidence from both large-scale studies of extensive full-length mouse cDNA libraries (Okazaki et al. 2002
If long ncRNAs have preserved their functions over long time spans, then the imprint of purifying selection should be apparent within their sequences when sampled from diverse mammalian species. However, initial surveys have been discouraging and provide scant evidence of purifying selection (Wang et al. 2004
The issues that remain to be clarified are whether most long ncRNAs are biologically relevant and, if so, whether they have persisted because of the benefit accrued from their functions over long time intervals, such as since the last common ancestor of primates and rodents
In this study, we sought to investigate whether long ncRNAs exhibit signatures of purifying selection that would provide indications of their functionality. To provide evidence for selection requires reliable estimates of neutral evolution. As virtually all ancestral repeats (ARs), defined as transposable elements present in the last common ancestor of, for instance, mouse and human, appear to have evolved neutrally (Lunter et al. 2006
We took advantage of a well-defined large set of 3122 long putative ncRNAs of unknown function obtained from the FANTOM 2 and 3 Consortia (Okazaki et al. 2002 Our studies show that the set of macroRNAs appears to exhibit suppressed rates of nucleotide substitution, insertion, and deletion, relative to proximal ARs and general intergenic sequence. Suppressed rates were observed for transcript sequences, promoters, and splice-site dinucleotide motifs. We interpret these suppressed rates as indicative of recurrent events of purifying selection that acted within functional sequence. Neutralist explanations of suppressed rates, such as varying mutational rates due to CpG substitutions, transcription-coupled repair, or nucleotide composition, were not consistent with our findings. We thus conclude that many of the macroRNAs we considered are functional, and thus deserve more intensive investigation of their evolution and functions.
A validated set of macroRNAs
We investigated the evolutionary properties of a set of 3122 apparent ncRNAs (average length 4.2 kb) from which known protein-coding genes had previously been discarded. These transcripts were identified from mouse cDNA libraries collected by the FANTOM Consortium (Okazaki et al. 2002 To exclude the possibility that evolutionary constraints we observed within these putative macroRNAs arise from overlaps with protein-coding exons not annotated by FANTOM2 or FANTOM3 or with regulatory intronic regions, we conservatively applied two additional filtering steps in order to create our own candidate noncoding set. We excluded macroRNAs that overlap with Ensembl-annotated protein-coding genes (including introns), and others exhibiting significant alignments with well-established protein-coding genes (see Methods). We believe that all remaining candidate macroRNAs are thus located within intergenic regions.
Suppressed substitution and transversion rates
Our initial finding was that nucleotide substitutions have been fixed at a significantly reduced rate in macroRNAs compared with neighboring ARs. The distributions of dRNA estimated between the mouse putative macroRNA sequences aligned to their rat or human orthologous sequence were both found to be significantly lower than those of dAR for these species pairs (P < 10^-15; two-sided Kolmogorov-Smirnov test) (see Fig. 1A,B). Median dRNA/dAR values for macroRNAs were 0.899 (mouse–rat) and 0.948 (mouse–human) (Table 1). For these species pairs, substitution rates on macroRNAs are thus suppressed by approximately 10% and 5%, respectively.
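The two summary quantities used in this comparison, the median of the per-transcript ratios dRNA/dAR and the two-sample Kolmogorov-Smirnov statistic, can be illustrated with a minimal sketch. The function names and the toy rates below are hypothetical and are not taken from the study; the sketch assumes each macroRNA rate is paired with the rate of its neighboring ARs, and it computes only the KS statistic D, not its P-value.

```python
from bisect import bisect_right
from statistics import median


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def median_rate_ratio(d_rna, d_ar):
    """Median of per-transcript ratios d_RNA / d_AR (values paired by transcript)."""
    return median(r / a for r, a in zip(d_rna, d_ar) if a > 0)


# Toy, made-up substitution rates for illustration only (not data from the study).
d_rna = [0.15, 0.16, 0.14, 0.17]
d_ar = [0.17, 0.17, 0.16, 0.18]

ratio = median_rate_ratio(d_rna, d_ar)   # a ratio below 1 means suppression
d_stat = ks_statistic(d_rna, d_ar)
print(f"median dRNA/dAR = {ratio:.3f}, suppression = {100 * (1 - ratio):.1f}%")
print(f"KS statistic D = {d_stat:.2f}")
```

Read this way, a median ratio of 0.899 corresponds to a suppression of roughly 10%, and 0.948 to roughly 5%.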
We considered whether these departures of dRNA/dAR from unity might be causally related to the known high rate of substitutions in CpG dinucleotides (Cooper and Youssoufian 1988 Although differential CpG content does not appear to explain the observed higher divergence within putatively neutral AR sequence compared to macroRNAs, we remained concerned that other AR-specific sequence features might underlie this difference. We therefore constructed a second, independent set of putative neutral sequence. For this, we considered all intergenic and nonrepetitive sequence, not overlapping with, but in the vicinity of, macroRNAs. To remove the majority of functional sequence, we discarded from this set regions that exhibit the signature of purifying selection upon indels (see Methods). Comparisons of macroRNAs with this second set of putative neutral sequence also demonstrated significant suppression of substitution and transversion rates within macroRNAs (Supplemental Fig. S1). We were also concerned that this signature of purifying selection might be associated less with macroRNAs, and more with cis-regulatory elements, unannotated alternative first exons, or other elements of protein-coding genes. To investigate this, we repeated these analyses, now including only those macroRNAs located at least a well-defined distance away from protein-coding genes. For all substitution and transversion rate analyses, macroRNA sequences located >60 kb (or >10 kb, or >30 kb) from Ensembl protein-coding genes were seen to exhibit evolutionary rate distributions similar to those of the complete candidate macroRNA set (Table 1; Supplemental Figs. S2 and S3). These results re-emphasize the suppression of substitution rates in macroRNAs and further suggest that protein-associated regulatory regions do not contribute the only signature of substitution rate suppression from our putative macroRNA data set. 
Although the vast majority of ARs appear to have evolved neutrally, it was possible that ARs harbored within macroRNAs might have been under greater constraint than neighboring ARs lying outside. However, we determined that rates of substitutions, or of transversions, within ARs inside and outside of macroRNAs were not significantly different at the 5% level (Supplemental Figs. S4 and S5). This held true for LINEs, LTRs, SINEs, or DNA transposons, whether considered together or in separate repeat classes. While substitution or transversion rates inside macroRNAs are reduced in general, such reductions thus do not appear to have occurred uniformly throughout each transcript.
Finally, we extended our pairwise sequence comparison and examined whether multispecies conserved sequences (MCSs) (Siepel et al. 2005
Suppressed rates of insertion/deletion (indel) mutations
Indeed, we find that IPSs are strongly and significantly over-represented within macroRNAs compared with their density in intergenic sequence (1.78-fold increase, P < 10^-4). In these analyses, we took care to account for relevant nucleotide composition (G+C) biases (see Methods). In order, once more, to exclude the possibility of protein-coding genes contributing to our findings, we also restricted both the macroRNAs and the intergenic space to regions at a minimum distance of 10 kb, 30 kb, and 60 kb away from the nearest Ensembl protein-coding genes. For these sets, the significant over-representations remained and, indeed, progressively increased in magnitude (1.95-fold, 2.15-fold, and 2.32-fold, respectively; all P < 10^-4). We next wished to investigate whether the observed associations with purifying selection exhibited any G+C biases. Consequently, we returned to considering all intergenic sequence and separately analyzed the density of IPSs for 10 sequence classes each with approximately equal G+C content. These classes were designed to partition 10-kb windows, from the intergenic portion of the mouse genome, into 10 equally populated isochores (see Methods). Across all 10 G+C classes we found significant over-representations of IPSs within our macroRNA data set, ranging from a 1.33-fold increase for the highest G+C class, to a 1.87-fold increase for the most A+T-rich sequence (Fig. 2).
These results should not be taken to imply that A+T-rich transcripts contain more functional sequence than G+C-rich ones. IPSs, and all functional segments, are considerably more abundant within high G+C sequence (Lunter et al. 2006 75% of conserved functional sequence is expected to be found within IPS segments (Lunter et al. 2006
macroRNAs often possess conserved splice sites
The association of pre-mRNA splicing with these consensus dinucleotides need not imply function, because the splicing machinery might have been recruited inconsequentially to consensus sites within otherwise nonsensical transcripts. To assess the functional significance of the consensus splice sites, we investigated their conservation in orthologous human or rat sequence. Against this, we compared the level of conservation of proximal and intronic GT and AG dinucleotides that are not known to be splice-site signals. We observed that 40% and 65% of mouse macroRNA GT-AG splice sites are conserved in human and rat, respectively, significantly more than for intronic GT and AG dinucleotides not involved in splicing (30% and 58%, respectively; P = 9.5 × 10^-5 and P = 2.0 × 10^-4;
To determine whether spliced and unspliced macroRNAs exhibit different signatures of purifying selection, we split the set into 1208 multi-exon and 1914 single-exon macroRNAs. Both subsets are significantly enriched in IPS sequence, and to similar degrees (1.85-fold and 1.80-fold, respectively; both P < 10^-4). Single-exon macroRNAs exhibit a greater suppression of substitution rates when compared to the corresponding human and rat counterparts (9% vs. 3% in single-exon vs. multi-exon macroRNAs for human; 13% vs. 8% for rat), as expected if macroRNA exons show a higher average conservation than their introns, as is the case for protein-coding transcripts.
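The splice-site conservation comparison reported in this section can be sketched for a single intron as follows. This is a simplified illustration rather than the procedure actually used: it assumes the aligned intron is supplied as two equal-length gapped strings, that the mouse intron begins with GT and ends with AG, and that the control sites are the first internal GT (scanning from the 5' end) and the first internal AG (scanning from the 3' end) that are aligned without gaps. All function names are hypothetical, and the statistical test applied to the resulting counts is not shown.

```python
def dinucleotide_conserved(mouse_aln, other_aln, start):
    """True if the two aligned columns at start and start+1 are gap-free and identical."""
    m = mouse_aln[start:start + 2].upper()
    o = other_aln[start:start + 2].upper()
    return len(m) == 2 and "-" not in m + o and m == o


def first_control_site(mouse_aln, other_aln, motif, positions):
    """First scanned position where mouse shows `motif` and the other species is aligned."""
    for i in positions:
        if mouse_aln[i:i + 2].upper() == motif and "-" not in other_aln[i:i + 2]:
            return i
    return None


def splice_and_control_conservation(mouse_intron, other_intron):
    """Report whether both splice sites, and both control sites, are conserved."""
    n = len(mouse_intron)
    splice = (dinucleotide_conserved(mouse_intron, other_intron, 0)
              and dinucleotide_conserved(mouse_intron, other_intron, n - 2))
    # Control sites must not overlap the donor (columns 0-1) or acceptor (last two columns).
    gt = first_control_site(mouse_intron, other_intron, "GT", range(2, n - 3))
    ag = first_control_site(mouse_intron, other_intron, "AG", range(n - 4, 1, -1))
    if gt is None or ag is None:
        control = None  # no usable control pair in this intron
    else:
        control = (dinucleotide_conserved(mouse_intron, other_intron, gt)
                   and dinucleotide_conserved(mouse_intron, other_intron, ag))
    return {"splice": splice, "control": control}


# Toy alignment (made up): the splice sites are conserved, the internal AG is not.
result = splice_and_control_conservation("GTAAGTCCAGTTAG", "GTAAGTCCGGTTAG")
print(result)
```

Summing the `splice` and `control` outcomes over all introns would give the two conservation counts that are then compared.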
Conservation within macroRNA promoters
We tested first whether these putative core promoter sequences appeared to be evolutionarily conserved with respect to substitutions, by comparing the mouse promoter sequences with their orthologous human and rat counterparts. In both comparisons, the substitution rate within promoter sequences (dpro) was found to be significantly lower than dAR (P < 10^-15; two-sided Kolmogorov-Smirnov test) (Fig. 3A,B). To account for potential CpG effects, we next considered the rate of transversions. For both the mouse–human and mouse–rat comparisons, we again observed transversion rate (tpro) distributions that are significantly different and below those of tAR (P < 10^-15; two-sided Kolmogorov-Smirnov test) (Fig. 3C,D).
We also observed a clear signature of purifying selection on indels within promoters. IPSs were strongly over-represented within promoters (2.70-fold increase; P < 10^-4), with 7.0% of the core promoter regions being contained within IPSs (compared with an expected IPS density of 2.6% within all intergenic G+C-matched sequence). Similar over-representations were seen when analyzing each G+C class separately (2.09-fold to 4.37-fold enrichments; P < 0.014 for all classes; one-sided test), indicating that, just as for the transcripts themselves, promoter regions show evidence of purifying selection across the G+C spectrum. The density of IPSs within promoters did vary considerably with G+C content, with promoters in G+C-rich regions showing very high densities of IPSs (9.5%; 3.5% expected), whereas IPS enrichments within promoters of A+T-rich regions were more modest (4.4%; 2.1% expected) (Fig. 4).
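As a worked illustration of how such fold enrichments relate to the quoted densities, the sketch below divides the observed IPS fraction by the expected one and derives a one-sided empirical P-value from a set of randomized values. The function names are hypothetical, the null distribution is simulated purely for demonstration, and the conservative "+1" estimator is one common choice that is assumed here rather than taken from the paper.

```python
import random


def fold_enrichment(observed_fraction, expected_fraction):
    """Ratio of the observed IPS density to the density expected by chance."""
    return observed_fraction / expected_fraction


def empirical_p_value(observed, null_values):
    """One-sided P: share of randomized values at least as large as the observed one."""
    hits = sum(1 for v in null_values if v >= observed)
    return (hits + 1) / (len(null_values) + 1)


# Promoter densities quoted in the text: 7.0% observed vs. 2.6% expected.
fold = fold_enrichment(0.070, 0.026)
print(f"fold enrichment = {fold:.2f}")  # close to the reported 2.70-fold

# Hypothetical null densities standing in for G+C-matched randomizations.
rng = random.Random(0)
null = [rng.gauss(0.026, 0.004) for _ in range(9999)]
p = empirical_p_value(0.070, null)
print(f"empirical one-sided P = {p:.4f}")
```

The small difference between 0.070/0.026 and the reported 2.70-fold is expected if the published percentages are rounded.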
Next, we identified within the promoter set (1) 450 TATA-driven promoters and (2) 448 CpG-associated promoters (including 28 promoters classified as both TATA-driven and CpG-associated). Putative TATA-boxes were identified, as previously (Ponjavic et al. 2006
We have provided evidence for the suppression of substitution and transversion rates, by between 3% and 40% (Table 1), within a large set of macroRNA transcripts and their promoters. The same sequences also have experienced fewer indel mutations and fewer splice site consensus dinucleotide changes than expected by our neutral models. We interpret these results as indicating that the macroRNAs we investigated are enriched in sequence that has been subject to purifying selection to conserve the functional integrity of three main aspects of a functional transcript: its primary sequence, its promoter sequence, and its pattern of splicing.
We considered, but then discounted, the possibility that these observations arise from decreased rates of mutation, as opposed to purifying selection, within these transcripts. First, we considered whether substitution, insertion, and deletion rates would be decreased because of preferential repair of sequence within macroRNAs ("transcription-coupled repair") (Svejstrup 2002
Second, we considered whether mutational biases, arising from single and dinucleotide (specifically, CpG) sequence composition, were associated with the suppression of substitution rates observed within macroRNAs. To account for the higher mutability of methylated CpG dinucleotides we considered transversions, rather than substitutions, and once more observed significantly suppressed rates in macroRNAs. Again, however, we note that even if CpG-associated mutations were to be, in general, higher in ARs than in macroRNAs, then this might indicate sustained functionality of macroRNAs, since CpG methylation is known to be incompatible with transcriptional activity (Ng and Bird 1999
We also ensured that we controlled for nucleotide composition biases and large-scale mutation rate variations in our analyses (see Methods) by only comparing macroRNAs against putatively neutral sequence in the vicinity of the macroRNA. Previous analyses, which had not found differences in conservation levels between noncoding RNA, and other, sequences (Wang et al. 2004
For these reasons, we believe that purifying selection, rather than mutational biases, underlies the observed suppression of substitution, transversion, and indel rates in macroRNA sequence. We do not mean to imply that our evidence necessarily indicates that all macroRNAs in our set have been subject to evolutionary constraint throughout the
While our filtering procedure ensures that no known gene or any of its close homologs has any overlap with our macroRNAs, unannotated short peptides might still have passed our coding filters. To consider whether protein-coding contaminants explain our results, we created a conservative secondary test set of macroRNAs (2303 transcripts), by excluding all those that show any overlap with GenScan-predicted gene transcripts (Burge and Karlin 1997 We do not mean to imply that the entire lengths of macroRNAs represent functional sequence, even after accounting for transcription run-through. In particular, those transposable elements present within macroRNAs appear, on average, not to have been subject to selection. A general picture emerges of macroRNAs harboring a density of functional sequence (4.1%–5.5%), similar to the density of coding exons within protein-coding genes (5.2%). This low amount of functional material may explain, in part, why these macroRNA transcripts were considered previously to be nonfunctional.
As observed previously (Carninci et al. 2005
If, as now appears likely, many macroRNAs have been subject to purifying selection, then what might be their functions? The greater constraint we observed within promoters than within transcript sequences is consistent with some, but not all, of the macroRNA transcripts possessing functions that are independent of their sequences. Transcription of such macroRNAs might induce a more open chromatin state that would be more amenable for the transcription of neighboring genes (Gribnau et al. 2000
Experimental data sources We used the stringent sets of putative ncRNAs from the mouse (Mus musculus) FANTOM2 (4280 transcripts) (Okazaki et al. 2002
To create a secondary test set to exclude the possibility of small peptide contaminants, we excluded those macroRNAs from the candidate set that overlap with the predicted transcripts of GenScan exons (Burge and Karlin 1997
Partition of intergenic space
Definition of macroRNA core promoters and further classification
Nucleotide substitution and transversion rates in noncoding genomic DNA
We estimated the nucleotide substitution rates between orthologous mouse–human and mouse–rat aligned sequences using baseml with the REV substitution model (Yang 1994
To test whether the substitution rate estimates are not biased by the higher mutability of CpG dinucleotides, we additionally determined the transversion rate of each noncoding segment, since CpG-associated substitutions are largely transitions (Ebersberger et al. 2002 All analyses were performed independently for the four different intergenic spaces defined for (1) l > 0, (2) l > 10 kb, (3) l > 30 kb, and (4) l > 60 kb. We further determined whether the substitution or transversion rates for ARs inside and outside of macroRNAs are different. We applied the same cross-species comparison procedure between aligned mouse–human and mouse–rat sequences, as described above, and required the minimum AR length within macroRNAs to be 100 bp. We analyzed each repeat class (LINE, SINE, LTR, and DNA transposon) individually, as well as pooled together.
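The distinction between transitions and transversions that motivates this control can be made concrete with a short sketch. It computes the raw proportion of aligned, gap-free columns that differ by a transversion (a purine exchanged for a pyrimidine or vice versa). This is a simplification introduced here for illustration: it does not correct for multiple substitutions at a site and is not the model-based estimation used for the substitution rates; the function names and the toy sequences are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify(a, b):
    """Return 'match', 'transition', 'transversion', or None for gaps/ambiguous bases."""
    a, b = a.upper(), b.upper()
    if a not in BASES or b not in BASES:
        return None
    if a == b:
        return "match"
    same_class = (a in PURINES) == (b in PURINES)
    return "transition" if same_class else "transversion"


def transversion_proportion(seq1, seq2):
    """Fraction of comparable aligned columns that differ by a transversion."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    calls = [classify(a, b) for a, b in zip(seq1, seq2)]
    comparable = [c for c in calls if c is not None]
    if not comparable:
        return 0.0
    return comparable.count("transversion") / len(comparable)


# Toy alignment (made up): one transition (C/T), one transversion (A/C), one gap.
mouse = "ACGTACGT-A"
human = "ATGTCCGTTA"
tv = transversion_proportion(mouse, human)
print(f"transversion proportion = {tv:.3f}")
```

Because a CpG-associated C-to-T change is a transition, it does not contribute to this proportion, which is the reason for using transversions as a control.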
In a next step, we created a second control of putatively neutral sequence that is independent from the neutral AR sequence defined above. We considered the intergenic sequence of Ensembl-annotated protein-coding genes from which we discarded mouse repetitive sequence (assembly mm5) obtained from Hinrichs et al. (2006)
Multispecies conserved sequences (MCSs) (Siepel et al. 2005
Indel-purifying selection in noncoding genomic DNA
This analysis for the noncoding transcripts was performed independently for the four different intergenic spaces defined above (l > 0 to l > 60 kb), whereas for their core promoter sequences, it was carried out using the complete intergenic space (l > 0). To investigate the association of these noncoding segments with IPSs depending on G+C class, we performed the association study within each of the 10 G+C classes separately.
Genome-wide partition based on G+C-content
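The Results describe this partition as assigning 10-kb intergenic windows to 10 equally populated G+C classes. One way to sketch such a partition is to rank windows by G+C content and cut the ranking into equal-sized groups. The rank-based assignment and the function names below are assumptions made for illustration, and the toy windows are 10 bp rather than 10 kb to keep the example small.

```python
def gc_content(seq):
    """Fraction of G or C among the unambiguous (A, C, G, T) bases of a window."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def equal_population_classes(values, n_classes=10):
    """Assign class indices 0..n_classes-1 so that classes hold (nearly) equal counts.

    Items are ranked by value; class 0 holds the lowest values.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    classes = [0] * len(values)
    for rank, idx in enumerate(order):
        classes[idx] = rank * n_classes // len(values)
    return classes


# Toy windows (made up) with G+C content 0.0, 0.1, ..., 0.9.
windows = ["A" * (10 - k) + "G" * k for k in range(10)]
gc = [gc_content(w) for w in windows]
classes = equal_population_classes(gc)
print(list(zip(gc, classes)))
```

Equal population, rather than equal G+C width, keeps the number of windows per class comparable, so that each class supports a test of similar power.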
Genome-wise association procedure controlling for G+C-content biases
The basis for the procedure is a randomization test, which compares the intersection S
Determining splice-site consensus in orthologous mouse–human and mouse–rat introns
To test for conservation, we scanned along the intron locating the first 5'-GT and 3'-AG dinucleotides that did not overlap with the splice site and that could be aligned to human or rat sequence. We counted the number of times both putatively neutral GT and AG sites were conserved. These two counts were compared using a
Statistical methodology
We thank Andreas Heger and Caleb Webber for generously providing their toolsets, and members of the C.P.P. research group for advice and helpful discussions. We thank the UK Medical Research Council (MRC) for financial assistance. J.P. gratefully acknowledges a graduate Clarendon Award, Oxford Balliol College Domus Award, and a graduate scholarship from the Studienstiftung des deutschen Volkes. G.L. is an MRC Bioinformatics Research Fellow.
1 Corresponding authors. E-mail chris.ponting{at}dpag.ox.ac.uk; fax 44-1865-282651.
E-mail gerton.lunter{at}dpag.ox.ac.uk; fax 44-1865-282651. [Supplemental material is available online at www.genome.org.] Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.6036807
Allen, J.E., Pertea, M., and Salzberg, S.L. 2004. Computational gene prediction using multiple sources of evidence. Genome Res. 14: 142–148.
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215: 403–410.
Badger, J.H. and Olsen, G.J. 1999. CRITICA: Coding region identification tool invoking comparative analysis. Mol. Biol. Evol. 16: 512–524.
Bertone, P., Stolc, V., Royce, T.E., Rozowsky, J.S., Urban, A.E., Zhu, X., Rinn, J.L., Tongprasit, W., Samanta, M., Weissman, S., et al. 2004. Global identification of human transcribed sequences with genome tiling arrays. Science 306: 2242–2246.
Birney, E., Andrews, D., Caccamo, M., Chen, Y., Clarke, L., Coates, G., Cox, T., Cunningham, F., Curwen, V., Cutts, T., et al. 2006. Ensembl 2006. Nucleic Acids Res. 34: D556–D561.
Blake, W.J., Kaern, M., Cantor, C.R., and Collins, J.J. 2003. Noise in eukaryotic gene expression. Nature 422: 633–637.
Brockdorff, N., Ashworth, A., Kay, G.F., McCabe, V.M., Norris, D.P., Cooper, P.J., Swift, S., and Rastan, S. 1992. The product of the mouse Xist gene is a 15 kb inactive X-specific transcript containing no conserved ORF and located in the nucleus. Cell 71: 515–526.
Bucher, P. 1990. Weight matrix descriptions of four eukaryotic RNA polymerase II promoter elements derived from 502 unrelated promoter sequences. J. Mol. Biol. 212: 563–578.
Burge, C. and Karlin, S. 1997. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268: 78–94.
Carninci, P., Kasukawa, T., Katayama, S., Gough, J., Frith, M.C., Maeda, N., Oyama, R., Ravasi, T., Lenhard, B., Wells, C., et al. 2005. The transcriptional landscape of the mammalian genome. Science 309: 1559–1563.
Carninci, P., Sandelin, A., Lenhard, B., Katayama, S., Shimokawa, K., Ponjavic, J., Semple, C.A., Taylor, M.S., Engstrom, P.G., Frith, M.C., et al. 2006. Genome-wide analysis of mammalian promoter architecture and evolution. Nat. Genet. 38: 626–635.
Cawley, S., Bekiranov, S., Ng, H.H., Kapranov, P., Sekinger, E.A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J., Williams, A.J., et al. 2004. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116: 499–509.
Cheng, J., Kapranov, P., Drenkow, J., Dike, S., Brubaker, S., Patel, S., Long, J., Stern, D., Tammana, H., Helt, G., et al. 2005. Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science 308: 1149–1154.
Cooper, D.N. and Youssoufian, H. 1988. The CpG dinucleotide and human genetic disease. Hum. Genet. 78: 151–155.
Cooper, S.J., Trinklein, N.D., Anton, E.D., Nguyen, L., and Myers, R.M. 2006. Comprehensive analysis of transcriptional promoter structure and function in 1% of the human genome. Genome Res. 16: 1–10.
Duret, L., Chureau, C., Samain, S., Weissenbach, J., and Avner, P. 2006. The Xist RNA gene evolved in eutherians by pseudogenization of a protein-coding gene. Science 312: 1653–1655.
Ebersberger, I., Metzler, D., Schwarz, C., and Paabo, S. 2002. Genomewide comparison of DNA sequences between humans and chimpanzees. Am. J. Hum. Genet. 70: 1490–1497.
Furuno, M., Kasukawa, T., Saito, R., Adachi, J., Suzuki, H., Baldarelli, R., Hayashizaki, Y., and Okazaki, Y. 2003. CDS annotation in full-length cDNA sequence. Genome Res. 13: 1478–1487.
Furuno, M., Pang, K.C., Ninomiya, N., Fukuda, S., Frith, M.C., Bult, C., Kai, C., Kawai, J., Carninci, P., Hayashizaki, Y., et al. 2006. Clusters of internally primed transcripts reveal novel long noncoding RNAs. PLoS Genet. 2: e37.
Gaffney, D.J. and Keightley, P.D. 2005. The scale of mutational variation in the murid genome. Genome Res. 15: 1086–1094.
Gribnau, J., Diderich, K., Pruzina, S., Calzolari, R., and Fraser, P. 2000. Intergenic transcription and developmental remodeling of chromatin subdomains in the human beta-globin locus. Mol. Cell 5: 377–386.
Hardison, R.C., Roskin, K.M., Yang, S., Diekhans, M., Kent, W.J., Weber, R., Elnitski, L., Li, J., O'Connor, M., Kolbe, D., et al. 2003. Covariation in frequencies of substitution, deletion, transposition, and recombination during eutherian evolution. Genome Res. 13: 13–26.
Hinrichs, A.S., Karolchik, D., Baertsch, R., Barber, G.P., Bejerano, G., Clawson, H., Diekhans, M., Furey, T.S., Harte, R.A., Hsu, F., et al. 2006. The UCSC Genome Browser database: Update 2006. Nucleic Acids Res. 34: D590–D598.
Huttenhofer, A., Schattner, P., and Polacek, N. 2005. Non-coding RNAs: Hope or hype? Trends Genet. 21: 289–297.
Hyashizaki, Y. 2004. Mouse transcriptome: Neutral evolution of non-coding complementary DNAs. Nature 431: 757.
Ihaka, R. and Gentleman, R. 1996. R: A language for data analysis and graphics. J. Comput. Graph. Statist. 5: 299–314.
International Human Genome Sequencing Consortium 2004. Finishing the euchromatic sequence of the human genome. Nature 431: 931–945.
Kapranov, P., Cawley, S.E., Drenkow, J., Bekiranov, S., Strausberg, R.L., Fodor, S.P., and Gingeras, T.R. 2002. Large-scale transcriptional activity in chromosomes 21 and 22. Science 296: 916–919.
Lau, N.C., Seto, A.G., Kim, J., Kuramochi-Miyagawa, S., Nakano, T., Bartel, D.P., and Kingston, R.E. 2006. Characterization of the piRNA complex from rat testes. Science 313: 363–367.
Lenhard, B. and Wasserman, W.W. 2002. TFBS: Computational framework for transcription factor binding site analysis. Bioinformatics 18: 1135–1136.
Lunter, G., Ponting, C.P., and Hein, J. 2006. Genome-wide identification of human functional DNA using a neutral indel model. PLoS Comput. Biol. 2: e5.
Maeda, N., Kasukawa, T., Oyama, R., Gough, J., Frith, M., Engstrom, P.G., Lenhard, B., Aturaliya, R.N., Batalov, S., Beisel, K.W., et al. 2006. Transcript annotation in FANTOM3: Mouse gene catalog based on physical cDNAs. PLoS Genet. 2: e62.
Mehler, M.F. and Mattick, J.S. 2006. Non-coding RNAs in the nervous system. J. Physiol. 575: 333–341.
Mendes Soares, L.M. and Valcarcel, J. 2006. The expanding transcriptome: The genome as the Book of Sand. EMBO J. 25: 923–931.
Miller, D., Briggs, D., Snowden, H., Hamlington, J., Rollinson, S., Lilford, R., and Krawetz, S.A. 1999. A complex population of RNAs exists in human ejaculate spermatozoa: Implications for understanding molecular aspects of spermiogenesis. Gene 237: 385–392.
Ng, H.H. and Bird, A. 1999. DNA methylation and chromatin modification. Curr. Opin. Genet. Dev. 9: 158–163.
Numata, K., Kanai, A., Saito, R., Kondo, S., Adachi, J., Wilming, L.G., Hume, D.A., Hayashizaki, Y., and Tomita, M. 2003. Identification of putative noncoding RNAs among the RIKEN mouse full-length cDNA collection. Genome Res. 13: 1301–1306.
Okazaki, Y., Furuno, M., Kasukawa, T., Adachi, J., Bono, H., Kondo, S., Nikaido, I., Osato, N., Saito, R., Suzuki, H., et al. 2002. Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature 420: 563–573.
Oudejans, C.B., Westerman, B., Wouters, D., Gooyer, S., Leegwater, P.A., van Wijk, I.J., and Sleutels, F. 2001. Allelic IGF2R repression does not correlate with expression of antisense RNA in human extraembryonic tissues. Genomics 73: 331–337.
Pang, K.C., Stephen, S., Engstrom, P.G., Tajul-Arifin, K., Chen, W., Wahlestedt, C., Lenhard, B., Hayashizaki, Y., and Mattick, J.S. 2005. RNAdb – A comprehensive mammalian noncoding RNA database. Nucleic Acids Res. 33: D125–D130.
Pang, K.C., Frith, M.C., and Mattick, J.S. 2006. Rapid evolution of noncoding RNAs: Lack of conservation does not mean lack of function. Trends Genet. 22: 1–5.
Ponjavic, J., Lenhard, B., Kai, C., Kawai, J., Carninci, P., Hayashizaki, Y., and Sandelin, A. 2006. Transcriptional and structural impact of TATA-initiation site spacing in mammalian core promoters. Genome Biol. 7: R78.
Ponting, C.P. and Lunter, G. 2006. Signatures of adaptive evolution within human non-coding sequence. Hum. Mol. Genet. (Suppl 2): R170–R175.
Ravasi, T., Suzuki, H., Pang, K.C., Katayama, S., Furuno, M., Okunishi, R., Fukuda, S., Ru, K., Frith, M.C., Gongora, M.M., et al. 2006. Experimental validation of the regulated expression of large numbers of non-coding RNAs from the mouse genome. Genome Res. 16: 11–19.
Schmitt, S. and Paro, R. 2004. Gene regulation: A reason for reading nonsense. Nature 429: 510–511.
Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.C., Haussler, D., and Miller, W. 2003. Human–mouse alignments with BLASTZ. Genome Res. 13: 103–107.
Sheth, N., Roca, X., Hastings, M.L., Roeder, T., Krainer, A.R., and Sachidanandam, R. 2006. Comprehensive splice-site analysis using comparative genomics. Nucleic Acids Res. 34: 3955–3967.
Siepel, A., Bejerano, G., Pedersen, J.S., Hinrichs, A.S., Hou, M., Rosenbloom, K., Clawson, H., Spieth, J., Hillier, L.W., Richards, S., et al. 2005. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15: 1034–1050.