Genome Res. 17:965-968, 2007 ©2007 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/07 $5.00
Perspective
Promoting transcriptome diversity
The J. Craig Venter Institute, Rockville, Maryland 20850, USA
Although the number of protein-encoding human genes is smaller than many had estimated, the human transcript repertoire is much more diverse than anticipated. Part of this diversity is generated through the use of alternative promoters and alternative splicing. In addition, discoveries made with technologies such as full-length cDNA libraries and whole-genome tiling microarrays make it likely that non-protein-encoding transcripts comprise a substantial fraction of the human RNA population. Much attention is currently focused on understanding the role of alternative promoters in generating transcript diversity, for both non-protein-encoding RNAs (ncRNAs) and protein-encoding RNAs.
Over the past decade, our concept of the human gene repertoire has changed dramatically, such that the one-promoter, one-gene, one-transcript, one-protein concept no longer provides a realistic view of human (and other eukaryotic) genes (Carninci et al. 2005
A major research area for modern genomics is the role of alternative promoters in driving transcript production. While it has been clear for many years that some eukaryotic genes use multiple promoters that can drive expression in specific tissues and/or developmental stages, evidence that alternative promoters play a major role in transcript diversity is expanding rapidly (Carninci 2006
To study these and other questions comprehensively, current research has focused on identifying and characterizing the sequence structure of alternative promoters and on delineating the transcription factors that interact with them. These studies are expected to yield a better understanding of the rules governing the structure and use of promoters in non-protein-encoding genes. Toward that end, full-length cDNA sequencing, especially of human and mouse transcripts, has provided data sets that give insight not only into alternative splice forms but also into the use of alternative promoters (Okazaki et al. 2002
In their initial study of human promoters, Kimura et al. (2006)
With the increasing availability of genome sequences from a diverse set of organisms, comparative genomics offers an attractive way to assess potential promoter sequences. A good example of this approach is the assembly and evolutionary comparison of conserved alternative promoters based on mouse and human full-length cDNAs (Baek et al. 2007). Among the findings, upstream promoters are more likely to be CpG-rich and to be associated with higher expression levels. Moreover, promoters in larger clusters (genes with more alternative promoters) are more highly conserved, suggesting that conservation is needed to maintain the signals of competing promoters. Interestingly, TATA-boxes are much less frequent in alternative promoters, whereas specific hexamers are more frequent, possibly reflecting, for example, the use of tissue- and developmental-stage-specific transcription activators. In contrast, genes associated with widely expressed housekeeping functions generally have single promoters that are CpG-rich, show more relaxed sequence conservation, and frequently use TATA-boxes. Based on these studies, the investigators developed discriminators that should be useful for predicting whether a gene has alternative or single promoters, thereby enabling the discovery of putative promoters that lack other experimental support.
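Sequence features such as CpG richness and the presence of a TATA-box can be scored directly from a candidate promoter sequence. The following is a minimal sketch, not the discriminator developed by Baek et al.: it assumes the widely used Gardiner-Garden and Frommer style CpG-island thresholds (length of at least 200 bp, GC fraction above 0.5, observed/expected CpG ratio above 0.6) and a simple TATAWAW consensus for the TATA-box, both as illustrative stand-ins. The function names and toy sequences are hypothetical.

```python
import re

# Illustrative CpG-island thresholds in the style of Gardiner-Garden and Frommer (1987);
# these are assumptions for the sketch, not the criteria used by Baek et al.
MIN_LENGTH = 200
MIN_GC_FRACTION = 0.5
MIN_CPG_OBS_EXP = 0.6

# Simple TATA-box consensus TATAWAW, where W stands for A or T.
TATA_BOX = re.compile(r"TATA[AT]A[AT]")


def gc_fraction(seq: str) -> float:
    """Fraction of positions that are G or C."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # str.count is non-overlapping, which is exact for "CG" because it cannot overlap itself.
    return seq.count("CG") * len(seq) / (c * g)


def describe_promoter(seq: str) -> dict:
    """Summarize a few sequence features of a putative promoter region."""
    seq = seq.upper()
    gc = gc_fraction(seq)
    oe = cpg_obs_exp(seq)
    return {
        "gc_fraction": round(gc, 3),
        "cpg_obs_exp": round(oe, 3),
        "cpg_rich": len(seq) >= MIN_LENGTH and gc > MIN_GC_FRACTION and oe > MIN_CPG_OBS_EXP,
        "has_tata_box": TATA_BOX.search(seq) is not None,
    }


# Hypothetical toy sequences, each 200 bp long.
gc_rich = "CG" * 50 + "GC" * 50
at_rich_with_tata = "TATAAAAG" + "AT" * 96
print(describe_promoter(gc_rich))            # CpG-rich, no TATA-box
print(describe_promoter(at_rich_with_tata))  # not CpG-rich, TATA-box present
```

For simplicity the criteria are applied to the whole input sequence rather than to a sliding window, and a real discriminator would combine many more features (conservation, hexamer frequencies, expression) than these two.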
The study of Tsuritani et al. (2007)
Overall, comparative sequence and functional analysis represents one of the exciting new opportunities to define the coding potential and regulation of mammalian genomes. Extending these types of analyses to an even wider diversity of organisms, including those only very distantly related to humans (Venkatesh et al. 2006
Complementing the studies described above will be the integration of their findings with additional features that contribute to promoter function, including epigenetic modifications, as well as the characterization of the regulatory proteins associated with putative promoter regions. Toward that end, several early studies based on genome-wide approaches have been reported.
Using genomic tiling arrays comprising oligonucleotides spaced approximately every 35 nucleotides along the entire nonrepetitive region of chromosomes 21 and 22 (Kapranov et al. 2002
Studies that extend these analyses of transcription-factor-binding sites to the entire human genome highlight the challenges and opportunities that exist (Yang et al. 2006
Indeed, there is already evidence of differential promoter methylation according to the tissue specificity of expression, as well as differences in methylation among promoters of different types (Cheong et al. 2006
Highly complementary to the studies described above are computational analyses that are already driving the discovery process for regulatory elements. For example, a comparative genomics approach involving a systematic analysis of the human and mouse genomes, together with those of the rat and dog, revealed 174 potential transcription factor binding sites, more than half of which are newly discovered (Xie et al. 2005
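The premise behind such comparative motif discovery, that functional sites are preserved across species more often than background sequence, can be illustrated with a toy calculation. The sketch below is a simplified stand-in, not the scoring used by Xie et al.: it assumes an equal-length multiple alignment of orthologous promoter sequences (a hypothetical input format, with the reference species listed first and '-' for gaps), counts how often each reference k-mer is identical at the same aligned position in every other species, and reports each k-mer's conservation rate relative to the background rate over all k-mers. The function names and the alignment are hypothetical.

```python
from collections import Counter


def conserved_kmer_counts(alignment: dict, k: int = 6) -> dict:
    """Map each k-mer in the reference sequence to (occurrences, conserved occurrences).

    `alignment` maps species name -> aligned sequence (all the same length, '-' for gaps);
    the first entry is treated as the reference. An occurrence counts as conserved when
    every other species has the identical k-mer at the same aligned position.
    """
    seqs = [s.upper() for s in alignment.values()]
    ref, others = seqs[0], seqs[1:]
    total, conserved = Counter(), Counter()
    for i in range(len(ref) - k + 1):
        kmer = ref[i:i + k]
        if "-" in kmer or "N" in kmer:
            continue  # skip windows with gaps or ambiguous bases
        total[kmer] += 1
        if all(s[i:i + k] == kmer for s in others):
            conserved[kmer] += 1
    return {kmer: (total[kmer], conserved[kmer]) for kmer in total}


def conservation_enrichment(counts: dict, min_count: int = 2) -> dict:
    """Conservation rate of each k-mer divided by the background rate over all k-mers."""
    all_total = sum(t for t, _ in counts.values())
    all_conserved = sum(c for _, c in counts.values())
    if all_total == 0 or all_conserved == 0:
        return {}
    background = all_conserved / all_total
    return {
        kmer: (c / t) / background
        for kmer, (t, c) in counts.items()
        if t >= min_count
    }


# Hypothetical toy alignment of orthologous promoter fragments (reference first).
aln = {
    "human": "CACGTGAAAACACGTG",
    "mouse": "CACGTGTTTTCACGTG",
    "rat":   "CACGTGTATACACGTG",
}
counts = conserved_kmer_counts(aln, k=4)
print(conservation_enrichment(counts, min_count=2))
```

In this toy example only the k-mers drawn from the two invariant blocks pass the occurrence filter, and each is conserved at about 2.2 times the background rate. A genome-scale analysis would additionally need whole-genome alignments and statistical controls such as shuffled or random k-mers, which are omitted here.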
For many years, alternative transcript forms were largely below the radar of human genomics. Although there was awareness of interesting examples of alternative splicing events with functional consequences, much of the genomic era has focused on the concept of one meaningful transcript per gene. For example, oligonucleotide- and cDNA-based gene expression arrays were designed to measure a gene's transcript production by capturing the total signal of its transcript population through hybridization to 3'-end sequences. As a result, the potential richness of those transcripts, including tissue-specific differences in expression that might be qualitative as well as quantitative, went undetected. Moreover, because such platforms relied on RefSeq genes and/or UniGene clusters (http://www.ncbi.nlm.nih.gov/RefSeq), they were essentially closed, limited to known genes and transcripts.
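The limitation of 3'-end measurements can be made concrete with a toy calculation. The sketch below assumes a hypothetical gene with two transcripts that start at alternative promoters but share the same 3' end, with made-up abundances in two tissues; the function names are illustrative only. A probe at the shared 3' end reports the same total in both tissues even though promoter usage is reversed.

```python
# Hypothetical abundances (arbitrary units) of two transcripts of one gene that start
# at different (alternative) promoters but share the same 3' end.
tissue_a = {"promoter_1": 80, "promoter_2": 20}
tissue_b = {"promoter_1": 10, "promoter_2": 90}


def three_prime_signal(isoforms: dict) -> int:
    """A probe at the shared 3' end hybridizes to every isoform, so it sees only the sum."""
    return sum(isoforms.values())


def isoform_fractions(isoforms: dict) -> dict:
    """Relative contribution of each isoform, which a 5'-aware assay could resolve."""
    total = sum(isoforms.values())
    return {name: n / total for name, n in isoforms.items()}


print(three_prime_signal(tissue_a), three_prime_signal(tissue_b))  # 100 100
print(isoform_fractions(tissue_a))  # {'promoter_1': 0.8, 'promoter_2': 0.2}
print(isoform_fractions(tissue_b))  # {'promoter_1': 0.1, 'promoter_2': 0.9}
```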
Transcript discovery has been underpinned by open platforms that allow observations uninfluenced by preconceived notions. This has been especially true of sequence-based approaches using full-length cDNAs (Okazaki et al. 2002
As sequencing technology becomes higher throughput and less costly, the pace of discovery will increase dramatically. For example, pyrosequencing technology has recently been used to study the transcript population in human and plant cells (Bainbridge et al. 2006
Key to overall progress will be the development of community tools including databases that will enable precise cataloging of transcripts, their promoters, and regulatory protein complexes. The integration of these data sets (The Encode Project Consortium 2004
1 Corresponding author.
E-mail rls{at}venterinstitute.org; fax (240) 268-4000. Article is online at http://www.genome.org/cgi/doi/10.1101/gr.6499807
Baek, D., Davis, C., Ewing, B., Gordon, D., and Green, P. 2007. Characterization and predictive discovery of evolutionarily conserved mammalian alternative promoters. Genome Res. 17: 145-155.
Bainbridge, M.N., Warren, R.L., Hirst, M., Romanuik, T., Zeng, T., Go, A., Delaney, A., Griffith, M., Hickenbotham, M., Magrini, V., et al. 2006. Analysis of the prostate cancer cell line LNCaP transcriptome using a sequencing-by-synthesis approach. BMC Genomics 7: 246. doi: 10.1186/1471-2164-7-246.
Carninci, P. 2006. Tagging mammalian transcription complexity. Trends Genet. 22: 501-510.
Carninci, P., Kasukawa, T., Katayama, S., Gough, J., Frith, M.C., Maeda, N., Oyama, R., Ravasi, T., Lenhard, B., Wells, C., et al. 2005. The transcriptional landscape of the mammalian genome. Science 309: 1559-1563.
Cawley, S., Bekiranov, S., Ng, H.H., Kapranov, P., Sekinger, E.A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J., Williams, A.J., et al. 2004. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116: 499-509.
Cheong, J., Yamada, Y., Yamashita, R., Irie, T., Kanai, A., Wakaguri, H., Nakai, K., Ito, T., Saito, I., Sugano, S., et al. 2006. Diverse DNA methylation statuses at alternative promoters of human genes in various tissues. DNA Res. 13: 155-167.
Cheung, F., Haas, B.J., Goldberg, S.M., May, G.D., Xiao, Y., and Town, C.D. 2006. Sequencing Medicago truncatula expressed sequenced tags using 454 Life Sciences technology. BMC Genomics 7: 272. doi: 10.1186/1471-2164-7-272.
Cooper, S.J., Trinklein, N.D., Nguyen, L., and Myers, R.M. 2007. Serum response factor binding sites differ in three human cell types. Genome Res. 17: 136-144.
The Encode Project Consortium. 2004. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science 306: 636-640.
The Encode Project Consortium. 2007. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447: 799-816.
Frith, M.C., Bailey, T.L., Kasukawa, T., Mignone, F., Kummerfeld, S.K., Madera, M., Sunkara, S., Furuno, M., Bult, C.J., Quackenbush, J., et al. 2006. Discrimination of non-protein-coding transcripts from protein-coding mRNA. RNA Biol. 3: 40-48.
Furuno, M., Pang, K.C., Ninomiya, N., Fukuda, S., Frith, M.C., Bult, C., Kai, C., Kawai, J., Carninci, P., Hayashizaki, Y., et al. 2006. Clusters of internally primed transcripts reveal novel long noncoding RNAs. PLoS Genet. 2: e37.
Gowda, M., Li, H., Alessi, J., Chen, F., Pratt, R., and Wang, G.L. 2006. Robust analysis of 5'-transcript ends (5'-RATE): A novel technique for transcriptome analysis and genome annotation. Nucleic Acids Res. 34: e126.
Guigó, R., Flicek, P., Abril, J.F., Reymond, A., Lagarde, J., Denoeud, F., Antonarakis, S., Ashburner, M., Bajic, V.B., Birney, E., et al. 2006. EGASP: The human ENCODE Genome Annotation Assessment Project. Genome Biol. 7: S2.1-S2.31.
Hatada, I., Fukasawa, M., Kimura, M., Morita, S., Yamada, K., Yoshikawa, T., Yamanaka, S., Endo, C., Sakurada, A., Sato, M., et al. 2006. Genome-wide profiling of promoter methylation in human. Oncogene 25: 3059-3064.
Heintzman, N.D., Stuart, R.K., Hon, G., Fu, Y., Ching, C.W., Hawkins, R.D., Barrera, L.O., Van Calcar, S., Qu, C., Ching, K.A., et al. 2007. Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat. Genet. 39: 311-318.
Imanishi, T., Itoh, T., Suzuki, Y., O'Donovan, C., Fukuchi, S., Koyanagi, K.O., Barrero, R.A., Tamura, T., Yamaguchi-Kabata, Y., Tanino, M., et al. 2004. Integrative annotation of 21,037 human genes validated by full-length cDNA clones. PLoS Biol. 2: e162.
Iseli, C., Stevenson, B.J., de Souza, S.J., Samaia, H.B., Camargo, A.A., Buetow, K.H., Strausberg, R.L., Simpson, A.J., Bucher, P., and Jongeneel, C.V. 2002. Long-range heterogeneity at the 3' ends of human mRNAs. Genome Res. 12: 1068-1074.
Kapranov, P., Cawley, S.E., Drenkow, J., Bekiranov, S., Strausberg, R.L., Fodor, S.P., and Gingeras, T.R. 2002. Large-scale transcriptional activity in chromosomes 21 and 22. Science 296: 916-919.
Kawaji, H., Frith, M.C., Katayama, S., Sandelin, A., Kai, C., Kawai, J., Carninci, P., and Hayashizaki, Y. 2006. Dynamic usage of transcription start sites within core promoters. Genome Biol. 7: R118.
Kimura, K., Wakamatsu, A., Suzuki, Y., Ota, T., Nishikawa, T., Yamashita, R., Yamamoto, J., Sekine, M., Tsuritani, K., Wakaguri, H., et al. 2006. Diversification of transcriptional modulation: Large-scale identification and characterization of putative alternative promoters of human genes. Genome Res. 16: 55-65.
Kodzius, R., Kojima, M., Nishiyori, H., Nakamura, M., Fukuda, S., Tagami, M., Sasaki, D., Imamura, K., Kai, C., Harbers, M., et al. 2006. CAGE: Cap Analysis of Gene Expression. Nat. Methods 3: 211-222.
Mattick, J.S. and Makunin, I.V. 2006. Non-coding RNA. Hum. Mol. Genet. 15: 17-29.
Okazaki, Y., Furuno, M., Kasukawa, T., Adachi, J., Bono, H., Kondo, S., Nikaido, I., Osato, N., Saito, R., Suzuki, H., et al. 2002. Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature 420: 563-573.
Ota, T., Suzuki, Y., Nishikawa, T., Otsuki, T., Sugiyama, T., Irie, R., Wakamatsu, A., Hayashi, K., Sato, H., Nagai, K., et al. 2004. Complete sequencing and characterization of 21,243 full-length human cDNAs. Nat. Genet. 36: 40-45.
Pennisi, E. 2005. Why do humans have so few genes? Science 309: 80.
Peters, B.A. and Velculescu, V.E. 2005. Transcriptome PETs: A genome's best friends. Nat. Methods 2: 93-94.
Strausberg, R.L., Feingold, E.A., Grouse, L.H., Derge, J.G., Klausner, R.D., Collins, F.S., Wagner, L., Shenmen, C.M., Schuler, G.D., Altschul, S.F., et al. 2002. Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences. Proc. Natl. Acad. Sci. 99: 16899-16903.
Strausberg, R.L., Simpson, A.J., and Wooster, R. 2003. Sequence-based cancer genomics: Progress, lessons and opportunities. Nat. Rev. Genet. 4: 409-418.
Takeda, J., Suzuki, Y., Nakao, M., Barrero, R.A., Koyanagi, K.O., Jin, L., Motono, C., Hata, H., Isogai, T., Nagai, K., et al. 2006. Large-scale identification and characterization of alternative splicing variants of human gene transcripts using 56,419 completely sequenced and manually annotated full-length cDNAs. Nucleic Acids Res. 34: 3917-3928.
Thomas, D.J., Rosenbloom, K.R., Clawson, H., Hinrichs, A.S., Trumbower, H., Raney, B.J., Karolchik, D., Barber, G.P., Harte, R.A., Hillman-Jackson, J., et al. 2007. The ENCODE Project at UC Santa Cruz. Nucleic Acids Res. 35: D663-D667.
Tsuritani, K., Irie, T., Yamashita, R., Sakakibara, Y., Wakaguri, H., Kanai, A., Mizushima-Sugano, J., Sugano, S., Nakai, K., and Suzuki, Y. 2007. Distinct class of putative human-specific promoters: Comparative studies of alternative promoters of human and mouse genes. Genome Res. (this issue) doi: 10.1101/gr.6030107.
Venkatesh, B., Kirkness, E.F., Loh, Y.H., Halpern, A.L., Lee, A.P., Johnson, J., Dandona, N., Viswanathan, L.D., Tay, A., Venter, J.C., et al. 2006. Ancient noncoding elements conserved in the human genome. Science 314: 1892.
Willingham, A.T. and Gingeras, T.R. 2006. TUF love for "junk" DNA. Cell 125: 1215-1220.
Xie, X., Lu, J., Kulbokas, E.J., Golub, T.R., Mootha, V., Lindblad-Toh, K., Lander, E.S., and Kellis, M. 2005. Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals. Nature 434: 338-345.
Yang, A., Zhu, Z., Kapranov, P., McKeon, F., Church, G.M., Gingeras, T.R., and Struhl, K. 2006. Relationships between p63 binding, DNA sequence, transcription activity, and biological function in human cells. Mol. Cell 24: 593-602.