Published online before print
August 21, 2007, 10.1101/gr.6824707 Genome Res. 17:1478-1485, 2007 ©2007 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/07 $5.00
Letter
Identification and analysis of internal promoters in Caenorhabditis elegans operons
1 Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia V5Z 1L3, Canada; 2 Department of Zoology, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada; 3 Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada; 4 Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia V5A 1S6, Canada
The current Caenorhabditis elegans genomic annotation has many genes organized in operons. Using directionally stitched promoter::GFP methodology, we have conducted the largest survey to date on the regulatory regions of annotated C. elegans operons and identified 65, over 25% of those studied, with internal promoters. We have termed these operons "hybrid operons." GFP expression patterns driven from internal promoters differ in tissue specificity from expression of operon promoters, and serial analysis of gene expression data reveals that there is a lack of expression correlation between genes in many hybrid operons. The average length of intergenic regions with putative promoter activity in hybrid operons is larger than previous estimates for operons as a whole. Genes with internal promoters are more commonly involved in gene duplications and have a significantly lower incidence of alternative splicing than genes without internal promoters, although we have observed almost all trans-splicing patterns in these two distinct groups. Finally, internal promoter constructs are able to rescue lethal knockout phenotypes, demonstrating their necessity in gene regulation and survival. Our work suggests that hybrid operons are common in the C. elegans genome and that internal promoters influence not only gene organization and expression but also operon evolution.
Genes that participate in a particular biological process in bacteria and archaea are often organized in an operon for coordinated regulation of gene expression (Jacob and Monod 1961 25% of these pre-mRNAs are predicted to be within operons (Zorio et al. 1994
The trans-splicing of a small RNA molecule, the spliced leader (SL), to the 5'-end of a pre-mRNA was first discovered in trypanosomes (Murphy et al. 1986
It has been established as a general principle that gene duplication plays an important role in the evolution of genetic variation and new protein functionalities (Ohno 1970
Through the analysis of microarray expression data, Lercher et al. (2003)
Green fluorescent protein (GFP) reporter gene methodology was first introduced in C. elegans in 1994 (Chalfie et al. 1994
Operons with internal promoters for downstream genes, which we have termed "hybrid operons," have been identified and characterized in many bacterial genomes (Horowitz and Platt 1983
Identification of putative internal promoters in hybrid operons
We discovered internal promoters by searching for promoter activity located in intergenic regions of WormBase annotated operons. From a set of 2448 directionally stitched promoter::GFP constructs, we have identified 979 putative gene-specific promoters able to drive reporter GFP expression (McKay et al. 2003 As an example, Figure 1A shows details of the klp-8 operon (CEOPX040). We have constructed promoter::GFP fusions for both the leading gene C15C7.2 and the downstream gene C15C7.1 of the klp-8 operon, and each is able to drive expression of the reporter gene in different tissues (Fig. 1B), indicating the presence of an internal promoter. The same is true for the CEOP3332 operon (Fig. 1C,D). The expression patterns driven by other promoter pairs for which we have data on both leading genes and downstream genes in the same operons are shown in Supplemental Figure 1.
To further establish that internal promoters in our study are active in vivo outside the context of a reporter assay, we also looked for examples in which the downstream gene could rescue a lethal mutant phenotype in the absence of promoter sequence from the upstream gene. It has been reported multiple times in the literature that unc-37 mutants can be rescued without the promoter of the upstream gene (Pflugrad et al. 1997 Finally, to confirm the observed negative GFP expression, a deletion analysis was conducted in order to capture the largest possible upstream regulatory sequences for downstream genes. For 12 operons selected in the test set described in Methods, we designed constructs that include the entire upstream genes plus the entire intergenic regions, but exclude the promoters for the upstream genes, which were deleted. The control set contains the sequences from the test set plus the promoter sequences from the leading genes. Our results showed that the constructs from the control set still drove positive GFP expression, while constructs from 11 out of the 12 downstream genes in the test set were unable to drive GFP expression, indicating that the majority of the negative downstream genes in our original experiment lack internal promoters.
Properties of putative internal promoters in hybrid operons
Of 172 operon downstream genes without internal promoters, 20.4% use a mixture of SL1/SL2 and 4% use SL1 only for trans-splicing. Spliced leaders for 10.5% of them have not been detected (Table 1). It is expected that more spliced leaders will be found among downstream genes when more EST sequences or SL data are obtained (Hwang et al. 2004
In order to determine the relationship between gene duplication and the presence of internal promoters, we compared our data set against the duplicated gene pairs identified by Lynch and Conery (2000)
In order to determine the influence of operon structure on the development of alternative splicing variants, we examined the profiles of alternative splicing of operon downstream genes and of genes in the entire C. elegans genome. Surprisingly, although some have alternative 5' starting sites or 3' polyadenylation sites, none of the 66 downstream genes with internal promoters have alternatively spliced variants, which is significantly less than the genomic average of 10% (P < 0.025, two-tailed
Intergenic spacing in putative hybrid operons
The average length of intergenic regions with promoter activity is 596 bp (median 377 bp), while that of the intergenic regions without promoter activity is shorter, around 428 bp with a median of 207 bp (P < 0.05, two-tailed t-test; Table 1). 53% of the intergenic regions with promoter activities fall within the 0–400-bp range, which agrees with the overall distribution of gene spacing within operons described by Zorio et al. (1994)
Gene expression analysis
Table 2 shows gene expression correlation across different categories. In general, genes in operons display coexpression when evaluated against comparisons between non-operon genes (P < 0.001, two-tailed t-test), which agrees with the results of Lercher et al. (2003)
In order to examine in more detail the impact of internal promoters on gene expression, we analyzed the expression profile for all hybrid operons. Our results showed that gene members in seven hybrid operons were strongly coexpressed (r 0.5), while those in the remaining 58 hybrid operons lacked any strong expression correlation. To dissect the reasons for the lack of gene expression correlation in hybrid operons, we also examined differences in absolute expression level in various tissues using the Audic-Claverie algorithm (Audic and Claverie 1997
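For readers unfamiliar with the Audic-Claverie comparison of tag counts, the following is a minimal Python sketch of the statistic as it is commonly stated: the probability of observing y tags for a gene in one library given x tags in another, with library sizes n1 and n2. The function names, library sizes, and tag counts are hypothetical illustrations and are not taken from this study's SAGE data or code.

```python
import math


def audic_claverie_pmf(x, y, n1, n2):
    """P(y | x): probability of seeing y tags in library 2 (n2 total tags)
    given x tags for the same gene in library 1 (n1 total tags)."""
    if x < 0 or y < 0 or n1 <= 0 or n2 <= 0:
        raise ValueError("counts must be >= 0 and library sizes > 0")
    ratio = n2 / n1
    # Work in log space so factorials of large counts do not overflow.
    log_p = (
        y * math.log(ratio)
        + math.lgamma(x + y + 1)
        - math.lgamma(x + 1)
        - math.lgamma(y + 1)
        - (x + y + 1) * math.log1p(ratio)
    )
    return math.exp(log_p)


def audic_claverie_pvalue(x, y, n1, n2):
    """Smaller of the two cumulative tails at the observed y
    (an assumed convention for reporting a one-sided P-value)."""
    lower = sum(audic_claverie_pmf(x, k, n1, n2) for k in range(y + 1))
    upper = 1.0 - (lower - audic_claverie_pmf(x, y, n1, n2))
    return min(lower, upper, 1.0)


if __name__ == "__main__":
    # Hypothetical example: 3 tags in a 50,000-tag library versus
    # 25 tags in a 60,000-tag library for the same gene.
    print(audic_claverie_pvalue(3, 25, 50_000, 60_000))
```

A small P-value under this sketch would suggest that the two tissues differ in absolute expression level for that gene, which is the kind of difference the analysis above is looking for.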
In some cases, issues such as differences in mRNA half-life, lack of SAGE tags for some genes, and ambiguous SAGE tags for others could affect the gene expression analysis and may contribute to a lack of expression correlation. We found that some ambiguous SAGE tags resulted from the prediction of full-length transcripts based on cDNAs covering more than one gene in an operon. One example is the mai-1 gene and the gpd-2 gene in the mai-1 operon. According to WormBase annotations, cDNA yk1412f04.5 covers both mai-1 and gpd-2 and causes the gpd-2 transcript to be annotated as completely overlapping the mai-1 transcript. According to the works of Spieth et al. (1993)
Our study, the largest analysis of operon promoter activity to date, has shown that 66 out of 238 GFP fusions constructed from operon downstream genes are able to autonomously drive GFP expression in vivo. This indicates the presence of internal promoters in more than one-quarter of the operon downstream genes studied. Both GFP expression patterns and SAGE data demonstrate a complex gene expression regulation system in the hybrid operons of C. elegans, where internal promoters are able to drive tissue-specific expression in a pattern distinct from the operon promoter. The functional activity of internal promoters is further demonstrated by the ability of an internal promoter construct to sufficiently recapitulate normal expression to the extent that a lethal phenotype can be rescued. In addition, the GFP promoter assay is able to distinguish two classes of downstream operon genes that differ significantly in their intergenic distance, frequency of paralogs, and the rate at which they undergo alternative splicing, providing further evidence that this assay reflects bona fide promoter activity.
Gene duplication
Alternative splicing variants
SL1 and SL2
Do patterns of trans-splicing in downstream genes signal the presence or absence of internal promoters? Our analysis shows that nearly half of the operon downstream genes with internal promoters studied here only have evidence for trans-splicing to SL2. We might expect that proportion to be reduced when more SL1 and SL2 data are obtained (Hwang et al. 2004 Interestingly, in the C. elegans genomic annotation (WS135) analyzed, >50 genes are indicated to be trans-spliced to either a mixture of SL1 and SL2 or exclusively SL2 despite having no evidence that they are downstream participants in operons (data not shown). This may suggest that there exists an alternative mechanism for SL2 trans-splicing. Conversely, this may indicate the presence of undiscovered genes in these regions that would function as the initiating gene in an undefined operon. Since the C. elegans genomic annotation is relatively mature, it is appealing to speculate that these undiscovered genes could be non-protein coding.
Approximately 20% of all C. elegans operons were subjected to GFP analysis in this study. Although our data set does not cover every internal promoter in the annotated operons of the C. elegans genome, it greatly enhances our knowledge and understanding of C. elegans promoters, especially as operon downstream genes were specifically excluded in previous large-scale promoter studies in C. elegans (McKay et al. 2003
Discovery of operons with putative internal promoters in C. elegans
All described gene and operon annotations and genomic sequences of C. elegans were extracted from WormBase release WS135 (ftp.sanger.ac.uk).
The directionally stitched promoter::GFP fusions were constructed according to McKay et al. (2003)
Gene-specific promoter::GFP constructs with positive reporter expression were identified. Only those genes residing downstream in WormBase annotated operons were used for further analysis. We then mapped the coordinates of primer sets used in GFP fusions through the e-PCR program (Schuler 1997 Intergenic distance within an operon was defined as the region between the end of the poly(A) tail of the upstream gene and the beginning of the trans-spliced site of the corresponding downstream gene.
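The intergenic distance definition above can be sketched in Python as follows. This is an illustration only: it assumes 1-based, inclusive genomic coordinates and a hypothetical function name, and the example coordinates are invented rather than drawn from WormBase.

```python
def intergenic_distance(upstream_polya_end, downstream_trans_splice_site, strand="+"):
    """Number of bases strictly between the 3' end of the upstream gene
    (end of its poly(A) site) and the trans-splice site of the downstream
    gene. Coordinates are assumed to be 1-based and inclusive."""
    if strand == "+":
        gap = downstream_trans_splice_site - upstream_polya_end - 1
    elif strand == "-":
        # On the minus strand the upstream gene has the larger coordinate.
        gap = upstream_polya_end - downstream_trans_splice_site - 1
    else:
        raise ValueError("strand must be '+' or '-'")
    # Overlapping annotations are reported as zero spacing.
    return max(gap, 0)


if __name__ == "__main__":
    # Hypothetical coordinates for a plus-strand gene pair.
    print(intergenic_distance(1000, 1378))
```

Clamping overlaps to zero is a design choice made for this sketch; an analysis might instead prefer to flag overlapping annotations for manual review.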
Paralogous genes and annotated spliced leaders in the C. elegans genome
C. elegans genes with alternative splicing variants
Gene expression analysis
SAGE tags were mapped to genes using the full-length transcripts predicted from WormBase release WS110 based on a previously described method (Pleasance et al. 2003
The Pearson correlation coefficient was used for gene expression correlation analysis, while the Audic-Claverie algorithm (Audic and Claverie 1997
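As an illustration of the correlation measure named above, the following minimal Python sketch computes the Pearson correlation coefficient between the expression profiles of two genes across the same set of libraries. The function name and the tag counts in the example are hypothetical and do not come from the study's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length profiles,
    e.g. SAGE tag counts for two genes across the same libraries."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A flat profile has no defined correlation.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical tag counts for an upstream and a downstream gene
    # measured in five libraries.
    upstream = [12, 30, 7, 0, 18]
    downstream = [10, 28, 9, 1, 20]
    print(pearson_r(upstream, downstream))
```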
Transgenic complementation of let-721 (C05D11.12)
Deletion analysis
To address some of these concerns, we conducted the following experiment. From the promoter::GFP results, we selected a set of 12 operons, where the promoters of upstream genes presented positive gene expression patterns while the first corresponding downstream genes gave negative results. In order to evaluate gene expression modulatory effects produced via the intra-operonic region between the two genes, we designed the left primer right after the start codon of the upstream gene with the upstream operon promoter deleted, while still using the same right primer used for the negative downstream gene. Two control upstream genes (K12H4.5 in the CEOP3476 operon and Y71F9B.3 in the CEOP1056 operon) from the above operon set were tested again, where the left primers were the same as those used in the positive upstream genes and the right primers were the same as those used for the corresponding negative downstream genes.
We thank Thomas Blumenthal for critical review and comments on the manuscript. We thank Richard Durbin, Anthony Rogers, and Daniel Lawson of WormBase at the Sanger Institute for information and consultation regarding operon annotations and full-length transcript prediction. We also thank Courtney Mills for providing the analytical Web sites for both SAGE and GFP experiments. E.P. is supported by the Canadian Institutes of Health Research and the Michael Smith Foundation for Health Research (MSFHR). A.M. is supported by a NSERC graduate scholarship. D.L.B. holds a Canada Research Chair. The work was supported in part by a CIHR grant to D.L.B. S.J. and M.M. are Scholars of the MSFHR. This work was primarily funded by Genome Canada.
5 Corresponding author.
E-mail sjones{at}bcgsc.ca; fax (604) 876-3561. [Supplemental material is available online at www.genome.org.] Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.6824707
Andrews, J., Smith, M., Merakovsky, J., Coulson, M., Hannan, F., and Kelly, L.E. 1996. The stoned locus of Drosophila melanogaster produces a dicistronic transcript and encodes two distinct polypeptides. Genetics 143: 1699–1711.
Audic, S. and Claverie, J.M. 1997. The significance of digital gene expression profiles. Genome Res. 7: 986–995.
Blumenthal, T. 1995. Trans-splicing and polycistronic transcription in Caenorhabditis elegans. Trends Genet. 11: 132–136.
Blumenthal, T. 2004. Operons in eukaryotes. Brief. Funct. Genomic. Proteomic. 3: 199–211.
Blumenthal, T. 2005. Trans-splicing and operons. In WormBook (ed. The C. elegans Research Community), WormBook. http://www.wormbook.org.
Blumenthal, T. and Gleason, K.S. 2003. Caenorhabditis elegans operons: Form and function. Nat. Rev. Genet. 4: 112–120.
Blumenthal, T., Evans, D., Link, C.D., Guffanti, A., Lawson, D., Thierry-Mieg, J., Thierry-Mieg, D., Chiu, W.L., Duke, K., Kiraly, M., et al. 2002. A global analysis of Caenorhabditis elegans operons. Nature 417: 851–854.
Chalfie, M., Tu, Y., Euskirchen, G., Ward, W.W., and Prasher, D.C. 1994. Green fluorescent protein as a marker for gene expression. Science 263: 802–805.
Chang, S., Johnston, R.J., and Hobert, O. 2003. A transcriptional regulatory cascade that controls left/right asymmetry in chemosensory neurons of C. elegans. Genes & Dev. 17: 2123–2137.
Corcoran, M.M., Hammarsund, M., Zhu, C., Lerner, M., Kapanadze, B., Wilson, B., Larsson, C., Forsberg, L., Ibbotson, R.E., Einhorn, S., et al. 2004. DLEU2 encodes an antisense RNA for the putative bicistronic RFP2/LEU5 gene in humans and mouse. Genes Chromosomes Cancer 40: 285–297.
Dupuy, D., Li, Q.R., Deplancke, B., Boxem, M., Hao, T., Lamesch, P., Sequerra, R., Bosak, S., Doucette-Stamm, L., Hope, I.A., et al. 2004. A first version of the Caenorhabditis elegans Promoterome. Genome Res. 14: 2169–2175.
Evans, D. and Blumenthal, T. 2000. trans splicing of polycistronic Caenorhabditis elegans pre-mRNAs: Analysis of the SL2 RNA. Mol. Cell. Biol. 20: 6659–6667.
Garcia-Rios, M., Fujita, T., LaRosa, P.C., Locy, R.D., Clithero, J.M., Bressan, R.A., and Csonka, L.N. 1997. Cloning of a polycistronic cDNA from tomato encoding
Hope, I.A. 1991. Promoter trapping in Caenorhabditis elegans. Development 113: 399–408.
Hope, I.A. 1994. PES-1 is expressed during early embryogenesis in Caenorhabditis elegans and has homology to the fork head family of transcription factors. Development 120: 505–514.
Hope, I.A., Arnold, J.M., McCarroll, D., Jun, G., Krupa, A.P., and Herbert, R. 1998. Promoter trapping identifies real genes in C. elegans. Mol. Gen. Genet. 260: 300–308.
Horowitz, H. and Platt, T. 1983. Initiation in vivo at the internal trp p2 promoter of Escherichia coli. J. Biol. Chem. 258: 7890–7893.
Hough, R.F., Lingam, A.T., and Bass, B.L. 1999. Caenorhabditis elegans mRNAs that encode a protein similar to ADARs derive from an operon containing six genes. Nucleic Acids Res. 27: 3424–3432.
Huang, X.Y. and Hirsh, D. 1989. A second trans-spliced RNA leader sequence in the nematode Caenorhabditis elegans. Proc. Natl. Acad. Sci. 86: 8640–8644.
Hwang, B.J., Muller, H.M., and Sternberg, P.W. 2004. Genome annotation by high-throughput 5' RNA end determination. Proc. Natl. Acad. Sci. 101: 1650–1655.
Imboden, M.A., Laird, P.W., Affolter, M., and Seebeck, T. 1987. Transcription of the intergenic regions of the tubulin gene cluster of Trypanosoma brucei: Evidence for a polycistronic transcription unit in a eukaryote. Nucleic Acids Res. 15: 7357–7368.
Jacob, F. and Monod, J. 1961. Genetic regulatory mechanisms in the synthesis of proteins. J. Mol. Biol. 3: 318–356.
Kriventseva, E.V., Koch, I., Apweiler, R., Vingron, M., Bork, P., Gelfand, M.S., and Sunyaev, S. 2003. Increase of functional diversity by alternative splicing. Trends Genet. 19: 124–128.
Langer, D., Hain, J., Thuriaux, P., and Zillig, W. 1995. Transcription in archaea: Similarity to that in eucarya. Proc. Natl. Acad. Sci. 92: 5768–5772.
Lercher, M.J., Blumenthal, T., and Hurst, L.D. 2003. Coexpression of neighboring genes in Caenorhabditis elegans is mostly due to operons and duplicate genes. Genome Res. 13: 238–243.
Liu, H., Jang, J.K., Graham, J., Nycz, K., and McKim, K.S. 2000. Two genes required for meiotic recombination in Drosophila are expressed from a dicistronic message. Genetics 154: 1735–1746.
Liu, Y., Huang, T., MacMorris, M., and Blumenthal, T. 2001. Interplay between AAUAAA and the trans-splice site in processing of a Caenorhabditis elegans operon pre-mRNA. RNA 7: 176–181.
Lu, C., Bentley, W.E., and Rao, G. 2004. A high-throughput approach to promoter study using green fluorescent protein. Biotechnol. Prog. 20: 1634–1640.
Lynch, M. and Conery, J.S. 2000. The evolutionary fate and consequences of duplicate genes. Science 290: 1151–1155.
Macejak, D.G. and Sarnow, P. 1991. Internal initiation of translation mediated by the 5' leader of a cellular mRNA. Nature 353: 90–94.
MacMorris, M., Kumar, M., Lasda, E., Larsen, A., Kraemer, B., and Blumenthal, T. 2007. A novel family of C. elegans snRNPs contains proteins associated with trans-splicing. RNA 13: 511–520.
McKay, S.J., Johnsen, R., Khattra, J., Asano, J., Baillie, D.L., Chan, S., Dube, N., Fang, L., Goszczynski, B., Ha, E., et al. 2003. Gene expression profiling of cells, tissues, and developmental stages of the nematode C. elegans. Cold Spring Harb. Symp. Quant. Biol. 68: 159–169.
Murphy, W.J., Watkins, K.P., and Agabian, N. 1986. Identification of a novel Y branch structure as an intermediate in trypanosome mRNA processing: Evidence for trans splicing. Cell 47: 517–525.
Nilsen, T.W. 2001. Evolutionary origin of SL-addition trans-splicing: Still an enigma. Trends Genet. 17: 678–680.
Ohno, S. 1970. Evolution by gene duplication. Springer-Verlag, New York.
Pflugrad, A., Meir, J.Y., Barnes, T.M., and Miller, D.M. 1997. The Groucho-like transcription factor UNC-37 functions with the neural specificity gene unc-4 to govern motor neuron identity in C. elegans. Development 124: 1699–1709.
Pleasance, E.D., Marra, M.A., and Jones, S.J. 2003. Assessment of SAGE in transcript identification. Genome Res. 13: 1203–1215.
Ross, L.H., Freedman, J.H., and Rubin, C.S. 1995. Structure and expression of novel spliced leader RNA genes in Caenorhabditis elegans. J. Biol. Chem. 270: 22066–22075.
Rubin, G.M., Yandell, M.D., Wortman, J.R., Gabor Miklos, G.L., Nelson, C.R., Hariharan, I.K., Fortini, M.E., Li, P.W., Apweiler, R., Fleischmann, W., et al. 2000. Comparative genomics of the eukaryotes. Science 287: 2204–2215.
Saha, S., Sparks, A.B., Rago, C., Akmaev, V., Wang, C.J., Vogelstein, B., Kinzler, K.W., and Velculescu, V.E. 2002. Using the transcriptome to annotate the genome. Nat. Biotechnol. 20: 508–512.
Saldi, T., Wilusz, C., Macmorris, M., and Blumenthal, T. 2007. Functional redundancy of worm spliceosomal proteins U1A and U2B. Proc. Natl. Acad. Sci. 104: 9753–9757.
Schuler, G.D. 1997. Sequence mapping by electronic PCR. Genome Res. 7: 541–550.
Semple, C. and Wolfe, K.H. 1999. Gene duplication and gene conversion in the Caenorhabditis elegans genome. J. Mol. Evol. 48: 555–564.
Singer, G.A., Lloyd, A.T., Huminiecki, L.B., and Wolfe, K.H. 2005. Clusters of co-expressed genes in mammalian genomes are conserved by natural selection. Mol. Biol. Evol. 22: 767–775.
Sloan, J., Kinghorn, J.R., and Unkles, S.E. 1999. The two subunits of human molybdopterin synthase: Evidence for a bicistronic messenger RNA with overlapping reading frames. Nucleic Acids Res. 27: 854–858.
Spieth, J., Brooke, G., Kuersten, S., Lea, K., and Blumenthal, T. 1993. Operons in C. elegans: Polycistronic mRNA precursors are processed by trans-splicing of SL2 to downstream coding regions. Cell 73: 521–532.
Stein, L., Sternberg, P., Durbin, R., Thierry-Mieg, J., and Spieth, J. 2001. WormBase: Network access to the genome and biology of Caenorhabditis elegans. Nucleic Acids Res. 29: 82–86.
Thimmapuram, J., Duan, H., Liu, L., and Schuler, M.A. 2005. Bicistronic and fused monocistronic transcripts are derived from adjacent loci in the Arabidopsis genome. RNA 11: 128–138.
Trinklein, N.D., Aldred, S.F., Hartman, S.J., Schroeder, D.I., Otillar, R.P., and Myers, R.M. 2004. An abundance of bidirectional promoters in the human genome. Genome Res. 14: 62–66.
Velculescu, V.E., Zhang, L., Vogelstein, B., and Kinzler, K.W. 1995. Serial analysis of gene expression. Science 270: 484–487.
Ye, X., Fong, P., Iizuka, N., Choate, D., and Cavener, D.R. 1997. Ultrabithorax and Antennapedia 5' untranslated regions promote developmentally regulated internal translation initiation. Mol. Cell. Biol. 17: 1714–1721.
Zhao, Z., Sheps, J.A., Ling, V., Fang, L.L., and Baillie, D.L. 2004. Expression analysis of ABC transporters reveals differential functions of tandemly duplicated genes in Caenorhabditis elegans. J. Mol. Biol. 344: 409–417.
Zorio, D.A., Cheng, N.N., Blumenthal, T., and Spieth, J. 1994. Operons as a common form of chromosomal organization in C. elegans. Nature 372: 270–272.
Received June 18, 2007; accepted in revised format June 29, 2007.