Published online before print
June 13, 2007, 10.1101/gr.6030107 Genome Res. 17:1005-1014, 2007 ©2007 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/07 $5.00
Letter
Distinct class of putative "non-conserved" promoters in humans: Comparative studies of alternative promoters of human and mouse genes
1 Human Genome Center, The Institute of Medical Science, The University of Tokyo, Minatoku, Tokyo 108-8639, Japan; 2 Department of Medical Genome Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8562, Japan; 3 Laboratory of Viral Infection II, Kitasato Institute for Life Sciences, Kitasato University, Tokyo 108-8641, Japan
Although recent studies have revealed that the majority of human genes are subject to regulation by alternative promoters, the biological relevance of this phenomenon remains unclear. We have also demonstrated that roughly half of the human RefSeq genes examined contain putative alternative promoters (PAPs). Here we report large-scale comparative studies of PAPs between human and mouse counterpart genes. Detailed sequence comparison of the 17,245 putative promoter regions (PPRs) in 5463 PAP-containing human genes revealed that PPRs in only a minor fraction of genes (807 genes) showed clear evolutionary conservation as one or more pairs. We also found substantial qualitative differences between conserved and non-conserved PPRs, with the latter class being AT-rich PPRs of relatively minor usage, enriched in repetitive elements and sometimes producing transcripts that encode small or no proteins. Systematic luciferase assays of these PPRs revealed that both classes of PPRs did have promoter activity, but that their strength ranges were significantly different. Furthermore, we demonstrate that these characteristic features of the non-conserved PPRs are shared with the PPRs of previously discovered putative non-protein-coding transcripts. Taken together, our data suggest that there are two distinct classes of promoters in humans, with the latter class of promoters emerging frequently during evolution.
With the completion of the human and mouse genome sequencing projects (Waterston et al. 2002
The functional diversification of a single gene enabled by the use of alternative splices (ASs) and APs is thought to be the molecular basis whereby the human genome is able to establish highly complex systems, such as the brain and immune systems (King and Wilson 1975
To address questions currently of interest in genome, evolutionary and pharmaceutical sciences, large-scale attempts to discover and characterize ASs/APs in human genes have been started. We have also been identifying and characterizing the transcriptional start sites (TSSs) and the adjacent putative promoter regions (PPRs) using the data of our 1.8 million human full-length cDNAs. These cDNAs were collected from cDNA libraries constructed by a cap-targeting method, oligo-capping (Suzuki and Sugano 2003
In spite of the potential importance of widespread PAPs in humans, it is still not clear why there are so many PAPs. In the present study, in order to understand what biological relevance those PAPs have and how they have been shaped during evolution, we carried out a large-scale comparative study of PAPs between human and mouse putative counterpart genes. For this purpose, we first prepared the TSS and PPR data for mice based on mouse full-length cDNA sequences collected from the mouse full-length cDNA project (Okazaki et al. 2002
Identification of widespread presence of PAPs in mice and sequence comparison between human and mouse PPRs
For the comparative study of PAPs (groups of PPRs), we collected the TSS information and retrieved the adjacent PPRs for mice using the same procedure as previously described for humans (Kimura et al. 2006
The retrieved mouse PPRs were subjected to comparative studies of PAPs in humans and mice. For our 7674 human PAP-containing genes, TSS information for mouse counterpart genes (according to Homologene; http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=homologene ) was found for 5463 genes. In total, the 5463 gene pairs included 17,245 and 8622 PPRs in humans and mice, respectively. For each of these PPRs, sequence alignments between humans and mice were generated (-500 bp to +0 bp of the TSSs were used; the position of the most frequently used TSS was defined as 0; this range was set to avoid sequence overlap between different PPRs within a particular PAP). All human and mouse PPR pairs belonging to the same mutually best-hit homologous gene were considered. For the sequence alignment, we used LALIGN (http://www.ch.embnet.org/software/LALIGN_form.html). We used this local alignment program because it is relatively robust to gaps and thus was expected to generate precise sequence alignments of promoters, although its application to genome-wide comparison is impossible because of its computational cost (also see the references Suzuki et al. 2004
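The window definition above can be illustrated with a minimal sketch. It assumes 0-based, half-open genomic coordinates and assumes that the TSS base itself (position 0) is included in the window; neither convention is stated in the text, and the function names are hypothetical.

```python
def upstream_window(tss, strand, upstream=500):
    """Return the (start, end) half-open interval from -upstream to +0.

    Assumptions (not from the paper): coordinates are 0-based, the interval
    is half-open, and the TSS base at position 0 is part of the window.
    """
    if strand == "+":
        # Upstream lies at smaller coordinates on the forward strand.
        return (max(0, tss - upstream), tss + 1)
    if strand == "-":
        # Upstream lies at larger coordinates on the reverse strand.
        return (tss, tss + upstream + 1)
    raise ValueError("strand must be '+' or '-'")


def intervals_overlap(a, b):
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


# Two representative TSSs more than 500 bp apart give non-overlapping windows.
first = upstream_window(1000, "+")
second = upstream_window(1501, "+")
print(first, second, intervals_overlap(first, second))
```

Under these assumptions, representative TSSs that lie more than 500 bp apart on the same strand always yield disjoint windows, which matches the stated reason for choosing this range.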
Lack of evolutionary conservation of a major part of the PPR members of a PAP
However, to our surprise, it was rather rare for the multiple PPR members within a single PAP to be conserved together. It was far more common that only one PPR in a PAP could be aligned, while the remaining PPRs could not be aligned at all (Fig. 1A). As shown in Figure 1C, the number of "conserved" PPRs did not increase in proportion to the number of PPR members in the PAP. Even within a PAP consisting of more than five individual PPRs, the average number of "conserved" PPRs remained close to one (see the solid bars in Fig. 1C). The increase was mostly accounted for by PPRs for which no clear conservation was observed ("marginal" or "non-conserved"; see below).
For those PPRs for which no significant alignments could be generated, we analyzed genome-genome BLASTZ alignments in the UCSC Genome Browser (http://genome.ucsc.edu/). Among 11,510 human PPRs (67% of the 17,245 PPRs) for which no mouse counterpart PPRs could be found, 4601 PPRs (27% of the 17,245 PPRs) were located within alignable regions, although no mouse TSSs were observed in their proximal regions. There were two possible explanations: (1) cDNA coverage was insufficient; (2) promoter activities were lost in mice even though a certain level of sequence similarity remained. To decide between these two possibilities, we compared the number of TSSs allocated to each of the PPRs. In 1014 cases, insufficient cDNA coverage was unlikely to account for the absence of TSSs. In these cases, the statistical estimation (with a cutoff of P < 0.05) based on the comparison of the number of TSSs between humans and mice indicated that at least one TSS would have been expected at the corresponding position in mice, too (intuitively, this is a case in which no TSS was observed in a particular mouse genomic region [non-PPR] although many human TSSs were identified in the corresponding human genomic region [PPR]; for examples and further details, see Supplemental Fig. 1B). It was instead likely that the corresponding genomic regions had acquired promoter activity only on the human side. Special cases of these observations, in which the distances between the TSS clusters (PPRs) are small, were also reported as "TSS turnover" in a recent study using CAGE tag analysis (Frith et al. 2006 On the other hand, although we scrutinized the genome-genome alignments as well as the PPR-PPR alignments, we could not find any significant alignments for the remaining 6909 PPRs (40% of the 17,245 PPRs). In these cases, the corresponding genomic sequences together with the corresponding TSSs were completely missing from the mouse side.
According to these observations, we classified the individual PPRs into three groups: "conserved (genome aligned with TSS support)," "marginal (genome aligned without TSS support)," and "non-conserved (genome not aligned)," respectively, as illustrated in Figure 1A. In the following study, we will focus the discussion on the comparison between "conserved" and "non-conserved" PPRs, but the "marginal" PPRs showed features generally similar to "non-conserved" PPRs in each analysis.
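The three-way classification can be written down as a small decision rule and checked against the counts quoted in the text. This is only an illustrative sketch: the two boolean flags and the function names are hypothetical, and the "conserved" count of 5735 is inferred here by subtracting the 11,510 PPRs without a mouse counterpart from the 17,245 total, rather than being stated directly.

```python
def classify_ppr(has_mouse_counterpart_ppr, in_genome_alignable_region):
    """Assign a human PPR to one of the three classes described above."""
    if has_mouse_counterpart_ppr:
        return "conserved"        # genome aligned with TSS support
    if in_genome_alignable_region:
        return "marginal"         # genome aligned without TSS support
    return "non-conserved"        # genome not aligned


TOTAL_PPRS = 17245
NO_MOUSE_COUNTERPART = 11510

# Counts taken from the text; "conserved" is derived by subtraction.
class_counts = {
    "conserved": TOTAL_PPRS - NO_MOUSE_COUNTERPART,
    "marginal": 4601,
    "non-conserved": 6909,
}

# Rounded percentages of all 17,245 human PPRs.
class_percent = {
    name: round(100 * count / TOTAL_PPRS) for name, count in class_counts.items()
}
print(class_counts)
print(class_percent)
```

The marginal and non-conserved counts add up to the 11,510 PPRs without a mouse counterpart, and the rounded shares reproduce the 27% and 40% given in the text.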
Characteristic features of "conserved" and "non-conserved" PPRs
Experimental characterization of the "non-conserved" PPRs and their resemblance to the PPRs of "ncRNAs"
Having determined the range of transcriptional activities of "non-conserved" PPRs, which frequently drive transcripts encoding no or very small proteins, we wished to analyze the PPRs of a class of so-called long non-protein-coding transcripts, which were discovered in recent full-length cDNA studies. (Note: We will simply call them "ncRNAs" hereafter; they are sometimes called transcripts of unknown function [TUFs]; see Mattick and Makunin 2006 χ² test), enriched in TATA-like elements (63%; "conserved"-"ncRNA": P < 2 × 10^-8; "non-conserved"-"ncRNA": P = 0.2; χ² test), and the major part lacks corresponding regions in the mouse genome according to the UCSC BLASTZ alignment (70% were located outside of BLASTZ-alignable regions in mice).
Possible origin of the "non-conserved" PAPs
We also found that repetitive sequence elements were enriched in the "non-conserved" PPRs (2004; 29%; Fig. 4B) compared to the "conserved" PPRs (728; 13%; P < 1 × 10^-100; χ² test). In particular, retroelement-type repetitive elements, such as L1 and Alu, accounted for most of this difference. There are a number of reported examples in which such classes of retroelements were integrated in the vicinity of future TSSs and acquired transcriptional regulatory activities via slight changes in their sequences (Norris et al. 1995
As for the "non-conserved" PPRs, reference genomic sequences have become available for several other mammals through recent genome sequencing projects (see the Web site of NHGRI, http://www.genome.gov/), so the genomic regions in chimpanzees, macaque monkeys, dogs, and cows were analyzed in a similar way as for the human and mouse comparison. First, the PPRs were tentatively defined as the 5'-end adjacent regions of annotated genes and available ESTs of full-length cDNAs and were aligned with human PPRs using LALIGN. For those for which no clear alignments were generated, the respective genomic sequences in chimpanzees, macaque monkeys, dogs, and cows were further searched according to the BLASTZ alignment in the UCSC Genome Browser. We found that the "non-conserved" PPRs were swiftly lost in proportion to the evolutionary distances, and no more than 30% of them were identified in dogs, cows, and rats (Fig. 4C). On the other hand, at least 60% of the (human-mouse) "conserved" PPRs were found in the genomes of other organisms. Even considering the incompleteness of the genome sequencing in some of these species, we concluded that a major part of the "non-conserved" PPRs appear to have emerged evolutionarily in a lineage- or species-specific manner and are likely to have evolved very rapidly.
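The repeat-enrichment comparison can be re-derived approximately from the counts quoted in the text. The sketch below computes a Pearson χ² statistic for a 2 × 2 table without continuity correction; the text does not say whether a correction was applied. The class totals (6909 "non-conserved" and 5735 "conserved" PPRs) are inferred from counts reported elsewhere in this article, so the result is a plausibility check rather than a reproduction of the authors' calculation.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 degree of freedom, no continuity correction).

    Table layout:            with repeat   without repeat
        group 1                   a              b
        group 2                   c              d
    Returns (statistic, p_value).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of chi-square with 1 d.f.: P(|Z| > sqrt(x)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Counts with repeats are from the text; group totals are inferred (assumption).
NON_CONSERVED_TOTAL, NON_CONSERVED_REPEAT = 6909, 2004
CONSERVED_TOTAL, CONSERVED_REPEAT = 5735, 728

stat, p_value = chi_square_2x2(
    NON_CONSERVED_REPEAT, NON_CONSERVED_TOTAL - NON_CONSERVED_REPEAT,
    CONSERVED_REPEAT, CONSERVED_TOTAL - CONSERVED_REPEAT,
)
print(f"chi2 = {stat:.1f}, P = {p_value:.2e}")
```

With these inferred totals, the proportions come out at about 29% and 13%, and the P-value falls below the 1 × 10^-100 bound reported above.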
Here we have described a large-scale comparative study of PAPs of human and mouse genes. Taking advantage of the collection of the 5'-end information of human and mouse full-length cDNAs, we used the well-defined 5'-end cDNA information for the identification and analyses of the PPR members. This allows a thorough comparison both of the transcriptional activity (PPRs) and sequence similarities and differences in both mouse and human. Both species have widespread PAPs, but the patterns of sequence conservation vary dramatically among the PAPs.
Interestingly, while we found that two or more PPRs were conserved in 807 genes, we unexpectedly observed that such conserved PAP relationships were only a minor fraction. In most cases, only one PPR was conserved within a given PAP, while the rest were non-conserved. It is unlikely that this general lack of conservation resulted from misidentification of the PPRs. It is true that, if PPRs were identified from dubious "full-length" cDNAs, the regions adjacent to their 5'-ends would merely be intronic sequences, and thus would be expected to be non-conserved. However, in the present study, we carefully removed potentially erroneous oligo-cap cDNAs from our data set (see Methods; also see Kimura et al. 2006
Our finding that a large population of the "non-conserved" PAPs was located well inside of the gene seems in line with the findings obtained from recent ChIP-on-chip analyses. Binding analyses of common transcription factors, including SP1, MYC, TP53, and CREB, revealed that there are comparable numbers of docking sites for them at the internal part of the genes as well as at the 5'-ends (Cawley et al. 2004
The presumed biological roles of "non-conserved" PPRs immediately raise the question of how these roles would be realized. One mechanism may be encoding alternative proteins with modified functions or proteins with identical functions expressed in different conditions. In the case of the human SHC1 gene, transcripts derived from the proximal alternative promoter encode a protein lacking the interaction domain for binding with some of the interacting partners, and thereby serve as modulators of signaling pathways (Luzi et al. 2000
Another mechanism by which the "non-conserved" PPRs realize their functions may be the production of ncRNAs (Figs. 2H). A recent stream of reports has shown that ncRNAs play important regulatory roles (Hornstein and Shomron 2006 However, it was surprising that, regardless of whether they encoded proteins or not, "non-conserved" PPRs, which seemed ultimately to be associated with regulatory roles, were commonly found to be on a different evolutionary track from canonical PPRs. Further detailed studies on the nucleotide changes and the consequent functional changes in promoter activities will be necessary to reveal at which evolutionary stage the "non-conserved" PPRs emerged and which of them are subject to positive, purifying, or neutral selection. Nevertheless, it was significant to observe that the most frequent and dynamic aspects of functional diversification of genes in higher mammals appear generally to be orchestrated via a core "conserved" promoter playing the main tune in an ensemble of accessory "non-conserved" promoters.
The finding of the widespread presence of "non-conserved" PAPs is somewhat reminiscent of the case of ASs: The major population of ASs identified in both humans and mice was also shown to be evolutionarily non-conserved and of minor usage. The recently born ASs are regarded as primitive forms, presumably serving as an evolutionary reservoir for new transcript variants. Likewise, ab initio emergence of promoters may take place relatively frequently among the wide variety of genomic sequences. Generally, the so-called consensus sequences for many of the transcription factor binding sites, as well as those for splice junctions and other splicing enhancers, are short and frequently found throughout any genome. Basic sequence materials that can potentially constitute a promoter or a splice junction have been constantly forming during evolution and are abundantly found throughout long mammalian genomes (Rockman and Wray 2002
Very recently, a paper appeared from another group, also analyzing the properties of PAPs using our data set (Baek et al. 2007
Mapping and clustering of the 5'-end data
The mouse PPR data set was generated in the same way as the human PPR data set. The 5'-end information for 580,204 mouse full-length cDNAs, which were obtained from 119 kinds of cap-trapper full-length cDNA libraries (see Supplemental Table 1), was collected and mapped onto the mouse genomic sequence (mm5; as of UCSC Genome Browser). TSSs were clustered so that the distances of the TSSs from each other were >500 bp. Details of the procedure were described previously (Kimura et al. 2006
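One possible reading of the clustering step is a single-linkage grouping along the genome in which a new cluster starts whenever the gap to the previous TSS exceeds 500 bp, so that the resulting clusters are more than 500 bp apart. The sketch below implements that reading only; it is not the published procedure (see Kimura et al. 2006 for the details), it ignores chromosome and strand, and the function names and the tie-breaking rule are assumptions.

```python
from collections import Counter


def cluster_tss(positions, max_gap=500):
    """Group TSS positions so that neighbouring clusters are > max_gap apart."""
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)   # close enough: extend the current cluster
        else:
            clusters.append([pos])     # gap exceeds max_gap: start a new cluster
    return clusters


def representative_tss(cluster):
    """Most frequently used TSS in a cluster (ties: smallest coordinate)."""
    counts = Counter(cluster)
    return min(counts, key=lambda p: (-counts[p], p))


# Hypothetical 5'-end positions of mapped cDNAs on one strand of one chromosome.
example_positions = [120, 100, 120, 900, 1500, 1510]
example_clusters = cluster_tss(example_positions)
print(example_clusters)
print([representative_tss(c) for c in example_clusters])
```

The representative position corresponds to what the Results section defines as position 0 of a PPR, namely the most frequently used TSS.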
Sequence alignment
Procedures used in computational characterizations
In order to evaluate the relative usage of the "non-conserved" against the "conserved" PPRs, the numbers of cDNAs belonging to every PPR were counted separately (as representing the individual expression level of the PPR), and the proportion of them relative to the total number of cDNAs belonging to the corresponding locus was calculated (as representing the total expression level). The relative usages thus calculated were compared between the "conserved" and "non-conserved" PPRs. In order to identify putative protein-coding regions in the transcripts whose TSSs were defined by each of the PPRs, the 5'-end sequences were connected to RefSeq sequences from the position where they overlapped. The possible protein-coding regions were determined from the resultant virtual hybrid transcripts, and putative amino acid lengths were calculated. According to the obtained information, the ratios of transcripts from commonly used regions relative to transcripts covering the entire amino acid sequences of RefSeq were also calculated. The indicated categories of possible evolutionary origins of the PPRs were defined as follows: (1) ab initio generation: cases that could not be assigned to categories 2-4; (2) local duplication: cases in which a BLASTN search detected a homologous sequence within the local region (from the terminal exon of the upstream adjacent gene to the 3'-end of the last exon of the gene); (3) repeat insertion: cases in which a repetitive element (as defined by UCSC Genome Browser) was found in the PPR; (4) other genomic rearrangement: cases in which a BLASTN search detected a homologous sequence outside of the local region defined in 2. For analyzing the conservation of the "non-conserved" PPRs in other organisms, the surrounding sequences of the 5'-ends of the cDNAs were retrieved from the chimpanzee, macaque, cow, dog, and rat genomic sequences as of UCSC Genome Browser.
The surrounding sequences were defined as those of annotated genes and ESTs, which should be the closest equivalents of the human and mouse data, although the accumulation of data for them was relatively poor. The obtained sequences were analyzed in the same way as for humans and mice, using LALIGN and considering the genome alignments registered in UCSC.
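The relative-usage calculation described above amounts to dividing the cDNA count of each PPR by the total cDNA count of its locus. A minimal sketch follows; the PPR identifiers and counts are invented for illustration and are not data from this study.

```python
def relative_usage(cdna_counts):
    """Fraction of a locus's cDNAs that start from each PPR.

    cdna_counts maps a PPR identifier to the number of cDNAs assigned to it;
    all PPRs in the mapping are assumed to belong to the same locus.
    """
    total = sum(cdna_counts.values())
    if total == 0:
        raise ValueError("locus has no supporting cDNAs")
    return {ppr: count / total for ppr, count in cdna_counts.items()}


# Hypothetical locus with one dominant PPR and two minor PPRs.
example_locus = {"PPR_a": 90, "PPR_b": 8, "PPR_c": 2}
example_usage = relative_usage(example_locus)
print(example_usage)
```

In this toy locus, PPR_a would count as the major promoter, while PPR_b and PPR_c show the kind of minor usage that the text attributes to "non-conserved" PPRs.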
For evaluating statistical significance,
Luciferase assays
For amplification of the random genomic DNAs, primers containing only the cloning sites were used with a low annealing temperature. The products were size-fractionated by agarose gel electrophoresis. The recovered fragments were sequenced and the redundancy was removed. In total, 250 genomic DNA fragments were selected from non-promoter regions. The mapped positions of the fragments are shown in the Supplemental Material. The amplified genomic DNAs were cloned into the luciferase vector using the Gateway System (Invitrogen). The plasmid DNAs were purified using Qiaprep Ultra (Qiagen) and transfected into HEK293 cells using Fugene6 (Roche) according to the manufacturers' instructions. The luciferase assays were performed 48 h after the transfections using a dual luciferase kit (Promega). Every assay was performed in triplicate.
We thank K. Abe and K. Imamura for technical support. We also thank E. Nakajima for careful reading of the manuscript. This work was supported by grants from the New Energy and Industrial Technology Development Organization (NEDO) project of the Ministry of Economy, Trade, and Industry (METI) of Japan; the Japan Key Technology Center project of METI of JAPAN; and a Grant-in-Aid for Scientific Research on Priority Areas from the Ministry of Education, Science, Sports, and Culture of Japan.
4 Corresponding author.
E-mail ysuzuki{at}hgc.jp; fax +81-4-7136-3607. [Supplemental material is available online at www.genome.org. The sequence data from this study have been submitted to GenBank under accession nos. BP870448–BP873619 and BP244227–BP249739.] Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.6030107.
Baek, D., Davis, C., Ewing, B., Gordon, D., and Green, P. 2007. Characterization and predictive discovery of evolutionarily conserved mammalian alternative promoters. Genome Res. 17: 145–155.
C. elegans Sequencing Consortium. 1998. Genome sequence of the nematode C. elegans: A platform for investigating biology. Science 282: 2012–2018.
Carninci, P., Kasukawa, T., Katayama, S., Gough, J., Frith, M.C., Maeda, N., Oyama, R., Ravasi, T., Lenhard, B., Wells, C., et al. 2005. The transcriptional landscape of the mammalian genome. Science 309: 1559–1563.
Carninci, P., Sandelin, A., Lenhard, B., Katayama, S., Shimokawa, K., Ponjavic, J., Semple, C.A., Taylor, M.S., Engstrom, P.G., Frith, M.C., et al. 2006. Genome-wide analysis of mammalian promoter architecture and evolution. Nat. Genet. 38: 626–635.
Cawley, S., Bekiranov, S., Ng, H.H., Kapranov, P., Sekinger, E.A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J., Williams, A.J., et al. 2004. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116: 499–509.
Cheng, J., Kapranov, P., Drenkow, J., Dike, S., Brubaker, S., Patel, S., Long, J., Stern, D., Tammana, H., Helt, G., et al. 2005. Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science 308: 1149–1154.
Dermitzakis, E.T. and Clark, A.G. 2002. Evolution of transcription factor binding sites in Mammalian gene regulatory regions: Conservation and turnover. Mol. Biol. Evol. 19: 1114–1121.
Frith, M.C., Ponjavic, J., Fredman, D., Kai, C., Kawai, J., Carninci, P., Hayashizaki, Y., and Sandelin, A. 2006. Evolutionary turnover of mammalian transcription start sites. Genome Res. 16: 713–722.
Gardiner-Garden, M. and Frommer, M. 1987. CpG islands in vertebrate genomes. J. Mol. Biol. 196: 261–282.
Goffeau, A., Barrell, B.G., Bussey, H., Davis, R.W., Dujon, B., Feldmann, H., Galibert, F., Hoheisel, J.D., Jacq, C., Johnston, M., et al. 1996. Life with 6000 genes. Science 274: 546–567.
Grandien, K., Berkenstam, A., and Gustafsson, J.A. 1997. The estrogen receptor gene: Promoter organization and expression. Int. J. Biochem. Cell Biol. 29: 1343–1369.
Hamdi, H.K., Nishio, H., Tavis, J., Zielinski, R., and Dugaiczyk, A. 2000. Alu-mediated phylogenetic novelties in gene regulation and development. J. Mol. Biol. 299: 931–939.
Hashimoto, S., Suzuki, Y., Kasai, Y., Morohoshi, K., Yamada, T., Sese, J., Morishita, S., Sugano, S., and Matsushima, K. 2004. 5'-End SAGE for the analysis of transcriptional start sites. Nat. Biotechnol. 22: 1146–1149.
Hornstein, E. and Shomron, N. 2006. Canalization of development by microRNAs. Nat. Genet. 38: S20–S24.
Imanishi, T., Itoh, T., Suzuki, Y., O'Donovan, C., Fukuchi, S., Koyanagi, K.O., Barrero, R.A., Tamura, T., Yamaguchi-Kabata, Y., Tanino, M., et al. 2004. Integrative annotation of 21,037 human genes validated by full-length cDNA clones. PLoS Biol. 2: e162.
Impey, S., McCorkle, S.R., Cha-Molstad, H., Dwyer, J.M., Yochum, G.S., Boss, J.M., McWeeney, S., Dunn, J.J., Mandel, G., and Goodman, R.H. 2004. Defining the CREB regulon: A genome-wide analysis of transcription factor regulatory regions. Cell 119: 1041–1054.
International Human Genome Sequencing Consortium. 2004. Finishing the euchromatic sequence of the human genome. Nature 431: 931–945.
Kapranov, P., Cawley, S.E., Drenkow, J., Bekiranov, S., Strausberg, R.L., Fodor, S.P., and Gingeras, T.R. 2002. Large-scale transcriptional activity in chromosomes 21 and 22. Science 296: 916–919.
Kapranov, P., Drenkow, J., Cheng, J., Long, J., Helt, G., Dike, S., and Gingeras, T.R. 2005. Examples of the complex architecture of the human transcriptome revealed by RACE and high-density tiling arrays. Genome Res. 15: 987–997.
Kim, T.H., Barrera, L.O., Zheng, M., Qu, C., Singer, M.A., Richmond, T.A., Wu, Y., Green, R.D., and Ren, B. 2005. A high-resolution map of active promoters in the human genome. Nature 436: 876–880.
Kimura, K., Wakamatsu, A., Suzuki, Y., Ota, T., Nishikawa, T., Yamashita, R., Yamamoto, J., Sekine, M., Tsuritani, K., Wakaguri, H., et al. 2006. Diversification of transcriptional modulation: Large-scale identification and characterization of putative alternative promoters of human genes. Genome Res. 16: 55–65.
King, M.C. and Wilson, A.C. 1975. Evolution at two levels in humans and chimpanzees. Science 188: 107–116.
Landry, J.R., Mager, D.L., and Wilhelm, B.T. 2003. Complex controls: The role of alternative promoters in mammalian genomes. Trends Genet. 19: 640–648.
Lewis, B.P., Burge, C.B., and Bartel, D.P. 2005. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120: 15–20.
Luzi, L., Confalonieri, S., Di Fiore, P.P., and Pelicci, P.G. 2000. Evolution of Shc functions from nematode to human. Curr. Opin. Genet. Dev. 10: 668–674.
Mattick, J.S. and Makunin, I.V. 2006. Non-coding RNA. Hum. Mol. Genet. 15: R17–R29.
Matys, V., Kel-Margoulis, O.V., Fricke, E., Liebich, I., Land, S., Barre-Dirrie, A., Reuter, I., Chekmenev, D., Krull, M., Hornischer, K., et al. 2006. TRANSFAC and its module TRANSCompel: Transcriptional gene regulation in eukaryotes. Nucleic Acids Res. 34: D108–D110.
Modrek, B. and Lee, C. 2002. A genomic view of alternative splicing. Nat. Genet. 30: 13–19.
Norris, J., Fan, D., Aleman, C., Marks, J.R., Futreal, P.A., Wiseman, R.W., Iglehart, J.D., Deininger, P.L., and McDonnell, D.P. 1995. Identification of a new subclass of Alu DNA repeats which can function as estrogen receptor-dependent transcriptional enhancers. J. Biol. Chem. 270: 22777–22782.
Oh, S.Y., Lee, M.Y., Kim, J.M., Yoon, S., Shin, S., Park, Y.N., Ahn, Y.H., and Kim, K.S. 2005. Alternative usages of multiple promoters of the acetyl-CoA carboxylase beta gene are related to differential transcriptional regulation in human and rodent tissues. J. Biol. Chem. 280: 5909–5916.
Okazaki, Y., Furuno, M., Kasukawa, T., Adachi, J., Bono, H., Kondo, S., Nikaido, I., Osato, N., Saito, R., Suzuki, H., et al. 2002. Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature 420: 563–573.
Ota, T., Suzuki, Y., Nishikawa, T., Otsuki, T., Sugiyama, T., Irie, R., Wakamatsu, A., Hayashi, K., Sato, H., Nagai, K., et al. 2004. Complete sequencing and characterization of 21,243 full-length human cDNAs. Nat. Genet. 36: 40–45.
Owens, I.S., Basu, N.K., and Banerjee, R. 2005. UDP-glucuronosyltransferases: Gene structures of UGT1 and UGT2 families. Methods Enzymol. 400: 1–22.
Pan, Q., Bakowski, M.A., Morris, Q., Zhang, W., Frey, B.J., Hughes, T.R., and Blencowe, B.J. 2005. Alternative splicing of conserved exons is frequently species-specific in human and mouse. Trends Genet. 21: 73–77.
Rockman, M.V. and Wray, G.A. 2002. Abundant raw material for cis-regulatory evolution in humans. Mol. Biol. Evol. 19: 1991–2004.
Strausberg, R.L., Feingold, E.A., Grouse, L.H., Derge, J.G., Klausner, R.D., Collins, F.S., Wagner, L., Shenmen, C.M., Schuler, G.D., Altschul, S.F., et al. 2002. Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences. Proc. Natl. Acad. Sci. 99: 16899–16903.
Su, D. and Gladyshev, V.N. 2004. Alternative splicing involving the thioredoxin reductase module in mammals: A glutaredoxin-containing thioredoxin reductase 1. Biochemistry 43: 12177–12188.
Suzuki, Y. and Sugano, S. 2003. Construction of a full-length enriched and a 5'-end enriched cDNA library using the oligo-capping method. Methods Mol. Biol. 221: 73–91.
Suzuki, Y., Yamashita, R., Shirota, M., Sakakibara, Y., Chiba, J., Mizushima-Sugano, J., Nakai, K., and Sugano, S. 2004. Sequence comparison of human and mouse genes reveals a homologous block structure in the promoter regions. Genome Res. 14: 1711–1718.
Tautz, D. 2000. Evolution of transcriptional regulation. Curr. Opin. Genet. Dev. 10: 575–579.
Tominaga, K., Kirtane, B., Jackson, J.G., Ikeno, Y., Ikeda, T., Hawks, C., Smith, J.R., Matzuk, M.M., and Pereira-Smith, O.M. 2005. MRG15 regulates embryonic development and cell proliferation. Mol. Cell. Biol. 25: 2924–2937.
Tsunoda, T. and Takagi, T. 1999. Estimating transcription factor bindability on DNA. Bioinformatics 15: 622–630.
Vansant, G. and Reynolds, W.F. 1995. The consensus sequence of a major Alu subfamily contains a functional retinoic acid response element. Proc. Natl. Acad. Sci. 92: 8229–8233.
Wang, H. and Negishi, M. 2003. Transcriptional regulation of cytochrome p450 2B genes by nuclear receptors. Curr. Drug Metab. 4: 515–525.
Waterston, R.H., Lindblad-Toh, K., Birney, E., Rogers, J., Abril, J.F., Agarwal, P., Agarwala, R., Ainscough, R., Alexandersson, M., An, P., et al. 2002. Initial sequencing and comparative analysis of the mouse genome. Nature 420: 520–562.
Willingham, A.T. and Gingeras, T.R. 2006. TUF love for "junk" DNA. Cell 125: 1215–1220.
Wu, Q. 2005. Comparative genomics and diversifying selection of the clustered vertebrate protocadherin genes. Genetics 169: 2179–2188.
Yamashita, R., Suzuki, Y., Wakaguri, H., Tsuritani, K., Nakai, K., and Sugano, S. 2006. DBTSS: DataBase of Human Transcription Start Sites, progress report 2006. Nucleic Acids Res. 34: D86–D89.
Zhang, Q.H., Ye, M., Wu, X.Y., Ren, S.X., Zhao, M., Zhao, C.J., Fu, G., Shen, Y., Fan, H.Y., Lu, G., et al. 2000. Cloning and functional analysis of cDNAs with open reading frames for 300 previously undefined genes expressed in CD34+ hematopoietic stem/progenitor cells. Genome Res. 10: 1546–1560.
Received October 11, 2006; accepted in revised form February 12, 2007.