Genome Res. 14:331-342, 2004 ©2004 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/04 $5.00
Novel RNAs Identified From an In-Depth Analysis of the Transcriptome of Human Chromosomes 21 and 22
Affymetrix, Santa Clara, California 95051, USA
In this report, we have achieved a richer view of the transcriptome for Chromosomes 21 and 22 by using high-density oligonucleotide arrays on cytosolic poly(A)+ RNA. Conservatively, only 31.4% of the observed transcribed nucleotides correspond to well-annotated genes, whereas an additional 4.8% and 14.7% correspond to mRNAs and ESTs, respectively. Approximately 85% of the known exons were detected, and up to 21% of known genes have only a single isoform based on exon-skipping alternative expression. Overall, the expression of the well-characterized exons falls predominately into two categories, uniquely or ubiquitously expressed with an identifiable proportion of antisense transcripts. The remaining observed transcription (49.0%) was outside of any known annotation. These novel transcripts appear to be more cell-line-specific and have lower and less variation in expression than the well-characterized genes. Novel transcripts were further characterized based on their distance to annotations, transcript size, coding capacity, and identification as antisense to intronic sequences. By RT-PCR, 126 novel transcripts were independently verified, resulting in a 65% verification rate. These observations strongly support the argument for a re-evaluation of the total number of human genes and an alternative term for "gene" to encompass these growing, novel classes of RNA transcripts in the human genome.
The working draft of the human genome has led to several estimates of the total number of encoded genes ranging from 30,000 to 120,000 (Ewing and Green 2000
Following this study, other investigators have estimated the amounts of transcription emanating from human and mouse genomes using a variety of experimental approaches, such as serial analysis of gene expression (SAGE; Chen et al. 2002
In this study, we describe an in-depth analysis of the poly(A)+ cytosolic RNA transcription data reported earlier (Kapranov et al. 2002
Generation of Transcription Maps Based on Collective Behavior of Neighboring Probes
In an earlier analysis of poly(A)+ cytosolic RNA transcripts from Chromosomes 21 and 22 (Kapranov et al. 2002 k is the set of all probe pairs whose genomic coordinates lie within [Pk − BW, Pk + BW], where BW is the bandwidth and the resulting window size is given by (2 × BW) + 1. The determination of a suitable window size is constrained by two facts: the spacing of the probes along the chromosomes (approximately every 35 bp) and the median size of an exon on these chromosomes (137 bp). We used a bandwidth of 50 bp, which approximately corresponds to three probe pairs. Multiple bandwidths (comprising a specific number of bases) were initially tested for their ability to achieve the greatest detection sensitivity and specificity based on spiked-in quantitative bacterial RNA transcripts. This approach resulted in greater sensitivity for any given false-positive rate than did calls based on a single probe pair (data not shown). One caveat of such a window-based approach is that although the false-positive call rate was reduced in identifying the sites of transcription, the performance of the probe pairs was smoothed, making strict determination of the transcription boundaries possibly problematic.
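The window-based scoring described above can be illustrated with a minimal sketch. It assumes that the per-probe-pair signal is a single number (for example, a perfect-match minus mismatch difference) and that the window is summarized by a pseudo-median, taken here to be the Hodges-Lehmann estimator; both choices are assumptions for illustration, suggested by the pseudo-median values mentioned later in the text. The function names are hypothetical.

```python
from itertools import combinations_with_replacement
from statistics import median


def pseudo_median(values):
    """Hodges-Lehmann estimator: median of all pairwise (Walsh) averages."""
    walsh = [(a + b) / 2 for a, b in combinations_with_replacement(values, 2)]
    return median(walsh)


def window_scores(positions, signals, bandwidth=50):
    """Score each probe pair from all probe pairs within +/- bandwidth bp.

    positions: genomic coordinate of each probe pair (assumed input layout).
    signals:   one signal value per probe pair (assumed, e.g. PM - MM).
    The window spans [p - bandwidth, p + bandwidth], i.e. 2 * bandwidth + 1 bp.
    """
    scores = []
    for p in positions:
        in_window = [
            s for q, s in zip(positions, signals)
            if p - bandwidth <= q <= p + bandwidth
        ]
        scores.append(pseudo_median(in_window))
    return scores


# Probes spaced every 35 bp, so a 50-bp bandwidth covers up to three probe pairs.
positions = [0, 35, 70, 105, 140]
signals = [10, 20, 300, 40, 50]
print(window_scores(positions, signals))  # [15.0, 87.5, 100.0, 110.0, 45.0]
```

The single high value at position 70 is spread over its neighbors, which illustrates the smoothing of transcription boundaries noted above.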
Using this analysis approach, the average of the overall total positive probes detected from the poly(A)+ cytosolic RNA fraction in each cell line tested was 10.3% of the mapped probe pairs on the arrays (Supplemental Table S1 available online at www.genome.org and http://transcriptome.affymetrix.com/download/genome_res_data). This number is increased significantly to 25.8% if one considers all positive probes in the cumulative map of all 11 cell lines (i.e., the "1 of 11" map). These are consistent with our previous analysis (Kapranov et al. 2002 Following the determination of which of the probe pairs were likely to be detecting transcription, the behavior of neighboring probe pairs was evaluated to assemble contiguous fragments of transcribed units (i.e., transfrags). Transfrag maps were constructed by implementing a fixed intensity threshold (threshold = 150), a maximum gap between positive probe pairs (maxgap = 40), and a minimum length of adjacent positive probe pairs (minrun = 90; Fig. 1A). Transfrag maps contain all transfrags that meet these criteria and are thus a chromosome-wide summary of transcriptional fragments. A threshold of 150 gives a median false-positive rate of 2.9% from the negative bacterial controls on all arrays (Supplemental Table S2). These parameters were chosen such that a very conservative transfrag map, low in false-positive calls, would result. Transfrags in such maps would contain no negative probe pairs. This analysis allows for the formation of continuous blocks of transcription as well as further minimizing probe-specific effects. Transfrag maps were generated for each cell line individually as well as a "1 of 11" map, which contains transfrags detected in at least one of the 11 cell lines. The average total number of transfrags found in each cell line was 2965, and the average length of a transfrag was 153 bp (Supplemental Table S3). 
The number of transfrags increases significantly to 9001 in the "1 of 11" map, yet the average length of a transfrag remains approximately the same, 154 bp, suggesting a considerable tendency toward cell-line-specific transfrags. Other evidence discussed below supports this conclusion.
It is important to note that altering any of these analysis strategies or their corresponding parameters (e.g., window size, minrun, maxgap) results in different but overlapping maps. By implementing this transfrag approach, a more stringent set of maps can be generated because each probe pair must meet the collective minrun, maxgap, and threshold criteria. The choice of values for the threshold, minrun, and maxgap criteria ultimately involves a tradeoff between the false-positive rate and sensitivity.
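The transfrag construction can be sketched as follows, using the stated parameters (threshold = 150, maxgap = 40, minrun = 90). Several details are assumptions made only for illustration: a probe pair is called positive when its score is at least the threshold, the gap is measured between probe start coordinates, and the run length includes a probe length of 25 bp, a value not given in this section. The function name is hypothetical.

```python
def build_transfrags(positions, scores, threshold=150, maxgap=40,
                     minrun=90, probe_len=25):
    """Join neighboring positive probe pairs into transfrags.

    Returns half-open (start, end) intervals. probe_len=25 is an assumed
    probe length used only to convert probe starts into a covered span.
    """
    # Keep only probe pairs called positive (assumed rule: score >= threshold).
    positive = [p for p, s in zip(positions, scores) if s >= threshold]

    # Chain positives whose start coordinates are at most maxgap apart.
    runs = []
    for pos in positive:
        if runs and pos - runs[-1][-1] <= maxgap:
            runs[-1].append(pos)
        else:
            runs.append([pos])

    # Discard runs whose covered span is shorter than minrun.
    return [
        (run[0], run[-1] + probe_len)
        for run in runs
        if run[-1] - run[0] + probe_len >= minrun
    ]


positions = [0, 35, 70, 105, 140, 175, 210, 245]
scores = [200, 180, 160, 50, 300, 400, 20, 500]
print(build_transfrags(positions, scores))             # [(0, 95)]
print(build_transfrags(positions, scores, minrun=60))  # [(0, 95), (140, 200)]
```

With probes spaced 35 bp apart and maxgap = 40, a single negative probe pair breaks a run, which is consistent with the statement that transfrags in these maps contain no negative probe pairs. Relaxing minrun in the second call admits a shorter fragment, showing the tradeoff between sensitivity and false-positive calls.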
Association of Detected Transfrags to Chromosome-Wide Annotations
In many instances, transfrags that overlapped annotations extended well beyond the annotation bounds. Such extensions could be the result of overlapping transcription from an adjacent gene, extensions of exons or UTRs, or antisense transcription at such loci. To differentiate transfrags that are synonymous with annotations versus transfrags that are extensions of known annotations, transfrags that overlapped and extended any annotation were fragmented to derive a unique set of novel transfrags (Fig. 1B). Following this classification, a "1 of 11 known" map and a "1 of 11 novel" map were compiled to represent the nonoverlapping union of all known or novel transfrags from all cell lines using a combine refinement approach (Fig. 1C).
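The fragmentation step can be pictured as interval subtraction. The sketch below splits one transfrag into the pieces that fall inside annotations ("known") and the pieces that extend beyond them ("novel"). It assumes half-open (start, end) coordinates and an annotation list that is sorted and already merged so that no two annotations overlap; the function name is hypothetical, and the 50-bp masked region proximal to exons mentioned later in the text is not modeled.

```python
def split_by_annotation(transfrag, annotations):
    """Split a transfrag into known and novel pieces.

    transfrag:   (start, end), half-open.
    annotations: sorted, non-overlapping (start, end) intervals (assumed).
    Returns (known_pieces, novel_pieces).
    """
    start, end = transfrag
    known, novel = [], []
    cursor = start  # first base not yet assigned to a piece
    for a_start, a_end in annotations:
        if a_end <= start or a_start >= end:
            continue  # annotation does not touch this transfrag
        lo, hi = max(a_start, start), min(a_end, end)
        if lo > cursor:
            novel.append((cursor, lo))  # unannotated stretch before this annotation
        known.append((lo, hi))
        cursor = hi
    if cursor < end:
        novel.append((cursor, end))  # unannotated tail
    return known, novel


known, novel = split_by_annotation((100, 400), [(50, 150), (200, 250), (380, 500)])
print(known)  # [(100, 150), (200, 250), (380, 400)]
print(novel)  # [(150, 200), (250, 380)]
```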
Approximately half (49.0%) of the base pairs within transfrags mapped along Chromosomes 21 and 22 do not overlap with any well-characterized exon, mRNA, or EST, in at least "1 of 11" cell lines (Fig. 2A). Only 31.4% of the observed transcription aligned within known exons. An additional 4.8% and 14.7% of the transcription were seen within mRNAs and ESTs, respectively. For each cell line, the proportion of detected transcription within the well-characterized exons ranges from 41.0% to 50.1%, 5.1% to 7.9% in mRNAs, 10.1% to 16.7% in ESTs, and 26.5% to 42.3% of the observed transcription is novel (Fig. 2B). Interestingly, if one considers only the probe pairs without the generation of transfrags,
Relating Transfrags to Genes
Using these alternative analysis approaches, transcripts mapping to well-characterized exons of known genes were also readily detectable. On Chromosomes 21 and 22, there are 990 well-characterized genes that are composed of 6463 exons (Kent et al. 2002 7.5. In the "1 of 11" map, 84.5% (5068/5995) of the known exons were detected and 70.5% (27,088/38,407) of the probe pairs within these exons are positive (Supplemental Fig. S1). The total percentage of known exons detected in each cell line ranged from 40.8% to 63.8%, with 27.6% to 47.1% of the probe pairs within these exons positive (Supplemental Fig. S1).
The binning of positive probes in exons resulted in a bimodal distribution indicating that the majority of exons fell into two cases: exons that have no or few positive probes (<10%) and exons that have all or mostly all (>90%) positive probes (Fig. 3). A similar trend resulted when using all exons or only exons that contain
By generating an "on/off" profile for each exon on the array (5995) in all genes (990) in all cell lines, we estimated the degree of differential expression in terms of exon skipping. The collection of genes selected for this analysis came from the UCSC collection of known genes (Kent et al. 2002 12% to 21% (105/852 of genes with all exons, 146/684 of genes with exons having 4 probes) of the genes displaying a single profile or have only one isoform in terms of exon skipping. This estimation of alternative splicing of known genes is likely to be an underestimate because it only accounts for exon-skipping events as opposed to alternative splicing of exons resulting in truncation or extensions of exons. Furthermore, the plot contains only genes with more than one exon. An important caveat is that exons with only a single or few interrogating probe pairs might be misleading because a single probe pair determines the overall expression of the exon and thus the on/off call. This might lead to a higher number of profiles that are potentially an overestimate of exon-skipping isoforms (Fig. 4, blue bars). Needless to say, this analysis allows for the determination of the usage of each exon for each gene. Thus, all alternatively spliced forms for each gene on Chromosomes 21 and 22 in each cell line tested, denoted by exon-swapping as the differential splicing motif, have been constructed (Supplemental Table S4).
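The exon on/off profiling can be sketched as below. The rule used to call an exon "on" is not recoverable from this section, so the sketch assumes a hypothetical cutoff (at least half of the interrogating probe pairs positive) and ignores cell lines in which every exon of the gene is off; both are labeled assumptions, as are the function names and the placeholder cell-line labels other than A375 and Jurkat.

```python
def exon_call(probe_flags, min_fraction=0.5):
    """Call an exon 'on' when enough of its probe pairs are positive.

    min_fraction=0.5 is an assumed cutoff, not a value from the paper.
    """
    return sum(probe_flags) / len(probe_flags) >= min_fraction


def distinct_profiles(gene_calls, min_fraction=0.5):
    """Collect the distinct exon on/off profiles of one gene across cell lines.

    gene_calls: {cell_line: [[probe flags for exon 1], [exon 2], ...]}.
    Cell lines with every exon off are skipped (assumed: gene not expressed).
    """
    profiles = set()
    for exon_flags in gene_calls.values():
        profile = tuple(exon_call(flags, min_fraction) for flags in exon_flags)
        if any(profile):
            profiles.add(profile)
    return profiles


gene_calls = {
    "A375":   [[True, True], [True, False], [False, False, False]],
    "Jurkat": [[True, True], [True, True], [True, True, True]],
    "line_3": [[False, False], [False, False], [False, False, False]],
    "line_4": [[True, False], [True, True], [False, False, True]],
}
profiles = distinct_profiles(gene_calls)
print(len(profiles))  # 2: the third exon is skipped in two lines, included in one
```

A gene with exactly one distinct profile would be counted as having a single isoform with respect to exon skipping. An exon covered by a single probe pair has its call decided by that one probe pair, which is the caveat raised above.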
Differential Expression of Novel Transfrags Between Cell Lines
The proportion of transcription increases in the "1 of 11" map compared with individual cell lines, implying that many of the known and novel transfrags are cell-line-specific (Figs. 2 and 3). This observation can be seen as consistent with the fact that the cell types used in this study are of different developmental origins and therefore have unique expression profiles. The percentage of total nucleotides within known or novel transfrags was plotted against the number of cell lines expressing that transfrag (Fig. 5A). On average, a transfrag corresponding to a well-characterized exon was observed in approximately five cell lines, whereas any novel transfrag was found in approximately three cell lines on average. Upon closer inspection, two different and distinctly segregated populations of transfrags overlap well-characterized exons. The larger population of well-characterized transcription (30.8%) was found within transfrags that were expressed in one or two cell lines. The second and noticeably significant population of known transcription was ubiquitously expressed (11.5%). Conversely, 48.5% of the observed novel transcription was limited to a single cell line (Fig. 5A).
Analysis of the degree of differential expression across the 11 cell lines by an ANOVA test evaluated the variance of expression within each transfrag and between cell lines. The ANOVA F-statistic for each transfrag was defined by the variance of average pseudo-median values in each transfrag between cell lines divided by the average of the within cell line variation of pseudo-median values in that transfrag. Figure 5B illustrates the distribution of ANOVA F-statistics for the total, known, and novel transfrags. By comparing the population of ANOVA F-statistics between the known and novel transfrags, the known transfrags displayed a significantly greater differential expression across the 11 cell lines than the novel transfrags. The smaller variation in the novel transfrags was likely a result of their lower expression levels, expression in fewer cell lines, and consequently expression levels that are closer to the background making any variation difficult to measure. On average, the intensity of the positive probes of transfrags that align to known exons compared with the novel transfrags is higher, 323 compared with 187.
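The F-statistic as worded above can be written out directly. The sketch follows the verbal definition literally: the variance, between cell lines, of the per-line average pseudo-median, divided by the average within-line variance. The use of the sample variance (n − 1 denominator) and the input layout are assumptions, and the function name is hypothetical; a textbook one-way ANOVA F would additionally scale the numerator by the number of probe pairs per line.

```python
from statistics import mean, variance


def transfrag_f_statistic(values_by_line):
    """Between-line variance of means over mean within-line variance.

    values_by_line: {cell_line: [pseudo-median value per probe pair in the
    transfrag]} (assumed layout; each line needs at least two values).
    """
    line_means = [mean(v) for v in values_by_line.values()]
    between = variance(line_means)  # sample variance across cell lines
    within = mean([variance(v) for v in values_by_line.values()])
    return between / within


values = {
    "line_a": [1.0, 2.0, 3.0],
    "line_b": [4.0, 5.0, 6.0],
    "line_c": [7.0, 8.0, 9.0],
}
print(transfrag_f_statistic(values))  # 9.0
```

In this toy input the line means (2, 5, 8) have variance 9 and each line has within-line variance 1, so the ratio is 9. A transfrag whose values sit near background in every line would yield a small numerator, in keeping with the lower F-statistics reported for novel transfrags.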
Characterization of Novel Transfrags
Strand Determination of Transfrags
All transcription maps described above were generated using non-strand-specific double-stranded cDNA made with random primers on cytosolic poly(A)+ RNA. Because one of the interesting subclasses of the novel transfrags (36% of total) consists of those located within the bounds of an annotated gene, that is, overlapping an intron, determination of the strand becomes important. To investigate whether these novel transfrags represent alternative exon isoforms (novel exons, extension of exons, or 5'- or 3'-UTRs) or antisense transcripts, the data obtained using the cDNA labeling assay were supplemented with a strand-specific RNA assay to assign strand information to the cDNA-derived transfrags. We used a novel direct RNA end-labeling method (K. Cole, V. Truong, D. Barone, and G. McGall, unpubl.) as opposed to using first-strand cDNA synthesis, to avoid the potential for unintended second-strand synthesis or any spurious priming events. End-labeled RNA targets from two cell lines (A375 and Jurkat) were hybridized to the arrays representing each strand of Chromosomes 21 and 22 separately, and transfrag maps were generated using the following parameters (threshold = FPR 5%, maxgap = 40 bp, and minrun = 90 bp). The transfrag maps for each strand were combined into a nonoverlapping map ("1 of 2" [+] strand map and "1 of 2" [−] strand map) and used to assess the total amount of stranded transfrags in the cDNA-derived data by comparing them with a "1 of 2" double-stranded cDNA map obtained from the A375 and Jurkat cell lines. This "1 of 2" double-stranded cDNA-derived map resulted in 3747 (496.5 kb) transfrags that could be compared with the 1392 (159.8 kb) and 1112 (131.0 kb) transfrags in the "1 of 2" (+) strand and "1 of 2" (−) strand maps, respectively (Table 1A).
Although the hybridization targets and conditions are quite different between the cDNA and RNA direct-labeling assays and the cDNA assay is more sensitive than the RNA assay (data not shown), 35% of the positive probe pairs between the cDNA and RNA end-labeled data are identical. On a transfrag level, 73.5 kb out of the 496.5 kb (14.8%) of the cDNA data overlap with the RNA end-labeled data indicating strand. Of these, 57 kb (77%) of the cDNA transfrags corresponds to known exons, mRNAs, or ESTs, whereas the remaining 23% (16.8 kb) corresponds to novel transcribed regions (Table 1B). Approximately 11% of the base pairs in strand-specific transfrags that overlap a known exon, mRNA, or EST were antisense (Table 1C). Of the 5.3 kb of the novel transcription (31%) that is intronic or overlaps intronic regions from well-characterized genes, 2.7 kb (51%) was antisense (Table 1D). These data indicate that a significant proportion of the observed transcription is antisense to well-characterized exons, introns, mRNAs, or ESTs. Supplemental Figure S3 displays several examples of antisense and novel transcription observed in this study for which strandedness was determined. The stranded transcript in the 3'-UTR of SEC14L2 on Chromosome 22 represents an example of an antisense transcript of an annotated gene (Supplemental Fig. S3A). The novel (−) strand-specific transfrag adjacent to an exon in the C21orf66 gene shows an example of a possible exon extension (Supplemental Fig. S3B). The gap between the novel transfrag and the exon of C21orf66 is caused by the use of a 50-bp masked region proximal to exons to account for the smoothing effect of using a window-based approach. Although adjacent transfrags identified on the same strand are not necessarily part of a transcript, it is tempting to speculate that the two novel transfrags shown in Supplemental Figure S3C that overlap a portion of a Genscan predicted gene represent a gene.
These are but a few of the observed antisense transcripts (12% of the total strand-specific transcription identified).
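Overlap figures such as the 73.5 kb shared between the cDNA-derived and strand-specific maps rest on a base-pair overlap between two interval sets. A minimal sketch of such a computation follows, assuming each map is a sorted list of nonoverlapping half-open (start, end) intervals; the function name and the toy coordinates are hypothetical and are not taken from the study.

```python
def overlap_bp(map_a, map_b):
    """Total base pairs shared by two sorted, non-overlapping interval lists."""
    i = j = total = 0
    while i < len(map_a) and j < len(map_b):
        lo = max(map_a[i][0], map_b[j][0])
        hi = min(map_a[i][1], map_b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first; it cannot overlap anything later.
        if map_a[i][1] < map_b[j][1]:
            i += 1
        else:
            j += 1
    return total


cdna_map = [(0, 100), (200, 300)]   # toy double-stranded cDNA transfrags
strand_map = [(50, 250)]            # toy strand-specific transfrag
shared = overlap_bp(cdna_map, strand_map)
covered = sum(end - start for start, end in cdna_map)
print(shared, shared / covered)  # 100 0.5
```

Running the same computation separately against the (+) and (−) strand maps, and against annotations of known strand, would give the kind of sense and antisense fractions summarized in Table 1.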
Additional Experimental Validation of Novel Transcripts
Overall, 82/130 (63%) of the new regions have been successfully cloned and/or sequence-verified (Supplemental Table S5). Interestingly, sequence analysis of the PCR products showed little evidence of coding capacity for the majority of the sequences, indicating that many of these novel transfrags are likely to be noncoding transcripts (see Supplemental Table S5). Together with the regions reported previously (Kapranov et al. 2002
In addition, several cloned PCR products were used as probes on Northern blots with cytosolic total and poly(A)+ fraction RNAs (Fig. 7). Approximately 30% of PCR products detected a specific RNA transcript on the Northern blots. All detected RNAs were enriched in the poly(A)+ fraction, indicating that they have poly(A)+ segments. All of the detected transcripts appear to be at very low abundance, requiring 6 to 12 µg of poly(A)+ RNA/lane and prolonged exposures (1 wk on phosphor-screen) to be detected. This indicates that
An in-depth analysis and characterization of the transcription identified using high-density arrays along Chromosomes 21 and 22 (Kapranov et al. 2002
Our earlier results indicated that the number of transcribed base pairs located within any of the well-characterized exons of Chromosomes 21 and 22 was as much as an order of magnitude less than that observed outside these annotations. In our present analyses, by joining together neighboring positive probe pairs into transfrags based on a stringent set of parameters, and by overlaying the locations of transcribed regions to a more complete set of annotations including mRNAs and ESTs, almost half of the observed transcription was observed to be outside of any annotation in 1 of 11 cell lines, with two-thirds found outside of well-characterized exons (Fig. 2A). For any of the individual cell lines studied, almost two-thirds of the mapped transcription is located within one of these annotated regions, whereas novel transcripts make up only one-third of the observed transcription (Fig. 2B). Using this conservative analysis approach, the proportion of novel transcription is slightly smaller than previously reported. However, these estimates are likely to represent an underestimate of the amount of novel transcription, especially because a sliding-window-based analysis on single positive probe levels indicated that
Although only a third of the observed transcription corresponds to well-characterized genes, the distribution of positive probes in exons found that many exons are either all "off" or all "on" (Supplemental Fig. S1; Fig. 3). By generating differential expression profiles for all the exons within a gene, we have specifically identified a large population of alternative gene expression based on a single mode used by the splicing machinery, exon skipping (Fig. 4). Whereas our observations indicate that 12% to 21% of the genes on Chromosomes 21 and 22 have a single isoform, the remaining 79% to 88% have multiple forms. This estimate is considerably higher than previously noted by the analysis of the public databases (Mironov et al. 1999
RNA transcription of protein-coding mRNAs observed from both strands of a locus (Labrador et al. 2001
Approximately 11% of the observed transcription of base pairs in exons, mRNAs, and ESTs was also found to be antisense (Table 1C; Supplemental Fig. S3). Although the strand-specific data are informative, they are considerably less sensitive than the cDNA-based maps. By extrapolation, this implies that at least 20% of the total base pairs on Chromosomes 21 and 22 constitute antisense transcription. Such widespread antisense transcription in human and other eukaryotic genomes is becoming increasingly evident (Lehner et al. 2002
The observed novel transcription found distal to well-characterized exons is likely to represent novel (coding and noncoding) transcripts (Fig. 6). Because these novel transfrags are a significant distance to any annotation, their inclusion as interrogated sites of the genome using other types of array platforms is unlikely, given that such arrays do not interrogate the genome on such a scale or in an unbiased manner (Chen et al. 2002 Finally, the possibility that the observed novel transcription was due to cross-hybridization to other sequence-related regions in the genome was investigated and shown not to be a major contributing factor (A. Piccolboni, S. Cawley, S. Bekiranov, and T.R. Gingeras, unpubl.). These data indicated that neither gene families, pseudogenes, nor partial duplication of probe sequences contribute significantly to the hybridization signals observed for probe pairs interrogating novel transcribed regions.
A biological function for some portion of these novel transcripts, both proximal and distal to well-characterized annotations, is supported by their evolutionary sequence conservation when compared with the mouse genome (
With the increasing number of large-scale comparative genomic studies and the completion of a working draft of the human genome sequence, our comprehension of the organization, size, and structure of the genome continues to increase. The present estimate of 30,000 to 40,000 genes in the human genome (Lander et al. 2001 The accumulation of this groundswell of recent reports indicating a greater amount of transcription than previously determined offers two possible ideas for consideration. First, the use of the term "gene" to identify all the transcribed units in the genome may need reconsideration, given the fact that this is a term that was coined to denote a genetic concept and not necessarily a physical and measurable entity. With respect to the efforts to enumerate all functional transcribed units, it may be helpful to consider using the term "transcript(s)" in place of gene. This suggestion has the attraction of allowing for the enumeration of each of the isoforms for all presently annotated genes as well as any distinct novel transcript that possesses some but not all of the properties of well-characterized coding transcripts. Thus, the fact that a transcript possesses coding capabilities would only be one of its possible features, not the most important one. Following on this last consideration, a second possible consideration stemming from the growing list of transcribed regions of the genome is the likelihood that the present efforts in estimating the total number of genes in the genome are misguided and at the very least miscalculated. These efforts are misguided given the discussion presented previously that a more useful entity to be counted is the number of transcripts. They are also miscalculated because such estimates are biased strongly in favor of protein-coding transcripts.
Although we believe it is premature to arrive at an accurate estimate of the total number of transcripts that a living cell could synthesize, based on our own studies it is likely to be a much larger number than the present estimates of 30,000 to 40,000.
Cell Culture, Nucleic Acid Purification, cDNA Synthesis, Fragmentation, Labeling, and Hybridization
See Kapranov et al. (2002
Determination of Thresholds and Estimation of Sensitivity/Specificity
Construction of Transcription Maps for the Human Chromosomes 21 and 22
Direct RNA End-Labeling Assay
RT-PCR and Sequencing of Cloned PCR Products
The authors thank Victor Sementchenko, Alan Williams, and Kevin Struhl for helpful discussions during data analysis; Sandeep Patel, Ray Wheeler, and Harley Gorrell for technical assistance; Kyle Cole and Vivi Truong for advice concerning direct-RNA labeling methodology; and Robert L. Strausberg for encouragement and support. Support for this work was provided in part by NCI contracts (21XS019A and 21XS019B) and No1-LO-12400 and Affymetrix, Inc. The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.2094104.
1 Corresponding author. [Supplemental material is available online at www.genome.org. All novel, sequence-verified transcripts (Supplemental Table S4) have been submitted to dbEST (CF798425 to CF798506). The following individuals kindly provided unpublished information as indicated in the paper: K. Cole, V. Truong, D. Barone, G. McGall, H.H. Ng, E.A. Sekinger, A.J. Williams, R. Wheeler, B. Wong, and K. Struhl.]
Bolstad, B.M., Irizarry, R.A., Astrand, M., and Speed, T.P. 2003. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19: 185-193.
Cawley, S., Bekiranov, S., Ng, H., Kapranov, P., Sekinger, E.A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J., Williams, A., et al. 2004. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 point to widespread regulation of non-coding RNAs. Cell (in press).
Chen, J., Sun, M., Lee, S., Zhou, G., Rowley, J.D., and Wang, S.M. 2002. Identifying novel transcripts and novel genes in the human genome by using novel SAGE tags. Proc. Natl. Acad. Sci. 99: 12257-12262.
Collins, J.E., Goward, M.E., Cole, C.G., Smink, L.J., Huckle, E.J., Knowles, S., Bye, J.M., Beare, D.M., and Dunham, I. 2003. Reevaluating human gene annotation: A second generation analysis of human Chromosome 22. Genome Res. 13: 27-36.
Conrad, C., Vianna, C., Freeman, M., and Davies, P. 2002. A polymorphic gene nested within an intron of the
Dermitzakis, E.T., Reymond, A., Lyle, R., Scamuffa, N., Ucla, C., Deutsch, S., Stevenson, B.J., Flegel, V., Bucher, P., Jongeneel, C.V., et al. 2002. Numerous potentially functional but non-genic conserved sequences on human Chromosome 21. Nature 420: 578-582.
Ewing, B. and Green, P. 2000. Analysis of expressed sequence tags indicates 35,000 human genes. Nat. Genet. 25: 232-234.
Fodor, S.P., Read, J.L., Pirrung, M.C., Stryer, L., Lu, A.T., and Solas, D. 1991. Light-directed, spatially addressable parallel chemical synthesis. Science 251: 767-773.
Fodor, S.P., Rava, R.P., Huang, X.C., Pease, A.C., Holmes, C.P., and Adams, C.L. 1993. Multiplexed biochemical assays with biological chips. Nature 364: 555-556.
Guigo, R., Dermitzakis, E.T., Agarwal, P., Ponting, C.P., Parra, G., Reymond, A., Abril, J.F., Keibler, E., Lyle, R., Ucla, C., et al. 2003. Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes. Proc. Natl. Acad. Sci. 100: 1140-1145.
Hollander, M. and Wolfe, D.A. 1999. Nonparametric statistical methods, 2nd ed. John Wiley and Sons, Inc., New York.
Kan, Z., Rouchka, E.C., Gish, W.R., and States, D.J. 2001. Gene structure prediction and alternative splicing analysis using genomically aligned ESTs. Genome Res. 11: 889-900.
Kapranov, P., Cawley, S.E., Drenkow, J., Bekiranov, S., Strausberg, R.L., Fodor, S.P., and Gingeras, T.R. 2002. Large-scale transcriptional activity in chromosomes 21 and 22. Science 296: 916-919.
Karolchik, D., Baertsch, R., Diekhans, M., Furey, T.S., Hinrichs, A., Lu, Y.T., Roskin, K.M., Schwartz, M., Sugnet, C.W., Thomas, D.J., et al. 2003. The UCSC Genome Browser Database. Nucleic Acids Res. 31: 51-54.
Kent, W.J., Sugnet, C.W., Furey, T.S., Roskin, K.M., Pringle, T.H., Zahler, A.M., and Haussler, D. 2002. The human genome browser at UCSC. Genome Res. 12: 996-1006.
Kiss, T. 2002. Small nucleolar RNAs: An abundant group of noncoding RNAs with diverse cellular functions. Cell 109: 145-148.
Labrador, M., Mongelard, F., Plata-Rengifo, P., Baxter, E.M., Corces, V.G., and Gerasimova, T.I. 2001. Protein encoding by both DNA strands. Nature 409: 1000.
Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., et al. 2001. Initial sequencing and analysis of the human genome. Nature 409: 860-921.
Lehner, B., Williams, G., Campbell, R.D., and Sanderson, C.M. 2002. Antisense transcripts in the human genome. Trends Genet. 18: 63-65.
Levinson, B., Kenwrick, S., Gamel, P., Fisher, K., and Gitschier, J. 1992. Evidence for a third transcript from the human factor VIII gene. Genomics 14: 585-589.
Liang, F., Holt, I., Pertea, G., Karamycheva, S., Salzberg, S.L., and Quackenbush, J. 2000. Gene index analysis of the human genome estimates approximately 120,000 genes. Nat. Genet. 25: 239-240.
Mironov, A.A., Fickett, J.W., and Gelfand, M.S. 1999. Frequent alternative splicing of human genes. Genome Res. 9: 1288-1293.
Modrek, B., Resch, A., Grasso, C., and Lee, C. 2001. Genome-wide detection of alternative splicing in expressed sequences of human genes. Nucleic Acids Res. 29: 2850-2859.
Nekrutenko, A., Chung, W.Y., and Li, W.H. 2003. An evolutionary approach reveals a high protein-coding capacity of the human genome. Trends Genet. 19: 306-310.
Okazaki, Y., Furuno, M., Kasukawa, T., Adachi, J., Bono, H., Kondo, S., Nikaido, I., Osato, N., Saito, R., Suzuki, H., et al. 2002. Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature 420: 563-573.
Pasquinelli, A.E. and Ruvkun, G. 2002. Control of developmental timing by microRNAs and their targets. Annu. Rev. Cell Dev. Biol. 18: 495-513.
Pease, A.C., Solas, D., Sullivan, E.J., Cronin, M.T., Holmes, C.P., and Fodor, S.P. 1994. Light-generated oligonucleotide arrays for rapid DNA sequence analysis. Proc. Natl. Acad. Sci. 91: 5022-5026.
Rahman, L., Bliskovski, V., Reinhold, W., and Zajac-Kaye, M. 2002. Alternative splicing of brain-specific PTB defines a tissue-specific isoform pattern that predicts distinct functional roles. Genomics 80: 245-249.
Rinn, J.L., Euskirchen, G., Bertone, P., Martone, R., Luscombe, N.M., Hartman, S., Harrison, P.M., Nelson, F.K., Miller, P., Gerstein, M., et al. 2003. The transcriptional activity of human Chromosome 22. Genes & Dev. 17: 529-540.
Saha, S., Sparks, A.B., Rago, C., Akmaev, V., Wang, C.J., Vogelstein, B., Kinzler, K.W., and Velculescu, V.E. 2002. Using the transcriptome to annotate the genome. Nat. Biotechnol. 20: 508-512.
Shendure, J. and Church, G.M. 2002. Computational discovery of sense-antisense transcription in the human and mouse genomes. Genome Biol. 3: RESEARCH0044.
Storz, G. 2002. An expanding universe of noncoding RNAs. Science 296: 1260-1263.
Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W., Mural, R.J., Sutton, G.G., Smith, H.O., Yandell, M., Evans, C.A., Holt, R.A., et al. 2001. The sequence of the human genome. Science 291: 1304-1351.
Yelin, R., Dahary, D., Sorek, R., Levanon, E.Y., Goldstein, O., Shoshan, A., Diber, A., Biton, S., Tamir, Y., Khosravi, R., et al. 2003. Widespread occurrence of antisense transcription in the human genome. Nat. Biotechnol. 21: 379-386.
Received October 27, 2003;
accepted in revised format January 6, 2004.