Published online before print
December 12, 2005, 10.1101/gr.3883606 Genome Res. 16:45-54, 2006 ©2006 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/06 $5.00
Letter
Abundant novel transcriptional units and unconventional gene pairs on human chromosome 22
Department of Genome Sciences, University of Washington, Seattle, Washington 98195-7730, USA
Novel transcriptional units (TUs) are EST-supported transcribed features not corresponding to known genes. Unconventional gene pairs (UGPs) are pairs of genes and/or TUs sharing exon-to-exon cis-antisense overlaps or putative bidirectional promoters. Computational TU and UGP discovery followed by manual curation was performed in the entire published 34.9-Mb human chromosome 22 euchromatic sequence. Novel TUs (n = 517) were as abundant as known genes (n = 492) and typically did not have nonprimate DNA and protein homologies. One hundred seventy-one (33%) of TUs, but only 13 (3%) of genes, both lacked nonprimate conservation and localized to gaps in the human–mouse BLASTZ alignment. Novel TUs were richer in exonic primate-specific interspersed repetitive elements (P = 0.001) and were more likely to rely on splice junctions provided by them, than were known genes: 19% of spliced TUs, versus 5% of spliced genes, had a splice site within a primate-specific repeat. Hence, novel TUs and known genes may represent different portions of the transcriptome. Two hundred nine (21%) of chromosome 22 transcripts participated in 77 cis-antisense and 42 promoter-sharing UGPs. Transcripts involved simultaneously in both UGP types were more common than was expected (P = 0.01). UGPs were nonrandomly distributed along the sequence: 89 (75%) clustered in distinct regions, the sum of which equaled 4.4 Mb (<13% of the chromosome). Eighty (67%) of the UGPs possessed significant locus structure differences between primates and rodents. Since some TUs may be functional noncoding transcripts and since the cis-regulatory potential of UGPs is well recognized, TUs and UGPs specific to the primate lineage may contribute to the genomic basis for primate-specific phenotypes.
Despite the publication of a highly accurate human genome sequence (International Human Genome Sequencing Consortium [IHGSC] 2004
Transcriptome hybridization to genomic tiling arrays suggests that noncoding TUs are surprisingly widespread and that existing annotations greatly underestimate the number of transcribed features (Shoemaker et al. 2001
A second intriguing feature of genome organization emerging from cDNA discovery projects is the abundance of unconventional gene pairs (UGPs), each comprising two transcripts that overlap or lie in close proximity to one another, in a manner suggesting coordinated regulation of gene expression. These UGPs include naturally occurring cis-antisense pairs, as well as genes and/or TUs that share a putative bidirectional promoter. Noncoding TUs and UGPs may be functionally linked in that many noncoding RNAs are antisense to coding genes and have been shown to regulate gene expression in many species (Hildebrandt and Nellen 1992
We define a TU as one or more flcDNA-supported and/or EST-supported transcripts mapping to the same locus and sharing exonic sequence on the same strand. In this report, we refer to TUs identical to known genes as "known genes," or simply "genes," and to TUs identified by our analysis but devoid of known-gene identities and public-database annotations as "novel TUs," or simply "TUs." The goal of the present study was to annotate TUs and UGPs on chromosome 22 (chr22) and to characterize their incidence, evolutionary conservation, and distribution along the genomic landscape of the chromosome.
Characterization of known genes and novel TUs
To catalog known genes and novel TUs, all genomic clones comprising the chr22 tiling path were subjected to a Perl-based analysis pipeline (see Methods). For every clone, matching ESTs and cDNAs were identified and their exon–intron structures defined. Transcripts with better scoring matches at a genomic locale other than the query clone were excluded. EST-supported transcribed features without full-length cDNA evidence were operationally defined as putative novel TUs. These TUs were manually analyzed to eliminate ESTs with ambiguous orientation and those likely originating from pre-mRNA and genomic contaminants. Remaining TUs were further curated to minimize artifactual fragmentation of genes with long UTRs into multiple transcript models; EST clusters within 10 kb of known-gene boundaries with expression profiles complementary to the known genes were generally considered UTR extensions and not standalone TUs.
Chr22 yielded 1009 transcript models: 492 genes and 517 TUs (Table 1). Most known genes were supported by the Sanger chr22 reference gene catalog (Collins et al. 2003
Sensitivity and specificity of known gene identification
We defined the sensitivity of our method as the percentage of Sanger genes we successfully detected and annotated. Of the 577 Sanger genes that were neither pseudogenes nor immunoglobulins, 468 (Table 2, rows 1, 2) were identified by our approach, for a sensitivity of 81%. To determine the reason for this potentially subpar sensitivity, we analyzed the 109 Sanger genes that lacked equivalents in our data set (Table 3). Only 11 of these Sanger genes were missing due to problems with our algorithm. The rest were undetected because they did not meet our criteria for a gene or TU: that a genomic sequence be transcribed, that the transcript be represented by a feature other than a single unspliced nonpolyadenylated cDNA or EST, and that the sequence not contain any immunoglobulin homology. Therefore, the discrepancy between our catalog and the Sanger catalog is due primarily to differences in operational definitions of transcribed features, with our definition being more rigorous.
We defined the specificity of our approach as the fraction of Sanger pseudogenes that our algorithm examined and excluded, rather than mistakenly including them among genes or TUs. Of the 234 Sanger pseudogenes, 207 did not match any of our genes or TUs, for a specificity of 88%. The other 27 Sanger pseudogenes all had cDNA or EST evidence for sense-strand transcription and thus were included in our analysis.
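The sensitivity and specificity figures above follow directly from the reported counts. The short Python sketch below only re-derives that arithmetic; the variable names are ours, not part of the original pipeline.

```python
# Counts reported in the text (Sanger chr22 catalog vs. this study).
sanger_genes = 577         # Sanger genes that are neither pseudogenes nor immunoglobulins
detected_genes = 468       # of those, identified by the present approach
sanger_pseudogenes = 234   # Sanger pseudogenes examined
excluded_pseudogenes = 207 # pseudogenes matching none of our genes or TUs


def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


missed_genes = sanger_genes - detected_genes
sensitivity = percent(detected_genes, sanger_genes)
specificity = percent(excluded_pseudogenes, sanger_pseudogenes)

print(f"missed Sanger genes: {missed_genes}")
print(f"sensitivity: {sensitivity:.1f}%")
print(f"specificity: {specificity:.1f}%")
```

Rounded to whole percentages, these values match the 81% and 88% quoted in the text.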
Quality assessment of novel TUs
To test the quality of unspliced, singleton-EST TUs, we checked for perfect identity of ESTs to genomic sequence at canonical AATAAA or ATTAAA polyadenylation signals present within the 3'-most 40 bp of the ESTs. Only four of 100 randomly selected singleton-EST TUs had sequencing errors. This indicates that the majority of those TUs which are defined solely by singleton ESTs probably originate from biologically real, canonically polyadenylated transcripts.
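The polyadenylation-signal criterion can be illustrated with a small Python sketch. It assumes the EST is given in sense orientation with any poly(A) tail already trimmed; the function name and the test sequences are hypothetical, and the authors' Perl pipeline and manual curation are not reproduced here.

```python
# Canonical polyadenylation signals named in the text.
POLYA_SIGNALS = ("AATAAA", "ATTAAA")


def has_canonical_polya_signal(est_seq: str, window: int = 40) -> bool:
    """Return True if a canonical signal lies entirely within the
    3'-most `window` bases of a sense-oriented EST sequence."""
    tail = est_seq.upper()[-window:]
    return any(signal in tail for signal in POLYA_SIGNALS)


# Hypothetical sequences: a signal 20 bases from the 3' end is accepted,
# while a signal 60 bases from the 3' end falls outside the window.
near_end = "G" * 50 + "AATAAA" + "C" * 20
far_from_end = "ATTAAA" + "C" * 60

print(has_canonical_polya_signal(near_end))
print(has_canonical_polya_signal(far_from_end))
```

A check of this kind tests only for the presence of the hexamer; the identity of the EST to the genomic sequence at that position, which the text also requires, would need the alignment itself.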
Our splice-based and polyadenylation-based estimates of the fraction of novel TUs representing biologically real transcripts (85% and 96%, respectively) are likely conservative, because completely noncanonical splice sites and polyadenylation signals have been reported in mammals (Caffrey et al. 2000
Nonprimate homologies and protein-coding potential of known genes and novel TUs
We then evaluated whether some of these human genes and TUs might be protein-coding despite high nucleotide-level divergence (see Methods). However, BLASTX alignments indicated that ORFs of only 17 (25%) of the genes and 13 (3%) of the TUs that were primate-specific by BLASTN had homology to nonprimate proteins (Supplemental Table 3). Therefore, the majority of genes and TUs apparently specific to primates are unlikely to represent highly diverged coding transcripts. Finally, we compared ORF lengths of genes and TUs on chr22 (Supplemental Table 4). Gene ORF lengths significantly exceeded TU ORF lengths (P = 0.0001 by Wilcoxon rank-sum test), suggesting that TUs, to a greater extent than genes, are representative of the noncoding portion of the transcriptome.
Our chr22 results parallel a comparative analysis of human chromosome 21 (chr21) by Gardiner et al. (2003
Primate-specific exonic sequences in known genes and novel TUs
Thirty of 155 spliced novel TUs (19%) versus 21 of 434 spliced known genes (5%) had at least one splice junction within a primate-specific repetitive element (P < 0.0001, two-sample binomial z-test), suggesting that engagement of novel intrarepeat splice sites during primate evolution may have been more frequent in the TUs than in the known genes (Supplemental Table 5).
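For readers who wish to check the comparison of 30/155 versus 21/434, the following Python sketch runs a two-sample z-test on the two proportions. The use of a pooled variance estimate and a two-sided P-value are our assumptions; the text does not state which variant of the test was applied.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Two-sample z-test for proportions with a pooled standard error.

    Returns (z, two-sided P-value under the normal approximation)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided


# Counts from the text: TUs vs. known genes with an intrarepeat splice junction.
z, p = two_proportion_z_test(30, 155, 21, 434)
print(f"z = {z:.2f}, two-sided P = {p:.2e}")
```

Under these assumptions the statistic is well above 5, consistent with the reported P < 0.0001.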
Characterization of cis-antisense UGPs
Surprisingly, the remaining 41 cis-antisense UGPs did not fit either category. Eight structural types of complex antisense pairs could be distinguished by manual annotation (Supplemental Table 7). In the most common structure, an unspliced antisense transcript overlapped one internal exon of a spliced transcript. The unspliced transcript was a novel TU in 10 of 11 of those cases. In other structures, multiple categories of terminal–terminal, terminal–internal, and internal–internal exon overlap were seen, revealing a substantial diversity and complexity of antisense-overlap structures.
A gene-only approach to annotation would miss more than half of the cis-antisense pairs on chromosome 22. Our results confirm those of Yelin et al. (2003
For both gene–gene and gene–TU pairs, complex pairs were the prevalent type, followed by tail-to-tail and finally head-to-head pairs. Therefore, the complexity of genomic structure of a given antisense pair does not appear to depend on whether or not the pair includes a TU.
Characterization of putative bidirectionally promoted UGPs
Most (81%) of the putative bidirectional promoters on chr22, and all bidirectional promoters of gene–gene pairs, overlapped CpG islands (Supplemental Table 8). This is consistent with evidence that the majority of RNA polymerase II–transcribed genes initiating at bidirectional promoters have a CpG island between them (Adachi and Lieber 2002
Anecdotal reports in the literature suggested that mammalian CpG-island bidirectional promoters are frequently devoid of TATA boxes (Smith et al. 1990
Simultaneous involvement of some transcribed features in multiple UGPs
Seven genes participated in cis-antisense overlaps with two other genes or TUs (Fig. 1C), while one (UNC84B) participated in three independent cis-antisense overlaps (Supplemental Table 9). We also identified all transcript models on chr22 that shared a putative bidirectional promoter with a second model while also participating in a cis-antisense pair with a third (Fig. 1D). Sixteen genes and four TUs were in this category (Supplemental Table 10). Their counts are summarized in Table 4. Significantly more genes and TUs are involved in both cis-antisense pairs and putative bidirectional promoter pairs than predicted by the frequencies of the two types of independent events (20 observed vs. 11.76 expected, P = 0.01). Therefore, for a given transcript model, presence of one UGP type increases the probability of the other. A remarkable chain (a group of genes and TUs connected by multiple UGPs), consisting of six genes and TUs linked by three cis-antisense pairs and two putative bidirectional promoters, is shown in Figure 2.
Distribution of UGPs along the genomic sequence
The distribution of UGPs on chr22 is illustrated in Figure 3. Most UGPs mapped closely to one another within several UGP clusters. We refer to these clusters as UGP islands, operationally defined as regions with at least two UGPs ≤250 kb from each other. To determine the proportion of human chr22 sequence within UGP islands, we first measured the length of each genomic region corresponding to a cis-antisense UGP island (for coordinates, see Supplemental Table 6). The combined length of the cis-antisense UGP islands in Figure 3 was 3.4 Mb. We emphasize that this was the sum of lengths of UGP islands, rather than of the extremely small exon-to-exon cis-antisense overlaps themselves. The sum of UGP islands enriched in cis-antisense UGPs represented a small fraction of the chr22 sequence, and the majority of the cis-antisense UGPs (63 of 77; 82%) resided in that small fraction (3.4 Mb; 10%) of the total sequence.
Similarly, we measured each region corresponding to an island of putatively bidirectionally promoted UGPs (Supplemental Table 8). The combined length of the putative bidirectional promoter UGP islands was 1.5 Mb. This was the sum of lengths of several extensive genomic regions enriched in putatively bidirectionally promoted UGPs, not of the extremely small putative bidirectional promoters themselves. The sum of UGP islands enriched in putative bidirectional promoters represented a small fraction of the chr22 sequence, and the majority of putative bidirectional promoters (26 of 42; 62%) resided in that small fraction (1.5 Mb; 4%) of the total genomic sequence.
Visual examination of UGP island map locations revealed five areas of substantial overlap between cis-antisense islands and putative bidirectional promoter islands. These regions were simultaneously enriched in both types of UGPs. They are represented by ovals at the top of Figure 3 and are found approximately at Mb 8, 16, 22, 26, and 35 on the map.
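The operational definition of a UGP island (at least two UGPs no more than 250 kb apart) can be sketched in Python as a simple chaining of neighboring positions. The chaining rule, the function name, and the example coordinates are all assumptions made for illustration; the islands reported here were delineated from the actual coordinates in the supplemental tables.

```python
def find_islands(positions, max_gap=250_000, min_members=2):
    """Group UGP coordinates into islands by chaining neighbors whose
    spacing is at most `max_gap`; keep groups with >= `min_members`.

    Returns a list of (start, end, member_count) tuples."""
    islands = []
    current = []
    for pos in sorted(positions):
        # A gap larger than max_gap closes the current group.
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_members:
                islands.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    if len(current) >= min_members:
        islands.append((current[0], current[-1], len(current)))
    return islands


# Hypothetical UGP midpoints (bp), not taken from the supplemental tables.
example_positions = [100_000, 200_000, 400_000, 1_500_000, 3_000_000, 3_100_000]
islands = find_islands(example_positions)
for start, end, count in islands:
    print(f"island {start}-{end}: {count} UGPs, span {end - start} bp")
```

In this toy input, the isolated UGP at 1.5 Mb belongs to no island, which mirrors the distinction in the text between clustered and unclustered UGPs.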
Islands of putative bidirectional promoters were weakly correlated with locally high CpG island density. Qualitative comparison of our UGP island distribution with the Sanger Institute's SuperMap22 did not demonstrate any correlation of UGP islands of either class with GC content, SINE or LINE density, recombination hotspots, human–mouse synteny breakpoints, or recent segmental duplications.
The recent genomewide assessment of cis-antisense pairs in the mouse (Kiyosawa et al. 2003
To assess significance of UGP clustering, we nonparametrically derived four chromosome-wide P-values expressing the likelihood that the observed incidence of antisense pairs, antisense pairs within antisense UGP islands, putative bidirectional promoters, and putative bidirectional promoters within bidirectional-promoter UGP islands can occur by chance (see Methods). For each of the four characteristics, we searched 10,000 gene distribution simulations for instances where the simulated incidence of the UGPs or islands under consideration exceeded the actual incidence. No such instances were found. Therefore, all four P-values were <10^-4.
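The logic of the simulation-based P-value can be sketched in Python as follows. The null model below (uniform random placement of 200 features, counting adjacent features within 1 kb) is a deliberately simplified toy and an assumption on our part; it is not the interval-based gene distribution simulation described in Methods, and the observed count of 42 is used only as an example input.

```python
import random


def empirical_p(observed, simulated):
    """Fraction of simulations whose statistic exceeds the observed value.

    If no simulation exceeds it, return (1/n, True): the P-value can only
    be bounded above by 1/n, as with the <10^-4 bound for 10,000 runs."""
    n = len(simulated)
    exceed = sum(1 for s in simulated if s > observed)
    if exceed == 0:
        return 1.0 / n, True
    return exceed / n, False


def count_close_neighbors(n_features, chrom_len, max_gap, rng):
    """Toy statistic: adjacent randomly placed features within max_gap bp."""
    pos = sorted(rng.randrange(chrom_len) for _ in range(n_features))
    return sum(1 for a, b in zip(pos, pos[1:]) if b - a <= max_gap)


rng = random.Random(22)  # fixed seed so the sketch is reproducible
simulated = [count_close_neighbors(200, 34_900_000, 1_000, rng)
             for _ in range(10_000)]
p_value, is_upper_bound = empirical_p(42, simulated)
print(("P < " if is_upper_bound else "P = ") + f"{p_value:g}")
```

With 10,000 simulations, the smallest bound this procedure can report is 10^-4, which is why the text states the P-values as inequalities rather than exact values.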
For simulations, we divided chr22 into 20 intervals with different genomic sizes but approximately equal numbers of transcribed features and thus different gene densities. Intervals with similar gene densities had widely varying interval-specific P-values (probabilities that the observed complexity can be matched by chance). Therefore, our analysis does not support the hypothesis that the visually apparent clustering of UGPs along chr22 into UGP islands depends entirely on gene density. This result is consistent with earlier observations that incidence of bidirectional promoters does not correlate with gene density (Adachi and Lieber 2002
In silico expression profiling of UGPs
For 35 (45%) of the 77 antisense pairs, human ESTs suggested expression of both members of the pair in the same tissue or cell type, allowing the possibility of post-transcriptional regulation by dsRNA mechanisms. This is less than the 67% seen in a small-scale (n = 39) experimental test of expression profile complementarity in cis-antisense pairs identified in silico (Shendure and Church 2002
Of those 35 pairs, 18 were gene–gene, 16 were gene–TU, and one was TU–TU. These proportions approximately mirrored the proportions of the three pair types in the total chr22 antisense data set. These proportions support the nonartifactual nature of TUs: If TUs cis-antisense to genes were likely to be derived from artifactually misoriented ESTs of those genes, then gene–TU cis-antisense pairs with expression profile complementarity would be disproportionately common, rather than occur at a frequency corresponding to their frequency in the total data set.
Thirty-four (81%) of the 42 putative bidirectionally promoted transcript model pairs on chr22 had expression profile complementarity. This is significantly greater than the 45% seen for antisense pairs. Therefore, antisense pairs may be less likely to have expression profile complementarity than do putative bidirectionally promoted pairs. This extent of expression profile complementarity in putative bidirectionally promoted pairs is consistent with the finding that the majority of human bidirectional promoters are coregulatory and contain cis-regulatory elements affecting both genes at once (Trinklein et al. 2004
Human–mouse comparative analysis of UGPs
Of the 84 putatively bidirectionally promoted transcripts on chr22 (60 genes and 24 TUs), 25 (30%) had no BLASTN-detectable homologies to any mouse transcribed or genomic sequences in the NR, EST, and MGSCv3 divisions of GenBank. They included seven genes and 18 TUs. Therefore, 12% (7/60) of putatively bidirectionally promoted genes and 75% (18/24) of putatively bidirectionally promoted TUs lacked mouse homology. In addition, only 10 (24%) of the 42 putative bidirectionally promoted pairs had their genomic structure completely conserved in the mouse. The present analysis suggests that putative bidirectionally promoted as well as cis-antisense UGPs frequently have lineage-specific genomic structures and on occasion harbor lineage-specific transcripts.
Parallels to previous TU and UGP analyses
UGPs and novel EST-supported TUs have been identified in the human genome in previous studies (including Shendure and Church 2002
Cis-regulation, nonconservation, and the bimodal transcriptome
The most striking feature of novel TUs relative to known genes is the near-absence of nonhuman homologies. We infer that some TUs are lineage specific to primates and perhaps solely to humans. If some of these TUs are functional, then their lineage specificity can provide part of the genomic basis for primate- and human-specific phenotypes.
The potentially large number of lineage-specific transcripts in humans lends new credence to the assertion that our ability to model human biological processes in nonhuman models must be critically reexamined (Margolin 2001
This is the first analysis to tabulate demonstrably primate-specific sequences in exons on human chr22, which add up to 71 kb (Supplemental Table 5). Although 71 kb of exonic sequence is a modest amount, it is a highly conservative estimate because it omits primate-specific sequences other than Alu and Mer1 elements and does not account for primate-specific repeats in alternatively spliced and polyadenylated regions that are not parts of our reference transcripts. Even this small amount of sequence, however, affords an interesting glimpse into how much of a human chromosome can become newly recruited into transcribed structures specifically in the course of primate evolution.
One of the most noteworthy properties of our chr22 UGP set was the frequent incidence of genes and TUs participating in multiple types and instances of UGPs. This challenges the accepted view that clusters of closely spaced but functionally unrelated genes in mammals are rare (Angiolillo et al. 2002
Together, such genes and TUs signify that regulatory relationships specified by the genomic proximity or overlap of expressed features may be more complex than simple coregulation or antiregulation of bidirectionally promoted pairs or the downregulation of a sense gene by an antisense TU. We propose that clusters of apparently functionally unrelated genes and TUs linked by combinations of UGPs are analogous to the sentences of a new sequence-based regulatory language. The words of this language are the transcribed features themselves. The exon-to-exon cis-antisense overlaps in which they are involved, and the bidirectional promoters that some of them share, are the punctuation marks. The sentences are to be deciphered along the genomic sequence. It is therefore distressing that the majority of transcripts involved in multiple UGPs do not have a known function. Transcriptome-wide studies should move beyond large-scale cDNA sequencing and toward large-scale functional investigations of the sequenced transcripts. If they do not, any sequence-based regulatory language will be as mystifying as a language with a non-Latin alphabet is to a monolingual English speaker.
Implications for gene birth and primate-specific phenotypes
Second, which TUs are evolutionarily young genes? The origin of new genes is recognized as a fundamental biological process that is essential for the appearance of novel biological functions and makes a major contribution to genetic diversity. However, the exact mechanisms giving rise to new genes remain to be elucidated, although one mechanism by which new genes are created is the shuffling of existing coding-gene exons, which generates both coding and noncoding new genes and is often facilitated by retrotransposition (Long et al. 2003
The genomic structures of certain human TUs and UGPs strongly suggest that the existence of those TUs and UGPs is made possible by primate-specific sequences, primate-specific genomic structures, or both. Therefore, the third question is whether primate-specific, and possibly human-specific, TUs and UGPs comprise an essential part of the genomic basis of primate-specific phenotypic characteristics and of the phenotypes that so strikingly differentiate humans from other primates, respectively.
Definitions
We define known genes as those represented by at least one experimentally based, full-length cDNA in the NT division of GenBank, regardless of coding potential. We define novel TUs as transcribed features in the genome other than known genes. TUs are predicted in silico from EST-to-genomic DNA alignments in which the ESTs do not correspond to exons of known genes. ESTs comprising a TU must be canonically spliced (GT-AG introns) and/or canonically polyadenylated (AATAAA or ATTAAA polyadenylation signal within 40 bp of the submitter-indicated 3' end). Since all EST-to-genomic alignments were manually curated, the presence of the polyadenylation signals in high-quality EST and genomic sequence was verified. Combined with the requirement for splicing and/or canonical polyadenylation, this effectively eliminated ESTs primed from genomic (A)n stretches. We excluded ESTs from the ORESTES (Strausberg et al. 2002
TUs represent genomic segments capable of generating transcripts, regardless of the coding capacity of those transcripts. They are inferred solely from EST evidence, with a single clone sufficient to define a TU in some cases, although a TU may not be supported by a singleton unspliced EST without a canonical polyadenylation signal. For every TU supported by multiple ESTs, we used the 5'-most EST-supported putative transcription start site to define the 5' boundary, and the 3'-most EST-supported polyadenylation site to define the 3' boundary. We defined TUs in a strand-specific fashion, as did Okazaki et al. (2002
UGPs are of two types. An exon-to-exon cis-antisense gene pair is a pair of overlapping genes or TUs, transcribed from the opposite strands of the same locus (Fig. 1A). A putative bidirectionally promoted pair is a pair of divergently transcribed features whose transcription start sites are separated by <1 kb of genomic sequence (Fig. 1B).
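The two UGP definitions can be expressed as simple coordinate tests. The Python sketch below is an illustration under stated assumptions: transcripts are represented by a hypothetical `Transcript` class with 1-based inclusive exon coordinates, the transcription start site is taken as the 5'-most exon boundary, and none of the manual curation described in this report is modeled.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    name: str
    strand: str   # "+" or "-"
    exons: list   # (start, end) tuples, 1-based inclusive, start <= end

    @property
    def tss(self) -> int:
        """5'-most position: lowest start on "+", highest end on "-"."""
        if self.strand == "+":
            return min(start for start, _ in self.exons)
        return max(end for _, end in self.exons)


def is_cis_antisense(a: Transcript, b: Transcript) -> bool:
    """Opposite strands and at least one exon-to-exon overlap."""
    if a.strand == b.strand:
        return False
    return any(s1 <= e2 and s2 <= e1
               for s1, e1 in a.exons for s2, e2 in b.exons)


def is_putative_bidirectional(a: Transcript, b: Transcript,
                              max_sep: int = 1000) -> bool:
    """Divergent transcription with start sites separated by < max_sep bp."""
    if a.strand == b.strand:
        return False
    plus, minus = (a, b) if a.strand == "+" else (b, a)
    # Divergent pairs have the minus-strand start site upstream of the
    # plus-strand start site in genomic coordinates.
    separation = plus.tss - minus.tss
    return 0 < separation < max_sep


# Hypothetical transcript models for illustration only.
gene_a = Transcript("A", "+", [(5000, 5200), (6000, 6300)])
tu_b = Transcript("B", "-", [(3000, 3400), (4200, 4600)])
tu_c = Transcript("C", "-", [(6200, 6800)])

print(is_putative_bidirectional(gene_a, tu_b))  # start sites 400 bp apart
print(is_cis_antisense(gene_a, tu_c))           # last exon of A overlaps C
```

In this toy example transcript A takes part in both UGP types at once, the situation the report describes for 16 genes and four TUs.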
Perl-based TU discovery and UGP analysis pipeline
Nonprimate conservation analysis of gene and TU DNA and protein sequences
BLASTZ alignments to mouse were visualized from the July 2003 human assembly at the UCSC Genome Browser. Only exons of the human genes and TUs were considered when reporting whether a BLASTZ alignment existed. No distinction was made between partial and complete overlaps of an exon with a sequence block alignable to mouse by BLASTZ. Genes and TUs whose exons at least partially corresponded to blocks on the "Chained BLASTZ mouse/human alignment," but not to blocks on the "BLASTZ mouse, tight subset of best alignments," were reported as "aligned but not tight." For TUs involved in antisense pairs, we excluded the region of cis-antisense overlap in all BLASTN and BLASTZ analyses, focusing instead on sequence conservation in exonic regions unique to the TU and not shared with potentially conserved genes in the same locus.
BLASTX criteria were as follows: low complexity filtering enabled, Expect = 10, word size = 3, matrix BLOSUM62. Only sense-strand protein homologies outside of masked repetitive and low-complexity sequence were considered. Homologies to low-complexity protein sequences corresponding to unmasked DNA sequences, including but not limited to proline-rich and glycine-rich tracts, were disregarded during the manual analysis. BLASTX-suggested putative orthologs were considered only if their ORF direction corresponded to the correct direction of transcription of the query.
Nonparametric assessment of UGP distribution
Expression profile complementarity and interspecies comparative analysis of UGPs
To determine if a human antisense pair was conserved at the orthologous mouse locus, we submitted the region of overlap to BLASTN against the MGSCv3 mouse genome database at NCBI and used the highest-scoring hit on the mouse genome as a BLASTN query against the mouse subset of dbEST. We searched for evidence that the mouse query is transcribed in both directions, with an approach identical to that used in human. In addition, if one or both members of the human pair had mouse orthologs, and if the human configuration was either tail-to-tail or head-to-head, we searched for positional equivalents of antisense transcripts in the mouse by analyzing the directionality of mouse EST matches to the last or first exon, respectively, of the mouse transcripts. We did not perform expression profile complementarity testing in mouse.
We tested for expression profile complementarity in each putative bidirectionally promoted pair with the same protocol as that used for antisense analysis. To determine the structure of the orthologous mouse locus, we used the reference transcripts (flcDNAs, ESTs, or hand-constructed contigs bearing the most characteristic exon–intron footprints) of the two members of each human pair as BLASTN queries against the mouse subsets of the NT and EST databases. The top-scoring mouse hit, if any, was the putative mouse ortholog. When both human pair members had putative mouse orthologs, the mouse orthologs were submitted to BLASTN against the MGSCv3 mouse genome database at NCBI (Expect = 10, default filter, no MegaBLAST). The coordinates of their 5' ends on the mouse genomic sequence were used to determine locus structure in the mouse.
Whenever only one of the two human pair members had a putative mouse ortholog, we manually interpreted BLAST output for 1 kb of genomic sequence upstream of the 5'-most known end of the mouse ortholog, searching for divergently transcribed ESTs indicative of positionally equivalent TUs. When curating the outputs of BLAST searches against the mouse EST database, we eliminated all ESTs with "RIKEN" in their FASTA descriptor, due to well-known misorientation problems in the RIKEN mouse EST set.
We thank Phil Green and Debbie Nickerson for helpful discussions and guidance, as well as Ming K. Lee for assistance with SeqHelp and advice on Perl programming. This work was supported by NIH training grants HG-00035 and CA-09437, as well as by NIH R01 grant CA-27632.
Article published online ahead of print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.3883606.
1 Present address: Information and Mathematical Sciences Group, Genome Institute of Singapore, 138672 Singapore.
2 Corresponding author. [Supplemental material is available online at www.genome.org.]
Adachi, N. and Lieber, M.R. 2002. Bidirectional gene organization: A common architectural feature of the human genome. Cell 109: 807-809.
Andres, A.M., Soldevila, M., Saitou, N., Volpini, V., Calafell, F., and Bertranpetit, J. 2003. Understanding the dynamics of Spinocerebellar ataxia 8 (SCA8) locus through a comparative genetic approach in humans and apes. Neurosci. Lett. 336: 143-146.
Angiolillo, A., Russo, G., Porcellini, A., Smaldone, S., D'Alessandro, F., and Pietropaolo, C. 2002. The human homologue of the mouse Surf5 gene encodes multiple alternatively spliced transcripts. Gene 284: 169-178.
Burset, M., Seledtsov, I.A., and Solovyev, V.V. 2000. Analysis of canonical and non-canonical splice sites in mammalian genomes. Nucleic Acids Res. 28: 4364-4375.
Caffrey, J.J., Safrany, S.T., Yang, X., and Shears, S.B. 2000. Discovery of molecular and catalytic diversity among human diphosphoinositol-polyphosphate phosphohydrolases: An expanding Nudt family. J. Biol. Chem. 275: 12730-12736.
Carninci, P., Waki, K., Shiraki, T., Konno, H., Shibata, K., Itoh, M., Aizawa, K., Arakawa, T., Ishii, Y., Sasaki, D., et al. 2003. Targeting a complex transcriptome: The construction of the mouse full-length cDNA encyclopedia. Genome Res. 13: 1273-1289.
Chong, A., Zhang, G., and Bajic, V.B. 2004. Information for the Coordinates of Exons (ICE): A human splice sites database. Genomics 84: 762-766.
Collins, J.E., Goward, M.E., Cole, C.G., Smink, L.J., Huckle, E.J., Knowles, S., Bye, J.M., Beare, D.M., and Dunham, I. 2003. Reevaluating human gene annotation: A second-generation analysis of chromosome 22. Genome Res. 13: 27-36.
Courseaux, A. and Nahon, J.L. 2001. Birth of two chimeric genes in the Hominidae lineage. Science 291: 1293-1297.
Delihas, N. and Forst, S. 2001. MicF: An antisense RNA gene involved in response of Escherichia coli to global stress factors. J. Mol. Biol. 313: 1-12.
Edgar, A.J. 2003. The gene structure and expression of human ABHD1: Overlapping polyadenylation signal sequence with Sec12. BMC Genomics 4: 18.
Ejima, Y. and Yang, L. 2003. Trans mobilization of genomic DNA as a mechanism for retrotransposon-mediated exon shuffling. Hum. Mol. Genet. 12: 1321-1328.
Enard, W., Khaitovich, P., Klose, J., Zollner, S., Heissig, F., Giavalisco, P., Nieselt-Struwe, K., Muchmore, E., Varki, A., Ravid, R., et al. 2002. Intra- and interspecific variation in primate gene expression patterns. Science 296: 340-343.
Feng, J., Funk, W.D., Wang, S.S., Weinrich, S.L., Avilion, A.A., Chiu, C.P., Adams, R.R., Chang, E., Allsopp, R.C., and Yu, J. 1995. The RNA component of human telomerase. Science 269: 1236-1241.
Gardiner, K., Fortna, A., Bechtel, L., and Davisson, M.T. 2003. Mouse models of Down syndrome: How useful can they be? Comparison of the gene content of human chromosome 21 with orthologous mouse genomic regions. Gene 318: 137-147.
Harrington, J.J., Sherf, B., Rundlett, S., Jackson, P.D., Perry, R., Cain, S., Leventhal, C., Thornton, M., Ramachandran, R., Whittington, J., et al. 2001. Creation of genome-wide protein expression libraries using random activation of gene expression. Nat. Biotechnol. 19: 440-445.
Hildebrandt, M. and Nellen, W. 1992. Differential antisense transcription from the Dictyostelium EB4 gene locus: Implications on antisense-mediated regulation of mRNA stability. Cell 69: 197-204.
Hubbard, T., Barker, D., Birney, E., Cameron, G., Chen, Y., Clark, L., Cox, T., Cuff, J., Curwen, V., Down, T., et al. 2002. The Ensembl genome database project. Nucleic Acids Res. 30: 38-41.
International Human Genome Sequencing Consortium (IHGSC). 2004. Finishing the euchromatic sequence of the human genome. Nature 431: 931-945.
Kapranov, P., Cawley, S.E., Drenkow, J., Bekiranov, S., Strausberg, R.L., Fodor, S.P., and Gingeras, T.R. 2002. Large-scale transcriptional activity in chromosomes 21 and 22. Science 296: 916-919.
Kawashima, I., Mita-Honjo, K., and Takiguchi, Y. 1992. Characterization of the primate-specific repetitive DNA element MER1. DNA Seq. 2: 313-318.
Kent, W.J., Sugnet, C.W., Furey, T.S., Roskin, K.M., Pringle, T.H., Zahler, A.M., and Haussler, D. 2002. The human genome browser at UCSC. Genome Res. 12: 996-1006.
Kiyosawa, H., Yamanaka, I., Osato, N., Kondo, S., and Hayashizaki, Y. 2003. Antisense transcripts with FANTOM2 clone set and their implications for gene regulation. Genome Res. 13: 1324-1334.
Kramer, C., Loros, J.J., Dunlap, J.C., and Crosthwaite, S.K. 2003. Role for antisense RNA in regulating circadian clock function in Neurospora crassa. Nature 421: 948-952.
Kumar, A., Harrison, P.M., Cheung, K.H., Lan, N., Echols, N., Bertone, P., Miller, P., Gerstein, M.B., and Snyder, M. 2002. An integrated approach for finding overlooked genes in yeast. Nat. Biotechnol. 20: 58-63.
Kutach, A.K. and Kadonaga, J.T. 2000. The downstream promoter element DPE appears to be as widely used as the TATA box in Drosophila core promoters. Mol. Cell. Biol. 20: 4754-4764.
Larsson, T.P., Murray, C.G., Hill, T., Fredriksson, R., and Schioth, H.B. 2005. Comparison of the current RefSeq, Ensembl and EST databases for counting genes and gene discovery. FEBS Lett. 579: 690-698.
Lee, R.C. and Ambros, V. 2001. An extensive class of small RNAs in Caenorhabditis elegans. Science 294: 862-864.
Lewin, B. 2000. Genes VII. Oxford University Press, New York.
Long, M., Deutsch, M., Wang, W., Betran, E., Brunet, F.G., and Zhang, J. 2003. Origin of new genes: Evidence from experimental and computational analyses. Genetica 118: 171-182.
Margolin, J. 2001. From comparative and functional genomics to practical decisions in the clinic: A view from the trenches. Genome Res. 11: 923-925.
Mattick, J.S. 2003. Challenging the dogma: The hidden layer of non-protein-coding RNAs in complex organisms. Bioessays 25: 930-939.
Millar, R., Conklin, D., Lofton-Day, C., Hutchinson, E., Troskie, B., Illing, N., Sealfon, S.C., and Hapgood, J. 1999. A novel human GnRH receptor homolog gene: Abundant and wide tissue distribution of the antisense transcript. J. Endocrinol. 162: 117-126.
Misra, S., Crosby, M.A., Mungall, C.J., Matthews, B.B., Campbell, K.S., Hradecky, P., Huang, Y., Kaminker, J.S., Millburn, G.H., Prochnik, S.E., et al. 2002. Annotation of the Drosophila melanogaster euchromatic genome: A systematic review. Genome Biol. 3: research0083.
Numata, K., Kanai, A., Saito, R., Kondo, S., Adachi, J., Wilming, L.G., Hume, D.A., Hayashizaki, Y., and Tomita, M. 2003. Identification of putative noncoding RNAs among the RIKEN mouse full-length cDNA collection. Genome Res. 13: 1301-1306.
Ohlsson, R., Paldi, A., and Graves, J.A. 2001. Did genomic imprinting and X chromosome inactivation arise from stochastic expression? Trends Genet. 17: 136-141.
Okazaki, Y. and Hume, D.A. 2003. A guide to the mammalian genome. Genome Res. 13: 1267-1272.
Okazaki, Y., Furuno, M., Kasukawa, T., Adachi, J., Bono, H., Kondo, S., Nikaido, I., Osato, N., Saito, R., Suzuki, H., et al. 2002. Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature 420: 563-573.
Qvist, H., Sjostrom, H., and Noren, O. 1998. The TATA-less, GC-rich porcine dipeptidylpeptidase IV (DPPIV) promoter shows bidirectional activity. Biol. Chem. 379: 75-81.
Rinn, J.L., Euskirchen, G., Bertone, P., Martone, R., Luscombe, N.M., Hartman, S., Harrison, P.M., Nelson, F.K., Miller, P., Gerstein, M., et al. 2003. The transcriptional activity of human chromosome 22. Genes & Dev. 17: 529-540.
Rodriguez, A., Griffiths-Jones, S., Ashurst, J.L., and Bradley, A. 2004. Identification of mammalian microRNA host genes and transcription units. Genome Res. 14: 1902-1910.
Seki, Y., Ikeda, S., Kiyohara, H., Ayabe, H., Seki, T., and Matsui, H. 2002. Sequencing analysis of a putative human O-sialoglycoprotein endopeptidase gene (OSGEP) and analysis of a bidirectional promoter between the OSGEP and APEX genes. Gene 285: 101-108.
Shendure, J. and Church, G.M. 2002. Computational discovery of sense-antisense transcription in the human and mouse genomes. Genome Biol. 3: research0044.
Shklar, M., Strichman-Almashanu, L., Shmueli, O., Shmoish, M., Safran, M., and Lancet, D. 2005. GeneTide-Terra Incognita Discovery Endeavor: A new transcriptome focused member of the GeneCards/GeneNote suite of databases. Nucleic Acids Res. 33: D556-D561.
Shoemaker, D.D., Schadt, E.E., Armour, C.D., He, Y.D., Garrett-Engele, P., McDonagh, P.D., Loerch, P.M., Leonardson, A., Lum, P.Y., Cavet, G., et al. 2001. Experimental annotation of the human genome using microarray technology. Nature 409: 922-927.
Sleutels, F., Zwart, R., and Barlow, D.P. 2002. The non-coding Air RNA is required for silencing autosomal imprinted genes. Nature 415: 810-813.
Smith, M.L., Mitchell, P.J., and Crouse, G.F. 1990. Analysis of the mouse Dhfr/Rep-3 major promoter region by using linker-scanning and internal deletion mutations and DNase I footprinting. Mol. Cell. Biol. 10: 6003-6012.
Stajich, J.E., Block, D., Boulez, K., Brenner, S.E., Chervitz, S.A., Dagdigian, C., Fuellen, G., Gilbert, J.G., Korf, I., Lapp, H., et al. 2002. The Bioperl toolkit: Perl modules for the life sciences. Genome Res. 12: 1611-1618.
Strausberg, R.L., Feingold, E.A., Grouse, L.H., Derge, J.G., Klausner, R.D., Collins, F.S., Wagner, L., Shenmen, C.M., Schuler, G.D., Altschul, S.F., et al. 2002.
Generation and initial analysis of more than 1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||