Published online before print June 18, 2002, 10.1101/gr.62002.
Vol. 12, Issue 7, 1068-1074, July 2002
LETTER
ABSTRACT
The publication of a draft of the human genome and of large collections of transcribed sequences has made it possible to study the complex relationship between the transcriptome and the genome. In the work presented here, we have focused on mapping mRNA 3' ends onto the genome by use of the raw data generated by the expressed sequence tag (EST) sequencing projects. We find that at least half of the human genes encode multiple transcripts whose polyadenylation is driven by multiple signals. The corresponding transcript 3' ends are spread over distances in the kilobase range. This finding has profound implications for our understanding of gene expression regulation and of the diversity of human transcripts, for the design of cDNA microarray probes, and for the interpretation of gene expression profiling experiments.
[The following individuals kindly provided reagents, samples or unpublished information as indicated in the paper: G. Riggins, C. Ruegg, J.-B. Demoulin, P. Olsson, F. Funari, P. Schneider, L.F. Reis, and J.-C. Renauld]
INTRODUCTION
Parallel to the sequencing of the human genome, a less heralded but
nevertheless massive effort has been undertaken to
document experimentally the portion of the genome that is transcribed
into RNA, the transcriptome. It is only by comparing it with the
transcriptome that the capacity of the genome to code for the RNAs and
proteins that make up the cell machinery can be precisely defined
(Burge 2001). Although the mapping of transcribed sequences to the
genome has been used extensively to document the positions of genes
(Caron et al. 2001), it has not yet been fully exploited to explore the complexity of the transcriptome. Of the three major mechanisms that
contribute to this complexity, alternative initiation of transcription,
splicing, and polyadenylation, the latter seemed most immediately
amenable to analysis because of the wealth of data about transcript 3'
ends provided by the expressed sequence tag (EST) sequences generated
by the NCI Cancer Genome Anatomy Project (Strausberg et al. 2000) (at
Washington University, the NIH Intramural Sequencing Center, and Incyte
Pharmaceuticals), the Merck Gene Index (Aaronson et al. 1996), and the
NIH Mammalian Gene Collection (Strausberg et al. 1999). Although
alternative polyadenylation of transcripts has been known to occur for
a long time, the proportion of transcripts affected, the number of
sites per transcript, and the distances over which alternative sites are spread have been explored exclusively using EST clustering techniques (Gautheret et al. 1998; Beaudoing and Gautheret 2001; Pauws et al. 2001) that relied on the poly(A) tail being documented in the EST
sequences. The most recent of these studies have concluded that >40%
of human transcripts may undergo alternative polyadenylation, but that
most of the observed variation is over a short range (<50 nt) and
driven by a single polyadenylation signal (Beaudoing and Gautheret 2001; Pauws et al. 2001). Long-range variation (>1 kb) has so far been
observed only experimentally. We show here that long-range variation is
in fact extremely common, possibly affecting more than half of all genes.
RESULTS
To generate a transcript to genome map, we have exploited all
publicly available human genome data (finished and draft) and transcriptome data (full-length mRNAs, partial mRNAs, ESTs, and electropherograms from EST projects). We also included reference human
transcript sequences from the RefSeq database. The dataset that we have
constructed comprises a set of alignments between transcript and genome
sequences, documenting the position of the alignment on each sequence,
and a set of poly(A)-proximal sequence tags aligned to the genome
sequence. We visualized the complex relationships between the genome
and full-length mRNAs, ESTs, and 3' tags in the ACEDB environment
(Durbin and Thierry-Mieg 1994). For chromosomes 21 and 22, which have
been extensively annotated, we included the transcripts identified by
the sequencing consortia (Dunham et al. 1999; Hattori et al. 2000);
most of the examples described here were taken from chromosome 21, because they illustrate the additional information gained by using our methods relative to existing genome annotation procedures. We also
developed a program called the Transcriptome Analyzer (tromer; C. Iseli, unpublished data) that uses
the transcript to genome alignments to identify exon boundaries and
analyzes the connectivity of these boundaries. The output of
tromer can be used to reconstruct virtual transcripts from
the underlying genomic sequence following a path from 3' tags along
experimentally verified exon boundaries.
One of the pillars of our strategy is the identification of trusted 3'
tags that provide unique identifiers for transcript 3' ends. We chose
to analyze the 50 nt immediately upstream of the poly(A) tail, as this
should guarantee the uniqueness of the tag (there are about 10^30
possible tags of length 50, compared with 3 × 10^9 nt in
the haploid genome) while keeping the effects of sequencing errors
reasonably low (approximately a 50% chance of a single error in
typical EST data). A set of candidate tags was selected by identifying
runs of at least 10 A's or T's in the original electropherograms
produced by the Merck, CGAP, and MGC sequencing projects, reverse
complementing the sequence if necessary to end in poly(A) and
extracting the 50 nt preceding the poly(A). This first set was reduced
to unique tags and filtered to remove abundant human repeats and low
complexity sequences. This "clean" set was then mapped onto the
genome sequences and genome-linked tags were derived by combining the
50 nt 3' tag with the 50 nt following the tag in the genome (the total
genome-linked tag length is thus 100 nt). These combined tags were
clustered to resolve short-range variations in the exact
polyadenylation site between individual transcript sequences; it should
be noted that this procedure eliminates from our analysis the
short-range variation shown by Pauws et al. (2001). After clustering,
13% of the tags identified genome regions with nonidentical downstream
sequences. The majority of these diverged in only one or two positions
because of genetic polymorphisms or genome duplications (note that we
used all available genome sequence data, not one of the nonredundant
assemblies). Many others map to pseudogenes or retroposons. To
distinguish bona fide mRNA 3' ends from poly(A)-containing sequences
generated by internal priming or genomic contamination, we flagged tags that contained at least 10 A's or 11 A's and G's in the first 15 nt
of downstream genome sequence; this is a conservative estimate of the
minimal requirement for oligo(dT)-mediated priming and ensures that our
tag collection is free of sequences derived from genomic or internal
priming during cDNA library construction. Of 152,307 clustered, genome-matched candidates, 95,787 were judged to represent
trusted 3' tags by these criteria. The tag generation procedure is
summarized in Table 1. The number of 3'
tags will most likely increase as more cell types, pathological
conditions, and differentiation stages are sampled by cDNA cloning.
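The tag selection and internal-priming check described above can be sketched as follows. This is a minimal illustration, not the ad hoc software used in this work: the function names `candidate_tag` and `likely_internal_priming` are hypothetical, the read is assumed to be a plain base-called string rather than an electropherogram, and the A/G criterion is read here as "at least 10 A's, or at least 11 A's and G's combined".

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (other symbols are kept)."""
    return seq.translate(_COMPLEMENT)[::-1]


def candidate_tag(read, min_run=10, tag_len=50):
    """Return the tag_len nt immediately upstream of the longest poly(A) run.

    Reads whose longest homopolymer run is poly(T) are reverse-complemented
    first so that the sequence ends in poly(A). Returns None when no run of
    at least min_run A's or T's exists, or when fewer than tag_len nt
    precede the poly(A).
    """
    read = read.upper()
    runs = list(re.finditer(r"A{%d,}|T{%d,}" % (min_run, min_run), read))
    if not runs:
        return None
    longest = max(runs, key=lambda m: m.end() - m.start())
    if longest.group()[0] == "T":
        # After reverse complementing, the T run becomes an A run that
        # starts at len(read) - (end of the T run).
        run_start = len(read) - longest.end()
        read = reverse_complement(read)
    else:
        run_start = longest.start()
    if run_start < tag_len:
        return None
    return read[run_start - tag_len:run_start]


def likely_internal_priming(downstream, window=15, min_a=10, min_ag=11):
    """Flag a tag whose downstream genomic sequence is A-rich or A/G-rich.

    Assumed interpretation: flag when the first `window` nt contain at
    least min_a A's, or at least min_ag A's and G's combined.
    """
    first = downstream.upper()[:window]
    n_a = first.count("A")
    n_ag = n_a + first.count("G")
    return n_a >= min_a or n_ag >= min_ag
```

Under these assumptions, a tag would count as trusted only if `likely_internal_priming` returns `False` for the 15 nt of genome sequence that follow it.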
We examined the distribution on the genome of trusted 3' tags relative to cDNA sequences known or predicted to encode proteins. In about half of the genes, we observed intron-less EST matches downstream from the known polyadenylation site(s), many ending in trusted 3' tags and forming clusters that connect them to the known extent of 3' untranslated regions (UTRs). The regions containing the ESTs and 3' tags can extend over several kilobases and contain multiple tags. The most likely explanation for the existence of these features is that transcripts originating from the upstream gene can be polyadenylated at many locations and thus contain long 3' UTRs. Figure 1a illustrates this point in a well-annotated region of chromosome 21. The mRNA for the NCAM2 gene, which encodes a neural cell adhesion molecule, appears in the RefSeq database of full-length mRNA sequences (NM_004540). The known polyadenylation site is marked by a 3' tag (2 ESTs), and an upstream site is also evident (3' tag derived from 3 ESTs). The 3 kb downstream from these known sites are densely populated by intron-less ESTs (at least 26) and a cDNA clone of unknown function (DKFZp761I1311 from the German cDNA Consortium, EMBL/GenBank AL137344). This cDNA clone and a set of ESTs have been clustered together in the UniGene database, whereas other ESTs mapping downstream have been assigned to yet another cluster. It is clear from the transcript to genome map that all of the intron-less transcript sequences in this 3-kb region are in fact derived from the 3' UTR of polyadenylation variants of the NCAM2 gene, and therefore that NCAM2 transcripts have been grouped in at least three distinct UniGene clusters (Hs.177691 for the NCAM2 cDNA, Hs.135892 for the DKFZ clone, and Hs.76118 for the 3'-most cluster of ESTs). There is a single gap in the EST sequence coverage of the region that is bridged by the assignment of two ESTs (N51204 and N47997) to the ends of the same IMAGE clone (281608).
Each gene with an extended 3' UTR is a unique case, and therefore it is
difficult to design a generally applicable automated procedure to
detect such genes. To get a semiquantitative estimate of the proportion
of genes that may be affected, we manually examined all genes in a
relatively gene-rich region on chromosome 21q22.3 that spans 3.5 Mb
(NCBI contig NT_011515). A summary of the results is shown in Table 2,
and an ACEDB database incorporating all transcripts and 3' tags mapping to this contig can be downloaded from
ftp://ftp.licr.org/pub/Genome_Research. Of a total of 52 genes
currently annotated in this region, half (26) showed clear evidence of
multiple polyadenylation sites spread over areas ranging from 300 nt to
>15 kb. The existence of long-range alternative polyadenylation is
independent of the size or the number of exons of the genes.
If we extrapolate this small sample to the genome, long-range
alternative polyadenylation could affect 15,000-20,000
genes. This is almost certainly an underestimate, because
there are many transcripts for which 3' tags are not available (see
above) and in which EST coverage is insufficient to convincingly
document the extent of the 3' UTR. Although there have been numerous
reports in the literature of transcripts undergoing alternative
polyadenylation involving relatively distant sites (van Eyndhoven et al. 1998; Coy et al. 1999; Touriol et al. 1999), it was unexpected to
observe it at this high frequency. Estimates gathered from EST
clustering alone are significantly lower (Gautheret et al. 1998; Beaudoing and Gautheret 2001; Pauws et al. 2001). We experimentally
verified our methodology by performing reverse transcriptase-polymerase chain reaction (RT-PCR) experiments on the predicted long 3' UTRs of
the WDR9 (WD repeat 9) and KCNJ5 (potassium
inwardly rectifying channel 5) genes and found that they do indeed
encode transcripts extending far downstream from the 3' ends documented
by "full-length" cDNA sequences, even though the intervening
regions are not fully covered by ESTs (data not shown). Conversely, the
TRAF3 gene on chromosome 14q32.3 has been shown
experimentally to have two polyadenylation sites separated by 6 kb (van Eyndhoven et al. 1998); our data document this fact very clearly (not
shown) and in addition mark a previously unrecognized site that
explains the presence of 2.2 kb and 2.6 kb mRNA species observed in the
original report. Northern blots very commonly show multiple bands when
hybridized with a unique coding region probe. We informally collected
Northern data from colleagues doing laboratory work. Probes for the
FLG2, SCD, SLC20A1, PTEN,
TNFRSF10C, IRF1, and IL9R genes all
detect multiple bands; for all but SLC20A1 (which shows
evidence of extensive alternative splicing), we were able to detect
multiple polyadenylation sites documenting the origin of the major
bands. A probe for ACAT2 detects only one band, and this
gene has two polyadenylation sites only 150 nt apart, which would not
be resolved into individual bands; β-actin (gene ACTB),
commonly used as an internal control in Northerns and producing a
single band, shows no evidence of long-range alternative
polyadenylation despite an abundance of EST data. However, the fine
structure of the ACTB polyadenylation site reveals six
positions for the start of the poly(A) tail within 50 nt of each other.
One consequence of this hitherto unrecognized long-range variation in polyadenylation sites is that the extended 3' UTRs of unrelated genes transcribed in opposite directions occasionally overlap. A clear-cut case is provided by the SLC19A1 (folate transporter) and the COL18A1 (collagen type XVIII) genes, both located on the AL163302 segment of chromosome 21 (Fig. 1b). The known 3' ends of the corresponding transcripts are separated by only 1 kb on the genome. The intervening region is densely populated by ESTs, which document unambiguously four polyadenylation sites for SLC19A1 and three for COL18A1; interestingly, the locations of the sites on the two strands are almost identical. The poly(A)-containing ESTs derived from one or the other gene can easily be distinguished from each other by their polarity on the genome; other ESTs can be oriented based on their annotation. The UniGene database assigns ESTs whose ends map before position ~213,500 to the Hs.78409 cluster (COL18A1) and those mapping beyond that position to the Hs.84190 cluster (SLC19A1). Although these assignments are perfectly correct within the logic of EST clustering based on sequence overlaps, they fail to reflect the anatomy of the transcripts derived from this region.
In genes that are expressed at a sufficient level, the differential
usage of polyadenylation sites can be estimated in silico by counting
the ESTs documenting one or the other site or by extracting the
corresponding serial analysis of gene expression (SAGE) tag counts from
publicly available data (Lal et al. 1999). We counted poly(A)-containing ESTs and SAGE tags for the major polyadenylation sites of five of the genes in Table 2, and the results are shown in
Table 3. Because many of the cDNA libraries
from which ESTs were derived were normalized, the SAGE tag count is a
better estimate of the relative abundance of the corresponding
transcripts. It can be seen that in this limited sample the usage of
individual polyadenylation sites is not related to their distance from
the last splice acceptor. A larger-scale analysis would be required to examine properly the relationship between the abundance of transcripts and the length of their 3' UTRs. Similarly, we did not attempt to
determine from SAGE data whether some alternative polyadenylation sites
were differentially used in libraries of different origin, because in
most cases the tag numbers are not sufficient to make such comparisons.
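The in silico estimate of site usage described above amounts to converting per-site counts into fractions. A minimal sketch follows; the function name `site_usage` and all counts in the example are hypothetical placeholders, not values from Table 3.

```python
def site_usage(counts):
    """Convert per-site EST or SAGE tag counts into relative usage fractions.

    counts maps a polyadenylation site label to the number of ESTs (or
    SAGE tags) that document that site.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no ESTs or SAGE tags observed for this gene")
    return {site: n / total for site, n in counts.items()}


# Hypothetical counts for one gene with two polyadenylation sites.
est_counts = {"proximal": 6, "distal": 18}
sage_counts = {"proximal": 30, "distal": 10}

est_usage = site_usage(est_counts)    # fractions based on EST counts
sage_usage = site_usage(sage_counts)  # fractions based on SAGE tag counts
```

Because normalized libraries flatten EST counts, the two estimates may disagree for the same gene, as they do in this invented example; the text treats the SAGE-based fractions as the better estimate in that situation.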
DISCUSSION
The results presented here have important practical implications for
the analysis of the human transcriptome, as well as those of other
vertebrates. The cDNA or cRNA targets used in gene expression profiling
experiments are almost always labeled after oligo(dT) primed cDNA
synthesis by RT. Therefore, they are enriched in poly(A) proximal
sequences, and as a general rule cDNA clones or oligonucleotides derived from the 3' ends of transcripts have been chosen as probes for
hybridization. The finding that in a significant proportion of genes
polyadenylation sites are distributed over relatively long distances
should have a significant impact on probe design, because ideally,
sequences adjacent to each site should be included in the arrays. The
parameters influencing polyadenylation site selection have not been
studied for most genes, and the inclusion in cDNA arrays of multiple
probes for each gene should allow one to readily address this question.
The collection of trusted 3' tags and the assignment of associated EST
clusters to the 3' UTR of validated genes will thus be a crucial
resource for the rational design of hybridization arrays. Another
important implication of the work described here is the need for more
comprehensive tag to gene maps for SAGE experiments (Pauws et al. 2001).
Current maps do not take into account the locations of trusted
mRNA 3' ends. In addition, tags derived from long 3' UTRs are often
mapped to transcripts whose coding capacity has not been determined
because they have not been linked to the correct upstream
protein-coding region.
It has been argued that the relatively small number of genes present in
the human genome encodes a much larger variety of transcripts, and that
its true complexity cannot be deduced from a mere counting of genes.
However, we are still far from comprehending the extent of this
complexity, as evidenced by the results reported here. Alternative
polyadenylation has been shown in numerous cases to give rise to
transcripts with distinct biological properties by alteration of their
protein-coding capacity (Chuvpilo et al. 1999), the regulation of their
translation (Knirsch and Clerch 2000), their stability (Touriol et al. 1999),
or their intracellular localization (Kislauskis et al. 1994). It
is now evident that this contributes in a major way to the diversity of
the human transcriptome.
METHODS
MegaBlast was used to identify pairwise similarities
between all known transcript sequences and the draft genome sequence
deposited in release 66 (March 2001) of the EMBL database. The
transcript sequences analyzed include all human sequences deposited in
the EST section, sequences identified as RNA in the HUM section,
sequences available on May 15, 2001 from the NCBI curated RefSeq
database (http://www.ncbi.nlm.nih.gov/LocusLink/refseq.html), and
~700,000 ORESTES sequences from the LICR/FAPESP Human Cancer Genome
project (Camargo et al. 2001). Contaminants were filtered out of the transcript
sequences, and repetitive elements were masked using the
PFP software package (Paracel). For the draft genome, we
used human genomic sequences of a size >10 kb that are deposited in
the HUM and HTG sections. Before analysis, we removed bacterial and
other contaminants. Those HTG entries that had not been fully assembled
were split into individual components. The human genome dataset we used
is thus highly redundant but can easily be reduced to one of the
available assemblies. For each pair of matching RNA and genomic
sequence, local alignments were generated using sim4
(Florea et al. 1998) with the parameters W = 15, R = 0, A = 4,
and P = 1. The output of sim4 was filtered to eliminate
all alignments that did not contain at least one region (exon) matching
with at least 95% identity over >30 nt. ACEDB databases were generated
directly from the filtered sim4 output files.
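The filtering rule applied to the sim4 output can be expressed compactly. The sketch below assumes that each alignment has already been parsed into a list of exon records; the `Exon` type and the `keep_alignment` function are hypothetical names, and parsing of the sim4 output format is deliberately left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Exon:
    length: int      # aligned length of the exon, in nt
    identity: float  # percent identity of the exon alignment


def keep_alignment(exons: List[Exon],
                   min_identity: float = 95.0,
                   min_length: int = 30) -> bool:
    """Keep an alignment only if at least one exon is longer than
    min_length nt and matches with at least min_identity percent identity."""
    return any(e.identity >= min_identity and e.length > min_length
               for e in exons)
```

For example, an alignment with a single 30-nt exon at 100% identity would be discarded, because the exon must be longer than 30 nt.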
The tromer program attempts to automate the reconstruction of transcripts from transcript to genome mapping data. The output of sim4 is used to construct a set of oriented exons that are then merged if they share boundaries, thus reducing the redundancy of the EST data. The order of splice donors and acceptors is used to define the orientation of a transcript; for unspliced transcripts, the presence of a 3' tag within 50 nt of the end (when available) is used instead. Full-length mRNA and RefSeq sequences are assumed to be derived from the coding strand. Virtual transcripts are reconstituted by following exon boundaries with known polarity; when several paths can be followed (i.e., when there is evidence for alternative splicing or polyadenylation), multiple transcripts can be generated. The output of tromer links each virtual transcript to a 3' tag (if known) and to other virtual transcripts that share sequences derived from the same cDNA clones to flag potential gaps in EST sequence coverage.
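To illustrate how several virtual transcripts can arise from one set of exon boundaries, the sketch below enumerates paths through a small exon graph. It is not the tromer implementation, which is unpublished: the graph representation, the function name `virtual_transcripts`, and the exon labels are assumptions made for illustration, and the graph is assumed to be acyclic. For simplicity the sketch walks from 5' to 3', whereas the text describes following paths starting from the 3' tags.

```python
def virtual_transcripts(junctions, tagged_exons):
    """Enumerate exon paths that end at an exon carrying a 3' tag.

    junctions maps an exon label to the exons observed directly downstream
    of it (one entry per experimentally supported splice junction).
    tagged_exons is the set of exons whose 3' end is marked by a 3' tag.
    """
    downstream = {e for targets in junctions.values() for e in targets}
    # Start from exons that never appear downstream of another exon.
    starts = [e for e in junctions if e not in downstream]
    transcripts = []

    def walk(exon, path):
        path = path + [exon]
        if exon in tagged_exons:
            transcripts.append(path)
        for nxt in junctions.get(exon, []):
            walk(nxt, path)

    for start in starts:
        walk(start, [])
    return transcripts


# Hypothetical gene: exon2 is optional, and the last exon has a short and a
# long form that end at two different polyadenylation sites.
junctions = {
    "exon1": ["exon2", "exon3_short"],
    "exon2": ["exon3_short", "exon3_long"],
}
tagged = {"exon3_short", "exon3_long"}
paths = virtual_transcripts(junctions, tagged)
```

In this invented example three virtual transcripts are produced: two that differ only in their polyadenylation site and one that skips the second exon.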
Because poly(A) tracts documenting the position of mRNA 3' ends have
commonly been removed from the EST sequences deposited in the public
databases, we analyzed the original trace files generated for each
sequence. Sequences were extracted using the extract_seq
(Staden et al. 2000) or phred (Ewing et al. 1998)
programs; the longest poly(A) or poly(T) was identified, and if it
was longer than 10 nt then the 50 nt immediately adjacent to it were
recorded as a candidate tag (after obtaining the reverse complement for
poly(T) tracts). Duplicate tags were eliminated, but information about
the trace files containing them was retained. Tags matching LINE and
Alu repetitive elements, ribosomal or mitochondrial sequences, and
those containing simple repeats were eliminated. Exact matches for the
remaining tags were mapped in the genome using ad hoc software, and the
50 nt found downstream from the match were also recorded. Those tags
that did not find exact matches were mapped again, this time using a
slower dynamic programming algorithm (Smith and Waterman 1981) allowing
up to two mismatches. All 3' tags that had found matches in the genome
were clustered, again using ad hoc software, based on overlaps between
the sequences of the tags (including the downstream genome sequence)
and on their mapping positions in genomic clones; if two tags mapped within 50 nt they were considered to be part of the same cluster, and
this procedure was iterated until no new members could be added. In
this final collection, 3' tags were tested for the occurrence of at
least 10 A's or 11 A's and G's in the first 15 nt of downstream genomic sequence. A database of 3' tag microclusters is available from
ftp://ftp.licr.org/pub/databases/tags, including their sequence with
the downstream sequence in the genome, the trace file(s) from which
they were extracted, their position within genome segments, their
offset relative to other members of the same microcluster, and
information about the quality of the tag. Individual tags were
incorporated into the ACEDB databases on the basis of their mapping
coordinates on the genome segments.
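The iterative clustering rule (tags mapping within 50 nt of each other join the same cluster until no new members can be added) is equivalent to single-linkage clustering along the genome. A minimal sketch follows, assuming each tag is reduced to a genomic clone identifier, a strand, and a mapping coordinate. The name `cluster_tags` is hypothetical, separating tags by strand is an assumption, "within 50 nt" is read as a distance of at most 50, and the sequence-overlap criterion that the text also mentions is not modeled.

```python
from collections import defaultdict


def cluster_tags(tags, max_gap=50):
    """Group mapped 3' tags into microclusters by genomic proximity.

    tags is an iterable of (clone, strand, position) tuples. Tags on the
    same clone and strand are chained into one cluster whenever consecutive
    sorted positions lie at most max_gap nt apart. Returns a list of
    (clone, strand, [positions]) tuples.
    """
    by_locus = defaultdict(list)
    for clone, strand, pos in tags:
        by_locus[(clone, strand)].append(pos)

    clusters = []
    for (clone, strand), positions in sorted(by_locus.items()):
        positions.sort()
        current = [positions[0]]
        for pos in positions[1:]:
            if pos - current[-1] <= max_gap:
                current.append(pos)  # chains onto the running cluster
            else:
                clusters.append((clone, strand, current))
                current = [pos]
        clusters.append((clone, strand, current))
    return clusters


# Hypothetical coordinates on one genomic clone.
example = [
    ("AL163302", "+", 100), ("AL163302", "+", 175),
    ("AL163302", "+", 130), ("AL163302", "+", 400),
    ("AL163302", "-", 120),
]
microclusters = cluster_tags(example)
```

Sorting the positions first makes a single pass sufficient: in one dimension, chaining neighbours that lie within the gap yields the same groups as repeating the merge until nothing changes.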
WEB SITE REFERENCES
ftp://ftp.licr.org/pub/Genome_Research
http://cgap.nci.nih.gov/; Cancer Genome Anatomy Project.
http://mgc.nci.nih.gov/; Mammalian Gene Collection.
http://www.ncbi.nlm.nih.gov/LocusLink/refseq.html; NCBI curated RefSeq database.
http://www.ncbi.nlm.nih.gov/UniGene/; UniGene database.
ACKNOWLEDGMENTS
We thank Dr. Barbara Cohen for her thoughtful contributions during the preparation of this manuscript. We also thank Professor Ricardo Brentani, the director of the São Paulo Branch, and Dr. Lloyd Old, the scientific director of the LICR, for their constant encouragement and stimulation of this work. We gratefully acknowledge Dr. Greg Riggins for providing us with a curated collection of SAGE tags, and Drs. Curzio Ruegg, Jean-Baptiste Demoulin, Pär Olsson, Frank Funari, Pascal Schneider, Luiz Fernando Reis, and Jean-Christophe Renauld for sharing their Northern blot data.
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
FOOTNOTES
6 These authors contributed equally to this work.
7 Corresponding author: E-mail Victor.Jongeneel{at}licr.org; FAX 41 21 692 5945.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.62002.
Received January 9, 2002; accepted in revised form April 3, 2002.