Genome Res. 13:1273-1289, 2003
©2003 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/03 $5.00
Targeting a Complex Transcriptome: The Construction of the Mouse Full-Length cDNA Encyclopedia
Piero Carninci1,2,
Kazunori Waki1,
Toshiyuki Shiraki1,
Hideaki Konno1,
Kazuhiro Shibata2,
Masayoshi Itoh2,
Katsunori Aizawa1,
Takahiro Arakawa1,
Yoshiyuki Ishii1,
Daisuke Sasaki1,
Hidemasa Bono1,
Shinji Kondo1,
Yuichi Sugahara1,
Rintaro Saito1,
Naoki Osato1,
Shiro Fukuda1,
Kenjiro Sato2,3,
Akira Watahiki2,3,
Tomoko Hirozane-Kishikawa1,
Mari Nakamura1,
Yuko Shibata2,6,
Ayako Yasunishi1,
Noriko Kikuchi2,
Atsushi Yoshiki5,
Moriaki Kusakabe5,7,
Stefano Gustincich8,
Kirk Beisel9,
William Pavan10,
Vassilis Aidinis11,
Akira Nakagawara12,
William A. Held13,
Hiroo Iwata14,
Tomohiro Kono15,
Hiromitsu Nakauchi16,
Paul Lyons17,
Christine Wells18,
David A. Hume18,
Michela Fagiolini19,
Takao K. Hensch19,
Michelle Brinkmeier20,
Sally Camper20,
Junji Hirota21,
Peter Mombaerts21,
Masami Muramatsu1,2,3,
Yasushi Okazaki1,2,
Jun Kawai1,2 and
Yoshihide Hayashizaki1,2,3,4,22
1Laboratory for Genome Exploration Research Group, RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
2Genome Science Laboratory, RIKEN, Hirosawa, Wako, Saitama 351-0198, Japan
3Institute of Basic Medical Sciences, University of Tsukuba, Tsukuba, Ibaraki 305-8577, Japan
4Division of Genomic Information Resources, Science of Biological Supramolecular Systems, Graduate School of Integrated Science, Yokohama City University, Tsurumi-Ku, Yokohama 230-0045, Japan
5Experimental Animal Research Division, Biogenic Resources Center, RIKEN Tsukuba Institute, Tsukuba, Ibaraki 305-0074, Japan
6Dnaform International, Inc., Ami Town, Inashiki District, Ibaraki 300-0332, Japan
7Aloka Co., Ltd., Kasumigaura-cho, Niihari-gun, Ibaraki 300-0134, Japan
8Department of Neurobiology, Harvard Medical School, Boston, Massachusetts 02115, USA
9Boys Town National Research Hospital, Omaha, Nebraska 68131, USA
10National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
11Institute of Immunology, Biomedical Sciences Research Center Al. Fleming, 16672 Vari, Greece
12Chiba Cancer Center Research Institute, Division of Biochemistry, Chuo-ku, Chiba 260-8717, Japan
13Roswell Park Cancer Institute, Buffalo, New York 14263, USA
14Department of Reparative Materials, Field of Tissue Engineering, Institute for Frontier Medical Sciences, Kyoto University, Sakyo-ku, Kyoto 606-8507, Japan
15Faculty of Applied Bioscience, Department of BioScience, Tokyo University of Agriculture, Setagaya-ku, Tokyo 156-8502, Japan
16Laboratory of Stem Cell Therapy, Center for Experimental Medicine, Institute of Medical Science, University of Tokyo, Minato-ku, Tokyo 108-8639, Japan
17DRF/WT Diabetes and Inflammation Laboratory, Cambridge Institute for Medical Research, Cambridge CB2 2XY, UK
18The Institute for Molecular Biosciences, The University of Queensland, St. Lucia, Brisbane, QLD 4072, Australia
19Neuronal Function Research, Lab for Neuronal Circuit Development, RIKEN Brain Science Institute (BSI), Wako-shi, Saitama 351-0198, Japan
20University of Michigan Medical School, Ann Arbor, Michigan 48109, USA
21Developmental Biology and Neurogenetics, The Rockefeller University, New York, New York 10021, USA
We report the construction of the mouse full-length cDNA encyclopedia, the most extensive view of a complex transcriptome, on the basis of preparing and sequencing 246 libraries. Before cloning, full-length cDNAs were enriched by Cap-Trapper and, in most cases, aggressively subtracted/normalized. We have produced 1,442,236 successful 3'-end sequences clustered into 171,144 groups, from which 60,770 clones were fully sequenced and annotated in the FANTOM-2 annotation. We have also produced 547,149 5'-end reads, which clustered into 124,258 groups. Altogether, these cDNAs were further grouped into 70,000 transcriptional units (TUs), which represent the best coverage of a transcriptome so far. By monitoring the extent of normalization/subtraction, we define the tentative equivalent coverage (TEC), which was estimated to be equivalent to >12,000,000 ESTs derived from standard libraries. This high coverage explains the discrepancy between the very large numbers of clusters (and TUs) in this project, which also include non-protein-coding RNAs, and the lower gene-number estimates of genome annotations. Overall, 5'-end clusters identify regions that are potential promoters for 8637 known genes and suggest the presence of almost 63,000 transcriptional starting points. An estimate of the frequency of polyadenylation signals suggests that at least half of the singletons in the EST set represent real mRNAs. Clones accounting for about half of the predicted TUs await further sequencing. The continued high discovery rate suggests that the task of transcriptome discovery is not yet complete.
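The read and cluster counts above imply an average number of reads per cluster. A minimal Python sketch of that arithmetic, using only the counts stated in the abstract (the helper name `mean_reads_per_cluster` is hypothetical):

```python
def mean_reads_per_cluster(reads: int, clusters: int) -> float:
    """Average number of reads assigned to each cluster."""
    if clusters <= 0:
        raise ValueError("clusters must be positive")
    return reads / clusters


# Counts as reported in the abstract.
three_prime = mean_reads_per_cluster(1_442_236, 171_144)
five_prime = mean_reads_per_cluster(547_149, 124_258)
print(f"3'-end reads per cluster: {three_prime:.2f}")  # 8.43
print(f"5'-end reads per cluster: {five_prime:.2f}")   # 4.40
```

This mean says nothing about how reads are distributed among clusters; it only makes explicit the ratios implied by the stated counts.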
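The abstract does not spell out how the polyadenylation-signal (PAS) frequency was turned into the "at least half" figure. One common way to frame such an estimate, offered here as an assumption rather than the authors' procedure, treats the singletons as a mixture of real 3' ends, which carry a PAS at some rate f_real, and artifacts, which carry one only at a background rate f_bg. Solving f_obs = x·f_real + (1 − x)·f_bg for the real fraction gives x = (f_obs − f_bg)/(f_real − f_bg). The hexamer set (AATAAA/ATTAAA), the upstream windows passed to `pas_frequency`, and both function names are hypothetical choices for this sketch.

```python
import re

# Canonical PAS hexamer and its most common variant. Restricting the search
# to these two is an assumption of this sketch.
PAS_PATTERN = re.compile(r"AATAAA|ATTAAA")


def pas_frequency(windows: list[str]) -> float:
    """Fraction of upstream-of-poly(A) windows containing at least one PAS hexamer."""
    if not windows:
        raise ValueError("no sequences supplied")
    hits = sum(1 for seq in windows if PAS_PATTERN.search(seq.upper()))
    return hits / len(windows)


def real_fraction(f_obs: float, f_real: float, f_bg: float) -> float:
    """Solve f_obs = x*f_real + (1 - x)*f_bg for x, clamped to [0, 1]."""
    if f_real <= f_bg:
        raise ValueError("f_real must exceed f_bg")
    x = (f_obs - f_bg) / (f_real - f_bg)
    return min(max(x, 0.0), 1.0)
```

With purely illustrative rates, `real_fraction(0.45, 0.70, 0.20)` returns about 0.5. Real rates would have to be measured, for example f_real from well-supported multi-read clusters and f_bg from random genomic windows; both choices are assumptions here.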
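The abstract does not define how cDNAs were grouped into transcriptional units. As a simplified stand-in, the sketch below merges genome alignments that lie on the same chromosome and strand and whose spans overlap, using a sort-and-sweep interval merge. The `Alignment` record, its 0-based half-open coordinates, and the span-overlap criterion are assumptions; a real TU definition may instead require shared exonic sequence.

```python
from typing import NamedTuple, Optional


class Alignment(NamedTuple):
    clone_id: str
    chrom: str
    strand: str  # "+" or "-"
    start: int   # 0-based, inclusive
    end: int     # exclusive


def group_into_units(alignments: list[Alignment]) -> list[list[str]]:
    """Group clone IDs whose same-chromosome, same-strand spans overlap."""
    units: list[list[str]] = []
    # Sorting by (chrom, strand, start) makes overlapping spans adjacent.
    ordered = sorted(alignments, key=lambda a: (a.chrom, a.strand, a.start))
    current: list[str] = []
    cur_key: Optional[tuple[str, str]] = None
    cur_end = -1
    for aln in ordered:
        key = (aln.chrom, aln.strand)
        if key == cur_key and aln.start < cur_end:
            # Overlaps the unit being built: extend it.
            current.append(aln.clone_id)
            cur_end = max(cur_end, aln.end)
        else:
            # New locus or new strand: close the previous unit.
            if current:
                units.append(current)
            current = [aln.clone_id]
            cur_key = key
            cur_end = aln.end
    if current:
        units.append(current)
    return units
```

Keeping strands separate means that sense and antisense transcripts at the same locus fall into different units under this rule.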
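Whether the discovery rate remains high can be quantified by tracking how many reads open a new cluster as sequencing proceeds. A minimal sketch, assuming each read has already been assigned a cluster identifier and that reads are processed in sequencing order (both assumptions; the function name and batch size are hypothetical):

```python
def discovery_rate(cluster_ids: list[str], batch_size: int) -> list[float]:
    """Fraction of reads in each successive batch that hit a previously unseen cluster."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    seen: set[str] = set()
    rates: list[float] = []
    for i in range(0, len(cluster_ids), batch_size):
        batch = cluster_ids[i:i + batch_size]
        new = 0
        for cid in batch:
            if cid not in seen:
                seen.add(cid)
                new += 1
        rates.append(new / len(batch))
    return rates
```

A curve that falls toward zero indicates saturation; one that stays well above zero indicates that further sequencing is still finding new clusters.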
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.1119703.
22 Corresponding author. E-MAIL: rgscerg{at}gsc.riken.go.jp; FAX 8145 503 9216.
[Supplemental material available online at www.genome.org.]
