Genome Res. 13:2541-2558, 2003
©2003 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/03 $5.00
Letter
Millions of Years of Evolution Preserved: A Comprehensive Catalog of the Processed Pseudogenes in the Human Genome
Zhaolei Zhang, Paul M. Harrison, Yin Liu, and Mark Gerstein¹
Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520-8114, USA
Processed pseudogenes were created by reverse transcription of mRNAs; they provide snapshots of ancient genes that existed in the genome millions of years ago. To find them in the present-day human genome, we developed a pipeline using features such as intron absence, frame disruption, polyadenylation, and truncation. This has enabled us to identify 8000 processed pseudogenes in recent genome drafts (distributed from http://pseudogene.org). Overall, processed pseudogenes are very similar to their closest corresponding human gene, being 94% complete in coding regions, with sequence similarity of 75% for amino acids and 86% for nucleotides. Their chromosomal distribution appears random and dispersed, with the number on each chromosome proportional to its length, suggesting sustained "bombardment" over evolution. However, the distribution does vary with GC content: processed pseudogenes occur mostly in regions of intermediate GC content. This is similar to Alus but contrasts with functional genes and L1 repeats. Pseudogenes, moreover, have age profiles similar to Alus. The number of pseudogenes associated with a given gene follows a power-law relationship, with a few genes giving rise to many pseudogenes and most giving rise to few. The prevalence of processed pseudogenes agrees well with germ-line gene expression. Highly expressed ribosomal proteins account for 20% of the total. Other notable sources include cyclophilin-A, keratin, GAPDH, and cytochrome c.
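The power-law relationship mentioned above can be illustrated with a minimal sketch. It assumes the simplest possible estimate, an ordinary least-squares line through the log-log histogram of pseudogenes per gene; this is not necessarily the fitting procedure used in the paper. The function name fit_power_law and the toy counts are hypothetical and chosen only for illustration; they are not data from the study.

```python
import math
from collections import Counter


def fit_power_law(pseudogene_counts):
    """Estimate (exponent, prefactor) for N(k) ~ prefactor * k**(-exponent).

    pseudogene_counts holds one entry per gene: the number of processed
    pseudogenes assigned to that gene. Genes with zero pseudogenes are
    skipped because log(0) is undefined.
    """
    # N(k): how many genes have exactly k pseudogenes.
    histogram = Counter(k for k in pseudogene_counts if k > 0)
    if len(histogram) < 2:
        raise ValueError("need at least two distinct counts to fit a line")

    xs = [math.log(k) for k in histogram]
    ys = [math.log(n) for n in histogram.values()]

    # Ordinary least squares on the log-log points.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    return -slope, math.exp(intercept)


# Hypothetical toy data constructed so that N(k) = 1000 * k**-2:
# many genes with one pseudogene, very few genes with ten.
toy_counts = [1] * 1000 + [2] * 250 + [5] * 40 + [10] * 10
exponent, prefactor = fit_power_law(toy_counts)
print(f"exponent = {exponent:.3f}, prefactor = {prefactor:.1f}")
```

On real data a log-log regression of this kind is sensitive to sparsely populated bins in the tail, so the exponent it returns should be treated as a rough estimate.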
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.1429003.
1 Corresponding author. E-mail: Mark.Gerstein@yale.edu; fax: (360) 838-7861.
