Genome Research

Vol. 12, Issue 2, 272-280, February 2002

LETTER
Molecular Fossils in the Human Genome: Identification and Analysis of the Pseudogenes in Chromosomes 21 and 22

Paul M. Harrison, Hedi Hegyi, Suganthi Balasubramanian, Nicholas M. Luscombe, Paul Bertone, Nathaniel Echols, Ted Johnson, and Mark Gerstein¹

Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520-8114, USA

We have developed an initial approach for annotating and surveying pseudogenes in the human genome. We search human genomic DNA for regions that are similar to known protein sequences and contain obvious disablements (i.e., mid-sequence stop codons or frameshifts), while ensuring minimal overlap with annotations of known genes. Pseudogenes can be divided into "processed" and "nonprocessed" types: the former are reverse transcribed from mRNA (and therefore have no intron structure), whereas the latter presumably arise from genomic duplications. We annotate putative processed pseudogenes based on whether there is a continuous span of homology covering >70% of the length of the closest matching human protein (i.e., with introns removed), or whether there is evidence of polyadenylation. We have applied our approach to chromosomes 21 and 22, the first parts of the human genome to be completely sequenced, finding 190 new pseudogene annotations beyond the 264 reported by the sequencing centers. In total, chromosomes 21 and 22 contain 189 processed pseudogenes, 195 nonprocessed pseudogenes, and an additional 70 pseudogenic immunoglobulin gene segments. (Detailed assignments are available at http://bioinfo.mbb.yale.edu/genome/pseudogene or http://genecensus.org/pseudogene.) By extrapolation, we predict that there could be up to ~20,000 pseudogenes in the whole human genome, a little more than half of them processed. We have determined the main populations and clusters of pseudogenes on chromosomes 21 and 22. There are notable excesses of pseudogenes relative to genes near the centromeres of both chromosomes, indicating the existence of pseudogenic "hot-spots" in the genome. We have also examined the distribution of InterPro families and Gene Ontology (GO) functional categories in our pseudogenes.
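The annotation rule described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Hit` record and its fields are hypothetical stand-ins for the output of a protein-to-DNA aligner, the frameshift count is taken as given rather than computed, the overlap filter against known-gene annotations is not modeled, and labeling every disabled hit that fails the processed test as "nonprocessed" is a simplification. `disablements_per_aa` is a hypothetical helper for the per-amino-acid disablement rate the abstract uses to compare pseudogene populations.

```python
from dataclasses import dataclass

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


@dataclass
class Hit:
    """Hypothetical record for one genomic region matching a known protein."""
    aligned_dna: str         # genomic DNA read in the matching protein's frame
    frameshifts: int         # frameshifts reported by the aligner (assumed given)
    continuous_span_aa: int  # longest unbroken (intron-free) homology span, in residues
    protein_length_aa: int   # length of the closest matching human protein
    has_polya: bool          # whether there is evidence of polyadenylation


def codons(dna: str) -> list[str]:
    """Split DNA into complete codons, ignoring any trailing partial codon."""
    return [dna[i:i + 3].upper() for i in range(0, len(dna) - 2, 3)]


def internal_stops(dna: str) -> int:
    """Count in-frame stop codons before the last codon (mid-sequence stops)."""
    return sum(1 for c in codons(dna)[:-1] if c in STOP_CODONS)


def disablements_per_aa(hit: Hit) -> float:
    """Mid-sequence stops plus frameshifts, divided by the number of aligned codons."""
    n = len(codons(hit.aligned_dna))
    return (internal_stops(hit.aligned_dna) + hit.frameshifts) / n if n else 0.0


def classify(hit: Hit, span_fraction: float = 0.70) -> str:
    """Label a hit as processed, nonprocessed, or lacking any obvious disablement."""
    if hit.frameshifts == 0 and internal_stops(hit.aligned_dna) == 0:
        return "no disablement"  # no mid-sequence stop codon or frameshift
    # Processed: unbroken homology over >70% of the protein, or poly(A) evidence.
    long_span = hit.continuous_span_aa > span_fraction * hit.protein_length_aa
    return "processed" if long_span or hit.has_polya else "nonprocessed"
```

For example, `classify(Hit("ATGTAAGCCTGA", 0, 90, 100, False))` returns `"processed"`: the hit carries one mid-sequence TAA, and its unbroken span covers 90% of the protein. The strict `>` comparison mirrors the ">70%" wording, so a span of exactly 70% is not counted as processed.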
Overall, the families in both the processed and nonprocessed pseudogene populations follow a power-law distribution similar to that found for gene families, with a few large families and many small ones. The processed population, in particular, is enriched in sequences of highly expressed ribosomal proteins (~20%), which appear fairly evenly distributed across the chromosomes. We compared processed pseudogenes of different evolutionary ages and observed a high degree of similarity between "ancient" and "modern" subpopulations, which may be attributable to the consistently high expression of ribosomal proteins over evolutionary time. Finally, we find that the chromosome 22 pseudogene population is dominated by immunoglobulin segments, which have a higher rate of disablement per amino acid than the other pseudogene populations and are also substantially more diverged.
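A power-law family-size distribution of the kind described above can be checked with a simple log-log fit of N(s), the number of families of size s, against s. This sketch assumes an ordinary least-squares fit of log N(s) = log a - b log s; the abstract does not say how the distribution was fitted, so both the fitting method and the toy data below are illustrative assumptions. For real data, maximum-likelihood estimators are generally more reliable than log-log regression.

```python
import math
from collections import Counter


def fit_power_law(family_sizes: list[int]) -> tuple[float, float]:
    """Fit N(s) = a * s**(-b) by least squares on log N(s) versus log s.

    family_sizes holds one entry per family (its number of members).
    Returns (a, b); a larger b means small families dominate more strongly.
    """
    freq = Counter(family_sizes)  # size s -> number of families N(s)
    if len(freq) < 2:
        raise ValueError("need at least two distinct family sizes")
    xs = [math.log(s) for s in freq]
    ys = [math.log(n) for n in freq.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope
```

With the toy input `[1] * 64 + [2] * 16 + [4] * 4 + [8]`, whose counts follow N(s) = 64 s^-2 exactly, `fit_power_law` returns approximately `(64.0, 2.0)`: many singleton families and very few large ones.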


¹ Corresponding author.


12:272-280 ©2002 by Cold Spring Harbor Laboratory Press  ISSN 1088-9051/02 $5.00


This article has been cited by other articles:

Y. Ruan, H. S. Ooi, S. W. Choo, K. P. Chiu, X. D. Zhao, K. G. Srinivasan, F. Yao, C. Y. Choo, J. Liu, P. Ariyaratne, et al.
Fusion transcripts and transcribed retrotransposed loci discovered through comprehensive transcriptome analysis using Paired-End diTags (PETs)
Genome Res., June 1, 2007; 17(6): 828-838.

D. Zheng, A. Frankish, R. Baertsch, P. Kapranov, A. Reymond, S. W. Choo, Y. Lu, F. Denoeud, S. E. Antonarakis, M. Snyder, et al.
Pseudogenes in the ENCODE regions: Consensus annotation, analysis of transcription, and evolution
Genome Res., June 1, 2007; 17(6): 839-851.

J. E. Karro, Y. Yan, D. Zheng, Z. Zhang, N. Carriero, P. Cayting, P. Harrison, and M. Gerstein
Pseudogene.org: a comprehensive database and comparison platform for pseudogene annotation
Nucleic Acids Res., January 12, 2007; 35(suppl_1): D55-D60.

K. V. Prasanth and D. L. Spector
Eukaryotic regulatory RNAs: an answer to the 'genome complexity' conundrum
Genes & Dev., January 1, 2007; 21(1): 11-42.

A. Yao, R. Charlab, and P. Li
Systematic identification of pseudogenes through whole genome expression evidence profiling
Nucleic Acids Res., September 11, 2006; 34(16): 4477-4485.

M. Csuros and I. Miklos
Statistical Alignment of Retropseudogenes and Their Functional Paralogs
Mol. Biol. Evol., December 1, 2005; 22(12): 2457-2471.

N. Juretic, D. R. Hoen, M. L. Huynh, P. M. Harrison, and T. E. Bureau
The evolutionary fate of MULE-mediated duplications of host gene fragments in rice
Genome Res., September 1, 2005; 15(9): 1292-1297.

B. Wirth, V. L. Louis, S. Potier, J.-L. Souciet, and L. Despons
Paleogenomics or the Search for Remnant Duplicated Copies of the Yeast DUP240 Gene Family in Intergenic Areas
Mol. Biol. Evol., September 1, 2005; 22(9): 1764-1771.

S. M. Foord, T. I. Bonner, R. R. Neubig, E. M. Rosser, J.-P. Pin, A. P. Davenport, M. Spedding, and A. J. Harmar
International Union of Pharmacology. XLVI. G Protein-Coupled Receptor List
Pharmacol. Rev., June 1, 2005; 57(2): 279-288.

P. M. Harrison, D. Zheng, Z. Zhang, N. Carriero, and M. Gerstein
Transcribed processed pseudogenes in the human genome: an intermediate form of expressed retrosequence lacking protein-coding ability
Nucleic Acids Res., April 28, 2005; 33(8): 2374-2383.

K. Adel, D. Laurent, and M. Dominique
HOPPSIGEN: a database of human and mouse processed pseudogenes
Nucleic Acids Res., January 1, 2005; 33(suppl_1): D59-D66.

K. C. Pang, S. Stephen, P. G. Engstrom, K. Tajul-Arifin, W. Chen, C. Wahlestedt, B. Lenhard, Y. Hayashizaki, and J. S. Mattick
RNAdb--a comprehensive mammalian noncoding RNA database
Nucleic Acids Res., January 1, 2005; 33(suppl_1): D125-D130.

P. Bertone, V. Stolc, T. E. Royce, J. S. Rozowsky, A. E. Urban, X. Zhu, J. L. Rinn, W. Tongprasit, M. Samanta, S. Weissman, et al.
Global Identification of Human Transcribed Sequences with Genome Tiling Arrays
Science, December 24, 2004; 306(5705): 2242-2246.

D. N. Messina, J. Glasscock, W. Gish, and M. Lovett
An ORFeome-based Analysis of Human Transcription Factor Genes and the Construction of a Microarray to Interrogate Their Expression
Genome Res., October 1, 2004; 14(10b): 2041-2047.

G. Euskirchen, T. E. Royce, P. Bertone, R. Martone, J. L. Rinn, F. K. Nelson, F. Sayward, N. M. Luscombe, P. Miller, M. Gerstein, et al.
CREB Binds to Multiple Loci on Human Chromosome 22
Mol. Cell. Biol., May 1, 2004; 24(9): 3804-3814.

E. J. Devor and K. Moffat-Wilson
An Ancient RNase H1 Splice Junction Mutant Preserved in a 19-Million-Year-Old Genetic Fossil in Ape Genomes
J. Hered., May 1, 2004; 95(3): 257-261.

J. J. Emerson, H. Kaessmann, E. Betran, and M. Long
Extensive Gene Traffic on the Mammalian X Chromosome
Science, January 23, 2004; 303(5657): 537-540.

J. N. Andersen, P. G. Jansen, S. M. Echwald, O. H. Mortensen, T. Fukada, R. Del Vecchio, N. K. Tonks, and N. P. H. Moller
A genomic perspective on protein tyrosine phosphatases: gene structure, pseudogenes, and genetic disease linkage
FASEB J., January 1, 2004; 18(1): 8-30.

Z. Zhang, P. M. Harrison, Y. Liu, and M. Gerstein
Millions of Years of Evolution Preserved: A Comprehensive Catalog of the Processed Pseudogenes in the Human Genome
Genome Res., December 1, 2003; 13(12): 2541-2558.

D. Torrents, M. Suyama, E. Zdobnov, and P. Bork
A Genome-Wide Survey of Human Pseudogenes
Genome Res., December 1, 2003; 13(12): 2559-2567.

R. Martone, G. Euskirchen, P. Bertone, S. Hartman, T. E. Royce, N. M. Luscombe, J. L. Rinn, F. K. Nelson, P. Miller, M. Gerstein, et al.
Distribution of NF-κB-binding sites across human chromosome 22
Proc. Natl. Acad. Sci. USA, October 14, 2003; 100(21): 12247-12252.

L. Z. Strichman-Almashanu, M. Bustin, and D. Landsman
Retroposed Copies of the HMG Genes: A Window to Genome Dynamics
Genome Res., May 1, 2003; 13(5): 800-812.

J. L. Rinn, G. Euskirchen, P. Bertone, R. Martone, N. M. Luscombe, S. Hartman, P. M. Harrison, F. K. Nelson, P. Miller, M. Gerstein, et al.
The transcriptional activity of human Chromosome 22
Genes & Dev., February 15, 2003; 17(4): 529-540.

P. M. Harrison, D. Milburn, Z. Zhang, P. Bertone, and M. Gerstein
Identification of pseudogenes in the Drosophila melanogaster genome
Nucleic Acids Res., February 1, 2003; 31(3): 1033-1037.

J. E. Collins, M. E. Goward, C. G. Cole, L. J. Smink, E. J. Huckle, S. Knowles, J. M. Bye, D. M. Beare, and I. Dunham
Reevaluating Human Gene Annotation: A Second-Generation Analysis of Chromosome 22
Genome Res., January 1, 2003; 13(1): 27-36.

S. Karlin, C. Chen, A. J. Gentles, and M. Cleary
Associations between human disease genes and overlapping gene groups and multiple amino acid runs
Proc. Natl. Acad. Sci. USA, December 24, 2002; 99(26): 17008-17013.

Z. Zhang, P. Harrison, and M. Gerstein
Identification and Analysis of Over 2000 Ribosomal Protein Pseudogenes in the Human Genome
Genome Res., October 1, 2002; 12(10): 1466-1482.

M. Vallee, F. Guay, D. Beaudry, J. Matte, R. Blouin, J.-P. Laforest, M. Lessard, and M.-F. Palin
Effects of Breed, Parity, and Folic Acid Supplement on the Expression of Folate Metabolism Genes in Endometrial and Embryonic Tissues from Sows in Early Pregnancy
Biol. Reprod., October 1, 2002; 67(4): 1259-1267.

A. M. Roy-Engel, A.-H. Salem, O. O. Oyeniran, L. Deininger, D. J. Hedges, G. E. Kilroy, M. A. Batzer, and P. L. Deininger
Active Alu Element "A-Tails": Size Does Matter
Genome Res., September 1, 2002; 12(9): 1333-1344.

N. Echols, P. Harrison, S. Balasubramanian, N. M. Luscombe, P. Bertone, Z. Zhang, and M. Gerstein
Comprehensive analysis of amino acid and nucleotide composition in eukaryotic genomes, comparing genes and pseudogenes
Nucleic Acids Res., June 1, 2002; 30(11): 2515-2523.



