Genome Res. 17:839-851, 2007 ©2007 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/07 $5.00
OPEN ACCESS ARTICLE
Letter
Pseudogenes in the ENCODE regions: Consensus annotation, analysis of transcription, and evolution
1 Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA;
2 Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1HH, United Kingdom;
3 Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, California 95064, USA;
4 Affymetrix, Inc., Santa Clara, California 92024, USA;
5 Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland;
6 Department of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva, Switzerland;
7 Genome Institute of Singapore, Singapore 138672, Singapore;
8 Grup de Recerca en Informática Biomèdica, Institut Municipal d'Investigació Mèdica/Universitat Pompeu Fabra, Passeig Marítim de la Barceloneta, 37-49, 08003, Barcelona, Catalonia, Spain;
9 Molecular, Cellular & Developmental Biology Department, Yale University, New Haven, Connecticut 06520, USA;
10 Center for Genomic Regulation, Passeig Marítim de la Barceloneta, 37-49, 08003, Barcelona, Catalonia, Spain;
11 Department of Computer Science, Yale University, New Haven, Connecticut 06520, USA;
12 Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
Arising from either retrotransposition or genomic duplication of functional genes, pseudogenes are "genomic fossils" valuable for exploring the dynamics and evolution of genes and genomes. Pseudogene identification is an important problem in computational genomics, and is also critical for obtaining an accurate picture of a genome's structure and function. However, no consensus computational scheme for defining and detecting pseudogenes has been developed thus far. As part of the ENCyclopedia Of DNA Elements (ENCODE) project, we have compared several distinct pseudogene annotation strategies and found that different approaches and parameters often resulted in rather distinct sets of pseudogenes. We subsequently developed a consensus approach for annotating pseudogenes (derived from protein coding genes) in the ENCODE regions, resulting in 201 pseudogenes, two-thirds of which originated from retrotransposition. A survey of orthologs for these pseudogenes in 28 vertebrate genomes showed that a significant fraction (
13 Corresponding authors. E-mail Mark.Gerstein{at}yale.edu; fax (360) 838-7861. E-mail deyou.zheng{at}yale.edu; fax (360) 838-7861. [Supplemental material is available online at www.genome.org and http://www.pseudogene.org/ENCODE/supplement/.] Article is online at http://www.genome.org/cgi/doi/10.1101/gr.5586307