Published online before print
June 13, 2007, 10.1101/gr.6255407 Genome Res. 17:1118-1127, 2007 ©2007 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/07 $5.00 OPEN ACCESS ARTICLE
Resource
RCPdb: An evolutionary classification and codon usage database for repeat-containing proteins
1 Protein Crystallography Unit, Department of Biochemistry and Molecular Biology, Monash University, Clayton Campus, Melbourne, Victoria 3800, Australia
2 Victorian Bioinformatics Consortium, Monash University, Clayton Campus, Melbourne, Victoria 3800, Australia
3 ARC Centre for Structural and Functional Microbial Genomics, Monash University, Clayton Campus, Melbourne, Victoria 3800, Australia
4 John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory 0200, Australia
5 School of Computer Science and Software Engineering, Monash University, Clayton Campus, Melbourne, Victoria 3800, Australia
Over 3% of human proteins contain single amino acid repeats (repeat-containing proteins, RCPs). Many repeats (homopeptides) localize to important proteins involved in transcription, and the expansion of certain repeats, in particular poly-Q and poly-A tracts, can also lead to the development of neurological diseases. Previous studies have suggested that the homopeptide makeup is a result of the presence of G+C-rich tracts in the encoding genes and that expansion occurs via replication slippage. Here, we have performed a large-scale genomic analysis of the variation of the genes encoding RCPs in 13 species and present these data in an online database (http://repeats.med.monash.edu.au/genetic_analysis/). This resource allows rapid comparison and analysis of RCPs, homopeptides, and their underlying genetic tracts across the eukaryotic species considered. We report three major findings. First, homopeptides are biased toward the reiteration of a small subset of codons, and there is no G+C or A+T bias relative to the organism's transcriptome. Second, single base pair transversions from the homocodon are unusually common and may represent a mechanism of reducing the rate of homopeptide mutations. Third, homopeptides that are conserved across different species lie within regions that are under stronger purifying selection than the regions containing nonconserved homopeptides.
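As an illustration of the kind of analysis summarized above, the following minimal Python sketch locates homopeptides in a protein sequence, tabulates the codons that encode one of them, and classifies a single-base change as a transversion or not. It is not the pipeline behind RCPdb: the function names, the minimum run length of five residues, and the example sequences are all assumptions made for illustration only.

```python
import re
from collections import Counter

PURINES = {"A", "G"}  # C and T are pyrimidines


def find_homopeptides(protein, min_len=5):
    """Return (residue, start, length) for every run of a single amino acid
    that is at least min_len long. min_len=5 is an assumed threshold."""
    pattern = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    return [(m.group(1), m.start(), len(m.group(0)))
            for m in pattern.finditer(protein)]


def codon_usage(cds, start, length):
    """Count the codons encoding residues start..start+length-1, assuming
    cds is in frame and its first codon encodes protein position 0."""
    codons = [cds[3 * i:3 * i + 3] for i in range(start, start + length)]
    return Counter(codons)


def is_transversion(a, b):
    """True if substituting base a with base b swaps purine and pyrimidine."""
    return a != b and ((a in PURINES) != (b in PURINES))


# Hypothetical example: a poly-Q tract encoded mostly by CAG with one CAA.
protein = "MQQQQQQA"
cds = "ATGCAGCAGCAGCAACAGCAGGCT"

for residue, start, length in find_homopeptides(protein):
    print(residue, start, length, dict(codon_usage(cds, start, length)))
```

In this toy sequence the interrupting codon CAA differs from the homocodon CAG by a G-to-A change, which `is_transversion("G", "A")` reports as `False` (a transition); a G-to-C or G-to-T change would be reported as a transversion.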
6 Corresponding authors. E-mail Maria.GarciadelaBanda{at}infotech.monash.edu.au; fax 61 3 9905 4699. E-mail James.Whisstock{at}med.monash.edu.au; fax 61 3 9905 4699.
[Supplemental material is available online at www.genome.org.]
Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.6255407