Genome Res. 15:537-551, 2005
©2005 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/05 $5.00
Letter
Functional insights from the distribution and role of homopeptide repeat-containing proteins
1 Protein Crystallography Unit, Department of Biochemistry and Molecular Biology, School of Computer Science and Software Engineering, Monash University, Clayton Campus, Melbourne, VIC 3800, Australia
2 Victorian Bioinformatics Consortium, School of Computer Science and Software Engineering, Monash University, Clayton Campus, Melbourne, VIC 3800, Australia
3 ARC Centre for Structural and Functional Microbial Genomics, School of Computer Science and Software Engineering, Monash University, Clayton Campus, Melbourne, VIC 3800, Australia
4 School of Computer Science and Software Engineering, Monash University, Clayton Campus, Melbourne, VIC 3800, Australia
5 Monash Institute of Reproduction and Development, Monash University, Clayton, VIC 3168, Australia
6 Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA
Expansion of "low complex" repeats of amino acids such as glutamine (Poly-Q) is associated with protein misfolding and the development of degenerative diseases such as Huntington's disease. The mechanism by which such regions promote misfolding remains controversial, the function of many repeat-containing proteins (RCPs) remains obscure, and the role (if any) of repeat regions remains to be determined. Here, a Web-accessible database of RCPs is presented. The distribution and evolution of RCPs that contain homopeptide repeat tracts are considered, and the existence of functional patterns investigated. Generally, it is found that while polyamino acid repeats are extremely rare in prokaryotes, several eukaryote putative homologs of prokaryote RCPs, involved in important housekeeping processes, retain the repetitive region, suggesting an ancient origin for certain repeats. Within eukarya, the most common uninterrupted amino acid repeats are glutamine, asparagine, and alanine. Interestingly, while poly-Q repeats are found in vertebrates and nonvertebrates, poly-N repeats are only common in more primitive nonvertebrate organisms, such as insects and nematodes. We have assigned function to eukaryote RCPs using Online Mendelian Inheritance in Man (OMIM), the Human Protein Reference Database (HPRD), FlyBase, and Wormpep. Prokaryote RCPs were annotated using BLASTp searches and Gene Ontology. These data reveal that the majority of RCPs are involved in processes that require the assembly of large, multiprotein complexes, such as transcription and signaling.
Single amino acid repeats are regions within proteins that comprise a single homopolymeric tract of a particular amino acid. Uncontrolled genetic expansions of such regions have been shown to lead to the development of serious debilitating human diseases. For example, expanded poly-Q and poly-A tracts are associated with the development of neurological disorders such as Huntington disease and Oculopharyngeal Muscular Dystrophy (OPMD), respectively. Several studies have also demonstrated that many nondisease-linked polyamino acid tracts are toxic to cells and/or lead to protein aggregation or misfolding (Dorsman et al. 2002
Of the polyamino acid repeats characterized to date, poly-Q repeats are the most extensively studied. Nine poly-Q-linked diseases have been identified, and the proteins believed to be responsible for the disease contain expanded poly-Q tracts that have been shown to possess an enhanced tendency to aggregate and form fibrils both in vitro and in vivo (Scherzinger et al. 1997
More recently, proteins containing expanded alanine tracts have been linked to several human diseases (Brown and Brown 2004
Several other repeat types have also been investigated. A polyglycine tract in the plant protein Toc-75 (a component of the protein import machinery in the chloroplast) has been shown to be important for targeting this protein to the outer envelope of the chloroplast (Inoue and Keegstra 2003
To date, the role of many amino acid repeats and RCPs remains somewhat obscure, and it is likely that numerous disease-linked RCPs remain to be identified. To begin to address this problem, a global genome survey has been performed to identify all homopeptide RCPs; these data have been stored in an online database. Resources such as Online Mendelian Inheritance in Man (OMIM) and FlyBase were used to map function onto eukaryote RCPs. BLASTp and Gene Ontology were used to functionally annotate prokaryote RCPs where possible. When considered as a whole, striking functional patterns, independent of amino acid type, can be observed across all RCPs; these data reveal that the majority of RCPs perform roles in processes that require the assembly of large multiprotein or protein/nucleic acid complexes.
A Web-accessible database of RCPs
We identified all homopeptide repeats in GENPEPT greater than six amino acids in length; these data are available at http://repeats.med.monash.edu.au
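The search procedure itself is not described in this section, although the reference list cites a text on regular expressions. As a minimal sketch only, an uninterrupted single-residue tract can be located with a back-referencing pattern. The function name, the `Repeat` record, and the reading of "greater than six" as a minimum of seven residues are assumptions made for illustration, not the authors' implementation.

```python
import re
from typing import Iterator, NamedTuple


class Repeat(NamedTuple):
    residue: str  # one-letter amino acid code of the tract
    start: int    # 0-based index of the first residue in the tract
    length: int   # number of consecutive identical residues


def find_homopeptides(sequence: str, min_len: int = 7) -> Iterator[Repeat]:
    """Yield every uninterrupted single-residue run of at least min_len residues.

    min_len=7 is an assumed reading of "greater than six amino acids".
    """
    # ([A-Z]) captures one residue; \1{n,} demands at least n further copies.
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_len - 1))
    for match in pattern.finditer(sequence.upper()):
        yield Repeat(match.group(1), match.start(), match.end() - match.start())


# Hypothetical toy sequence: a 10-residue Q tract, a 6-residue A tract
# (below the threshold), and a 7-residue N tract.
example = "MKT" + "Q" * 10 + "LLP" + "A" * 6 + "GS" + "N" * 7
repeats = list(find_homopeptides(example))
```

Because the quantifier is greedy, each tract is reported once at its full length rather than as overlapping shorter matches.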
Within GENPEPT (2,677,049 proteins) 1.4% of proteins are RCPs; a total of 54,566 homopeptide repeats could be identified in 37,355 RCPs (Table 1; Fig. 1A). RCPs from environmental sequences (Venter et al. 2004
Several general trends are apparent across all the data. The vast majority (87%) of all RCPs are from eukaryotes; prokaryote RCPs are rare (4%) (Table 1). This is in agreement with previous studies (Karlin and Burge 1996
When classified according to their physicochemical properties and normalized for the overall frequency of single amino acids within GENPEPT, there is an overrepresentation of polar repeats in comparison to hydrophobic repeats and of acidic repeats in comparison to basic repeats (Fig. 1B). These data are in agreement with a previous study that suggested that long stretches of hydrophobic residues possess greatly enhanced toxicity in comparison to similar stretches of hydrophilic residues (Dorsman et al. 2002
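The exact normalization is not given here. One simple reading, sketched below under that assumption, divides each residue's share of all repeats by its background frequency in the database, so that values above 1 indicate overrepresentation. The function name and the toy numbers are hypothetical and are not taken from Figure 1B.

```python
from typing import Dict


def normalized_repeat_share(
    repeat_counts: Dict[str, int],
    background_freq: Dict[str, float],
) -> Dict[str, float]:
    """Return (share of all repeats) / (background residue frequency) per residue.

    This enrichment ratio is an assumed normalization, not the authors' stated one.
    """
    total = sum(repeat_counts.values())
    return {
        residue: (count / total) / background_freq[residue]
        for residue, count in repeat_counts.items()
    }


# Hypothetical toy values, for illustration only.
toy_counts = {"Q": 60, "L": 40}
toy_background = {"Q": 0.04, "L": 0.10}
enrichment = normalized_repeat_share(toy_counts, toy_background)
```

In this toy case the polar residue is more strongly enriched than the hydrophobic one even though leucine is the more common residue overall, which is the kind of contrast the normalization is meant to expose.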
Homopeptide length
Proteins with more than one homopeptide repeat
Within GENPEPT, 23% of all RCPs contain more than one repeat tract (Table 1). In eukaryotes, 24% of RCPs contain multiple repeat tracts and only 9% of proteins in prokaryotes (two archaeal and 113 bacterial proteins) are multirepeat-containing proteins. The most common pattern in GENPEPT after a single amino acid repeat is the doublet GG (752) followed by QQ (736), PP (596), SS (505), and NN (485). The propensity of one repeat type to occur with another in the same protein was investigated. Repeat pairs were tallied according to the number of related sequence families in which they were found; Table 2 shows the frequency with which a repeat of one type occurs with another. Strikingly, for all repeats except poly-L, poly-R, and poly-V, the strongest association was with either poly-N or poly-Q tracts (excluding self-self pairs).
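The family-level tally described above can be sketched as follows. This is an illustrative reading rather than the authors' code: the input layout (one record per protein, giving a family identifier and the repeat types found in that protein), the function name, and the toy data are all assumptions, and self-self pairs are deliberately left out.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Tuple


def tally_repeat_pairs(
    proteins: Iterable[Tuple[str, List[str]]],
) -> Dict[Tuple[str, str], int]:
    """Count, for each pair of distinct repeat types, the number of families
    in which at least one protein carries both types."""
    families_by_pair = defaultdict(set)
    for family_id, repeat_types in proteins:
        # set() removes duplicate tracts of the same type, so no self-self pairs;
        # sorted() gives each pair a single canonical orientation.
        for pair in combinations(sorted(set(repeat_types)), 2):
            families_by_pair[pair].add(family_id)
    return {pair: len(families) for pair, families in families_by_pair.items()}


# Hypothetical toy data: (family identifier, repeat types in one protein).
toy_proteins = [
    ("fam1", ["Q", "N"]),
    ("fam1", ["Q", "N", "S"]),
    ("fam2", ["Q", "N"]),
    ("fam3", ["G", "S"]),
    ("fam3", ["Q"]),
]
pair_counts = tally_repeat_pairs(toy_proteins)
```

Counting families rather than individual proteins keeps a large, well-sampled family from dominating the table, which appears to be the motivation for tallying by related sequence family.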
Distribution of repeats within eukaryote organisms
Figure 3, A and B, shows the distributions of RCPs in eukaryotes whose genomes are either complete or near completion. These data highlight several interesting anomalies. Drosophila melanogaster possesses an overabundance of poly-Q RCPs, >3.5-fold more than that of Homo sapiens and sixfold more than another insect, the mosquito Anopheles gambiae (Fig. 3B). In contrast, poly-Q repeats are extremely rare in Plasmodium falciparum; this organism instead possesses an overabundance of poly-N RCPs. Analysis of other complete eukaryote genomes revealed that poly-Q repeats are absent in Encephalitozoon cuniculi (an intracellular parasite). Another striking difference is in the distribution of poly-N RCPs. Nonvertebrate organisms all contain asparagine RCPs, whereas poly-N tracts are either absent or extremely rare in vertebrates (Fig. 3A). The human genome contains 233 poly-Q RCPs, but only eight poly-N RCPs, all of which are 8-residue repeats in the N terminus of the insulin receptor substrate 2. The genome of Mus musculus contains 170 poly-Q RCPs and only 13 poly-N RCPs (seven from thioredoxin interacting factor, one in the insulin receptor substrate 2, one in a transcription factor, and four in unknown proteins). The genomes of Gallus gallus and Xenopus laevis do not contain asparagine RCPs.
In order to include an avian representative in our analysis, we also examined the distribution of RCPs in the chicken. These data reveal an apparent paucity of RCPs in G. gallus as compared with H. sapiens and M. musculus (Fig. 3A); however, we cannot exclude the possibility that this observation is a result of the preliminary nature of the available genomic data. Finally, we note that repeats are completely absent in the nucleomorph of Guillardia theta (Chromophyte algae).
Evolution of RCPs
Functional groups in H. sapiens, D. melanogaster, C. elegans, and prokaryote RCPs
We used the OMIM database and related resources to functionally group human RCPs (Fig. 7A). Sixty percent of the human RCPs have an OMIM record, and 120 diseases are associated with these records (Supplemental Table 1). In addition, all D. melanogaster RCPs were mapped onto FlyBase (FlyBase Consortium 2003
Interestingly, clear functional trends are apparent throughout the data set, with the majority of both human and fruit-fly RCPs performing roles in transcription/translation and signaling processes. Enzymes, transport proteins, adhesion proteins, and structural proteins also commonly contain homopeptide repeats. We performed a similar analysis of C. elegans RCPs using Wormpep, and observed similar trends (Fig. 7C). Finally, we functionally annotated, where possible, prokaryote RCPs (Fig. 7D).
Discrete domains within RCPs
Our data reveal that RCPs are far more abundant in eukaryotes than in prokaryotes. In addition, based upon analysis of the D. melanogaster data set, the majority of eukaryote RCPs are predicted to be intracellular proteins. Furthermore, in agreement with the studies of Marcotte et al. (1999
Glycine, serine, and proline repeats are common in both prokaryotes and eukaryotes; however, common eukaryote repeats such as glutamine, asparagine, and glutamic acid are relatively rare in prokaryote organisms. Of the 29 asparagine RCPs in prokaryotes, 11 are orphans, 12 do not have eukaryotic homologs, and six have putative eukaryote homologs. However, these homologs do not contain the repeat or an equivalent amino acid-rich region. Of the 51 glutamine RCPs in prokaryotes, 23 are orphans, 21 do not have eukaryotic homologs, and seven have putative eukaryote homologs, but again, these homologs do not contain a repeat or an equivalent amino acid-rich region.
Certain discrepancies are clearly apparent when considering repeat distribution within eukaryotes. For example, glutamine RCPs (the most common eukaryote repeat) are rare or absent in P. falciparum, E. cuniculi, and G. theta. Furthermore, our data reveal that while asparagine repeats are common in nonvertebrates such as insects and nematodes, such repeats are extremely rare in vertebrates. Kreil and Kreil (2000
Very rarely, repeats are conserved across entire protein families, and only three families (DnaJ and the ribosomal proteins L10 and L12) could be identified with repeat regions in eukaryote and prokaryote putative homologs. A sequence alignment of the DnaJ family (Fig. 4) reveals that an extensive glycine repeat is present in most putative homologs. However, in the majority of these, the repeat is interrupted (typically with an alanine or phenylalanine residue), and this region is contracted in many eukaryotic counterparts. DnaJ functions in complex with at least two other proteins (DnaK and GrpE) to control processes such as protein folding, apoptosis, and the degradation of misfolded proteins (Gragerov et al. 1992
Both archaeal and eukaryotic L10 and L12 proteins contain a C-terminal region that comprises an alanine-rich region, termed the hinge, followed by a glutamic acid-rich repeat (Figs. 5, 6). L10 and L12 form part of a complex in the large ribosomal subunit termed the stalk protuberance. The hinge and the acidic region in both proteins are postulated to function as flexible regions that mediate a variety of protein-protein interactions and are important for processes such as elongation (Remacha et al. 1995
A major question in repeat-related research fields is the role of RCPs and, in particular, the role of the repeat region itself. Evolutionary pressure on repeat regions is likely to include functional requirement, mutability of the underlying nucleotide sequence, and potential toxicity. Our analysis allows us to begin to address these questions from a functional perspective. Assigning function to protein sequences is nontrivial, since many proteins perform overlapping functions (for review, see Whisstock and Lesk 2003
We performed a functional analysis of RCPs from prokaryotes (Table 3; Fig. 7D). While 38% of these families perform as yet uncharacterized roles, several of the functional themes apparent in the eukaryote data set are also noticeable in prokaryotes. The most common functional class (accounting for 24% of classified molecules) comprises RCPs that perform roles as enzymes. Transport proteins, structural proteins, and transcription/translation-related RCPs are also common; however, in contrast to eukaryotes, the dramatic bias toward transcription/translation-related processes is not observed. One possible explanation for these data is that bacterial genomes are smaller than those of eukaryotes and are not packaged and controlled in such a complex fashion. RCPs involved in signaling-related processes are also relatively rare in prokaryotes in comparison to eukaryotes. Again, we argue that this may be a result of the increased complexity of eukaryote processes; while bacteria utilize intracellular signaling processes to communicate intracellularly and with their environment, these processes are relatively modest in comparison to eukaryote-signaling cascades.
The repeats database provides a basis for understanding the function of RCPs and their associated repeats in all organisms. Strikingly, the majority of RCPs considered are involved in processes that require the assembly or association of large multiprotein and/or nucleic acid complexes (Fig. 7). For example, the ribosome (itself a large protein/RNA complex) requires a large number of additional factors (e.g., elongation factors) to properly function. Processes such as transcription involve the assembly of multiprotein complexes (e.g., RNA polymerase) and the binding of discrete sequences of DNA that may be kilobases apart (Tolhuis et al. 2002
The data presented in this study reveal that the vast majority of the human repeat tracts present in the OMIM data set (83%) are located N-terminal to, C-terminal to, or in between discrete domains. Alba and Guigo (2004
The function of RCPs, as well as the interdomain or terminal location of the repeat tract within these molecules and the disordered structure of these repeat sequences, supports the idea that repeats play roles as flexible spacer elements/"tethers" between individual folded domains in molecules that mediate protein-protein or protein-nucleic acid interactions (Karlin and Burge 1996
Based upon this work, it is suggested that a general function of the majority of repeat sequences is to mediate the assembly of protein complexes, and that RCPs may act as molecular "fishing lines", mediating interactions either through tethered distant domains, or indeed, through interactions with the repeat itself (e.g., Gerber et al. 1994
Several studies have revealed that long stretches of hydrophobic amino acids are more toxic than hydrophilic counterparts (Dorsman et al. 2002
The majority of proteins sequenced to date do not contain repeats. While certain repeats are common throughout entire protein superfamilies (such as the DnaJ family), the data gleaned in this study reveal that repeat proteins are often "orphans." We suggest that the putative role of repeats is ancient; however, the relatively sporadic distribution of these regions suggests that repeats often evolve to perform in specialized processes unique to a particular organism or set of organisms.
The March (2004) version of GENPEPT (from ftp://ftp.ncbi.nih.gov/blast/db
Homopeptide searches
Evolution of RCPs
When considering eukaryote RCPs, only completed or near-complete genomes were used, so as to avoid potential bias due to overrepresentation of commonly studied protein families. Thus, the following species were considered: Homo sapiens, Mus musculus, Rattus norvegicus, Gallus gallus, Danio rerio, Xenopus laevis, Drosophila melanogaster, Anopheles gambiae, Caenorhabditis elegans, Saccharomyces cerevisiae, Plasmodium falciparum, Oryza sativa, Triticum aestivum, and Arabidopsis thaliana.
The longest sequence from each of the prokaryote clusters was used as a probe to search GENPEPT using PSI-BLAST. The following parameters were used: j = 5, b = 100,000, e = 0.001, and -F T. All sequences with significant expect scores (<0.001) (Park et al. 1998
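The parameter letters listed above match the options of the legacy NCBI `blastpgp` program (`-j` maximum iterations, `-b` number of alignments shown, `-e` expectation threshold, `-F` query filtering). The text names only PSI-BLAST, so the executable name is an assumption, as are the helper name and the file and database names in this sketch, which only assembles the command line and does not run it.

```python
import shlex
from typing import List


def build_psiblast_command(query_fasta: str, database: str, output_path: str) -> List[str]:
    """Assemble an argument list for one PSI-BLAST probe search.

    'blastpgp' is an assumed executable name; the -j, -b, -e, and -F
    values are the ones quoted in the text.
    """
    return [
        "blastpgp",
        "-i", query_fasta,   # probe sequence (longest member of a cluster)
        "-d", database,      # formatted protein database to search
        "-o", output_path,   # report file
        "-j", "5",           # at most five iterations
        "-b", "100000",      # number of alignments to report
        "-e", "0.001",       # expectation value threshold
        "-F", "T",           # filter low-complexity regions of the query
    ]


# Hypothetical file and database names, for illustration only.
command = build_psiblast_command("cluster_probe.fasta", "genpept", "cluster_probe.out")
print(shlex.join(command))
```

Keeping the arguments as a list, rather than a single string, means the same value can be handed to `subprocess.run` without shell quoting concerns if the search is actually to be launched.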
Analysis of repeat pairs
Functional annotation
J.C.W. is a National Health and Medical Research Council of Australia Senior Research Fellow and Monash University Logan Fellow. S.P.B. is an NHMRC R.D. Wright Fellow and Monash University Logan Fellow. J.A.I. is an Anti-Cancer Council of Victoria Fellow, Monash University Research Fund Fellow and NHMRC C.J. Martin Fellow. M.G.B. is a Monash University Logan Fellow. We thank the NHMRC, the Australian Research Council, the Victorian Partnership for Advanced Computing, and the State Government of Victoria for support. We thank Sophie Katsabanis for discussion and comment on the manuscript and Michael Cameron, Michelle Dunstone, Sheena McGowan, and Michelle Chow for helpful discussion.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.3096505.
7 Corresponding authors. [Supplemental material is available online at www.genome.org.]
Akey, C.W. and Luger, K. 2003. Histone chaperones and nucleosome assembly. Curr. Opin. Struct. Biol. 13: 6-14.
Alba, M.M. and Guigo, R. 2004. Comparative analysis of amino acid repeats in rodents and humans. Genome Res. 14: 549-554.
Alba, M.M., Laskowski, R.A., and Hancock, J.M. 2002. Detecting cryptically simple protein sequences using the SIMPLE algorithm. Bioinformatics 18: 672-678.
Barton, G.J. 1993. ALSCRIPT: A tool to format multiple sequence alignments. Protein Eng. 6: 37-40.
Bateman, A., Coin, L., Durbin, R., Finn, R.D., Hollich, V., Griffiths-Jones, S., Khanna, A., Marshall, M., Moxon, S., Sonnhammer, E.L., et al. 2004. The Pfam protein families database. Nucleic Acids Res. 32: D138-D141.
Becher, M.W., Kotzuk, J.A., Sharp, A.H., Davies, S.W., Bates, G.P., Price, D.L., and Ross, C.A. 1998. Intranuclear neuronal inclusions in Huntington's disease and dentatorubral and pallidoluysian atrophy: Correlation between the density of inclusions and IT15 CAG triplet repeat length. Neurobiol. Dis. 4: 387-397.
Brown, L.Y. and Brown, S.A. 2004. Alanine tracts: The expanding story of human illness and trinucleotide repeats. Trends Genet. 20: 51-58.
Burack, W.R. and Shaw, A.S. 2000. Signal transduction: Hanging on a scaffold. Curr. Opin. Cell. Biol. 12: 211-216.
Calnan, B.J., Tidor, B., Biancalana, S., Hudson, D., and Frankel, A.D. 1991. Arginine-mediated RNA recognition: The arginine fork. Science 252: 1167-1171.
Chow, M.K., Ellisdon, A.M., Cabrita, L.D., and Bottomley, S.P. 2004a. Polyglutamine expansion in Ataxin-3 does not affect protein stability: Implications for misfolding and disease. J. Biol. Chem. 279: 47643-47651.
Chow, M.K., Lomas, D.A., and Bottomley, S.P. 2004b. Promiscuous
Chow, M.K., Paulson, H.L., and Bottomley, S.P. 2004c. Destabilization of a non-pathological variant of ataxin-3 results in fibrillogenesis via a partially folded intermediate: A model for misfolding in polyglutamine disease. J. Mol. Biol. 335: 333-341.
Craig, E.A., Weissman, J.S., and Horwich, A.L. 1994. Heat shock proteins and molecular chaperones: Mediators of protein conformation and turnover in the cell. Cell 78: 365-372.
Cummings, C.J. and Zoghbi, H.Y. 2000. Fourteen and counting: Unraveling trinucleotide repeat diseases. Hum. Mol. Genet. 9: 909-916.
Dorsman, J.C., Pepers, B., Langenberg, D., Kerkdijk, H., Ijszenga, M., den Dunnen, J.T., Roos, R.A., and van Ommen, G.J. 2002. Strong aggregation and increased toxicity of polyleucine over polyglutamine stretches in mammalian cells. Hum. Mol. Genet. 11: 1487-1496.
Enright, A.J. and Ouzounis, C.A. 2000. GeneRAGE: A robust algorithm for sequence clustering and domain detection. Bioinformatics 16: 451-457.
Fan, X., Dion, P., Laganiere, J., Brais, B., and Rouleau, G.A. 2001. Oligomerization of polyalanine expanded PABPN1 facilitates nuclear protein aggregation that is associated with cell death. Hum. Mol. Genet. 10: 2341-2351.
Fandrich, M. and Dobson, C.M. 2002. The behaviour of polyamino acids reveals an inverse side chain effect in amyloid structure formation. EMBO J. 21: 5682-5690.
FlyBase Consortium. 2003. The FlyBase database of the Drosophila genome projects and community literature. Nucleic Acids Res. 31: 172-175.
Friedl, J.E.F. 2002. Mastering regular expressions. O'Reilly, Sebastopol, CA.
Gerber, H.P., Seipel, K., Georgiev, O., Hofferer, M., Hug, M., Rusconi, S., and Schaffner, W. 1994. Transcriptional activation modulated by homopolymeric glutamine and proline stretches. Science 263: 808-811.
Giri, K., Ghosh, U., Bhattacharyya, N.P., and Basak, S. 2003. Caspase 8 mediated apoptotic cell death induced by
Gonzalo, P. and Reboud, J.P. 2003. The puzzling lateral flexible stalk of the ribosome. Biol. Cell 95: 179-193.
Gotoh, T., Terada, K., Oyadomari, S., and Mori, M. 2004. hsp70-DnaJ chaperone pair prevents nitric oxide- and CHOP-induced apoptosis by inhibiting translocation of Bax to mitochondria. Cell Death Differ. 11: 390-402.
Gragerov, A., Nudler, E., Komissarova, N., Gaitanaris, G.A., Gottesman, M.E., and Nikiforov, V. 1992. Cooperation of GroEL/GroES and DnaK/DnaJ heat shock proteins in preventing protein misfolding in Escherichia coli. Proc. Natl. Acad. Sci. 89: 10341-10344.
Grigoryev, S.A. 2004. Keeping fingers crossed: Heterochromatin spreading through interdigitation of nucleosome arrays. FEBS Lett. 564: 4-8.
Hendrick, J.P., Langer, T., Davis, T.A., Hartl, F.U., and Wiedmann, M. 1993. Control of folding and membrane translocation by binding of the chaperone DnaJ to nascent polypeptides. Proc. Natl. Acad. Sci. 90: 10216-10220.
Holm, L. and Sander, C. 1998. Removing near-neighbour redundancy from large protein sequence collections. Bioinformatics 14: 423-429.
Holmberg, M., Duyckaerts, C., Durr, A., Cancel, G., Gourfinkel-An, I., Damier, P., Faucheux, B., Trottier, Y., Hirsch, E.C., Agid, Y., et al. 1998. Spinocerebellar ataxia type 7 (SCA7): A neurodegenerative disorder with neuronal intranuclear inclusions. Hum. Mol. Genet. 7: 913-918.
Huntley, M. and Golding, G.B. 2000. Evolution of simple sequence in proteins. J. Mol. Evol. 51: 131-140.
Huntley, M. and Golding, G.B. 2002. Simple sequences are rare in the Protein Data Bank. Proteins 48: 134-140.
Inoue, K. and Keegstra, K. 2003. A polyglycine stretch is necessary for proper targeting of the protein translocation channel precursor to the outer envelope membrane of chloroplasts. Plant J. 34: 661-669.
Karlin, S. and Burge, C. 1996. Trinucleotide repeats and long homopeptides in genes and proteins associated with nervous system disease and development. Proc. Natl. Acad. Sci. 93: 1560-1565.
Korschen, H.G., Beyermann, M., Muller, F., Heck, M., Vantler, M., Koch, K.W., Kellner, R., Wolfrum, U., Bode, C., Hofmann, K.P., et al. 1999. Interaction of glutamic-acid-rich proteins with the cGMP signalling pathway in rod photoreceptors. Nature 400: 761-766.
Kreil, D.P. and Kreil, G. 2000. Asparagine repeats are rare in mammalian proteins. Trends Biochem. Sci. 25: 270-271.
Letunic, I., Copley, R.R., Schmidt, S., Ciccarelli, F.D., Doerks, T., Schultz, J., Ponting, C.P., and Bork, P. 2004. SMART 4.0: Towards genomic data integration. Nucleic Acids Res. 32: D142-D144.
Li, M., Miwa, S., Kobayashi, Y., Merry, D.E., Yamamoto, M., Tanaka, F., Doyu, M., Hashizume, Y., Fischbeck, K.H., and Sobue, G. 1998. Nuclear inclusions of the androgen receptor protein in spinal and bulbar muscular atrophy. Ann. Neurol. 44: 249-254.
Mar Alba, M., Santibanez-Koref, M.F., and Hancock, J.M. 1999. Amino acid reiterations in yeast are overrepresented in particular classes of proteins and show evidence of a slippage-like mutational process. J. Mol. Evol. 49: 789-797.
Marcotte, E.M., Pellegrini, M., Yeates, T.O., and Eisenberg, D. 1999. A census of protein repeats. J. Mol. Biol. 293: 151-160.
Nam, Y.S., Petrovic, A., Jeong, K.S., and Venkatesan, S. 2001. Exchange of the basic domain of human immunodeficiency virus type 1 Rev for a polyarginine stretch expands the RNA binding specificity, and a minimal arginine cluster is required for optimal RRE RNA binding affinity, nuclear accumulation, and trans-activation. J. Virol. 75: 2957-2971.
Notredame, C., Higgins, D.G., and Heringa, J. 2000. T-Coffee: A novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302: 205-217.
Oma, Y., Kino, Y., Sasagawa, N., and Ishiura, S. 2004. Intracellular localization of homopolymeric amino acid-containing proteins expressed in mammalian cells. J. Biol. Chem. 279: 21217-21222.
Park, J., Karplus, K., Barrett, C., Hughey, R., Haussler, D., Hubbard, T., and Chothia, C. 1998. Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods. J. Mol. Biol. 284: 1201-1210.
Poetsch, A., Molday, L.L., and Molday, R.S. 2001. The cGMP-gated channel and related glutamic acid-rich proteins interact with peripherin-2 at the rim region of rod photoreceptor disc membranes. J. Biol. Chem. 276: 48009-48016.
Pollard, T.D. 2000. Reflections on a quarter century of research on contractile systems. Trends Biochem. Sci. 25: 607-611.
Ramirez, C., Shimmin, L.C., Newton, C.H., Matheson, A.T., and Dennis, P.P. 1989. Structure and evolution of the L11, L1, L10, and L12 equivalent ribosomal proteins in eubacteria, archaebacteria, and eucaryotes. Can. J. Microbiol. 35: 234-244.
Remacha, M., Jimenez-Diaz, A., Bermejo, B., Rodriguez-Gabriel, M.A., Guarinos, E., and Ballesta, J.P. 1995. Ribosomal acidic phosphoproteins P1 and P2 are not required for cell viability but regulate the pattern of protein expression in Saccharomyces cerevisiae. Mol. Cell. Biol. 15: 4754-4762.
Scherzinger, E., Lurz, R., Turmaine, M., Mangiarini, L., Hollenbach, B., Hasenbank, R., Bates, G.P., Davies, S.W., Lehrach, H., and Wanker, E.E. 1997. Huntingtin-encoded polyglutamine expansions form amyloid-like protein aggregates in vitro and in vivo. Cell 90: 549-558.
Skinner, P.J., Koshy, B.T., Cummings, C.J., Klement, I.A., Helin, K., Servadio, A., Zoghbi, H.Y., and Orr, H.T. 1997. Ataxin-1 with an expanded glutamine tract alters nuclear matrix-associated structures. Nature 389: 971-974.
Thompson, J.D., Higgins, D.G., and Gibson, T.J. 1994. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22: 4673-4680.
Tolhuis, B., Palstra, R.J., Splinter, E., Grosveld, F., and de Laat, W. 2002. Looping and interaction between hypersensitive sites in the active
Uchiumi, T., Honma, S., Endo, Y., and Hachimori, A. 2002a. Ribosomal proteins at the stalk region modulate functional rRNA structures in the GTPase center. J. Biol. Chem. 277: 41401-41409.
Uchiumi, T., Honma, S., Nomura, T., Dabbs, E.R., and Hachimori, A. 2002b. Translation elongation by a hybrid ribosome in which proteins at the GTPase center of the Escherichia coli ribosome are replaced with rat counterparts. J. Biol. Chem. 277: 3857-3862.
Venter, J.C., Remington, K., Heidelberg, J.F., Halpern, A.L., Rusch, D., Eisen, J.A., Wu, D., Paulsen, I., Nelson, K.E., Nelson, W., et al. 2004. Environmental genome shotgun sequencing of the Sargasso Sea. Science 304: 66-74.
Wall, D., Zylicz, M., and Georgopoulos, C. 1995. The conserved G/F motif of the DnaJ chaperone is necessary for the activation of the substrate binding properties of the DnaK chaperone. J. Biol. Chem. 270: 2139-2144.
Warrick, J.M., Paulson, H.L., Gray-Board, G.L., Bui, Q.T., Fischbeck, K.H., Pittman, R.N., and Bonini, N.M. 1998. Expanded polyglutamine protein forms nuclear inclusions and causes neural degeneration in Drosophila. Cell 93: 939-949.
Wetzel, R. 2002. Ideas of order for amyloid fibril structure. Structure 10: 1031-1036.
Whisstock, J.C. and Lesk, A.M. 2003. Prediction of protein function from protein sequence and structure. Q. Rev. Biophys. 36: 307-340.
Whisstock, J., Skinner, R., and Lesk, A.M. 1998. An atlas of serpin conformations. Trends Biochem. Sci. 23: 63-67.
http://repeats.med.monash.edu.au; A database of homopeptide repeats.
http://www.hprd.org/; Human Protein Reference Database.
ftp://ftp.ncbi.nih.gov/blast/db/; NCBI ftp site of available databases.
http://www.ncbi.nlm.nih.gov/omim/; Online Mendelian Inheritance in Man, OMIM. McKusick-Nathans Institute for Genetic Medicine, Johns Hopkins University (Baltimore, MD) and National Center for Biotechnology Information, National Library of Medicine (Bethesda, MD), 2000.
http://www.sanger.ac.uk/Projects/C_elegans/WORMBASE/current/wormpep.shtml; Wormpep.
Received August 3, 2004; accepted in revised form January 20, 2005.