Published online before print
March 9, 2007, 10.1101/gr.6144007 Genome Res. 17:401-404, 2007 ©2007 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/07 $5.00
Perspective
Evolution and multilevel optimization of the genetic code
1 Department of Systems Biology, Harvard Medical School, Boston, Massachusetts 02115, USA; 2 School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts 02138, USA
The discovery of the genetic code was one of the most important advances of modern biology. But there is more to a DNA code than protein sequence; DNA carries signals for splicing, localization, folding, and regulation that are often embedded within the protein-coding sequence. In this issue, Itzkovitz and Alon show that the specific 64-to-20 mapping found in the genetic code may have been optimized for permitting protein-coding regions to carry this extra information and suggest that this property may have evolved as a side benefit of selection to minimize the negative effects of frameshift errors.
The first glimmer of light in the story of the code came when Dounce (1952) proposed the idea, extraordinary for its time, that the order of nucleotides determines the order of amino acids in polypeptide chains. After the discovery of the double-helix structure of DNA (Watson and Crick 1953
But analyzing Gamow's results (Gamow 1954
The guessing game continued. Sinsheimer (1959)
The discovery of the actual genetic code by Nirenberg and coworkers (Nirenberg and Matthaei 1961
There has been much speculation about how the code evolved (Osawa et al. 1992
These hypotheses are not mutually exclusive, and there is some support for all of them ruling out Crick's "frozen accident" hypothesis. Evidence for optimization of the code for certain functions exists, as discussed above, and there are indications that the usage frequencies of some amino acids in proteins are decreasing, while those of others are increasing (Jordan et al. 2005
The discovery of variant codes (Barrell et al. 1979
But if the code was optimized for some functions, are there other, less obvious, functions for which it is also optimal? Frameshift mutations might be important because they result in nonfunctional proteins, which waste resources and could also be toxic. A way to minimize the resource waste is to terminate elongation as quickly as possible after the error. There are some bioinformatics clues that the impact of frameshift errors was minimized in evolution. It has been observed that in many (albeit not all) organisms, codon usage frequencies are biased toward codons that can contribute to stop codons if read off-frame (Seligmann and Pollock 2004
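The notion of codons that "contribute to stop codons if read off-frame" can be made concrete with a small sketch. The function names and the example sequence below are illustrative assumptions, not taken from the cited work; the code simply scans the +1 and +2 reading frames of an RNA sequence for the three standard stop triplets.

```python
# Minimal sketch (hypothetical helper names): locate "hidden" stop codons,
# i.e. stop triplets that appear only when a sequence is read off-frame.
STOP_CODONS = {"UAA", "UAG", "UGA"}  # standard genetic code


def codons(seq, frame):
    """Split seq into complete triplets, starting at offset `frame`."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def hidden_stops(seq):
    """Map each shifted frame (+1, +2) to the start positions of its stop triplets."""
    return {
        frame: [frame + 3 * k
                for k, c in enumerate(codons(seq, frame))
                if c in STOP_CODONS]
        for frame in (1, 2)
    }


def codons_read_after_shift(seq, frame):
    """Sense codons translated in the shifted frame before the first stop.

    Returns None if the shifted frame contains no stop codon.
    """
    for k, c in enumerate(codons(seq, frame)):
        if c in STOP_CODONS:
            return k
    return None


# Illustrative sequence: in frame it reads GCU GAU AAA (Ala Asp Lys), no stop.
example = "GCUGAUAAA"
print(hidden_stops(example))
print(codons_read_after_shift(example, 2))
```

In this example the in-frame reading contains no stop, whereas the +2 frame reads UGA and UAA, so a ribosome that slipped into that frame would terminate at once. A sequence with few such off-frame stops would let the erroneous product grow longer before release.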
If optimization of fast termination after a frameshift error is built into the genetic code itself, what would an optimal code look like? Crick's comma-less code, interpreted such that all nonsense codons correspond to "stop codons" in today's terminology, is the perfect code in this respect: it stops translation immediately after a translational frameshift. However, such extreme optimization comes at a high price. Since there are no synonymous codons for any amino acid in the comma-less code, the majority of point mutations result in nonsense codons, essentially equivalent to null mutations. This would greatly increase the mutational load. In the actual genetic code, only about one of 20 point mutations results in a new stop codon (Osawa et al. 1992
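The order of magnitude of the "about one of 20" figure can be checked with a short enumeration. The sketch below assumes that every sense codon and every single-nucleotide substitution is equally likely. That uniform weighting is an assumption made here for illustration; real estimates depend on codon usage and mutation spectra.

```python
# Minimal sketch under a uniform-weighting assumption: what fraction of
# single-nucleotide substitutions in sense codons creates a stop codon?
from itertools import product

BASES = "UCAG"
STOP_CODONS = {"UAA", "UAG", "UGA"}  # standard genetic code

all_codons = ["".join(p) for p in product(BASES, repeat=3)]
sense_codons = [c for c in all_codons if c not in STOP_CODONS]


def point_mutants(codon):
    """All 9 codons that differ from `codon` at exactly one position."""
    return [codon[:i] + b + codon[i + 1:]
            for i in range(3)
            for b in BASES
            if b != codon[i]]


# Count every (sense codon, substitution) pair and those that yield a stop.
total = sum(len(point_mutants(c)) for c in sense_codons)
to_stop = sum(m in STOP_CODONS
              for c in sense_codons
              for m in point_mutants(c))
fraction = to_stop / total

print(f"{to_stop} of {total} substitutions create a stop ({fraction:.1%})")
```

With uniform weights this gives 23 of 549 substitutions, roughly 4%, which is in line with the figure quoted above. In a comma-less code with one sense codon per amino acid, by contrast, most substitutions would land on a nonsense codon.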
In this issue of Genome Research, Itzkovitz and Alon report on the intriguing discovery of two new properties for which the genetic code seems to be optimized. They compared the actual genetic code with an ensemble of all other codes that are equally optimized with respect to mistranslation or mutation (for more on this statistical approach, see also Alff-Steinberger 1969
Itzkovitz and Alon suggest another, quite unanticipated, type of optimality: the code is highly optimal for encoding arbitrary additional information, i.e., information other than the amino acid sequence in protein-coding sequences. Optimality for encoding additional information is particularly important and relevant given the known signals contained in the nucleotide sequence of coding regions. These include RNA splicing signals, which are encoded in the nucleotide sequence together with the amino acid sequence of the prospective protein (Cartegni et al. 2002
Interestingly, the optimal structure of the code for both information encoding and translation interruption after frameshift appears to derive from the same root cause, namely, the fact that stop codons can easily be concealed within a sequence. For example, the UGA stop codon is only one frameshift away from NNU|GAN; the GAN codons encode Asp and Glu, which are very common in protein sequences. Similarly, UAA and UAG can be frameshifted to give NNU|AAN and NNU|AGN (the AAN codons encode Asn or Lys and AGN gives Ser or Arg). Glu, Lys, Asp, Ser, and Arg are relatively common amino acids in the genome, so the probability of a stop codon arising from a misread of a codon for one of these amino acids is very high. The fact that a stop codon can be "hidden" in this way using a frameshift means that even a signal sequence that happens to include a stop codon (a problem that is bound to arise sooner or later) can be encoded within the protein sequence by using one of the two reading frames in which the stop triplet is read as part of a codon for a frequently used amino acid.
The ability to encode hidden messages is a direct result of the redundancy of the code. Like the universal genetic code, a language such as English has considerable redundancy, i.e., it takes more letters and words to convey a certain message than is necessary from an information-theoretical point of view. In other words, the information content of an English sentence is less than what could be encoded in a sequence of Latin letters and punctuation marks of equal length. This redundancy allows for communicating several messages in parallel, a property occasionally used in human history for sending secret messages that are "camouflaged" in unsuspicious-looking communications (steganography). An illustrative example can be found in the following sentence from the "Sherlock Holmes" story, The Adventure of the Gloria Scott (Conan Doyle 1893):
"The supply of game for London is going steadily up. Head-keeper Hudson, we believe, has been now told to receive all orders for fly-paper and for preservation of your hen-pheasant's life."
Reading every third word starting with the first (and adding a few punctuation marks), the hidden message emerges: "The game is up. Hudson has told all. Fly for your life." As the redundancy of the language or code decreases, it becomes increasingly difficult to convey such additional messages in a communication.
This concept of simultaneously communicating two messages, one of which is more obvious and detailed than the other, is similar to that of providing a template for an amino acid sequence together with noncoding information in a nucleotide sequence. However, unlike in human communication, where the main message is used as a camouflage, secrecy is certainly not the reason for the use of this approach in nature. Rather, selection pressure for using resources efficiently may be the reason that the genetic code adopted this property.
But was it really a clear advantage in the early evolution of the code to be able to encode additional noncoding information? The correlation between the ability to encode additional information and the property of optimality of translational termination following frameshift errors offers a possible evolutionary scenario, in which selection for resource waste minimization favored codes that efficiently terminate translation, and the ability of the code to carry additional information was a byproduct. This second property may have become important only later on, when additional complex regulatory programs and regulatory motifs started to develop. A possible exception is the ability to include sequences for stabilizing RNA secondary structure. RNA molecules that possessed this ability in parallel to their protein-coding function might have had an advantage over RNAs that were less effective in this respect.
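Returning to the every-third-word reading of the Sherlock Holmes note described above, the decoding rule is simple enough to sketch in a few lines. The cover sentence is given here as it commonly appears in editions of the story, which is an assumption of this sketch. The function name is hypothetical, and hyphenated words are counted as two words, which is what the every-third-word rule requires for this note.

```python
# Minimal sketch of the "every third word" reading; names are illustrative.
import re


def every_nth_word(text, n=3, start=0):
    """Return every n-th word of `text`, lower-cased.

    Words are runs of letters and apostrophes, so a hyphenated word such as
    "fly-paper" counts as two words.
    """
    words = re.findall(r"[A-Za-z']+", text)
    return [w.lower() for w in words[start::n]]


# Cover sentence as commonly printed (an assumption, not taken from the article).
cover = ("The supply of game for London is going steadily up. "
         "Head-keeper Hudson, we believe, has been now told to receive all "
         "orders for fly-paper and for preservation of your hen-pheasant's life.")

hidden = " ".join(every_nth_word(cover))
print(hidden)
```

The cover text needs 34 words to carry a 12-word message. That overhead is the redundancy that makes room for a second message, much as synonymous codons leave room for signals beyond the amino acid sequence.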
As we learn more about the functions of the genetic code, it becomes ever clearer that the degeneracy in the genetic code is not exploited in such a way as to optimize one function, but rather to optimize a combination of several different functions simultaneously. Looking deeper into the structure of the code, we wonder what other remarkable properties it may bear. While our understanding of the genetic code has increased substantially over the last decades, it seems that exciting discoveries are waiting to be made.
We thank R. Ward for inspiring suggestions and P. Yeh for useful comments on the manuscript.
3 Corresponding author.
E-mail roy_kishony{at}hms.harvard.edu; fax (617) 432-5012. Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.6144007
Alff-Steinberger, C. 1969. The genetic code and error transmission. Proc. Natl. Acad. Sci. 64: 584–591.
Barrell, B.G., Bankier, A.T., and Drouin, J. 1979. A different genetic code in human mitochondria. Nature 282: 189–194.
Brenner, S. 1957. On the impossibility of all overlapping triplet codes in information transfer from nucleic acid to proteins. Proc. Natl. Acad. Sci. 43: 687–694.
Cartegni, L., Chew, S.L., and Krainer, A.R. 2002. Listening to silence and understanding nonsense: Exonic mutations that affect splicing. Nat. Rev. Genet. 3: 285–298.
Conan Doyle, A. 1893. The Memoirs of Sherlock Holmes. Murray, London.
Crick, F.H. 1966. Codon–anticodon pairing: The wobble hypothesis. J. Mol. Biol. 19: 548–555.
Crick, F.H. 1968. The origin of the genetic code. J. Mol. Biol. 38: 367–379.
Crick, F.H., Griffith, J.S., and Orgel, L.E. 1957. Codes without commas. Proc. Natl. Acad. Sci. 43: 416–421.
Di Giulio, M. 2004. The origin of the genetic code: Theories and their relationships, a review. Biosystems 80: 175–184.
Dounce, A.L. 1952. Duplicating mechanism for peptide chain and nucleic acid synthesis. Enzymologia 15: 251–258.
Fox, T.D. 1987. Natural variation in the genetic code. Annu. Rev. Genet. 21: 67–91.
Freeland, S.J. and Hurst, L.D. 1998. The genetic code is one in a million. J. Mol. Evol. 47: 238–248.
Gamow, G. 1954. Possible relation between deoxyribonucleic acid and protein structures. Nature 173: 318.
Gamow, G., Rich, A., and Y
Golomb, S.W. 1962. Efficient coding for the desoxyribonucleic channel. In Proceedings of Symposia in Applied Mathematics, pp. 87–100. American Mathematical Society, Providence, RI.
Haig, D. and Hurst, L.D. 1991. A quantitative measure of error minimization in the genetic code. J. Mol. Evol. 33: 412–417.
Hayes, B. 1998. The invention of the genetic code. Am. Sci. 86: 8–14.
Hurst, L.D., Feil, E.J., and Rocha, E.P.C. 2006. Causes of trends in amino-acid gain and loss. Nature 442: E11–E12.
Jordan, I.K., Kondrashov, F.A., Adzhubei, I.A., Wolf, Y.I., Koonin, E.V., Kondrashov, A.S., and Sunyaev, S. 2005. A universal trend of amino acid gain and loss in protein evolution. Nature 433: 633–638.
Katz, L. and Burge, C.B. 2003. Widespread selection for local RNA secondary structure in coding regions of bacterial genes. Genome Res. 13: 2042–2051.
Knight, R.D. and Landweber, L.F. 2000. Guilt by association: The arginine case revisited. RNA 6: 499–510.
Knight, R.D., Freeland, S.J., and Landweber, L.F. 1999. Selection, history and chemistry: The three faces of the genetic code. Trends Biochem. Sci. 24: 241–247.
Knight, R.D., Freeland, S.J., and Landweber, L.F. 2001a. Rewiring the keyboard: Evolvability of the genetic code. Nat. Rev. Genet. 2: 49–58.
Knight, R.D., Freeland, S.J., and Landweber, L.F. 2001b. A simple model based on mutation and selection explains trends in codon and amino-acid usage and GC composition within and across genomes. Genome Biol. 2: RESEARCH0010.
Konecny, J., Schoniger, M., Hofacker, I., Weitze, M.D., and Hofacker, G.L. 2000. Concurrent neutral evolution of mRNA secondary structures and encoded proteins. J. Mol. Evol. 50: 238–242.
Lozupone, C., Changayil, S., Majerfeld, I., and Yarus, M. 2003. Selection of the simplest RNA that binds isoleucine. RNA 9: 1315–1322.
Nirenberg, M. 2004. Historical review: Deciphering the genetic code: A personal account. Trends Biochem. Sci. 29: 46–54.
Nirenberg, M.W. and Matthaei, J.H. 1961. The dependence of cell-free protein synthesis in E. coli upon naturally occurring or synthetic polyribonucleotides. Proc. Natl. Acad. Sci. 47: 1588–1602.
Osawa, S., Jukes, T.H., Watanabe, K., and Muto, A. 1992. Recent evidence for evolution of the genetic code. Microbiol. Rev. 56: 229–264.
Segal, E., Fondufe-Mittendorf, Y., Chen, L., Thastrom, A., Field, Y., Moore, I.K., Wang, J.P., and Widom, J. 2006. A genomic code for nucleosome positioning. Nature 442: 772–778.
Seligmann, H. and Pollock, D.D. 2004. The ambush hypothesis: Hidden stop codons prevent off-frame gene reading. DNA Cell Biol. 23: 701–705.
Shpaer, E.G. 1985. The secondary structure of mRNAs from Escherichia coli: Its possible role in increasing the accuracy of translation. Nucleic Acids Res. 13: 275–288.
Sinsheimer, R.L. 1959. Is the nucleic acid message in a 2-symbol code? J. Mol. Biol. 1: 218–220.
Sonneborn, T. 1965. Degeneracy of the genetic code: Extent, nature, and genetic implications. In Evolving genes and proteins (eds. V. Bryson and H. Vogel), pp. 377–397. Academic Press, New York.
Vetsigian, K., Woese, C., and Goldenfeld, N. 2006. Collective evolution and the genetic code. Proc. Natl. Acad. Sci. 103: 10696–10701.
Watson, J.D. and Crick, F.H. 1953. Molecular structure of nucleic acids. A structure for deoxyribose nucleic acid. Nature 171: 737–738.
Woese, C.R. 1965a. On the evolution of the genetic code. Proc. Natl. Acad. Sci. 54: 1546–1552.
Woese, C.R. 1965b. Order in the genetic code. Proc. Natl. Acad. Sci. 54: 71–75.
Woese, C.R., Dugre, D.H., Dugre, S.A., Kondo, M., and Saxinger, W.C. 1966a. On the fundamental nature and evolution of the genetic code. Cold Spring Harb. Symp. Quant. Biol. 31: 723–736.
Woese, C.R., Dugre, D.H., Saxinger, W.C., and Dugre, S.A. 1966b. The molecular basis for the genetic code. Proc. Natl. Acad. Sci. 55: 966–974.
Wong, J.T. 1975. A co-evolution theory of the genetic code. Proc. Natl. Acad. Sci. 72: 1909–1912.
Wong, J.T. 2005. Coevolution theory of the genetic code at age thirty. Bioessays 27: 416–425.
Y Yuan, G.C., Liu, Y.J., Dion, M.F., Slack, M.D., Wu, L.F., Altschuler, S.J., and Rando, O.J. 2005. Genome-scale identification of nucleosome positions in S. cerevisiae. Science 309: 626–630.
Zuker, M. and Stiegler, P. 1981. Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res. 9: 133–148.