Genome Res. 14:54-61, 2004
©2004 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/04 $5.00
Letter
Elevated Rates of Protein Secretion, Evolution, and Disease Among Tissue-Specific Genes
MRC Functional Genetics Unit, Department of Human Anatomy and Genetics, University of Oxford, Oxford OX1 3QX, UK
Variation in gene expression has been held responsible for the functional and morphological specialization of tissues. The tissue specificity of genes is known to correlate positively with gene evolution rates. We show here, using large data sets, that when a gene is expressed highly in a small number of tissues, its protein is more likely to be secreted and more likely to be mutated in genetic diseases with Mendelian inheritance. We find that secreted proteins are evolving at faster rates than nonsecreted proteins, and that their evolutionary rates are highly correlated with tissue specificity. However, the impact of secretion on evolutionary rates is countered by tissue-specific constraints that have been held constant over the past 75 million years. We find that disease genes are underrepresented among intracellular and slowly evolving housekeeping genes. These findings illuminate major selective pressures that have shaped the gene repertoires expressed in different mammalian tissues.
The human body is assembled from >200 cell types present in a variety of tissue types. Variations in gene expression patterns are thought to underlie the morphological differences apparent between different tissue types (King and Wilson 1975
Comparative tissue gene-expression analysis can exploit high-throughput gene-expression data from expressed sequence tag (EST), serial analysis of gene expression (SAGE), and microarray gene-expression systems. In particular, high-quality data sets have been made available by Su and colleagues from 46 human and 45 mouse tissues obtained by use of high-density oligonucleotide microarrays (Su et al. 2002
Studying the evolution of genes has increased our understanding of the selective pressures that have shaped organism fitness (Hughes 1999
EST data have been used previously to show that substitution rates at nonsynonymous sites are strongly negatively correlated with tissue distribution breadth (Duret and Mouchiroud 2000
Several studies have considered the expression of single genes in multiple tissues from a single organism. In contrast, we wished to consider the expression of multiple genes in multiple tissues from two species (human and mouse) in order to investigate functional and evolutionary aspects of tissue biology. To link genetic, cellular, and tissue aspects with models of mammalian gene evolution, we have studied tissue-specific genes with respect to their involvement in disease, protein localization, and evolutionary rates.
Our initial studies investigated possible relationships between the following four quantities: (1) tissue specificity of gene expression, (2) protein secretion, (3) KA/KS ratio, and (4) association with human disease. Previous studies have suggested that the sequences of tissue-specific genes, and gene portions whose products are secreted, tend to be more divergent (Duret and Mouchiroud 2000
Tissue Specificity of Gene Expression
To investigate possible relationships between TS and protein secretion, evolutionary rate, or disease, we divided the abovementioned 4960 genes into five partitions according to their maxTS (Table 1). Partition 1 (0 < maxTS < 0.1) contains housekeeping genes (Warrington et al. 2000
Studies of All Genes
For each of the five maxTS partitions, we calculated three quantities as follows: the median of the genes' KA/KS values, the fraction of predicted (Nielsen et al. 1997
Increasing tissue specificity is also associated with elevated median values of both KA and KS (Table 1). Rate variation in synonymous site substitutions (KS) has been proposed previously to have arisen from nonsynonymous (KA) mutational influences of 5'- and 3'-flanking bases (Bains 1992
To further investigate possible correlations among maxTS, median KA/KS, fsec, and fdis, we analyzed the relationships between each pair of these quantities in turn. When considering protein secretion and evolutionary rate, the set of secreted gene products was found to exhibit a significantly higher median KA/KS value (0.115) than the set of nonsecreted gene products (0.065; Kolmogorov-Smirnov P-value < 2 × 10^-16). When considering protein secretion and disease association, we observed that 39% of the complete set of disease genes encode predicted secreted proteins, compared with only 16.1% of genes that are not known to be associated with disease. In addition, secreted proteins exhibit a greater correlation between median KA/KS and maxTS than do nonsecreted proteins (Fig. 1B,C). This suggests that genes encoding secreted products account for much of the dependency observed between tissue specificity and KA/KS. We also found that the median KA/KS differences between secreted and nonsecreted proteins are significant for each maxTS partition (Fig. 1, legend). This indicates that secretion and KA/KS are highly correlated, irrespective of tissue specificity.
We found no significant difference between the KA/KS distributions for disease genes and nondisease genes (Kolmogorov-Smirnov test probability for the difference = 0.36). However, when measuring the dependency between median KA/KS and tissue specificity, we found that for partition 1 only, disease genes exhibit significantly higher KA/KS values, on average, than nondisease genes (P = 1 × 10^-4; Fig. 1D,E). It thus seems that slowly evolving housekeeping genes are underrepresented in disease.
At first glance, this is surprising, because mutations in highly conserved and ubiquitously expressed genes might be thought to be more liable to cause disease. However, our previous finding that housekeeping genes are more likely to have been subject to strong purifying selection (i.e., have lower KA/KS values) suggests an alternative explanation: housekeeping genes are underrepresented among disease genes because of a higher chance of embryonic lethality when mutated. Thus, we predict that our results reflect prenatal pathology, rather than postnatal disease.
To test this hypothesis, we linked the human genes represented in the five partitions with their probable orthologs in the nematode Caenorhabditis elegans (see Methods). No large-scale targeted-deletion data set is available for mammals. Results from an RNAi screen involving the majority of C. elegans genes (Kamath and Ahringer 2003
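The distribution comparisons reported in this section (for example, KA/KS for secreted versus nonsecreted gene products) use a two-sample Kolmogorov-Smirnov test. The following is a minimal Python sketch of that kind of test, not a reproduction of the statistical software or settings used in the study. The function name `ks_two_sample`, the asymptotic P-value approximation, and the toy input values are all assumptions made for illustration.

```python
import math
from bisect import bisect_right


def ks_two_sample(xs, ys):
    """Return (D, p) for a two-sample Kolmogorov-Smirnov test.

    D is the largest absolute gap between the two empirical CDFs.
    p comes from the asymptotic Kolmogorov series, so it is only an
    approximation, and a rough one for small samples.
    """
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")

    # Both empirical CDFs are step functions, so the largest gap
    # is attained at one of the observed values.
    d = 0.0
    for v in set(xs) | set(ys):
        gap = abs(bisect_right(xs, v) / n - bisect_right(ys, v) / m)
        d = max(d, gap)

    # Asymptotic tail probability:
    # Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 * k^2 * lam^2)
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        # Q(lam) is indistinguishable from 1 here and the truncated
        # series converges slowly, so return 1 directly.
        return d, 1.0
    series = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2) for k in range(1, 101)
    )
    return d, min(1.0, max(0.0, 2.0 * series))


# Invented toy KA/KS values, for illustration only (not data from the study).
secreted = [0.09, 0.11, 0.12, 0.15, 0.21, 0.30]
nonsecreted = [0.02, 0.04, 0.06, 0.07, 0.08, 0.10]
d_stat, p_value = ks_two_sample(secreted, nonsecreted)
print(f"D = {d_stat:.3f}, asymptotic p = {p_value:.4f}")
```

With samples of several thousand genes, as in the partitions analyzed here, the asymptotic approximation is generally adequate; for small samples an exact method would be preferable.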
Studies of Tissue-Specific Genes
Duret and Mouchiroud (2000
Tissue Specificity and Evolutionary Rates
For half of the human tissues considered, TS values vary uniformly with respect to KA/KS (Supplemental Fig. 2a). However, for other tissues, in particular from brain and liver, strong dependencies exist between TS and KA/KS (Supplemental Fig. 2b). For these two tissues, similar dependencies are apparent for either adult or fetal data (Table 3). In contrast, the distributions for the Hep3b-transformed liver cell line and normal liver cells differ significantly. This is likely to be due to the loss of liver-specific characteristics among Hep3b highly expressed genes, as seen by word stem analysis (Supplemental Table 1).
Tissue Biology
We sought to investigate whether tissues exhibit variations in their tissue-specific gene repertoires between human and mouse. We calculated the fraction (fT) of human genes that are specific to a tissue, whose mouse ortholog is also specific to that tissue. Values of fT (Table 2) demonstrate approximately eightfold variation among tissues, with brain and liver, the two tissues that exhibit most variation in protein-coding evolutionary rates, showing the greatest fT values. Thus, although on average, brain-specific proteins evolve most slowly and liver-specific proteins most rapidly, both tissues exhibit high conservation of their tissue-specific gene sets that have been maintained since the common ancestor of human and mouse.
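The fT calculation described above can be sketched as follows. This is an illustrative Python sketch only: the function name `conserved_specificity_fraction`, the input layout, and the gene identifiers are hypothetical, and the sketch assumes that the sets of tissue-specific genes have already been determined by whatever specificity criterion is applied. It also assumes that only human genes with a known mouse ortholog enter the denominator.

```python
def conserved_specificity_fraction(human_specific, mouse_specific, orthologs):
    """Compute f_T per tissue.

    human_specific, mouse_specific: dict mapping tissue -> set of gene ids
        judged specific to that tissue (criterion decided upstream).
    orthologs: dict mapping human gene id -> mouse ortholog gene id.

    Returns a dict mapping tissue -> fraction of tissue-specific human
    genes (with an ortholog) whose mouse ortholog is specific to the
    same tissue, or None when no such human gene exists.
    """
    result = {}
    for tissue, human_genes in human_specific.items():
        # Assumption: genes lacking an ortholog are excluded.
        with_ortholog = [g for g in human_genes if g in orthologs]
        if not with_ortholog:
            result[tissue] = None
            continue
        mouse_genes = mouse_specific.get(tissue, set())
        shared = sum(1 for g in with_ortholog if orthologs[g] in mouse_genes)
        result[tissue] = shared / len(with_ortholog)
    return result


# Hypothetical identifiers, for illustration only.
human = {"liver": {"H1", "H2", "H3", "H4"}, "brain": {"H5"}}
mouse = {"liver": {"M1", "M2"}}
pairs = {"H1": "M1", "H2": "M2", "H3": "M3", "H4": "M4"}
print(conserved_specificity_fraction(human, mouse, pairs))
```

Returning None for tissues without usable genes avoids a division by zero and keeps an undefined fraction distinct from a true value of zero.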
Constancy of Selective Pressures
Three factors that are not mutually exclusive appear to predominate in shaping the repertoires of genes expressed in restricted tissue sets. First, the most rapidly evolving genes have a greater likelihood of being expressed in fewer tissues. Our results, which used large-scale microarray gene-expression data, confirm and extend those from a previous EST-based study (Duret and Mouchiroud 2000
A second factor affecting gene expression in tissues is protein cellular localization. We have shown that there is a strong correlation among tissue specificity, protein secretion, and evolutionary rates. Secreted gene products have significantly higher KA/KS values than nonsecreted gene products and are enriched among tissue-specific genes. To some degree, these observations are consistent with the 'genetic arms race' hypothesis (Dawkins and Krebs 1979
Lastly, gene-expression profiles are largely influenced by tissue-specific biology. For example, brain-specific genes are poorly represented among disease genes, and purifying selection appears to predominate in preserving brain-specific functions, such as cognition and information processing. This is consistent with a model in which mutation of brain-specific genes is more likely to result in embryonic lethality, compared with mutation of genes highly expressed in other tissues (Table 2). Tissue-specific biology and protein-secretion effects are largely independent. For instance, brain-specific genes that encode secreted proteins evolve more slowly than genes encoding secreted proteins from other tissues. Testis-specific genes are associated with elevated KA/KS values, even for intracellular gene products (Table 2). This may be a consequence of sexual selection (Darwin 1871
Our analysis of disease genes has resulted in three findings. Firstly, the frequency of Mendelian-type diseases positively correlates with gene tissue specificity (Table 1). Secondly, the set of disease genes is enriched in genes whose products are secreted. Thirdly, although no significant differences between the evolutionary rates of disease and nondisease genes were detected overall, we found significant differences between these rates among genes with low tissue specificity (i.e., housekeeping genes, partition 1, Fig. 1D,E). These three findings highlight a complex association between disease genes, tissue specificity, and evolutionary rates. We find that slowly evolving genes encoding housekeeping and intracellular proteins are underrepresented in human disease. This is likely to be due to stronger purifying selection acting upon them and the greater chance of embryonic lethality when they are mutated. These genes can thus be regarded as essential to the organism's course of development.
To test this assertion, we considered the association between tissue specificity of human genes and the embryonic and larval lethality exhibited by targeted deletion of their probable orthologs in the nematode C. elegans. Consistent with the positive correlation between tissue specificity and disease, we find that nematode lethal phenotypes are negatively correlated with human-tissue specificity (Table 1). We observed that disease genes are more likely to be highly expressed in tissues such as liver, kidney, or lung, and to have products that are secreted. Consequently, these correlations should assist in the prioritization of candidate disease genes for genetic-association studies. Moreover, the identification of tissue specificity, protein secretion, and tissue-specific biology as main factors influencing gene evolutionary rates should assist investigations into the evolution of individual mammalian tissues and organs.
Mapping Expression Data to Sequence
We used NOVARTIS microarray data (http://expression.gnf.org
Tissue Specificity and KA/KS Values
fsec, fdis
C. elegans RNAi Phenotype Study
Statistics
We thank the Medical Research Council (UK) for financial support. The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.1924004.
1 Corresponding author. [Supplemental material is available online at www.genome.org.]
Alberts, B., Bray, D., Lewis, J., Raff, M., Roberts, K., and Watson, J.D. 1994. Chapter 1. From single cells to multicellular organisms. In Molecular biology of the cell, pp. 26-39. Garland Publishing, Inc., New York.
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389-3402.
Bains, W. 1992. Local sequence dependence of rate of base replacement in mammals. Mutat. Res. 267: 43-54.
Bulmer, M., Wolfe, K.H., and Sharp, P.M. 1991. Synonymous nucleotide substitution rates in mammalian genes: Implications for the molecular clock and the relationship of mammalian orders. Proc. Natl. Acad. Sci. 88: 5974-5978.
Castillo-Davis, C.I. and Hartl, D.L. 2002. Genome evolution and developmental constraint in Caenorhabditis elegans. Mol. Biol. Evol. 19: 728-735.
Clamp, M., Andrews, D., Barker, D., Bevan, P., Cameron, G., Chen, Y., Clark, L., Cox, T., Cuff, J., Curwen, V., et al. 2003. Ensembl 2002: Accommodating comparative genomics. Nucleic Acids Res. 31: 38-42.
Darwin, C. 1871. The descent of man and selection in relation to sex. D. Appleton, New York.
Dawkins, R. and Krebs, J.R. 1979. Arms races between and within species. Proc. R. Soc. Lond. B. Biol. Sci. 205: 489-511.
Duret, L. 2002. Evolution of synonymous codon usage in metazoans. Curr. Opin. Genet. Dev. 12: 640-649.
Duret, L. and Mouchiroud, D. 2000. Determinants of substitution rates in mammalian genes: Expression pattern affects selection intensity but not mutation rate. Mol. Biol. Evol. 17: 68-74.
Giraud, A., Matic, I., Tenaillon, O., Clara, A., Radman, M., Fons, M., and Taddei, F. 2001. Costs and benefits of high mutation rates: Adaptive evolution of bacteria in the mouse gut. Science 291: 2606-2608.
Hughes, A.L. 1999. Adaptive evolution of genes and genomes. Oxford University Press, New York.
Huminiecki, L., Lloyd, A.T., and Wolfe, K.H. 2003. Congruence of tissue expression profiles from Gene Expression Atlas, SAGEmap and TissueInfo databases. BMC Genomics 4: 31.
Jordan, I.K., Kondrashov, F.A., Rogozin, I.B., Tatusov, R.L., Wolf, Y.I., and Koonin, E.V. 2001. Constant relative rate of protein evolution and detection of functional diversification among bacterial, archaeal and eukaryotic proteins. Genome Biol. 2: RESEARCH0053.
Kamath, R.S. and Ahringer, J. 2003. Genome-wide RNAi screening in Caenorhabditis elegans. Methods 30: 313-321.
Kent, W.J. 2002. BLAT - the BLAST-like alignment tool. Genome Res. 12: 656-664.
King, M.C. and Wilson, A.C. 1975. Evolution at two levels in humans and chimpanzees. Science 188: 107-116.
McKusick, V.A. 2000. Online mendelian inheritance in man, OMIM (TM).
Meiklejohn, C.D., Parsch, J., Ranz, J.M., and Hartl, D.L. 2003. Rapid evolution of male-biased gene expression in Drosophila. Proc. Natl. Acad. Sci. 100: 9894-9899.
Menne, K.M., Hermjakob, H., and Apweiler, R. 2000. A comparison of signal sequence prediction methods using a test set of signal peptides. Bioinformatics 16: 741-742.
Mouchiroud, D., Gautier, C., and Bernardi, G. 1995. Frequencies of synonymous substitutions in mammals are gene-specific and correlated with frequencies of nonsynonymous substitutions. J. Mol. Evol. 40: 107-113.
Nielsen, H., Engelbrecht, J., Brunak, S., and von Heijne, G. 1997. Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng. 10: 1-6.
Su, A.I., Cooke, M.P., Ching, K.A., Hakak, Y., Walker, J.R., Wiltshire, T., Orth, A.P., Vega, R.G., Sapinoso, L.M., Moqrich, A., et al. 2002. Large-scale analysis of the human and mouse transcriptomes. Proc. Natl. Acad. Sci. 99: 4465-4470.
Warrington, J.A., Nair, A., Mahadevappa, M., and Tsyganskaya, M. 2000. Comparison of human adult and fetal expression and identification of 535 housekeeping/maintenance genes. Physiol. Genomics 2: 143-147.
Waterston, R.H., Lindblad-Toh, K., Birney, E., Rogers, J., Abril, J.F., Agarwal, P., Agarwala, R., Ainscough, R., Alexandersson, M., An, P., et al. 2002. Initial sequencing and comparative analysis of the mouse genome. Nature 420: 520-562.
Wyckoff, G.J., Wang, W., and Wu, C.I. 2000. Rapid evolution of male reproductive genes in the descent of man. Nature 403: 304-309.
Yang, Z. and Nielsen, R. 2000. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol. Biol. Evol. 17: 32-43.
Zhang, J., Zhang, Y.P., and Rosenberg, H.F. 2002. Adaptive evolution of a duplicated pancreatic ribonuclease gene in a leaf-eating monkey. Nat. Genet. 30: 411-415.
http://expression.gnf.org; Gene Expression Atlas.
http://www.ensembl.org; Project ENSEMBL.
http://genome.ucsc.edu; UCSC Genome Server.
http://www.ncbi.nlm.nih.gov/Omim; Online Mendelian Inheritance in Man (OMIM).
http://www.minitab.com; MINITAB statistical software.
http://www.wormbase.org; WORMBASE database.
Received August 29, 2003; accepted in revised form October 29, 2003.