Genome Res. 14:742-749, 2004 ©2004 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/04 $5.00
Resources
Applications of a Rat Multiple Tissue Gene Expression Data Set
1 Genomics Institute of the Novartis Research Foundation, San Diego, California 92121, USA
2 Nervous System Research, Novartis Pharma AG, 4002 Basel, Switzerland
3 Department of Psychiatry, UT Southwestern Medical Center, Dallas, Texas 75390, USA
With the sequencing and assembly of the rat genome comes the difficult task of assigning functions to genes. Tissue localization of gene expression gives some information about the potential role of a gene in physiology. Various examples of the utility of multiple tissue gene expression data sets are illustrated here. First, we highlight their use in finding genes that might play an important role in a particular tissue on the basis of exclusive expression in that tissue or coexpression with a gene or genes with known function. Second, we show how these data might be used to explain known phenotypic differences between strains. Third, we show how expression patterns of genes in a genomic interval might identify candidate genes in quantitative trait loci (QTL) mapping studies. Lastly, we show how multiple tissue and species data can help researchers prioritize follow-up studies to microarray experiments. All of these applications of multiple tissue gene expression data sets will play a role in functionally annotating the rat genome.
The rat is preferred over the mouse for modeling some aspects of human physiology and disease because of its larger size, more complex metabolism, and advanced intelligence. Rats were the first mammalian species used for scientific research, and some important genetic discoveries were first made in this species (Jacob and Kwitek 2002
Regional gene expression data can be used in many ways to functionally annotate the genome (Su et al. 2002
Here, we examine normal physiological expression levels of
Analysis of Strain Differences
Sprague Dawley, Wistar, and Wistar Kyoto rats are some of the more common albino strains used to model human disease and physiology. Phenotypic differences that exist between the strains tend to complicate their use as models for human disease. However, these differences will ultimately reveal physiological roles of some genes. Here, we compare gene-expression differences between strains in a brain region mediating a known phenotypic difference.
Wistar Kyoto rats show exaggerated responses to a number of stressors, and are thus often used as a model to study the relationship between stress and depression (Rittenhouse et al. 2002
Gene Coexpression Patterns
Coexpression of uncharacterized genes with genes of important physiological function can be examined with this data set. Such an approach has been hypothesized to be effective in assigning functions to genes (Staudt and Brown 2000
A similar search for correlated expression patterns using a seed set yields more sequences with the same general expression pattern. We used a function of Rosetta Resolver (GROW, see Methods) that searches for genes with an expression pattern similar to that of a defined set of genes. As an example, several genes with known roles in striatal function were chosen for a seed set (GenBank accession nos. X57659, X56065, M35077, AB019145). Using this method, many additional sequences with known roles in striatal neurotransmitter signaling are found, including type V adenylyl cyclase (M96159), CaM-PDE (M94537), and a striatal-enriched phosphatase (S49400) (Supplemental Table 1 available online at www.genome.org). In addition, a transcribed sequence of unknown function shows a dramatic enrichment in the nucleus accumbens core and shell, whole nucleus accumbens, and dorsal and ventral striatum of all rat strains analyzed (Affymetrix probe set rc_AI639435_at, accession no. AI639435; Fig. 3). This expression pattern suggests that this unknown sequence may play an important role in striatal function.
Tissue-Enriched Gene Expression
The Web interface for this data set is useful for finding sequences that are uniquely enriched in expression in particular tissues. The Web site tool allows for selection of one to several tissues, setting relative levels of expression between tissues, and specifying whether or not expression should be exclusive for the chosen tissue or set of tissues. The method simply searches for genes on the basis of specified fold expression levels in a tissue over the median expression level across all tissues. Supplemental Table 2 shows the results of a systematic search for sequences enriched in several brain regions over other tissues. Genes known to be enriched in particular regions, like the dopamine transporter in the ventral tegmental area, validate the utility of this tool. Some EST sequences also possess a restricted tissue distribution. For example, both the pituitary and the pineal contain several ESTs that are enriched in those regions. Their restricted expression patterns, similar to those of known genes, suggest that these ESTs play an important role in the physiology of these tissues.
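The fold-over-median search described above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not the code behind the Web site: the function name `enriched_genes`, the default threshold of 3-fold, the gene labels, and all expression values are assumptions made up for the example.

```python
from statistics import median


def enriched_genes(expression, target_tissues, min_fold=3.0, exclusive=False):
    """Return genes enriched in every target tissue.

    A gene is called enriched when its signal in each target tissue is at
    least `min_fold` times its median signal across all tissues. With
    `exclusive=True`, no non-target tissue may reach that threshold.
    """
    hits = []
    for gene, by_tissue in expression.items():
        baseline = median(by_tissue.values())
        if baseline <= 0:
            continue  # a fold change over a non-positive median is undefined
        threshold = min_fold * baseline
        if not all(by_tissue[t] >= threshold for t in target_tissues):
            continue
        if exclusive and any(
            value >= threshold
            for tissue, value in by_tissue.items()
            if tissue not in target_tissues
        ):
            continue
        hits.append(gene)
    return hits


# Hypothetical signal values, invented for illustration only.
expression = {
    "gene_A": {"pituitary": 900.0, "pineal": 40.0, "striatum": 50.0,
               "heart": 45.0, "kidney": 55.0},
    "gene_B": {"pituitary": 100.0, "pineal": 110.0, "striatum": 95.0,
               "heart": 105.0, "kidney": 90.0},
    "gene_C": {"pituitary": 800.0, "pineal": 700.0, "striatum": 60.0,
               "heart": 50.0, "kidney": 55.0},
}

pituitary_any = enriched_genes(expression, ["pituitary"])
pituitary_only = enriched_genes(expression, ["pituitary"], exclusive=True)
```

In this toy data, gene_C is enriched in both pituitary and pineal, so it is returned by the non-exclusive pituitary search but dropped once expression is required to be exclusive to the pituitary.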
Using Gene Expression Information to Find Candidate Genes in Quantitative Trait Loci
Genes responsible for alcohol preference should be expressed in the nucleus accumbens; especially the nucleus accumbens shell region and/or the central nucleus of the amygdala (Koob 1999
Using Gene Expression Information to Prioritize Microarray Follow-Up Experiments
Tissue expression information is a useful guide for the prioritization of follow-up experiments from gene expression studies. An example of the utility of our gene atlas in this role is from a rat model of addiction. Drug addicts and laboratory animals with drug experience, once made abstinent, very often return to consuming the drug once exposed to a previous environment related to drug taking (Stewart 1983 A comparison of the 6WD and 6EXT groups yields several transcripts for follow-up studies on the basis of their known function in particular tissues (Supplemental Table 4). ESTs of unknown function were initially ignored, as we had no further information about them. However, once we examined expression data across a variety of tissues, some ESTs that demonstrated interesting expression patterns could be moved up the priority list for further studies. For example, one sequence (Affymetrix probe set AF055714UTR#1_at, accession no. AF055714), in addition to being the most down-regulated transcript in the comparison, also shows highly enriched expression in the nucleus accumbens core and shell of Sprague Dawley rats (Fig. 5). In addition to its regulation by the behavioral condition of interest, this tissue-restricted pattern suggests a prominent role in the function of the nucleus accumbens.
Comparison of the expression patterns of rat and human orthologous genes can help prioritize follow-up experiments from rat gene-expression studies, as well as determine whether or not rats should be used as models when studying the functions of particular genes. We therefore compiled a list of rat-to-human orthologs using HomoloGene (Supplemental Table 5). We then compared expression data from rat and human across tissues in our data sets. An example of the usefulness of this information is illustrated here. From the cocaine-craving experiment described above, the most differentially expressed gene list (Supplemental Table 4) is compared with a list of genes showing the highest expression correlations between human and rat (from Supplemental Table 5). One of the resulting sequences is BHF-1, also called NeuroD1, a basic helix-loop-helix protein that, upon a preliminary search of the human ortholog in Locus Link (http://www.ncbi.nlm.nih.gov/LocusLink/
We describe a data set and accompanying Web site of rat gene expression across multiple tissues in commonly used strains. Multiple examples of how this data set can reveal interesting potential functions of genes are illustrated. Our public human and mouse gene-expression data (http://expression.gnf.org Potential experimental variables should be considered carefully in these large data sets. This is especially true of public data sets that likely contain samples taken from a variety of laboratories. Although standard Affymetrix procedures were used in the two laboratories where these microarray studies were done, some slight laboratory-specific or operator-specific differences may exist. To minimize these potential differences, care was taken in this study to include only arrays and samples of similar quality (percent present scores, background, actin, and GAPDH 3'/5' ratios). Another potential variable becomes more pronounced when increasingly refined dissections are taken, and when similar dissections are taken from different laboratories and combined into one database. For this reason, detailed descriptions of the dissection procedures should be given when dissection borders are not obvious.
Basal gene expression measurements across multiple tissues can be used as reference data to prioritize follow-up experiments of expression array studies. Multiple simultaneous measurements inherent in expression studies result in many false-positive expression changes. In addition to the false positives, in an organism such as the rat, in which most genes are uncharacterized, some decision has to be made whether or not to pursue further experiments with the uncharacterized transcripts. Researchers often perform sequence analysis to investigate the function of an uncharacterized transcript; however, identifying common domains from sequence only reveals a molecular function and usually reveals no information on a gene's physiological function. An additional source of information is the pattern of gene expression in normal physiological tissues, which often can help the researcher decide whether or not to pursue a particular uncharacterized transcript based on a more complete description of gene function. A scenario can also be imagined where drug targets or diagnostic markers are sought from an expression experiment. Of the desirable expression changes found, some priority should be given to those genes that are expressed in the target or diseased tissue over other organs (Welsh et al. 2003 Regional expression data from multiple species can add an additional level of analysis to results of a microarray study. One should focus on regional expression patterns that are conserved across species, or at least consistent with human expression patterns. In the rat-to-human comparisons shown here, previously unknown discrepancies in expression patterns between the species can be found with several genes (see Interleukin 18, as U133A probe set 206295_at in Supplemental Table 5). 
Caution should be taken, however, in making conclusions when a limited set of tissues is compared, as high or low correlations might be due to the absence of a tissue where the key function of a particular gene is performed. For example, the present data set is lacking a couple of key organs, such as liver and lung. But when considered carefully, these types of comparisons can help a researcher discern whether or not to study a particular gene in a rodent model if the goal is to predict human physiology.
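A rat-to-human comparison of this kind, including the caution about limited tissue sets, might be sketched as follows. This is an illustrative assumption rather than the procedure used for Supplemental Table 5: the function names, the minimum of three shared tissues, and every expression value are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def ortholog_correlation(rat, human, min_shared=3):
    """Correlate one ortholog pair across the tissues present in both species.

    Returns None when fewer than `min_shared` tissues overlap, because a
    correlation computed on very few tissues is not interpretable.
    """
    shared = sorted(set(rat) & set(human))
    if len(shared) < min_shared:
        return None
    return pearson([rat[t] for t in shared], [human[t] for t in shared])


# Hypothetical profiles for one ortholog pair; liver is absent from the rat data.
rat_profile = {"cerebellum": 500.0, "cortex": 300.0, "heart": 20.0, "kidney": 25.0}
human_profile = {"cerebellum": 900.0, "cortex": 450.0, "heart": 30.0,
                 "kidney": 60.0, "liver": 40.0}

r = ortholog_correlation(rat_profile, human_profile)
```

Only tissues present in both data sets enter the calculation, so a gene whose key site of action is a missing tissue (liver in this toy example) can still receive a misleadingly high or low score.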
This gene expression data set can also aid in the hunt for functions of uncharacterized genes by pairing their expression patterns to known genes. In the few examples shown here, genes with expression patterns restricted to particular tissues likely play an important role in the function of those tissues. Likewise, genes that are coexpressed with members of a biochemical pathway might play a role in that pathway. A similar recent study in the malaria parasite Plasmodium falciparum, although using various life-cycle stages instead of different tissues, has proven the utility of such an approach (Le Roch et al. 2003
This data set can also help identify potential candidate genes from published rat QTL studies. The genomic interval given in the example above unveiled several interesting genes based on expression pattern alone (paste probe set identifiers from Supplemental Table 3 into the dialog box on expression.gnf.org/ratlas Lastly, the data set presented here can help to explain some of the phenotypic differences that have been known to exist between common rat strains. In our search for the best model for human disease, we will learn more about the disease by comparing the strains. Some of the phenotypic differences might be predicted from genes of known function. It might also be possible to exploit these strain differences to discover the functions of novel genes, or to discover additional functions of known genes. The public release of this data set coincides with the release of the rat genome sequence. It won't be long until full-transcriptome rat chips are commercially available and larger data sets than the one presented here are generated. These types of reference data sets will speed the characterization of rat gene function, and ultimately, the function of human genes that the rats are meant to model.
Tissue and Microarray Processing
Tissues for expression studies were collected from a variety of sources (Supplemental Table 6). Tissues were homogenized in Trizol (Invitrogen), and total RNA was purified with RNeasy columns (Qiagen). Replicates consisted of either duplicate pools of tissue used for each array, or one tissue from one animal per array (see Supplemental Table 6). Five micrograms of total RNA, or 0.2 µg of poly(A+) RNA, was used for cDNA synthesis and cRNA amplification, and samples were hybridized to RGU34A arrays (Affymetrix) according to standard Affymetrix protocols (Affymetrix Expression Analysis Technical Manual, http://www.affymetrix.com/support/technical/manuals.affx
Training of Drug-Seeking and Nonseeking Rats
Analysis of Differential Gene Expression
For display of differential expression between strains, MAS 5- and Resolver-processed data (see above) were further filtered in Rosetta Resolver's clustering algorithm according to the following criteria: detection P < 0.05 and a present score required for at least three chips, and a coefficient of variation across all samples of at least 0.5. Only 10 sequences remained after these three filtering steps.
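The filtering criteria above could be approximated as follows. This sketch is an interpretation, not Rosetta Resolver's implementation: it assumes that a probe set counts as "present" on a chip when its detection P value is below the cutoff, and it uses the sample standard deviation for the coefficient of variation. The function name and the example values are hypothetical.

```python
from statistics import mean, stdev


def passes_filter(signals, detection_p, p_cutoff=0.05, min_present=3, min_cv=0.5):
    """Apply the two-part filter to one probe set.

    signals      -- signal intensity on each chip
    detection_p  -- detection P value on each chip (same order as signals)

    Assumption: a chip counts as 'present' when its detection P < p_cutoff.
    """
    present_chips = sum(1 for p in detection_p if p < p_cutoff)
    if present_chips < min_present:
        return False
    average = mean(signals)
    if average <= 0:
        return False  # coefficient of variation is undefined
    coefficient_of_variation = stdev(signals) / average
    return coefficient_of_variation >= min_cv


# Hypothetical probe sets, invented for illustration only.
variable_and_present = passes_filter([100.0, 900.0, 120.0, 110.0],
                                     [0.01, 0.001, 0.02, 0.2])
flat_profile = passes_filter([100.0, 105.0, 95.0, 102.0],
                             [0.01, 0.01, 0.01, 0.01])
rarely_detected = passes_filter([100.0, 900.0, 120.0, 110.0],
                                [0.2, 0.01, 0.3, 0.4])
```

Requiring both detection and variability removes probe sets that are either never reliably measured or essentially constant across samples, which is why so few sequences survive.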
Gene Coexpression/Tissue Enrichment
For finding genes coexpressed with a set of striatal-enriched genes, GROW (Rosetta Resolver) was run using this set as a seed set. GROW is a pattern-finding algorithm that searches for additional genes and experiments with a pattern similar to that of the seed set, and essentially performs a two-dimensional (across genes and tissues) Pearson correlation.
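The idea of a seed-set search can be illustrated with a simplified stand-in. GROW itself is part of a commercial package and its internals are not described here, so the sketch below is explicitly not GROW: it averages the seed profiles and ranks the remaining genes by a one-dimensional Pearson correlation across tissues. The function names, the cutoff of r = 0.9, the gene labels, and all values are assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def seed_set_search(profiles, seed_ids, min_r=0.9):
    """Rank non-seed genes by correlation with the mean seed profile.

    profiles -- dict mapping gene id to a list of signals, one per tissue,
                with tissues in the same order for every gene
    """
    n_tissues = len(profiles[seed_ids[0]])
    seed_mean = [
        sum(profiles[g][i] for g in seed_ids) / len(seed_ids)
        for i in range(n_tissues)
    ]
    scored = [
        (gene, pearson(values, seed_mean))
        for gene, values in profiles.items()
        if gene not in seed_ids
    ]
    hits = [(gene, r) for gene, r in scored if r >= min_r]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical profiles; tissue order: striatum, nucleus accumbens, cortex, heart.
profiles = {
    "seed_1": [900.0, 850.0, 100.0, 50.0],
    "seed_2": [700.0, 800.0, 120.0, 40.0],
    "unknown_est": [450.0, 420.0, 60.0, 30.0],
    "flat_gene": [200.0, 210.0, 190.0, 205.0],
}

matches = seed_set_search(profiles, ["seed_1", "seed_2"])
```

Because Pearson correlation ignores absolute scale, the hypothetical `unknown_est` matches the seed pattern even though its signal is roughly half that of the seed genes.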
For finding genes with highly enriched expression in particular tissues, the RAtlas expression pattern interface from the Web site (http://expression.gnf.org/ratlas
Finding Candidate Genes in QTL
Ortholog Comparisons
We thank Mimi Hayakawa for technical assistance. Also, thanks to Christine Sturchler (Novartis) for data file acquisition and Lisa Tarantino for critically reading the manuscript. We thank Teresa Reyes, Tamas Bartfai, Pietro Sanna, Athina Markou, Trevor Young, and Martin Alda for providing some of the tissues. The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.2161804.
4 Corresponding author.
[Supplemental material is available online at www.genome.org. The sequence data from this study have been submitted to GEO (http://www.ncbi.nlm.nih.gov/geo/
Baldessarini, R.J. 1996. Drugs and the treatment of psychiatric disorders. In The pharmacological basis of therapeutics, 9th Ed. (eds. P.B. Molinoff and R.W. Ruddon), pp. 431-459. McGraw-Hill, New York.
Bell, S.M., Reynolds, J.G., Thiele, T.E., Gan, J., Figlewicz, D.P., and Woods, S.C. 1998. Effects of third intracerebroventricular injections of corticotropin-releasing factor (CRF) on ethanol drinking and food intake. Psychopharmacology 139: 128-135.
Bice, P., Foroud, T., Bo, R., Castelluccio, P., Lumeng, L., Li, T.-K., and Carr, L.G. 1998. Genomic screen for QTLs underlying alcohol consumption in the P and NP rat lines. Mamm. Genome 9: 949-955.
Brundege, J.M. and Williams, J.T. 2002. Differential modulation of nucleus accumbens synapses. J. Neurophysiol. 88: 142-151.
Carr, L.G., Foroud, T., Bice, P., Gobbett, T., Ivashina, J., Edenberg, H., Lumeng, L., and Li, T.K. 1998. A quantitative trait locus for alcohol consumption in selectively bred rat lines. Alcohol Clin. Exp. Res. 22: 884-887.
Chung, Y.H., Shin, C.M., Kim, M.J., and Cha, C.I. 2000. Immunohistochemical study on the distribution of six members of the Kv1 channel subunits in the rat basal ganglia. Brain Res. 875: 164-170.
Ehlers, C.L., Chaplin, R.I., Wall, T.L., Lumeng, L., Li, T.K., Owens, M.J., and Nemeroff, C.B. 1992. Corticotropin releasing factor (CRF): Studies in alcohol preferring and non-preferring rats. Psychopharmacology 106: 359-364.
Franklin, A., Kao, A., Tapscott, S., and Unis, A. 2001. NeuroD homologue expression during cortical development in the human brain. J. Child Neurol. 16: 849-853.
Henry, C. and Garcia, R. 2002. Prefrontal cortex long-term potentiation, but no long-term depression, is associated with the maintenance of extinction of learned fear in mice. Neuroscience 22: 577-583.
Hoffman, B.B., Lefkowitz, R.J., and Taylor, P. 1996. Neurotransmission. In The pharmacological basis of therapeutics, 9th Ed. (eds. P.B. Molinoff and R.W. Ruddon), pp. 105-139. McGraw-Hill, New York.
Horikawa, Y., Oda, N., Cox, N.J., Li, X., Orho-Melander, M., Hara, M., Hinokio, Y., Linder, T.H., Mashima, H., Schwarz, P.E., et al. 2000. Genetic variation in the gene encoding calpain-10 is associated with type 2 diabetes mellitus. Nat. Genet. 26: 163-175.
Hugot, J., Chamaillard, M., Zouali, H., Lesage, S., Cezard, J.P., Belaiche, J., Almer, S., Tysk, C., O'Morain, C.A., Gassull, M., et al. 2001. Association of NOD2 leucine-rich repeat variants with susceptibility to Crohn's disease. Nature 411: 599-603.
Jacob, H.J. and Kwitek, A.E. 2002. Rat genetics: Attaching physiology and pharmacology to the genome. Nat. Rev. Genet. 3: 33-42.
Koob, G.F. 1999. The role of the striatopallidal and extended amygdala systems in drug addiction. Ann. NY Acad. Sci. 877: 445-460.
Lahmame, A. and Armario, A. 1996. Differential responsiveness of inbred strains of rats to antidepressants in the forced swimming test: Are Wistar Kyoto rats an animal model of subsensitivity to antidepressants? Psychopharmacology 123: 191-198.
Le Roch, K.G., Zhou, Y., Blair, P.L., Grainger, M., Moch, J.K., Haynes, J.D., De La Vega, P., Holder, A.A., Batalov, S., Carucci, D.J., et al. 2003. Discovery of gene function by expression profiling of the malaria parasite life cycle. Science 301: 1503-1508.
Liang, T., Spence, J., Liu, L., Strother, W.N., Chang, H.W., Ellison, J.A., Lumeng, L., Li, T.K., Foroud, T., and Carr, L.G. 2003.
Liu, M., Pleasure, S.J., Collins, A.E., Noebels, J.L., Naya, F.J., Tsai, M.J., and Lowenstein, D.H. 2000. Loss of BETA2/NeuroD leads to malformation of the dentate gyrus and epilepsy. Proc. Natl. Acad. Sci. 97: 865-870.
McBride, W.J. 2002. Central nucleus of the amygdala and the effects of alcohol and alcohol-drinking behavior in rodents. Pharmacol. Biochem. Behav. 71: 509-515.
Mootha, V.K., Lepage, P., Miller, K., Bunkenborg, J., Reich, M., Hjerrild, M., Delmonte, T., Villeneuve, A., Sladek, R., Xu, F., et al. 2003. Identification of a gene causing human cytochrome c oxidase deficiency by integrative genomics. Proc. Natl. Acad. Sci. 100: 605-610.
Ogura, Y., Bonin, D.K., Inohara, N., Nicolae, D.L., Chen, F.F., Ramos, R., Britton, H., Moran, T., Karaliuskas, R., Duerr, R.H., et al. 2001. A frameshift mutation in NOD2 associated with susceptibility to Crohn's disease. Nature 411: 603-606.
Olive, M.F., Mehmert, K.K., Koenig, H.N., Camarini, R., Kim, J.A., Nannini, M.A., Ou, C.J., and Hodge, C.W. 2002. A role for corticotropin releasing factor (CRF) in ethanol consumption, sensitivity, and reward as revealed by CRF-deficient mice. Psychopharmacology 165: 181-187.
Rajagopalan, D. 2003. A comparison of statistical methods for analysis of high density oligonucleotide array data. Bioinformatics 19: 1469-1476.
Rittenhouse, P.A., Lopez-Rubalcava, C., Stanwood, G.D., and Lucki, I. 2002. Amplified behavioral and endocrine responses to forced swim stress in the Wistar-Kyoto rat. Psychoneuroendocrinology 27: 303-318.
Rivier, C.L., Grigoriadis, D.E., and Rivier, J.E. 2003. Role of corticotropin-releasing factor receptors type 1 and 2 in modulating the rat adrenocorticotropin response to stressors. Endocrinology 144: 2396-2403.
Schmidt, E.F., Sutton, M.A., Schad, C.A., Karanian, D.A., Brodkin, E.S., and Self, D.W. 2001. Extinction training regulates tyrosine hydroxylase during withdrawal from cocaine self-administration. J. Neurosci. 21: RC137:1-RC135.
Self, D.W. and Stein, L. 1992. The D1 agonists SKF 82958 and SKF 77434 are self-administered by rats. Brain Res. 582: 349-352.
Staudt, L.M. and Brown, P.O. 2000. Genomic views of the immune system. Annu. Rev. Immunol. 18: 829-859.
Stewart, J. 1983. Conditioned and unconditioned drug effects in relapse to opiate and stimulant drug self-administration. Prog. Neuropsychopharmacol. Biol. Psychiatry 7: 591-597.
Stuart, J.M., Segal, E., Koller, D., and Kim, S.K. 2003. A gene-coexpression network for global discovery of conserved genetic modules. Science 302: 249-255.
Su, A.I., Cooke, M.P., Ching, K.A., Hakak, Y., Walker, J.R., Wiltshire, T., Orth, A.P., Vega, R.G., Sapinoso, L.M., Moqrich, A., et al. 2002. Large-scale analysis of the human and mouse transcriptomes. Proc. Natl. Acad. Sci. 99: 4465-4470.
Sutton, M.A., Schmidt, E.F., Choi, K.H., Schad, C.A., Whisler, K., Simmons, D., Karanian, D.A., Monteggia, L.M., Neve, R.L., and Self, D.W. 2003. Extinction-induced upregulation in AMPA receptors reduces cocaine-seeking behaviour. Nature 421: 70-75.
Tejani-Butt, S.M., Pare, W.P., and Yang, J. 1994. Effect of repeated novel stressors on depressive behavior and brain norepinephrine receptor system in Sprague-Dawley and Wistar Kyoto (WKY) rats. Brain Res. 649: 27-35.
Terenina-Rigaldie, E., Moisan, M.P., Colas, A., Beauge, F., Shah, K.V., Jones, B.C., and Mormede, P. 2003. Genetics of behaviour: Phenotypic and molecular study of rats derived from high- and low-alcohol consuming lines. Pharmacogenetics 13: 543-554.
Welsh, J.B., Sapinoso, L.M., Kern, S.G., Brown, D.A., Liu, T., Bauskin, A.R., Ward, R.L., Hawkins, N.J., Quinn, D.I., Russell, P., et al. 2003. Large-scale delineation of secreted protein biomarkers overexpressed in cancer tissue and serum. Proc. Natl. Acad. Sci. 100: 3410-3415.
http://expression.gnf.org; GNF GeneAtlas Web site.
http://expression.gnf.org/ratlas; RAtlas link from GNF GeneAtlas Web site.
http://www.ncbi.nlm.nih.gov/LocusLink/; Locus Link.
http://www.affymetrix.com/support/technical/manuals.affx; Affymetrix Web site technical manuals.
http://www.ncbi.nlm.nih.gov/HomoloGene; Homologene.
http://www.affymetrix.com; Affymetrix main page.
http://genome.ucsc.edu; UCSC Genome Bioinformatics Site.
http://www.rosettabio.com/publications/default.htm; Rosetta Biosoftware, publications.
http://ratmap.gen.gu.se/; Ratmap, The Rat Genome Database.
http://www.ncbi.nlm.nih.gov/PubMed/; PubMed.
http://www.ncbi.nlm.nih.gov/geo/; Gene Expression Omnibus home page.
http://symatlas.gnf.org; GNF GeneAtlas Web site.
Received January 19, 2004;
accepted in revised format February 11, 2004.