Genome Res. 13:1828-1837, 2003 ©2003 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/03 $5.00
Letter
A Gene Recommender Algorithm to Identify Coexpressed Genes in C. elegans
1 Department of Statistics, Stanford University, Stanford, California 94305, USA
2 Stanford Medical Informatics, MSOB X-215, Stanford, California 94305, USA
3 Departments of Developmental Biology and Genetics, Stanford University School of Medicine, Stanford, California 94305, USA
One of the most important uses of whole-genome expression data is for the discovery of new genes with similar function to a given list of genes (the query) already known to have closely related function. We have developed an algorithm, called the gene recommender, that ranks genes according to how strongly they correlate with a set of query genes in those experiments for which the query genes are most strongly coregulated. We used the gene recommender to find other genes coexpressed with several sets of query genes, including genes known to function in the retinoblastoma complex. Genetic experiments confirmed that one gene (JC8.6) identified by the gene recommender acts with lin-35 Rb to regulate vulval cell fates, and that another gene (wrm-1) acts antagonistically. We find that the gene recommender returns lists of genes with better precision, for fixed levels of recall, than lists generated using the C. elegans expression topomap.
The genome sequences of several animals have now been determined, revealing that the majority of genes have never been studied before (C. elegans Sequencing Consortium 1998
A previous report compiled data from a large number of C. elegans DNA microarray experiments, and then used a variant of multidimensional scaling to generate a gene expression terrain map (a topomap; Kim et al. 2001
The multidimensional scaling algorithm used to generate the gene expression topomap (Kim et al. 2001
In addition to noise, we must contend with multifunctional genes. Suppose that all of our query genes are expressed in muscle and that some, but not all, of these genes are also expressed in neurons. We would then expect the entire query list to be coregulated in experiments relevant to muscle expression but to be split in experiments related to neurons. Informed by the query list, the gene recommender algorithm would likely include the muscle experiments, exclude the neuronal experiments, and thus produce a hit list consisting of new candidate muscle genes but not neuronal genes. A global clustering approach such as the method used to construct the gene expression topomap would be expected to place a multifunctional gene with just one of its relevant groups, either muscle or neuronal.
The problem addressed by the gene recommender closely resembles that of recommending movies, books, or web documents similar to those in a given list. Commonalities among these problems of diverse origin are outlined at http://www-stat.stanford.edu/~owen/transposable
We tested the gene recommender using a query of five C. elegans genes involved in the retinoblastoma (Rb) complex. The gene recommender produced a short list of genes that are highly coexpressed with the five Rb query genes: three known to interact with Rb, five involved in chromatin structure or the cell cycle (functions similar to those of Rb), and two that we show to have functions related to lin-35 Rb in RNA interference experiments. We compared the performance of the gene recommender algorithm to that of the gene expression topomap, and found that the gene recommender produced candidate lists that were shorter and more concentrated with the query genes.
We present a search algorithm called the gene recommender to find new genes that are coexpressed with a given set of genes using data from a large number of C. elegans microarray experiments. The expression data set consists of 553 DNA microarray hybridizations, including a diverse set of experiments that profile expression changes during development, aging, following stress, in various mutants, and under various growth conditions (Kim et al. 2001
The gene recommender algorithm takes a list of query genes and scores each gene in the genome based on how similar its expression profile is to the expression profiles of the query genes. We use the term "cassette" to refer to a group of genes with a common function of interest. When some, but perhaps not all, of the cassette genes are known to us, we can use the known members as a query to the gene recommender and obtain a rank ordering of all genes. High-ranking genes are then strong candidates for membership in the cassette. The highest-ranking genes constitute a hit list analogous to the high-ranking web pages produced by a search engine. In both cases, there is usually not a sharp demarcation between relevant and irrelevant hits.
The gene recommender first assigns a numerical score, called a Z score, to each experiment measuring the extent to which the query genes tend to cluster within that experiment (Methods). The high-scoring experiments are taken to be the ones that are most relevant to the query. The low-scoring ones may be irrelevant to the query and detrimental to the search for new genes. We use only the high-scoring experiments to rank genes according to their correlation with the query genes. To find the threshold score separating high-scoring from low-scoring experiments, we re-compute the gene ranking using a variety of thresholds and then select a threshold for which the query genes come closest to the top of the list (Methods).
We tested the performance of the gene recommender algorithm using four query lists (Table 1; http://pmgm2.stanford.edu/~kimlab/cassettes
For the sake of brevity, we only describe the results from the Rb query here; the gene recommender performed well with the other queries (presented on the Supplemental Web site). The retinoblastoma complex is a transcription factor complex that is conserved from worms to humans, and is involved in regulating the cell cycle (Dyson 1998
Coregulation of the Rb Query List in DNA Microarray Experiments
A second method to show that there is strong coregulation of the query genes in the microarray data is to determine that each of the genes in the query has a high score in leave-one-out experiments. For each query, one of the genes in the query was left out, the algorithm was rerun on the remaining genes, and the rank obtained by the gene that was left out was then scored. The ranks are scored as percentiles with 100 corresponding to the gene most similar to the query and 0 corresponding to the least similar. As we would expect, the histogram for random queries is nearly uniform over the range from 0% to 100% (Fig. 2). In contrast, the histogram for the real queries has a very large spike between the 95th and 100th percentile and a very small number of genes with much lower scores. This result shows that the genes in the query lists are coregulated, and that the gene recommender algorithm can accurately identify genes based on their level of coregulation.
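The leave-one-out check described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy expression values, and the toy scoring rule are all hypothetical, and we assume that the held-out gene is ranked only against genes outside the reduced query.

```python
from typing import Callable, Dict, List, Sequence


def percentile_rank(scores: Dict[str, float], gene: str) -> float:
    """Percentile of `gene` among the scored genes: 100 = most similar, 0 = least."""
    others = [g for g in scores if g != gene]
    if not others:
        return 100.0
    below = sum(1 for g in others if scores[g] < scores[gene])
    return 100.0 * below / len(others)


def leave_one_out(
    query: Sequence[str],
    score_fn: Callable[[List[str]], Dict[str, float]],
) -> Dict[str, float]:
    """Hold out each query gene in turn, rescore, and record its percentile."""
    result = {}
    for held_out in query:
        reduced = [g for g in query if g != held_out]
        scores = score_fn(reduced)
        # Assumption: rank the held-out gene against non-query genes only.
        candidates = {g: s for g, s in scores.items() if g not in reduced}
        result[held_out] = percentile_rank(candidates, held_out)
    return result


# Hypothetical one-number "expression" per gene, for illustration only.
toy_values = {"a": 1.0, "b": 1.1, "c": 0.9, "d": 5.0, "e": -3.0, "f": 2.0}


def toy_score(reduced_query: List[str]) -> Dict[str, float]:
    """Toy similarity: negative distance to the mean value of the reduced query."""
    center = sum(toy_values[g] for g in reduced_query) / len(reduced_query)
    return {g: -abs(v - center) for g, v in toy_values.items()}


loo = leave_one_out(["a", "b", "c"], toy_score)
```

In this toy setting each held-out gene of the tightly grouped query lands at the 100th percentile; repeating the procedure with randomly drawn queries would give the control histogram.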
An Rb Hit List Generated by the Gene Recommender
We then used the gene recommender to identify other genes that are coexpressed with the five genes in the Rb query list. The gene recommender used 320 of the 553 experiments, including every experiment in which none of the five query genes were missing. The gene recommender was able to cluster the five Rb query genes in a small group of 13 genes (shown in red in Table 2). As a control, the gene recommender did not succeed in clustering query genes into comparably small groups when it used five random samples of five genes (Table 3). This finding is consistent with the random queries having smaller experiment scores (Fig. 2), and indicates that the new candidate genes for the Rb query identified by the gene recommender are unlikely to be due to chance.
Among the top 20 ranked genes in the hit list generated using the Rb query, five are from the Rb query set itself. Three of the remaining 15 genes are also known to interact with Rb. In hindsight, they could reasonably have been included in the query group. Similarly, the top genes in the hit list generated by the gene recommender for the MSP query included some MSP genes that had been overlooked (Supplemental Table 2). The gene at the top of the Rb hit list, dpl-1, has a mutant phenotype similar to that of lin-35 Rb in worms and encodes a protein similar to mammalian DP1, a known Rb-binding protein (Lu and Horvitz 1998
Of the remaining 12 candidate genes, there are currently no data directly confirming whether or not they interact with the Rb complex. However, it is interesting that this set of genes is highly enriched for genes involved in regulation of chromatin structure and the cell cycle, which are functions related to those of the Rb complex. F55A3.7 encodes a protein similar to S. cerevisiae general chromatin factor Spt16p (Rowley et al. 1991
To examine how the genes in the Rb pathway are regulated, we analyzed the expression of the top 20 genes from the Rb hit list in published microarray experiments (Fig. 3). Expression of these genes is enriched in the germ line, and is stronger in oocytes than in sperm (Reinke et al. 2000
RNAi Analysis
To analyze the function of the genes in the Rb hit list, we used RNA interference (RNAi) to induce their loss-of-function phenotypes. During C. elegans vulval development, the Rb complex and a redundant pathway both regulate cell divisions (Ferguson and Horvitz 1989
We used RNAi to specifically inhibit expression of the top 51 genes in the Rb hit list (Table 2 and Supplemental Web site). We induced RNAi by feeding bacterial strains that express dsRNA corresponding to genes in the Rb hit list. We tested the RNAi phenotype for each candidate gene in wild-type worms, a class A mutant strain (lin-8[n111]), and a class B mutant strain (lin-9[n112]). In these experiments, we found that the RNAi treatment did not induce a mutant phenotype in many cases, indicating that this assay may miss some new genes with a class B synMuv phenotype. Specifically, 30 of the 51 genes in the gene recommender hit list were previously known to have phenotypes that we would score as mutant in our RNAi assay (Supplemental Table 1). Of these 30, our RNAi experiments agreed with the previous mutant phenotype for seven genes, showed a weaker mutant phenotype for six genes, and exhibited no mutant phenotype for 17 genes. The RNAi experiments included four out of the five genes included in the Rb query. lin-35(RNAi) and lin-53(RNAi) had the expected class B synMuv phenotype. lin-36(RNAi) and hda-1(RNAi) appeared wild-type in all strains, even though loss-of-function mutations in these two genes result in a class B synMuv phenotype (Ferguson and Horvitz 1989
Of the three genes known to encode Rb-binding proteins, one (dpl-1) had a class B synMuv phenotype in RNAi experiments, consistent with previous results, whereas the other two (K12D12.1 and mcm-7) showed no mutant phenotype using any of the strains (Ceol and Horvitz 2001
Among the remaining 43 genes that we tested, two showed a phenotype indicating that they interact with the lin-35 Rb pathway. RNAi analysis of the 42nd gene in the Rb list (JC8.6) elicited a synMuv phenotype very similar to that of lin-35 Rb (Fig. 4). JC8.6 encodes a protein similar to mammalian tesmin and Arabidopsis TSO1. Although little is known about the function of tesmin, TSO1 plays a role in plant meristem cell division (Sugihara et al. 1999
Gene Recommender Generates a More Specific Hit List Than Does the Gene Expression Topomap
One advantage of targeted clustering over global clustering algorithms is that experiments that do not contribute useful clustering information can be removed; for example, experiments that do not show coordinate regulation of the query list. Either noise or multifunctionality of some query genes can lead to an experiment's removal. To demonstrate the advantages of targeted clustering, we compared hit lists generated by the gene recommender to some hit lists generated using the gene expression topomap. To derive a hit list from the topomap, we located the query genes on the topomap, and then ranked all genes according to their distance from the centroid (average) of the query gene locations.
First, we compared the hit lists produced by the gene recommender and the topomap using a method borrowed from information retrieval. If we knew the complete set of genes in the genome associated with the Rb pathway (the true Rb cassette), then we could compute the precision and recall for any hit list. Precision is the fraction of true Rb genes in a hit list out of the total number of genes in that hit list. Recall is the fraction of true Rb genes in the hit list out of all of the true Rb genes in the genome. There is a precision-recall tradeoff, because increasing the size of a hit list usually lowers precision, but cannot lower recall. For us, precision is more important than recall. Higher precision means a greater chance that subsequent experiments will confirm predictions made by the gene recommender. In contrast, high recall is important when one is more interested in finding all or almost all genes relevant to a query. Because the true status of whether a gene interacts with the Rb complex is usually unknown, we judge the precision of a hit list by the proportion of the Rb query genes near the top of the list.
Specifically, we construct hit lists containing just enough of the highest-ranking genes to obtain a given number of query genes, such that a shorter list is evidence of a more precise result. The true precision cannot be worse than our estimate, but it could be better due to true unknown Rb genes in the list. When comparing algorithms, small differences in estimated precision could arise from our inability to count true Rb genes that were not in the query. On the assumption that Rb genes are rare, large differences in estimated precision, like those shown below, cannot plausibly be due to uncounted true Rb genes.
The set of five Rb query genes is localized in a broad area of mount 11 in the gene expression topomap. This broad area includes not only the five Rb query genes, but also 337 other genes (at 100% recall). In comparison, the top 13 genes from the gene recommender contained all five of the Rb query genes. To capture at least two Rb query genes requires the top six genes of the gene recommender list, but requires the top 138 genes from the topomap list (50% recall; Table 5). In addition to the Rb query list, the gene recommender provided a shorter list of candidate genes for each of the four sets of query genes compared to the list generated by the gene expression topomap (Supplemental Web site). These results demonstrate that the clusters generated by the gene recommender algorithm are tighter than those created by the gene expression topomap.
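The list-length comparison can be made concrete with a short sketch. The helper names and the toy ranking below are hypothetical; the last two lines only redo the arithmetic on figures quoted in the text (five query genes in the top 13 for the gene recommender, versus five query genes plus 337 other genes for the topomap area), treating estimated precision as query genes found divided by list length.

```python
from typing import Optional, Sequence


def list_length_for_hits(
    ranking: Sequence[str], query: Sequence[str], n_hits: int
) -> Optional[int]:
    """Length of the shortest top-of-list prefix containing n_hits query genes.

    Returns None if the ranking holds fewer than n_hits query genes.
    Assumes n_hits >= 1.
    """
    query_set = set(query)
    found = 0
    for position, gene in enumerate(ranking, start=1):
        if gene in query_set:
            found += 1
            if found == n_hits:
                return position
    return None


def estimated_precision(n_hits: int, list_length: int) -> float:
    """Lower-bound estimate: only query genes are counted as true members."""
    return n_hits / list_length


# Hypothetical ranking, best hit first.
toy_ranking = ["q1", "x1", "q2", "x2", "x3", "q3"]
toy_query = ["q1", "q2", "q3"]
length_for_two = list_length_for_hits(toy_ranking, toy_query, 2)
length_for_all = list_length_for_hits(toy_ranking, toy_query, 3)

# Arithmetic on the figures quoted in the text.
recommender_precision = estimated_precision(5, 13)        # about 0.38
topomap_precision = estimated_precision(5, 5 + 337)       # about 0.015
```

Because unknown true cassette members in the list are counted as misses, this estimate can only understate the true precision, which is the sense in which the text calls it a lower bound.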
If a candidate gene showed tighter clustering with genes in a second pathway or if there were more experiments showing clustering with a second pathway, then a global clustering approach would include the candidate gene together with the genes in the second pathway instead of with the query genes. However, the gene recommender could cluster the candidate gene along with the genes in the Rb complex, because it only scores interactions with the query genes. We determined whether the gene recommender found genes that were missed by the gene expression topomap. Among the top 15 candidate genes in the gene recommender hit list, 11 are clustered along with the Rb genes in mount 11 on the gene expression topomap. However, four (K12D12.1, T16G12.5, F55A3.7, and drp-1) do not cluster with the Rb genes at all but are clustered together in mount 5 (Kim et al. 2001
The entire C. elegans research community is aided by functional genomics approaches, such as whole-genome expression profiling, global RNAi analysis, and high-throughput yeast two-hybrid analysis (Kim 2001
As many researchers use microarray data as the basis for designing genetic experiments, it is critical to develop better algorithms in order to better discern inherent biological patterns. Better algorithms in this setting have greater specificity and result in fewer false positives, thus saving time and expense in follow-up experiments. Here, we present an algorithm called the gene recommender to identify genes that are coexpressed with a given set of genes of interest. For identifying new genes coexpressed with a known set of genes, this algorithm has a number of advantages over global approaches used previously.
First, the gene recommender selects microarray experiments that are informative (i.e., showing tight coregulation of the query genes) and ignores uninformative experiments that would otherwise add noise to the calculations. As a result, it generates hit lists that are shorter and more concentrated with query genes than lists generated by the gene expression topomap.
Second, the gene recommender can find interactions for genes that are in multiple clusters. Genes that are multifunctional, that are expressed at different times during development, or that are expressed in different tissues may interact with multiple different pathways. Global clustering approaches such as hierarchical clustering or multidimensional scaling would place such multifunctional genes into the strongest cluster and would thus lose interactions with other clusters. For example, the Rb hit list generated by the gene recommender included four genes (K12D12.1, T16G12.5, F55A3.7, and drp-1) that were not clustered with Rb by the gene expression topomap, but were instead clustered together along with other genes in mount 5.
While our work was in progress, a similar strategy (termed the signature algorithm) was developed independently by Ihmels et al. (2002
In addition to the examples shown here, the gene recommender can be used by the research community to find interacting genes, both in C. elegans and in other organisms once gene expression databases have been assembled (http://pmgm2.stanford.edu/~kimlab/cassettes
In some cases, a single query list might be divided into subgroups that have distinct gene expression profiles. For example, the six genes in the meiotic recombination/DNA repair query could be split into a repair group and a recombination group. In some instances, it might be useful to run the gene recommender on the separate groups instead of the entire list; using the entire list could cause both subgroups to be averaged together, resulting in a loss of specificity. For the analysis discussed here, we found that queries ranging from five to 41 genes were reliably distinct from queries of the same size using random genes. However, queries using only three genes were often not distinct from random queries. These conclusions are specific to the data set used in the present study, and thus these findings regarding usable sizes of queries may change as more C. elegans expression data are added or for calculations on expression data from other organisms. By performing the controls reported here, one can evaluate whether the hit list generated by the gene recommender using a real query is significantly different from a control using random data.
New Genes That Interact With the Rb Complex
wrm-1 encodes a
Neither WRM-1 activity nor Rb activity is thought to be directly controlled by transcriptional regulation: WRM-1 is a homolog of
There are examples of other Wnt signaling pathways that are known to antagonize the Rb pathway. In mouse breast cancers, Wnt acts as an oncogene by turning on a pathway involving
Algorithm
Within the basic outline of our algorithm there is scope for variation. The choices we made were influenced by several factors. First, we wanted our method to be usable even with a large amount of missing data. Second, we put greater priority on precision than recall. Third, we put greater priority on finding cassette members not in the query list than on identifying possibly incorrect members of the query list. Some other choices were made for statistical simplicity and computational efficiency. For example, some of the simple statistics we use have distributions that are easily analyzed in ideal settings, giving a rough guide to, and a benchmark for, their behavior on real data. We think that the choices we made are appropriate, but we do not claim that any set of choices is compelling.
Normalization
The data are normalized by taking their ranks within rows. Let pi be the number of non-missing values among Yij for j=1,..., p. Let Rij be the rank of Yij among the non-missing values in Yi1,...,Yip. That is, Rij=1 for the smallest non-missing value, 2 for the second smallest, and so on. The transformed values are
To simplify notation, we now suppose that Yij are themselves the rank transformed values previously denoted by Y'ij. Rank transformation has several advantages. First, it diminishes the effects of outliers. Second, the data values for each gene have mean zero and variance 1/3. As a consequence, the correlation between the (ranked) expression levels of genes i and i' is linearly related to the sum
The non-missing data in each row could be replaced by quantiles for distributions other than U(-1,1). For example, normal scores
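A minimal sketch of the within-row rank normalization follows. The exact transformation is not reproduced in this text, so the mapping used here, (2R - p - 1)/p for rank R among p non-missing values, is an assumption chosen to spread the ranks evenly over (-1, 1) with mean zero and variance close to 1/3. The function name and the tie-breaking rule are likewise hypothetical.

```python
from typing import List, Optional


def rank_normalize(row: List[Optional[float]]) -> List[Optional[float]]:
    """Replace the non-missing values of one gene's row by scaled ranks.

    Missing values (None) stay missing. Rank 1 goes to the smallest value.
    Assumed mapping: (2 * rank - p - 1) / p, where p counts non-missing values.
    """
    present = [j for j, value in enumerate(row) if value is not None]
    p = len(present)
    out: List[Optional[float]] = [None] * len(row)
    # sorted() is stable, so ties are broken by column order (an assumption;
    # the text does not say how ties are handled).
    for rank, j in enumerate(sorted(present, key=lambda idx: row[idx]), start=1):
        out[j] = (2 * rank - p - 1) / p
    return out


normalized = rank_normalize([0.5, None, -2.0, 3.0, 1.0])
# Four non-missing values map to -0.75, -0.25, 0.25, 0.75 in rank order.
```

Under this assumed mapping the variance is (p² - 1)/(3p²), which approaches 1/3 as p grows.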
Experiment Scoring
For the j'th experiment, let
Q. The score itself is taken to be missing if kj is too small. The minimum value of kj is usually 5, but when the number |Q| of genes in Q is below 5, then |Q| becomes the lower limit on kj.
The informative experiments are considered to be those with ZE(j) far from zero. This score combines a preference for experiments with extreme expression levels (very large or very small
As a rough guide, if experiment j is completely irrelevant to Q, then we would expect the values Yij to be a random sample of k values without replacement from the uniform distribution on (-1, 1). The distribution of ZE(j) is then very nearly Student's t on k-1 degrees of freedom (assuming k<<n), which in turn is close to N(0,1) unless k is very small.
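The experiment score can be sketched as a one-sample t statistic on the query genes' normalized values within one experiment. The defining formula is not reproduced in this text, so the form sqrt(k) × mean / standard deviation is an assumption, chosen because it has the Student's t distribution on k-1 degrees of freedom mentioned above. The function name and the handling of zero spread are hypothetical; the minimum-count rule follows the text.

```python
import math
import statistics
from typing import Optional, Sequence


def experiment_score(
    query_values: Sequence[Optional[float]],
    query_size: int,
    min_present: int = 5,
) -> Optional[float]:
    """Assumed form of ZE(j): sqrt(k) * mean / sd over non-missing query values.

    Returns None (missing) when too few query genes were measured. The lower
    limit is min_present, or the query size when the query is smaller.
    """
    present = [v for v in query_values if v is not None]
    k = len(present)
    if k < min(min_present, query_size) or k < 2:
        return None
    sd = statistics.stdev(present)  # sample standard deviation, k - 1 divisor
    if sd == 0.0:
        return None  # assumption: treat a zero-spread experiment as unscorable
    return math.sqrt(k) * statistics.mean(present) / sd


# Query genes all strongly up in this experiment: large positive score.
z_tight = experiment_score([0.8, 0.9, 0.7, 0.85, 0.75], query_size=5)
# Query genes scattered around zero: score near zero.
z_scattered = experiment_score([-0.9, 0.9, -0.3, 0.3, 0.0], query_size=5)
# Only two of five query genes measured: score is missing.
z_missing = experiment_score([0.8, None, 0.9, None, None], query_size=5)
```

A statistic of this form is large in magnitude when the query genes are both extreme and tightly grouped, which matches the two preferences described for the score.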
Gene Scoring
There is some randomness in whether Yij will cluster near -1 or 1, due to "sign noise" in the original expression data. If two experimenters make different choices for which sample goes in the red versus green channels, then those experimenters' expression measurements will tend to have the same magnitude but opposite signs. This sign noise can have a severe effect on correlations but tends to have less effect on the uncentered correlation.
The score SG(i) is our measure of the extent to which gene i matches the query Q. We also compute a Z score
for which neither Q,j nor Yij are missing. For nonquery genes, i ∉ Q, if Yij are randomly assigned independently of Q,j, then ZG(i) has approximately an N(0,1) distribution. We use SG as our guide to biological significance and ZG as a rough guide to statistical significance. In the absence of missing data, the ratio ZG(i)/SG(i) is the same for all genes i. When there are many missing values and their number varies from gene to gene, then |ZG| tends to be much larger for genes with fewer missing values.
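The following sketch illustrates gene scoring with an uncentered correlation. The formulas for SG(i) and ZG(i) are not reproduced in this text, so both forms below are assumptions and the paper's exact normalization may differ. Here SG is the uncentered correlation between a gene's row and the per-experiment query means over the selected experiments, and ZG standardizes the same cross-product under the null model in which each Yij is uniform on (-1, 1) with variance 1/3. All names are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def gene_scores(
    gene_row: Sequence[Optional[float]],
    query_means: Sequence[Optional[float]],
    selected: Sequence[int],
) -> Tuple[Optional[float], Optional[float]]:
    """Return (SG, ZG) for one gene over the selected experiments.

    Experiments where either the gene value or the query mean is missing
    are skipped. Both formulas are assumed, not taken from the source.
    """
    cross = ss_gene = ss_query = 0.0
    for j in selected:
        y, q = gene_row[j], query_means[j]
        if y is None or q is None:
            continue
        cross += q * y
        ss_gene += y * y
        ss_query += q * q
    if ss_gene == 0.0 or ss_query == 0.0:
        return None, None
    s_g = cross / math.sqrt(ss_gene * ss_query)   # uncentered correlation
    z_g = cross / math.sqrt(ss_query / 3.0)       # null variance of cross-product
    return s_g, z_g


experiments = [0, 1, 2, 3]
s1, z1 = gene_scores([0.5, -0.5, None, 0.9], [1.0, -1.0, 0.3, None], experiments)
# Sign noise: flip the channel assignment of experiment 1 for every gene.
s2, z2 = gene_scores([0.5, 0.5, None, 0.9], [1.0, 1.0, 0.3, None], experiments)
```

Flipping the sign of an entire experiment changes neither score, because each term q × y is unchanged. A centered (Pearson) correlation on the same two rows would change, which is the motivation the text gives for the uncentered form.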
Threshold Selection
While it is intuitively reasonable that including irrelevant and noisy experiments can degrade the performance of gene scoring, there is not a good statistical argument to select a priori a Z-score above which an experiment will improve the search performance, and below which an experiment will degrade the search. The combined effect of several not-quite significant experiments may be informative, because lack of significance is a lack of proven relevance, and not necessarily a lack of relevance. Our approach is to explore a small grid of threshold values. We select the threshold Z to minimize the number of genes i ∉ Q that score higher than the median score of genes i ∈ Q. Our rationale is that a good set of experiments should bring the known members of Q to the top of the list. Our interest in specificity and belief that there may only be a small number of true unknown Q members lead us to prefer a small number of non-Q genes near the top of the list. Though the threshold was chosen to bring the original query to the top of the list, we see from Figure 2 that the query genes get high ranks even in leave-one-out experiments.
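The grid search over thresholds can be sketched as below. The criterion follows the text: count the non-query genes that score above the median query-gene score, and keep the threshold with the smallest count. The function names, the tie-breaking rule (first threshold in the grid wins), and the toy score tables are hypothetical.

```python
import statistics
from typing import Callable, Dict, Iterable, Optional, Sequence, Tuple


def nonquery_above_query_median(scores: Dict[str, float], query: Sequence[str]) -> int:
    """Number of non-query genes scoring higher than the median query gene."""
    query_set = set(query)
    median_q = statistics.median(scores[g] for g in query_set if g in scores)
    return sum(1 for g, s in scores.items() if g not in query_set and s > median_q)


def select_threshold(
    grid: Iterable[float],
    score_fn: Callable[[float], Dict[str, float]],
    query: Sequence[str],
) -> Tuple[Optional[float], Optional[int]]:
    """Try each threshold and keep the one with the fewest intruding genes."""
    best_threshold, best_count = None, None
    for threshold in grid:
        count = nonquery_above_query_median(score_fn(threshold), query)
        if best_count is None or count < best_count:  # ties: keep the earlier one
            best_threshold, best_count = threshold, count
    return best_threshold, best_count


# Hypothetical gene scores obtained at three experiment-score thresholds.
toy_scores = {
    0.0: {"q1": 0.9, "q2": 0.2, "q3": 0.5, "x": 0.7, "y": 0.6, "z": 0.1},
    1.0: {"q1": 0.9, "q2": 0.8, "q3": 0.7, "x": 0.85, "y": 0.3, "z": 0.1},
    2.0: {"q1": 0.9, "q2": 0.85, "q3": 0.8, "x": 0.4, "y": 0.3, "z": 0.2},
}
best = select_threshold([0.0, 1.0, 2.0], lambda t: toy_scores[t], ["q1", "q2", "q3"])
```

In practice `score_fn` would rescore all genes using only the experiments whose score magnitude passes the threshold; here it is a lookup so that the selection logic can be read in isolation.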
Comparing Hit Lists
Likelihood Ratios
µ0 and σ0 are the mean and standard deviation computed from the SG(i) scores of the background distribution, and µQ and σQ are the mean and standard deviation computed from the SG(i) scores of the query genes. We report the likelihood ratio e^L(i) for each gene. We include all nonquery genes in the background distribution. Because some genes may have been placed erroneously into the query set, we only include query genes in the query distribution if their SG(i) scores are in the 90th percentile among all genes. These ratios should be used with caution, because the normal approximation makes the likelihood ratios nonmonotonic with respect to SG(i). Extremely high-scoring genes can have smaller likelihood ratios than genes with correspondingly smaller scores. However, this causes little problem, because the likelihood ratios are used to find an intuitive cutoff between the background and query distributions where the ratios are usually monotonically increasing in SG(i). We also count and record the number of non-Q genes, if any, scoring higher than all Q members as well as the number scoring higher than at least one Q member.
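The nonmonotonic behavior noted above can be seen in a small sketch. We assume that L(i) is the log of the ratio of two normal densities evaluated at SG(i), one fitted to the query scores and one to the background scores; the defining formula is not reproduced in this text, so this form is an assumption. The function names and the numerical parameter values are hypothetical and chosen only for illustration.

```python
import math


def log_normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Log density of N(mu, sigma^2) at x."""
    return -math.log(sigma) - 0.5 * math.log(2.0 * math.pi) - 0.5 * ((x - mu) / sigma) ** 2


def log_likelihood_ratio(
    score: float, mu_q: float, sigma_q: float, mu_0: float, sigma_0: float
) -> float:
    """Assumed L(i): log query density minus log background density at the score."""
    return log_normal_pdf(score, mu_q, sigma_q) - log_normal_pdf(score, mu_0, sigma_0)


# Hypothetical fit: a narrow query distribution sitting above a wide background.
params = dict(mu_q=0.6, sigma_q=0.05, mu_0=0.0, sigma_0=0.2)
l_low = log_likelihood_ratio(0.3, **params)
l_mid = log_likelihood_ratio(0.6, **params)
l_high = log_likelihood_ratio(0.9, **params)
```

With these values the ratio peaks near the query mean and falls again for the highest score, because the narrow query density decays faster in its upper tail than the wide background density. This is why the ratios are best read only in the region between the two distributions.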
Similar Algorithms
RNAi Technique
Competing Interests Statement
We thank Laura Lazzeroni for helpful discussions. This work was supported by grants from NIH GMS and NCRR to S.K.K. and by the NSF (DMS-0072445) to A.B.O. The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.1125403.
4 Corresponding authors. E-MAIL art{at}stat.stanford.edu; FAX (650)725-8977. E-MAIL kim{at}cmgm.stanford.edu; FAX (650)725-7739. [Supplemental material is available online at www.genome.org.]
Bhat, U.G., Raychaudhuri, P., and Beck, W.T. 1999. Functional interaction between human topoisomerase II
Brown, P.O. and Botstein, D. 1999. Exploring the new world of the genome with DNA microarrays. Nat. Genet. 21: 33-37.
The C. elegans Sequencing Consortium. 1998. Genome sequence of the nematode C. elegans: A platform for investigating biology. Science 282: 2012-2018.
Ceol, C.J. and Horvitz, H.R. 2001. dpl-1 DP and efl-1 E2F act with lin-35 Rb to antagonize Ras signaling in C. elegans vulval development. Mol. Cell 7: 461-473.
Chase, D., Serafinas, C., Ashcroft, N., Kosinski, M., Longo, D., Ferris, D.K., and Golden, A. 2000. The polo-like kinase PLK-1 is required for nuclear envelope breakdown and the completion of meiosis in Caenorhabditis elegans. Genesis 26: 26-41.
Dyson, N. 1998. The regulation of E2F by pRB-family proteins. Genes & Dev. 12: 2245-2262.
Eisen, M.B., Spellman, P.T., Brown, P.O., and Botstein, D. 1998. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. 95: 14863-14868.
Eisenmann, D.M., Maloof, J.N., Simske, J.S., Kenyon, C., and Kim, S.K. 1998. The
Ferguson, E.L. and Horvitz, H.R. 1989. The multivulva phenotype of certain Caenorhabditis elegans mutants results from defects in two functionally redundant pathways. Genetics 123: 109-121.
Fraser, A.G., Kamath, R.S., Zipperlen, P., Martinez-Campos, M., Sohrmann, M., and Ahringer, J. 2000. Functional genomic analysis of C. elegans chromosome I by systematic RNA interference. Nature 408: 325-330.
Friedman, J.H. and Meulman, J.J. 2002. Clustering objects on subsets of attributes. Technical Report, Stanford University, Statistics.
Gasch, A.P. and Eisen, M.B. 2002. Exploring the conditional coregulation of yeast gene expression through fuzzy k-means clustering. Genome Biol. 3: RESEARCH0059.1-RESEARCH0059.22.
Hagstrom, K.A., Holmes, V.F., Cozzarelli, N.R., and Meyer, B.J. 2002. C. elegans condensin promotes mitotic chromosome architecture, centromere organization, and sister chromatid segregation during mitosis and meiosis. Genes & Dev. 16: 729-742.
Hauser, B.A., He, J.Q., Park, S.O., and Gasser, C.S. 2000. TSO1 is a novel protein that modulates cytokinesis and cell expansion in Arabidopsis. Development 127: 2219-2226.
Hughes, T.R., Marton, M.J., Jones, A.R., Roberts, C.J., Stoughton, R., Armour, C.D., Bennett, H.A., Coffey, E., Dai, H., He, Y.D., et al. 2000. Functional discovery via a compendium of expression profiles. Cell 102: 109-126.
Ihmels, J., Friedlander, G., Bergmann, S., Sarig, O., Ziv, Y., and Barkai, N. 2002. Revealing modular organization in the yeast transcriptional network. Nat. Genet. 31: 370-377.
Jiang, M., Ryu, J., Kiraly, M., Duke, K., Reinke, V., and Kim, S.K. 2001. Genome-wide analysis of developmental and sex-regulated gene expression profiles in Caenorhabditis elegans. Proc. Natl. Acad. Sci. 98: 218-223.
Kim, S.K. 2001. Http://C. elegans: Mining the functional genomic landscape. Nat. Rev. Genet. 2: 681-689.
Kim, S.K., Lund, J., Kiraly, M., Duke, K., Jiang, M., Wylie, B.N., and Davidson, G.S. 2001. A gene expression map for C. elegans. Science 293: 2087-2092.
Kornfeld, K. 1997. Vulval development in Caenorhabditis elegans. Trends Genet. 13: 55-61.
Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., et al. 2001. Initial sequencing and analysis of the human genome. Nature 409: 860-921.
Lu, X. and Horvitz, H.R. 1998. lin-35 and lin-53, two genes that antagonize a C. elegans Ras pathway, encode proteins similar to Rb and its binding protein RbAp48. Cell 95: 981-991.
Lund, J., Tedesco, P., Duke, K., Wang, J., Kim, S.K., and Johnson, T.E. 2002. Transcriptional profile of aging in C. elegans. Curr. Biol. 12: 1566-1573.
Mateos, A., Dopazo, J., Jansen, R., Tu, Y., Gerstein, M., and Stolovitzky, G. 2002. Systematic learning of gene functional classes from DNA array expression data by using multilayer perceptrons. Genome Res. 12: 1703-1715.
Myers, E.W., Sutton, G.G., Delcher, A.L., Dew, I.M., Fasulo, D.P., Flanigan, M.J., Kravitz, S.A., Mobarry, C.M., Reinert, K.H., Remington, K.A., et al. 2000. A whole-genome assembly of Drosophila. Science 287: 2196-2204.
Nusse, R., van Ooyen, A., Cox, D., Fung, Y.K., and Varmus, H. 1984. Mode of proviral activation of a putative mammary oncogene (int-1) on mouse chromosome 15. Nature 307: 131-136.
Oegema, K., Desai, A., Rybina, S., Kirkham, M., and Hyman, A.A. 2001. Functional analysis of kinetochore assembly in Caenorhabditis elegans. J. Cell. Biol. 153: 1209-1226.
Pavlidis, P., Lewis, D.P., and Noble, W.S. 2002. Exploring gene expression data with class scores. Pac. Symp. Biocomput.: 474-485.
Piano, F., Schetter, A.J., Morton, D.G., Gunsalus, K.C., Reinke, V., Kim, S.K., and Kemphues, K.J. 2002. Gene clustering based on RNAi phenotypes of ovary-enriched genes in C. elegans. Curr. Biol. 12: 1959-1964.
Reinke, V., Smith, H.E., Nance, J., Wang, J., Van Doren, C., Begley, R., Jones, S.J., Davis, E.B., Scherer, S., Ward, S., et al. 2000. A global profile of germline gene expression in C. elegans. Mol. Cell 6: 605-616.
Rocheleau, C.E., Yasuda, J., Shin, T.H., Lin, R., Sawa, H., Okano, H., Priess, J.R., Davis, R.J., and Mello, C.C. 1999. WRM-1 activates the LIT-1 protein kinase to transduce anterior/posterior polarity signals in C. elegans. Cell 97: 717-726.
Rowley, A., Singer, R.A., and Johnston, G.C. 1991. CDC68, a yeast gene that affects regulation of cell proliferation and transcription, encodes a protein with a highly acidic carboxyl terminus. Mol. Cell. Biol. 11: 5718-5726.
Roy, P.J., Stuart, J.M., Lund, J., and Kim, S.K. 2002. Chromosomal clustering of muscle-expressed genes in Caenorhabditis elegans. Nature 418: 975-979.
Schaller, A., Martin, F., and Muller, B. 1997. Characterization of the calf thymus hairpin-binding factor involved in histone pre-mRNA 3' end processing. J. Biol. Chem. 272: 10435-10441.
Solari, F. and Ahringer, J. 2000. NURD-complex genes antagonize Ras-induced vulval development in Caenorhabditis elegans. Curr. Biol. 10: 223-226.
Song, J.Y., Leung, T., Ehler, L.K., Wang, C., and Liu, Z. 2000. Regulation of meristem organization and cell division by TSO1, an Arabidopsis gene with cysteine-rich repeats. Development 127: 2207-2217.
Sterner, J.M., Dew-Knight, S., Musahl, C., Kornbluth, S., and Horowitz, J.M. 1998. Negative regulation of DNA replication by the retinoblastoma protein is mediated by its association with MCM7. Mol. Cell. Biol. 18: 2748-2757.
Sugihara, T., Wadhwa, R., Kaul, S.C., and Mitsui, Y. 1999. A novel testis-specific metallothionein-like protein, tesmin, is an early marker of male germ cell differentiation. Genomics 57: 130-136.
Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W., Mural, R.J., Sutton, G.G., Smith, H.O., Yandell, M., Evans, C.A., Holt, R.A., et al. 2001. The sequence of the human genome. Science 291: 1304-1351.
Walhout, A.J., Reboul, J., Shtanko, O., Bertin, N., Vaglio, P., Ge, H., Lee, H., Doucette-Stamm, L., Gunsalus, K.C., Schetter, A.J., et al. 2002. Integrating interactome, phenome, and transcriptome mapping data for the C. elegans germline. Curr. Biol. 12: 1952-1958.
Wang, J. and Kim, S.K. 2003. Global analysis of dauer gene expression in Caenorhabditis elegans. Development 130: 1621-1634.
Willert, K. and Nusse, R. 1998.
Yeung, K.Y., Haynor, D.R., and Ruzzo, W.L. 2001. Validating clustering for gene expression data. Bioinformatics 17: 309-318.