Vol. 12, Issue 10, 1582-1590, October 2002
METHODS
ABSTRACT
The analysis of large-scale genomic information (such as sequence data or expression patterns) frequently involves grouping genes on the basis of common experimental features. Often, as with gene expression clustering, there are too many groups to easily identify the functionally relevant ones. One valuable source of information about gene function is the published literature. We present a method, neighbor divergence, for assessing whether the genes within a group share a common biological function based on their associated scientific literature. The method uses statistical natural language processing techniques to interpret biological text. It requires only a corpus of documents relevant to the genes being studied (e.g., all genes in an organism) and an index connecting the documents to appropriate genes. Given a group of genes, neighbor divergence assigns a numerical score indicating how "functionally coherent" the gene group is from the perspective of the published literature. We evaluate our method by testing its ability to distinguish 19 known functional gene groups from 1900 randomly assembled groups. Neighbor divergence achieves 79% sensitivity at 100% specificity, comparing favorably to other tested methods. We also apply neighbor divergence to previously published gene expression clusters to assess its ability to recognize gene groups that had been manually identified as representative of a common function.
INTRODUCTION
The availability of genomic sequence and genome-scale data
sets for expression, regulation, and proteomics is
shifting the focus of data analysis from individual genes to families
of genes. Frequently, the analysis of genome-scale experiments results
in the definition of gene groups. For example, gene expression (Eisen et al. 1998), protein sequence (Altschul et al. 1990, 1997), deletion phenotypes (Winzeler et al. 1999; Hughes et al. 2000), and yeast two-hybrid screens (Uetz et al. 2000) can all be used to produce
sets of related genes. Given a set of genes, it is important to
recognize if there is a common functional feature, or if the set is in
some way entirely novel. The large number of genes and their multiple functions prohibit easy manual assessment of common function. A
computational method that detects common function in a set of genes
would be useful, therefore, for assessing the significance of an
experimentally derived gene set and prioritizing those groups that
deserve follow-up. For example, such a method could be used to rapidly
screen large numbers of gene expression clusters and identify
functionally interesting ones.
The published literature contains virtually every important biological development, and much of the literature is accessible in electronic form, often as full text and almost always in abstract form (http://www.ncbi.nlm.nih.gov/PubMed/). Article abstracts about genes can be exploited to predict biological function (Raychaudhuri et al. 2002). We assert that the biological literature (here we use PubMed abstracts) contains the necessary information for assessing whether a group of genes represents a common biological function.
In this paper we propose a novel computational method, neighbor
divergence, that rapidly assesses whether a set of genes shares a
common biological function by automatic analysis of scientific text. It
requires only a corpus of articles relevant to all of the genes being
studied (e.g., all genes appearing on an expression array) and an index
associating the articles to appropriate genes. Such reference lists are
often available from genomic databases (Gelbart et al. 1997; Cherry et al. 1998; Bairoch and Apweiler 1999; Blake et al. 2002) or can be compiled automatically by scanning titles and abstracts of articles for gene names (Jenssen et al. 2001).
An alternative approach to assessing the functional coherence of a gene
group is to cross-reference it against predefined groups of related
genes that have been compiled automatically from the literature or by
manual annotation. Jenssen and colleagues used co-occurrence of gene names in abstracts to create networks of related genes automatically from the literature (Jenssen et al. 2001). They showed that those groups were useful in gene expression analysis. The Gene Ontology (GO) Consortium and Munich Information Center for Protein Sequences (MIPS) provide vocabularies of function and assign the relevant terms to genes from multiple organisms (Ashburner et al. 2000; Mewes et al. 2000). Genes that are assigned the same term constitute a functional group of genes. However, such resources may not be comprehensive and up to date at any given time, and it is also laborious to maintain the vocabulary and the gene assignments. Our approach requires only a set of references associated with genes. It requires no precompiled lexicons of biological function, previous annotations, or co-occurrence in the literature. It remains current as long as it is provided with a current literature base. Furthermore, this method can be applied to any arbitrary set of genes, as long as an index of gene-article associations is provided.
Recognizing coherent gene groups from the literature is a difficult problem because some genes have been extensively studied, whereas others have only been recently discovered. In addition, most genes have multiple functions. The literature about genes reflects these differences. A given gene may have many relevant documents or none, and the documents about it may cover a wide spectrum of functions. Consequently, the available text can skew performance of text analysis algorithms. However, individual articles tend to address functions very specifically; it is this specificity that we exploit in our approach.
We use statistical natural language processing (NLP) methods to access
and interpret biological text (Manning and Schütze 1999). Statistical NLP techniques have already been shown to be useful in annotating individual genes (Tamames et al. 1998; Eisenhaber and Bork 1999; Fleischmann et al. 1999; Raychaudhuri et al. 2002), determining gene or protein interactions (Blaschke et al. 1999; Thomas et al. 2000; Jenssen et al. 2001; Stephens et al. 2001), and assigning keywords to genes or groups of genes (Andrade and Valencia 1997; Shatkay et al. 2000; Masys et al. 2001).
The intuition behind neighbor divergence involves recognizing articles that are about the function represented in the group. If a group of genes shares some specific function, such as "autophagy", an article germane to that function will refer to at least one of the genes in the group. Furthermore, other similar articles that pertain to the same function will tend to refer to the same gene or to other genes in the group.
Neighbor divergence assigns a functional coherence score to a group of
genes on the basis of the literature. It uses semantic neighbors; two
articles are semantic neighbors if there is similar word usage in each
of them (Manning and Schütze 1999). First, semantic
neighbors are precomputed for each article in the corpus. Given a gene
group, each article's relevance to the group is scored by counting the
number of neighbors that have references to genes in the group. If the
group represents a coherent biological function, articles that discuss
that function will have many referring neighbors and therefore will
score high (see Fig. 1). Articles that
address biological functions that are irrelevant to the group function will score low. If there are many high-scoring articles, the group likely represents genes with shared function. Neighbor divergence determines whether a function is represented in a gene group from the
distribution of article scores. Specifically, the neighbor divergence
measure of functional coherence of a gene group is an
information-theoretic measure of the difference between the empirical
distribution of article scores and a theoretical distribution of scores
that would be expected with a noncoherent group of genes.
To evaluate neighbor divergence and to compare it with other approaches, we used 19 groups of yeast genes, each representing a different function. We also devised 1900 decoy random yeast gene groups. We tested methods by scoring all groups. A good method should assign high scores to functional groups and low scores to random groups. We report the percentile of the functional groups relative to the 1900 groups as a measure of success; a score that exceeds all random group scores is in the 100th percentile. Also, we calculate the precision and recall of a method at different score cutoff levels. The precision is the number of functional groups scoring above the cutoff divided by the number of total groups scoring above the cutoff. The recall is the number of functional groups scoring above the cutoff divided by the total number of functional groups. A good method achieves 100% recall at 100% precision.
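The precision and recall definitions above can be made concrete with a short sketch. The Python below is illustrative only: the function name `precision_recall` and the toy scores are assumptions and do not come from the original analysis. Reporting a precision of 1.0 when no group exceeds the cutoff is also an assumed convention.

```python
def precision_recall(functional_scores, random_scores, cutoff):
    """Precision and recall when every group scoring above `cutoff` is called functional."""
    true_pos = sum(1 for s in functional_scores if s > cutoff)
    false_pos = sum(1 for s in random_scores if s > cutoff)
    called = true_pos + false_pos
    # Assumed convention: precision is 1.0 when nothing exceeds the cutoff.
    precision = true_pos / called if called else 1.0
    recall = true_pos / len(functional_scores)
    return precision, recall


# Toy scores (hypothetical, not taken from the paper).
functional = [9.0, 7.5, 4.0, 1.2]
random_groups = [0.5, 1.0, 1.5, 2.0, 3.0]
p, r = precision_recall(functional, random_groups, cutoff=3.0)
print(p, r)
```

Sweeping `cutoff` over the observed scores would trace out a precision-recall curve of the kind plotted in Figure 2.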
We also examined how removing legitimate genes and replacing them with irrelevant genes in the gene group affects the score. If the score falls off monotonically, then the score is well behaved and even partial groups have some signal. The neighbor divergence method can then also be used to refine gene groups, by adding and replacing genes to increase the functional coherence score.
Gene expression clustering algorithms generate a large number of
clusters, many of which are spurious. We tested our method's ability
to recognize 10 yeast gene expression clusters that were manually
recognized by investigators as representative of a common function
(Eisen et al. 1998). This is a real-world test of the sensitivity of
neighbor divergence in detecting meaningful groupings derived from
experimental data.
RESULTS AND DISCUSSION
Gold Standard for Benchmarking Method Performance
To assess the performance of neighbor divergence and other methods, we selected as a gold standard 19 functional yeast gene groups that were defined by an independent body (Ashburner et al. 2000). These groups varied in size and content (Table 1A). This diversity is representative of gene groups that experimental procedures may derive. Also, many of the genes were members of more than one functional gene group (Table 1B), which underscores the multiple functions that many genes have. We created 1900 random yeast gene groups as a negative set. This may be a poor negative set because experimentally derived gene sets are rarely completely random. However, this set is sufficient for comparing the different methods and establishing a performance baseline for neighbor divergence.
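As a rough illustration of how such a negative set could be assembled, the sketch below draws, for each functional group, a fixed number of random groups of the same size (the Methods section specifies 100 per functional group). The function name `make_decoy_groups`, the seed handling, and the toy gene names are assumptions made for this example.

```python
import random


def make_decoy_groups(all_genes, functional_groups, per_group=100, seed=0):
    """For each functional group, draw `per_group` random groups of equal size."""
    rng = random.Random(seed)  # fixed seed so the decoys are reproducible
    decoys = []
    for group in functional_groups:
        for _ in range(per_group):
            # Sample without replacement so no gene is repeated within a decoy.
            decoys.append(rng.sample(all_genes, len(group)))
    return decoys


# Hypothetical gene identifiers and two toy "functional" groups.
genes = [f"GENE{i}" for i in range(50)]
groups = [genes[:5], genes[10:13]]
decoys = make_decoy_groups(genes, groups, per_group=100)
print(len(decoys))
```

With 19 functional groups and `per_group=100`, this procedure yields the 1900 decoy groups used in the evaluation.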
Performance of Neighbor Divergence
Neighbor divergence achieves 79% recall (15 of 19 functional groups) at 100% precision; this is equivalent to 79% sensitivity at 100% specificity. In Figure 2 we have plotted the precision and recall at different cutoff levels for neighbor divergence and, for comparison, the other methods. As the cutoff score is made more stringent, some functional groups no longer exceed it and recall falls; however, most random groups also fail to make the cutoff, so precision rises. In Table 2 we have listed the percentile of the score assigned by each method to the different functional groups relative to the 1900 random groups. Neighbor divergence assigned 15 of the 19 functional groups scores that exceeded those of all 1900 random groups; another 3 functional groups had scores exceeding 98% of the random groups (Table 2).
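The percentile reported for each functional group can be computed as the share of random-group scores that the group's score exceeds. The helper below is a minimal sketch of that calculation; the name `percentile_vs_random` and the use of a strict inequality for ties are assumptions.

```python
def percentile_vs_random(score, random_scores):
    """Percentage of random-group scores that `score` strictly exceeds."""
    below = sum(1 for r in random_scores if r < score)
    return 100.0 * below / len(random_scores)


# Toy values: a score above every random score is in the 100th percentile.
print(percentile_vs_random(5.0, [1.0, 2.0, 3.0, 4.0]))
print(percentile_vs_random(2.5, [1.0, 2.0, 3.0, 4.0]))
```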
Neighbor divergence performance is robust to gene groups of different sizes. Smaller groups contain fewer genes and usually fewer articles, and consequently are more difficult to detect. Despite that, neighbor divergence is able to assign relatively high scores to these groups (Table 2).
In Figure 3 we have plotted the distribution of neighbor divergence scores for the 1900 random gene groups and the 19 functional gene groups. Although there is some overlap, most functional groups have scores that are about an order of magnitude higher than the highest score assigned to a random gene group.
Calculating Neighbor Divergence Scores With Article Score Distributions
The neighbor divergence measure of functional coherence in a gene group is a measure of the disparity between the empirical distribution of article scores and a theoretical distribution of article scores. We use a Poisson distribution to approximate this theoretical distribution of article scores for a noncoherent gene group. As an example, we have scored all of the articles against one functional gene group and plotted the resulting empirical distribution of scores (see Fig. 4A). If the score distribution is different from the Poisson, then the gene group likely represents a biological function. The log ratio of probability in both distributions is plotted for each article score in Figure 4B. Very high scoring articles are relevant to the group's function and are overrepresented relative to the Poisson distribution.
Performance of Naïve Word Divergence Method
For purposes of comparison, we developed and tested a naïve word divergence method that is based on an intuitive statistical NLP strategy. Abstracts are divided into those that refer to group genes and those that do not. A probability distribution of words in abstracts referring to group genes is calculated from counts and compared with the distribution of words in the other articles. Word divergence is an information-theoretic measure of the disparity between the two word distributions. If a subset of rare words is used significantly more inside the group than it is outside the group, then these words may be indicative of some biological function within the gene group. Therefore, word divergence should be sensitive to the presence of biological function in the gene group.
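A minimal sketch of this baseline follows. It assumes abstracts are already tokenized into word lists, and it uses the smoothing described in the Methods section (300 prior words distributed according to the corpus-wide word distribution). The function names, the base-2 logarithm, and the toy corpus are assumptions made for illustration.

```python
import math
from collections import Counter


def smoothed_distribution(counts, baseline, prior_words=300.0):
    """Word distribution smoothed with a Dirichlet prior of `prior_words` pseudo-words."""
    total = sum(counts.values()) + prior_words
    return {w: (counts.get(w, 0) + prior_words * baseline[w]) / total
            for w in baseline}


def word_divergence(abstracts, in_group):
    """KL divergence of the in-group word distribution from the out-of-group one."""
    all_counts = Counter(w for doc in abstracts for w in doc)
    n_words = sum(all_counts.values())
    baseline = {w: c / n_words for w, c in all_counts.items()}
    inside = Counter(w for i, doc in enumerate(abstracts) if i in in_group for w in doc)
    outside = Counter(w for i, doc in enumerate(abstracts) if i not in in_group for w in doc)
    p = smoothed_distribution(inside, baseline)   # treated as the observed distribution
    q = smoothed_distribution(outside, baseline)
    # Log base 2 is an assumption; the source does not state the base.
    return sum(p[w] * math.log2(p[w] / q[w]) for w in baseline)


# Hypothetical tokenized abstracts; indices 0 and 1 refer to group genes.
abstracts = [["autophagy", "vacuole"], ["autophagy", "vacuole"],
             ["ribosome", "rna"], ["ribosome", "rna"]]
distinct = word_divergence(abstracts, in_group={0, 1})
mixed = word_divergence(abstracts, in_group={0, 2})
print(distinct, mixed)
```

In the toy corpus, the split that separates the two topics gives a positive divergence, whereas the split that mixes them gives identical word distributions and a divergence of zero.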
Word divergence only achieves 10.5% recall (2 of 19 functional groups) at 8.3% precision on the same data set (Fig. 2); this is equivalent to 10.5% sensitivity at 98.9% specificity. This method performs relatively poorly. Although an individual article may address a single aspect of a gene's function, different articles referring to the same gene may discuss many different biological functions (Fig. 1). Consequently, pooling all of the articles referring to a gene results in an uninformative distribution of words. If all articles written about a gene addressed the same function, this method would have been more successful.
Performance of Other Article-Scoring Approaches
The best article score and best article p-value are similar to neighbor divergence in that all articles are scored for relevance against the gene group by counting the number of referring semantic neighbors. In these methods, however, only the single "best" article score is used as a score for the group. These methods perform better than word divergence because they do not combine signals from many different articles, but rather consider the articles individually. Best article score achieves 58% recall at 8.3% precision (93.7% specificity), and best article p-value performs comparably, achieving 58% recall at 9.1% precision (94.2% specificity) (Fig. 2).
These methods search for articles that have semantic content that is relevant to the group. The advantage of this approach is that articles are treated as individuals. This approach is more appropriate for the problem because genes are often multifaceted, but scientific articles tend to be focused on the subject they are addressing. The best article score method is limited because large groups would be expected to have larger scores on average. To correct for this, we have tried computing a p-value for the best score instead. The p-values seem to overcompensate for larger groups, however.
Both methods are limited by their use of scores of only a single article; this ignores other high-scoring articles that should be abundant if the gene group represents a function. The neighbor divergence method relies on the referring neighbor principle also, but in contrast obtains greater statistical power by considering the scores of all articles and not just the extreme-valued ones.
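The two single-article measures can be sketched as follows, assuming article scores and the Poisson mean have already been computed as described in the Methods section. The function names, the natural logarithm, rounding fractional scores up before taking the tail probability, and the floor on the p-value are all assumptions made for this illustration.

```python
import math


def poisson_pmf(n, lam):
    """Probability of exactly n referring neighbors under Poisson(lam)."""
    return math.exp(-lam) * lam ** n / math.factorial(n)


def poisson_tail(score, lam):
    """P(S >= score) under Poisson(lam); fractional scores are rounded up (assumption)."""
    n = math.ceil(score)
    tail = 1.0 - sum(poisson_pmf(i, lam) for i in range(n))
    return max(tail, 1e-300)  # guard against log(0) from floating-point round-off


def best_article_measures(article_scores, lam):
    """Return (best article score, negative log p-value of that score)."""
    best = max(article_scores)
    return best, -math.log(poisson_tail(best, lam))


# Toy article scores with an assumed Poisson mean of 0.4.
best, neg_log_p = best_article_measures([0.5, 3.0, 1.0], lam=0.4)
print(best, neg_log_p)
```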
Performance of Neighbor Divergence Without Filter
Abstracts referring to well-studied genes often have semantic neighbors that refer to the same gene. If such a gene is in the group, an abstract referring to the gene may receive a spuriously high score because many of its article neighbors refer to the same gene. That abstract may not, however, be relevant to the group function. A sufficient number of such high-scoring abstracts can increase the neighbor divergence score. To reduce potential false-positive gene groups produced by this effect, our implementation of neighbor divergence includes a filter in determining the semantic neighbors. When calculating semantic neighbors for an article, only articles that refer to different genes are considered. Without the filter (neighbor divergence-no filter), performance is reduced to 68% recall at 100% precision (Fig. 2).
Understanding the Gene Group's Function
Neighbor divergence determines whether a group of genes has a
coherent function. It does not tell us the function. Because all of the
articles are scored by neighbor divergence for a given gene group, the
easiest way to determine a group's function is to examine the
higher-scoring articles manually or automatically. For example, in the
ion homeostasis functional group, the highest scoring article is titled
"Resolution of subunit interactions and cytoplasmic subcomplexes of
the yeast vacuolar proton-translocating ATPase" (Tomashek et al. 1996). The highest scoring article for the autophagy gene group is
titled "Structural and functional analyses of APG5, a gene
involved in autophagy in yeast" (Kametaka et al. 1996). Both of these
articles contain clues to the nature of the gene group. These and other
high-scoring articles indicate the shared function. The high-scoring
articles could be collected and examined manually to determine group function.
Alternatively, keywords for the group that describe the function of the
group could be determined automatically. Investigators have already
developed algorithms to find keywords in collections of documents that
could be applied to these high-scoring articles to determine functional
keywords (Andrade and Valencia 1997).
Corrupting Functional Groups
We examined the robustness of the scores to removal of genes and replacement with random genes. As this procedure is conducted, scores slowly decrease. About half of the genes for the two functional groups examined can be removed while still maintaining a reasonably strong signal (see Fig. 5). Incomplete functional gene sets can be detected, although their scores will be lower. Therefore, partial functional groups derived from experimental screens are still discernible.
Furthermore, the more representative a gene group is of a specific function, the greater the neighbor divergence score. This indicates that, as scores are optimized by addition and removal of genes, more ideal functional gene groups can be obtained. There is then the possibility of using neighbor divergence in bioinformatics algorithms to automatically define gene groups in the context of experimental data.
Application of Functional Coherence Scoring to Manually Labeled Gene Expression Clusters
Eisen and colleagues (1998) collected expression measurements on
yeast genes under 79 diverse conditions. They used a hierarchical clustering algorithm to identify groups of genes with coherent gene
expression patterns. A few of the gene clusters contained many genes
with similar function. These published clusters were manually
identified and labeled with a summary label. We hypothesized that our
method could rapidly identify the functionally coherent groups of
genes. We reevaluated the functional coherence of these clusters
automatically with neighbor divergence. Our results are presented in
Table 3. We found that 7 of the 10 clusters
had very high functional coherence scores.
For three of the clusters, the functional coherence score was poor. The
"spindle pole body assembly and function" cluster contained 11 yeast genes; we found that only 3 of these genes were among the 32 listed "spindle pole" genes in the Comprehensive Yeast Genome
Database (CYGD) (Mewes et al. 2000). Similarly, the "mitochondrial
ribosome" cluster contained 22 genes; only 10 of these genes were
among the 49 "mitochondrial ribosome" genes listed by CYGD. Also,
the "mRNA splicing" cluster contained 14 genes; we found only 3 of
these genes among the 38 listed "mRNA splicing" yeast genes in
CYGD. Many of the genes in these clusters do not represent the
annotated function. Although these clusters are suggestive, they are
not coherent functional groups based on our scoring criteria; they
contain fewer than half of the genes with the reported function.
Accordingly, the functional coherence scores are low. In fact, it may
be that these clusters represent a novel association of genes that
should be pursued and validated for their functional implications.
Future Directions
There is growing interest in enhancing biological data analysis by
using the published literature as a knowledge source to guide
bioinformatics algorithms. Inclusion of literature has been shown to
directly augment biological data analysis, such as sequence homology
searches, sequence-based assignment of cellular compartment, and gene
expression analysis (MacCallum et al. 2000; Shatkay et al. 2000; Chang et al. 2001; Jenssen et al. 2001; Stapley et al. 2002). Many analytical
approaches, such as those based on supervised and unsupervised machine
learning, aim to define groups of genes based on patterns in
experimental data (Raychaudhuri et al. 2001). Neighbor divergence can
be a critical piece in connecting such data analysis algorithms to the
scientific literature. New algorithms can be written that search for
groups with consistent signal in the experimental data that also have
high functional coherence. For example, a clustering algorithm can be
rewritten to identify groups of genes with similarities in expression
and also similarities in function as assessed from the literature; the
solution is to modify the objective function in gene group searches to
include similarity of the literature for a group as well as
experimental similarity. The neighbor divergence score may have other
applications in defining new functional groups, annotating genes, and
organizing genes in a functional hierarchy.
The work that we have presented here is limited in that it only uses
article abstracts and not the whole text of articles. A more complete
implementation of this method would leverage the full text of articles;
these are now becoming available online (Roberts et al. 2001). Our
method relies on abstracts focusing on specific subjects. Inclusion of
full text articles will probably be most effective if the text is
broken into smaller, more specific semantic units, perhaps individual paragraphs.
METHODS
Neighbor Divergence Algorithm
Data Types: Document Corpus and Reference List
The neighbor divergence calculation for a gene group requires a corpus of documents relevant to all genes in the organism and a reference list indicating the articles that are germane to each gene. Here, all documents are PubMed abstracts. Only the title and abstract fields in the PubMed records are used. From these documents, we find unique tokens by tokenizing on white space, punctuation, and common nonalphanumeric characters such as hyphens and parentheses. Only those tokens that were present in >4 abstracts and <10,000 abstracts were considered as vocabulary words. Abstracts are converted into vectors of word counts in which each dimension represents a specific word.
Identifying Semantic Neighbors for Corpus Articles
For each article, the k most similar articles, including the original article, are precomputed. Here we use k = 20. To quantify the similarity between two documents, we used the cosine between the two weighted document word vectors. Word vectors of articles were first converted into inverse document frequency-weighted word vectors (Manning and Schütze 1999).
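The tokenization, weighting, and neighbor search described in this section can be sketched in Python as follows. This is a toy illustration and not the authors' implementation: the weighting formula `(1 + log2 tf) * log2(N / df)` is one common inverse document frequency scheme and is an assumption here, as are lowercasing, the function names, and the reading of the neighbor filter (described under Results) as skipping any candidate article that shares a gene with the query article.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split on white space, punctuation, and other nonalphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vectors(abstracts, min_df=5, max_df=9999):
    """IDF-weighted word vectors; vocabulary is limited to words in >4 and <10,000 abstracts."""
    docs = [Counter(tokenize(t)) for t in abstracts]
    df = Counter(w for d in docs for w in d)  # number of abstracts containing each word
    n = len(docs)
    vocab = {w for w, c in df.items() if min_df <= c <= max_df}
    return [{w: (1 + math.log2(tf)) * math.log2(n / df[w])  # assumed weighting scheme
             for w, tf in d.items() if w in vocab} for d in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def semantic_neighbors(vectors, k=20, article_genes=None):
    """For each article, itself plus its k - 1 most similar articles."""
    neighbors = []
    for i, u in enumerate(vectors):
        candidates = []
        for j, v in enumerate(vectors):
            if j == i:
                continue
            # Optional filter (assumed form): skip articles sharing a gene with article i.
            if article_genes is not None and article_genes[i] & article_genes[j]:
                continue
            candidates.append((cosine(u, v), j))
        candidates.sort(key=lambda t: (-t[0], t[1]))  # most similar first, ties by index
        neighbors.append([i] + [j for _, j in candidates[:k - 1]])
    return neighbors


# Toy corpus; thresholds are relaxed because there are only four abstracts.
abstracts = ["autophagy vacuole degradation", "autophagy vacuole protein",
             "ribosome translation rna", "ribosome rna splicing"]
vectors = build_vectors(abstracts, min_df=1, max_df=3)
neighbors = semantic_neighbors(vectors, k=2)
filtered = semantic_neighbors(
    vectors, k=2, article_genes=[{"geneA"}, {"geneA"}, {"geneB"}, {"geneC"}])
print(neighbors, filtered)
```

The pairwise loop is quadratic in the number of abstracts, which is acceptable for a sketch; because neighbors are precomputed once per corpus, a production version could afford a more efficient nearest-neighbor search.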
Scoring Article Relative to Gene Groups
Given a gene group, neighbor divergence then assigns a score, Si, to each article i. The score is a count of the article's semantic neighbors that refer to group genes. Groups that represent biological functions will induce many articles to have high scores. In practice, most articles in the data set refer to multiple genes rather than a single one. Neighboring articles that refer to both group genes and genes outside the group are counted fractionally.
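A sketch of this scoring step is shown below. It assumes that an article's fractional reference is the share of its referenced genes that belong to the group, and that an article's score is the sum of the fractional references of its semantic neighbors; both are one reading of the description, not a confirmed formula. All names and toy data are hypothetical.

```python
def fractional_reference(genes_of_article, group):
    """Share of the genes an article refers to that are in the group (assumed definition)."""
    if not genes_of_article:
        return 0.0
    return len(genes_of_article & group) / len(genes_of_article)


def article_scores(neighbors, article_genes, group):
    """Score each article by summing fractional references over its semantic neighbors."""
    fr = [fractional_reference(g, group) for g in article_genes]
    return [sum(fr[j] for j in nbrs) for nbrs in neighbors]


# Toy data: four articles, their referenced genes, and precomputed neighbor lists.
article_genes = [{"a"}, {"a", "b"}, {"c"}, set()]
neighbors = [[0, 1], [1, 0], [2, 3], [3, 2]]
scores = article_scores(neighbors, article_genes, group={"a"})
print(scores)
```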
Calculating a Theoretical Distribution of Scores
If the gene group has no coherent functional structure, each of the 20 semantic neighbors of any given article should refer to group genes independently with probability q. The number of referring neighbors then follows a binomial distribution, which a Poisson distribution approximates accurately for small values of q. In this case:
$P(S_i = n) = \frac{\lambda^{n} e^{-\lambda}}{n!}$, where $\lambda = 20q$.
For a given gene group, we estimate q, the fraction of articles referring to group genes, by summing all of the fractional references, fr, of all articles and dividing by the number of articles, N: $q = \frac{1}{N} \sum_{i=1}^{N} fr_i$.
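The estimate of q and the resulting Poisson distribution can be written out as a short sketch. The mean of the Poisson is taken to be k times q with k = 20 neighbors, as in the text; the function names and the truncation of the distribution at k are assumptions made for illustration.

```python
import math


def estimate_q(fractional_refs):
    """Average fractional reference over all N articles in the corpus."""
    return sum(fractional_refs) / len(fractional_refs)


def poisson_score_distribution(q, k=20):
    """Poisson probabilities for scores 0..k, with mean k * q."""
    lam = k * q
    return [math.exp(-lam) * lam ** n / math.factorial(n) for n in range(k + 1)]


# Toy example: with q = 0.01 the expected number of referring neighbors is 0.2.
q = estimate_q([1.0, 0.5, 0.0, 0.0])
dist = poisson_score_distribution(0.01)
print(q, dist[:3])
```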
Quantifying the Difference Between the Empirical Score Distribution and the Theoretical One
An empirical distribution of the article scores is computed for the gene group. If the gene group contains no functional coherence, the distribution of scores should be similar to the Poisson distribution. The functional coherence of the gene group is scored as the Kullback-Leibler (KL) divergence between the empirical distribution and the Poisson distribution.
KL Divergence
To quantify the difference between two distributions, we use KL divergence or relative entropy (Manning and Schütze 1999). For an observed distribution p and a theoretical distribution q, $D(p \,\|\, q) = \sum_i p_i \log \frac{p_i}{q_i}$.
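Putting the pieces together, a neighbor divergence score could be computed roughly as follows. This sketch bins fractional article scores to the nearest integer and uses base-2 logarithms; both choices, along with the function names and the toy score lists, are assumptions, since the original equations are not available here. It also assumes q > 0.

```python
import math
from collections import Counter


def poisson_pmf(n, lam):
    """Probability of a score of n under Poisson(lam)."""
    return math.exp(-lam) * lam ** n / math.factorial(n)


def neighbor_divergence(scores, q, k=20):
    """KL divergence of the empirical article-score distribution from Poisson(k * q)."""
    lam = k * q
    # Bin fractional scores to the nearest integer, capped at k (assumed binning).
    counts = Counter(min(k, math.floor(s + 0.5)) for s in scores)
    total = len(scores)
    kl = 0.0
    for n, c in counts.items():
        p = c / total  # empirical probability of this score bin
        kl += p * math.log2(p / poisson_pmf(n, lam))
    return kl


# Toy score lists, both with mean 0.8 (so q = 0.04 for k = 20).
coherent = [0] * 90 + [8] * 10                               # a few very high-scoring articles
incoherent = [0] * 45 + [1] * 36 + [2] * 14 + [3] * 4 + [4]  # roughly Poisson-shaped
print(neighbor_divergence(coherent, q=0.04), neighbor_divergence(incoherent, q=0.04))
```

The list with a small cluster of very high-scoring articles diverges strongly from the Poisson expectation, while the roughly Poisson-shaped list scores near zero, which is the behavior the method relies on.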
Other Methods to Score Functional Coherence
Word Divergence: A Baseline Method for Comparison
As a baseline, we test an alternate method, word divergence. This method requires calculation of two distributions of words. The first distribution is computed from words in abstracts referring to genes within the group; counts of each word are divided by the total number of words in these abstracts. A second distribution is computed similarly for all abstracts referring to genes outside the group. Both distributions are smoothed with Dirichlet priors, assuming 300 prior words distributed according to a baseline distribution; the baseline distribution of each word is computed by dividing its count in all abstracts by the total count of all words in all abstracts. The KL divergence of these two distributions of words is then computed as a measure of functional coherence; the gene group distribution is treated as the observed distribution.
Best Article Score and Best P-Value
These scoring schemes are also based on scoring articles against gene groups as described earlier. Here, we used the highest article score as a measure of a gene group's functional coherence (best article score). In a different approach, we used the negative log of the p-value for the best article score (best article p-value). To calculate the p-value of an article, we use the Poisson distribution. The p-value of an article is the summed probability of an article having at least as many referring neighbors as it has.
Neighbor Divergence-No Filter
This method is identical to neighbor divergence except the filter applied in selection of semantic neighbors is not used.
Evaluation
Data Types
All experiments described in the following section are conducted in Saccharomyces cerevisiae. We used a reference list that contained PubMed abstract references to yeast genes from the Saccharomyces Genome Database (Cherry et al. 1998).
Assembling the Functional Gene Groups
To test our method, we assembled gold standard functional gene groups from GO (Ashburner et al. 2000).
Assembling the Decoy Random Gene Groups
We assembled 1900 random gene groups as decoy gene groups. For each gold standard functional gene group, 100 random gene groups of the same size were created.
Evaluating Methods to Identify Common Biological Function
In this study we evaluated five different methods to score the functional coherence of a gene group: (1) word divergence, (2) best article score, (3) best article p-value, (4) neighbor divergence, and (5) neighbor divergence-no filter. Each method was used to score the 1900 decoy gene groups and the 19 functional gene groups. The percentile for the score of each of the 19 functional groups relative to the 1900 random gene groups was calculated. Also, for different cutoff scores, precision and recall values were calculated for the gene groups.
Corruption Studies
For two of the gene groups, "ion homeostasis" and "autophagy", we sequentially removed genes in random order and swapped in other genes. This process was repeated until only one original gene remained. Neighbor divergence score was calculated after each swap. This procedure was repeated 10 times, and the results were averaged together.
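The corruption procedure can be sketched as a loop that swaps one original gene at a time for a random outside gene and records the score after each swap, averaged over repeated runs. The scoring function is passed in as a parameter; the toy `score_fn` below (the fraction of original genes remaining) merely stands in for the neighbor divergence score, and all names are hypothetical.

```python
import random


def corruption_curve(group, other_genes, score_fn, seed=0):
    """Scores after 0, 1, 2, ... swaps, stopping when one original gene remains."""
    rng = random.Random(seed)
    current = list(group)
    order = list(range(len(current)))
    rng.shuffle(order)  # remove original genes in random order
    replacements = rng.sample(other_genes, len(current) - 1)
    curve = [score_fn(set(current))]
    for pos, new_gene in zip(order, replacements):
        current[pos] = new_gene  # swap an original gene for an outside gene
        curve.append(score_fn(set(current)))
    return curve


def averaged_curve(group, other_genes, score_fn, repeats=10):
    """Average the corruption curve over several random swap orders."""
    curves = [corruption_curve(group, other_genes, score_fn, seed=r)
              for r in range(repeats)]
    return [sum(col) / repeats for col in zip(*curves)]


# Toy stand-in for a coherence score: fraction of original genes still in the group.
original = {"g1", "g2", "g3", "g4", "g5"}
others = [f"x{i}" for i in range(20)]
curve = averaged_curve(sorted(original), others,
                       lambda genes: len(genes & original) / len(genes))
print(curve)
```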
Computation
PubMed database queries and data preprocessing were implemented using Perl (Schwartz and Christianson 1997), Python (Lutz and Ascher 1999), and the Biopython toolkit (www.biopython.org). All mathematical computations were performed with Matlab (Mathworks).
WEB SITE REFERENCES
http://www.ncbi.nlm.nih.gov/PubMed/; PubMed.
http://www.biopython.org/; repository of open-source bioinformatics Python modules.
ACKNOWLEDGMENTS
R.B.A. is supported by NIH LM06244, GM61374, NSF DBI-9600637, and a grant from the Burroughs-Wellcome Foundation; S.R. is supported by NIH GM-07365. The authors also thank Kara Dolinski of SGD for providing a curated data set of yeast gene associated articles, and Patrick D. Sutphin, Joshua M. Stuart, and Meenakshy Chakravorty for assistance in manuscript preparation.
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
FOOTNOTES
4 Corresponding author.
E-MAIL russ.altman{at}stanford.edu; FAX (650) 725-7944.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.116402.
REFERENCES
enhanced detection of remote homologues with automated SWISS-PROT annotation comparisons. Bioinformatics 16: 125-129.
Received January 29, 2002; accepted in revised form August 12, 2002.