|
|
|
|
Vol. 11, Issue 7, 1227-1236, July 2001
METHODS
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| |
ABSTRACT |
|---|
|
|
|---|
We have developed a statistical regression modeling approach to
discover genes that are differentially expressed between two predefined
sample groups in DNA microarray experiments. Our model is based on
well-defined assumptions, uses rigorous and well-characterized statistical measures, and accounts for the heterogeneity and genomic complexity of the data. In contrast to cluster analysis, which attempts
to define groups of genes and/or samples that share common overall
expression profiles, our modeling approach uses known sample group
membership to focus on expression profiles of individual genes in a
sensitive and robust manner. Further, this approach can be used to test
statistical hypotheses about gene expression. To demonstrate this
methodology, we compared the expression profiles of 11 acute myeloid
leukemia (AML) and 27 acute lymphoblastic leukemia (ALL) samples from a
previous study (Golub et al. 1999
) and found 141 genes differentially
expressed between AML and ALL with a 1% significance at the genomic
level. Using this modeling approach to compare different sample groups
within the AML samples, we identified a group of genes whose expression
profiles correlated with that of thrombopoietin and found that genes
whose expression associated with AML treatment outcome lie in recurrent
chromosomal locations. Our results are compared with those obtained
using t-tests or Wilcoxon rank sum statistics.
| |
INTRODUCTION |
|---|
|
|
|---|
The development of oligonucleotide microarray
technologies allows scientists to monitor the mRNA transcript levels of
thousands of genes in a single experiment. Indeed, several groups have
already begun to simultaneously examine the expression profiles of
entire genomes for organisms such as yeast whose complete DNA sequences are known (Lashkari et al. 1997
; Chu et al. 1998
; Spellman et al. 1998
;
Ferea et al. 1999
). This power of examination and discovery moves well
beyond the traditional experimental approach of focusing on one gene at
a time. Nevertheless, the tremendous amount of data that can be
obtained from microarray studies presents a challenge for data analysis
(Brent 2000
).
At present, the most commonly used computational approach for analyzing
microarray data is cluster analysis. Cluster analysis groups genes or
samples into "clusters" based on similar expression profiles and
provides clues to the function or regulation of genes or similarity of
samples via shared cluster membership (Tamayo et al. 1999
; Tavazoie et
al. 1999
; Gaasterland and Bekiranov 2000
). Several clustering methods
have been usefully applied to analyzing genome-wide expression data and
can be classified largely into three categories. The tree-based
approach uses distance measures between genes such as correlation
coefficients to group genes into a hierarchical tree (Eisen et al.
1998
). The second category clusters genes so that within-cluster
variation is minimized and between-cluster variation is maximized
(Tamayo et al. 1999
; Tavazoie et al. 1999
). The third category groups
genes into blocks, in which the correlation is maximized and between
which the correlation is minimized (Ben-Dor et al. 1999
).
The power of cluster analysis for microarray studies lies in
discovering gene transcripts or samples that show similar expression profiles. Examples include identification of transcripts that appear to
be coregulated over a time course (Chu et al. 1998
; Spellman et al.
1998
), or uncovering previously unknown sample groupings (Alon et al.
1999
; Alizadeh et al. 2000
). However, identification of "like"
groups is not necessarily the objective in a microarray study. For
example, microarrays present a high-throughput method to discover genes
that are differentially expressed between predefined sample groups,
such as normal versus cancerous tissues (Alon et al. 1999
; Coller et
al. 2000
). Cluster analysis is not a sensitive method for this type of
study because it focuses on group similarities, not differences within
each individual gene. Furthermore, clustering algorithms such as those
listed above are also unable to take advantage of preexisting knowledge
of the data, such as the sample groupings.
The technique that has been most commonly applied for group comparisons
from microarray studies is to simply look for genes with a twofold or
higher difference between the mean intensities for each group (DeRisi
et al. 1997
). However, relative mean comparisons fail to account for
sample variation, may require ad hoc data manipulation (e.g., to avoid
divide-by-zero errors), and ignore the fact that differences in
expression level of <100% can exert meaningful biological effects.
Indeed, scientists would rarely use similar criteria when focusing
their analysis on a single gene, such as comparing a panel of Northern
blots or enzymatic assays between healthy and cancer tissue samples.
Classic statistical approaches used for detecting differences between
two groups include the parametric t-test and the nonparametric Wilcoxon rank sum (Snedecor and Cochran 1980
). Recently, the
t-test was used to compare expression profiles in microarray
experiments (Arfin et al. 2000
; Tanaka et al. 2000
). One must bear in
mind three important issues when applying such standard statistical tests to microarray data analysis. First, the t-test assumes
normality and constant variance for every gene across all samples.
These assumptions are certainly inappropriate for a subset of genes despite any given transformation. Second, these tests cannot take advantage of the genomic data when correcting for heterogeneity between
samples. Third, it is essential to correct for the high false-positive
rate resulting from multiple comparisons. Otherwise, if a typical
P-value of 0.05 were used to signify differential expression
for individual genes between two groups, one would expect to find 50 positives for every 1000 genes under examination, even though none of
these genes are differentially expressed.
In this manuscript, we introduce a well-founded and robust statistical
procedure that compares the expression profiles of individual genes
between two sample groups while taking into consideration the
complexity of the genomic data. This methodology makes no distributional assumptions about the data and accounts for high false-positive error rate resulting from multiple comparisons. To
demonstrate the statistical modeling technique, we examined expression
profiles from 38 leukemia patients, 27 of whom were diagnosed with
acute lymphoblastic leukemia (ALL) and 11 of whom were diagnosed with
acute myeloid leukemia (AML) (Golub et al. 1999
). Our results are
compared with those obtained with the t-test or Wilcoxon rank sum.
The findings show that our statistical modeling approach provides a sensitive
and robust means to extract relevant information from DNA microarrays.
| |
RESULTS |
|---|
|
|
|---|
Methodology
The first step in our statistical analysis of oligonucleotide-array expression profiles is preprocessing and/or transformation of the data. In the present work this includes removal of the spiked oligonucleotide controls. The second step is to estimate correction factors for sample-specific heterogeneity, as well as for chip-specific heterogeneity, and to use these factors to normalize the data. The final step is to perform a regression analysis to estimate the relevant model parameters (equation 1 in Methods) for each gene transcript using robust statistical techniques. The results are ranked by the absolute value of the Z-score for each transcript. The higher the Z-score, the greater the confidence level that the corresponding gene is differentially expressed between the two groups.
Our methodology is implemented in a software program. Interested investigaors may contact L.P.Z. for details.
Multiple Comparisons
At issue when performing a large number of statistical tests is the
high occurrence rate of false positives resulting from the multiple
comparisons. To address this concern, we propose to raise the
statistical threshold for declaring a transcript differentially
expressed to ensure that the significance level is applicable on the
genomic scale. A conservative choice to adjust the significance is the
Bonferroni's correction, which divides the desired significance, for
example, 1% or P-value = 0.01, by the total number of
statistical tests performed. In this work, we calculated the
significance value (i.e., P-value) for each probe set using a
modified Bonferroni's correction as proposed by Hochberg (Hochberg
1988
) (see Methods for details).
Applying Bonferroni's correction to data from Affymetrix Hu6800 GeneChip oligonucleotide arrays, which contain 7070 noncontrol probe sets for 6817 individual genes, the adjusted significance level for each probe set is 0.01/7070. Assuming that the Z-score follows the normal distribution, the corresponding 1% significance threshold at the genomic level is a Z-score of 4.8. Alternatively, one may adjust the significance by the total number of genes rather than the total number of probe sets. However, different probe sets for the same gene may yield dissimilar results, and either level of correction results in a rounded Z-score of 4.8 at the 1% significance level.
Leukemia Study
A previous study examined mRNA expression profiles from 38 leukemia
patients (27 ALL and 11 AML) to develop an expression-based classification method for acute leukemia (Golub et al. 1999
). Affymetrix Hu6800 GeneChips were used in the study. The data set from
this study was ideal for illustrating our modeling technique as it
contains a large number of patients and has been well characterized (Golub et al. 1999
). Furthermore, there is a great deal of literature concerning leukemia from which we can assess the validity of our findings.
Our statistical modeling approach identified 141 probe sets that were
differentially expressed between AML and ALL with a Z-score of
4.8 or higher. Twenty-four of these were detected at higher levels in
AML and the remainder were expressed preferentially in ALL. Tables 1
and 2 list the
top 25 differentially expressed probe sets in either sample group.
These tables also include the corresponding P-values and
ordering of the statistics given to each probe set by t-tests
with either equal or unequal variance, and by the Wilcoxon rank sum. As
expected, the ranked significance given to each gene by any of the
statistical tests did not appear to correlate with either relative or
absolute mean expression level differences. Tables 1 and 2 show that
parametric t-tests under equal variances yielded rather
different test statistics and ordering than our modeling approach. In
contrast, the ordering of the probe sets by t-tests performed
assuming unequal variances was very similar to that obtained in our
regression analysis. Although t-tests are efficient under the
assumption of equal variances, the results of this analysis appeared
very sensitive to this assumption. In cases of discrepancies between
t-tests with unequal variances and Z-scores, the
latter are considered to be more robust because the assumptions of
homogeneous variances within groups and normality made by the
t-test may be violated. Note that the differences of
P-values between the two statistics are associated with
distributions; the t-distribution with heavy tails gives more
conservative values than the asymptotic normal distribution we used to
translate Z-scores to P-values. The Wilcoxon rank sum
failed to identify any genes as differentially expressed at the 1%
significance level. These findings are not surprising because
nonparametric statistics may be too robust to yield any significant results.
|
|
We next applied the statistical modeling method to examine expression
profiles within subgroups of the 11 AML patients. Thrombopoietin (TPO)
is the major cytokine responsible for the transition of myeloid
progenitors into megakaryocytes (Caen et al. 1999
), but also plays a
more general role in the differentiation of hematopoietic stem cells
into all types of progenitors (Kaushansky 1999
). Furthermore, TPO is
known to be expressed in a number of AML cell lines (Graf et al. 1996
).
We noticed a sharp delineation of TPO expression profiles between
patients 28, 30, 32, 34, 36, and 38 versus patients 29, 31, 33, 35, and
37 and therefore compared these patient groups using our statistical
modeling technique. This approach identified eight transcripts with a
Z-score >4.8, with TPO itself yielding the highest ranking
(Table 3). In contrast, neither
t-tests nor Wilcoxon rank sum identified any gene with a
genomic significance level of 1% (Table 3). Of the 15 highest ranking
mRNAs from our analysis, three of the corresponding gene products are
known to be influenced by or interact directly with TPO, two have not
been characterized heavily but are highly homologous to proteins that interact with TPO, and eight others are involved in myeloid
hematopoiesis. Although we have no evidence for any biological
significance of the patient groups used in this comparison other than
TPO transcript level, we noted that the groupings appear to fall along
the lines of samples with high or low percentage of blasts (see
http://www.genome.wi.mit.edu/MPR). Interestingly, TPO can stimulate the
proliferation of AML blasts (Motoji et al. 1996
; Luo et al. 2000
).
|
We next examined the association of gene expression with the success or
failure of treatment. Among the 11 AML patients, 6 patients did not
respond to treatment (patients 28-33) and five patients survived
(patients 34-38) (see www.genome.wi.mit.edu/MPR [Golub et al.
1999
]). The 25 transcripts with the highest Z-scores from the
comparison of these groups are listed in Table
4, five of which had a Z-score
greater than 4.8. As above, neither t-tests nor Wilcoxon rank
sum identified any genes as differentially expressed between these
groups at a 1% significance level (Table 4). We examined the
chromosomal locations of the corresponding genes because chromosomal
abnormalities are prevalent in leukemia and often have prognostic
implications (El-Rifai et al. 1997
; Rowley 2000
). Almost all of the
genes listed in Table 4 lie in regions that have been identified
previously to contain abnormalities in AML or other forms of leukemia.
Furthermore, three of the genes are encoded within 5q11-31, four are
in the 2q region, two are within 1q32-26, and two others are found at
6p12-p11 (Table 4). The identification of five "mini-clusters" of
chromosomal locales in the top 25 genes from a random pool of 6800+
genes is striking. Of note, the region 5q11-31 is frequently lost in
AML and known to influence prognosis (Shipley et al. 1996
; El-Rifai et
al. 1997
; Van den Berghe and Michaux 1997
). Furthermore, Set
(Li et al. 1996
) and HoxA9 (Lawrence et al. 1999
) are known
to play a role in AML progression, and COL4A4 (Verfaillie et
al. 1992
), thioredoxin (Nilsson et al. 2000
; Soderberg et al. 2000
),
caspase-8 (Pervaiz et al. 1999
), integrin beta5 (Feng et al. 1999
),
-tubulin (Hirose and Takiguchi 1995
), and SPS2 (Soderberg
et al. 2000
) may well contribute to the disease. Although it should be
kept in mind that clinical outcome is influenced by a number of
nongenetic factors, including patient age, time of diagnosis, and
treatment protocol, the above findings are promising for the discovery
of prognostic indicators using genome-wide microarray analysis.
|
| |
DISCUSSION |
|---|
|
|
|---|
The Z-scores we propose for testing differences of mean expression levels between two groups are connected closely with classical t-tests or Wilcoxon rank sum statistics, but it is important to realize that there are subtle differences between these tests. The t-test requires that expression levels be normally distributed and homogeneous within groups, and may also require equal variances between the groups. In contrast, the estimating equation technique we used to calculate Z-scores does not require any distributional assumptions or homogeneity of variances (see Methods for details). In practice, Z-scores are expected to be similar to t-test statistics, particularly those calculated assuming unequal variances, when the distribution of expression levels can be approximated by the normal distribution. When these assumptions are violated, Z-scores will differ from t-statistics and will be more reliable for making statistical inferences. On the other hand, the Wilcoxon statistic for two-group comparisons is nonparametric and thus robust. However, its power is reduced, which could be of concern in light of small sample sizes in typical array studies. Indeed, the Wilcoxon test did not detect any genes as differentially expressed between AML and ALL at the 1% genomic significance level (Tables 1 and 2, data not shown). Finally, there is no obvious method, besides ad hoc corrections of the expression values, to adjust for heterogeneity among samples when using the t-test or Wilcoxon statistics. The regression paradigm we propose provides a natural correction for heterogeneity using all expression values.
It is important to note that we analyzed the leukemia data without applying any questionable filtering methods to the Affymetrix data. For example, we did not subtract a background noise level from the data, rescale any values other than to correct for between-chip heterogeneity, or remove genes based on fluorescent signal intensities or Affymetrix present/absent calls. These filtering techniques may be required to make the strongest associations when clustering data or when calculating fold changes in means. However, ad hoc filtering could remove potential genes of interest, especially those with modest expression levels, and therefore reduce the power of discovery. For example, the difference of only a few transcripts to zero transcripts per cell may become undetectable after applying filtering techniques, but could nevertheless have a very real biological significance or present a considerable opportunity to target a cell specifically for therapeutic treatment. To illustrate this point, we note that TPO was called absent by the Affymetrix software in every sample in the leukemia data set. Nevertheless, by dichotomizing the AML samples along the lines of TPO expression values, we were able to uncover a group of proteins that interact directly with TPO or perform similar cellular functions.
Another distinct advantage of statistical modeling is that these tests
take advantage of the random variations (i.e., "noise") in the
data. For example, the mean expression level of activation-induced C-type lectin (AICL) was threefold higher in AML than ALL, and the
absolute mean difference was substantial at 826 units. Considering that
AICL is expressed in a variety of hematopoietic-derived cell lines
(Hamann et al. 1997
), one might reasonably conclude that AICL was
indeed overexpressed in AML based on this evidence. However, our
modeling approach gave AICL a Z-score of 0.91. This apparent discrepancy is explained by the fact that one of the AICL samples in
the AML set had an intensity value more than fivefold higher than any
other. Excluding just this one sample, the relative and absolute mean
differences for AICL between AML and ALL were 1.3-fold and
94 ± 216, respectively. Clearly, simple comparisons of fold changes are insufficient for drawing proper conclusions.
Our modeling approach can be extended. First, we can incorporate
nonlinear models or apply other transformations to the observed expression levels to account for nonlinearity in fluorescent intensity. Second, the model (equation 1 in Methods) can be extended naturally to
incorporate additional covariates. For example, in a clinical study of
multiple patients, one may be interested in assessing the association
of expression profiles with several clinical variables. Third, one may
extend the model (equation 1) by incorporating nonparametric smoothing
function for a continuous covariate, for example, in the assessment of
nonlinear dose-response relationship. Fourth, as our knowledge
accumulates about the genetic regulatory circuitry of multiple genes,
we may be able to formulate a functional relationship among genes, via
postulating a "high-level" model for regression coefficients
(
) = (
1,
2,...,
J)
and
(
) = (
1,
2,...,
J),
in which
could be a common set of parameters characterizing the
entire genetic regulatory circuitry. One may then test how well such a
genetic circuitry model fits the data using estimating equations.
The main limitation of the current approach is associated with the calculation of P-values. As noted earlier, a Z-score of 4.8 is chosen to ensure that the genome-wide significance is controlled at 1% for the Affymetrix 6800 GeneChips. However, the calculation of the corresponding P-value relies on the asymptotic normal distribution for Z-scores. With small to modest sample sizes this normality may be questionable, and such a threshold value is overly conservative. Currently, we are developing simulation-based methods to evaluate the exact significance level. It is also important to note that for the purpose of discovery science with small sample sizes, the Z-score 4.8 threshold value should be treated as a tentative guideline. In the context of testing associations with a specific candidate gene, the accepted threshold value to ensure the false-error rate of 1% for a single gene is a Z-score of 2.58. Finally, we note that the Bonferroni's correction or modifications thereof do not take into account covariation of gene expression levels, resulting in conservative estimates for the P-values. Our future research will improve on Bonferroni's correction by acknowledging expression dependencies among genes.
The capability of simultaneously assessing the expression of thousands of gene transcripts provides an opportunity of monitoring cellular activity at the genomic level. We can therefore begin addressing complex pathways of basic physiology and disease etiology, the foundation of functional genomics. The development of the statistical method described here provides a tool for researchers to pursue functional genomics systematically and rigorously. Modeling can also be used to aid the design of efficient and robust functional genomic studies, and to develop methods that estimate sample sizes and powers required for expression studies. The use of rigorous statistical tools will help functional genomic studies yield much-needed information in understanding human biology and pathology.
| |
METHODS |
|---|
|
|
|---|
Leukemia Study
The Affymetrix 6800 GeneChip oligonucleotide arrays contain a
combined total of 7070 oligonucleotide probe sets (excluding controls)
for 6817 individual genes. Investigators at the Massachusetts Institute
of Technology gathered blood samples from 38 leukemia patients (27 ALL
and 11 AML) and used Affymetrix Hu6800 GeneChip oligonucleotide arrays
to assess gene expression profiles for each patient (Golub et al.
1999
). We used the training data set exclusively in this work.
Experimental protocols used to perform the microarray analysis and the
data values obtained are available to the public at
(http://waldo.wi.mit.edu/MPR/pubs.html).
Regression Model
An array of gene expression profiles may be conceptualized as a vector of
outcomes. Let Yk = (Y1k,
Y1k,..., YJk)`
denote the array, where Yjk denotes the expression
of the jth gene in the kth sample
(j = 1,2,...,J;
k = 1,2,..., K). Let xk denote a covariate associating with each kth sample. For
example, xk = 1 for the presence of a marker gene
and xk = 0 for its absence. We propose a
regression model for the expression level of the jth gene in
the kth sample:
|
(1) |
k,
k) are the
sample-specific additive and multiplicative heterogeneity factors,
respectively, and
jk is a random variable
reflecting variation due to sources other than the one identified by
the known covariate and the systematic heterogeneity between samples.
Because xk is binary, aj measures the mean expression level of the jth gene in normal samples
(xk = 0), and bj
measures the difference of averaged expression levels of the
jth gene between the two sample groups.
The heterogeneity factors,
(
k,
k), are introduced to
account for variations in preparing multiple mRNA samples. Such
corrections have been well conceived in comparing two samples. Under
the null hypothesis of no overall differential expression between these
two samples, one can adjust this heterogeneity by normalizing the
sample data to fall on the diagonal line, a common technique (Wodicka
et al. 1997
). An intercept may also be estimated to ensure the
numerical stability. If the intercept is different from zero, the
diagonal line is adjusted to compensate. Formalizing this correction,
one may assume that typical genome-wide expression patterns are stable,
and hence may use a linear model, µjk =
k +
kaj,
to characterize average expression values for every gene in every
sample. These heterogeneity factors are then estimated via the weighted
least square method (Carroll and Ruppert 1988
). Estimated heterogeneity
factors are used to adjust the observed expression level as

The random variation,
jk, is used to depict
variations due to all unknown sources. Specifically, this variation may
be associated with sampling preparations, cross-hybridization of genes,
or other anomalies on microarrays. The stochastic distribution of these
random variations is typically unknown and is unlikely to follow any
familiar distributions, such as the normal distribution. Hence, no
distribution assumption is made.
Analytic Strategy
The first step in the statistical analysis of oligonucleotide-array
expression profiles is preprocessing of the data, which includes
elimination of control genes and transformation of the data (e.g.,
logarithmic transformation) as desired. The second step is to examine
heterogeneity among samples by estimating additive and multiplicative
heterogeneity factors,
(
k,
k). The estimate is
obtained via minimizing the weighted least square,
j,k (Yjk
k
kaj)2w
). The weight is chosen so that the contribution of every gene is
standardized between 0 and 1. Consequently, the above weighted least
square equals the number of genes when samples are homogeneous. The
estimated parameters (
k,
k) are used to correct the data. Because we do not impose distributional assumptions about residuals, the third step is to use the weighted least square (Huber 1967
) to estimate gene-specific parameters (aj, bj) in the model (equation 1). The
corresponding robust standard errors for each gene are calculated using
estimating equation theory (Godambe 1960
; Liang and Zeger 1986
;
Prentice and Zhao 1991
). Z-scores for each gene are computed
as the ratio of mean difference between the two groups for each gene,
bj, over the standard error for the corresponding
gene, S.E.j.
Statistical Significance and Multiple Comparisons
To measure the significance of the findings, we translated
Z-scores into P-values under asymptotic normality. To
address the multiple comparison issue, we adjusted the threshold for
declaring genes differentially expressed using a modified Bonferroni's
correction proposed by Hochberg (1988)
. The Hochberg stepdown method
divides the P-values by the total number of comparisons with
equal or lesser test statistics; for 7070 probe sets, the 1% genomic
significance level for the probe set with the highest test statistic is
0.01/7070, the genomic significance threshold for the probe set with
the second highest test statistic is 0.01/7069, etc.
t-test and Wilcoxon Rank Sum Test
The t-test and Wilcoxon rank sum test were performed after correcting the data for heterogeneity using our regression approach. t-tests were performed assuming both equal and unequal variances between the sample groups. The functions used were those built into MATLAB (MathWorks). The P-values derived from these tests were adjusted using the modified Bonferroni's correction described above.
| |
ACKNOWLEDGMENTS |
|---|
We thank Tracy Bergemann, Chun Cheng, Robert Eisenman, and Jerry
Radich for comments on this manuscript. We also thank T.R. Golub and
colleagues at MIT for making their excellent AML/ALL data set (Golub et
al. 1999
) available in the public domain. This work was supported by
National Institute of Health grants HG02283, GM58897, and CA53996.
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
| |
FOOTNOTES |
|---|
3 Corresponding author.
E-MAIL lzhao{at}fhcrc.org; FAX (206) 667-2437.
Article and publication are at www.genome.org/cgi/doi/10.1101/gr.165101.
| |
REFERENCES |
|---|
|
|
|---|
5 integrin subunit promoter. Identification of a novel sequence mediating granulocyte-macrophage colony-stimulating factor-dependent repression of
5 integrin gene transcription.
J. Biol. Chem.
274:
1366-1374
-thrombin via activation of the ERK2-cPLA2 pathway.
Thromb. Haemost.
83:
610-616[Medline].Received September 15, 2000; accepted in revised form April 11, 2001.
This article has been cited by other articles:
![]() |
Y. Saeys, I. Inza, and P. Larranaga A review of feature selection techniques in bioinformatics Bioinformatics, October 1, 2007; 23(19): 2507 - 2517. [Abstract] [Full Text] [PDF] |
||||
![]() |
D. Nam, S.-B. Kim, S.-K. Kim, S. Yang, S.-Y. Kim, and I.-S. Chu ADGO: analysis of differentially expressed gene sets using composite GO annotation Bioinformatics, September 15, 2006; 22(18): 2249 - 2253. [Abstract] [Full Text] [PDF] |
||||
![]() |
C. Steinhoff and M. Vingron Normalization and quantification of differential expression in gene expression microarrays Brief Bioinform, June 1, 2006; 7(2): 166 - 177. [Abstract] [Full Text] [PDF] |