|
|
|
|
Genome Research
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| |
ABSTRACT |
|---|
|
|
|---|
Genes differentially expressed in different tissues, during development, or during specific pathologies are of foremost interest to both basic and pharmaceutical research. "Transcript profiles" or "digital Northerns" are generated routinely by partially sequencing thousands of randomly selected clones from relevant cDNA libraries. Differentially expressed genes can then be detected from variations in the counts of their cognate sequence tags. Here we present the first systematic study on the influence of random fluctuations and sampling size on the reliability of this kind of data. We establish a rigorous significance test and demonstrate its use on publicly available transcript profiles. The theory links the threshold of selection of putatively regulated genes (e.g., the number of pharmaceutical leads) to the fraction of false positive clones one is willing to risk. Our results delineate more precisely and extend the limits within which digital Northern data can be used.
| |
INTRODUCTION |
|---|
|
|
|---|
Very large-scale, single-pass partial sequencing of cDNA clones
from a large number of libraries has led to the identification of
~50,000 human genes (Adams et al. 1995
; Aaronson et al. 1996
; Hillier et al. 1996
). However, a precise function or
a complete transcript sequence are known for <5000 of these (Adams
et al. 1995
; Boguski and Schuler 1995
). In the absence of functional clues for most of the newly identified genes, evidence of differential expression is the most important criteria to prioritize the
exploitation of anonymous sequence data in both basic and
pharmaceutical (Nowak 1995
; Adams 1996
; Bains 1996
; Editorial 1996
)
research. For example, the study of expression profiles in various
tumors is central to the new Cancer Genome Anatomy project (Kuska 1996
;
O'Brien 1997
). In contrast to functional assays, the quantitative
analysis of gene expression level lends itself to large-scale
implementation. Two main approaches have been proposed (1)
"analog" methods based on hybridization to arrayed cDNA libraries
(Lennon and Lehrach 1991
; Gress et al. 1992
; Nguyen et al. 1995
; Schena
et al. 1995
; Zhao et al. 1995
) or oligonucleotide "chips" (Fodor
et al. 1991
; Southern et al. 1992
; Guo et al. 1994
; Matson et al.
1995
); and (2) "digital" methods, based on the generation of
sequence tags. This paper focuses on the latter. The sequence tag-based
method (Okubo et al. 1992
; Matsubara and Okubo 1994
) consists of
generating a large number (thousands) of expressed sequence tags (ESTs)
(Adams et al. 1991
; Wilcox et al. 1991
; Adams et al. 1992
; Khan et al. 1992
) from 3
-directed regional non-normalized cDNA libraries. Recently, Velculescu et al. (1995)
have introduced the serial analysis
of gene expression (SAGE). Although tags are 100-300 nucleotides in
length in the original EST approach, the SAGE method only requires nine
nucleotides, therefore allowing a larger throughput. In both protocols,
the number of tags is reported to be proportional to the abundance of
cognate transcripts in the tissue or cell type used to make the cDNA
library. The variation in the relative frequency of those tags, stored
in computer databases, is then used to point out the differential
expression of the corresponding genes: This is the concept of a
"digital Northern" comparison. In the absence of a sound
theoretical framework, the validity of the method has only been
verified for a handful of genes in the context of two cellular
differentiation systems (Lee et al. 1995
; Okubo et al. 1995
) inducible
in vitro. Yet, with a total number of human genes of ~80,000 or
more, it is not intuitive that sequencing a mere few thousand tags (a
typical experiment) from highly redundant non-normalized cDNA libraries
can produce a useful picture, or realistic "transcript profile,"
of a given tissue, development stage, or cell type. What variations in
tag numbers allow for a reliable inference about differential
expression? How many tags should be generated? Here we present the
statistical framework required to answer those questions and analyze
transcript profiles in a quantitative manner.
| |
RESULTS |
|---|
|
|
|---|
In Methods we establish the probability distribution governing the occurrence of the same rare event in duplicate experiments. This probability distribution is a general result applicable to a wide variety of experimental situations, although this paper focuses on its use to analyze digital gene expression patterns. The main and only mathematical assumption behind the derivation is that the observed events are rare and part of a large population of possible outcomes (the distribution of which is not specified). In the context of a digital Northern, one event is the observation of a given cDNA sequence tag, and the experiment consists of the random picking and partial sequencing of a number N of cDNA clones. Given the usual complexity (i.e., the number of different genes expressed) of cDNA libraries, observing a given cDNA qualifies as a rare event, as the abundance of most individual messages is of the order of a few percents or less.
Random Fluctuation vs. Significant Change in Tag Number: When to Infer Differential Expression
Let us randomly pick N = 1000 clones from a cDNA library and generate the corresponding sequence tags; a given message (e.g., interleukin-2) will be picked x (e.g., two) times, with x in a typical (0-10) range. If we now redo this experiment, that is, again pick 1000 clones and generate the tags, the same message will now be picked y (e.g., 3) times. If the experiments have been duplicated correctly and the clones selected at random, we expect x and y to be close, albeit often different because of random fluctuations. In the Methods section, we show that the expected probability of observing y occurrences of a clone already observed x times is given by the simple formula:
|
(1) |
within which we
expect to find y with a given probability, noted 1-2
,
where 2
is the significance level. For
small (e.g., 2.5% or
less), y values falling outside the [ymin,
ymax]
interval correspond to
p(y
x) << 1, therefore pointing
out very unlikely random fluctuations between the two experiments. The confidence intervals for the usual 1% and 5% significance levels are
given in Table 1.
|
The same confidence intervals listed in Table 1 can in fact be used to
analyze the results of sampling N clones from two different
libraries. Provided all experimental factors are well replicated,
significant discrepancies between x (from one library) and
y (from the other) will now characterize differentially
expressed genes, for example, the relative abundance of which is
unlikely to be the same in the two libraries. Simply reading Table 1, we see that variations in counts such as 7
0, or 2
12
are significant (P < 0.01) evidence of regulated gene
expression, whereas variations such as 3
0 or 8
16 are
not (P > 0.05). However, we do not advocate the use of
rigid significance thresholds to analyze digital transcript profiles,
as discussed below.
Influence of the Sampling Size
Surprisingly at first, p(y
x) in
Equation 1 does not involve the sampling size N, that is, the
total number of picked clones. The fluctuation probabilities, and
confidence intervals, depend only on the values of the observed counts.
To understand why, we must remember that Equation 1 governs the results
of strictly duplicated experiments. Given N clones are
sampled, the most likely tags to be picked up are, intuitively, those
corresponding to cDNA, the abundance of which is of the order of
1/N, or larger (according to Equation 3, the probability of
finding a given cDNA with 1/N abundance while picking up
N clones is 0.63, see also Equation 13). Choosing a sampling
size therefore corresponds to targeting a given subset of genes, the
level of expression of which allows their tags to occur at reasonable
frequencies.
As expected, more reliable inferences can be made on clones corresponding to larger absolute frequencies (i.e., the ones more often picked up). For example (see Table 1), a variation in counts from 1-3 (threefold increase) is not indicative of a significant (P < 0.05) increase, whereas a variation from 4-12 is significant at P < 0.05, and a variation from 7-21 is significant at P < 0.01. For a gene expressed at a given rate, increasing the sampling size N leads to higher tag counts, and allows more stringent statistical inference to be made, for the same proportional variation.
Most often in practice one wishes to compare digital Northerns or gene profiles that have been computed from the random picking of different numbers of clones, N1 and N2. The mathematical problem is now to establish the probability for a given cDNA (e.g., interleukin-2) to be picked up x times when the sampling size was N1 and y times when the sampling size was N2. Equation 1 then becomes (see Methods):
|
(2) |
x) are expected to characterize the
genes exhibiting regulated expression, the relative abundance of which
is unlikely to be the same in the two libraries.
Comparison with Fisher's (2 × 2) Exact Test
The (2 × 2) contingency tables arising from treatment versus
control experiments are traditionally analyzed with Fisher's exact
test (Siegel 1956
; Agresti 1996
). Differential EST count data can be
presented in a tabulated form so as to suggest the use of this test, as
follows:
| Brain cDNA library | Liver cDNA library | |
| Number of actin ESTs | 2 | 11 |
| Number of other ESTs | 998 | 1189 |
| Total clones
sampled |
1000 | 1200 |
The statistical significance according to Fisher's exact test
for such a result is 4.6% (two-tail P-value, i.e., the
probability for such a table to occur in the hypothesis that actin EST
frequencies are independent of the cDNA libraries). In comparison, the
P-value computed from the cumulative form (Equation 9, see
Methods) of Equation 2 (i.e., for the relative frequency of actin ESTs
to be the same in both libraries, given that at least 11 cognate ESTs
are observed in the liver library after two were observed in the brain
library) is 1.6%. Fisher's (2 × 2) exact test is always more
conservative than our test (e.g., Fisher's P-value of 1.6%
requires a 2
13 EST count transition in the above setting). Besides being too conservative, there is a more fundamental difficulty in using this test to analyze EST count data. The sampling scheme assumed by Fisher's exact test in principle requires the total number
of data values in the contingency table to be fixed, as well as both
the row marginal total and the column marginal totals. In our
prospective experimental situation, only the column marginals (i.e.,
the numbers of clones sampled from each library) are fixed. The
extension of Fisher's exact test to cases where only one set of
marginal totals is fixed (Tocher 1950
) is still controversial. In the
context of the above EST counting results, there is an additional
problem with the lack of homogeneity in the definition of the "other
EST" category. This category represents different subsets of
transcripts for different libraries.
The use of Fisher's (2 × 2) exact test is more natural for a different type of EST data analysis: the study of library-dependent alternative transcripts of the same gene (i.e., splice or polyadenylation variants) (D. Gautheret, O. Poirot, F. Lopez, S. Audic, and J.-M. Claverie, in prep.). Here, the results for an hypothetical gene G1 may look as follows:
| G1-related transcripts in brain library | G1-related transcripts in liver library | |
| Long-form mRNA | 2 | 10 |
| Short-form mRNA | 8 | 3 |
| Total
G1-related clones |
10 | 13 |
where the alternative categories are unambiguously defined and refer to the same objects. For example, the above results constitute good evidence that G1 is expressed in different forms in those tissues (Fisher's exact test two-tail P-value = 1.2%).
False Leads in the Selection of Candidate Genes
A crucial measure of the power of statistical significance tests
is their rate of false alarm, that is, how often random fluctuations are expected to be mistaken for significant differences in the results.
When analyzing the transcript profiles from two different libraries, a
false alarm would cause a gene to be deemed differentially transcribed,
whereas in fact it is not. The rate of false alarm is therefore a
direct estimate of the fraction of false leads, when searching for
differentially expressed genes on the basis of differences in tag
counts. The rates of false alarm associated with the
P < 0.01 and P < 0.05 confidence intervals
listed in Table 1 have been computed by Monte-Carlo simulation on the
basis of two experimental sequence tag distributions (Table
2; Fig. 1). The rate of false
alarms associated with the use of Equation 1 (in fact, its cumulative
form Equation 9, see Methods) is very small for genes represented by
small tag counts and slowly increases for higher tag counts, without
ever exceeding the selected significance level. Such good behavior
validates the use of the confidence intervals (Table 1) computed from
Equation 1 and Equation 9 to assess the statistical significance of
variations in digital Northern data. The curves labeled "window"
characterize the very similar behavior of a slightly less conservative
derivation of the same test (see Methods, Equation 15). For comparison,
Figure 1 also presents the behavior of another test, based on an
inappropriate application of Ricker's confidence intervals (Ricker
1937
) (see Methods).
|
|
| |
DISCUSSION |
|---|
|
|
|---|
An appropriate statistical test is now at our disposal to begin
analyzing digital gene expression profiles in a more quantitative way.
For example, the test can be used to determine how many genes appear
regulated at various confidence levels using the data from a typical
experiment (e.g., sampling a thousand clones). We analyzed the data
gathered by Okubo et al. (1995)
on the human promyelocytic leukemia
cell line HL60 induced by dimethylsulfoxide (DMSO) or tetradecanoylphorbolacetate (TPA). Table 3 shows the
21 EST classes the occurrences of which exhibit significant variations
at the 1% level. Most of the corresponding genes make biological sense in term of differentiation along the granulocyte or monocyte pathways.
|
This example serves to discuss a subtle point in the interpretation of
the P values computed from Equation 1, 2, and 9. Rigorously, these equations apply to the case where a given gene (e.g., lipocortin) would have been selected for scrutiny before looking at the differences in cognate tag counts between libraries. When comparing two libraries without specifying in advance the transcripts we want to follow, and
then focusing a posteriori on any of those exhibiting significant variations, the average number of expected false positive
Nfalse is
Nfalse = PNspecies, where
Nspecies is the number of different transcript
species encountered and p is a given significance level. For
instance, in the experiment analyzed in Table 3,
Nspecies is of the order of 600 (Okubo et al. 1995
).
It is therefore possible that up to four
(600 × 7 × 10
3) out of the 21 transcript
species listed in Table 3 are not truly differentially expressed.
Therefore, when two libraries are compared without prior gene selection, the use of a predetermined significance threshold is not advisable. The P values computed from Equation 1, 2, and 9 should simply be used to rank all observed variations by order of decreasing statistical significance (analogous to how "similarity hits" are listed after database searches). The end-users can then make their own choice about the number of candidate target genes to be retained from the top of the list, bearing in mind the corresponding number of expected false positives.
Although the present interpretation of a digital Northern focuses on
the genes exhibiting the most spectacular differential expressions,
there is already ample evidence that small changes can cause drastic
effects. Disease states caused by haploinsufficiency and trisomy
suggest that 2
1 or 2
3 proportional changes in expression level may be of biological significance. Table 1 shows that
there is no theoretical limit to the detection of such small variations
from the comparison of digital expression patterns. Simply, the
sampling size has to be increased enough for the required numbers of
cDNA tags to reach a significance threshold (for instance 40
60, for a confidence level of 95%).
Analog hybridization-based methods (Fodor et al. 1991
; Lennon and
Lehrach 1991
; Gress et al. 1992
; Southern et al. 1992
; Guo et al. 1994
;
Matson et al. 1995
; Nguyen et al. 1995
; Schena et al. 1995
; Zhao et al.
1995
) are traditionally opposed to digital tag-counting methods (Okubo
et al. 1992
; Matsubara and Okubo 1994
; Lee et al. 1995
; Okubo et al.
1995
; Velculescu et al. 1995
) for the analysis of differential gene
expression. Both types of methods are sensitive to the quality of
the original messenger RNA preparation and/or cDNA libraries. Analog
methods promise higher throughput, lower cost, and have the capacity of
studying transcripts on a much wider scale of abundance. They are
therefore expected to supersede digital methods. On the down side,
however, hybridization signals are not easily reproducible, and can be
affected by many unknown properties such as the cDNA library
complexity, as well as clone and sequence specific features (e.g.,
insert size, nucleotide composition, presence of repeats, secondary
structure, triple helix interaction, etc.). Therefore, the
hybridization-based methods require an estimation of the dispersion of
the signal associated with each clone (i.e., enough repetitions of each
experiment), and multiple standardization and calibration procedures to
allow the meaningful comparison of hybridization patterns obtained from various sources (tissues, cell types, etc.) or from different membranes
or chips. This is far from routine and has yet to be worked out. In
contrast, and thanks to the unique properties of the Poisson
distribution, digital methods have the capacity of providing a
quantitative assessment of differential expression without the
repetition or the standardization of individual tag-counting experiments. The statistical analysis presented here provides an
objective method to analyze digital transcript profile data, and adapts
it to fit (1) the number of leads one wants to be followed; (2) the
fraction of false clues to be tolerated; and (3) the level of
modulation in gene expression considered of biological interest.
A program is available on our web site (http://igs-server.cnrs-mrs.fr) to compute the confidence intervals corresponding to arbitrary significance levels and sampling size N1 and N2.
| |
METHODS |
|---|
|
|
|---|
Let us denote p(x) the probability to observe
x sequence tags of the same gene (i.e., from the 3
end of
the same transcript) when N cDNA clones are picked randomly.
For each transcript representing a small (i.e., less than 5%) fraction
of the library and N
1000, p(x) will
closely follow the Poisson distribution:
|
(3) |
is the actual (albeit unknown) number of transcript of this
type per N clones in the library. If we duplicate this experiment (i.e., once again randomly pick N clones of the
same library and generate sequence tags), we will now observe
y occurrences of the same transcript. What is the probability
of the various y values? An approximate solution consists in
using x as the maximum likelihood estimate for
and
compute the probability for y occurrences given a Poisson
distribution of mean
= x:
|
(4) |
x) = p(x
y)
should hold provided that an equal number N of clones is
sampled in both experiments. Equation 4 is not the correct formula,
because we have not yet taken into account the fluctuation of
x around the unknown mean
. To account for the fact that
the actual value of
is unknown, we have to integrate Equation 4
over all possible
values:
|
(5) |
x) in Equation 5 is
the probability that the actual abundance of a given transcript is
given that x occurrences of a cognate tag have been
observed in one experiment. The second term in the integral is the
probability of drawing y occurrences given a Poisson
distribution of mean
:
|
(6) |
x) can be written as
|
(7) |
). The least constrained
hypothesis (i.e., with the least information content), is to attribute
an equal a priori probability to all
values in the [0,
]
range. Incorporating such a flat prior in Equation 5 leads to
|
(8) |
function for integer arguments we
observe that
|
|
(1) |
x) can be evaluated regardless of
the distribution of each transcript (provided it is rare) within a cDNA
library.
To compute the confidence intervals listed in Table 1, we made use of the cumulative distributions:
|
(9a) |
|
(9b) |
such as
C(y ymin
x)
and
D(y
ymax
x)
. Given that an event is observed x times in one experiment, the number y of occurrences of this event in a duplicate
experiment is expected to fall within the interval
[ymin, ymax]
with a probability of 1-2
. Equation 9, a and b, can therefore
serve as a significance test when comparing, for instance, the results of sampling N clones from two different libraries. For 2
small (e.g., 5% or less), y values falling outside the
[ymin, ymax]
interval correspond to
p(y
x) << 1, and point out
significant differences between the two experiments. They should
include differentially expressed genes, for example, for which
is
different in the two libraries.
Generalization to Different Sampling Sizes
When different numbers of clones N1 and N2 are sequenced from the same library, Equation 5 becomes
|
(10) |
1 and
2 are
forced in the same ratio as N1 and
N2. Using the same bayesian argument as before (Equation 7) leads to
|
(11) |
|
|
(2) |
Ricker's Confidence Interval
The confidence interval computed from Equation 1 (and its
cumulative form, Equation 9, a and b) is different from one introduced previously by Ricker (1937)
although, at first, the two may appear to
be related.
Given x occurrences of a sequence tag, Ricker's formula
defines a confidence interval [
min,
max]x for
(again the actual
number of transcripts of this type per N clones in the
library) such as
|
(12a) |
|
(12b) |
is typically 5% or 1%. Ricker's confidence intervals for
various values of x are given in Table 1. Those intervals are
close to those computed from Equation 1, but delineate the range of
likely
values, not y (the number of occurrences of the
same event in a duplicated experiment). It is possible for x
and y to fall outside each other's Ricker's confidence
interval [
min,
max], while still being
nonsignificant fluctuations around the same
value. The confidence
intervals computed from Equation 12, a and b, are therefore too narrow
to properly define significant discrepancies between x and
y. The false alarm rate associated with the use of Ricker's
confidence intervals is too high (Fig. 1).
However, an interesting use of Equation 12, a and b, is the estimation
of the range of possible frequencies [
min,
max]x = 0 for cDNAs not yet
encountered after picking N clones. For example, the 95%
confidence interval is given by:
|
(13) |
Influence of the Prior Distribution
In the bayesian context, it is prudent to assess the influence of
the prior hypothesis used to derive Equation 1 and Equation 2. The flat
p(
) prior allowing equiprobable
values in the [0 ,
] range might appear too broad and unrealistic. Nevertheless, it
is the most intuitively neutral distribution one can use. The quick
convergence of the Poisson distribution rends the contribution of
extreme
values negligible as soon as
x
or
y
increase. To verify this point, more reasonable distributions for
p(
) can be constructed by confining the accessible
values within a window [
min,
max]x centered around the already observed value x. Such a window can for instance be Ricker's
confidence intervals as defined in the previous section (Equation 12, a
and b). We then confine the only permitted values of
to be in
this interval, with an equal probability; therefore,
|
(14) |
|
(15) |
|
(16) |
min
0 and
max
, we recover
the initial Equation 1. We note in passing that the other limit,
min =
max = x, corresponds to the
most stringent "Dirac prior":
|
(17) |
= x and turning
p(y
x) into the simple Poisson
distribution (Equation 4).
The confidence intervals for the usual 1% and 5% significance levels
are given in Table 1 for both the flat and the window prior
p(d =
). There is little difference, with
the test derived from using a flat prior being a bit more conservative,
as expected. On the down side, Figure 1 shows that the test derived
from the window p(
) prior gives rise to a higher rate of
false alarm.
| |
ACKNOWLEDGMENTS |
|---|
We thank Drs. J. Weissenbach, D. Gautheret, R. Ewing, and C. Abergel for critically reading the manuscript. This work was sponsored by a collaborative research grant from Incyte Pharmaceuticals, Inc.
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
| |
FOOTNOTES |
|---|
1 Corresponding author.
E-MAIL jmc{at}igs.cnrs-mrs.fr; FAX 334 91 16 45 49.
| |
REFERENCES |
|---|
|
|
|---|
untranslated sequences of human cDNAs for rapid chromosome assignment and conversion to STSs: Implications for an expression map of the genome.
Nucleic Acids Res.
19:
1837-1843Received March 12, 1997; accepted in revised form August 22, 1997.
This article has been cited by other articles:
![]() |
M. Semon and K. H. Wolfe Preferential subfunctionalization of slow-evolving genes after allopolyploidization in Xenopus laevis PNAS, June 17, 2008; 105(24): 8333 - 8338. [Abstract] [Full Text] [PDF] |
||||
![]() |
R. D. Morin, M. D. O'Connor, M. Griffith, F. Kuchenbauer, A. Delaney, A.-L. Prabhu, Y. Zhao, H. McDonald, T. Zeng, M. Hirst, et al. Application of massively parallel sequencing to microRNA profiling and discovery in human embryonic stem cells Genome Res., April 1, 2008; 18(4): 610 - 621. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. W. H. Wong, M. J. Sullivan, and G. Cagney Computational methods for the comparative quantification of proteins in label-free LCn-MS experiments Brief Bioinform, March 1, 2008; 9(2): 156 - 165. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Nagel, L. K. Culley, Y. Lu, E. Liu, P. D. Matthews, J. F. Stevens, and J. E. Page EST Analysis of Hop Glandular Trichomes Identifies an O-Methyltransferase That Catalyzes the Biosynthesis of Xanthohumol PLANT CELL, January 1, 2008; 20(1): 186 - 200. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. H. Cho-Vega, S. Tsavachidis, K.-A. Do, J. Nakagawa, L. J. Medeiros, and T. J. McDonnell Dicarbonyl/L-Xylulose Reductase: A Potential Biomarker Identified by Laser-Capture Microdissection-Micro Serial Analysis of Gene Expression of Human Prostate Adenocarcinoma Cancer Epidemiol. Biomarkers Prev., December 1, 2007; 16(12): 2615 - 2622. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. Costa, C. L. Borges, A. M. Bailao, G. V. Meirelles, Y. A. Mendonca, S. F. I. M. Dantas, F. P. de Faria, M. S. S. Felipe, E. E. W. I. Molinari-Madlum, M. J. S. Mendes-Giannini, et al. Transcriptome profiling of Paracoccidioides brasiliensis yeast-phase cells recovered from infected mice brings new insights into fungal response upon host interaction Microbiology, December 1, 2007; 153(12): 4194 - 4207. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. K. Hane, R. G.T. Lowe, P. S. Solomon, K.-C. Tan, C. L. Schoch, J. W. Spatafora, P. W. Crous, C. Kodira, B. W. Birren, J. E. Galagan, et al. Dothideomycete Plant Interactions Illuminated by Genome Sequencing and EST Analysis of the Wheat Pathogen Stagonospora nodorum PLANT CELL, November 1, 2007; 19(11): 3347 - 3368. [Abstract] [Full Text] [PDF] |
||||
![]() |
J.-A.L. Stanton, A.B. Macgregor, C. Mason, M. Dameh, and D.P.L. Green Building comparative gene expression databases for the mouse preimplantation embryo using a pipeline approach to UniGene Mol. Hum. Reprod., October 1, 2007; 13(10): 713 - 720. [Abstract] [Full Text] [PDF] |
||||
![]() |
P. Huang, E. D. Ple |