Vol. 10, Issue 1, 1-3, January 2000
INSIGHT/OUTLOOK
The conundrum is familiar. You are sent back in time to the Middle Ages with no artifact from the present, brought before the local ruler, and given 24 hours to prove you are indeed from the future, to impress the ruler and his advisors in some way, before you are executed in some suitably hideous fashion. What do you do?
Toying with this conundrum reveals how little we know in a practical
sense about the everyday items that surround us. Can you fix your car
and your computer? My guess is that few, if any, readers can
do so. And so it was with some trepidation that Cold Spring Harbor
Laboratory agreed to host a short course in the Fall of 1999, funded in
part by the National Cancer Institute, in which students, primarily
biologists, would not only print, use, and analyze DNA microarrays but
would start the course by building the machines used to print the
arrays. For some time, Patrick Brown and colleagues (Chu et al. 1998; DeRisi et al. 1997; Lashkari et al. 1997) at Stanford had been
advocating the idea that smaller laboratories could enter the fray and
hype surrounding these emerging microarray technologies by building
machines rather than by buying them, a self-help philosophy that was
strengthened by the Brown laboratory's web-based publication in June
1998 of the MGuide, a step-by-step guide to constructing the
arrayer, complete with parts list. Indeed, a number of laboratories
have gone ahead and built their own machines.
Commercial vendors already offer some solutions for investigators interested in studying changes in genome-wide gene expression. Efforts by Steve Fodor and others at Affymetrix (Santa Clara, CA) in the early 1990s led to the development of the GeneChip technology, in which relatively costly photolithographic techniques are used to fabricate high-density microarrays of short, single-stranded DNA sequences base by base. Academic laboratories in particular, however, find the technology both expensive and restrictive, the latter because all of the arrays have to be manufactured by Affymetrix, presumably with a strong commercial perspective as to which genes (and from which species) are arrayed. Because these arrays are composed of short (20-24 mer) oligonucleotides, they have application not only in monitoring "global" gene expression but also in the resequencing of genomic DNA, the identification of single nucleotide polymorphisms (SNPs), and genotyping, and will therefore have wide application in pharmacogenomics. But with current arrays sporting 40 features per gene (20 positive oligos designed to cover the length of the gene, and 20 mismatch controls that are identical in sequence except for a single centrally located mismatch), the Affymetrix approach can be considered overkill for many applications.
The second strategy revolves around commercialization of various
aspects of the Stanford technique. For example, Synteni, a company that
licensed the Stanford technology and was acquired by Incyte (Palo Alto,
CA) in 1998, together with several other competitors, print their own
arrays, where larger DNA fragments several hundreds of nucleotides in
length are prepared by PCR in advance and then coated onto various flat
substrates, primarily glass or nylon. These companies sell either
arrays or array services, an approach that suffers from similar
restrictions to the Affymetrix approach in terms of which genes the
companies decide to array. Many of these products consist of low
thousands, hundreds, or even tens of arrayed sequences. Meanwhile, a
third approach, midway between the second strategy and the purist
Stanford approach, is to buy an arrayer from a commercial vendor such
as Cartesian Technologies (Irvine, CA), and then make the DNA chips
de novo. This offers flexibility to the investigator in terms
of which sequences are arrayed, as well as the technical support of the vendor in case the printing robot breaks down or becomes misaligned (printing tens of thousands of discrete DNA "features" requires that these arrayers be tightly aligned in both horizontal directions). However, these arrayers have specifications no better than those of home-built machines and currently cost at least twice as much. This brings us back to the Stanford approach: build the machines from scratch. And to our own trepidation: could a group of 16 biologists, selected from a pool of >125 applicants on the basis of their biological interests rather than their machining skills, actually build the machines, albeit with expert guidance from members and former members of the Brown and Botstein laboratories at Stanford, such that they could be used to print high-density DNA microarrays (Table 1)?
As is usual for Cold Spring Harbor courses, the students included laboratory heads, senior scientists, and post-docs. Two came from Britain and one each from Sweden, Germany, and New Zealand, with the remainder coming from academic laboratories in the United States. Their interests were widespread, with topics ranging from the cell cycle, origins of replication, and cancer (including the development of anti-cancer vaccines) to signal transduction, apoptosis, and neurobiology. Preference was given to individuals whose applications strongly suggested that they would move swiftly to develop and apply this technology at their home institutions and make it available to other investigators. The explicit intention was to spread the application of these techniques as widely as possible, both geographically and scientifically.
The students assembled at Cold Spring Harbor Laboratory on the night of
October 19 to begin the 2-week course, and began building the arrayers
the next morning. With one arrayer built in advance by Vishy Iyer and
Jo DeRisi, a lead instructor in the course, serving as a guide, the
students were able to build three complete machines by the third day of
the course (these were long, 16-hour days) despite "teething
problems" in terms of broken or malfunctioning components (Fig. 1).
Predictably, the students learned more from the problems that they
encountered than an error-free assembly of the equipment would have
taught them.
By the fourth and fifth days, the course was printing duplicate arrays of the entire 6200-gene set of Saccharomyces cerevisiae, using clones provided by Stanford; such chips are valued in excess of several tens of thousands of dollars at current commercial prices. With four machines in operation, the course laboratory, for 1 week in October at least, probably represented the largest chip printing facility anywhere, with a hypothetical annual capacity of >100,000-150,000 arrays of 28,000 spots each. The quality of these homemade microarrays may vary rather more widely than that of commercially available arrays, but with the cost differential between the two approaches so large, various kinds of error can be significantly reduced by increasing the number of replicate arrays or even by altering the pattern of printing.
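The point about replicate arrays can be illustrated with a small simulation. This is a sketch under assumed conditions, not a model of the course data: the true log ratio, the noise level, and the function name `spread_of_replicate_means` are all hypothetical, and the measurement noise is assumed to be independent and Gaussian from array to array.

```python
import random
import statistics

def spread_of_replicate_means(n_replicates, n_trials=2000, true_value=1.0,
                              noise_sd=0.5, seed=0):
    """Standard deviation of the mean of n_replicates noisy measurements.

    Each simulated measurement is true_value plus independent Gaussian noise
    (an assumption; real array errors may be correlated or systematic).
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_trials):
        values = [rng.gauss(true_value, noise_sd) for _ in range(n_replicates)]
        means.append(statistics.mean(values))
    return statistics.stdev(means)

# Compare how tightly the averaged measurement clusters around the true value
for n in (1, 2, 4, 8):
    spread = spread_of_replicate_means(n)
    print(f"{n} replicate(s): spread of the mean = {spread:.3f}")
```

Under these assumptions the spread of the averaged value falls roughly as one over the square root of the number of replicates. Systematic errors, such as a consistently poor print position, would not average out in this way, which is presumably where altering the pattern of printing helps.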
With sufficient arrays printed and available for experimentation, the students were ready to prepare samples for hybridization. Regardless of how DNA microarrays are fabricated, at this point methods for using these arrays start to coalesce, particularly in terms of gene expression analysis. Because of the enormous variation in the number of mRNA molecules being analyzed, and because of the complexities of the hybridization kinetics of individual DNA sequences, microarrays are used to measure the ratio between a reference and a sample, typically labeled with green and red fluorescent dyes, rather than the absolute quantity of transcript. It is for this reason that raw array data are typically represented as a grid of dots of varying intensities of red, yellow and green. The individual spot represents a marker for a given gene or sequence whereas the intensity of the red or green spots indicates the degree of expression difference between sample and reference but gives no information as to whether this is an abundantly or poorly expressed gene; bright yellow spots simply indicate good hybridization of equal numbers of red- and green-labeled molecules and imply no change in gene expression.
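The ratio-based readout described above can be illustrated with a minimal Python sketch. The function name `log2_ratio`, the gene labels, and the intensity values are hypothetical, chosen only for illustration. The sketch also assumes that red labels the sample and green the reference, and that the intensities have already been background-corrected.

```python
import math

def log2_ratio(red, green):
    """Return log2(red / green) for one spot.

    Assumes red = sample channel, green = reference channel, and that both
    intensities are positive, background-corrected values.
    """
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(red / green)

# Hypothetical spots: (gene label, red intensity, green intensity)
spots = [
    ("gene_A", 8000.0, 2000.0),   # bright, red-dominated
    ("gene_B", 80.0, 20.0),       # dim, same 4:1 ratio as gene_A
    ("gene_C", 5000.0, 5000.0),   # bright yellow: equal red and green
    ("gene_D", 500.0, 2000.0),    # green-dominated
]

for name, red, green in spots:
    print(f"{name}: log2 ratio = {log2_ratio(red, green):+.2f}")
```

In this example gene_A and gene_B differ 100-fold in brightness yet yield the same log ratio, which is the sense in which the ratio says nothing about whether a gene is abundantly or poorly expressed. The equal-intensity spot, gene_C, gives a log ratio of zero.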
The students were able to use equipment loaned by various vendors to
scan the slides and began the process of analyzing the data. One of the
instructors, Michael Eisen, has been at the forefront of the
development of a suite of freely available software tools, including
ScanAlyze and Cluster, which help investigators work with the raw data
(Fig. 2). Low-quality spots need to be identified, whether arising from inconsistencies in the surface, poor printing, or
poor hybridization. Spot intensity may vary across individual spots,
and so various kinds of averaging have to be done, taking into account
background signal. The clustering algorithms developed by Eisen and
others (1998) essentially allow the extraction of patterns of gene
expression from a large number of data sets and use various
strategies to help the investigator visualize large numbers of
gene expression ratios. An elegant analogy used by these investigators
to underscore the process is to take a Raphael painting, slice and dice
the painting into thousands of randomly rearranged strips, and then
attempt to reconstruct the original: one knows the pattern is there,
but how does one (re)discover it? The principal theme emerging from
microarray experiments is that groups of genes that are functionally
related tend to be coregulated at the transcriptional level.
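To make the idea of grouping genes by expression pattern concrete, here is a minimal sketch of one simple approach: agglomerative clustering with the Pearson correlation as the similarity measure and average linkage. It is not the Cluster software itself, and the text does not specify which algorithm the course used; the function names, gene labels, and log ratios below are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_linkage(profiles):
    """Agglomerative clustering of expression profiles.

    profiles maps a gene label to a list of log ratios, one per experiment.
    Returns the merges in order as (cluster_a, cluster_b, average_correlation).
    """
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise correlation between members of two clusters
                pairs = [pearson(profiles[a], profiles[b])
                         for a in clusters[i] for b in clusters[j]]
                score = sum(pairs) / len(pairs)
                if best is None or score > best[0]:
                    best = (score, i, j)
        score, i, j = best
        merges.append((clusters[i], clusters[j], score))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical log ratios across five experiments (for example, a time course)
profiles = {
    "gene_A": [0.1, 1.0, 2.0, 1.1, 0.2],
    "gene_B": [0.0, 0.9, 2.2, 1.0, 0.1],
    "gene_C": [2.0, 1.0, 0.1, -1.0, -2.1],
    "gene_D": [1.8, 1.1, 0.0, -0.9, -2.0],
}

for left, right, score in average_linkage(profiles):
    print(f"merge {left} + {right} (average correlation {score:.2f})")
```

With these made-up profiles, gene_A pairs with gene_B and gene_C with gene_D before the two pairs are finally joined at a correlation near zero. This brute-force version recomputes every pairwise score at each step, so it is suitable only for small illustrations, not for a 6200-gene data set.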
Microarray technology has been criticized for diverging from the
current trend for hypothesis-driven research. It strikes me that this
is unfair: investigators using array technology are the
equivalent of the nineteenth century zoologists and botanists who
traveled the world collecting everything they could lay their hands on.
Provided the collection is done well, the data are then available for
others to study and draw inferences from. And if it accelerates the
process of identifying genes of unknown function by virtue of their
expression profiles, this is surely only a good thing. It is clear that
as the amount of gene expression data and whole genomic information
grows, it is vital that sufficient effort is spent on trying to develop
ways in which data generated can be compared between laboratories. The
challenges ahead lie as much in the development of sophisticated databases and advanced bioinformatics to mine reliable information from
disparate data sets as in the relatively straightforward preparation of
the arrays themselves.
The course allowed Brown, DeRisi, Eisen, and their colleagues to
communicate their passionate conviction to a captive audience that
arrays allow researchers both detailed and simultaneously holistic
views of how organisms function. There is no doubt that we are going to
witness a plethora of whole genome studies as the technology develops,
and as more investigators begin to array not only DNA but antibodies
and other proteins and to study noncoding DNA, transcription
factor binding, protein-protein interactions and other macromolecular
interactions. Although I cannot guarantee that any of these students
would survive the judgement of the ruler in the familiar conundrum
mentioned above, I am convinced that in the "new world" of
microarrays (Brown and Botstein 1999), they will at least be capable of
troubleshooting their arrayer (and, perhaps, their car and computer too).
FOOTNOTES
1 Corresponding author.
E-MAIL stewart{at}cshl.org; FAX (516) 367-8845.