Vol. 12, Issue 7, 1017-1018, July 2002
INSIGHT/OUTLOOK
INTRODUCTION
It is a great time to be a biologist. The raw information encoded in the genomes of humans and a host of model organisms is being extracted, but we are still far from understanding what all this raw data means. Browsing a genome and looking at all of the sequences known only by gene prediction ID numbers is both humbling and exciting. Ultimately, the biggest revolution ushered in by ongoing genome projects may well be large-scale biology, not novel techniques. After all, sequencing has been around for decades, and creative scientists exploited it for cottage-industry, gene-by-gene analysis from the beginning. The genomic era is handing us the books of life, but it is also a story of building the infrastructure and management skills necessary to bring big science to biology. We will need new strategies and more collaborative interactions to systematically explore sequenced genomes.
Simin et al. (2002) generated a collection of expressed sequence tags
(ESTs) from Drosophila embryos and then determined the spatial
pattern of expression of the corresponding transcripts by in situ
hybridization. Their study shows how a large dataset can help
us map functions onto sequenced genomes. The manuscript is also a harbinger
of flexible robotic processes that can be applied not only to genomic
problems but also to a range of techniques that fall under the banner of biology.
Drosophilists are justifiably proud of how much we have learned about
patterning during development through the analysis of mutant
phenotypes. The classic screens for embryonic lethality showed that
genes are required in restricted domains in the embryo and act to
progressively subdivide the embryo not only into tissue layers but
also into distinct structures along the anterior-posterior and
dorsal-ventral axes (Jürgens et al. 1984; Nüsslein-Volhard et al. 1984; Wieschaus et al. 1984). These
genes, which are all regulators of information flow through genetic
networks, also show spatially restricted expression by in situ
hybridization. Understanding the network of regulators is terrific, but
a longstanding question has been the identity of the genes that do the
work of development. Simin et al. (2002) and related studies report a host of restricted patterns of transcripts, such as ribosomal protein-encoding mRNAs, which one would not expect to find in a genetic
screen for a particular pattern defect, because the wild-type genes are
either required for all cells (as in the case of the ribosome) or have
subtle phenotypes obscured by the action of other genes (Kopczynski et al. 1998; Liang and Biggin 1998; Simin et al. 2002). These patterned
transcripts identify good candidates for the terminal differentiation
(or effector) genes that are likely to tweak cells in the subtle ways
required for their function as part of the organism. Systematic
analysis of spatial mRNA patterns is likely to be an important tool in
separating the nucleic acid-binding proteins that decode the genome from the proteins that ultimately build cells with slightly differing roles.
This study also holds interest for those who are curious about the
current status of genome annotation. ESTs in combination with in situ
hybridization experiments provide high-quality biological evidence for
predicted genes and unique evidence for genes missed in the early
rounds of genome annotation. Interestingly, 11% of the EST sequences
characterized in this study failed to align with pre-existing
Drosophila ESTs (Andrews et al. 2000; Rubin et al. 2000) or
predicted genes (Adams et al. 2000). Mounting evidence from this and
other manuscripts clearly indicates that genome annotation is difficult
(Ashburner 2000; Gaasterland and Oprea 2001). Whereas none of us would
willingly go back to the pre-annotated genome, it is quite clear that
genome annotation should be taken with a grain of salt. It is also
quite clear that in the rush to proteomics, we should not forget to
expend efforts to finish mapping biological evidence (such as
full-length cDNAs, array data, spatial patterns of expression, and
mutant phenotypes) onto the finished genome sequence. Genome annotation
should be seen as a process, not an event.
There is a serious and unresolved issue of how to track and make sense
of large sets of biological information, such as spatial and temporal
gene expression profiles. The authors have joined the growing list of
laboratories that list Gene Ontology terms (Ashburner et al. 2000) in
addition to the standard DNA ID numbers. This is important, but there
is room for improvement. FlyBase (1999), the outstanding database of
the Drosophila genome, will need to develop ways to readily
incorporate large data sets such as those in Simin et al. (2002).
Authors will need to help too, by including FlyBase ID numbers in their
manuscripts and associated supplemental tables. However, none of these
IDs are ultimately stable. As we dig deeper into issues such as
alternative splicing and overlapping transcripts in metazoan genomes,
not to mention evolving gene models, using IDs for genes is
increasingly problematic. Perhaps the sequence itself is the only real
tag. It would be good practice to routinely associate any feature, such
as an in situ or gene-expression profile, with the corresponding genomic sequence. The result will be a sequenced genome that is both a
book and the card catalog for a library. Finally, although lists of
genes and associated patterns will be of interest to most Drosophilists
working on the development of the embryo, photos of all of the in situ
patterns would be valuable. Experts on a particular tissue or pattern
would be able to extract additional information from the primary data.
Developing databases for storing and especially querying in situ
hybridization patterns, not just the descriptions, will be important.
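As a rough illustration of what tagging by sequence might look like, the Python sketch below files an expression annotation under a digest of the sequence itself and keeps gene IDs only as aliases. Everything in it is an assumption made for illustration: the `FeatureCatalog` class, the choice of SHA-256 as the key, the toy sequence, and the placeholder ID are not taken from Simin et al. (2002) or from FlyBase.

```python
import hashlib
from dataclasses import dataclass, field


def sequence_key(seq: str) -> str:
    """Derive a key from the sequence itself (whitespace and case ignored)."""
    normalized = "".join(seq.split()).upper()
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


@dataclass
class FeatureCatalog:
    """Hypothetical store: sequence digest -> aliases and free-text notes."""
    by_sequence: dict = field(default_factory=dict)

    def annotate(self, seq: str, note: str, aliases=()) -> str:
        """Attach a note (e.g. an in situ pattern description) to a sequence."""
        key = sequence_key(seq)
        entry = self.by_sequence.setdefault(key, {"aliases": set(), "notes": []})
        entry["aliases"].update(aliases)  # IDs may change; the key does not
        entry["notes"].append(note)
        return key

    def lookup(self, seq: str):
        """Return the entry for a sequence, or None if nothing is filed."""
        return self.by_sequence.get(sequence_key(seq))


if __name__ == "__main__":
    catalog = FeatureCatalog()
    # Toy sequence and placeholder ID, invented for this example.
    catalog.annotate("acgt\nACGT", "hypothetical pattern: embryonic midgut",
                     aliases=["EXAMPLE-ID-1"])
    print(catalog.lookup("ACGTACGT"))
```

An exact digest only matches identical sequences, so a real system would still need alignment to a genome assembly; the point of the sketch is only that the annotation survives when the ID changes.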
Of key interest these days too, given the current large (and growing)
size of datasets, is the automation of experimental procedures required
for extracting such data. My view of laboratory automation consists of
machine-aided manual labor, that is, a standard laboratory populated
with a few PCR machines, a liquid handling station, and a microarrayer
to handle the most onerous repetitive steps. With the exception of the
microscopic analysis (a major bottleneck), robotics was used throughout
the Simin et al. (2002)
project. The range of steps that were automated
is really quite impressive. The authors used a robotic laboratory that
can seamlessly handle many of the tasks performed by a seasoned laboratory worker (Fig. 1). The laboratory
web site lists developed protocols for micro- and macro-array
generation, plasmid preps, PCR reactions, RFLP analysis, agarose gel
analyses, filter and bead-binding assays, ELISAs, density gradient
fractionation analyses, in vitro transcription and translation, and in
situ hybridization. Additionally, the robotic laboratory is not
constrained by a linear handoff of materials between one machine and
another. Because the tasks are broken down into modules, the system can
be programmed to perform new protocols by daisy-chaining standard
tasks. The layout of the machines and the tracking system also improves
flexibility and throughput. Multiple protocols can be intertwined, such
that while one process is in a long incubation step (for 16 h in the case of in situ hybridization), another unrelated job can be run.
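The idea of chaining standard modules and interleaving jobs can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of the actual laboratory software: the module names, the single shared robot, the greedy scheduling rule, and every duration except the 16 h hybridization are invented for illustration.

```python
from __future__ import annotations

from typing import NamedTuple


class Step(NamedTuple):
    name: str
    hours: int
    needs_robot: bool  # False for unattended steps such as incubations


# Hypothetical standard modules; only the 16 h incubation comes from the text.
PREP = [Step("dispense probe", 1, True)]
HYBRIDIZE = [Step("hybridization incubation", 16, False)]
WASH = [Step("wash and stain", 1, True)]
PCR = [Step("set up PCR plate", 2, True)]

# A protocol is built by daisy-chaining modules.
IN_SITU = PREP + HYBRIDIZE + WASH


def schedule(jobs: dict[str, list[Step]]) -> list[tuple[str, str, int, int]]:
    """Greedy schedule for one shared robot: always advance the job that
    became ready earliest; robot steps wait until the robot is free."""
    ready = {job: 0 for job in jobs}     # time each job can start its next step
    position = {job: 0 for job in jobs}  # index of each job's next step
    robot_free = 0
    timeline = []
    while True:
        pending = [j for j in jobs if position[j] < len(jobs[j])]
        if not pending:
            return timeline
        job = min(pending, key=lambda j: ready[j])
        step = jobs[job][position[job]]
        start = max(ready[job], robot_free) if step.needs_robot else ready[job]
        end = start + step.hours
        if step.needs_robot:
            robot_free = end
        ready[job] = end
        position[job] += 1
        timeline.append((job, step.name, start, end))


def makespan(timeline: list[tuple[str, str, int, int]]) -> int:
    """Hour at which the last step finishes."""
    return max(end for *_, end in timeline)


if __name__ == "__main__":
    for row in schedule({"in_situ": IN_SITU, "pcr": PCR}):
        print(row)
```

In this toy case the PCR setup runs while the in situ job incubates, so the two jobs finish sooner together than they would if run strictly one after the other.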
The exploration of new ways to maximize throughput is an exciting part of biology today. Time will tell how this plays out, but this new era will almost certainly involve more extensive collaborations among specialized groups of biologists, computer scientists, mathematicians, and engineers who concentrate on maximizing speed. Having a robotics laboratory that is limited more by the imagination and commitment of the staff and collaborators, and less by the list of protocols or the process pipeline, is clearly moving automation in the right direction.
WEB SITE REFERENCES
http://metherall.genetics.utah.edu/HTR.html; Source for Figure 1.
FOOTNOTES
E-MAIL oliver{at}helix.nih.gov; FAX (301) 496-5239.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.295602.
REFERENCES