Vol. 9, Issue 2, 189-194, February 1999
RESOURCE
An Effective Approach for Analyzing "Prefinished" Genomic Sequence Data
Peter M. Kuehl,1,2,3 Jane M. Weisemann,3 Jeffrey W. Touchman,2 Eric D. Green,2 and Mark S. Boguski3,4
1 University of Maryland, Department of Molecular and
Cellular Biology, Baltimore, Maryland 21201; 2 Genome
Technology Branch, National Human Genome Research Institute, National
Institutes of Health, Bethesda, Maryland 20892; 3 National
Center for Biotechnology Information, National Library of Medicine,
National Institutes of Health, Bethesda, Maryland 20894 USA
Ongoing efforts to sequence the human genome are already generating
large amounts of data, with substantial increases anticipated over the
next few years. In most cases, a shotgun sequencing strategy is being
used, which rapidly yields most of the primary sequence in incompletely
assembled sequence contigs ("prefinished" sequence) and more
slowly produces the final, completely assembled sequence ("finished" sequence). Thus, in general, prefinished sequence is
produced in excess of finished sequence, and this trend is certain to
continue and even accelerate over the next few years. Even at a
prefinished stage, genomic sequence represents a rich source of
important biological information that is of great interest to many
investigators. However, analyzing such data is a daunting task, both
because of its sheer volume and because it can change from day to
day. To facilitate the discovery and
characterization of genes and other important elements within prefinished sequence, we have developed an analytical strategy and
system that uses readily available software tools in new combinations. Implementation of this strategy for the analysis of prefinished sequence data from human chromosome 7 has demonstrated that this is a
convenient, inexpensive, and extensible solution to the problem of
analyzing the large amounts of preliminary data being produced by
large-scale sequencing efforts. Our approach is accessible to any
investigator who wishes to assimilate additional information about
particular sequence data en route to developing richer annotations of a
finished sequence.
[Our software system is available via an extensive web supplement to
this article at http://www.ncbi.nlm.nih.gov/Kuehl/prefinished.]
4 Corresponding author.
9:189-194 ©1999 by Cold Spring Harbor Laboratory Press ISSN 1088-9051/99 $5.00