Vol. 10, Issue 4, 543-546, April 2000
METHODS
Drosophila Genomic Sequence Annotation Using the BLOCKS+ Database
Jorja G. Henikoff1 and Steven Henikoff1,2
1 Fred Hutchinson Cancer Research Center and 2 Howard Hughes Medical Institute,
Seattle, Washington 98109-1024 USA
A simple and general homology-based method for gene finding was
applied to the 2.9-Mb Drosophila melanogaster Adh region, the
target sequence of the Genome Annotation Assessment Project (GASP).
Each strand of the entire sequence was used as a query against the
BLOCKS+ database of conserved regions of proteins. This
led to functional assignments for more than one-third of the genes and
two-thirds of the transposons. Considering the enormous size of the
query, the fact that only two false-positive matches were reported
emphasizes the high selectivity of protein family-based methods for
gene finding. We used the search results to improve BLOCKS+ by identifying compositionally biased blocks. Our
results confirm that protein family databases can be used effectively
in automated sequence annotation efforts.
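
As an illustration of what querying "each strand" of a DNA sequence against a protein database involves, the sketch below derives the reverse-complement strand and translates both strands in all three reading frames. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how the DNA query was compared with protein blocks, so six-frame translation with the standard genetic code is assumed here, and the function names (reverse_complement, translate, six_frame_queries) are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def translate(dna, frame):
    """Translate one reading frame (0, 1, or 2); unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(frame, len(dna) - 2, 3)
    )


def six_frame_queries(dna):
    """Map (strand, frame) to the conceptual translation of that frame.

    Assumption: a protein-block search sees the DNA query as these six
    translations; the source does not specify this detail.
    """
    queries = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            queries[(strand, frame)] = translate(seq, frame)
    return queries


if __name__ == "__main__":
    for key, protein in six_frame_queries("ATGGCCTAA").items():
        print(key, protein)
```

For a multi-megabase query such as the Adh region, each translation would presumably be scanned in windows rather than held as a single string; that detail is left out of the sketch.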
1 Corresponding author.
10:543-546 ©2000 by Cold Spring Harbor Laboratory Press ISSN 1088-9051/00 $5.00