Genome Research
| |
ABSTRACT |
|---|
|
|
|---|
Sequencing centers such as the Human Genome Center at LBNL are producing an ever-increasing flood of genetic data. Annotation can greatly enhance the biological value of these sequences. Useful annotations include possible gene locations, homologies to known genes, and gene signals such as promoters and splice sites. Genotator is a workbench for automated sequence annotation and annotation browsing. The back end runs a series of sequence analysis tools on a DNA sequence, handling the various input and output formats required by the tools. Genotator currently runs five different gene-finding programs, three homology searches, and searches for promoters, splice sites, and ORFs. The results of the analyses run by Genotator can be viewed with the interactive graphical browser. The browser displays color-coded sequence annotations on a canvas that can be scrolled and zoomed, allowing the annotated sequence to be explored at multiple levels of detail. The user can view the actual DNA sequence in a separate window; when a region is selected in the map display, it is highlighted automatically in the sequence display, and vice versa. By displaying the output of all of the sequence analyses, Genotator provides an intuitive way to identify the significant regions (for example, probable exons) in a sequence. Users can interactively add personal annotations to label regions of interest. Additional capabilities of Genotator include primer design and pattern searching.
[Further details for obtaining Genotator are available at http://www.cshl.org/gr.]
| |
ARTICLE |
|---|
|
|
|---|
| |
The Need for Automated Annotation |
|---|
Sequencing centers such as the Human Genome Center at LBNL are producing an ever-increasing flood of genetic data. It is increasingly accepted that annotation of these sequences can greatly enhance their biological value. In the past, annotation of sequences has been done primarily by hand (e.g., Lewis et al. 1995). This approach to annotation is, and will continue to be, extremely valuable. However, the rate of sequencing has accelerated to the point that it is impossible to have biologists personally annotate every new base. An automated approach to sequence annotation is clearly necessary.
Many researchers are developing tools for analyzing DNA sequences. These tools include programs that look for homologies to sequence in a database, predict possible exons, find repeats, and identify gene signals such as promoters and splice sites. Many of these sequence analysis tools can offer useful insight into the biological significance and possible function of a new sequence; however, they tend to suffer from several shortcomings. First, each sequence analysis program has its own output format (and often its own input format as well). This makes it difficult to compare the results of multiple programs. Second, although the ability to predict the locations of exons and other genetic signals continues to improve, it would still be rash to place absolute faith in the predictions of any one program. If, however, several different programs, with different approaches, make the same prediction on a sequence region, our confidence in the prediction is increased. Another limitation of many sequence analysis programs is that the output is textual rather than graphical, which makes it hard to quickly identify the significant regions of a genomic sequence. Some programs do have graphical displays, but that does not ease the problem of comparing the output of several different programs. Finally, most sequence analysis programs are not a solution to the problem of automated annotation, because they don't provide many of the features that one would want in such a tool, such as the ability to add personal annotations or to inspect the sequence at an arbitrary level of detail.
We have developed a sequence annotation workbench, Genotator, that addresses these shortcomings. Genotator provides a flexible, transparent system for automatically running a series of sequence analysis programs on genetic sequences. It also has a graphical display that allows users to view all of the annotations and add or delete their own. Genotator's display allows annotated sequences to be examined at multiple levels of detail, from an overview of the entire sequence down to individual bases.
| |
Background and Related Work |
|---|
A number of other researchers have developed tools that overlap to some extent with Genotator's functionality. One of the earliest was ACeDB (Durbin and Thierry-Mieg 1991). ACeDB was developed as a database and graphical display tool for storing and analyzing data from the C. elegans sequencing project. It continues to be used by the groups involved with that and other sequencing projects. In a number of ways, ACeDB provides more functionality than Genotator. Its underlying database is much more sophisticated than Genotator's, and includes types of information (e.g., paper references) that are beyond the scope of Genotator. In other ways, ACeDB lacks the functionality to make it a complete system for sequence annotation. For example, it cannot be run automatically on a set of sequences to find database homologies, possible promoters, and so forth, in these sequences. I initially considered modifying ACeDB to enable it to work as an annotation tool, but found that this was not the best way to approach the problem, both because ACeDB was designed to be a database for sequencing projects rather than an annotation workbench, and because its code style is baroque, sparsely commented, and idiosyncratic.
Genome Topographer (T.G. Marr, unpubl.) is another example of an ACeDB-like program that includes a database to hold genome-related data plus displays to allow various views of the data. Like ACeDB, Genome Topographer was not designed as an interactive annotation tool.
Others have written tools more specifically designed for sequence annotation. These include GeneQuiz, SCAN, the BCM Search Launcher, and GAIA. GeneQuiz (Scharf et al. 1994), like Genotator, automatically runs a series of sequence analysis tools, including BLAST and FASTA. The results are displayed as structured text. Darrell Ricke's SCAN program (D.O. Ricke, J.M. Buckingham, A.C. Munk, N. Liu, D.C. Bruce, J.F. Chao, Y. Shi, R. Lobb, E.H. Saunders, H.-C. Chi, J.R. Wu, N.A. Doggett, M.R. Altherr, L.L. Deaven, and R.K. Moyzis, in prep.) has a back end similar in some respects to Genotator, although it concentrates more on database homology searches and less on exon prediction. The displays are mostly structured text, some with hyperlinks. The BCM Search Launcher (Smith et al. 1996) provides a point from which to access various sequence (and structure) analysis tools available on the World Wide Web. The user can request any of a variety of such searches; the results of each search are displayed separately as hyperlinked structured text. GAIA (Genome Annotation and Information Analysis) (L.C. Bailey, J. Schug, S. Fischer, M. Gibson, J. Crabtree, D.B. Searls, and G.C. Overton, in prep.), which is being developed at the University of Pennsylvania, is perhaps the most similar system to Genotator in terms of its goals, organization, and features. Sequences are submitted to ATLAS, the data management portion of GAIA, and then annotated automatically by CARTA. The annotated sequence is displayed with Java applets (based on the bioWidget components). Although GAIA calls only one exon prediction program (GRAIL), rather than several as Genotator does, GAIA includes some types of features (e.g., poly(A) signals) that are not reported by Genotator.
Recently there has been interest in developing Java displays for visualization of sequences and related information. Groups working on such displays include the Berkeley Drosophila Genome group (G. Helt and G. Rubin, unpubl.), the European Molecular Biology Laboratory (EMBL), and the Computational Biology and Informatics Laboratory at the University of Pennsylvania. The bioWidget Consortium (http://www.biowidgets.org/) involves some of the groups interested in collaborative development of Java displays for bioinformatics. Most of this work has focused on graphical displays rather than on back-end software for sequence analysis. Genotator offers a combined system, which runs a sequence through various analysis tools and then displays the results. The next section describes how Genotator is organized.
| |
The Organization of Genotator |
|---|
Genotator consists of three main portions (a set of sequence analysis programs, a database, and a graphical browser) as well as the "glue" that links the three components (see Fig. 1). These components will be described briefly here and in more detail in subsequent sections.
|
| |
Genotator's Database |
|---|
Genotator's database currently exists as a Unix directory hierarchy of flat files. Each flat file contains one type of annotation (e.g., GRAIL exons) in a simple tabular format called ACE format (as used by ACeDB databases; Durbin and Thierry-Mieg 1991). ACE format was chosen because it is easy for programs to parse and is also human-readable (unlike some formats, such as ASN.1, that are difficult for humans to parse). The directory hierarchy is organized as shown in Figure 2.
|
| |
The Genotator Back End |
|---|
The Genotator back end runs several gene finders, homology searches (using BLAST) (Altschul et al. 1990), and signal searches, and saves the results in .ace format. Genotator therefore automates the tedious process of running a dozen different sequence analysis programs with a dozen different input and output formats.
Out of the many available sequence analysis tools, I chose a reasonable subset to integrate into Genotator. Exclusion of some tools from Genotator's collection is not meant to imply that such tools are inferior. Offsite users who set up Genotator at their site (see on-line supplement Appendix A at http://cshl.org/gr for accessing Genotator, system requirements and programs used) are free to modify the code to integrate their favorite sequence analysis tools. Also, various laboratories are sequencing the DNA of various organisms; I set up Genotator to work on human or Drosophila (which are the organisms being sequenced at LBNL). Users can specify from which organism their sequence is (if left unspecified, human is assumed).
The analysis programs called by Genotator fall into three main categories: the gene finders [Genie (Kulp et al. 1996), GRAIL (Xu et al. 1994), GeneFinder (Green 1994), xpound (Thomas and Skolnick 1994), and GeneMark (Borodovsky and McIninch 1993)]; the database homology searches [BLASTN (Altschul et al. 1990) on dbEST and a database of human or Drosophila repeat sequences; BLASTX on GenPept (Benson et al. 1993)]; and sequence feature predictors [start/stop codons, open reading frames (ORFs), promoters (M.G. Reese and F.H. Eeckman, unpubl.), splice sites (M.G. Reese, F.H. Eeckman, D. Kulp, and D. Haussler, unpubl.), and tRNA genes (Lowe and Eddy 1997)]. The promoter and splice site predictors and the Genie gene finder were developed by members of our group at LBNL. Most of the other programs are freely available (see Appendix A). For each analysis program, there is a Perl filter that parses the results, filters out the insignificant ones, and saves the significant annotations in .ace files, from which they can be read by the browser.
Figure 3 shows a simplified view of the annotation process used by Genotator. First the incoming sequence is cleaned up (nonstandard characters are converted to Ns; long lines are broken up) and converted to FASTA format, which is used as the input format for many of the sequence analysis tools. The sequence is BLASTed against a database of human (or Drosophila) repeats and the repeats that are found are masked out with xblast. The masked sequence is then BLASTed against databases of expressed sequence tag (EST) sequences and GenPept (translated coding regions from GenBank). The BLAST hits are filtered and stored both in .ace format and in a file for Blixem (a BLAST hit viewer from the Sanger Centre). Issues having to do with BLAST hits are discussed in the next section.
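The cleanup step described above can be illustrated with a minimal Python sketch. This is not Genotator's own code (its filters are written in Perl); the function name `to_fasta`, the 60-column line width, and the choice to keep only A, C, G, T, and N are assumptions made for the example.

```python
import re


def to_fasta(name, raw_sequence, width=60):
    """Return a FASTA record built from a raw, possibly messy sequence.

    Assumed policy for this sketch: whitespace is dropped, letters are
    upper-cased, and anything other than A, C, G, T, or N becomes N.
    """
    # Remove line breaks and other whitespace from the raw input.
    letters = re.sub(r"\s", "", raw_sequence).upper()
    # Convert every nonstandard character to N.
    cleaned = re.sub(r"[^ACGTN]", "N", letters)
    # Break the sequence into lines of at most `width` characters.
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"
```

A call such as `to_fasta("seq1", raw_text)` yields a header line followed by fixed-width sequence lines, a form that many analysis tools accept as input.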
|
The next phase of processing involves converting the sequence to the appropriate input format for each of five gene prediction tools, running the tools (using parameters appropriate for human or Drosophila sequence), and parsing the results. Stop codons and ORFs are also found and their positions recorded. Martin Reese's neural network programs are run to find potential promoters and splice sites. tRNAscan-SE is run to look for potential tRNA genes (although these are found so rarely that they are not displayed in the graphical output).
| |
Filtering BLAST Hits |
|---|
When using BLAST (or any other sequence homology search program) to compare a sequence with a large database of known sequences, one must tackle the issue of identifying the biologically significant hits while minimizing the number of uninteresting hits that must be waded through. The best way to do this is still being debated. We have arrived at some compromise solutions. BLASTX hits against GenPept are run through a BLAST hit postprocessor, MSPcrunch (Sonnhammer and Durbin 1994), which eliminates some of the less significant hits and does some assembly of fragmented hits. Processing the BLASTX hits with MSPcrunch also enables us to browse the hits with Blixem (Sonnhammer and Durbin 1994), a BLAST hit viewer that can be invoked from Genotator. A disadvantage of running MSPcrunch on the hits is that the actual alignments of the hits (showing all the bases that matched) are not preserved. Mostly because of this limitation, we chose not to use MSPcrunch on the BLASTN output, but rather to filter out insignificant BLASTN hits by setting a minimum percentage sequence identity (which can be changed by the user when a sequence is run through Genotator).
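The identity-based filter for BLASTN hits amounts to a simple threshold test. The following Python sketch shows the idea; the `BlastHit` record, its field names, and the default cutoff of 80% are hypothetical and are not taken from Genotator.

```python
from typing import NamedTuple


class BlastHit(NamedTuple):
    """One database hit (field names are illustrative only)."""
    subject_id: str
    start: int
    end: int
    percent_identity: float


def filter_hits(hits, min_identity=80.0):
    """Keep only hits whose percentage identity meets the cutoff.

    The default cutoff is an arbitrary value chosen for this sketch;
    in Genotator the user can change the minimum identity.
    """
    return [hit for hit in hits if hit.percent_identity >= min_identity]
```

Raising `min_identity` trades sensitivity for a shorter list of hits to inspect.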
Another step we take to try to maximize the information content of the reported BLAST hits is to search first for repeat sequences (such as Alu repeats, which are ubiquitous throughout the human genome; there are also repeat sequences found in the genomes of other organisms). The repeat sequences are then masked with xblast (Claverie and States 1993), and the other BLASTs are run on the masked sequence so that the hits that are found do not include repeat sequences.
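Masking can be pictured as overwriting each repeat region with a placeholder character before the later searches are run. The Python sketch below illustrates the idea only; it is not xblast, and the function name, the 1-based inclusive coordinates, and the use of N as the mask character are assumptions.

```python
def mask_regions(sequence, regions, mask_char="N"):
    """Return a copy of sequence with each region overwritten by mask_char.

    regions is a list of (start, end) pairs, 1-based and inclusive
    (an assumed convention). Overlapping regions are handled naturally,
    and coordinates outside the sequence are clipped.
    """
    masked = list(sequence)
    for start, end in regions:
        first = max(start, 1) - 1          # convert to a 0-based index
        last = min(end, len(sequence))     # exclusive upper bound
        for i in range(first, last):
            masked[i] = mask_char
    return "".join(masked)
```

Because the masked sequence keeps its original length, the coordinates of later hits still refer to positions in the unmasked sequence.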
| |
The Genotator Front End |
|---|
The front end is described in the next two sections, Running Genotator and The Genotator Browser.
| |
Running Genotator |
|---|
Genotator can be run via command-line arguments or with the easy-to-use graphical user interface (GUI) shown in Figure 4. The GUI is written in Tkperl.
|
The GUI is designed to minimize the number of choices the user has to make; in most cases, the user can simply click "Start annotation" and everything will proceed automatically.
The command-line interface is useful when the user wants to annotate several sequences at the same time. It can be invoked with no arguments to run the standard analyses, or it can be called with various command-line options to alter its behavior.
We have also developed a Web front end to Genotator that looks much like the GUI. Like the other approaches to running Genotator, the Web interface allows the user to specify a sequence to be annotated and to select which analyses are to be performed. Once a sequence is submitted, the back end runs the analyses as usual and saves the results in the database, where they can be viewed with the Genotator browser.
When Genotator is invoked, its first step is to check the availability of all the sequence analysis programs it knows about. Any that are missing are not offered to the user as choices. Genotator can run with any subset of the suite of sequence analysis programs it is capable of calling. It is written in such a way that new analysis tools can be integrated fairly easily. (Integrating a new tool would involve creating filters to convert the input and output formats, and adding new functions to the back end and front end to run the tool and display the results.)
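One way to perform such an availability check is to look for each program on the executable search path. The Python sketch below uses the standard `shutil.which` function; the tool names in the usage note are placeholders, and nothing here reflects how Genotator itself locates its programs.

```python
import shutil


def available_tools(tool_commands):
    """Map each tool name to the path of its executable, if one is found.

    tool_commands maps a display name to the command to look for.
    Tools whose command is not found on the search path are omitted,
    so only the installed subset is offered to the user.
    """
    found = {}
    for name, command in tool_commands.items():
        path = shutil.which(command)
        if path is not None:
            found[name] = path
    return found
```

For example, `available_tools({"GRAIL": "grail", "BLASTN": "blastn"})` would return entries only for those commands installed on the local machine (the command names here are hypothetical).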
| |
The Genotator Browser |
|---|
After a sequence has been run through Genotator, the Genotator browser provides an interactive graphical view of the annotations. The main display in the browser shows a horizontal axis representing the sequence, with forward-strand annotations displayed above the axis and reverse-strand annotations below the axis. Each type of annotation (for example, GRAIL exons) is displayed in its own row, in its own color.
The Genotator browser is built on top of the bioTkperl widgets (G. Helt, unpubl.) developed by Gregg Helt of the UC Berkeley Drosophila Genome Center [which were in turn inspired by the bioTk widgets developed by David Searls (Searls 1995)]. It can be invoked with the name of an annotated sequence file as an argument. If it is invoked with no arguments, a list of annotated sequences is displayed, with the sequences annotated by the invoking user listed first. Once a sequence has been selected, all of its annotations are loaded and displayed in the map display.
| |
Map Display |
|---|
As described above, the map display shows color-coded sequence annotations for both strands. The display can be zoomed and scrolled to examine interesting regions in more detail. Clicking on an annotation rectangle displays additional information in the text window at the top of the browser. This includes the start and end positions of the annotation, possibly a score, and other information. For example, if a BLAST hit is clicked, the text window might read, "BLASTX GenPept hit from 864 to 1112 with sequence gp|K01228|HUMCG1PA1_1 (33% identity)." This concise description identifies the database sequence that was hit (gp|K01228|HUMCG1PA1_1 is its GenPept ID), the region that was found to be similar to this database sequence (bases 864-1112), and the percentage sequence identity for the hit (33%).
BLAST hits can be double-clicked to view them in more detail. For BLASTN hits (against nucleotide sequences), the complete alignment pops up in a separate window. BLASTX hits against GenPept can be viewed in Blixem.
In Figure 5 the Genotator browser is shown displaying the annotations on HUMTFPB (Mackman et al. 1989), a human tissue factor gene sequence obtained from GenBank. (Splice site predictions and start/stop codons are not displayed until they are explicitly turned on.) (A hyperlinked version of Appendix A and color versions of Figures 5, 6, and 8, below, are available at http://cshl.org/gr.)
|
|
|
In Figure 5, the user has clicked on one of the red GenPept BLAST hits. The browser put a black frame around the hit and printed information about the hit in the box labeled "Annotation."
| |
Sequence Display |
|---|
The Genotator browser can display the actual DNA sequence (or its complement) in a separate window; this is shown in Figure 6. Interaction between the map and sequence displays is bidirectional. When a user selects an annotation in the map display, the corresponding region is highlighted in the appropriate color in the sequence display. Here, for example, the selected GenPept hit is highlighted in red in the sequence display. When a region is selected in the sequence display, it is boxed in the map display.
| |
Genotator's Display Helps Users Identify Regions of Interest |
|---|
One of the advantages of Genotator's graphical display is that it is quickly apparent which sequence regions are likely to be interesting. The arrangement of the display also allows users to assess the relative significance of predictions. For example, if one gene finder predicts an exon in some region, but there are no other exon predictions or BLAST hits in that region, it is unlikely to be a true exon. On the other hand, a sequence region for which all of the gene finders predict an exon, and that contains a BLAST hit to an EST or GenPept sequence, can be judged likely to be a true exon.
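The reasoning above, in which a region is more believable when several independent analyses agree on it, can be made concrete by counting the distinct annotation sources that overlap a region. The Python sketch below is only an illustration of that idea; Genotator presents this evidence visually rather than computing a score, and the tuple layout and function names are assumptions.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two inclusive intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def supporting_sources(region, annotations):
    """List the distinct sources with an annotation overlapping region.

    region is (start, end); annotations is a list of
    (source, start, end) tuples, e.g. ("GRAIL", 100, 200).
    """
    start, end = region
    return sorted({source for source, a_start, a_end in annotations
                   if overlaps(start, end, a_start, a_end)})
```

A region supported by several gene finders plus a BLAST hit would return a long list; a region predicted by a single program would return a list of length one.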
The lineup of exon predictions displayed by Genotator was the inspiration for GeneNomi (N.L. Harris, M.G. Reese, and F.H. Eeckman, unpubl.), a method for combining information from several different predictions to make conservative exon predictions. GeneNomi starts with the exons predicted by Genotator's suite of gene finders, takes the overlapping portions of the predicted exons (which are weighted by the measured accuracy of the gene finding method used to predict each exon), and refines the end points of the consensus exons by looking for splice sites or start/stop codons. GeneNomi was tested on a standardized data set of 305 "clean" gene sequences carefully selected from GenBank (Kulp et al. 1996). By combining several sources of information, GeneNomi was able to come up with slightly better predictions than the best gene finder used by itself. The fact that its predictions were only a slight improvement suggests that we are not yet at the point where a single consensus exon prediction would inspire confidence. It is more useful for a biologist to see the predictions of all of the gene finders lined up (plus the BLAST hits, splice sites, and other supporting features) and to make an informed decision about which exons are most believable. (GeneNomi was developed for research purposes; its predictions are not currently included in the Genotator display.)
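The general idea of combining weighted, overlapping predictions can be sketched as a per-base vote. The Python example below is not the GeneNomi algorithm, whose details are unpublished; the weights, the threshold, and the function name are hypothetical, and the end-point refinement using splice sites or start/stop codons is omitted.

```python
def weighted_consensus(predictions, weights, threshold):
    """Return intervals where the summed weight of predictions >= threshold.

    predictions is a list of (source, start, end) tuples with 1-based
    inclusive coordinates; weights maps each source to a number (for
    instance, a measured accuracy). The result is a list of
    (start, end) consensus intervals in ascending order.
    """
    if not predictions:
        return []
    length = max(end for _, _, end in predictions)
    # score[pos] holds the total weight covering position pos (1-based);
    # the extra slot at the end stays 0 and closes any open run.
    score = [0.0] * (length + 2)
    for source, start, end in predictions:
        for pos in range(start, end + 1):
            score[pos] += weights[source]
    intervals = []
    run_start = None
    for pos in range(1, length + 2):
        if score[pos] >= threshold:
            if run_start is None:
                run_start = pos
        elif run_start is not None:
            intervals.append((run_start, pos - 1))
            run_start = None
    return intervals
```

With a high threshold only the portions on which most predictors agree survive, which matches the conservative intent described above.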
After using the Genotator browser to identify probable exons or other interesting features, biologists may choose to confirm these predictions at the bench. (For their convenience, Genotator can also select primers.) By looking at Genotator's predictions, one may minimize the number of sequence regions that need to be checked.
| |
Other Features of Genotator |
|---|
Genotator offers several features (besides those already described, such as scrolling and zooming) that enhance the functionality of the map and sequence displays. The user can ask Genotator to generate a text report of all annotations. Any selected region can be written out in FASTA format for further analysis. Other features are discussed in the next four sections. (Graphical illustrations of these features can be seen in the on-line version of this manuscript at http://www.cshl.org/gr.)
| |
Adding Personal Annotations |
|---|
The Genotator browser allows users to add new annotations to either the map or the sequence display. These personal annotations are saved along with the precomputed annotations. Figure 7 shows the interface for dealing with personal annotations. To add a personal annotation to the map or sequence display, the user selects some region of the sequence, types the annotation text in the text box, and then clicks "Add Annotation to Map" or "Add Annotation to Sequence." The color of each personal annotation can be specified independently. Clicking on the button that says "forestgreen" brings up a menu of color choices.
|
Annotations that refer to a sizable portion of the sequence are generally added to the map; those referring to a small region (such as a primer) are more appropriately added to the sequence. All personal annotations are saved in the database along with the automatically generated annotations. Examples of personal annotations can be seen in the map display in Figure 5 ("Personal annotation" and "Reverse strand annotation") and the sequence display in Figure 6 ("personal annotation in sequence").
| |
Exploring ORFs |
|---|
Genotator shows ORFs in the map display with different offsets and colors for each frame (1, 2, or 3). Additionally, the sequence display can be made to highlight stop codons in one or all frames, in three colors corresponding to their frame.
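Assigning stop codons to reading frames is straightforward to illustrate. The Python sketch below scans the forward strand only and numbers frames 1 to 3 by the position of the codon's first base; this numbering convention and the function name are assumptions for the example, not a description of Genotator's internals.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_codons_by_frame(sequence):
    """Return {frame: [positions]} for stop codons on the forward strand.

    Positions are 1-based and refer to the first base of each stop
    codon. A codon starting at position p is placed in frame
    ((p - 1) % 3) + 1, so frame 1 begins at the first base.
    """
    seq = sequence.upper()
    stops = {1: [], 2: [], 3: []}
    for i in range(len(seq) - 2):
        if seq[i:i + 3] in STOP_CODONS:
            stops[i % 3 + 1].append(i + 1)
    return stops
```

An ORF in a given frame is then any stretch between consecutive stop positions in that frame's list.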
| |
Primer Selection |
|---|
To help the user design primers for a region of interest, Genotator can call Primer3, a primer selection program developed at the Whitehead Institute (Rozen and Skaletsky 1996). Genotator users can select a sequence region, select "Design Primers" from the menu, and change any of the default Primer3 options if desired. Once the user is satisfied with the option settings, the best forward and reverse primers are printed to the terminal (so that they can be cut and pasted into a primer order form) and are also indicated in the sequence display.
| |
Searching for Patterns |
|---|
Another feature lets users look for sequence patterns (such as restriction sites) or regular expressions in a sequence. For example, suppose you wanted to find all instances of an A followed by either a C or a G, followed by one or more Ts, followed by an A. The Unix-style regular expression for that pattern is "A[CG]T+A". Genotator will locate and highlight all subsequences that match the specified pattern.
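The same kind of search can be reproduced with any regular expression library. The Python sketch below uses the standard `re` module to report the 1-based positions of non-overlapping matches; the function name and the return format are choices made for this example.

```python
import re


def find_pattern(sequence, pattern):
    """Return (start, end, text) for each non-overlapping match.

    start and end are 1-based inclusive positions in sequence.
    """
    return [(m.start() + 1, m.end(), m.group())
            for m in re.finditer(pattern, sequence)]
```

For instance, `find_pattern("GGACTTATTAGTAC", "A[CG]T+A")` returns `[(3, 7, "ACTTA"), (10, 13, "AGTA")]`.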
| |
Genotator Applications |
|---|
Genotator is being used by several groups at LBNL, as well as off-site at Stanford, Berkeley, and other universities and genome centers. At LBNL, the primary users have included biologists who are looking for genes in newly sequenced regions of the human genome, as well as researchers (including the developers of Genie) who are investigating new methods of gene finding. Genotator provides an intuitive visual way to compare the performance of various gene finders. If the sequence being studied already has known exons, the predictions of the gene finders can easily be checked against these exons.
Figure 8 shows a sequence that a group of biologists at LBNL (C. Collins and T. Cloutier, unpubl.) annotated using Genotator. The sequence, h78_1_c10, is a 3523-bp subclone from human chromosome 7. Personal annotations have been added to indicate regions of interest. The regions marked GCAP indicate where homologies to two retinal guanylyl cyclase activator proteins were found coincident to predicted exons. These predicted exons may therefore belong to some new gene associated in some way with the photoreceptor membrane.
| |
Client-Server Architecture for Genotator |
|---|
We are working on a client-server version of Genotator that will enable remote users to annotate sequences via a flexible, transparent distributed architecture. This architecture will be described in more detail elsewhere.
| |
Conclusions |
|---|
As more and more genomic sequence data have become available, interest in sequence annotation has grown. A number of researchers are working on automatic annotation. Genotator is one approach to sequence annotation. Its back end automates the tedious process of running multiple sequence analysis programs, and its front end is an interactive graphical annotation browser that offers insight into the possible significance of a new sequence. Genotator is being tested by biologists and computer scientists at LBNL and elsewhere, and it has been found to be a useful tool both for annotating sequences and for studying gene-finding methods.
| |
ACKNOWLEDGMENTS |
|---|
I am grateful for the assistance of Martin Reese, who wrote several of the sequence analysis programs called by Genotator, helped me debug earlier versions of Genotator, and offered insightful suggestions on this paper; Gregg Helt, who wrote the bioTkperl widgets as well as a graphical browser (AnnotP1) that inspired many of the features in the Genotator browser; Frank Eeckman, who supported this work as group leader of the Human Genome Informatics group and also commented on a draft of this paper; Tom Cloutier and Kelly Frazer, who as active Genotator users have offered many useful suggestions about features and helped me shake out some bugs; Colin Collins, who enthusiastically supported the use of Genotator in his group and allowed me to use his data in my example; David Kulp, who co-authored Genie and also helped test Genotator; Suzanna Lewis, director of the Berkeley Drosophila Genome Project informatics group; and Judith R. Harris, who suggested the name Genotator.
Genotator is available free of charge to non-profit institutions only. For information on obtaining it, please write to the author (nlharris{at}lbl.gov). See Appendix A at http://cshl.org/gr for system requirements.
| |
FOOTNOTES |
|---|
E-MAIL nlharris{at}lbl.gov; FAX (510) 486-4711.
| |
REFERENCES |
|---|
|
|
|---|
An integrated interface to molecular biology data base search and analysis services available on the World Wide Web. Genome Res. 6: 454-462.
Received January 23, 1997; accepted in revised form May 28, 1997.