Vol. 10, Issue 4, 516-522, April 2000
METHODS
| |
ABSTRACT |
|---|
|
|
|---|
Ab initio gene identification in the genomic sequence of Drosophila melanogaster was performed using the Fgenes (human gene predictor) and Fgenesh programs that have organism-specific parameters for human, Drosophila, plants, yeast, and nematode. We did not use information about cDNA/EST in most predictions, to model a real situation for finding new genes, because information about complete cDNA is often absent or based on very small partial fragments. We investigated the accuracy of gene prediction on different levels and designed several schemes to predict an unambiguous set of genes (annotation CGG1), a set of reliable exons (annotation CGG2), and the most complete set of exons (annotation CGG3). For 49 genes whose protein products have clear homologs in protein databases, predictions were recomputed by the Fgenesh+ program. The first annotation serves as the optimal computational description of a new sequence to be presented in a database. Reliable exons from the second annotation serve as good candidates for selecting PCR primers for experimental verification of gene structure. Our results show that we can identify ~90% of coding nucleotides with 20% false positives. At the exon level we accurately predicted 65% of exons, and 89% including overlapping exons, with 49% false positives. Optimizing the accuracy of prediction, we designed a gene identification scheme using Fgenesh that provided sensitivity (Sn) = 98% and specificity (Sp) = 86% at the base level, Sn = 81% (97% including overlapping exons) and Sp = 58% at the exon level, and Sn = 72% and Sp = 39% at the gene level (estimating sensitivity on the std1 set and specificity on the std3 set). In general, these results show that computational gene prediction can be a reliable tool for annotating new genomic sequences, giving accurate information on 90% of coding sequences with 14% false positives.
However, exact gene prediction (especially at the gene level) requires further improvement of gene prediction algorithms. The Fgenesh program was also tested for predicting genes of human Chromosome 22 (the latest variant of Fgenesh can analyze the whole chromosome sequence). This analysis demonstrated that 88% of manually annotated exons in Chromosome 22 were among the ab initio predicted exons. The suite of gene identification programs is available through the WWW server of the Computational Genomics Group at http://genomic.sanger.ac.uk/gf.html.
| |
INTRODUCTION |
|---|
|
|
|---|
Many bacterial, as well as several eukaryotic,
complete genomes have been sequenced, and Drosophila, mouse,
and human genome sequencing is being pursued aggressively. The first
challenge in analyzing sequence data is finding the genes. Knowledge of gene sequences has led to a new way of performing biological studies called functional genomics. The second major challenge is to find out
what all of these new genes do, how they interact, and how they are
regulated (Wadman 1998). Comparisons among genes of different genomes
can provide additional insight into the details of gene structure and
function. To meet these challenges we need advanced gene-finding
algorithms and computer systems utilizing all available information,
such as similarity with known proteins or ESTs to increase the accuracy
of genome annotation. We cannot precisely predict all gene components
because of limitations in our knowledge of complex biological processes
and signals regulating gene expression. In this respect, the analysis
of 2.9 Mb of Drosophila sequence by several gene-finding
approaches gives us a unique opportunity to define the reliability and
limitations of our predictions and provides a strategy for the
interpretation of predicted results in the analysis of new genomic
sequences. Current gene identification approaches (Burge and Karlin 1998) use dynamic programming and pattern-based or probabilistic schemes
for scoring potential gene variants. They employ the best signal and
content recognizers and an optimization technique developed previously
(Burge and Karlin 1997; Brunak et al. 1991; Fickett and Tung 1992; Guigó et al. 1992; Snyder and Stormo 1993; Krogh et al. 1994; Stormo and Haussler 1994; Solovyev et al. 1994). We tested two gene
prediction approaches developed in our group: Fgenes
(pattern-based human gene prediction) and Fgenesh (hidden
Markov model (HMM)-based gene prediction with Drosophila gene
parameters). The optimal strategy to annotate long genomic sequences and
predict new genes was investigated. The best results were produced by
the organism-specific Fgenesh program, which can accurately
predict ~80% of verified exons. The overpredicted exons (~10%)
can be false positives or belong to genes that do not have
corresponding ESTs or proteins and have not been predicted by
GENSCAN. Some of them represent retrovirus genes,
which we included in our annotation.
| |
METHODS |
|---|
|
|
|---|
For identification of potential protein-coding regions in the
Adh region of the Drosophila sequence we used three
gene prediction programs (Fgenes, Fgenesh,
and Fgenesh+) developed in our group. Fgenesh
is an HMM-based algorithm with parameters trained on a set of 1600
Drosophila genes annotated in GenBank (Benson et al. 1999).
Fgenesh+ is a variant of Fgenesh that takes
into account information about similar proteins.
Fgenes is a program based on discriminant functions
trained to predict human genes. We included the last program because we
observed that, when predicting human genes, exons predicted identically by
the Fgenes and Fgenesh approaches are
accurate with a specificity of ~90%-95% (Solovyev and Salamov 1999). Therefore, using both programs we can find a subset of
reliable exons that can be used to start an experimental gene
verification study.
Our approach to the annotation was based on applying basic gene prediction tools and using the BLAST program to improve the accuracy of gene prediction when similar proteins are found for ab initio predicted exons. For most genes only small fragments of mRNA sequences are present in databases, and complete cDNAs are known for only a fraction of these genes. Our experience shows that the use of short EST fragments does not improve the accuracy of predictions. Therefore, we decided not to use EST information for additional improvement and instead tested the system on predicting genes for which information about the transcript sequences is practically absent. The suggested scheme is designed to expedite initial analysis of large-scale genomic sequences and can be the first step in a complex system that might apply additional information to improve the quality of gene annotation.
General Scheme of Analysis
| 1. | The large genomic sequence (2.9 Mb) was divided into six contiguous subsequences of ~0.5 Mb each. Fgenesh and Fgenes were run on all regions of the sequence, and the points of division were selected within fragments that were free of predicted genes. Fgenesh variants were developed to predict genes in a sequence of any practical length and were applied to the analysis of human Chromosome 22 (http://genomic.sanger.ac.uk/inf/infodb.shtml). |
| 2. | Repetitive sequences were masked in the sequence using RepeatMasker (Smit 1999). |
| 3. | Prediction of genes on masked sequences was done with Fgenes and Fgenesh. |
| 4. | For each predicted exon, similarity searches were run using the BLAST program (Altschul et al. 1997). |
| 5. | For genomic regions containing predicted exons with significant protein similarity, we recomputed gene predictions using a special program Fgenesh+. |
| 6. | The total pool of predicted genes was based on annotations, with priority given to the genes predicted by Fgenesh+ (i.e., we removed all predictions which overlapped with Fgenesh+ exons). |
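Step 1 of the scheme above can be illustrated with a minimal Python sketch. It assumes predicted genes are given as inclusive `(start, end)` coordinates and that a cut is simply pushed downstream until it falls outside every predicted gene; the function name `choose_cut_points` and this shifting rule are illustrative assumptions, not the procedure actually used by the authors.

```python
def choose_cut_points(seq_len, genes, chunk_size):
    """Pick cut positions roughly every chunk_size bases, avoiding genes.

    genes: list of (start, end) inclusive coordinates of predicted genes
           (hypothetical input format).
    A cut at position p means one chunk ends at p - 1 and the next starts at p.
    """
    genes = sorted(genes)
    cuts = []
    target = chunk_size
    while target < seq_len:
        pos = target
        # Shift the cut to the right until it lies outside every gene.
        moved = True
        while moved:
            moved = False
            for start, end in genes:
                if start <= pos <= end:
                    pos = end + 1
                    moved = True
        if pos >= seq_len:
            break
        cuts.append(pos)
        target = pos + chunk_size
    return cuts


# Toy example: a 100-base sequence cut into chunks of about 20 bases.
toy_genes = [(18, 25), (26, 30), (55, 58)]
toy_cuts = choose_cut_points(100, toy_genes, 20)
print(toy_cuts)
```

With real data the chunk size would be on the order of 0.5 Mb, as in the text; the toy numbers are only there to keep the example easy to follow.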
We present three annotations to demonstrate different ways of using the predicted genes. The major CGG1 annotation comprised the nonambiguous gene set. Genes were included according to the following criteria (in descending priority): (1) all genes predicted by Fgenesh+; (2) genes predicted identically by both the Fgenes and Fgenesh programs; and (3) in regions of overlapping (but not exactly coinciding) predictions, only one predicted gene, with priority given to the gene producing the longer protein. The annotation CGG2 is intended to provide a subset of reliable exons. It comprised the set of all exons predicted by Fgenesh+, augmented by the exons predicted identically by both programs (Fgenes and Fgenesh). The annotation CGG3 included all exon candidates predicted by Fgenesh+, Fgenes, and Fgenesh.
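The three annotation sets can be sketched as simple set operations in Python. The sketch below makes several assumptions that are not in the original text: a gene is represented as a tuple of `(start, end)` exons, strand is ignored, two genes "overlap" when their genomic spans overlap, and total coding length stands in for protein length. The function names are hypothetical.

```python
def gene_span(gene):
    """Genomic span (start, end) of a gene given as a tuple of exons."""
    return min(s for s, _ in gene), max(e for _, e in gene)


def spans_overlap(a, b):
    return a[0] <= b[1] and b[0] <= a[1]


def coding_length(gene):
    # Proxy for protein length (assumption): total exon length in bases.
    return sum(e - s + 1 for s, e in gene)


def build_cgg1(plus_genes, fgenes_genes, fgenesh_genes):
    """Nonambiguous gene set, following the three priorities in the text."""
    selected = list(plus_genes)                       # (1) Fgenesh+ genes
    shared = set(fgenes_genes) & set(fgenesh_genes)
    identical = [g for g in fgenesh_genes if g in shared]   # (2) identical
    rest = [g for g in fgenes_genes + fgenesh_genes if g not in shared]
    rest.sort(key=coding_length, reverse=True)        # (3) longer first
    for gene in identical + rest:
        span = gene_span(gene)
        if not any(spans_overlap(span, gene_span(s)) for s in selected):
            selected.append(gene)
    return selected


def build_cgg2(plus_exons, fgenes_exons, fgenesh_exons):
    """Reliable exons: Fgenesh+ exons plus exons predicted by both programs."""
    return set(plus_exons) | (set(fgenes_exons) & set(fgenesh_exons))


def build_cgg3(plus_exons, fgenes_exons, fgenesh_exons):
    """Most complete exon set: every exon candidate from any program."""
    return set(plus_exons) | set(fgenes_exons) | set(fgenesh_exons)
```

A real implementation would also need to handle strand and the removal of predictions that overlap Fgenesh+ exons, as described in step 6 of the general scheme.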
Gene Identification Programs
Fgenes
Fgenes (Find genes) is a multiple gene prediction program based on dynamic programming. It uses discriminant classifiers to generate a set of exon candidates. Similar discriminant functions were developed initially in the Fexh (Find exon) and Fgeneh (Find gene) programs (h stands for the version that analyzes human genes) and were described in detail earlier (Solovyev and Lawrence 1993).
| 1. | Create a list of potential exons, selecting all ORFs (ATG...GT, AG...GT, AG...Stop) with exon scores higher than specific thresholds that depend on GC content (four groups). |
| 2. | Order all exon candidates according to their 3'-end positions. |
| 3. | For each exon, select the maximal-score path (compatible exon combination) ending on that exon, using a dynamic programming approach similar to that of Guigó (1999). |
| 4. | Add promoter or poly(A) scores (if predicted) to terminal exons. |
Run time of the algorithm grows approximately linearly with the sequence length.
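The ordering and dynamic programming steps can be sketched in Python as follows. This is a simplified illustration, not the Fgenes implementation: exon candidates are hypothetical `(start, end, score)` tuples, two exons are treated as compatible whenever they do not overlap (reading frame, exon type, and promoter or poly(A) scores are ignored), and the inner loop is quadratic rather than approximately linear.

```python
def best_exon_chain(exons):
    """Return (total_score, chain) for the best-scoring compatible exon set.

    exons: list of (start, end, score) candidates (hypothetical format).
    """
    # Order candidates by their 3'-end position.
    exons = sorted(exons, key=lambda e: e[1])
    n = len(exons)
    if n == 0:
        return 0.0, []
    best = [0.0] * n    # best[i]: score of the best chain ending on exon i
    prev = [None] * n   # prev[i]: previous exon in that chain
    for i, (start, _end, score) in enumerate(exons):
        best[i] = score
        for j in range(i):
            # Exon j may precede exon i only if it ends before i starts.
            if exons[j][1] < start and best[j] + score > best[i]:
                best[i] = best[j] + score
                prev[i] = j
    # Trace back from the best final exon.
    i = max(range(n), key=lambda k: best[k])
    total = best[i]
    chain = []
    while i is not None:
        chain.append(exons[i])
        i = prev[i]
    chain.reverse()
    return total, chain


candidates = [(1, 10, 5.0), (5, 20, 8.0), (15, 30, 4.0),
              (25, 40, 6.0), (35, 50, 3.0)]
score, chain = best_exon_chain(candidates)
print(score, chain)
```

Because candidates are sorted by 3'-end, every possible predecessor of an exon has already been scored when that exon is processed, which is what makes the single left-to-right pass sufficient.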
Fgenesh
Fgenesh is an HMM-based gene-finding program with an algorithm similar to Genie (Kulp et al. 1996).
Fgenesh+
Fgenesh+ is a version of Fgenesh that uses additional information from an available protein homolog. When exons predicted by Fgenesh show high similarity to a protein from the database, it is often advantageous to use this information to improve the prediction accuracy. Fgenesh+ requires an additional file with the protein homolog and aligns all predicted potential exons with that protein using the Smith-Waterman algorithm, as implemented in the sim program (Huang and Miller 1991).
CC < 0.90) and
which had protein homologs from another organism. The identity between encoded proteins and homologs varied between 99% and 40%. The prediction accuracy on this set is presented in Table 1. The results show that if the alignment covers
significant parts of both proteins, Fgenesh+ usually
increases the accuracy relative to Fgenesh, and this increase does not
depend significantly on the level of identity (for ID > 40%).
This property makes knowledge of proteins from even distant organisms
useful for improving the accuracy of gene identification.
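As a rough illustration of the Smith-Waterman local alignment mentioned above, the following Python sketch computes only the best local alignment score between two sequences. The scoring values (match, mismatch, and a linear gap penalty) are placeholder assumptions; the sim program and Fgenesh+ use their own scoring, which is not described here.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of sequences a and b.

    Uses a simple match/mismatch score and a linear gap penalty
    (illustrative parameter values, not those of any published tool).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


# Hypothetical translated exon fragment scored against a short protein segment.
print(smith_waterman_score("MKTAYIAK", "TAYI"))
```

In a protein-guided predictor, scores of this kind could be used to favor exon candidates whose translation aligns well with the homolog; how Fgenesh+ combines them with its HMM scores is not specified in this text.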
|
| |
RESULTS AND DISCUSSION |
|---|
|
|
|---|
Fgenes predicted 384 genes (202 on the reverse strand), with
3.9 being the average number of exons per gene. The average size of the
genes was 5.4 kb (from ATG to stop codon, including introns) and the
average size of intergenic regions was 7.6 kb. Of these exons, 207 had
sequence similarities on both protein and EST levels, 405 exons had
similarities with only proteins, and 335 exons had similarities with
only ESTs (with E-value < 10^-5).
Fgenesh predicted 530 genes (269 on the reverse strand), with
3.2 being the average number of exons per gene. The average size of the
genes was 2 kb and the average size of intergenic region was 5.5 kb. Of
these exons, 252 had sequence similarities on both protein and EST
levels, 601 exons had similarities only with proteins, and 390 exons
had similarities with only ESTs (with E-value < 10^-5).
We used Fgenesh+ to improve the accuracy of prediction for
49 genes. Of these genes, 37 were predicted using D. melanogaster's own proteins already deposited in protein
databases. Analysis of these predictions demonstrates that even for
such cases, prediction of accurate gene structure may not be trivial,
although in most cases Fgenesh+ improved the prediction
accuracy relative to ab initio methods. For example, in the region of
the Beaten path protein (2505534-2530156 bp) Fgenesh
predicts three genes (Fig. 1A). The first and last
predicted genes have common exons with the real gene, but the second
predicted gene is on the reverse strand and located inside the first intron.
Such splitting is probably caused by the relatively large size of the
first intron (~20 kb). Prediction becomes completely accurate using
the proper protein product of the gene (Fig. 1B). In total, of the 49 genes, only 24 coincided completely with the genes annotated in the std3 set. Four predicted retrovirus-related genes from transposons were not
annotated because annotators excluded transposon sequences. Of the
remaining 21 predicted genes, most agreed with annotations, with slight
discrepancies in one to two exons. Below are listed some cases where we
believe our predictions are more correct than the annotations in the
std3 set.
| 1. | Kuzbanian gene (34,358-130,401), a disintegrin-like metalloprotease (Rooke et al. 1996). |
| 2. | p38b gene (274,751-275,848), a stress-activated MAP kinase (Han et al. 1998). |
| 3. | Adhr gene (1,111,284-1,112,578), an alcohol dehydrogenase-related gene (Brogna and Ashburner 1997). |
| 4. | TfIIS gene (1,549,142-1,550,149), an RNA polymerase II elongation factor (Marshall et al. 1990). |
|
Because the annotators used GENSCAN to identify genes that were not
supported by protein or EST evidence, we can anticipate that the annotation contains some false-positive and false-negative predictions. In fact, 39 genes in std3 were annotated only on the basis of high
score predictions in GENSCAN (Ashburner et al. 1999). Table 2 shows the prediction accuracy for our
annotation sets (based on a combination of Fgenes and
Fgenesh), some of the best predictions from other groups
(taken from the initial analysis of Reeves et al. 2000), along with the
results for Fgenesh alone. Our results show that the
HMM-based Fgenesh program with Drosophila
parameters performed better than the pattern-based Fgenes,
whose discriminant functions were developed for prediction of human
genes. Even though it was technically incorrect to use the human
version of Fgenes, we were able to demonstrate that
applying two different approaches for prediction can generate a set of
exons with expected properties.
| 1. | The main annotation CGG1 predicts ~87% of real coding nucleotides and ~23% of false positives (there might be some errors in coding due to the absence of experimental data in many regions); 89% of exons are predicted exactly or with overlapping exons. These data show that ab initio predictions can provide information about almost all of the protein coding genes (only 13% of the coding region was not predicted) and can serve as a basis for further experimental analysis. |
| 2. | The annotation CGG2 contains ~50% of coding exons but has ~20% fewer false positive exons. These exons can be used to start experimental gene verification. |
| 3. | The annotation CGG3 included ~70% of correct exons and 92% of all coding nucleotides. Such a redundant annotation can be useful in identifying some genes with additional selection filters (e.g., analysis of similarity with some important proteins or some experimental procedures). |
|
It is interesting to note that the use of two programs provided stable prediction accuracy on both (std1 and std3) sets. The Genie program demonstrated a 20% decrease in sensitivity (Table 2). Because we have no version of Fgenes with all parameters computed for Drosophila genes, we tried to find an optimal variant using one program. We discovered that the Fgenesh predictions provided the best accuracy. In this simple variant we took a set of predicted genes and discarded the low-scoring genes (with an average gene score <15). This resulted in 88% accurate coding nucleotide predictions with only 14% false positives on the std3 set (Table 2).
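The accuracy measures and the score filter discussed above can be sketched in Python. The sketch assumes the convention common in gene-finding evaluations, where sensitivity is the fraction of annotated items that are predicted and "specificity" is the fraction of predicted items that are annotated; it also assumes exons are inclusive `(start, end)` pairs and that each predicted gene carries a precomputed `score` field. How the average gene score is calculated by Fgenesh is not specified here, so the `score` field is a hypothetical stand-in.

```python
def sn_sp(annotated, predicted):
    """Sensitivity and specificity for two collections of hashable items."""
    annotated, predicted = set(annotated), set(predicted)
    tp = len(annotated & predicted)
    sn = tp / len(annotated) if annotated else 0.0
    sp = tp / len(predicted) if predicted else 0.0
    return sn, sp


def coding_bases(exons):
    """Set of all base positions covered by (start, end) exons."""
    bases = set()
    for start, end in exons:
        bases.update(range(start, end + 1))
    return bases


def filter_genes(genes, min_score=15):
    """Discard low-scoring genes; genes are dicts with a 'score' key."""
    return [g for g in genes if g["score"] >= min_score]


annotated_exons = [(1, 10), (21, 30)]
predicted_exons = [(1, 10), (25, 34)]

# Exon level: only exactly matching exons count as correct.
print(sn_sp(annotated_exons, predicted_exons))
# Base level: every correctly predicted coding nucleotide counts.
print(sn_sp(coding_bases(annotated_exons), coding_bases(predicted_exons)))
```

In the toy data the second predicted exon overlaps an annotated exon without matching it exactly, so it contributes at the base level but not at the exon level. This is the same effect that separates the "exact" and "including overlapping exons" figures quoted in the text.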
Our results demonstrate that most of the annotated genes in std3 are at
least partially covered by predictions. For example, only five genes
from std3 do not overlap with Fgenesh predictions (two of
them are also included in the std1 set). Of these five genes, four are
located inside introns of other genes; four are single-exon genes
(three of which are located inside introns). Therefore, one of the limitations of
current gene-finding programs is that they cannot detect nested genes,
that is, genes located inside introns of other genes. This is one of
the future directions for improvement in gene-finding software.
Although this is probably a rare event for the human genome, for
organisms like Drosophila it presents a real problem. For
example, annotators identified 17 examples of such cases in the
Adh region (Ashburner et al. 1999). Another drawback of the
current gene-finding programs is that predictions of terminal exons are
generally much worse than the internal ones. This results in the
splitting up of some actual genes and/or joining some other multiple
genes into a single gene. Several examples of such situations can be
clearly seen in our Genes in Pictures interactive system
(Seledtsov and Solovyev 1999) (Fig. 2) developed to
present information about gene structures described in GenBank
(collecting information about a gene from many entries) or annotated
using gene prediction programs. In total, on std3, 63% of internal
exons from the CGG1 annotation are predicted exactly, with
54% specificity, whereas the corresponding numbers for initial exons
are 58% and 44%, and for terminal exons 53% and 40%, respectively.
On std1 Fgenesh predicts all internal exons correctly
(100%), whereas only 72% of initial exons and 77% of terminal exons
are predicted correctly (Table 3). Thus, methods to
better predict terminal exons and the related problem of recognizing
the beginnings (transcription start sites) and endings [poly(A)
sites] of genes are other possible areas for improvement in the use of
gene-finding programs.
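The nested-gene situation described above (a gene lying entirely inside an intron of another gene) can be detected in an existing annotation with a short Python sketch. It assumes each gene is a list of inclusive `(start, end)` exons keyed by a gene name, and it ignores strand; the names and data layout are illustrative only.

```python
def introns(exons):
    """Introns implied by a gene's exons, as inclusive (start, end) pairs."""
    exons = sorted(exons)
    return [(left_end + 1, right_start - 1)
            for (_, left_end), (right_start, _) in zip(exons, exons[1:])]


def find_nested_genes(genes):
    """Return (inner, outer) name pairs where inner lies within an intron of outer.

    genes: dict mapping gene name to a list of (start, end) exons.
    """
    nested = []
    for inner, inner_exons in genes.items():
        start = min(s for s, _ in inner_exons)
        end = max(e for _, e in inner_exons)
        for outer, outer_exons in genes.items():
            if outer == inner:
                continue
            if any(i_start <= start and end <= i_end
                   for i_start, i_end in introns(outer_exons)):
                nested.append((inner, outer))
    return nested


toy_annotation = {
    "A": [(100, 200), (5000, 5100)],   # long first intron
    "B": [(1000, 1500)],               # single-exon gene inside A's intron
    "C": [(6000, 6100), (6300, 6400)],
}
print(find_nested_genes(toy_annotation))
```

A check of this kind only flags nested genes in an annotation that already contains them; predicting such genes ab initio is the harder problem the text identifies.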
|
|
In conclusion, we note that even programs based on similar approaches
often produce significantly different results. For example, Fgenesh predicts 5839 exons on human Chromosome 22 (>88% of 3488 manually annotated exons having some EST or protein
similarity are among these predictions), whereas GENSCAN
predicts 6100 exons. Fgenesh predictions are presented in
Infogene database format (Solovyev and Salamov 1999) at
http://genomic.sanger.ac.uk. Of these exons, ~80% are the same or
similar when comparing Fgenesh and GENSCAN
predictions, and further experiments are necessary for verification.
A region that has been analyzed experimentally (but with low gene
density and unusually difficult for ab initio predictions) provides a
good test of the programs' accuracy and demonstrates their differences.
The results of Fgenesh, Fgenes, and
GENSCAN gene predictions on the BRCA2 region
are presented in Table 4. We can see that the
repeat-masked sequence results in fewer false-positive predictions,
especially for the GENSCAN program. Exons predicted by
different methods might represent alternative splicing variants.
|
| |
ACKNOWLEDGMENTS |
|---|
We thank Dr. Igor Seledtsov for collaborative work on Infogene visualization. Development of gene prediction approaches was supported by a Wellcome Trust research grant (to V.S.).
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
| |
FOOTNOTES |
|---|
1 Corresponding author.
E-MAIL solovyev{at}sanger.ac.uk; FAX 44-1-2223-494919.
| |
REFERENCES |
|---|
|
|
|---|
Received February 9, 2000; accepted in revised form February 29, 2000.