Genome Research



     


Published online before print May 12, 2004, 10.1101/gr.1917404
Genome Res. 14:1147-1159, 2004
©2004 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/04 $5.00

Methods

Using the miraEST Assembler for Reliable and Automated mRNA Transcript Assembly and SNP Detection in Sequenced ESTs

Bastien Chevreux1,7, Thomas Pfisterer2, Bernd Drescher3, Albert J. Driesel4, Werner E.G. Müller5, Thomas Wetter6 and Sándor Suhai1

1 Department of Molecular Biophysics, German Cancer Research Centre Heidelberg, 69120 Heidelberg, Germany
2 MWG Biotech AG, 85560 Ebersberg, Germany
3 RZPD German Resource Center for Genome Research, 14059 Berlin, Germany
4 VitiGen AG, 76833 Siebeldingen, Germany
5 Abteilung Angewandte Molekularbiologie, Institut für Physiologische Chemie, Universität Mainz, 55099 Mainz, Germany
6 Institute for Medical Biometry and Informatics, University of Heidelberg, 69120 Heidelberg, Germany

We present an EST sequence assembler that specializes in reconstruction of pristine mRNA transcripts, while at the same time detecting and classifying single nucleotide polymorphisms (SNPs) occurring in different variations thereof. The assembler uses iterative multipass strategies centered on high-confidence regions within sequences and has a fallback strategy for using low-confidence regions when needed. It features special functions to assemble high numbers of highly similar sequences without prior masking, an automatic editor that edits and analyzes alignments by inspecting the underlying traces, and detection and classification of sequence properties like SNPs with a high specificity and a sensitivity down to one mutation per sequence. In addition, it includes the ability to use incorrectly preprocessed sequences, routines that make use of additional sequencing information such as base-error probabilities, template insert sizes, and strain information, and functions to detect and resolve possible misassemblies. The assembler is routinely used for tasks as varied as mutation detection in different cell types, similarity analysis of transcripts between organisms, and pristine assembly of sequences from various sources for oligo design in clinical microarray experiments.


Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.1917404. Article published online before print in May 2004.

7 Corresponding author.
E-MAIL bastien{at}chevreux.org; FAX +49 6227 422333.

8 Contig as a short form of contiguous sequence, a term first coined for assembly of genomic data.

9 For example, could the base A at position 235 in read 1 be replaced by a G, given that the overall consensus of the other reads at this position suggests this possibility?

10 For example, quality clipping, sequencing vector, and cosmid vector removal can be controlled by the PREGAP4 environment provided with the GAP4 package (Bonfield et al. 1995; Bonfield and Staden 1996; Staden 1996) or the LUCY program from Chou and Holmes (2001); parts of these tasks can also be done with cross-match provided by the PHRAP package or with other packages such as PFP from Paracel (Paracel 2002a).

11 Of course, a single read cannot itself be called a contig. However, putting it into the same data structure (a contig object) as the other, assembled reads is a convenient way to keep unassembled reads in the internal assembly database.

12 Based mainly on redundancy information in suspicious sequence stretches, using base-call error probabilities and signal analysis capabilities of the automatic editor very sparsely.
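
The kind of hypothesis described in footnote 9 can be illustrated with a minimal sketch. It assumes one alignment column is available as a list of base calls, one per read, and proposes a substitution wherever a read disagrees with a well-supported majority of the other reads. The function name `substitution_hypotheses` and the `min_support` threshold are hypothetical, and the majority vote is a stand-in only: the abstract states that the automatic editor inspects the underlying traces, which this sketch does not attempt.

```python
from collections import Counter


def substitution_hypotheses(column, min_support=0.75):
    """Propose base substitutions for one alignment column.

    column      -- list of base calls, one per read (illustrative input format)
    min_support -- assumed fraction of the *other* reads that must agree
                   before a substitution is proposed

    Returns a list of (read_index, observed_base, proposed_base) tuples.
    """
    hypotheses = []
    for i, observed in enumerate(column):
        # Consensus is computed from all reads except the one being examined.
        others = column[:i] + column[i + 1:]
        if not others:
            continue
        proposed, count = Counter(others).most_common(1)[0]
        if proposed != observed and count / len(others) >= min_support:
            hypotheses.append((i, observed, proposed))
    return hypotheses


if __name__ == "__main__":
    # Read 0 calls A where the four other reads call G.
    print(substitution_hypotheses(["A", "G", "G", "G", "G"]))
```

A hypothesis produced this way is only a candidate. Whether it is accepted would depend on further evidence, such as the signal analysis and base-call error probabilities mentioned in the notes above.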







