Genome Research


Vol. 12, Issue 1, 177-189, January 2002

METHODS
ARACHNE: A Whole-Genome Shotgun Assembler

Serafim Batzoglou,1,2,3 David B. Jaffe,2,3,4 Ken Stanley,2 Jonathan Butler,2 Sante Gnerre,2 Evan Mauceli,2 Bonnie Berger,1,5 Jill P. Mesirov,2 and Eric S. Lander2,6,7

1 Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA; 2 Whitehead Institute/MIT Center for Genome Research, Cambridge, Massachusetts 02141, USA; 4 Department of Mathematics and Statistics, University of Nebraska, Lincoln, Nebraska 68588, USA; 5 Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA; 6 Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

We describe a new computer system, called ARACHNE, for assembling genome sequence using paired-end whole-genome shotgun reads. ARACHNE has several key features, including an efficient and sensitive procedure for finding read overlaps, a procedure for scoring overlaps that achieves high accuracy by correcting errors before assembly, read merger based on forward-reverse links, and detection of repeat contigs by forward-reverse link inconsistency. To test ARACHNE, we created simulated reads providing ~10-fold coverage of the genomes of H. influenzae, S. cerevisiae, and D. melanogaster, as well as human chromosomes 21 and 22. The assemblies of these simulated reads yielded nearly complete coverage of the respective genomes, with a small number of contigs joined into a smaller number of supercontigs (or scaffolds). For example, analysis of the D. melanogaster genome yielded ~98% coverage with an N50 contig length of 324 kb and an N50 supercontig length of 5143 kb. The assembly accuracy was high, although not perfect: small errors occurred at a frequency of roughly 1 per 1 Mb (typically, deletion of ~1 kb in size), with a very small number of other misassemblies. The assembly was rapid: the Drosophila assembly required only 21 hours on a single 667 MHz processor and used 8.4 Gb of memory.
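The N50 figures quoted above can be read with the following minimal sketch. It assumes the common definition of N50: the largest length L such that contigs (or supercontigs) of length at least L together contain at least half of all assembled bases. The function name `n50` and the toy lengths are illustrative only and are not taken from ARACHNE itself.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or supercontig) lengths.

    Assumed definition: the largest length L such that sequences of
    length >= L account for at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases until
    # at least half of the total has been covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example with hypothetical contig lengths in kb (20 kb in total):
# the two longest contigs (6 + 5 = 11 kb) already cover more than half.
print(n50([2, 3, 4, 5, 6]))  # 5
```

Under this definition, an N50 contig length of 324 kb means that at least half of the assembled bases lie in contigs of 324 kb or longer.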


3 These authors contributed equally to this work.

7 Corresponding author.


12:177-189 ©2002 by Cold Spring Harbor Laboratory Press  ISSN 1088-9051/02 $5.00




