Genome Res. 13:2164-2170, 2003
©2003 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/03 $5.00
Methods
PCAP: A Whole-Genome Assembly Program
Xiaoqiu Huang1,4, Jianmin Wang1, Srinivas Aluru2, Shiaw-Pyng Yang3 and
LaDeana Hillier3
1 Department of Computer Science, Iowa State University, Ames, Iowa
50011-1040, USA
2 Department of Electrical and Computer Engineering, Iowa State University,
Ames, Iowa 50011-1040, USA
3 Genome Sequencing Center, Washington University Medical School, St. Louis,
Missouri 63108, USA
We describe a whole-genome assembly program named PCAP for processing tens
of millions of reads. The PCAP program has several features to address
efficiency and accuracy issues in assembly. Multiple processors are used to
perform most time-consuming computations in assembly. A more sensitive method
is used to avoid missing overlaps caused by sequencing errors. Repetitive
regions of reads are detected on the basis of many overlaps with other reads,
instead of many shorter word matches with other reads. Contaminated end
regions of reads are identified and removed. Generation of a consensus
sequence for a contig is based on an alignment of reads in the contig, in
which both base quality values and coverage information are used to determine
every consensus base. The PCAP program was tested on a mouse whole-genome data
set of 30 million reads and a human Chromosome 20 data set of 1.7 million
reads. The program is freely available for academic use.
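The consensus step mentioned above, in which base quality values and coverage both contribute to each consensus base, can be illustrated with a minimal sketch. This is not PCAP's actual rule, which the abstract does not specify. It assumes a simple scheme in which each alignment column is a list of (base, quality) pairs, the base with the highest summed quality wins, and coverage (the number of reads supporting a base) breaks ties. The function names `consensus_base` and `consensus` are hypothetical.

```python
from collections import defaultdict


def consensus_base(column):
    """Pick one consensus symbol for an alignment column.

    `column` is a list of (base, quality) pairs, one per read covering
    this position; '-' denotes a gap. Assumed rule (illustrative only):
    highest summed quality wins, then highest read count.
    """
    quality_sum = defaultdict(int)
    read_count = defaultdict(int)
    for base, quality in column:
        quality_sum[base] += quality
        read_count[base] += 1
    # Rank by summed quality, then coverage; the base itself is the last
    # key only so that remaining ties resolve deterministically.
    return max(quality_sum, key=lambda b: (quality_sum[b], read_count[b], b))


def consensus(columns):
    """Build a consensus string, dropping columns where a gap wins."""
    called = (consensus_base(column) for column in columns)
    return "".join(base for base in called if base != "-")


if __name__ == "__main__":
    columns = [
        [("A", 30), ("A", 25), ("C", 40)],   # two moderate A reads outweigh one C
        [("G", 10), ("T", 10), ("T", 5)],    # T has the larger quality sum
        [("-", 20), ("-", 20), ("A", 30)],   # gap wins, column is dropped
    ]
    print(consensus(columns))
```

In this scheme a single high-quality read can be outvoted by several reads of moderate quality, which is one simple way coverage and quality can interact; a production assembler would likely use a more careful model.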
[The following individuals kindly provided reagents, samples, or unpublished
information as indicated in the paper: the Mouse Genome Sequencing Consortium
2002; J. Mullikin. The assembled mouse sequences are available at our Web
site, http://seq.cs.iastate.edu.]
Article and publication are at
http://www.genome.org/cgi/doi/10.1101/gr.1390403.
4 Corresponding author. E-MAIL xqhuang{at}cs.iastate.edu; FAX (515) 294-0258.
