Genome Research

Published online before print February 15, 2002, 10.1101/gr.207902.

Vol. 12, Issue 3, 424-429, March 2002

LETTER
Computational Comparison of Human Genomic Sequence Assemblies for a Region of Chromosome 4

Colin A.M. Semple,1,2 Stewart W. Morris, David J. Porteous, and Kathryn L. Evans

Medical Genetics Section, Department of Medical Sciences, The University of Edinburgh, Molecular Medicine Centre, Western General Hospital, Edinburgh EH4 2XU, United Kingdom

Much of the available human genomic sequence data exist in a fragmentary draft state following the completion of the initial high-volume sequencing performed by the International Human Genome Sequencing Consortium (IHGSC) and Celera Genomics (CG). We compared six draft genome assemblies over a region of chromosome 4p (D4S394-D4S403): two consecutive releases by the IHGSC at the University of California, Santa Cruz (UCSC); two consecutive releases from the National Center for Biotechnology Information (NCBI); the public release from CG; and a hybrid assembly that we produced using IHGSC and CG sequence data. This region presents particular problems for genomic sequence assembly algorithms because it contains a large tandem repeat and is sparsely covered by draft sequences. The six assemblies differed both in their relative coverage of sequence data from the region and in their estimated rates of misassembly. The CG assembly method attained the lowest level of misassembly, whereas the NCBI and UCSC assemblies had the highest levels of coverage. All assemblies examined included <60% of the publicly available sequence from the region. At least 6% of the sequence data within the CG assembly for the D4S394-D4S403 region was not present in publicly available sequence data. We also show that, even in a problematic region, existing software tools can be used with high-quality mapping data to produce genomic sequence contigs with a low rate of rearrangements.

[All sequence accessions for the genomic sequence assemblies analyzed and the data sets used to assess coverage and rates of misassembly are available from http://www.ed.ac.uk/~csemple.]
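The two quantities compared in the abstract, coverage and rate of misassembly, can be illustrated with a minimal Python sketch. This is not the authors' method: the function names, the accession labels (`ACC_A`, ...), the marker labels (`M1`, ...), and both definitions are assumptions chosen for illustration. Coverage is taken here as the fraction of known bases in the region whose source accession appears in an assembly, and misassembly is approximated by the fraction of adjacent marker pairs whose order disagrees with an independent map.

```python
from typing import Dict, List, Set


def coverage_fraction(assembly_accessions: Set[str],
                      region_lengths: Dict[str, int]) -> float:
    """Fraction of the region's known bases whose accession is in the assembly.

    region_lengths maps each (hypothetical) accession in the region to its
    length in bases; assembly_accessions is the set an assembly includes.
    """
    total = sum(region_lengths.values())
    if total == 0:
        return 0.0
    covered = sum(length for acc, length in region_lengths.items()
                  if acc in assembly_accessions)
    return covered / total


def order_discordance(assembly_order: List[str], map_order: List[str]) -> float:
    """Fraction of adjacent marker pairs that disagree with the map order.

    The assembly may be in either orientation relative to the map, so the
    smaller of the forward and reverse disagreement counts is used.
    Markers absent from the map are ignored.
    """
    rank = {marker: i for i, marker in enumerate(map_order)}
    ranks = [rank[m] for m in assembly_order if m in rank]
    if len(ranks) < 2:
        return 0.0
    steps = [b - a for a, b in zip(ranks, ranks[1:])]
    forward_breaks = sum(1 for s in steps if s < 0)
    reverse_breaks = sum(1 for s in steps if s > 0)
    return min(forward_breaks, reverse_breaks) / len(steps)


if __name__ == "__main__":
    # Hypothetical data: three accessions in the region, two in the assembly.
    lengths = {"ACC_A": 100, "ACC_B": 200, "ACC_C": 100}
    print(coverage_fraction({"ACC_A", "ACC_C"}, lengths))
    # Hypothetical markers: M3 and M2 are swapped relative to the map.
    print(order_discordance(["M1", "M3", "M2", "M4"], ["M1", "M2", "M3", "M4"]))
```

Counting only adjacent-pair disagreements keeps the sketch short; it will understate large rearrangements, which a real comparison would need to handle with alignment and mapping data.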


1 Present address: Bioinformatics, MRC Human Genetics Unit, Edinburgh EH4 2XU, UK.

2 Corresponding author.


12:424-429 ©2002 by Cold Spring Harbor Laboratory Press  ISSN 1088-9051/02 $5.00



This article has been cited by other articles:


E. C. Rouchka, W. Gish, and D. J. States
Comparison of whole genome assemblies of the human genome
Nucleic Acids Res., November 15, 2002; 30(22): 5004 - 5014.


I. Dunham
Human genome sequences: enigmatic variations
Mutagenesis, November 1, 2002; 17(6): 457 - 461.


N. Katsanis, K. C. Worley, G. Gonzalez, S. J. Ansley, and J. R. Lupski
A computational/functional genomics approach for the enrichment of the retinal transcriptome and the identification of positional candidate retinopathy genes
PNAS, October 29, 2002; 99(22): 14326 - 14331.



