Genome Research

Published online before print September 15, 2003, 10.1101/gr.1350803
Genome Res. 13:2306-2315, 2003
©2003 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/03 $5.00
Methods

Annotating Large Genomes With Exact Word Matches

John Healy1,3, Elizabeth E. Thomas1, Jacob T. Schwartz2 and Michael Wigler1

1 Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
2 Courant Institute of Mathematical Sciences, New York University, New York, New York 10003, USA

We have developed a tool for rapidly determining the number of exact matches of any word within large, internally repetitive genomes or sets of genomes. Thus, we can readily annotate any sequence, including the entire human genome, with the counts of its constituent words. We create a Burrows-Wheeler transform of the genome, which, together with auxiliary data structures that facilitate counting, can reside in about one gigabyte of RAM. Our original interest was motivated by oligonucleotide probe design, and we describe a general protocol for defining unique hybridization probes. But our method also has applications for the analysis of genome structure and assembly. We demonstrate the identification of chromosome-specific repeats, and outline a general procedure for finding undiscovered repeats. We also illustrate the changing contents of the human genome assemblies by comparing the annotations built from different genome freezes.
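
As an illustration of the counting idea summarized above, the following is a minimal sketch of exact word counting over a Burrows-Wheeler transform using the standard backward-search technique. It is not the authors' implementation: the function names (`bwt_index`, `count_word`, `annotate`), the `$` sentinel, and the naive suffix sort and uncompressed occurrence tables are all assumptions chosen for readability. A genome-scale tool would need far more compact structures, and strand handling is not addressed here.

```python
from collections import Counter


def bwt_index(text):
    """Build a toy BWT index for `text` (hypothetical helper, demo only).

    Assumes the sentinel '$' does not occur in `text`.
    """
    s = text + "$"
    # Naive suffix array: acceptable for a short demo string,
    # far too slow and memory-hungry for a real genome.
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    # BWT: the character preceding each sorted suffix (wraps to '$').
    bwt = "".join(s[i - 1] for i in sa)
    # first[c]: number of characters in s that sort strictly before c.
    counts = Counter(s)
    first, total = {}, 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return first, occ, len(s)


def count_word(index, word):
    """Count exact occurrences of `word` by backward search."""
    first, occ, n = index
    lo, hi = 0, n  # half-open range of sorted suffixes matching so far
    for c in reversed(word):
        if c not in first:
            return 0
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


def annotate(index, seq, k):
    """Annotate `seq` with the count of each constituent k-letter word."""
    return [count_word(index, seq[i:i + k]) for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    index = bwt_index("ACGTACGTTACG")
    print(count_word(index, "ACG"))
    print(annotate(index, "ACGTT", 3))
```

In this sketch each query costs time proportional to the word length and is independent of the text length, which is the property that makes annotating every word of a long sequence practical. A word whose count is 1 is unique in the indexed text, which is the kind of information a probe-design filter would look for.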


3 Corresponding author.
E-MAIL healy{at}cshl.edu; FAX (516) 367-8381.

Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.1350803. Article published online before print in September 2003.

