Genome Res. 17:1547-1549, 2007 ©2007 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/07 $5.00 OPEN ACCESS ARTICLE
Commentary
2x genomes—Does depth matter?
Department of Genome Sciences and Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
This issue of Genome Research marks the publication by collaborators at the NIH, Agencourt Bioscience, and the Broad Institute of a genome sequence for the domestic cat Felis catus (Pontius et al. 2007).
The tradeoff between breadth and depth has actually been a recurring theme in genome sequencing, because the choice between spreading available data-generating capacity broadly over a larger extent of targeted DNA and obtaining deeper coverage of a more limited target arises in several contexts. The appropriate balance in each case is far from obvious and depends on a number of factors, including the relative difficulty of obtaining DNA sources of various types, available data analysis tools, and the implications of sequence errors and gaps for downstream utility. Early in the Human Genome Project, the high level of redundancy required by the shotgun method was often viewed as unduly wasteful, and considerable effort was expended toward developing more efficient directed strategies. These all foundered, due to a combination of logistical complexity, the problems posed by interspersed repeats, and the recognition that variation in data quality makes it impossible to get highly accurate sequence without considerable redundancy. Once the superiority of the shotgun approach was accepted, a key issue became the breadth of the targeted region: is it better to shotgun a whole genome, or a series of individual BAC clones (Green 1997
With completion of the human genome sequence, the breadth versus depth issue has now shifted to how best to apply available sequencing capacity across organisms. Broad phylogenetic representation accelerates research on a wide variety of species and provides general insights into the core evolutionary processes of mutation and selection. However, from the perspective of the human genome, its main benefit is to help identify functional features as regions with a reduced frequency of substitution differences between organisms due to purifying selection. The statistical power to detect such regions depends on overall sequence divergence: there is an analogy to shotgun-sequence assembly, in that one seeks "coverage" of human bases by multiply aligned orthologous sequences instead of reads, and the relevant "depth" is total branch length (expected number of mutations per neutral site) of the phylogenetic tree relating the species. Theoretical analysis (Eddy 2005
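The link between total branch length and detection power can be illustrated with a toy calculation. The sketch below is not the model of Eddy (2005). It assumes, purely for illustration, that substitutions at a neutral site accumulate as a Poisson process with mean equal to the total branch length T, that sites are independent, and that every species is aligned at every site. The function names are hypothetical. Under those assumptions a window of w neutral sites is unchanged across all species with probability e^(-wT), so a larger T lets shorter conserved elements stand out from chance.

```python
import math


def neutral_window_unchanged_prob(total_branch_length: float, window: int) -> float:
    """Chance that `window` neutral sites show no substitution anywhere on the tree.

    Assumes independent sites, each with a Poisson-distributed number of
    substitutions whose mean is the total branch length (an illustrative
    simplification, not a published model).
    """
    if total_branch_length < 0 or window < 0:
        raise ValueError("branch length and window must be non-negative")
    return math.exp(-total_branch_length * window)


def min_window_for_false_positive_rate(total_branch_length: float, alpha: float) -> int:
    """Smallest window whose chance of being unchanged under neutrality is <= alpha."""
    if total_branch_length <= 0:
        raise ValueError("total branch length must be positive")
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # exp(-w * T) <= alpha  is equivalent to  w >= ln(1 / alpha) / T
    return math.ceil(math.log(1 / alpha) / total_branch_length)


if __name__ == "__main__":
    # Hypothetical total branch lengths, in substitutions per neutral site.
    for t in (0.5, 1.0, 4.0):
        w = min_window_for_false_positive_rate(t, alpha=0.01)
        print(f"T = {t}: unchanged windows of >= {w} sites pass at alpha = 0.01")
```

In a low-redundancy assembly not every species contributes sequence at every site, so the effective T at a given human base would be lower than the nominal tree length. The sketch does not model that loss.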
It is worth noting that effective delineation of conserved elements via this strategy is not yet a completely solved problem. Multiple-genome alignments are error prone even with relatively complete sequences (Prakash and Tompa 2007).
What are the consequences of reduced depth for individual genomes? The primary determinants of sequence utility are assembly accuracy (correctness of read overlaps and of contig order and orientation) and percent coverage of the genome. For low-redundancy (
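As a rough guide to how percent coverage scales with redundancy, the idealized random-shotgun model (in the spirit of Lander and Waterman 1988) treats read starts as uniformly random. The number of reads covering a given base is then approximately Poisson with mean equal to the depth c, and the expected covered fraction is 1 - e^(-c). The sketch below assumes that idealization. It ignores cloning bias, repeats, and the minimum overlap needed to join reads, so real assemblies would be expected to fall short of these figures. The function name is hypothetical.

```python
import math


def expected_fraction_covered(depth: float) -> float:
    """Expected fraction of bases hit by at least one read at redundancy `depth`.

    Idealized model: reads land uniformly at random, so per-base read counts
    are approximately Poisson with mean `depth`.
    """
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return 1.0 - math.exp(-depth)


if __name__ == "__main__":
    for depth in (1, 2, 4, 7):
        fraction = expected_fraction_covered(depth)
        print(f"{depth}x: {fraction:.1%} of bases expected to be covered")
```

At 2x the model gives roughly 86%, leaving about one base in seven unsequenced even before any assembly losses.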
For both a simulated unassisted 2x mouse genome assembly (Margulies et al. 2005
The analyses by Pontius et al. (2007)
On the other hand, the low percent coverage significantly limits many applications of the sequence. Inferences about lineage-specific losses of genes or other functional features are not possible, and it is hard to distinguish genes from pseudogenes. More seriously, since very few features of any appreciable size (e.g., genes) will be completely covered, analyses requiring complete features cannot be carried out. In addition, as noted above, whole-genome assemblies (of any depth) often fail to incorporate a significant fraction of the repetitive sequence in the genome. This is often considered to be a relatively minor deficiency, which may be true so long as the primary research focus is on broadly shared biological features. However, it is now apparent that repetitive sequence is a key agent of evolutionary change: Segmental duplications are likely the primary source of new genes (Ohno 1970
What are the prospects for correcting these deficiencies? Fortunately, 11 of the 24 2x genomes (including cat) are already slated for deeper sequencing (Table 1). For the others, technical improvements in assembly, including better discrimination between different repeat copies and more aggressive assisted assembly strategies, should help somewhat. One hope is that most gaps could be closed with large numbers of cheap short reads generated using newer technologies (Bentley 2006
I thank E. Eichler, E. Green, R. Waterston, A. Felsenfeld, and two anonymous reviewers for suggestions.
Corresponding author. E-mail phg@u.washington.edu; fax (206) 685-9720.
Article is online at http://www.genome.org/cgi/doi/10.1101/gr.7050807
Adams, M.D., Celniker, S.E., Holt, R.A., Evans, C.H., Gocayne, J.D., Amanatides, P.G., Sherer, S.E., Li, P.W., Hoskins, R.A., and Galle, R.F. 2000. The genome sequence of Drosophila melanogaster. Science 287: 2185–2195.
Bentley, D.R. 2006. Whole-genome re-sequencing. Curr. Opin. Genet. Dev. 16: 545–552.
Eddy, S.R. 2005. A model of the statistical power of comparative genome sequence analysis. PLoS Biol. 3: e10. doi: 10.1371/journal.pbio.0030010.
The ENCODE Project Consortium. 2007. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447: 799–816.
Goldberg, S.M., Johnson, J., Busam, D., Feldblyum, T., Ferriera, S., Friedman, R., Halpern, A., Khouri, H., Kravitz, S.A., and Lauro, F.M. 2006. A Sanger/pyrosequencing hybrid approach for the generation of high-quality draft assemblies of marine microbial genomes. Proc. Natl. Acad. Sci. 103: 11240–11245.
Green, P. 1997. Against a whole-genome shotgun. Genome Res. 7: 410–417.
Kamal, M., Xie, X., and Lander, E.S. 2006. A large family of ancient repeat elements in the human genome is under strong selection. Proc. Natl. Acad. Sci. 103: 2740–2745.
Kirkness, E.F., Bafna, V., Halpern, A.L., Levy, S., Remington, K., Rusch, D.B., Delcher, A.L., Pop, M., Wang, W., Fraser, C.M., et al. 2003. The dog genome: Survey sequencing and comparative analysis. Science 301: 1898–1903.
Lander, E.S. and Waterman, M.S. 1988. Genomic mapping by fingerprinting random clones: A mathematical analysis. Genomics 2: 231–239.
Lowe, C.B., Bejerano, G., and Haussler, D. 2007. Thousands of human mobile element fragments undergo strong purifying selection near developmental genes. Proc. Natl. Acad. Sci. 104: 8005–8010.
Margulies, E.H., Vinson, J.P., NISC Comparative Sequencing Program, Miller, W., Jaffe, D.B., Lindblad-Toh, K., Chang, J.L., Green, E.D., Lander, E.S., Mullikin, J.C., et al. 2005. An initial strategy for the systematic identification of functional elements in the human genome by low-redundancy comparative sequencing. Proc. Natl. Acad. Sci. 102: 4795–4800.
Mikkelsen, T.S., Wakefield, M.J., Akin, B., Amemeya, C.T., Chang, J.L., Duke, S., Garber, M., Gentles, A.J., Goodstadt, L., Heger, A., et al. 2007. Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences. Nature 447: 167–177.
Nishihara, H., Smit, A.F., and Okada, N. 2006. Functional noncoding sequences derived from SINEs in the mammalian genome. Genome Res. 16: 864–874.
Ohno, S. 1970. Evolution by gene duplication. Springer-Verlag, New York.
Pontius, J.U., Mullikin, J.C., Smith, D., Agencourt Sequencing Team, Lindblad-Toh, K., Gnerre, S., Clamp, M., Chang, J., Stephens, R., Neelam, B., et al. 2007. Initial sequence and comparative analysis of the cat genome. Genome Res. this issue doi: 10.1101/gr.6380007.
Prakash, A. and Tompa, M. 2007. Measuring the accuracy of genome-size multiple alignments. Genome Biol. 8: R124. doi: 10.1186/gb-2007-8-6-r124.
Sharp, A.J., Cheng, Z., and Eichler, E.E. 2006. Structural variation of the human genome. Annu. Rev. Genomics Hum. Genet. 7: 407–442.
She, X., Jiang, Z., Clark, R.A., Liu, G., Cheng, Z., Tuzun, E., Church, D.M., Sutton, G., Halpern, A.L., and Eichler, E.E. 2004. Shotgun sequence assembly and recent segmental duplications within the human genome. Nature 431: 927–930.
Thomas, J.W., Touchman, J.W., Blakesley, R.W., Boufford, G.G., Beckstrom-Sternberg, S.M., Margulies, E.H., Blanchette, M., Siepel, A.C., Thomas, P.J., McDowell, J.C., et al. 2003. Comparative analyses of multi-species sequences from targeted genomic regions. Nature 424: 788–793.
Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W., Mural, R.J., Sutton, G.G., Smith, H.O., Yandells, M., Evans, C.A., Holt, R.A., et al. 2001. The sequence of the human genome. Science 291: 1304–1391.
Weber, J.L. and Myers, E.W. 1997. Human whole-genome shotgun sequencing. Genome Res. 7: 401–409.