Genome Res. 14:1603-1609, 2004 ©2004 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/04 $5.00
Methods
Fosmid-Based Physical Mapping of the Histoplasma capsulatum Genome
1 Washington University School of Medicine, Genome Sequencing Center, St. Louis, Missouri 63108, USA
2 Baylor College of Medicine, Department of Molecular and Human Genetics, Houston, Texas 77030, USA
3 Washington University School of Medicine, Department of Molecular Microbiology, St. Louis, Missouri 63110, USA
4 Washington University School of Medicine, Department of Molecular Biology and Pharmacology, St. Louis, Missouri 63110, USA
A fosmid library representing 10-fold coverage of the Histoplasma capsulatum G217B genome was used to construct a restriction-based physical map. Three restriction endonuclease fingerprints were generated from each clone using BamHI, HindIII, and PstI; the resulting data were combined and used in FPC for automatic and manual contig assembly builds. Concomitantly, whole-genome shotgun (WGS) paired-end reads from plasmids and fosmids were assembled with PCAP, yielding a predicted genome size of up to 43.5 Mbp and 17% repetitive DNA. Fosmid paired-end sequences in the WGS assembly provide anchoring information for the physical map and allow existing physical map contigs to be joined into 84 clusters containing 9551 fosmid clones. Here, we detail comprehensive fosmid-based mapping of the Histoplasma capsulatum genome, which provides an efficient paradigm for de novo sequencing by a map-assisted whole-genome shotgun approach.
The dimorphic fungus Histoplasma capsulatum is the causative agent of histoplasmosis. In the United States, the Ohio and Mississippi river valleys are endemic regions for histoplasmosis; however, Histoplasma is not geographically confined to the U.S. and can be isolated worldwide (Rippon 1982).
Genetic elements required for morphogenesis and pathogenesis are poorly defined. CBP1, a yeast phase-specific gene that encodes a secreted calcium-binding protein essential for intracellular parasitism and pulmonary colonization, is the only genetic element defined for virulence (Kügler et al. 2000).
Our sequencing strategy requires a physical map of this relatively small eukaryotic genome. We opted to construct a fosmid library in lieu of the commonly used bacterial artificial chromosome (BAC) library-based approaches (Marra et al. 1997).
Characterization of the H. capsulatum G217B Fosmid Library
Fosmid clones of the H. capsulatum G217B library were initially sized by restriction digestion with HindIII. The average insert size was 35.9 kbp (±4.3 kbp), ranging from 22.0 to 49.9 kbp (n = 130; data not shown). A total of 9551 fosmid clones were analyzed for the physical map; based on the published genome size of H. capsulatum isolate G186A-S (Carr and Shearer Jr. 1998) […] 10-fold genome representation in the map.
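Fold representation is simply the number of cloned bases divided by the genome size. The sketch below applies that ratio to the clone count and mean insert size reported above, and adds the standard Lander-Waterman (Poisson) estimate of the fraction of the genome left uncovered, a textbook idealization that assumes clones land uniformly at random. Which genome size underlies the "10-fold" figure cannot be recovered from this text, so the loop simply evaluates the two sizes quoted elsewhere in the paper; the function names are hypothetical.

```python
import math

def fold_coverage(n_clones: int, mean_insert_bp: float, genome_bp: float) -> float:
    """Fold representation = total cloned bases / genome size."""
    return n_clones * mean_insert_bp / genome_bp

def expected_uncovered_fraction(coverage: float) -> float:
    """Poisson (Lander-Waterman) estimate of the fraction of the genome
    missed by every clone, assuming clones land uniformly at random."""
    return math.exp(-coverage)

N_CLONES = 9551        # fosmids analyzed for the map (from the text)
MEAN_INSERT = 35_900   # mean insert size in bp (35.9 kbp, from the text)

# Genome sizes quoted elsewhere in the paper; the size used for the
# "10-fold" figure is not recoverable from the extracted text.
for label, genome_bp in (("G186A-S predicted", 24.0e6), ("G217B WGS upper", 43.5e6)):
    c = fold_coverage(N_CLONES, MEAN_INSERT, genome_bp)
    print(f"{label}: {c:.1f}x, uncovered ~ {expected_uncovered_fraction(c):.1e}")
```

Under these assumptions the same clone set corresponds to roughly 14-fold coverage of a 24 Mb genome and roughly 8-fold coverage of a 43.5 Mbp genome.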
Constructing the H. capsulatum G217B Physical Map
Assessment of the H. capsulatum G217B Physical Map
Starting with only clones for which all three digests were available, an automated assembly with a cutoff of 3e-10 and a band tolerance of 7 produced an initial 852 contigs. We followed this initial build with the incremental addition, at a cutoff of 3e-06, of clones having only two usable restriction digests in their derived fingerprint, resulting in 1096 contigs. The manual pathfinding process described in Methods ultimately reduced this number to 113 contigs. Joining map contigs using the linkage information of paired fosmid-end sequences present in the WGS assembly contigs produces map clusters. Thus, by using the WGS assembly contigs to inform the map, make joins, and confirm fingerprint contigs, we obtained a final physical map of 84 contiguous clusters of ordered fosmid clones. Of the 9551 clones used in generating the clusters, the number per cluster ranged from 3 to 575; 305 clones were buried and 708 were singletons (Table 1).
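The cutoffs above (3e-10, then 3e-06) are thresholds on FPC's Sulston score, the probability that two unrelated clones share at least the observed number of bands by coincidence; smaller scores mean stronger evidence of overlap. The sketch below implements the binomial form of that score as it is commonly described for FPC. It is not FPC's own code, which may differ in detail, and `GEL_LENGTH` and the band counts are hypothetical values chosen for illustration.

```python
from math import comb

def sulston_score(n_low: int, n_high: int, matches: int,
                  tolerance: int, gel_length: int) -> float:
    """Probability that two unrelated clones share at least `matches` bands
    by coincidence (Sulston-style binomial tail).

    n_low      -- band count of the clone with fewer bands
    n_high     -- band count of the clone with more bands
    tolerance  -- band tolerance in mobility units (7 in this study)
    gel_length -- number of distinct mobility values (assumed here)
    """
    b = 2 * tolerance / gel_length   # chance one band falls within tolerance of another
    p_miss = (1 - b) ** n_high       # chance a band matches none of the n_high bands
    p_hit = 1 - p_miss
    # Sum the binomial probabilities of `matches` or more chance matches.
    return sum(comb(n_low, j) * p_hit ** j * p_miss ** (n_low - j)
               for j in range(matches, n_low + 1))

CUTOFF = 3e-10      # initial automated-build cutoff from the text
TOLERANCE = 7       # band tolerance from the text
GEL_LENGTH = 3300   # hypothetical mobility range, for illustration only

score = sulston_score(n_low=40, n_high=45, matches=30,
                      tolerance=TOLERANCE, gel_length=GEL_LENGTH)
print(f"score={score:.2e}, overlap accepted: {score <= CUTOFF}")
```

Relaxing the cutoff to 3e-06 for two-digest clones compensates for their fewer bands, which would otherwise struggle to reach a score as small as 3e-10.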
Statistics of the H. capsulatum G217B WGS Sequence Assembly
Over 1.3 million end-sequence reads derived from both plasmids (2–4 kbp insert size fraction) and fosmids were assembled with the PCAP program (Huang et al. 2003). Of these reads, 75% were included in the assembly, whereas the majority of reads excluded from the assembly contained homopolymeric runs and heteropolymeric runs composed of di- and trinucleotide repeats. The excluded reads were assembled using phrap (P. Green, unpubl.) to generate an excluded-reads assembly, and this assembly and the PCAP assembly were then merged, also using phrap. After merging, the current WGS assembly contains 1243 contigs >2 kb (average size 35 kb) and 687 contigs >8 kb (average size 60 kb), representing total calculated genome sizes of 43.5 and 41.4 Mbp, respectively. The composite WGS assembly is currently being finished.
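The "calculated genome size" at each contig-length cutoff is the summed length of the contigs above that cutoff, which is why raising the cutoff from 2 kb to 8 kb lowers the estimate. A minimal sketch of that bookkeeping, using hypothetical contig lengths and a hypothetical function name:

```python
from statistics import mean

def size_summary(contig_lengths, min_length):
    """Return (count, mean length, total length) for contigs longer than
    min_length; the total is the 'calculated genome size' at that cutoff."""
    kept = [n for n in contig_lengths if n > min_length]
    if not kept:
        return 0, 0.0, 0
    return len(kept), mean(kept), sum(kept)

# Hypothetical contig lengths in bp, for illustration only.
lengths = [1_500, 3_000, 10_000, 50_000]
for cutoff in (2_000, 8_000):
    count, avg, total = size_summary(lengths, cutoff)
    print(f">{cutoff} bp: {count} contigs, mean {avg:.0f} bp, total {total} bp")
```

The reported figures are consistent with this arithmetic: 1243 × 35 kb ≈ 43.5 Mbp and 687 × 60 kb ≈ 41.2 Mbp, the small difference from 41.4 Mbp reflecting the rounded average sizes.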
The goal of this study was to create a fosmid-based physical map of the H. capsulatum G217B genome. We used an iterative process, based on fosmid end-sequence information, to anchor and orient physical map clusters to the WGS assembly contigs. In this project, the fosmid-based physical map supplies a genomic scaffold that both guides genome finishing and helps resolve the difference between the predicted G186A-S genome size (24 Mb) and the assembled G217B genome size. This discrepancy conflicts with preliminary flow cytometry data for H. capsulatum isolates G186A-B and G217B, which indicate that they have the same DNA mass per cell as G186A-S (Carr and Shearer Jr. 1998). To explain the difference between the predicted and assembled genome sizes, we are anchoring physical map clusters to WGS assembly contigs. These anchors provide long-range linking information for joining physical map clusters and WGS assembly contigs. Figure 2 shows map cluster 2501 from MapLink (J. Xu and J.I. Gordon, in prep.) anchored to a Consed assembly view of WGS assembly Supercontig Merge 57. After the orientation of Supercontig Merge 57 is flipped, the two data sets show good concordance, with two sequence gaps between contigs 2622 and 2219. These gaps are spanned by fosmid clones (dark lines) and can be closed by a variety of means, including producing and sequencing shotgun libraries of the gap-spanning fosmids. Additionally, fosmid clone L_AAZ036H02 resides in contig 26 of Supercontig Merge 57 and anchors map cluster 2501 to Supercontig Merge 20 (dashed line). Using fosmid end sequences as long-range linking information, WGS assembly contigs anchored to physical map clusters allow us to evaluate and address potential joins between supercontigs (e.g., joining Merge 20 to Merge 57).
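Flipping Supercontig Merge 57 to match map cluster 2501 is a decision about relative orientation: if fosmids that appear in increasing order along the map appear in decreasing order along the supercontig, the supercontig is reversed relative to the map. How MapLink makes this call is not described here; the sketch below swaps in a simple pairwise-concordance majority vote (in the spirit of Kendall's tau). The anchor coordinates and function names are hypothetical.

```python
from itertools import combinations

def needs_flip(anchors):
    """anchors: (map_coord, seq_coord) pairs for fosmids placed both in a
    map cluster and on a supercontig. Returns True when most anchor pairs
    run in opposite directions, i.e. the supercontig should be reversed."""
    concordant = discordant = 0
    for (m1, s1), (m2, s2) in combinations(anchors, 2):
        sign = (m2 - m1) * (s2 - s1)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return discordant > concordant

def flip(anchors, supercontig_length):
    """Re-express sequence coordinates on the reverse-complemented supercontig."""
    return [(m, supercontig_length - s) for m, s in anchors]

# Hypothetical anchors: map order increases while sequence position decreases.
anchors = [(10, 900), (20, 700), (35, 400)]
print(needs_flip(anchors))               # True
print(needs_flip(flip(anchors, 1000)))   # False
```

Counting pairs rather than fitting a line keeps a single misplaced end sequence from dominating the decision, which matters in repeat-rich regions.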
Reciprocally, the WGS assembly aids the physical map. In Figure 2, contig 19 contains the end sequences of two fosmids (dashed lines) that anchor Supercontig Merge 57 to map cluster 5901, which in turn is anchored to Supercontig Merge 81. In this iterative process, we can use the WGS assembly to inform the physical map of potential joins (i.e., linking physical map cluster 2501 to cluster 5901 and also joining Merge 57 and Merge 81). By using both data sets in a map-assisted whole-genome shotgun approach, we can resolve the difference between the predicted and assembled genome sizes of H. capsulatum G217B. In this study, successive modifications of the PCAP WGS assembly were used to incorporate more of the shotgun reads. The initial PCAP WGS assembly included 75% of the end-sequence data. To incorporate the missing data, we used phrap to assemble the excluded reads and combined this excluded-reads assembly with the PCAP assembly. The current WGS assembly comprises over 3100 contigs and contains up to 17% repetitive DNA. Although a comprehensive repeat analysis has not yet been performed, the repeat content, which includes homopolymeric runs and small repeat units, appears greater than expected from reassociation kinetics. Additionally, A:G and T:C transitions present in small repeat regions 50–250 bp in length (data not shown) contribute to misassembled regions and may increase the repeat bias within the WGS assembly. Thus, discordant linking information between the physical map and the WGS assembly, as viewed in MapLink, flags problematic genomic regions for resolution during finishing.
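The reciprocal joins described for Figure 2 amount to finding connected components in a bipartite graph whose nodes are map clusters and supercontigs, with an edge for every fosmid whose end sequences place it in both. The sketch below does this with a small union-find; it is an illustration of the idea, not MapLink's implementation, and the function names are hypothetical. The links are the ones described for Figure 2.

```python
class DisjointSet:
    """Minimal union-find keyed by arbitrary hashable nodes."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def proposed_joins(links):
    """links: (map_cluster, supercontig) pairs, one per anchoring fosmid.
    Returns connected groups; any group with more than one map cluster or
    supercontig is a candidate join to be confirmed by fingerprints."""
    ds = DisjointSet()
    for cluster, supercontig in links:
        # Tag node types so a cluster ID never collides with a supercontig name.
        ds.union(("map", cluster), ("wgs", supercontig))
    groups = {}
    for node in list(ds.parent):
        groups.setdefault(ds.find(node), set()).add(node)
    return list(groups.values())

# Anchors described for Figure 2.
links = [(2501, "Merge 57"), (2501, "Merge 20"),
         (5901, "Merge 57"), (5901, "Merge 81")]
print(proposed_joins(links))
```

Here all five nodes fall into one group, which corresponds to the proposed joins of cluster 2501 with cluster 5901 and of Merge 20, Merge 57, and Merge 81.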
Fosmid cloning is straightforward and practical for physical mapping. However, physical maps generated from BAC clones have the advantage of needing far fewer clones than fosmid maps to provide the necessary depth of coverage, with fewer gaps (Schmitt et al. 1996).
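The clone-count advantage of BACs follows from the same coverage relation used for the fosmid library: the number of clones N needed for c-fold coverage of a genome of size G with mean insert length L is N = cG/L, so at equal coverage the clone-count ratio is the inverse ratio of insert sizes. The BAC insert size below (150 kb) is an assumed, illustrative value, not a figure from this study; 35.9 kbp is the measured fosmid mean.

```latex
N = \frac{c\,G}{L}
\qquad\Longrightarrow\qquad
\frac{N_{\mathrm{fosmid}}}{N_{\mathrm{BAC}}}
  = \frac{L_{\mathrm{BAC}}}{L_{\mathrm{fosmid}}}
  \approx \frac{150\ \mathrm{kb}}{35.9\ \mathrm{kb}} \approx 4.2
```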
The fosmid-based mapping approach fits into our existing pipeline. DNA fragments from HindIII-, PstI-, and BamHI-digested fosmid clones were separated by agarose gel electrophoresis, stained, and scanned for interpretation by the Image software package (http://www.sanger.ac.uk/Software/Image), producing files of fragment mobility bands. A combined bands file used in the initial FPC contig build (Fig. 1) was generated by offsetting the mobility values of each clone's independent digests by a fixed amount. Advantages of combining multiple restriction patterns in the bands file include a decreased false-negative frequency associated with the use of smaller clones, as shown in Table 4 of Soderlund et al. (2000).
Our results detail mapping of the Histoplasma capsulatum G217B genome exclusively with fosmid clones and existing software (Image and FPC). The map-assisted whole-genome shotgun approach provides an efficient means for de novo sequencing in which the WGS assembly and the physical map are produced by independent processes. The MapLink software performs automated integration of physical map clusters with WGS sequence-assembly contigs and provides a graphical interface for viewing both data sets. The resulting genomic organization supplies essential linking information used in finishing and will ultimately yield an accurate genomic sequence of this unique, dimorphic fungus.
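The combined bands file described above works by giving each enzyme its own window of mobility values: shifting every digest by a fixed offset ensures that a HindIII band can never be scored as matching a PstI or BamHI band, while all three digests still contribute to one fingerprint. The sketch below shows that offsetting for a single clone. The enzyme order, offset, mobility values, and function name are hypothetical, and the Image bands-file format itself is not reproduced.

```python
ENZYMES = ("HindIII", "PstI", "BamHI")  # fixed order so each enzyme gets the same window in every clone

def combine_digests(digests, offset, tolerance=7, enzymes=ENZYMES):
    """Merge one clone's per-enzyme band mobilities into a single fingerprint.

    Digest k is shifted by k * offset. Requiring max mobility + tolerance
    < offset keeps every window more than one tolerance away from the
    next, so bands from different enzymes cannot be matched. A missing or
    unusable digest simply leaves its window empty.
    """
    combined = []
    for k, enzyme in enumerate(enzymes):
        mobilities = digests.get(enzyme)
        if not mobilities:
            continue
        if max(mobilities) + tolerance >= offset:
            raise ValueError(f"{enzyme}: mobilities too close to the offset window edge")
        combined.extend(m + k * offset for m in sorted(mobilities))
    return combined

# Hypothetical mobility values for one clone; real values come from Image.
clone = {"HindIII": [120, 80], "PstI": [50], "BamHI": [300]}
print(combine_digests(clone, offset=1000))  # [80, 120, 1050, 2300]
```

Keeping the enzyme order fixed means a clone with only two usable digests still places its bands in the same windows as every other clone, which is what allows such clones to be added incrementally to the map.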
Strains, Growth Conditions, and Vectors
Histoplasma capsulatum isolate G217B (ATCC MYA-2455) was recovered from 4°C stocks on HMM agar plates and grown as yeast at 37°C (Worsham and Goldman 1988). The EpiFOS Fosmid Library Production Kit (EpiCentre Technologies) was used with vector pEpiFOS-5. Transductants were isolated on LB agar plates (Sambrook et al. 1989) supplemented with 25 µg/mL chloramphenicol (Cm25) and S-GAL (Sigma). Clones were picked into shallow growth plates containing 240 µL of TB with Cm25 and 8% glycerol and grown overnight at 37°C with shaking. Overnight cultures were subcultured into 1.2 mL of LB with Cm25 and incubated overnight at 37°C. Glycerol stocks were stored at −80°C.
Histoplasma DNA Isolation
Generating Fosmid Libraries
DNA fragments migrating at 30–50 kbp were excised and purified after a second size selection in 1% low-melting-point (LMP) agarose (Bio-Rad). The second size-selection gel was run at 30 V for 16 h and stained with SYBR Green. DNA fragments were visualized with blue light, excised, melted at 65°C, and treated with AgarAce (Promega; 1.5 U per 100 mg agarose) in 0.5× TBE buffer at 42°C for 1 h. AgarAce treatment was followed by a single phenol extraction; the volume of the DNA solution was then reduced with sec-butanol, and the DNA was ethanol-precipitated with 0.1 M NaCl. DNA was resuspended in 10 µL of molecular biology grade H2O (Sigma), desalted by drop dialysis (MF-Millipore 0.025-µm pore-size membrane filters), and ligated to pEpiFOS-5 DNA. Fosmid clones were packaged using MaxPlax
Fingerprinting and Physical Map Assembly of Histoplasma Fosmids
Manual Contig Editing
Clusters of fingerprint contigs and individual clones were created on the basis of the order of the 15,655 H. capsulatum G217B fosmid-end sequences contained within the assembly. The boundaries on the assembly of the fingerprint contigs, and of clones with two end sequences, were determined. Overlapping contigs and clones were positioned in clusters, separated by gaps for easy viewing and editing in the FPC graphical interface, and contigs were flipped in orientation as needed. The fingerprints within each cluster were manually reviewed, and merges were made on the basis of fingerprint matches. Of the 294 merges suggested by the sequence assembly, 94% were confirmed by fingerprints. This process was repeated with a later sequence assembly, resulting in the incorporation of an additional 134 single clones and 38 contig merges.
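The first two steps of this editing procedure (finding where each fingerprint contig lies on the assembly, then grouping contigs whose footprints overlap) can be sketched as interval bookkeeping. In this simplified, hypothetical version a contig's boundary on a supercontig is just the minimum and maximum position of its fosmid-end placements, and overlapping intervals on the same supercontig are reported as candidate merges; an individual clone with two end sequences can be treated as a one-clone contig. The actual clustering and review were done in the FPC interface, and every name below is illustrative.

```python
from collections import defaultdict

def contig_spans(end_placements):
    """end_placements: (map_contig, supercontig, position) for each fosmid
    end sequence placed in the WGS assembly. Returns the (min, max)
    position of each map contig on each supercontig it touches."""
    spans = {}
    for map_contig, supercontig, pos in end_placements:
        key = (map_contig, supercontig)
        lo, hi = spans.get(key, (pos, pos))
        spans[key] = (min(lo, pos), max(hi, pos))
    return spans

def overlapping_groups(spans):
    """Group map contigs whose spans overlap on the same supercontig; each
    group is a candidate merge to be checked against fingerprints."""
    by_supercontig = defaultdict(list)
    for (map_contig, supercontig), (lo, hi) in spans.items():
        by_supercontig[supercontig].append((lo, hi, map_contig))
    groups = []
    for supercontig, intervals in by_supercontig.items():
        intervals.sort()
        current, current_end = [], None
        for lo, hi, map_contig in intervals:
            if current and lo <= current_end:      # overlaps the running group
                current.append(map_contig)
                current_end = max(current_end, hi)
            else:                                  # gap: close the running group
                if len(current) > 1:
                    groups.append((supercontig, current))
                current, current_end = [map_contig], hi
        if len(current) > 1:
            groups.append((supercontig, current))
    return groups

# Hypothetical placements on one supercontig.
placements = [("ctg1", "S1", 100), ("ctg1", "S1", 500),
              ("ctg2", "S1", 450), ("ctg2", "S1", 900),
              ("ctg3", "S1", 2000), ("ctg3", "S1", 2500)]
print(overlapping_groups(contig_spans(placements)))  # [('S1', ['ctg1', 'ctg2'])]
```

Each suggested group is only a hypothesis; as the text notes, merges were accepted only when the fingerprints agreed.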
Fosmid End Sequencing
Genome Sequencing Strategy
Data Availability
We thank the members of the Mapping, Finishing and Support, and Informatics groups of the GSC who were directly involved in this endeavor. Special thanks go to Jacquelyn Engle and Linda Eissenberg of the Goldman laboratory for their assistance in strain handling and virulence assays. This work was supported by NIAID and Public Health Service grant AI25584 (to E.R.M.). The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.2361404.
5 Corresponding author.
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215: 403–410.
Batzoglou, S., Jaffe, D.B., Stanley, K., Butler, J., Gnerre, S., Mauceli, E., Berger, B., Mesirov, J.P., and Lander, E.S. 2002. ARACHNE: A whole-genome shotgun assembler. Genome Res. 12: 177–189.
Bradsher, R.W. 1996. Histoplasmosis and blastomycosis. Clin. Infect. Dis. 22 Suppl 2: S102–S111.
Carr, J. and Shearer Jr., G. 1998. Genome size, complexity, and ploidy of the pathogenic fungus Histoplasma capsulatum. J. Bacteriol. 180: 6697–6703.
Carter, D.A., Taylor, J.W., Dechairo, B., Burt, A., Koenig, G.L., and White, T.J. 2001. Amplified single-nucleotide polymorphisms and a (GA)(n) microsatellite marker reveal genetic differentiation between populations of Histoplasma capsulatum from the Americas. Fungal Genet. Biol. 34: 37–48.
Cliften, P., Sudarsanam, P., Desikan, A., Fulton, L., Fulton, B., Majors, J., Waterston, R., Cohen, B.A., and Johnston, M. 2003. Finding functional features in Saccharomyces genomes by phylogenetic footprinting. Science 301: 71–76.
Eissenberg, L.G., Goldman, W.E., and Schlesinger, P.H. 1993. Histoplasma capsulatum modulates the acidification of phagolysosomes. J. Exp. Med. 177: 1605–1611.
Ewing, B. and Green, P. 1998. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8: 186–194.
Ewing, B., Hillier, L., Wendl, M.C., and Green, P. 1998. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8: 175–185.
Gordon, D., Abajian, C., and Green, P. 1998. Consed: A graphical tool for sequence finishing. Genome Res. 8: 195–202.
Gregory, S.G., Sekhon, M., Schein, J., Zhao, S., Osoegawa, K., Scott, C.E., Evans, R.S., Burridge, P.W., Cox, T.V., Fox, C.A., et al. 2002. A physical map of the mouse genome. Nature 418: 743–750.
Huang, X., Wang, J., Aluru, S., Yang, S-P., and Hillier, L. 2003. PCAP: A whole-genome assembly program. Genome Res. 13: 2164–2170.
Kasuga, T., Taylor, J.W., and White, T.J. 1999. Phylogenetic relationships of varieties and geographical groups of the human pathogenic fungus Histoplasma capsulatum Darling. J. Clin. Microbiol. 37: 653–663.
Kersulyte, D., Woods, J.P., Keath, E.J., Goldman, W.E., and Berg, D.E. 1992. Diversity among clinical isolates of Histoplasma capsulatum detected by polymerase chain reaction with arbitrary primers. J. Bacteriol. 174: 7075–7079.
Kügler, S., Schurtz-Sebghati, T., Groppe-Eissenberg, L., and Goldman, W.E. 2000. Phenotypic variation and intracellular parasitism by Histoplasma capsulatum. Proc. Natl. Acad. Sci. 97: 8794–8798.
Maresca, B. and Kobayashi, G.S. 1989. Dimorphism in Histoplasma capsulatum: A model for the study of cell differentiation in pathogenic fungi. Microbiol. Rev. 53: 186–209.
Marra, M.A., Kucaba, T.A., Dietrich, N.L., Green, E.D., Brownstein, B., Wilson, R.K., McDonald, K.M., Hillier, L.W., McPherson, J.D., and Waterston, R.H. 1997. High throughput fingerprint analysis of large-insert clones. Genome Res. 7: 1072–1084.
Marra, M.A., Kucaba, T.A., Hillier, L.W., and Waterston, R.H. 1999. High-throughput plasmid DNA purification for 3 cents per sample. Nucleic Acids Res. 27: e37.
McPherson, J.D., Marra, M., Hillier, L., Waterston, R.H., Chinwalla, A., Wallis, J., Sekhon, M., Wylie, K., Mardis, E.R., Wilson, R.K., et al. 2001. A physical map of the human genome. Nature 409: 934–941.
Newman, S.L. 1999. Macrophages in host defense against Histoplasma capsulatum. Trends Microbiol. 7: 67–71.
Olson, M.V., Dutchik, J.E., Graham, M.Y., Brodeur, G.M., Helms, C., Frank, M., MacCollin, M., Scheinman, R., and Frank, T. 1986. Random-clone strategy for genomic restriction mapping in yeast. Proc. Natl. Acad. Sci. 83: 7826–7830.
Riles, L., Dutchik, J.E., Baktha, A., McCauley, B.K., Thayer, E.C., Leckie, M.P., Braden, V.V., Depke, J.E., and Olson, M.V. 1993. Physical maps of the six smallest chromosomes of Saccharomyces cerevisiae at a resolution of 2.6 kilobase pairs. Genetics 134: 81–150.
Rippon, J.W. 1982. Histoplasmosis (Histoplasmosis capsulati). In: Medical mycology. The pathogenic fungi and the pathogenic actinomycetes (ed. M.L.W.R. Wonsiewicz), pp. 201–205. W.B. Saunders & Co., Philadelphia, PA.
Sambrook, J., Fritsch, E.F., and Maniatis, T. 1989. Molecular cloning: A laboratory manual. 2nd ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
Schmitt, H., Kim, U.J., Slepak, T., Blin, N., Simon, M.I., and Shizuya, H. 1996. Framework for a physical map of the human 22q13 region using bacterial artificial chromosomes (BACs). Genomics 33: 9–20.
Sebghati, T.S., Engle, J.T., and Goldman, W.E. 2000. Intracellular parasitism by Histoplasma capsulatum: Fungal virulence and calcium dependence. Science 290: 1368–1372.
Soderlund, C., Humphray, S., Dunham, A., and French, L. 2000. Contigs built with fingerprints, markers, and FPC V4.7. Genome Res. 10: 1772–1787.
Sorensen, K.N., Clemons, K.V., and Stevens, D.A. 1999. Murine models of blastomycosis, coccidioidomycosis, and histoplasmosis. Mycopathologia 146: 53–65.
Steele, P.E., Carle, G.F., Kobayashi, G.S., and Medoff, G. 1989. Electrophoretic analysis of Histoplasma capsulatum chromosomal DNA. Mol. Cell. Biol. 9: 983–987.
Wong, G.K., Yu, J., Thayer, E.C., and Olson, M.V. 1997. Multiple-complete-digest restriction fragment mapping: Generating sequence-ready maps for large-scale DNA sequencing. Proc. Natl. Acad. Sci. 94: 5225–5230.
Woods, J.P., Heinecke, E.L., Luecke, J.W., Maldonado, E., Ng, J.Z., Retallack, D.M., and Timmerman, M.M. 2001. Pathogenesis of Histoplasma capsulatum. Semin. Respir. Infect. 16: 91–101.
Worsham, P.L. and Goldman, W.E. 1988. Quantitative plating of Histoplasma capsulatum without addition of conditioned medium or siderophores. J. Med. Vet. Mycol. 26: 137–143.
http://www.genome.arizona.edu/software/fpc/; The FPC and Friends Web site, associated with the Arizona Genomics Institute (AGI) and Arizona Genomics Computational Laboratory (AGCoL), provides FPC software downloads and documentation links.
http://www.genome.wustl.edu/blast/histo_client.cgi; Part of the Histoplasma capsulatum project maintained on the Genome Sequencing Center Web site. The Histoplasma capsulatum G217B and G186A-R whole-genome shotgun (WGS) PCAP and ARACHNE assemblies are available for BLASTN searches.
http://genome.wustl.edu/projects/hcapsulatum/index.php?fpc=1; An additional link from which the Histoplasma capsulatum FPC database can be downloaded.
http://www.sanger.ac.uk/Software/Image; IMAGE, the fingerprint image analysis system, at the Sanger Institute.
http://www.tigr.org/tdb/bac_ends/mouse/bac_end_intro.html; Links to BAC end-sequencing protocols and BAC library resources.
Received January 14, 2004; accepted in revised form April 30, 2004.