Published online before print
November 7, 2006, 10.1101/gr.5512906 Genome Res. 17:61-68, 2007 ©2007 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/07 $5.00
Letter
A bimodal pattern of relatedness between the Salmonella Paratyphi A and Typhi genomes: Convergence or divergence by homologous recombination?
1 Department of Statistics, University of Oxford, Oxford OX1 3SY, United Kingdom; 2 Department of Molecular Biology, Max Planck Institute for Infection Biology, Berlin, Germany 10117; 3 The Wellcome Trust Sanger Institute, Cambridge CB10 1SA, United Kingdom
All Salmonella can cause disease, but severe systemic infections are primarily caused by a few lineages. Paratyphi A and Typhi are the deadliest human-restricted serovars, responsible for 600,000 deaths per annum. We developed a Bayesian changepoint model that uses variation in the degree of nucleotide divergence along two genomes to detect homologous recombination between these strains, and with other lineages of Salmonella enterica. Paratyphi A and Typhi showed an atypical and surprising pattern. For three quarters of their genomes, they appear to be distantly related members of the species S. enterica, both in their gene content and nucleotide divergence. However, the remaining quarter is much more similar in both aspects, with average nucleotide divergence of 0.18% instead of 1.2%. We describe two different scenarios that could have led to this pattern, convergence and divergence, and conclude that the former is more likely based on a variety of criteria. The convergence scenario implies that, although Paratyphi A and Typhi were not especially close relatives within S. enterica, they have gone through a burst of recombination involving more than 100 recombination events. Several of the recombination events transferred novel genes in addition to homologous sequences, resulting in similar gene content in the two lineages. We propose that recombination between Typhi and Paratyphi A has allowed the exchange of gene variants that are important for their adaptation to their common ecological niche, the human host.
Most bacteria undergo frequent homologous recombination, whereby portions of their genomes are replaced by the corresponding sequences from other bacteria (Smith et al. 1993
Here we investigate the extent and origin of recombination that has occurred during the evolution of the two most important human-specific serovars of Salmonella enterica, Typhi and Paratyphi A. Most S. enterica, e.g., the widespread serovars Typhimurium and Enteritidis, colonize the mucosal surfaces of a wide range of mammals and birds. However, some serovars have become specialized, host-specific pathogens that can cause systemic disease, such as typhoid fever in humans (Uzzau et al. 2000
Genomic pattern of relatedness between Typhi and Paratyphi A
Genome-wide recombination patterns were investigated using a novel statistical algorithm that infers genetic exchange on the basis of the distribution of nucleotide differences between pairs of strains. If the ancestor of one of the strains imported DNA from a close relative of the other, then that should result in a stretch of DNA with high sequence homology. Alternatively, DNA imported from an unrelated strain might instead exhibit an atypically high level of nucleotide divergence compared with the neighboring region. Our algorithm detects points in the genomic sequences where divergence levels change, which should correspond to the beginning and end points of specific imports, assuming that the neutral levels of polymorphism between the strains are uniform (Hughes and Friedman 2004
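The idea of locating points where divergence levels change can be illustrated with a much simpler procedure than the Bayesian changepoint model used in this study. The Python sketch below is not the authors' algorithm. It assumes an ungapped pairwise alignment, computes the fraction of differing sites in fixed, non-overlapping windows, and merges neighboring windows that fall on the same side of a cutoff. The function names, the 1-kb window, and the 0.3% cutoff are illustrative assumptions.

```python
def window_divergence(seq_a, seq_b, window=1000):
    """Fraction of differing sites in consecutive non-overlapping windows.

    Assumes seq_a and seq_b are two rows of an ungapped alignment.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    divergences = []
    for start in range(0, len(seq_a), window):
        a = seq_a[start:start + window]
        b = seq_b[start:start + window]
        diffs = sum(x != y for x, y in zip(a, b))
        divergences.append(diffs / len(a))
    return divergences


def segment(divergences, threshold=0.003):
    """Merge adjacent windows on the same side of the cutoff.

    Returns (state, first_window, last_window) tuples, where state is
    "low" or "high". The 0.3% cutoff is an assumed, illustrative value.
    """
    segments = []
    for i, d in enumerate(divergences):
        state = "low" if d < threshold else "high"
        if segments and segments[-1][0] == state:
            segments[-1] = (state, segments[-1][1], i)
        else:
            segments.append((state, i, i))
    return segments


# Toy alignment: identical flanks around a 1-kb stretch with 2% differences.
toy_a = "A" * 3000
toy_b = "A" * 1000 + ("C" + "A" * 49) * 20 + "A" * 1000
toy_segments = segment(window_divergence(toy_a, toy_b))
```

A hard cutoff on fixed windows places boundaries only at window edges and ignores sampling noise, which is one reason a probabilistic changepoint model is preferable for real genomes.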
The genomes of S. enterica Typhi, Paratyphi A, Enteritidis, Paratyphi B, Typhimurium, Choleraesuis (Chiu et al. 2005
In total there are 124 low-divergence regions in the comparison of Typhi and Paratyphi A, with an average size of 6.4 kb and containing a total of 948 genes (Fig. 2). The high-divergence regions are on average 32 kb in size. After accounting for spatial clustering of genes with a similar function, we find that most classes of genes are randomly distributed between the low- and high-divergence regions (Supplemental Table S1). The only classes of genes that are obviously enriched within the low-divergence regions are shared "rare genes" and genes encoding transposases, as detailed below.
A microarray study (Porwollik et al. 2004
Twenty-one of the 50 transposase genes in Typhi are at syntenic locations in the Paratyphi A genome, and 13 of these (62%) are located in regions of low divergence (Figs. 1E, 2), which is significantly higher than random expectations (P = 0.001, Fisher's exact test). Transposons jump from one genomic region to another and are also lost with a certain frequency (Beuzon et al. 2004
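A rough plausibility check of this enrichment can be made with a one-sided binomial tail. This is a different and simpler test than the Fisher's exact test reported above, whose contingency table is not given in the text, so it is not expected to reproduce the reported P value. The sketch assumes that, under the null hypothesis, each syntenic transposase gene falls in a low-divergence region with probability 0.23, the approximate fraction of the genome in such regions; the function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 13 of 21 syntenic transposase genes lie in low-divergence regions.
# Assumed null probability: 0.23 (fraction of the genome with low divergence).
p_value = binomial_upper_tail(13, 21, 0.23)
print(f"one-sided binomial tail probability: {p_value:.2e}")
```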
It has previously been suggested that the independent degradation of certain genes or genes within certain pathways has been important for the convergent phenotypes of Typhi and Paratyphi A (Parkhill et al. 2001
There are three prophages in the Paratyphi A genome and seven in Typhi. One of the three found in Paratyphi A, SPA-2-SopE, is highly similar to the SopE phage in Typhi, in fact representing the region of the genome with the lowest level of divergence (Fig. 1G; lines 30-31 on Fig. 2). The other two Paratyphi A phages show only incomplete, limited homology with phages in Typhi.
In summary, we have found that, although according to several different criteria the Typhi and Paratyphi A genomes are more similar to each other than to the other members of S. enterica that have been sequenced, this similarity is almost entirely due to a very high degree of homology in a quarter of the genome. The other three quarters are about as different as randomly selected strains in terms of gene content and nucleotide divergence. How did the two genomes come to have regions with such distinct evolutionary histories?
Comparison of pattern of relatedness with expectations under two evolutionary scenarios
Under the convergence scenario, the ancestors of Typhi and Paratyphi A were as unrelated as are most other pairs of serovars of S. enterica, but 23% of their genome in total have been recently imported from one lineage to the other. The remaining 77% of the genome would then either reflect the original composition of the lineages prior to this recombination or have been imported from other serovars. The convergence scenario requires much less recombination in total than under divergence and, for this reason, is intuitively more appealing. Frequent recombination involving DNA donors outside of S. enterica subspecies enterica is unlikely according to both scenarios because the resulting nucleotide differences would be >1.2%. We performed simulations in order to see whether the observed spatial distribution of low- and high-divergence regions was consistent with what is expected under the two scenarios. Although the two scenarios can each give quite distinct signatures in some circumstances (e.g., shortly after convergence), we were able to obtain a good fit to the observed distribution of the lengths of the low- and high-divergence regions under both scenarios (Fig. 5) by adjusting average tract length and other parameters (see Methods).
These simulations showed that, because recombination events sometimes overlap, the actual number of imports may have been much higher than the observed number of boundaries. The simulations in Figure 5 involved 870 imports in the convergence scenario and 3979 imports under divergence. The mean tract length in both cases was 1700 base pairs. In both simulations, a constant rate of 0.18% nucleotide differences was used for the low-divergence regions. This gives a better fit than using a uniform distribution between 0% and 0.3% (data not shown). This observation is expected under the divergence scenario, but is also consistent with convergence if recombination occurred within a short time span. Although the spatial distribution of low and high divergence is consistent with both scenarios, the pattern that we observe between Paratyphi A and Typhi is qualitatively and quantitatively different from the patterns observed in the recent divergence between Enteritidis and Gallinarum (Fig. 1, cf. C and D). The Enteritidis and Gallinarum genomes differ by approximately half as many mutations per kilobase as observed in the low-divergence region of Typhi and Paratyphi A, but the fraction of the genome in high-divergence regions is 20-fold lower. The divergence scenario would, therefore, require a 10-fold higher rate of recombinational divergence between Typhi and Paratyphi A than observed between Enteritidis and Gallinarum. Moreover, most of the high-divergence regions between Enteritidis and Gallinarum are contiguous (not shown), apparently reflecting one large event, as opposed to more than a hundred high-divergence regions in the Typhi and Paratyphi A comparison. The pattern of import that would be required under the divergence scenario is, therefore, quantitatively and qualitatively different from the one that took place in the divergence of Enteritidis and Gallinarum, making this scenario less likely.
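The effect of overlapping imports on the number of observable regions can be sketched with a toy simulation. This is not the simulation procedure described in Methods. It assumes a linear genome of 4.8 Mb (an assumed round figure), uniformly placed import start points, and exponentially distributed tract lengths; only the number of imports (870) and the mean tract length (1700 bp) are taken from the text, and all function names are hypothetical.

```python
import random


def simulate_imports(genome_len, n_imports, mean_tract, seed=0):
    """Place imports uniformly at random; tract lengths are exponential.

    Returns half-open (start, end) intervals clipped to the genome end.
    """
    rng = random.Random(seed)
    intervals = []
    for _ in range(n_imports):
        start = rng.randrange(genome_len)
        length = max(1, round(rng.expovariate(1.0 / mean_tract)))
        intervals.append((start, min(genome_len, start + length)))
    return intervals


def merge_intervals(intervals):
    """Fuse overlapping or touching intervals into contiguous regions."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


GENOME_LEN = 4_800_000  # assumed genome length in bp, for illustration only
imports = simulate_imports(GENOME_LEN, n_imports=870, mean_tract=1700)
regions = merge_intervals(imports)
covered = sum(end - start for start, end in regions) / GENOME_LEN
print(len(imports), "imports merge into", len(regions), "regions")
print("fraction of genome covered:", round(covered, 3))
```

Because merged regions can never outnumber the imports that produced them, the count of observed boundaries gives only a lower bound on the number of recombination events.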
Possible markers of recombination
Evidence from MLST data
Genetic imports from other members of S. enterica
We used genomes from seven serovars (Enteritidis, Typhimurium, Paratyphi B, Dublin, Choleraesuis, Hadar, and Infantis) to search for putative imports required by the divergence scenario. Only very few Paratyphi A or Typhi genes can be attributed to import from one of these seven serovars. Except for 106 kb (3% of the genome), all other coding sequences differed by >0.3% estimated divergence from their orthologs in all seven genomes (Supplemental Table S2). Many of these homologous regions probably do not represent true imports from the seven lineages. For example, one homologous stretch included a cluster of 12 ribosomal RNA genes that showed low divergence in all pairwise comparisons, with the lowest pairwise difference being between Paratyphi A and Typhi. Other putative imports were too short to be convincing (<1 kb) and probably represent false positives. Only two to four regions may represent true imports, totaling 10-15 kb (Supplemental Table S2).
Thus, <0.5% of the imports needed under the divergence hypothesis can be attributed to close relatives of Enteritidis, Typhimurium, Paratyphi B, Dublin, Choleraesuis, Infantis, or Hadar. Under the divergence hypothesis, we should have detected a much higher number of imports from one of the seven serovars. These serovars make up a substantial proportion of the S. enterica gene pool, at least according to current snapshots provided by serotyping and MLST of diverse strain collections. The seven serovars represent >50% of Salmonella isolates from chickens, pigs, cattle, and humans in the Netherlands (van Duijkeren et al. 2002
The frequency of specific lineages and serovars can change over time, and our knowledge about Salmonella is doubtless biased toward humans and agricultural reservoirs, so that the currently common lineages in our databases might constitute only part of the historical gene pool. This argument seems unlikely to explain our inability to identify a source of imports. Firstly, the seven test genomes exhibit significantly more genetic exchange with each other than with either Typhi or Paratyphi A, with a higher proportion of the pairwise comparisons between the seven test genomes showing divergence levels <0.3% (Fig. 1, cf. A and B; Supplemental Fig. S1, cf. A and B). Secondly, for the six MLST gene fragments that are located in high-divergence regions, the MLST database did not contain any strains with alleles more closely related to those of Typhi or Paratyphi A than those of the seven test genomes (Supplemental Table S3). Thus, the seven test genomes are as good as, or better than, any other candidate in the MLST database as a source of genes for importation in the divergence scenario. We, therefore, reject the divergence scenario and infer that Typhi and Paratyphi A have converged by homologous recombination.
Summary of findings
We have described a surprising pattern of genomic relatedness between Typhi and Paratyphi A. Three quarters of their genomes show a level of genetic similarity that is typical of distantly related members of the species S. enterica, both in gene content and nucleotide divergence. The remaining quarter, however, is much more similar in both aspects, with average nucleotide divergence of 0.18% instead of 1.2%. There are two quite different scenarios that could explain this pattern, recent convergence or recent divergence. Both explanations require an atypical and non-clock-like pattern of recombination, when compared with other S. enterica. We rejected the divergence hypothesis. The divergence hypothesis would require several-fold more recombination events with other members of S. enterica than convergence. This rate of recombination would also have needed to be much higher for Typhi and Paratyphi A than for Enteritidis and Gallinarum. Most importantly, we were unable to identify a plausible origin of the imported genes amongst seven other sequenced genomes of S. enterica, although these are as similar to Typhi and Paratyphi A as any of almost 600 strains that have been MLST-typed.
We infer that convergence has taken place, which implies an extremely elevated rate of recombination between Typhi and Paratyphi A. That spate of recombination probably occurred over a short period that preceded the most recent population bottlenecks in the history of Typhi and Paratyphi A. Only a few genes diverge between these genomes by <0.1%, with a clear mode at 0.18% that presumably reflects the accumulation of mutations subsequent to recombination. In comparison, the bottleneck within Typhi is associated with average synonymous sequence variation of
Role of selection
However, despite its evolutionary logic, it is difficult to find unambiguous evidence for the importance of selection. Each of the 124 low-divergence regions contains several genes. Adaptive convergence would require only a fraction of these to be positively selected, which may explain why we were unable to find any overrepresentation of particular categories of genes within the low-divergence regions (Supplemental Table S1). One particularly interesting feature of the low-divergence regions is that they contain an overrepresentation of rare genes shared between Typhi and Paratyphi A, compared both with the high-divergence regions and with the number of rare genes shared between Typhi and other strains (Figs. 2, 3). The 83 shared rare genes found in homologous low-divergence regions thus represent particularly good candidates for adaptive convergence. Unfortunately, the existence of these shared genes does not indicate the directionality of exchange or prove the role of selection.
Means of convergence
Our inability to detect a suitable candidate phage within the genome of Typhi or Paratyphi A is not crucial. Firstly, if the putative phage was lytic, it would not have integrated into the host cell genome. Secondly, the fact that recombination occurred over a short time span suggests that the phage was lost in a bottleneck subsequent to its epidemic spread. Thirdly, one or both lineages may have evolved resistance to the phage. Indeed, the most closely related genes within the two genomes encode SopE
This scenario implies that a transient mechanism or transient ecological conditions, such as high coinfection rates, can facilitate highly nonrandom patterns of recombination. This recombination can in turn lead to a saltation in which a hybrid strain or strains are rapidly assembled from the constituent parts of two or more lineages. Hybridization has been shown to facilitate the evolution of novel and extreme phenotypes in animal (Schwarz et al. 2005
Conclusions
Whatever mechanism led to the convergence of Typhi and Paratyphi A, our results highlight the potential importance of homologous recombination as an evolutionary force, in addition to the more widely recognized mechanisms of lateral gene transfer and gene degradation. They also demonstrate that genomic patterns of recombination can be highly nonrandom, even within a single species of bacteria, which has important consequences for our understanding of bacterial evolution and the evolution of pathogenicity.
Genomes
In addition to Paratyphi A ATCC 9150 (McClelland et al. 2004
Alignments
The prior model
Bayesian inference
Simulation of convergence and divergence scenarios
Functional analysis
We thank Rory Bowden, Angus Buckling, Peter Donnelly, Ed Feil, Ichizo Kobayashi, Myrone Levine, Gil McVean, Simon Myers, Phillipe Roumagnac, Brian Spratt, John Wain, and two anonymous reviewers for providing helpful comments or suggestions. This work was funded by the Wellcome Trust.
4 Corresponding author.
E-mail falush{at}stats.ox.ac.uk; fax +44-1865-272595. [Supplemental material is available online at www.genome.org.] Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.5512906
Beuzon, C.R., Chessa, D., and Casadesus, J. 2004. IS200: An old and still bacterial transposon. Int. Microbiol. 7: 3-12.
Bisharat, N., Cohen, D.I., Harding, R.M., Falush, D., Crook, D.W., Peto, T., and Maiden, M.C. 2005. Hybrid Vibrio vulnificus. Emerg. Infect. Dis. 11: 30-35.
Brown, E.W., Mammel, M.K., LeClerc, J.E., and Cebula, T.A. 2003. Limited boundaries for extensive horizontal gene transfer among Salmonella pathogens. Proc. Natl. Acad. Sci. 100: 15676-15681.
Carver, T.J., Rutherford, K.M., Berriman, M., Rajandream, M.A., Barrell, B.G., and Parkhill, J. 2005. ACT: The Artemis comparison tool. Bioinformatics 21: 3422-3423.
Centers for Disease Control. 2005. Salmonella surveillance: Annual summary, 2004. US Department of Health and Human Services, Atlanta, Georgia.
Chiu, C.H., Tang, P., Chu, C.S., Hu, S.N., Bao, Q.Y., Yu, J., Chou, Y.Y., Wang, H.S., and Lee, Y.S. 2005. The genome sequence of Salmonella enterica serovar Choleraesuis, a highly invasive and resistant zoonotic pathogen. Nucleic Acids Res. 33: 1690-1698.
Darling, A.C.E., Mau, B., Blattner, F.R., and Perna, N.T. 2004. Mauve: Multiple alignment of conserved genomic sequence with rearrangements. Genome Res. 14: 1394-1403.
Green, P.J. 1995. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82: 711-732.
Handa, N., Ohashi, S., Kusano, K., and Kobayashi, I. 1997. Chi*, a chi-related 11-mer sequence partially active in an E. coli recC1004 strain. Genes Cells 2: 525-536.
Hughes, A.L. and Friedman, R. 2004. Patterns of sequence divergence in 5' intergenic spacers and linked coding regions in 10 species of pathogenic bacteria reveal distinct recombinational histories. Genetics 168: 1795-1803.
Husmeier, D. and McGuire, G. 2003. Detecting recombination in 4-taxa DNA sequence alignments with Bayesian hidden Markov models and Markov chain Monte Carlo. Mol. Biol. Evol. 20: 315-337.
Jolley, K.A., Wilson, D.J., Kriz, P., McVean, G., and Maiden, M.C.J. 2005. The influence of mutation, recombination, population history, and selection on patterns of genetic diversity in Neisseria meningitidis. Mol. Biol. Evol. 22: 562-569.
Joshi, S., Wattal, C., Sharma, A., and Prasad, K.J. 2002. Mixed Salmonella infection - A case report. Indian J. Med. Microbiol. 20: 113-114.
Kidgell, C., Reichard, U., Wain, J., Linz, B., Torpdahl, M., Dougan, G., and Achtman, M. 2002. Salmonella typhi, the causative agent of typhoid fever, is approximately 50,000 years old. Infect. Genet. Evol. 2: 39-45.
Linz, B., Schenker, M., Zhu, P., and Achtman, M. 2000. Frequent interspecific genetic exchange between commensal Neisseriae and Neisseria meningitidis. Mol. Microbiol. 36: 1049-1058.
Maiden, M.C., Bygraves, J.A., Feil, E., Morelli, G., Russell, J.E., Urwin, R., Zhang, Q., Zhou, J., Zurth, K., and Caugant, D.A., et al. 1998. Multilocus sequence typing: A portable approach to the identification of clones within populations of pathogenic microorganisms. Proc. Natl. Acad. Sci. 95: 3140-3145.
McClelland, M., Sanderson, K.E., Clifton, S.W., Latreille, P., Porwollik, S., Sabo, A., Meyer, R., Bieri, T., Ozersky, P., and McLellan, M., et al. 2004. Comparison of genome degradation in Paratyphi A and Typhi, human-restricted serovars of Salmonella enterica that cause typhoid. Nat. Genet. 36: 1268-1274.
Miesel, L. and Roth, J.R. 1994. Salmonella recD mutations increase recombination in a short sequence transduction assay. J. Bacteriol. 176: 4092-4103.
Parkhill, J., Dougan, G., James, K.D., Thomson, N.R., Pickard, D., Wain, J., Churcher, C., Mungall, K.L., Bentley, S.D., and Holden, M.T.G., et al. 2001. Complete genome sequence of a multiple drug resistant Salmonella enterica serovar Typhi CT18. Nature 413: 848-852.
Pelludat, C., Mirold, S., and Hardt, W.D. 2003. The SopE Phi phage integrates into the ssrA gene of Salmonella enterica serovar typhimurium A36 and is closely related to the Fels-2 prophage. J. Bacteriol. 185: 5182-5191.
Pickard, D., Wain, J., Baker, S., Line, A., Chohan, S., Fookes, M., Barron, A., Gaora, P.O., Chabalgoity, J.A., and Thanky, N., et al. 2003. Composition, acquisition, and distribution of the Vi exopolysaccharide-encoding Salmonella enterica pathogenicity island SPI-7. J. Bacteriol. 185: 5055-5065.
Porwollik, S., Boyd, E.F., Choy, C., Cheng, P., Florea, L., Proctor, E., and McClelland, M. 2004. Characterization of Salmonella enterica subspecies I genovars by use of microarrays. J. Bacteriol. 186: 5883-5898.
Redfield, R.J. 2001. Do bacteria have sex? Nat. Rev. Genet. 2: 634-639.
Rieseberg, L.H., Raymond, O., Rosenthal, D.M., Lai, Z., Livingstone, K., Nakazato, T., Durphy, J.L., Schwarzbach, A.E., Donovan, L.A., and Lexer, C. 2003. Major ecological transitions in wild sunflowers facilitated by hybridization. Science 301: 1211-1216.
Rutherford, K., Parkhill, J., Crook, J., Horsnell, T., Rice, P., Rajandream, M.A., and Barrell, B. 2000. Artemis: Sequence visualization and annotation. Bioinformatics 16: 944-945.
Schicklmaier, P. and Schmieger, H. 1995. Frequency of generalized transducing phages in natural isolates of the Salmonella typhimurium complex. Appl. Environ. Microbiol. 61: 1637-1640.
Schwarz, D., Matta, B.M., Shakir-Botteri, N.L., and McPheron, B.A. 2005. Host shift to an invasive plant triggers rapid animal hybrid speciation. Nature 436: 546-549.
Shaw, M., Cooper, L., Xu, X., Thompson, W., Krauss, S., Guan, Y., Zhou, N., Klimov, A., Cox, N., and Webster, R., et al. 2002. Molecular changes associated with the transmission of avian influenza A H5N1 and H9N2 viruses to humans. J. Med. Virol. 66: 107-114.
Smith, J.M., Smith, N.H., O'Rourke, M., and Spratt, B.G. 1993. How clonal are bacteria? Proc. Natl. Acad. Sci. 90: 4384-4388.
Suchard, M.A., Weiss, R.E., Dorman, K.S., and Sinsheimer, J.S. 2003. Inferring spatial phylogenetic variation along nucleotide sequences: A multiple changepoint model. J. Am. Stat. Assoc. 98: 427-437.
Suerbaum, S., Smith, J.M., Bapumia, K., Morelli, G., Smith, N.H., Kunstmann, E., Dyrek, I., and Achtman, M. 1998. Free recombination within Helicobacter pylori. Proc. Natl. Acad. Sci. 95: 12619-12624.
Uzzau, S., Brown, D.J., Wallis, T., Rubino, S., Leori, G., Bernard, S., Casadesus, J., Platt, D.J., and Olsen, J.E. 2000. Host adapted serotypes of Salmonella enterica. Epidemiol. Infect. 125: 229-255.
van Duijkeren, E., Wannet, W.J.B., Houwers, D.J., and van Pelt, W. 2002. Serotype and phage type distribution of Salmonella strains isolated from humans, cattle, pigs, and chickens in the Netherlands from 1984 to 2001. J. Clin. Microbiol. 40: 3980-3985.
Zhu, P., van der Ende, A., Falush, D., Brieske, N., Morelli, G., Linz, B., Popovic, T., Schuurman, I.G., Adegbola, R.A., and Zurth, K., et al. 2001. Fit genotypes and escape variants of subgroup III Neisseria meningitidis during three pandemics of epidemic meningitis. Proc. Natl. Acad. Sci. 98: 5234-5239.
Received May 17, 2006; accepted in revised form August 31, 2006.