Published online before print
December 8, 2004, 10.1101/gr.3002305 Genome Res. 15:98-110, 2005 ©2005 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/05 $5.00
Chicken Special/Letter
Comparative architectures of mammalian and chicken genomes reveal highly variable rates of genomic rearrangements across different lineages
1 Genome Institute of Singapore, Singapore 138672, Republic of Singapore
2 European Molecular Biology Laboratory, 69117 Heidelberg, Germany
3 Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California 92093, USA
4 Department of Mathematics, University of California, San Diego, La Jolla, California 92093, USA
Molecular evolution studies are usually based on the analysis of individual genes and thus reflect only small-range variations in genomic sequences. A complementary approach is to study the evolutionary history of rearrangements in entire genomes based on the analysis of gene orders. The progress in whole genome sequencing provides an unprecedented level of detailed sequence data to infer genome rearrangements through comparative approaches. The comparative analysis of recently sequenced rodent genomes with the human genome revealed evidence for a larger number of rearrangements than previously thought and led to the reconstruction of the putative genomic architecture of the murid rodent ancestor, while the architecture of the ancestral mammalian genome and the rate of rearrangements in the human lineage remained unknown. Sequencing the chicken genome provides an opportunity to reconstruct the architecture of the ancestral mammalian genome by using chicken as an outgroup. Our analysis reveals a very low rate of rearrangements and, in particular, interchromosomal rearrangements in chicken, in the early mammalian ancestor, or in both. The suggested number of interchromosomal rearrangements between the mammalian ancestor and chicken, during an estimated 500 million years of evolution, only slightly exceeds the number of interchromosomal rearrangements that happened in the mouse lineage, over the course of about 87 million years.
Whole genome sequencing provides an unprecedented level of detailed sequence data for the comparative study of genome organization beyond the level of individual genes, and for highlighting the rearrangements that shaped our genomes. This analysis has revealed evidence for a larger number of rearrangements than previously thought, and has shed some light on previously unknown features of eukaryotic evolution (Lander et al. 2001
Genome rearrangement studies start with identification of corresponding orthologous regions in different genomes. The definition of orthologous regions has been gradually shifting from requirements of DNA-level alignment, corresponding to strict conservation of gene order and orientation and applicable to genomes as close as human and mouse (Lander et al. 2001
Rearrangement analyses were initially restricted to unichromosomal genomes (Palmer and Herbon 1988
In the current study, we considered two different types of evidence to establish orthologous genomic regions, referred to in the manuscript as synteny blocks. One stems from DNA-DNA alignments, referred to as sequence-based data, and the other from protein-protein alignments, referred to as gene-based data. We then applied GRIMM-Synteny (Pevzner and Tesler 2003
The analysis of rearrangements in human, mouse, rat, and chicken was done on two distinct data sets. The first data set consisted of gene-based data while the second was obtained directly from sequence-based data; see Methods for further details, and Figure 1 for an illustration. As expected, the results highlight advantages and disadvantages of the two types of inputs, but overall the high level of similarity between the solutions confirms the robustness of the method.
Gene-based data
To generate synteny blocks using genes as anchors, we started from a set of 6447 four-way orthologous genes, pre-filtered for evidence of conserved pairwise synteny using SyntQL (Zdobnov et al. 2002
Using these 586 four-way blocks, GRIMM reveals evidence of at least 441 pairwise rearrangements (of the order and orientation of the whole blocks) between chicken and human, 511 between chicken and mouse, and 506 between chicken and rat. Using the same blocks suggests at least 219 rearrangements between human and mouse, 220 between human and rat, and 75 between mouse and rat. The fact that these last three numbers are 25% smaller than the ones described in Bourque et al. (2004 25% reduction in the number of blocks due to the inclusion of a fourth genome and to the use of genes instead of similarity anchors. Running MGR on the 586 synteny blocks generates a rearrangement scenario, and two putative ancestors shown in Figure 2. MGR uses heuristics to attempt to minimize the number of rearrangements on the tree, but does not guarantee the result is a most parsimonious solution. On the recovered tree, there are 73 rearrangements between human and MA and 389 between chicken and MA. There are also 38 rearrangements between mouse and RA and 41 between rat and RA. Finally, there are 122 rearrangements between MA and RA. These numbers are lower bounds based on the assumption that the tree is correct; it is possible that nature used a less efficient sequence of steps or operations besides those considered.6 These numbers, normalized using the MA-human edge, are displayed in Table 2.
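The edge lengths and pairwise distances quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes the tree topology implied by the text (human and chicken attached to MA, mouse and rat attached to RA, and one MA-RA edge), and it assumes that "normalized using the MA-human edge" means dividing each edge length by 73. The helper names `build_graph` and `path_length` are hypothetical and are not part of GRIMM or MGR.

```python
# Edge lengths (numbers of rearrangements) reported for the recovered tree.
EDGES = {
    ("human", "MA"): 73,
    ("chicken", "MA"): 389,
    ("MA", "RA"): 122,
    ("mouse", "RA"): 38,
    ("rat", "RA"): 41,
}

# Pairwise GRIMM distances reported for the same 586 four-way blocks.
PAIRWISE = {
    ("chicken", "human"): 441,
    ("chicken", "mouse"): 511,
    ("chicken", "rat"): 506,
    ("human", "mouse"): 219,
    ("human", "rat"): 220,
    ("mouse", "rat"): 75,
}


def build_graph(edges):
    """Turn the edge list into an undirected weighted adjacency map."""
    graph = {}
    for (u, v), weight in edges.items():
        graph.setdefault(u, {})[v] = weight
        graph.setdefault(v, {})[u] = weight
    return graph


def path_length(graph, source, target):
    """Sum the edge lengths on the unique path between two nodes of a tree."""
    stack = [(source, None, 0)]
    while stack:
        node, parent, dist = stack.pop()
        if node == target:
            return dist
        for neighbor, weight in graph[node].items():
            if neighbor != parent:
                stack.append((neighbor, node, dist + weight))
    raise ValueError(f"no path from {source} to {target}")


graph = build_graph(EDGES)

# A scenario along the tree path is one valid scenario between two leaves,
# so the path length can never be smaller than the pairwise distance.
for (a, b), distance in PAIRWISE.items():
    on_tree = path_length(graph, a, b)
    print(f"{a}-{b}: pairwise >= {distance}, tree path = {on_tree}")

# Assumed normalization: divide every edge by the MA-human edge length.
reference = EDGES[("human", "MA")]
normalized = {edge: length / reference for edge, length in EDGES.items()}
print("total tree score:", sum(EDGES.values()))
print({edge: round(value, 2) for edge, value in normalized.items()})
```

Every tree path is at least as long as the corresponding pairwise distance, which is what one would expect if the pairwise values are lower bounds and the tree scenario is a valid, but not necessarily shortest, scenario between each pair of genomes.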
The rearrangement scenario corresponding to Figure 2 confirms a high ratio of intrachromosomal to interchromosomal rearrangements on the chicken edge. This was computed by attempting to maximize the number of inversions on the MA-chicken edge while staying within the constraint of 389 steps; however, it is only an approximation, because (1) there could be an alternative sequence of 389 steps with a higher ratio, and (2) intrachromosomal rearrangements could have been mimicked by interchromosomal rearrangements. This ratio varies across edges: 2.8 on the MA-chicken edge, 1.7 on the RA-rat edge, 1.4 on the MA-human edge, 0.7 on the RA-mouse edge, and 0.7 on the MA-RA edge.7 The number of interchromosomal rearrangements on the path from human to chicken (132) is only slightly higher than the number of interchromosomal rearrangements on the paths from human to mouse and from human to rat (124 and 116, respectively). This implies an extremely slow rate of interchromosomal rearrangements along the chicken edge in the evolutionary tree: 0.19 rearrangements per million years on the MA-chicken edge as compared to 0.34 on the MA-human edge and 1.1 on the MA-mouse edge.8
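As a rough consistency check, the interchromosomal counts and rates quoted above can be approximately recovered from the edge lengths and the reported ratios. The sketch below is a back-calculation, not the authors' procedure: it assumes that each ratio is (intrachromosomal / interchromosomal) and that the two types sum to the edge length, so the interchromosomal count on an edge is length / (1 + ratio). Because the published ratios are rounded to one decimal, the results are only expected to agree to within a rearrangement or two. The 87 million year figure is the divergence time for MA given in footnote 8; no time is assumed here for the chicken edge.

```python
# (edge length, intra/inter ratio) as reported in the text for each edge.
EDGE_DATA = {
    "MA-chicken": (389, 2.8),
    "MA-human": (73, 1.4),
    "MA-RA": (122, 0.7),
    "RA-mouse": (38, 0.7),
    "RA-rat": (41, 1.7),
}


def interchromosomal(length, ratio):
    """Estimate interchromosomal events, assuming intra + inter == length."""
    return length / (1.0 + ratio)


inter = {edge: interchromosomal(*data) for edge, data in EDGE_DATA.items()}

# Estimated interchromosomal events along leaf-to-leaf paths.
human_chicken = inter["MA-human"] + inter["MA-chicken"]
human_mouse = inter["MA-human"] + inter["MA-RA"] + inter["RA-mouse"]
human_rat = inter["MA-human"] + inter["MA-RA"] + inter["RA-rat"]

# Rates per million years, using 87 Mya for MA (footnote 8).
MA_AGE_MY = 87
rate_human = inter["MA-human"] / MA_AGE_MY
rate_mouse = (inter["MA-RA"] + inter["RA-mouse"]) / MA_AGE_MY

print(f"human-chicken: {human_chicken:.1f} (reported 132)")
print(f"human-mouse:   {human_mouse:.1f} (reported 124)")
print(f"human-rat:     {human_rat:.1f} (reported 116)")
print(f"MA-human rate: {rate_human:.2f} per My (reported 0.34)")
print(f"MA-mouse rate: {rate_mouse:.2f} per My (reported 1.1)")
```

The estimates land close to the reported values, which suggests the quoted ratios, edge lengths, and rates are mutually consistent under these assumptions.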
Figure 2 also reveals a large number of inversions that scramble the genomic make-up of individual chromosomes. For example, chicken Chromosome 19 (GGA19) is "built" from synteny blocks residing on only two human chromosomes (HSA7 and HSA17) that form a complex shuffle represented by eight different unicolored segments (four from HSA7, alternating with four from HSA17, and those in turn contain intrachromosomal rearrangements). These segments likely arose from a translocation of HSA7 and HSA17, creating a chromosome that was further shuffled by inversions on the evolutionary path between human and chicken. To sort out some of the intrachromosomal (inversions) from the interchromosomal (translocations/fusions/fissions) rearrangements and in an attempt to "reverse" history, we perform a maximal number of initial inversions in the four starting genomes,9 thus making every chromosome less shuffled than it appears in Figure 2. By performing all these initial inversions, we reduce the number of four-way blocks from 586 down to 311 new "pre-ancestral" synteny blocks. We call these four modified genomes "pre-ancestors."10 Figure 3 illustrates the proposed organization of these pre-ancestors and of the two ancestors (RA and MA) using this smaller number of segments.
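To make the effect of "undoing" an inversion concrete, the sketch below models a chromosome as a list of signed synteny blocks, where the sign encodes orientation. An inversion reverses the order of a segment and flips the sign of every block in it. The block orders are toy data invented for illustration, and the function names are hypothetical; this is not the procedure used to derive the 311 pre-ancestral blocks.

```python
def invert(chromosome, i, j):
    """Return a copy with blocks i..j (0-based, inclusive) reversed and sign-flipped."""
    segment = [-block for block in reversed(chromosome[i:j + 1])]
    return chromosome[:i] + segment + chromosome[j + 1:]


def is_collinear(a, b):
    """True if two block orders are identical up to reading the chromosome backwards."""
    return a == b or a == invert(b, 0, len(b) - 1)


# Toy example: genome B differs from genome A by one inversion of blocks 2..3.
genome_a = [1, 2, 3, 4]
genome_b = [1, -3, -2, 4]

print(is_collinear(genome_a, genome_b))      # the orders differ as given
undone = invert(genome_b, 1, 2)              # undo the inversion in genome B
print(undone, is_collinear(genome_a, undone))
```

Once the inversion is undone, the four blocks appear in the same order and orientation in both genomes, so they could be treated as a single larger block. This is the sense in which initial inversions reduce the number of blocks.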
Because the pairwise distances between the initial genomes are substantial, it is possible to find alternative ancestors also minimizing the total number of rearrangement events on the evolutionary tree. By exploring some of these alternative ancestors (see Methods), we can partition all the adjacencies of the recovered ancestor into "strong" and "weak" adjacencies depending on whether they are present or not present in all of the observed alternative ancestors.11 In run gene7, we find 524 strong adjacencies and 83 weak adjacencies12 (see Fig. 2). Many of the previously postulated chromosome associations of the placental ancestor correspond to strong adjacencies in MA. These associations are 3/21, 4/8, 12/22a, and 12/22b (Murphy et al. 2003
The reconstruction of the murid rodent ancestor, RA, is also coherent with the reconstruction of the same ancestor in Bourque et al. (2004
Some large regions of mammalian genomes are extremely well conserved across many species. The X-chromosome is one such example where the limited amount of exchange of genetic material with the other chromosomes (see Fig. 1) allows a detailed analysis of its rearrangement history. In the most parsimonious rearrangement scenario of the 17 synteny blocks on the X-chromosome of human-mouse-rat and of the homologous blocks on chicken Chromosomes 1 and 4, there are 20 rearrangements in total. There are no rearrangements between human and MA (i.e., human order is ancestral), 14 rearrangements (13 inversions and one fusion) between chicken and MA, two inversions between MA and RA, one inversion between mouse and RA, and three inversions between rat and RA. Moreover, the scenario recovered is optimal and MA and RA are unique. For that optimal score, the number of steps on each edge is unique, but the specific order is not. The set of blocks on the human X-chromosome is an example of a set of blocks that is not interrupted by any foreign block (a block outside that set) in any of the four genomes (although blocks can reside on multiple chromosomes in each genome). We call this a set of contiguous blocks. Large sets of contiguous blocks are interesting because they can be analyzed for rearrangements independently from the rest of the genome. The region formed by HSA8p and HSA13 is another well-conserved example, but, unfortunately, it does not form a set of contiguous blocks in run gene7 because the blocks from HSA8p are interrupted on GGA3 by blocks from HSA2 and on GGA4 by blocks from HSA4 (see Fig. 2). Blocks from HSA8p on GGA3-4 were probably interspersed with other blocks by a series of more recent inversions. Fortunately, we can undo these inversions by using the pre-ancestors (see Fig. 3A) where HSA8p and HSA13 do form a set of contiguous blocks.
This specific region is shown in Figure 3B. The rearrangement scenario highlights how some chromosomes are well preserved in chicken but shuffled in rodent, whereas others are well conserved in rodent but more shuffled in chicken.
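The notion of a set of contiguous blocks introduced above can be expressed as a simple check. The sketch below is one possible reading of the definition: on every chromosome of every genome, the members of the set that lie on that chromosome must form a single uninterrupted run, with no foreign block between the first and last member. Genomes are modeled as lists of chromosomes, each a list of signed block identifiers; the toy genomes and the function name `is_contiguous_set` are assumptions made for illustration.

```python
def is_contiguous_set(genomes, block_set):
    """Check that no foreign block interrupts block_set on any chromosome.

    genomes: dict mapping a genome name to a list of chromosomes,
             each chromosome being a list of signed block ids.
    block_set: set of (unsigned) block ids.
    """
    for chromosomes in genomes.values():
        for chromosome in chromosomes:
            positions = [k for k, block in enumerate(chromosome)
                         if abs(block) in block_set]
            # Members must occupy consecutive positions on this chromosome.
            if positions and positions[-1] - positions[0] + 1 != len(positions):
                return False
    return True


# Toy data: in "chicken", block 3 sits between blocks 4 and 5.
toy_genomes = {
    "human": [[1, 2, 3], [4, 5]],
    "chicken": [[1, -2], [4, 3, 5]],
}

print(is_contiguous_set(toy_genomes, {1, 2, 3}))  # members spread over two chromosomes
print(is_contiguous_set(toy_genomes, {4, 5}))     # interrupted by foreign block 3
```

Note that a set may be split across several chromosomes and still qualify, in line with the remark that blocks can reside on multiple chromosomes in each genome.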
Sequence-based data
We find that there are about four to five times as many total rearrangements between chicken and MA as between human and MA. Between the rodents and MA, there are about twice as many. For interchromosomal rearrangements, there are about three times as many between chicken and MA as there are between human and MA, and there are also three times as many between each rodent and MA as between human and MA. Some of the putative ancestral human chromosome associations were observed in all runs (3/21, 4/8, 12/22a, and 12/22b), while others were only recovered in some of the runs (14/15, 16/19) or not recovered at all (7/16). Some human chromosome syntenies were systematically preserved in MA (13, 14, 20, 21, and X) and similarly, some chicken chromosome syntenies were found in MA for all runs (20, 21, 23, 24, and 27). Many of the discrepancies observed in Table 3 can be explained by differences in the coverage of gene-based data versus sequence-based data. For instance, two chicken chromosome syntenies (22, 32) are only observed in sequence-based data mainly because they are covered by fewer blocks (1 block each in run 300K) (see Fig. 4) in those data sets. Chicken Chromosome 16 is an even more drastic example: in gene-based data a single short block represents it, whereas in sequence-based data, it is not represented at all.
Microrearrangement scenarios
We separately ran MGR within the same synteny blocks while imposing the known topology for human-mouse-rat-chicken; this only leads to a slight increase in the total number of microrearrangements (see Table 5).
In Table 6, we compute the proportion of the microrearrangements over the different edges of the tree. The results show that most of the microrearrangements (
Choosing a set of orthologous genes rather than the "similarity anchors" as in Gibbs et al. (2004
Using GRIMM-Synteny on "similarity anchors" also has advantages. For instance, it avoids the need to distinguish orthologous from paralogous copies of genes. It is also less affected by high-copy-number gene families (such as kinases or GPCRs). Moreover, it retains information in regions outside of exons. This extra information can be useful for the analysis of microrearrangements, but it can also be an asset in the reconstruction of the ancestral genomes by keeping stronger footprints of past events. Although the two initial data sets are different, the properties of the evolutionary tree and of the reconstructed ancestors obtained at various thresholds are largely consistent. We observed a high ratio of inversions over all types of rearrangements in chicken but, overall, a relatively slow rate of rearrangements in this lineage. The large number of inversions in chicken can also be confirmed by the analysis of microrearrangements within the synteny blocks. Finally, we also observed an accelerated rate of interchromosomal rearrangements in rodents. Future developments could include more systematic ways of comparing synteny blocks generated using different programs, different sets of parameters, different types of input (e.g., gene vs. sequence data) but also different sets of initial genomes. Similarly, new metrics could also be developed to compare the ancestral reconstructions not only at the level of chromosome associations, chromosome syntenies, and rates of rearrangements but also at the level of actual synteny blocks and suggested adjacencies. Such metrics would need to account for the multiplicity of alternative solutions.
We compared these assemblies: Human (NCBI build 34, July 2003; UCSC hg16); Mouse (NCBI build 30, Feb. 2003; UCSC mm3); Rat (Baylor HGSC v. 3.1, June 2003; UCSC rn3); and Chicken (WUSTL, Feb. 2004; UCSC galGal2).
Gene-based comparisons used a set of 6447 four-way orthologous genes obtained by intersection of less strictly defined pairwise synteny maps, requiring at least two neighboring orthologous genes but allowing for up to four intervening genes, computed with SyntQL (Zdobnov et al. 2002
Sequence-based comparisons used alignments computed by Angie Hinrichs in the UCSC consortium using BLASTZ, MULTIZ, and other tools (Kent et al. 2003
GRIMM-Synteny parameters
In the original GRIMM-Synteny, we then discarded blocks whose span was below a minimum size in human. In the present study, we set minimums in all species: we discarded blocks whose span was (strictly) below a minimum size Ci in any species; in the sequence-based runs 100K, 200K, and 300K, we chose to set Ci = Gi. Occasionally, the blocks that make it past this filter will have conflicting coordinates in one species (the coordinate interval of block A is a subinterval of block B in one species), in which case we split the blocks up to resolve this. We then merged together blocks that form a strip of consecutive blocks in the exact same order (allowing an overall flip) and chromosome window in all genomes, without interruption by other blocks. This step is appropriate for analyzing rearrangements, but may not be appropriate for other purposes. Our "signed" gene-based data was analyzed with the same procedure but with a "gene" metric instead of a "nucleotide" metric; all genes in the data set are assigned an identical size. Specifically, the j-th consecutive gene in a chromosome is assigned span 2j through 2j + 1, and the orientation of the gene determines which coordinate is the "start" and which is the "end." Our "gene7" data set used throughout this paper uses the procedure described above with Gi = 7 for all species. Two genes A and B are joined together if there are up to two intervening genes between them in every species, with certain constraints on flipping A and B: at two intervening genes, the relative orientations of A and B must be the same in all species or one of them can be flipped, and with fewer than two intervening genes, either or both can be flipped. Next, we discarded blocks supported by fewer than three genes, and finally, we merged together strips of blocks as before. The gene4, gene10, and gene20 data sets similarly used 4, 10, or 20 in place of 7.
Gi = 4 (respectively, 10 or 20) has the effect of allowing 1 (respectively, 4 or 9) intervening gene, but A and B must remain in the same relative orientation in all species, or up to 0 (respectively, 3 or 8) intervening genes, but A and B may be flipped arbitrarily.
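The "gene" metric described above can be sketched in a few lines. The code below assigns the j-th consecutive gene on a chromosome the span 2j through 2j + 1 and uses the strand to decide which coordinate is the start. Two details are assumptions, since the text does not state them: that j is counted from 1, and that a forward-strand gene starts at the lower coordinate. The function name `gene_spans` is hypothetical.

```python
def gene_spans(strands):
    """Assign unit-size coordinates to consecutive genes on one chromosome.

    strands: list of '+' or '-' giving the orientation of each gene in order.
    Returns a list of (start, end) pairs; the j-th gene (assumed 1-based)
    occupies coordinates 2j and 2j + 1.
    """
    spans = []
    for j, strand in enumerate(strands, start=1):
        low, high = 2 * j, 2 * j + 1
        if strand == "+":
            spans.append((low, high))   # assumed: forward genes start at 2j
        elif strand == "-":
            spans.append((high, low))   # reverse genes start at 2j + 1
        else:
            raise ValueError(f"unknown strand: {strand!r}")
    return spans


print(gene_spans(["+", "-", "+"]))
```

Because every gene receives the same size, thresholds such as Gi are effectively expressed in units of genes rather than nucleotides, which is what allows the same block-building procedure to be reused for the gene-based data.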
In contrast, the data set in Murphy et al. (2003
Search for alternative ancestors
Specifically, in run gene7, we found a list of 83 rearrangements in MA that did not increase the overall tree score. Using a breadth-first-search approach, we looked at all 83 corresponding alternative ancestors (at distance 1 from MA). We generated a new list of rearrangements that did not increase the overall tree score for each of these alternative ancestors. Since most of the rearrangements from the initial list of 83 are commutative, the number of alternative ancestors at distance 2 from MA is We also used a depth-first-search approach to look for different alternative ancestors of MA. Starting only from the first alternative ancestor identified at distance 1 from MA, we found 80 alternative ancestors at distance 2. Starting from the first of these alternative ancestors, we found 80 new alternative ancestors but at distance 3 from MA. We repeated the process iteratively. When the first ancestor at distance x did not suggest any ancestor at distance x + 1, we moved down to another ancestor at distance x. When we ran out of ancestors at distance x, we stepped back to the list of ancestors at distance x - 1. In practice, we stopped once we reached a total of 3000 distinct ancestors found at various distances from MA. In run gene7, the distance between MA and the alternative ancestors found with the depth-first-search approach ranged from 1 to 38 rearrangements.
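Given a collection of alternative ancestors such as those enumerated by the searches above, the partition into "strong" and "weak" adjacencies can be sketched as a set intersection. The code below is an illustration only. It assumes genomes are lists of chromosomes of signed blocks, represents each adjacency by the pair of block extremities that touch (so that reading a chromosome backwards gives the same adjacencies), and counts chromosome ends as external adjacencies, following footnote 12. The toy genomes and all function names are hypothetical.

```python
def extremities(block):
    """Return the (left, right) extremities of a signed block as read left to right."""
    tail, head = (abs(block), "t"), (abs(block), "h")
    return (tail, head) if block > 0 else (head, tail)


def adjacencies(genome):
    """Collect internal and external adjacencies of a genome.

    Assumes every chromosome contains at least one block.
    """
    result = set()
    for chromosome in genome:
        # External adjacency at the left chromosome end.
        result.add(frozenset([extremities(chromosome[0])[0]]))
        # Internal adjacencies between consecutive blocks.
        for left, right in zip(chromosome, chromosome[1:]):
            result.add(frozenset([extremities(left)[1], extremities(right)[0]]))
        # External adjacency at the right chromosome end.
        result.add(frozenset([extremities(chromosome[-1])[1]]))
    return result


def partition_adjacencies(ancestor, alternatives):
    """Split the ancestor's adjacencies into strong (present in all
    alternatives) and weak (missing from at least one alternative)."""
    recovered = adjacencies(ancestor)
    strong = set(recovered)
    for alternative in alternatives:
        strong &= adjacencies(alternative)
    return strong, recovered - strong


# Toy data: the second alternative is the ancestor read backwards.
ancestor = [[1, 2, 3], [4, 5]]
alternatives = [
    [[1, 2, -3], [4, 5]],
    [[-3, -2, -1], [4, 5]],
]

strong, weak = partition_adjacencies(ancestor, alternatives)
print(len(adjacencies(ancestor)), len(strong), len(weak))
```

In the toy ancestor, the adjacency count equals the number of blocks plus the number of chromosomes (5 + 2), matching the formula in footnote 12. Applied to the counts quoted for run gene7, the same formula gives 524 + 83 = 607 adjacencies for 586 blocks, which would correspond to 21 chromosomes in MA if the formula holds there.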
We are grateful to LaDeanna Hillier, Ross Hardison, Bill Murphy, Lior Pachter, and Angie Hinrichs for many helpful discussions and suggestions. We also thank Angie Hinrichs for providing the sequence-based alignments and the anonymous reviewers for valuable recommendations. G.B. is supported by a fellowship of the Fonds Québecois de la Recherche sur la Nature et les Technologies.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.3002305. Article published online before print in December 2004.
5 Corresponding author. [Supplemental material is available online at www.genome.org.]
6 This is especially true for long edges. An alternative approach would be to use a statistical model (Larget et al. 2002
7 Although these ratios depend on the level of microrearrangements tolerated, we found the ratio on the MA-chicken edge to be consistently higher in all runs as compared to the one on other edges.
8 We are using estimated divergence times of 16 million years ago (Mya) for RA and 87 Mya for MA (Springer et al. 2003
9 The "maximal number of initial inversions" is the maximum we found, but there could be an alternative sequence with the same edge length and even more initial inversions.
10 This definition of pre-ancestor differs from the one in Murphy et al. (2003
11 The number of weak adjacencies that we determine is actually a lower bound because we only explore a subset of all the alternative solutions.
12 The total number of adjacencies is the total number of blocks plus the total number of chromosomes. Adjacencies include both "internal adjacencies" (blocks adjacent within a chromosome) and "external adjacencies" (blocks adjacent to a chromosome end).
13 A topology is said to agree with another if the sets of partitions of leaves defined by the internal edges with at least one rearrangement are the same.
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389-3402.
Bafna, V. and Pevzner, P. 1995. Sorting by reversals: Genome rearrangements in plant organelles and evolutionary history of X chromosome. Mol. Biol. Evol. 12: 239-246.
Blanchette, M., Kunisawa, T., and Sankoff, D. 1999. Gene order breakpoint evidence in animal mitochondrial phylogeny. J. Mol. Evol. 49: 193-203.
Blanchette, M., Kent, W.J., Riemer, C., Elnitski, L., Smit, A.F., Roskin, K.M., Baertsch, R., Rosenbloom, K., Clawson, H., Green, E.D., et al. 2004. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 14: 708-715.
Bourque, G. and Pevzner, P.A. 2002. Genome-scale evolution: Reconstructing gene orders in the ancestral species. Genome Res. 12: 26-36.
Bourque, G., Pevzner, P.A., and Tesler, G. 2004. Reconstructing the genomic architecture of ancestral mammals: Lessons from human, mouse, and rat genomes. Genome Res. 14: 507-516.
Cosner, M.E., Jansen, R.K., Moret, B.M., Raubeson, L.A., Wang, L.S., Warnow, T., and Wyman, S. 2000. A new fast heuristic for computing the breakpoint phylogeny and experimental phylogenetic analyses of real and synthetic data. Proc. Int. Conf. Intell. Syst. Mol. Biol. 8: 104-115.
Gibbs, R.A., Weinstock, G.M., Metzker, M.L., Muzny, D.M., Sodergren, E.J., Scherer, S., Scott, G., Steffen, D., Worley, K.C., Burch, P.E., et al. 2004. Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature 428: 493-521.
Hannenhalli, S. and Pevzner, P. 1995. Transforming men into mice: Polynomial algorithm for genomic distance problem. Thirty-Sixth IEEE Symposium on Foundations of Computer Science, pp. 581-592. IEEE Press, Los Alamos, CA.
Hedges, S.B. and Kumar, S. 2004. Precision of molecular time estimates. Trends Genet. 20: 242-247.
Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D. 2003. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc. Natl. Acad. Sci. 100: 11484-11489.
Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., et al. 2001. Initial sequencing and analysis of the human genome. Nature 409: 860-921.
Larget, B., Simon, D.L., and Kadane, B.J. 2002. Bayesian phylogenetic inference from animal mitochondrial genome arrangements. J. Roy. Stat. Soc. B 64: 681-695.
Murphy, W.J., Sun, S., Chen, Z., Yuhki, N., Hirschmann, D., Menotti-Raymond, M., and O'Brien, S.J. 2000. A radiation hybrid map of the cat genome: Implications for comparative mapping. Genome Res. 10: 691-702.
Murphy, W.J., Eizirik, E., O'Brien, S.J., Madsen, O., Scally, M., Douady, C.J., Teeling, E., Ryder, O.A., Stanhope, M.J., de Jong, W.W., et al. 2001. Resolution of the early placental mammal radiation using Bayesian phylogenetics. Science 294: 2348-2351.
Murphy, W.J., Bourque, G., Tesler, G., Pevzner, P., and O'Brien, S.J. 2003. Reconstructing the genomic architecture of mammalian ancestors using multispecies comparative maps. Hum. Genom. 1: 30-40.
O'Brien, S.J., Menotti-Raymond, M., Murphy, W.J., Nash, W.G., Wienberg, J., Stanyon, R., Copeland, N.G., Jenkins, N.A., Womack, J.E., and Marshall Graves, J.A. 1999. The promise of comparative genomics in mammals. Science 286: 458-462, 479-481.
Palmer, J.D. and Herbon, L.A. 1988. Plant mitochondrial DNA evolves rapidly in structure, but slowly in sequence. J. Mol. Evol. 28: 87-97.
Pevzner, P. and Tesler, G. 2003. Genome rearrangements in mammalian evolution: Lessons from human and mouse genomes. Genome Res. 13: 37-45.
Reisz, R.R. and Muller, J. 2004. Molecular timescales and the fossil record: A paleontological perspective. Trends Genet. 20: 237-241.
Sankoff, D., Leduc, G., Antoine, N., Paquin, B., Lang, B.F., and Cedergren, R. 1992. Gene order comparisons for phylogenetic inference: Evolution of the mitochondrial genome. Proc. Natl. Acad. Sci. 89: 6575-6579.
Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.C., Haussler, D., and Miller, W. 2003. Human-mouse alignments with BLASTZ. Genome Res. 13: 103-107.
Springer, M.S., Murphy, W.J., Eizirik, E., and O'Brien, S.J. 2003. Placental mammal diversification and the Cretaceous-Tertiary boundary. Proc. Natl. Acad. Sci. 100: 1056-1061.
Stanyon, R., Stone, G., Garcia, M., and Froenicke, L. 2003. Reciprocal chromosome painting shows that squirrels, unlike murid rodents, have a highly conserved genome organization. Genomics 82: 245-249.
Tesler, G. 2002a. Efficient algorithms for multichromosomal genome rearrangements. J. Comp. Sys. Sci. 65: 587-609.
Tesler, G. 2002b. GRIMM: Genome rearrangements web server. Bioinformatics 18: 492-493.
Waterston, R.H., Lindblad-Toh, K., Birney, E., Rogers, J., Abril, J.F., Agarwal, P., Agarwala, R., Ainscough, R., Alexandersson, M., An, P., et al. 2002. Initial sequencing and comparative analysis of the mouse genome. Nature 420: 520-562.
Zdobnov, E.M., von Mering, C., Letunic, I., Torrents, D., Suyama, M., Copley, R.R., Christophides, G.K., Thomasova, D., Holt, R.A., Subramanian, G.M., et al. 2002. Comparative genome and proteome analysis of Anopheles gambiae and Drosophila melanogaster. Science 298: 149-159.
Received July 14, 2004; accepted in revised format October 4, 2004.