Published online before print
December 30, 2002, 10.1101/gr.757503
Vol 13, Issue 1, 37-45, January 2003
Genome Rearrangements in Mammalian Evolution: Lessons From Human and Mouse Genomes
Pavel Pevzner¹ and
Glenn Tesler
Department of Computer Science and Engineering, University of
California, San Diego, La Jolla, CA 92093-0114, USA
ABSTRACT
Although analysis of genome rearrangements was pioneered by
Dobzhansky and Sturtevant 65 years ago, we still know very little about
the rearrangement events that produced the existing varieties of
genomic architectures. The genomic sequences of human and mouse provide
evidence for a larger number of rearrangements than previously thought
and shed some light on previously unknown features of mammalian
evolution. In particular, they reveal that a large number of
microrearrangements is required to explain the differences in draft
human and mouse sequences. Here we describe a new algorithm for
constructing synteny blocks, study arrangements of synteny blocks in
human and mouse, derive a most parsimonious human–mouse rearrangement
scenario, and provide evidence that intrachromosomal rearrangements are
more frequent than interchromosomal rearrangements. Our analysis is
based on the human–mouse breakpoint graph, which reveals related
breakpoints and allows one to find a most parsimonious scenario.
Because these graphs provide important insights into rearrangement
scenarios, we introduce a new visualization tool that allows one to
view breakpoint graphs superimposed with genomic
dot-plots.
[Supplemental material is available online at
www.genome.org.]
Analysis of genome rearrangements in molecular
evolution was pioneered by Dobzhansky and Sturtevant (1938), who
published a milestone paper with an evolutionary tree presenting a
rearrangement scenario with 17 inversions for the species
Drosophila pseudoobscura and Drosophila miranda.
Every genome rearrangement study involves solving a combinatorial
puzzle to find a series of genome rearrangements to transform one
genome into another. Palmer and co-authors (Palmer and Herbon 1988)
pioneered studies of the shortest (most parsimonious) rearrangement
scenarios and applied this approach to plant mtDNA and cpDNA. Since
then, the analysis of the most parsimonious scenarios has become the
dominant approach in genome rearrangement studies. For unichromosomal
genomes, it usually amounts to analysis of inversions (also known as
reversals), which are the most common rearrangement events. The problem
of finding the minimum number of reversals to transform one
unichromosomal genome into another is known as the "reversal distance
problem." For multichromosomal genomes, the most common
rearrangements are reversals, translocations, fusions, and fissions,
and the number of such rearrangements in a most parsimonious scenario
is known as the "genomic distance" between multichromosomal
genomes.
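As an illustration of the reversal operation (a minimal sketch, not code from this study), a unichromosomal genome can be modeled as a signed permutation, and a reversal flips both the order and the signs of a contiguous segment. The function name `reversal` and the 0-based inclusive indices are assumptions made for this example.

```python
def reversal(perm, i, j):
    """Return a copy of perm in which the segment perm[i..j]
    (0-based, inclusive) is reversed and every sign in it is flipped."""
    return perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]


genome = [1, 2, 3, 4, 5]
rearranged = reversal(genome, 1, 3)
print(rearranged)  # [1, -4, -3, -2, 5]
```

Applying the same reversal a second time restores the original order, which is why a scenario can be read in either direction.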
Finding the reversal distance is a difficult combinatorial problem. In
the very first computational studies of genome rearrangements,
Watterson et al. (1982) and Nadeau and Taylor (1984) introduced the
notion of a breakpoint (disruption of gene order) and noticed some
correlations between the reversal distance and the number of
breakpoints (in fact, Sturtevant and Dobzhansky [1936] implicitly
discussed these correlations 65 years ago!). The shortcoming of early
genome rearrangement studies is that they considered breakpoints
independently without revealing combinatorial dependencies between
related breakpoints. The simplest example of related breakpoints is the
pair of breakpoints formed by a single reversal. Kececioglu and Sankoff
(1995) were the first to recognize the importance of dependencies
between breakpoints and to come up with an approximation algorithm for
the reversal distance problem. The important result of Bafna and
Pevzner (1993, 1996) is the construction of the "breakpoint graph,"
which reveals related breakpoints and allows one to find the most
parsimonious scenarios.
Based on the concept of the breakpoint graph, Hannenhalli and Pevzner
(1995a) developed a polynomial algorithm for the reversal distance
problem, that is, for computing a most parsimonious scenario to
transform one unichromosomal genome into another. This approach was
further extended to the genomic distance problem, that is, finding a
most parsimonious scenario for multichromosomal genomes under
inversions, translocations, fusions, and fissions of chromosomes
(Hannenhalli and Pevzner 1995a; Tesler 2002b). However, these results,
although useful, do not yet yield a meaningful estimate of the number
of the rearrangement events on the evolutionary path from mouse to
human. The problem is that the genomic sequences provide evidence for
both microrearrangements (e.g., intrachromosomal rearrangements with a
span <1 Mb) and macrorearrangements (e.g., intrachromosomal
rearrangements of larger span as well as interchromosomal
rearrangements). The existing rearrangement algorithms do not
distinguish between these two types of rearrangements. Because some
microrearrangements may be caused by fragment assembly errors, mixing
microrearrangements and macrorearrangements within one rearrangement
scenario may produce a distorted picture greatly influenced by the
sequencing errors in draft genomic sequences. Another difficulty is an
unreliable assignment of orthologs (false orthologs), which may create
an impression of a rearrangement that never happened (Tatusov et al.
1997). The conserved gene order can also be disrupted by recent
duplications and insertions (Hardison et al. 1997).
To address these complications, we first describe a new approach to
synteny block generation that separates microrearrangements from
macrorearrangements. It allows one to study microrearrangements and
macrorearrangements separately and to arrive at a new estimate of the
number of macrorearrangements that cover 170 Myr (million years) of
evolutionary distance between human and mouse. We also estimate the
number of microrearrangements (but it remains to be seen to what extent
this estimate is influenced by the fragment assembly errors) on the
evolutionary path between mouse and human.
RESULTS
Synteny Blocks
In a pioneering paper, Nadeau and Taylor (1984) introduced the
notion of "conserved segments" (i.e., segments with preserved gene
orders without disruption by rearrangements) and estimated that there
are roughly 180 conserved segments in human and mouse. Later, Copeland
et al. (1993), DeBry and Seldin (1996), Waterston et al. (2002), and
Gregory et al. (2002) confirmed these estimates. In the past decade,
the progress in understanding the evolutionary history of entire
genomes was mainly based on comparative genetic maps (O'Brien et al.
1999). However, these estimates suffer from low resolution of
comparative maps in certain genomic areas. Present genomic sequences
provide evidence that the human and mouse genomes are significantly
more rearranged than previously thought. Moreover, they indicate that a
large proportion of previously identified conserved segments are not
really conserved because there is evidence of multiple
microrearrangements in many of them (Mural et al. 2002). These
microrearrangements were not visible in the comparative genetic maps
that were used for defining 180 conserved segments in the past. We
study "synteny blocks" instead of conserved segments. Intuitively,
the synteny blocks are segments that can be converted into conserved
segments by microrearrangements (see the GRIMM-Synteny algorithm below
for a formal definition). The synteny blocks do not necessarily
represent areas of continuous similarity between two genomes. Instead,
they usually consist of short regions of similarity that may be
interrupted by dissimilar regions and gaps. Most synteny blocks are
subject to microrearrangements within these blocks.
We demonstrate that human and mouse genomes share 281 synteny blocks of
size at least 1 Mb (shown in Fig. 1a) and
that at least 245 rearrangements of these blocks occurred since the
divergence of human and mouse. The positions of these blocks in the
human and mouse genomes are given in Supplementary Materials (available
online at http://www.genome.org). The largest synteny block in the
human genome is 79.6 Mb, and the average block size is 9.6 Mb. The
largest synteny block in the mouse genome is 64.8 Mb, and the average
block size is 8.5 Mb.

Figure 1. (a) Human and mouse synteny blocks. Every block corresponds to
a rectangle, with a diagonal showing whether the arrangements of
anchors in human and mouse (within the synteny block) are the same or
reversed. (b) Combining anchors into clusters by the
GRIMM-Synteny algorithm at G = 100 kb. The edges in the
anchor graph connect the closest ends of the anchors. The anchors are
color-coded by the resulting clusters. At G = 1 Mb, this
forms a single cluster, which in turn forms a synteny block (the
lower right block in the human 18/mouse 17 rectangle in
a).
The overall size of syntenic blocks is 2707 Mb in human and 2397
Mb in mouse. The breakpoint regions (i.e., intervals between
consecutive syntenic blocks) vary and may be as large as 23.2 Mb in
human and 6.7 Mb in mouse. The average size of breakpoint regions is
668 kb in human and 458 kb in mouse. The overall size of the breakpoint
regions equals 172 Mb in human and 119 Mb in mouse (although some of
these breakpoint regions may host shorter synteny blocks). There is
evidence of at least 3170 microrearrangements (reversals) that happened
within the synteny blocks (although many of them may be artifacts of
incorrect assemblies). This very high estimate of the number of
microrearrangements further confirms the conjecture that
microrearrangements are more common than previously thought (Carver and
Stubbs 1997; Puttagunta et al. 2000; Thomas et al. 2000; Kumar et al.
2001). In fact, this number does not even include the
microrearrangements within synteny blocks shorter than 1 Mb.
From Local Alignments to Synteny Blocks
Given two genomic sequences, how can one construct synteny blocks?
False ortholog assignments and microrearrangements make it nontrivial
to find the analogs of synteny blocks (conserved gene clusters) even in
shorter bacterial genomes (Fujibuchi et al. 2000; Lathe et al. 2000;
Wolf et al. 2001; Rogozin et al. 2002). In addition, human–mouse
sequence similarities in noncoding regions (Koop and Hood 1994; Thomas
et al. 2000) may further complicate ortholog assignments and make it
difficult to apply the methods developed in bacterial genomics to
construction of human–mouse synteny blocks.
Sankoff and Blanchette (1997) were the first to come up with an
algorithm for synteny block generation. However, their approach was
mainly intended for comparative mapping data. Below we describe a
different approach that is geared toward genomic sequences. To
construct the human–mouse synteny blocks, we start with bidirectional
best local similarities (also called anchors) between human and mouse
genomic sequences (Tatusov et al. 1997; Mural et al. 2002). Several
software tools have recently become available to generate such anchors
for entire mammalian genomes (Mayor et al. 2000; Schwartz et al. 2000;
Kent 2002; Ma et al. 2002). We assume that a set of nonoverlapping
anchors (local alignments between two genomes) is given and the goal is
to construct the synteny blocks based on these anchors. We study the
same versions of draft human and mouse sequences and the same set of
anchors that were used in Waterston et al. (2002). This set of anchors
was provided by Michael Kamal at the Whitehead Institute and was
generated by PatternHunter (Ma et al. 2002). The set consists of
558,678 anchors with alignment lengths ranging from 40 to 9647 nt (the
mean is 340). We emphasize that these anchors do not necessarily
represent similarities within human and mouse genes but may also
represent similarities between noncoding regions. This is a departure
from the previous "gene order comparison" approach of genome
rearrangement studies. It allows us to bypass the difficult issues of
gene annotation and ortholog identification, which are not necessary
for genome rearrangement studies. This approach may miss similarities
between some genes at evolutionary distances where protein similarity
still exists but DNA similarity has faded away. However, this is not a
serious concern for the rather similar human and mouse genomes.
Moreover, our approach can be generalized to handle both DNA and
protein similarities in a unified framework.
We assume that human and mouse chromosomes are concatenated to form a
single coordinate system. An anchor that starts at position h
in the human genome and at position m in the mouse genome is
described by its starting point (h,m) in two
dimensions (2D). We remark that in reality the anchors are not points
(h,m) but diagonals in 2D described by the
coordinates (h,m) of an alignment start and the
length of the alignment. Such a coordinate system is shown in Figure
1a, with chromosomes dividing the plane into rectangles. We define the
distance between two points
$(h_1, m_1)$ and $(h_2, m_2)$ from the same chromosome
pair (the same rectangle) as the Manhattan distance
$|h_2 - h_1| + |m_2 - m_1|$. The distance
between points from different chromosome pairs is defined as infinity.
The distance between two anchors is defined as the distance between
their closest ends.
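The distance just defined can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the `Anchor` record, its field names, and the representation of each anchor by the two endpoints of its diagonal are hypothetical choices made for the example.

```python
import math
from typing import NamedTuple, Tuple


class Anchor(NamedTuple):
    human_chr: str            # chromosome of the anchor in human
    mouse_chr: str            # chromosome of the anchor in mouse
    start: Tuple[int, int]    # (h, m) at one end of the diagonal
    end: Tuple[int, int]      # (h, m) at the other end of the diagonal


def manhattan(p, q):
    """Manhattan distance |h2 - h1| + |m2 - m1| between two 2D points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def anchor_distance(a, b):
    """Distance between the closest ends of two anchors; infinite when
    the anchors lie in different chromosome-pair rectangles."""
    if (a.human_chr, a.mouse_chr) != (b.human_chr, b.mouse_chr):
        return math.inf
    return min(manhattan(p, q)
               for p in (a.start, a.end)
               for q in (b.start, b.end))
```

For example, two anchors in the same rectangle with ends (150, 250) and (180, 260) facing each other are at distance 30 + 10 = 40, whatever the positions of their far ends.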
Although the number of anchors may be very large (hundreds of
thousands), one still can apply fast genome rearrangement algorithms
(Tesler 2002b) to find a most parsimonious scenario to transform the
order of anchors in human into an order of anchors in mouse. However,
this scenario will likely be unrealistic because many anchors may
correspond to false orthologs. Therefore, a technique to filter out
false orthologs (even at the expense of filtering some real orthologs)
is needed. False orthologs will often look like isolated points (or
"small clusters") in a genomic dot-plot, whereas synteny blocks
will be formed from clusters consisting of a larger number of points.
Figure 2a presents the genomic dot-plots
for anchors from the X chromosomes (a blowup of the XX rectangle from
Fig. 1a). A brief look at Figure 2a reveals 16 clusters (Fig. 2b).
Figure 2c presents rectified clusters that ignore the details of the
internal anchor arrangements in the clusters and represent every
cluster as a diagonal. These rectified clusters are further combined
into diagonals that correspond to 11 synteny blocks (Fig. 2d). Although
the synteny blocks in Figure 2d differ in size, the sizes of synteny
blocks are irrelevant for genome rearrangement algorithms. Figure 2e is
a symbolic representation of synteny blocks as units of the same size,
used in the construction of the breakpoint graph.

Figure 2. X-chromosome: from local similarities, to synteny blocks, to breakpoint
graph, to rearrangement scenario. (a) Dot-plot of anchors.
Anchors are enlarged for visibility. (b) Clusters of anchors.
(c) Rectified clusters. (d) Synteny blocks.
(e) Synteny blocks (symbolic representation as genome
rearrangement units). (f) 2D breakpoint graph superimposed on
synteny blocks. The projections of the 2D graph onto the human and
mouse axes form the conventional breakpoint graphs. (g) 2D
breakpoint graph. The four cycles in the breakpoint graph are shown by
different colors. (h) A most parsimonious rearrangement
scenario for human and mouse X-chromosomes.
The above description hides many important details, and in many cases
the choice of synteny blocks is less obvious. Below we describe the
GRIMM-Synteny algorithm for synteny block generation from a collection
of anchors. The algorithm uses the gap threshold G and minimum
cluster size C as parameters and works as follows:
GRIMM-Synteny Algorithm
1. Form an anchor graph whose vertex set is the set of anchors.
2. Connect vertices in the anchor graph by an edge if the distance between them is smaller than the gap size G.
3. Determine the connected components of the anchor graph. Each connected component is called a cluster.
4. Delete "small" clusters (shorter than the minimum cluster size C in length).
5. Determine the cluster order and signs for each genome.
6. Output the strips in the resulting cluster order as synteny blocks.
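The first four steps of the algorithm can be sketched as follows. This is a minimal illustration, not the GRIMM-Synteny implementation: the anchor representation, the function names, the quadratic all-pairs comparison, and the choice to measure cluster length as the span in human coordinates are all assumptions made for the example.

```python
import math
from collections import defaultdict


def anchor_distance(a, b):
    """Manhattan distance between the closest ends of two anchors.
    An anchor is (chrom_pair, (h1, m1), (h2, m2))."""
    if a[0] != b[0]:
        return math.inf
    return min(abs(p[0] - q[0]) + abs(p[1] - q[1])
               for p in a[1:] for q in b[1:])


def cluster_anchors(anchors, G, C):
    """Group anchors into clusters and drop clusters shorter than C."""
    parent = list(range(len(anchors)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Anchor graph: join two anchors when they are closer than G.
    for i in range(len(anchors)):
        for j in range(i + 1, len(anchors)):
            if anchor_distance(anchors[i], anchors[j]) < G:
                parent[find(i)] = find(j)

    # Connected components of the anchor graph are the clusters.
    groups = defaultdict(list)
    for i, anchor in enumerate(anchors):
        groups[find(i)].append(anchor)

    def human_span(cluster):
        # Assumed length measure: extent of the cluster on the human axis.
        hs = [p[0] for anchor in cluster for p in anchor[1:]]
        return max(hs) - min(hs)

    # Delete "small" clusters.
    return [c for c in groups.values() if human_span(c) >= C]
```

A realistic input of several hundred thousand anchors would need a spatial index or a sort-and-sweep pass instead of the all-pairs loop used here for brevity.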
GRIMM-Synteny finds 319 clusters in the human genome that are longer
than 1 Mb. In addition to these clusters, we identified a number of
smaller clusters; for example, in the human genome there are 36
clusters whose length varies from 0.5 Mb to 1 Mb, 21 clusters whose
length varies from 250 kb to 500 kb, and 774 clusters with lengths from
50 kb to 250 kb. However, smaller syntenic block assignments are less
reliable because they may be caused by false orthologs and sequencing
errors.
Figure 1b presents examples of some highly rearranged clusters from
human Chromosome 18/mouse Chromosome 17 and the corresponding anchor
graph. After constructing the cluster graph and deleting small clusters
(steps 1–4), one has to determine the cluster order and signs (step
5). We define the span of a cluster in human (mouse) as the interval
between its minimum and maximum coordinates in human (mouse). Similarly
to Mural et al. (2002) and Gregory et al. (2002), we found that the
cluster spans in human often significantly differ from cluster spans in
mouse (the span may include gaps and unaligned regions that contribute
to these differences). Note that although different clusters are not
supposed to overlap in 2D, they often overlap in 1D (i.e., their span
intervals may overlap in human or mouse). Therefore, defining the
cluster order for intermingled clusters should be done with caution. We
compute the center of mass of all anchors forming the cluster and order
clusters in human by the coordinates of their centers of masses. We
assign the clusters numbers according to their order on the human
genome. This lets us read off a cluster order in the mouse genome in
terms of these labels.
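The ordering step can be sketched as follows. This is a simplified illustration rather than the procedure used in GRIMM-Synteny: each cluster is assumed to be given as a list of anchor midpoints (h, m) in the concatenated coordinate system, the center of mass is taken as an unweighted mean, and cluster signs are ignored.

```python
def center_of_mass(cluster):
    """Unweighted mean of the (h, m) anchor midpoints of one cluster."""
    n = len(cluster)
    return (sum(h for h, _ in cluster) / n,
            sum(m for _, m in cluster) / n)


def mouse_order_in_human_labels(clusters):
    """Label clusters 1..k by their order in human, then list those
    labels in the order in which the clusters appear in mouse."""
    centers = [center_of_mass(c) for c in clusters]
    by_human = sorted(range(len(clusters)), key=lambda i: centers[i][0])
    label = {idx: rank + 1 for rank, idx in enumerate(by_human)}
    by_mouse = sorted(range(len(clusters)), key=lambda i: centers[i][1])
    return [label[i] for i in by_mouse]
```

Sorting by centers of mass rather than by span endpoints gives a well-defined order even when the one-dimensional spans of neighboring clusters overlap.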
Signs (orientations) of the resulting clusters are usually well-defined
but in some cases are not obvious. The algorithm for sign assignments
in GRIMM-Synteny and the theorem justifying this algorithm will be
described elsewhere.
The number of clusters found depends on the value of the gap threshold
G. Figure 1b shows clusters in a region of the genome for the
gap threshold G = 100 kb. Increasing the gap threshold will
typically merge some clusters; in this case, this region forms a single
cluster at G = 1 Mb. The human and mouse genomes include
some gaps and regions without anchors that may be longer than
G. Such regions break a single synteny block into a few
clusters. To combine such clusters into a single synteny block, we
define the notion of a strip. A strip is a sequence of consecutive
signed clusters $i_1, \ldots, i_n$ in
the first genome that appears in the other genome either consecutively
in the same way or in the reverse order with opposite signs,
$-i_n, \ldots, -i_1$.
For example, for G = 1 Mb and C = 1 Mb,
the number of clusters in the human and mouse genomes is 319 whereas
the number of strips (synteny blocks) is 281. Most synteny blocks
correspond to a single cluster, but some synteny blocks contain as many
as 5 clusters.
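Merging clusters into strips can be sketched as below, assuming a single chromosome whose clusters are numbered 1, 2, ... in human order and listed as a signed permutation in mouse order. The function name and this input convention are assumptions for the example. Two neighbors belong to the same strip exactly when the second value exceeds the first by 1, which covers both the forward case (+i, +(i+1)) and the reversed case (-(i+1), -i).

```python
def strips(signed_order):
    """Split a signed cluster order into maximal strips."""
    if not signed_order:
        return []
    result = [[signed_order[0]]]
    for prev, cur in zip(signed_order, signed_order[1:]):
        if cur - prev == 1:
            result[-1].append(cur)   # extends the current strip
        else:
            result.append([cur])     # starts a new strip
    return result


print(strips([1, 2, -5, -4, -3, 6]))  # [[1, 2], [-5, -4, -3], [6]]
```

Each strip then becomes one synteny block, so the six clusters in this toy input yield three blocks.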
On the X-chromosome, comparing Figure 2, a and b, most discarded material
is very small, but there is a region near the red cluster, at human
84.6–88.6 Mb, mouse 94.3–99.7 Mb, which forms three clusters. Each
has length < C = 1 Mb in human, so they are discarded.
Increasing G or lowering C sufficiently would retain
these clusters and possibly merge them with the red cluster. If they
were the only addition, the red block would be larger, but the synteny
block order in Figure 2e would not be affected, thus the rearrangement
analysis described below would remain the same. If distinct blocks were
added, it would affect the rearrangement analysis. The chosen values of
G and C result in a classification of the anchor
arrangements into microrearrangements, macrorearrangements, and noise.
Rearrangements of anchors within a synteny block are called
microrearrangements. Rearrangements of the order and orientation of
synteny blocks are called macrorearrangements.
From Synteny Blocks to the Breakpoint Graph
We illustrate the notion of the breakpoint graph using the
X-chromosome as an example. The signed permutation describing synteny
block order on the X-chromosome in mouse is −4, −5, +3, +11, −2, +8,
−9, +10, −6, +7, −1. For our goals, we shall use +1, −7, +6, −10, +9,
−8, +2, −11, −3, +5, +4 (a "flip" of the entire chromosome). We may
transform this permutation into the "identity" permutation
representing the human X-chromosome, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
by 7 reversals (Fig. 2h) via the Hannenhalli–Pevzner algorithm
(Hannenhalli and Pevzner 1995b, 1999). This algorithm uses the
breakpoint graph (Fig. 2g) to construct a most parsimonious
evolutionary scenario (Fig. 2h) in polynomial time. We now show a new
way to construct the breakpoint graph.
Figure 3a presents the genomic dot-plot
(with added "start" and "end" elements) and the "human"
path (shown with solid edges) traversing the synteny blocks in human
order. The projection of this path on the human genome is shown below
the human axis. Similarly, Figure 3b presents the same genomic dot-plot
and the "mouse" path (shown with dotted edges) traversing the
synteny blocks in mouse order. The two-dimensional breakpoint graph is
obtained by superimposing these solid and dotted paths (Fig. 3c) and
further deleting the synteny blocks (Fig. 3d). One can prove that the
breakpoint graph is a collection of alternating solid-dotted cycles.
Figure 3d consists of a cycle of length 2 (containing the start
vertex), two cycles of length 4, a cycle of length 6, and a cycle of
length 8. After constructing the cycles, we usually color the edges so
that each cycle has its own color (Fig. 2g). At this point, we apply
the Hannenhalli–Pevzner algorithm to obtain a most parsimonious
scenario. Note that the breakpoint graphs in Figures 2 and 3 are
different because of the X-chromosome flip.
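The cycle decomposition can be reproduced with a short script. This is a sketch of the standard one-dimensional construction, not the 2D visualization tool: each block i is split into two ends, "mouse" edges join the ends that are adjacent in the mouse order, and "human" edges join the ends that are adjacent in the identity order. The signs in the two permutations below are an assumption: they are chosen so that the resulting cycle lengths agree with those stated for Figures 2g and 3d.

```python
def breakpoint_graph_cycles(perm):
    """Return the sorted cycle lengths (in edges) of the breakpoint
    graph of a signed permutation against the identity."""
    n = len(perm)
    # Block +i contributes ends (2i-1, 2i); block -i contributes (2i, 2i-1).
    ends = [0]
    for x in perm:
        ends += [2 * x - 1, 2 * x] if x > 0 else [-2 * x, -2 * x - 1]
    ends.append(2 * n + 1)

    # "Mouse" edges join the facing ends of neighboring blocks.
    mouse = {}
    for k in range(0, len(ends), 2):
        a, b = ends[k], ends[k + 1]
        mouse[a], mouse[b] = b, a

    def human(v):
        # "Human" edges join ends 2i and 2i+1.
        return v + 1 if v % 2 == 0 else v - 1

    seen, lengths = set(), []
    for v in range(2 * n + 2):
        if v in seen:
            continue
        length, cur = 0, v
        while cur not in seen:
            seen.add(cur)
            nxt = mouse[cur]      # follow one mouse edge
            seen.add(nxt)
            cur = human(nxt)      # then one human edge
            length += 2
        lengths.append(length)
    return sorted(lengths)


mouse_x = [-4, -5, 3, 11, -2, 8, -9, 10, -6, 7, -1]
mouse_x_flipped = [1, -7, 6, -10, 9, -8, 2, -11, -3, 5, 4]
print(breakpoint_graph_cycles(mouse_x))          # [4, 4, 8, 8]
print(breakpoint_graph_cycles(mouse_x_flipped))  # [2, 4, 4, 6, 8]
```

The flipped order yields five cycles, including the cycle of length 2 through the start vertex, whereas the unflipped order yields four.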

Figure 3. Construction of the breakpoint graph from synteny blocks. (a)
Solid path through human. (b) Dotted path through mouse.
(c) Superposition of paths. (d) Remove blocks to
obtain cycles.
Hannenhalli and Pevzner (1995a) demonstrated that the cycles and their
interleaving structure are the "fossil records" of rearrangement
events and showed how to use them for solving reversal distance and
genomic distance problems. The 2D representation of breakpoint graphs
shown in Figure 3 is different from the representation used by
Hannenhalli and Pevzner (1995a), who used the 1D projections of this
graph shown along the axes in Fig. 3. We believe that the 2D
breakpoint graph is a better visualization than the 1D one, in addition
to being independent of the choice of the axis. Therefore, it provides
better geometrical intuition for the Hannenhalli–Pevzner theory.
Because every reversal creates at most two new breakpoints, the
reversal distance is at least half the number of the breakpoints in the
genome. If there is no breakpoint reuse, then the reversal distance is
exactly half the number of breakpoints. Moreover, in this case the real
evolutionary scenario is a most parsimonious one, thus implying that
the reversal distance equals the real distance in this case. However,
the estimate of reversal distance as half the number of breakpoints is
inaccurate because it assumes that the breakpoints are not reused in
evolution. In most genome rearrangement studies, there are breakpoint
reuses (at least at a certain level of synteny block resolution), thus
indicating that breakpoint reuse is the rule rather than the exception.
For example, Palmer and Herbon (1988) describe an evolutionary scenario
with one breakpoint reuse for cabbage and turnip mtDNA, whereas Bafna
and Pevzner (1995) describe an evolutionary scenario with four
breakpoint reuses in the rearrangement scenario for human and mouse
X-chromosomes.
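The breakpoint bound can be checked on the X-chromosome example. The sketch below counts breakpoints of a signed permutation framed by 0 and n + 1; the signs of the mouse X-chromosome order used here are an assumption, chosen to be consistent with the cycle structure described for the breakpoint graph.

```python
import math


def count_breakpoints(perm):
    """Number of adjacencies (a, b) in the framed permutation
    0, perm, n + 1 for which b is not a + 1."""
    ext = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(ext, ext[1:]) if b - a != 1)


mouse_x = [1, -7, 6, -10, 9, -8, 2, -11, -3, 5, 4]
b = count_breakpoints(mouse_x)
lower_bound = math.ceil(b / 2)
print(b, lower_bound)  # 11 6
```

With 11 breakpoints, half the breakpoint count gives a bound of 6 reversals, one fewer than the 7 reversals of the most parsimonious scenario, so some breakpoint region must be used more than once.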
Given an evolutionary scenario, we call two breakpoints,
$B_1$ and $B_2$, siblings if they are
endpoints of a reversal R from this scenario. Two breakpoints
are related if there is a series of breakpoints
$B_1, B_2, \ldots, B_n$
such that every two consecutive breakpoints in this series are
siblings. In addition, reconstructing a most parsimonious scenario and
identification of related breakpoints are nontrivial tasks, even in the
absence of breakpoint reuse. The breakpoint graph reveals related
breakpoints and allows one to find the most parsimonious scenarios. In
the scenario shown in Figure 2h, the breakpoint at the start (flat end)
of block 4 is used twice. We emphasize that by reusing breakpoints we
do not mean multiple use of exactly the same genomic position as an
endpoint of rearrangements, but rather the fact that between synteny
blocks, there are regions that host endpoints for multiple
rearrangement events.
In contrast to unichromosomal breakpoint graphs (consisting of cycles),
the multichromosomal breakpoint graph consists of both cycles and paths
(such paths correspond to breaking the cycles at the chromosome
endpoints). The breakpoint graph for the entire human and mouse genomes
is very complicated (Fig. 4).
From the Breakpoint Graph to Rearrangement Scenarios
A previous analysis of comparative maps of human and mouse
X-chromosomes revealed 8 syntenic blocks and postulated a most
parsimonious rearrangement scenario with 6 inversions (Bafna and
Pevzner 1995). The genomic sequences reveal 11 synteny blocks of 1 Mb
and longer and provide evidence for at least 7 inversions (Fig. 2h).
Moreover, there are 177 microrearrangements within the X-chromosome
that were beyond the resolution of previous comparative mapping studies
(some of them may be artifacts of assembly errors). Two out of 11
synteny blocks on the X-chromosome show evidence of extensive
microrearrangements.
These estimates are based on the Hannenhalli and Pevzner (1995a)
theorem that expresses the reversal distance between two genomes as
$n + 1 - c + h$, where n is
the number of synteny blocks, c is the number of cycles in the
breakpoint graph, and h is another easily computable
combinatorial parameter of the breakpoint graph. Because h
equals zero for many biological data sets,
$n + 1 - c$ is usually a good approximation for
reversal distance. For the X-chromosome as depicted in Figure 2g, we
have n = 11, c = 4, h = 0, and we
obtain a reversal distance of 8. However, flipping the whole mouse
X-chromosome results in the breakpoint graph of Figure 3d, with
n = 11, c = 5, h = 0, and reversal
distance 7. Flipping a whole chromosome does not count as a
rearrangement event, thus the genomic distance on the X-chromosome
between human and mouse is 7.
A similar theorem (Hannenhalli and Pevzner 1995b, 1999; Tesler 2002b)
holds for multichromosomal genomes, and automatically takes into
account whole chromosome flips. We used a fast implementation of the
Hannenhalli–Pevzner algorithm (Tesler 2002a) to analyze the
human–mouse rearrangement scenario (available via the GRIMM Web server
at http://www-cse.ucsd.edu/groups/bioinformatics/GRIMM/index.html).
Although the algorithm finds a most parsimonious scenario, the real
scenario is not necessarily a most parsimonious one (Blanchette et al.
1996), and the order of rearrangement events within a most parsimonious
scenario often remains uncertain. Availability of three or more
mammalian genomes could remedy some of these limitations and provide a
means to infer the gene order in the mammalian ancestor (Bourque and
Pevzner 2002).
The GRIMM algorithm constructs a most parsimonious evolutionary
scenario between human and mouse genomes with 245 rearrangements. With
at least 245 rearrangements between human and mouse and an estimated 83
Myr (Huchon et al. 2002) of evolution from their common ancestor, we
obtain an estimated rate of 1.5 chromosomal rearrangements per Myr,
which is higher than the previous estimate of 1.0 (Lander et al. 2001).
However, this estimate should not be viewed as typical for mammalian
evolution because rodents may have unusually rapid chromosome
alterations. The comparative mapping data for cat and cow may soon shed
further light on the comparative rates of rearrangements in different
branches of the mammalian evolutionary tree.
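The arithmetic behind the rate of 1.5 can be made explicit. The calculation below assumes that the 83 Myr since the common ancestor is counted once for each of the two lineages, which is consistent with the evolutionary distance of roughly 170 Myr between human and mouse quoted in the introduction; this reading is an inference from the numbers, not a statement made in the text.

```python
rearrangements = 245       # steps in the most parsimonious scenario
myr_since_ancestor = 83    # estimate attributed to Huchon et al. (2002)

# Assumption: the human and mouse lineages each contribute 83 Myr.
total_branch_myr = 2 * myr_since_ancestor
rate_per_myr = rearrangements / total_branch_myr

print(total_branch_myr)        # 166
print(round(rate_per_myr, 2))  # 1.48
```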
The human–mouse breakpoint graph provides insights into rearrangements
that may have occurred in the course of evolution. Some of these
rearrangements are almost "obvious" (they correspond to short
cycles in the breakpoint graph), whereas others involve long series of
interacting breakpoints. Such complicated rearrangement events are
described by long cycles/paths in the breakpoint graph. The longest
path in the human–mouse breakpoint graph involves 26 breakpoints. The
human–mouse breakpoint graph has 6 other long paths with more than 10
breakpoints.
The analysis of microrearrangements within the synteny blocks
demonstrates a large variation in the rate of microrearrangements
(reversals) along the genomes. In particular, 41 out of 281 synteny
blocks do not show any evidence of microrearrangements, whereas 10
synteny blocks are extremely rearranged (40 or more rearrangements
within a block). For example, a long synteny block on human Chromosome
13/mouse Chromosome 8 (nucleotides 101,902,085 to 113,413,125 on human
Chromosome 13) consists of 65 regions of local similarity whose order
is perfectly conserved in human and mouse. On the other hand, a long
synteny block on human Chromosome 18/mouse Chromosome 17 (positions
2,789,316 to 10,083,804 on human Chromosome 18) consists of 143 regions
of local similarity and has a large number of microrearrangement
breakpoints, indicating that there were at least 85 inversions within
this block (Fig. 1b). The length of this synteny block in mouse is
smaller than in human (6.0 Mb vs. 7.3 Mb). Of course, some of the
breakpoints within this synteny block may be caused by assembly errors.
There is evidence of at least 3170 microrearrangements within all the
synteny blocks, some of which may be due to assembly errors.
Every breakpoint defines two synteny blocks A and B that are adjacent
in one genome but separated in the second one. We distinguish between
unichromosomal breakpoints (A and B belong to the same chromosome in
the second genome) and multichromosomal breakpoints (A and B are on
different chromosomes). Most breakpoints in the human and mouse genomes
are unichromosomal breakpoints, thus indicating that most
rearrangements that happened in the course of human–mouse evolution
are intrachromosomal inversions. In particular, one can come up with a
most parsimonious rearrangement scenario that includes 134 reversals in
the human and mouse genomes before any translocations/fusions/fissions
happen. After performing these reversals, the number of synteny blocks
is reduced from 281 to 144. The breakpoint graph of these human and
mouse "preancestors" allows one to infer which pairs of chromosomes
were involved in multiple translocations/fusions/fissions. The longest
cycle in this graph involves 8 breakpoints located on 8 different
chromosomes in human. The resulting rearrangement scenario from the
mouse to human preancestor has 15 inversions, 93 translocations, and 3
fissions. The complete scenario from mouse to human has 149 inversions,
93 translocations, and 3 fissions. (There are other combinations of 245
steps consistent with the breakpoint graph; this is the one we found
with the most inversions.)
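The event counts in this paragraph fit together as a simple tally, shown here as a consistency check. The variable names are illustrative only.

```python
reversals_before_interchromosomal = 134
preancestor_scenario = {"inversions": 15, "translocations": 93, "fissions": 3}

# The complete scenario adds the initial reversals to the inversion count.
complete_scenario = dict(
    preancestor_scenario,
    inversions=preancestor_scenario["inversions"]
    + reversals_before_interchromosomal,
)
total_steps = sum(complete_scenario.values())

print(complete_scenario["inversions"], total_steps)  # 149 245
```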
DISCUSSION
Molecular evolution studies are usually based on the analysis of
individual genes rather than entire genomes. However, such widespread
phenomena as horizontal gene transfer, differential gene loss, and the
like, often lead to situations in which evolutionary trees for
different genes tell different stories. An alternative approach is to
infer the evolutionary history of entire genomes, rather than
individual genes, based on the analysis of gene orders. Although this
approach is successful in bacterial genomics (for a recent review, see
Wolf et al. 2002), its applications in mammalian genomics are somewhat
limited owing to the incompleteness of gene order data derived from
comparative maps. Human and mouse genomic sequences make it possible,
for the first time, to estimate accurately the extent of
rearrangement events. However, the "original synteny" problem
(Nadeau and Sankoff 1997) remains unsolved because at least three
mammalian gene orders are required to derive the ancestral mammalian
karyotype. The ongoing mammalian sequencing projects and recently
developed algorithms for reconstructing ancestral gene orders (Bourque
and Pevzner 2002) provide hope that the "original synteny" problem
will finally be resolved.
 |
WEB SITE REFERENCES
|
|---|
http://www-cse.ucsd.edu/groups/bioinformatics/GRIMM/index.html;
GRIMM Web server.
 |
Acknowledgements
|
|---|
We are grateful to Ewan Birney, Guillaume Bourque, and Bill Murphy
for many helpful suggestions; and to Bernard Moret for providing his
group's genome rearrangement programs. We are also indebted to Michael
Kamal, Kerstin Lindblad-Toh, and Jade Vinson for their advice on synteny
blocks and rearrangements in human and mouse genomes.
The publication costs of this article were defrayed in part by payment
of page charges. This article must therefore be hereby marked
"advertisement" in accordance with 18 USC section 1734 solely to
indicate this fact.
 |
Footnotes
|
|---|
1 Corresponding author. 
E-MAIL ppevzner{at}cs.ucsd.edu
Article and publication are at
http://www.genome.org/cgi/doi/10.1101/gr.757503. Article published online before print in December
2002.
 |
REFERENCES
|
|---|
Bafna, V. and Pevzner, P.A. 1993. Genome rearrangements and sorting by reversals. In Proceedings of the 34th Annual IEEE Symposium on Foundations of Computer Science, pp. 148-157.
___. 1995. Sorting by reversals: Genome rearrangements in plant organelles and evolutionary history of X chromosome. Mol. Biol. Evol. 12: 239-246.
___. 1996. Genome rearrangements and sorting by reversals. SIAM J. Comput. 25: 272-289.
Blanchette, M., Kunisawa, T., and Sankoff, D. 1996. Parametric genome rearrangements. Gene 172: GC11-GC17.
Bourque, G. and Pevzner, P.A. 2002. Genome-scale evolution: Reconstructing gene orders in the ancestral species. Genome Res. 12: 26-36.
Carver, E.A. and Stubbs, L. 1997. Zooming in on the human–mouse comparative map: Genome conservation re-examined on a high-resolution scale. Genome Res. 7: 1123-1137.
Copeland, N., Jenkins, N.A., Gilbert, D.J., Eppig, J.T., Maltais, L.J., Miller, J.C., Dietrich, W.F., Weaver, A., Lincoln, S.E., Steen, R.G., et al. 1993. A genetic linkage map of the mouse: Current applications and future prospects. Science 262: 57-66.
DeBry, R.W. and Seldin, M.F. 1996. Human/mouse homology relationships. Genomics 33: 337-351.
Dobzhansky, T. and Sturtevant, A.H. 1938. Inversions in the chromosomes of Drosophila pseudoobscura. Genetics 23: 28-64.
Fujibuchi, W., Ogata, H., Matsuda, H., and Kanehisa, M. 2000. Automatic detection of conserved gene clusters in multiple genomes by graph comparison and p-quasi grouping. Nucleic Acids Res. 28: 4029-4036.
Gregory, S.G., Sekhon, M., Schein, J., Zhao, S., Osoegawa, K., Scott, C.E., Evans, R.S., Burridge, P.W., Cox, T.V., Fox, C.A., et al. 2002. A physical map of the mouse genome. Nature 418: 743-750.
Hannenhalli, S. and Pevzner, P.A. 1995a. Transforming men into mice (polynomial algorithm for genomic distance problem). In Proceedings of the 36th Annual IEEE Symposium on Foundations of Computer Science, pp. 581-592. IEEE, Milwaukee, Wisconsin.
___. 1995b. Transforming cabbage into turnip (polynomial algorithm for sorting signed permutations by reversals). In Proceedings of the 27th Annual ACM Symposium on the Theory of Computing, pp. 178-189.
___. 1999. Transforming cabbage into turnip (polynomial algorithm for sorting signed permutations by reversals). J. ACM 46: 1-27.
Hardison, R.C., Oeltjen, J., and Miller, W. 1997. Long human–mouse sequence alignments reveal novel regulatory elements: A reason to sequence the mouse genome. Genome Res. 7: 959-966.
Huchon, D., Madsen, O., Sibbald, M.J., Ament, K., Stanhope, M.J., Catzeflis, F., de Jong, W.W., and Douzery, E.J. 2002. Rodent phylogeny and a timescale for the evolution of glires: Evidence from an extensive taxon sampling using three nuclear genes. Mol. Biol. Evol. 19: 1053-1065.
Kececioglu, J. and Sankoff, D. 1995. Exact and approximation algorithms for the inversion distance between two permutations. Algorithmica 13: 180-210.
Kent, W.J. 2002. BLAT: The BLAST-like alignment tool. Genome Res. 12: 656-664.
Koop, B.F. and Hood, L. 1994. Striking sequence similarity over almost 100 kilobases of human and mouse T-cell receptor DNA. Nat. Genet. 7: 48-53.
Kumar, S., Gadagkar, S.R., Filipski, A., and Gu, X. 2001. Determination of the number of conserved chromosomal segments between species. Genetics 157: 1387-1395.
Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., et al. 2001. Initial sequencing and analysis of the human genome. Nature 409: 860-921.
Lathe, W.C., Snel, B., and Bork, P. 2000. Gene context conservation of a higher order than operons. Trends Biochem. Sci. 25: 474-479.
Ma, B., Tromp, J., and Li, M. 2002. PatternHunter: Faster and more sensitive homology search. Bioinformatics 18: 440-445.
Mayor, C., Brudno, M., Schwartz, J.R., Poliakov, A., Rubin, E.M., Frazer, K.A., Pachter, L., and Dubchak, I. 2000. VISTA: Visualizing global DNA sequence alignments of arbitrary length. Bioinformatics 16: 1046-1047.
Mural, R.J., Adams, M.D., Myers, E.W., Smith, H.O., Miklos, G.L., Wides, R., Halpern, A., Li, P.W., Sutton, G.G., Nadeau, J., et al. 2002. A comparison of whole-genome shotgun-derived mouse chromosome 16 and the human genome. Science 296: 1661-1671.
Nadeau, J.H. and Sankoff, D. 1997. Landmarks in the Rosetta Stone of mammalian comparative maps. Nat. Genet. 15: 6-7.
Nadeau, J.H. and Taylor, B.A. 1984. Lengths of chromosomal segments conserved since divergence of man and mouse. Proc. Natl. Acad. Sci. 81: 814-818.
O'Brien, S.J., Menotti-Raymond, M., Murphy, W.J., Nash, W.G., Wienberg, J., Stanyon, R., Copeland, N.J., Jenkins, N.A., Womack, J.E., and Graves, J.A. 1999. The promise of comparative genomics in mammals. Science 286: 458-481.
Palmer, J.D. and Herbon, L.A. 1988. Plant mitochondrial DNA evolves rapidly in structure, but slowly in sequence. J. Mol. Evol. 27: 87-97.
Puttagunta, R., Gordon, L.A., Meyer, G.E., Kapfhamer, D., Lamerdin, J.E., Kantheti, P., Portman, K.M., Chung, W.K., Jenne, D.E., Olsen, A.S., et al. 2000. Comparative maps of human 19p13.3 and mouse chromosome 10 allow identification of sequences at evolutionary breakpoints. Genome Res. 10: 1369-1380.
Rogozin, I.B., Makarova, K.S., Murvai, J., Czabarka, E., Wolf, Y.I., Tatusov, R.L., Szekely, L.A., and Koonin, E.V. 2002. Connected gene neighborhoods in prokaryotic genomes. Nucleic Acids Res. 30: 2212-2223.
Sankoff, D. and Blanchette, M. 1997. The median problem for breakpoints in comparative genomics. In Computing and Combinatorics, Proceedings of COCOON '97, Lecture notes in computer science, pp. 251-263. Springer Verlag, New York.
Schwartz, S., Zhang, Z., Frazer, K.A., Smit, A., Riemer, C., Bouck, J., Gibbs, R., Hardison, R., and Miller, W. 2000. PipMaker: A Web server for aligning two genomic DNA sequences. Genome Res. 10: 577-586.
Sturtevant, A.H. and Dobzhansky, T. 1936. Inversions in the third chromosome of wild races of Drosophila pseudoobscura, and their use in the study of the history of the species. Proc. Natl. Acad. Sci. 22: 448-450.
Tatusov, R.L., Koonin, E.V., and Lipman, D.J. 1997. A genomic perspective on protein families. Science 278: 631-637.
Tesler, G. 2002a. GRIMM: Genome rearrangements Web server. Bioinformatics 18: 492-493.
___. 2002b. Efficient algorithms for multichromosomal genome rearrangements. J. Comp. Sys. Sci. (in press).
Thomas, J.W., Summers, T.J., Lee-Lin, S.Q., Maduro, V.V., Idol, J.R., Mastrian, S.D., Ryan, J.F., Jamison, D.C., and Green, E.D. 2000. Comparative genome mapping in the sequence-based era: Early experience with human Chromosome 7. Genome Res. 10: 624-633.
Waterston, R.H., Lindblad-Toh, K., Birney, E., Rogers, J., Abril, J.F., Agarwal, P., Agarwala, R., Ainscough, R., Alexandersson, M., An, P., et al. 2002. Initial sequencing and comparative analysis of the mouse genome. Nature 420: 520-562.
Watterson, G.A., Ewens, W.J., Hall, T.E., and Morgan, A. 1982. The chromosome inversion problem. J. Theor. Biol. 99: 1-7.
Wolf, Y.I., Rogozin, I.B., Kondrashov, A.S., and Koonin, E.V. 2001. Genome alignment, evolution of prokaryotic genome organization, and prediction of gene function using genomic context. Genome Res. 11: 356-372.
Wolf, Y., Rogozin, I., Grishin, N., and Koonin, E. 2002. Genome trees and the tree of life. Trends Genet. 18: 472.
Received September 2, 2002;
accepted in revised form November 4, 2002.
13:37-45 © 2003 by Cold Spring Harbor Laboratory Press ISSN 1088-9051/03 $5.00
