Vol. 12, Issue 11, 1651-1662, November 2002
LETTER
ABSTRACT
Human chromosome 2 was formed by the head-to-head fusion of two ancestral chromosomes that remained separate in other primates. Sequences that once resided near the ends of the ancestral chromosomes are now interstitially located in 2q13-2q14.1. Portions of these sequences had duplicated to other locations prior to the fusion. Here we present analyses of the genomic structure and evolutionary history of >600 kb surrounding the fusion site and closely related sequences on other human chromosomes. Sequence blocks that closely flank the inverted arrays of degenerate telomere repeats marking the fusion site are duplicated at many, primarily subtelomeric, locations. In addition, large portions of a 168-kb centromere-proximal block are duplicated at 9pter, 9p11.2, and 9q13, with 98%-99% average sequence identity. A 67-kb block on the distal side of the fusion site is highly homologous to sequences at 22qter. A third ~100-kb segment is 96% identical to a region in 2q11.2. By integrating data on the extent and similarity of these paralogous blocks, including the presence of phylogenetically informative repetitive elements, with observations of their chromosomal distribution in nonhuman primates, we infer the order of the duplications that led to their current arrangement. Several of these duplicated blocks may be associated with breakpoints of inversions that occurred during primate evolution and of recurrent chromosome rearrangements in humans.
[Supplemental material is available online at http://www.genome.org. The following individuals kindly provided reagents, samples, or unpublished information as indicated in the paper: T. Newman, C. Harris, and J. Young.]
INTRODUCTION
Humans have 46 chromosomes, whereas chimpanzee,
gorilla, and orangutan have 48. This major karyotypic difference was
caused by the fusion of two ancestral chromosomes to form human
chromosome 2 and subsequent inactivation of one of the two original
centromeres (Yunis and Prakash 1982). As a result of this fusion,
sequences that once resided near the ends of the ancestral chromosomes
are now located in the middle of chromosome 2, near the borders of bands 2q13 and 2q14.1. For brevity, we refer henceforth to the region
surrounding the fusion as 2qFus. Two head-to-head arrays of degenerate
telomere repeats are found at this site; their head-to-head orientation
indicates that chromosome 2 resulted from a telomere-to-telomere fusion
(Ijdo et al. 1991). Furthermore, cross-hybridization between 2qFus and
various subtelomeric regions has been observed by fluorescence in situ
hybridization (FISH) (Ijdo et al. 1991; Trask et al. 1993; Hoglund et al. 1995; Martin-Gallardo et al. 1995; Ning et al. 1996; Lese et al. 1999; Ciccodicola et al. 2000; Park et al. 2000; Bailey et al. 2002; Martin et al. 2002). Thus, the fusion must have occurred after
subtelomeric sequences present at the ends of the ancestral fusion
partners had already duplicated to/from at least one other chromosome end.
The subtelomeric regions of human chromosomes are particularly dynamic
relative to most of the human genome. Sequences have recurrently
exchanged, recombined, and duplicated among the ends of nonhomologous
chromosomes (for review, see Mefford and Trask 2002). Thus, the
entrapment of subtelomeric regions at the more sequestered interstitial
fusion site provides a potential opportunity to compare the composition
of two ancestral subtelomeres to their counterparts that have persisted
at, and propagated among, subtelomeric locations.
Martin et al. (2002) recently presented a clone contig encompassing the fusion and showed homology with several interstitial sites, in addition to subtelomeric sites. Here, we provide more detail
on the structure of the DNA surrounding the fusion site and these
paralogous relationships. We quantify the extent and degree of homology
between this region and paralogous segments elsewhere in the human
genome, including sites not described previously. Using these data and
observations of the chromosomal location of these sequences in nonhuman
primates, we infer the history of some of the duplications and
rearrangements that have occurred during recent primate evolution. The
extensive homology among 2qFus-related regions of the genome may have
mediated, or been the result of, some of the rearrangements that
distinguish the karyotypes of higher primates and that may now interact
to cause chromosome rearrangements in humans.
RESULTS
Chromosomal Distribution of Sequences from the 2q13-2q14.1 Fusion Region
Bacterial Artificial Chromosome (BAC) RP11-395L14
(AL078621) contains 789 bp of degenerate telomere repeats organized in two head-to-head arrays and overlaps the ancestral fusion site (Fig. 1A) (Martin et al. 2002). Using this BAC as
the seed, we independently assembled a 614-kb contig surrounding the
fusion site using publicly available BAC sequences (Fig. 1A). The
finished BACs in the contig are 99.9%-100% identical in their
regions of overlap. Our 2qFus contig is consistent with the automated
assembly of this region performed by the University of California, Santa Cruz (UCSC) and the National Center for Biotechnology Information (NCBI)
(http://genome.ucsc.edu/ and http://www.ncbi.nlm.nih.gov) and with recent
analyses by Martin et al. (2002).
We used a combination of three approaches in order to confirm the assignment of BACs forming the 614-kb contig to chromosome 2qFus and to investigate the chromosomal distribution of paralogous sequences.
PCR Analyses of Monochromosomal Hybrid Panel
First, we designed 48 PCR primer pairs across the contig, each amplifying DNA free of known repeats, and performed PCR assays on DNA from a panel of hybrid cell lines, each containing a different human chromosome against a rodent background (Fig. 1C). As expected, all primer pairs amplified products from chromosome 2. However, only a portion of the assembled sequence (a total of ~350 kb on the ends of the contig) is unique to chromosome 2.
Sequences closely flanking the telomere-repeat arrays (red zone, Fig. 1C) amplified from seven or more chromosomes, with one assay amplifying
a product from 13 different chromosomes, including chromosome 22. A
40-kb block common to only chromosomes 2 and 22 and defined by four PCR
assays (blue zone) adjoins the region of multichromosomal segments. One
assay within this block is also positive for chromosome 15, due to the
retrotransposition of a processed pseudogene of SNRPA1 from
the intron-containing copy on chromosome 15 prior to the segmental
duplication that gave rise to the larger block of homology between
2qFus and 22qter (Fan et al. 2002).
Fluorescence In Situ Hybridization (FISH)
Second, in order to more precisely define the chromosomal location of sequences homologous to this region, we performed FISH analyses using the five BAC clones comprising the 2qFus contig. The results are summarized in Figure 1A and shown schematically for three of the BACs in Figure 2. Sequences in RP11-395L14, which contains the fusion site, hybridize to five prominent sites, 2qFus, 9q13, 9p11.2, 9pter (9p24), and 22qter (22q13.3), as well as to several other chromosomal ends with lower intensity. RP11-480C16 produces FISH signals at 2qFus, 2q11.2, 9p11.2, 9q13, and 9pter, as observed recently by Martin et al. (2002).
Database Mining and Sequence Alignment
Third, we conducted a BlastN search of all finished and draft sequences publicly available as of February 28, 2002 in order to identify sequences paralogous to the 614-kb region of 2qFus. Results are summarized in Table 1 and Figure 3. Some, but not all, of these paralogous segments were detected in earlier whole-genome scans for duplications (Bailey et al. 2001).
98 kb and is 95.8% identical to the
centromere-proximal portion of 2qFus. Our FISH analyses of this clone
confirm its 2q11.2 location as indicated by the NCBI and UCSC
assemblies: Signals are observed at both 2q11.2 and 2qFus
(Supplementary Fig. IA, available online at http://www.genome.org).
This intrachromosomal identity explains why clones RP11-65I12 and
480C16 from 2qFus give FISH signals on 2q11.2: over 55 kb and 45 kb
of their inserts, respectively, match this paralogous segment in 2q11.2
at >95% identity. A second 20.5-kb block of 2qFus homology (96.0%
identity), called 2q11.2-B, is in the finished sequence of RP11-468G5.
FISH confirms the 2q11.2 location of this paralogy: This BAC gives a
strong FISH signal in 2q11.2 and a weak signal at 2qFus (not shown).
Although they are 99.0% identical over ~16 kb, clones RP11-34G16 and
468G5 represent distinct paralogous blocks in 2q11.2. Sequences
neighboring the paralogy are very different (Fig. 3), and a
dissimilarity of 1% is greater than expected from the combination of
allelic variation and sequencing errors. The two 2q11.2 blocks
are not resolvable by FISH in metaphase chromosomes, however.
The paralogy between 9pter (9p24) and 2qFus was reported to reside in
RP11-174M15 and RP11-143M1 by Martin et al. (2002).
403A15 hybridize by FISH most intensely to 9p11.2 and 9q13, and less
intensely to 2qFus, 9pter, and many pericentromeric sites
(Supplementary Fig. ID for RP11-15J10, available online at
http://www.genome.org). Although these clones are assigned to 9q13 in
the NCBI and UCSC maps, there is no strong justification for this
assignment over 9p11.2. These sequences are not connected to other 9q13
BACs by overlapping sequence or end-sequenced BACs, and they contain no
radiation-hybrid or linkage markers that are unambiguously assigned to
one side of the chromosome-9 centromere or the other. RP11-403A15 and
15J10 have no sequence in common, but must lie sufficiently close to
each other that they are not resolvable by metaphase FISH.
As expected from our hybrid panel data and FISH observations made by us
and others (references earlier), the multicopy regions immediately
flanking the fusion site match many publicly available BAC and cosmid
sequences. Clones with homology with these multicopy regions belong to
at least 15 different contigs representing different chromosomal ends
(not shown). We show only one of the longest available homologies, that
in RP11-34P13 (AC073186/AL627309), which is 98.8% identical to 2qFus
over 29.5 kb and 99.6% identical to the 9pter sequence.
This clone contains no chromosome-specific DNA and has been variously
assigned to chromosome 1, 7, 18, and 21 in GenBank entries and draft
assemblies over the last 2 years. It does not derive from chromosome 1, 18, or 21, since it does not cross-hybridize by FISH to these
chromosomes in any of several individuals analyzed (not shown). It is
likely to be a variant allele of chromosome 19, as it shares extensive
homology with the 19pter allele sequenced by the Department of Energy
Joint Genome Institute (http://www.jgi.doe.gov) and contains sequence variants of the olfactory receptor gene most often found on chromosome 19 in 22 individuals sampled from different ethnic groups (180 chromosomes) (Mefford et al. 2001).
22 kb and 104 kb on the two sides and 28 kb between the regions of homology with 2q11.2-A and 9pter). The
remainder of the sequence is duplicated in at least one other location.
So far, we have detected 16 locations with at least 5 kb of homology
with 2qFus by at least two of three methods (a reproducible FISH
signal, more than one positive PCR assay, or >5 kb of >95% sequence
match in a chromosomally assigned genomic sequence). Of these
locations, 11 are subtelomeric, and 3 are pericentromeric. An
additional 14 sites of homology (of which 11 are subtelomeric) were
detected with only a single method. The failure to detect these 14 sites with more than one method is likely due to incomplete sequence
coverage, insensitivity of FISH, low density of PCR assays, mismatches
to primer sequences, and/or normal polymorphism among the chromosomes
analyzed in the three methods.
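The two-of-three rule described above can be sketched in code. This is only an illustration of the stated thresholds (a reproducible FISH signal, more than one positive PCR assay, or >5 kb of >95% sequence match), not the procedure actually used; the `SiteEvidence` record, its field names, and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SiteEvidence:
    """Hypothetical record of the evidence collected for one candidate site."""
    location: str
    fish_reproducible: bool    # reproducible FISH signal observed
    positive_pcr_assays: int   # number of positive PCR assays on the hybrid panel
    matched_kb: float          # length of the sequence match, in kb
    identity_pct: float        # percent identity of that match


def supporting_methods(site):
    """Return the names of the methods whose threshold the site meets."""
    methods = []
    if site.fish_reproducible:
        methods.append("FISH")
    if site.positive_pcr_assays > 1:
        methods.append("PCR")
    if site.matched_kb > 5 and site.identity_pct > 95:
        methods.append("sequence")
    return methods


def is_confirmed(site):
    """A site counts as detected when at least two of the three methods agree."""
    return len(supporting_methods(site)) >= 2


# Illustrative values only; they are not taken from the paper's data.
example = SiteEvidence("example-A", True, 0, 12.0, 98.0)
print(example.location, supporting_methods(example), is_confirmed(example))
```

A site supported by a single method, such as the 14 additional sites mentioned above, would return `False` from `is_confirmed` under this sketch.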
FISH Analyses of Nonhuman Primates
We confirmed the centromere-telomere orientation of the 2qFus
contig by FISH analyses of constituent clones on chimpanzee chromosomes
(Fig. 2). Chimpanzee chromosomes 12 and 13 are homologous to human 2p
and 2q, respectively (Yunis and Prakash 1982; Wienberg et al. 1994).
RP11-480C16, from one end of the 2qFus contig, hybridizes to
chimpanzee 12, indicating that it maps to the centromere-proximal side
of the fusion site. RP11-432G15, from the other end of the 2qFus
contig, hybridizes to chimpanzee 13, indicating that it lies on the
centromere-distal side of the fusion site in human. As expected,
RP11-395L14, which contains the fusion site, generates signals on both
chimpanzee chromosomes 12 and 13.
FISH analyses also reveal changes in location and copy number of
paralogous segments that have occurred during hominid evolution. In
both human and chimpanzee, RP11-432G15 (red symbols in Fig. 2)
hybridizes only to the regions corresponding to 2qFus and 22qter. These
two sites are also detected in gorilla and orangutan, indicating that
the transfer of material between these locations predated hominid
divergence. However, sequences homologous to this clone are distributed
on at least 38 additional telomeres and two interstitial sites in
gorilla. Hybridization is detected at 14 of the same locations in
orangutan. Given the generally accepted hominid lineage (Chen and Li 2001), either orangutan and gorilla independently acquired copies of
portions of the RP11-432G15 sequence at these locations, or homologous
sequence was deposited at these sites before hominids diverged and then
was lost in the ancestor of human and chimpanzee. One interstitial and
26 subtelomeric integration sites are unique to gorilla, indicating
that a burst of duplications also occurred along the gorilla-specific
branch. One subtelomeric and five interstitial sites are unique to orangutan.
Sequences in RP11-480C16, which hybridize by FISH to five sites in human (two on 2, three on 9), are present in four of the orthologous sites in chimpanzee and three in gorilla and orangutan (green symbols in Fig. 2). Chimpanzee, gorilla, and orangutan all lack cross-hybridizing sequences at the 9p11.2-equivalent location, and gorilla and orangutan are missing an additional signal corresponding to 9pter or 9q13. Because of an inversion with breakpoints in these bands that differentiates human chromosome 9 from its counterpart in gorilla and orangutan, it is not clear whether the remaining conserved location corresponds to 9pter or 9q13, but other evidence (see below) indicates that 9q13 holds the ancestral copy. Five additional subtelomeric locations have detectable homology with RP11-480C16 in chimpanzee.
As in human, blocks immediately flanking the fusion site and contained in RP11-395L14 are multicopy in the chimpanzee, gorilla, and orangutan genomes, and the copies are primarily subtelomerically located (blue symbols, Fig. 2). Because this BAC encompasses blocks whose chromosomal positions were assayed by the two BACs discussed in the preceding two paragraphs, we expected to see marked species differences in the distribution of its FISH signals. Indeed, of 30 subtelomeric locations detected in either human or chimpanzee with reasonable efficiency, 15 are common to both species, and 15 are seen in only one of the two species. Signals were also observed in two additional interstitial locations in chimpanzee. Of the ~50 locations detected with RP11-395L14 in any of the four tested hominid species, only seven are common to all four species, ~13 are species-specific locations, and the rest are common to different combinations of two or three species.
Almost all gorilla chromosome ends and half of chimpanzee ends are capped with AT-rich, DAPI-bright bands. These caps are not present on human or orangutan chromosomes. 2qFus homologous sequences are invariably found centromere proximal of these caps when both are present.
Genomic Structure
Base-Pair Composition
The GC content of the 2q fusion region averages 44%, but it fluctuates markedly across the 614-kb sequence (Fig. 1B). The GESTALT program (Glusman and Lancet 2000
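Fluctuation in GC content along a sequence, as described for the 614-kb contig, is commonly summarized with a sliding window. The following is a minimal sketch of that idea and does not reproduce the GESTALT program; the function names, window size, and step are assumptions chosen for illustration.

```python
def gc_fraction(seq):
    """Fraction of G or C among unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    acgt = sum(1 for base in seq if base in "ACGT")
    return gc / acgt if acgt else 0.0


def windowed_gc(seq, window, step):
    """Return (start, GC fraction) for each full window along seq."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence only; a real analysis would read the contig from a FASTA file
# and use windows of several kilobases.
for start, gc in windowed_gc("GGCCAATTGCGCATAT", window=4, step=4):
    print(start, f"{gc:.2f}")
```

Plotting the second element of each pair against the first gives a GC profile comparable in spirit to the one in Fig. 1B.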
Interspersed Repeats
The density and nature of repetitive elements also vary across the 614-kb 2qFus sequence (Fig. 1B). Overall, interspersed repeats occupy 40% of the sequence, with Short Interspersed Elements (SINEs) and Long Interspersed Elements (LINEs) accounting for 12% and 15% of the sequence, respectively. Recent repeat activity helps to date some of the duplication events involving the 2qFus sequence. The full-length AluY, AluYa5, and AluYb8 insertions into the 2qFus-paralogous blocks are indicated in Figure 3. These are the youngest classes of Alu elements found in the region. The AluYa5 and AluYb8 subfamilies have been transpositionally active very recently: 99% of the insertions of these elements are human specific, and ~25% exhibit presence/absence polymorphism in humans (Carroll et al. 2001).
99.7% identical. (4) Another AluYb8
element is common to both 2qFus and 22qter in their region of homology.
Its presence in both blocks indicates that sequence was transferred
between 22qter and 2qFus (or its unfused predecessor) after the AluYb8 element was inserted. The implications of these observations are discussed below.
Three repetitive elements cross breakpoints of homology and therefore
provide clues to the ancestral and derived states of the duplicative
transfers. (1) An L1PBa element crosses the red-to-light blue
breakpoint in 9q13, but is truncated in the 9pter and 2qFus sequences,
consistent with the creation of an isochore transition in 9pter/2qFus
(Figs. 1 and 4). (2) An AluJb element is truncated in 19pter at the
dark blue-to-red breakpoint of homology with 2qFus, but crosses the
breakpoint in 2qFus. The breakpoint also creates an L-to-H-2 isochore
transition in 19pter, but leaves the H1-2 isochore intact in
9pter/2qFus (not shown). (3) An L1ME element is truncated by the
duplication from 2qFus to 2q11.2 (red-to-light green breakpoint); it
crosses the breakpoint in 2qFus. This case is the only one encountered
in which the direction of transfer indicated by the repeat-element is
opposite that inferred from the isochore-transition pattern.
Sequence Variation Across the 2q13-2q14.1 Fusion Site
The head-to-head arrays of repeats at the fusion site in
RP11-395L14 have degenerated significantly (14%) from the near
perfect arrays of (TTAGGG)n found at telomeres. Comparison of
the fusion site in RP11-395L14 with an 1873-bp sequence from a
different individual (M73018) (Ijdo et al. 1991) reveals a high degree
of variation in the length and sequence of the head-to-head arrays of
degenerate telomere repeats (not shown). Overall, the two sequences show only 90% sequence identity. More differences are observed within
the degenerate telomere arrays (88% identity) than in the sequences
immediately flanking them (97.6% identity; 94.9% when each of the
bases in insertions and deletions, which range in size from 1 to 8 bp,
is counted as a mismatch). Only 48% of the 127 repeats in
RP11-395L14 and 46% of the 158 repeats in M73018 are perfect TTAGGG
or TTGGGG units. Deviation from the canonical telomeric repeat appears
to be randomly distributed across the fusion site in both alleles (not shown).
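One simple way to estimate the share of perfect repeat units in an array is sketched below. It is not necessarily how the percentages above were obtained. It assumes that perfect units can be found as non-overlapping matches to TTAGGG or TTGGGG on either strand (head-to-head arrays present one arm as the reverse complement) and that the total number of units can be approximated as the array length divided by 6. All function names are hypothetical.

```python
import re

PERFECT_UNITS = ("TTAGGG", "TTGGGG")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_perfect_units(array_seq, units=PERFECT_UNITS):
    """Count non-overlapping perfect units on either strand."""
    # Include reverse complements (CCCTAA, CCCCAA) so that both arms of a
    # head-to-head array are scored on a single strand.
    both_strands = list(units) + [reverse_complement(u) for u in units]
    pattern = re.compile("|".join(both_strands))
    return len(pattern.findall(array_seq.upper()))


def perfect_fraction(array_seq, unit_len=6):
    """Perfect units divided by the approximate unit count (length // 6)."""
    total_units = len(array_seq) // unit_len
    if total_units == 0:
        return 0.0
    return count_perfect_units(array_seq) / total_units


# Toy array: three perfect units followed by one degenerate unit (TAAGGG).
print(perfect_fraction("TTAGGGTTAGGGTTGGGGTAAGGG"))  # 0.75
```

Because insertions and deletions shift the repeat frame in degenerate arrays, pattern matching is used here instead of cutting the sequence into fixed hexamers.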
Two short arrays of degenerate telomere repeats, in addition to the
arrays marking the fusion site, are found within 2qFus. They are 181 bp
and 248 bp long, and 17 kb and 21 kb distal of the fusion site,
respectively. Interstitial arrays of degenerate telomere repeats are
common in the human genome, particularly in subtelomeric regions
(Riethman et al. 2001). Like the array at the fusion site, these arrays
are highly diverged from the prototypic telomeric repeats (70% and
86% identical to [TTAGGG]n, respectively). A SATR1
(satellite) repeat cluster within the block common to 2qFus, 9pter,
9q13, and 9p11.2-B (asterisks in Fig. 3) also shows high variability in
length, especially when compared with the overall high identity of
these blocks.
DISCUSSION
The gross characteristics of the chromosomal fusion that gave rise
to human chromosome 2 were apparent 20 years ago, when Yunis and
Prakash aligned the high-resolution banding patterns of human,
chimpanzee, gorilla, and orangutan chromosomes (Yunis and Prakash 1982). The identities of the fusion partners were confirmed 10 years
later when human chromosome-2 specific DNA was observed to "paint"
chimpanzee chromosomes 12 and 13 (Jauch et al. 1992; Wienberg et al. 1992). Because the fused chromosome is unique to humans and is fixed,
the fusion must have occurred after the human-chimpanzee split, but
before modern humans spread around the world, that is, between ~6 and
~1 million years ago (Mya; Chen and Li 2001; Yu et al. 2001) (Fig. 5). This gross karyotypic change may have
helped to reinforce reproductive barriers between early Homo
sapiens and other species, as the F1 offspring would have had
reduced fertility because of the risk of unbalanced
segregation of chromosomes during meiosis.
Molecular Characteristics of the Fusion
When observed at the sequence level, the ancestral chromosomes
appear to have undergone a straightforward fusion. The sequence of
RP11-395L14, like the cosmid partially sequenced by Ijdo et al. (1991), shows two head-to-head arrays of degenerate telomere repeats at
the 2q fusion site, with no other sequence between the arrays. This
observation indicated that the two ancestral chromosomes had joined
end-to-end within the terminal telomeric repeats, with subsequent
inactivation of one of the two centromeres. Kasai et al. (2000) showed
using FISH that the chromosomes underwent no gross alteration in
structure: The relative order of 38 cosmids derived from 2q12-2q14 was
the same on human chromosome 2 and the short arms of
chimpanzee chromosomes 12 and 13. Although sequence from the terminal
regions of chimpanzee chromosomes 12p and 13p is not yet available for
comparison with human 2q13-2q14.1, the human sequence is
very similar to two extant human subtelomeres (9pter and 22qter) (Fig.
3, Table 1). Very little, if any, distal material is unaccounted for in
the two comparisons. Although neither 9pter nor 22qter has been
sequenced into the telomeric arrays, the available sequences for these
chromosomes match 2qFus to within 21 kb and 1.4 kb of the array at the
fusion site, respectively, and PCR assays indicate that homology with
9pter extends to at least 8.4 kb from the array.
If the fusion occurred within the telomeric repeat arrays less than
~6 Mya, why are the arrays at the fusion site so degenerate? The
arrays are 14% diverged from canonical telomere repeats (not shown),
whereas noncoding sequence has diverged <1.5% in the ~6 million years since
chimpanzee and humans diverged (Chen and Li 2001) (Fig. 5). There are
three possible explanations: (1) Given the many instances of degenerate
telomeric arrays within the subtelomeric regions of human chromosomes
(Riethman et al. 2001), the chromosomes joined at interstitial arrays
near, but not actually at, their ends. In this case, material from the
very ends of the fusion partners would have been discarded. (2) The
arrays were originally true terminal arrays that degenerated rapidly
after the fusion. This high rate of change is plausible, given the
remarkably high allelic variation observed at the fusion site. The
arrays in the BAC and the sequence obtained by Ijdo et al. (1991)
differ by 12%, which is high even if some differences are ascribed to
experimental error. (3) Some array degeneracy could be a consequence of
sequencing errors. We have not been able to PCR successfully across the
fusion site, which would be required to assess the contribution of
sequencing errors to this measure of fusion-site sequence polymorphism.
However, explanation 2 is supported by the high variability among
allelic copies of other interstitial telomeric repeats and associated regions sequenced by Mondello et al. (2000) (AF236886 and AF236885). Considering the high mutability of interstitial telomere repeat arrays,
the fusion partners could have joined either within terminal or
subterminal arrays to form chromosome 2.
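The contrast drawn above between array degeneracy and the divergence of ordinary noncoding sequence can be made concrete with a rough calculation. The sketch below assumes a strictly linear molecular clock calibrated on the figures quoted in the text (at most 1.5% divergence in ~6 million years) and ignores multiple substitutions at a site, gene conversion, and rate variation. Because 1.5% is an upper bound, the resulting times are lower bounds. The constant and function names are hypothetical.

```python
# Calibration taken from the text: noncoding sequence has diverged <1.5%
# in the ~6 million years since human and chimpanzee diverged.
CALIBRATION_DIVERGENCE_PCT = 1.5
CALIBRATION_TIME_MY = 6.0


def divergence_to_time_my(divergence_pct):
    """Time (million years) implied by a pairwise divergence under a linear clock."""
    return divergence_pct * CALIBRATION_TIME_MY / CALIBRATION_DIVERGENCE_PCT


# Divergence values quoted in the text: the fusion-site arrays (14%) and
# several pairs of paralogous blocks (4%, 1.4%, 1.2%, 1.0%).
for pct in (14.0, 4.0, 1.4, 1.2, 1.0):
    print(f"{pct:5.1f}% divergence -> about {divergence_to_time_my(pct):.1f} My")
```

Under these assumptions, 14% divergence would correspond to roughly 56 million years at the neutral rate, far longer than the time available since the fusion, which is why a special explanation for the degeneracy of the arrays is needed.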
Segmental Duplications
By using PCR analyses of a hybrid panel, genomic sequence alignment,
and FISH, we demonstrate that 360 kb of the region
surrounding the fusion site is duplicated at least once elsewhere in
the genome. These paralogous segments are distributed primarily in
subtelomeric and pericentromeric locations, consistent with the
distribution of segmental duplications found in a recent whole genome
survey (Bailey et al. 2001) and earlier FISH analyses (Ijdo et al. 1991; Trask et al. 1993; Hoglund et al. 1995; Martin-Gallardo et al. 1995; Ning et al. 1996; Lese et al. 1999; Ciccodicola et al. 2000; Park et al. 2000; Bailey et al. 2002; Martin et al. 2002). Subtelomeric homology spans ~258 kb. The long blocks shared by 9pter or 22qter on
the proximal and distal side of the fusion site, respectively, account
for the bulk of this homology. In addition, highly dispersed, multicopy
blocks comprise the 68 kb directly surrounding the fusion site. These
blocks are relatively short and show 93%-99% identity to various
subtelomeres. This complex pattern of homology among present-day
subtelomeres and the fusion site indicates that various DNA segments
had duplicated among subtelomeric regions, including those of the
fusion partners, before the fusion took place (see following).
Very large segments of 2qFus also have homology with nontelomeric sites. These interstitial paralogs are less similar to the 2qFus sequence than are the subtelomeric paralogs (Table 1) and presumably result from earlier duplication events (see following). The fusion region and 2q11.2 share at least 100 kb as the result of large intrachromosomal duplications. Intrachromosomal duplications have also generated at least three large interstitial blocks of homology on chromosome 9, in addition to 9pter.
Many other cross-hybridizing sites were observed in the genomes of nonhuman primates (Fig. 2), reflecting the evolutionary mobility of sequences homologous to the region surrounding the fusion site.
The size and high similarity of these duplications have been
problematic for the automated assembly of human genome sequence across
these regions. For example, RP11-15J10 has migrated among 2q11.2, 9q13,
9p23, 9q21, and 9q12 in various versions of the genome assembly. It
contains Sequence-tagged Sites (STSs) that have been
assigned to chromosomes 2, 9, 7, and X, but none is single copy in the
genome. Based on our FISH and hybrid-panel results, this clone most
likely derives from 9p11.2. RP11-143M1 has migrated from 9q13, to
9p22, to 9pter, its true location. The numbers in Table 1 provide
justification for some of this confusion: 9pter and 9q13 are ~98%
identical over a span of >150 kb. We have no explanation for why
several 2qFus-related clones have been assigned at one time or another
to 9p22-9p23 (RP11-15J10, 403A15, 143M1, and 174M15); none
produces a FISH signal there. Unfortunately, some of these localization
errors have been propagated in publications (e.g., Mah et al. 2001),
which augments the confusion. These examples and the deceptive Y-hybrid
results we encountered illustrate the need for multiple mapping methods
to address the challenges encountered in the study of segmental duplications.
The History of the Paralogous Sequences
Large Duplications and Pericentromeric Inversions
Based on our sequence comparisons, the oldest event involving 2qFus paralogous blocks was the duplicative exchange between 2q11.2-A and the progenitor of the centromere-proximal side of the fusion site (Fig. 5). These sequences have since diverged by at least 4%. The FISH results indicate that, at the time of this duplication, both regions were located on the p arm of the ancestral chromosome that was later to be a fusion partner. If there has been no ectopic recombination or gene conversion between these two regions since the original duplication, and the two copies have diverged at a rate typical for the hominid noncoding DNA (Chen and Li 2001
150-kb
block from what are now 9q13 and the ancestor of 9pter or
the 2q-forming chromosome. Several lines of evidence indicate that the
ancestral copy of this block is now in 9q13. The transfer of material
from 9q13 to 2qFus/9pter disrupted an L1PBa element, the
PGM5 gene (Fan et al. 2002
42-kb segment, and the other a 110-kb segment. (We surmise that these blocks derive from 9p11.2, but they may represent additional copies from 9q13, as FISH signals are equally bright in
9p11.2 and 9q13.) These blocks adjoin in the 9q13 sequence, but are
distinct in the regions represented by the 9p11.2-A and -B sequences
(Fig. 3). The blocks could have transposed independently, or together
and then been separated by the insertion of other material. Assuming
that there has been no further exchange between the blocks in these two
bands, the degree of their divergence (1.0% and 1.2%) also dates the
duplication(s) soon after the human-chimpanzee split (Fig. 5). After
human and chimpanzee diverged, the human 9q12 heterochromatic region
expanded, placing the 9q13 paralogous segment much further from the
centromere on the human chromosome than the chimpanzee ortholog.
Chromosome 9 also underwent a second inversion along the chimpanzee
branch after the chimpanzee-human split. Although one inversion
breakpoint maps at cytogenetic resolution close to the 9p11.2 paralog,
this sequence is unlikely to be involved in the rearrangement, because
it appears at this location only on human chromosome 9. These
rearrangements may explain why Martin et al. (2002)
Subtelomeric Exchanges
Our study also adds to the complex picture of interchromosomal subtelomeric duplications. Duplications among subtelomeres generated a block common to the chromosome destined to become human 2p and the ancestor of 9pter, and another block common to the chromosome destined to become human 2q and the ancestor of 22qter. Although we are not able to infer the direction of the 9pter-2qFus transfer from the available information, 22qter represents the ancestral state and 2qFus the derived state: The breakpoint is marked by an isochore transition in 2qFus, not 22qter, and the ACR gene is intact in 22qter, but truncated in 2qFus (Fan et al. 2002
that the duplication from 22qter to the 2qFus
ancestor occurred just before the human-chimpanzee-gorilla split, the
two blocks must have undergone homogenizing ectopic exchanges at least
up until the fusion event to reconcile the fact that these sequences are now only 1.4% different. The fact that the blocks in 2qFus and
22qter now carry the same AluYb8 insertion is strong evidence that
these blocks exchanged sequence since humans and chimpanzee diverged.
Members of the AluYb8 family have been actively retrotransposing only
since human-chimpanzee divergence and occur almost exclusively in the
human genome (Carroll et al. 2001).
Breakpoints
Are there sequences at the breakpoints of homology blocks that might shed light on the duplication and exchange processes that have acted on these regions? Half of the breakpoints defining the pairs of major paralogous blocks in Table 1 can be pinpointed at the sequence level because sequence of both partners is available where the homology breaks down. We observe no element that is common to the available breakpoints of paralogous segments. Several occur in LINE elements, and others are in nonrepeat sequences that have no homology with each other. Four breakpoints appear to occur within a common L1PBa element, but all concern the same event: the displacement of 9q13 homology in the ancestor of 2qFus and 9pter by sequences common to multiple telomeres (see also Fan et al. 2002).
Paralogy and (Deleterious) Rearrangements
We have provided two examples in which blocks paralogous to the
fusion site are potentially involved in gross chromosomal rearrangements that have occurred during hominid evolution. These large
blocks of homology may also sporadically mediate gross rearrangements in humans. Bands 9p11.2 and 9q13 contain the breakpoints of common pericentromeric inversion polymorphisms in humans (Samonte et al. 1996). The highly similar blocks identified here in these bands (40-kb blocks of 99% identity) could mediate homologous recombination and cause some of these inversions. This hypothesis could
be tested by comparing the sequence of the common and inverted forms of chromosome 9; the breakpoints should map within the paralogous blocks.
In addition, we would expect the two interacting blocks to lie in
opposite orientation on the chromosome. The tentative orientations of
the blocks in 9q13 and 9p11.2-A (Fig. 3) are consistent with this
expectation. These blocks may also be involved in the formation of a
dicentric chromosome 9 with tandem head-to-tail duplication of the
9p11-q13 region reported by Lukusa et al. (2000). Further analyses
will be needed to determine if these blocks of homology bound the
duplicated segments.
The 2qFus-paralogous blocks are also good candidates for involvement in recombination events that cause other de novo rearrangements of human chromosomes. For example, a deletion of the material between 2q11 and 2qFus has been noted in a patient with acute myeloid leukemia in the Mitelman catalog of chromosome abnormalities (http://cgap.nci.nih.gov/chromosomes/CytSearchForm). The catalog also contains at least 60 cases from a wide variety of tumors in which one of the bands containing 2qFus paralogy is joined to unidentified material to form an unbalanced rearrangement.
It has also been suggested that interstitial telomeric sequences are sites of preferential chromosome breakage, amplification, and recombination (Bertoni et al. 1994; Boutouil et al. 1996; Slijepcevic et al. 1996; Simi et al. 1998; Desmaze et al. 1999). Some internal telomeric repeats map at cytogenetic resolution together with mapped fragile sites (Musio and Mariani 1999). The inverted telomeric repeat array was a candidate for the FRA2B fragile site, which is located in 2q13 (Williams and Howell 1977; Sutherland and Mattei 1987), but published data (Ijdo et al. 1992) and our own experiments (CF, YF, and BT; unpublished results) show that the FRA2B site maps proximal to the 614-kb region described here.
In the accompanying paper (Fan et al. 2002), we characterize 11 genes
within the 2qFus sequence. As a consequence of the various intra- and
interchromosomal duplications documented here, 9 of these genes are
present in the human genome in more than one copy. Thus, in addition to
their historical contributions to the gross structural changes among
hominid chromosomes and possible involvement in chromosomal
rearrangements in humans, duplications and rearrangements of
2qFus-paralogous blocks also have functional relevance.
| |
METHODS |
|---|
|
|
|---|
Database Mining and Sequence Analyses
The sequence of RP11-395L14 served as the entry point for this study. Homologous sequences were obtained iteratively from GenBank (Benson et al. 2002) by BlastN (http://www.ncbi.nlm.nih.gov/BLAST/). Finished sequences from different clones were assembled into the same contig only if their overlap was contiguous and at least 99.7% identical, a reasonable allowance for polymorphism and sequencing errors. We screened for interspersed repeats with RepeatMasker (http://ftp.genome.washington.edu/cgi-bin/RepeatMasker). We used GESTALT (http://bioinformatics.weizmann.ac.il/GESTALT/) (Glusman and Lancet 2000) to characterize GC and repeat content. Pairwise alignments were performed with BLAST2, without repeat masking and with gap-initiation and gap-extension parameters adjusted to minimize breaking long matches into pieces at sites of insertions/deletions. Percent identities of paralogous blocks were calculated from the BLAST2 output as follows (Linardopoulou et al., in prep.). First, the number of nucleotide substitutions between the two sequences was counted and divided by the number of aligned bases (both numbers exclude gaps). This observed proportion (p) was entered in the Jukes-Cantor equation to estimate K, the number of nucleotide substitutions per site. The Jukes-Cantor equation takes into account that multiple substitutions might have occurred at the same site and is as follows: K = -(3/4) ln [1 - (4p/3)] (Jukes and Cantor 1969). The percent identity of the aligned sequences is therefore 100% * (1 - K). The number and size of gaps in alignments caused by insertions and deletions were also extracted from the BLAST2 output. The assembled 2q13-q14.1 sequences are available from our Web site (http://www.fhcrc.org/labs/trask/subtelomeres/index.html).
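The percent-identity calculation described above can be illustrated with a minimal Python sketch. It is not the code used in this study: the function names and the example counts are hypothetical, and the sketch assumes that the substitution and aligned-base counts (both excluding gaps) have already been extracted from a pairwise alignment.

```python
import math


def jukes_cantor_k(substitutions: int, aligned_bases: int) -> float:
    """Estimate K, the number of substitutions per site, from the observed
    proportion p of differing sites, using K = -(3/4) ln[1 - (4p/3)].

    Both counts are assumed to exclude alignment gaps.
    """
    if aligned_bases <= 0:
        raise ValueError("aligned_bases must be positive")
    if not 0 <= substitutions <= aligned_bases:
        raise ValueError("substitutions must lie between 0 and aligned_bases")
    p = substitutions / aligned_bases
    # The correction is undefined once p reaches 3/4 (saturation).
    if p >= 0.75:
        raise ValueError("p >= 0.75: Jukes-Cantor distance is undefined")
    return -0.75 * math.log(1.0 - (4.0 * p / 3.0))


def corrected_percent_identity(substitutions: int, aligned_bases: int) -> float:
    """Percent identity after correction for multiple hits: 100% * (1 - K)."""
    return 100.0 * (1.0 - jukes_cantor_k(substitutions, aligned_bases))


if __name__ == "__main__":
    # Hypothetical counts: 1,400 substitutions over 100,000 aligned bases (p = 0.014).
    k = jukes_cantor_k(1400, 100000)
    print(f"K = {k:.5f}")
    print(f"corrected identity = {corrected_percent_identity(1400, 100000):.3f}%")
```

Because the correction allows for multiple substitutions at the same site, K is always slightly larger than the observed proportion p, so the corrected identity is slightly lower than the raw identity.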
Monochromosomal Hybrid Panel Analyses
Forty-eight PCR assays were designed across the 614-kb assembled 2q13 sequence by Primer 3 (http://www.genome.wi.mit.edu/cgi-bin/primer/primer3.cgi) (Supplementary Table A, also available online at http://www.genome.org). None amplified a product of the predicted size from control rodent cell lines. The PCR reactions contained 80-100 ng of DNA from the NIGMS Human Genetic Cell Repository Somatic Cell Hybrid Mapping Panel #2 (version 3, Coriell Cell Repository), 250 µM deoxyribonucleoside triphosphates (dNTPs), 0.4 µM each primer, and 1 unit Perkin Elmer AmpliTaq Gold. Cycling conditions were 95°C for 5 min, 35 cycles of 30 sec at 94°C and 1 min at 60°C, followed by 10 min at 60°C. The products were analyzed on ethidium-bromide-stained 1% agarose gels.
Fluorescence In Situ Hybridization (FISH)
Metaphase spreads were prepared from the following cells or cell lines using published procedures (Trask 1999) for FISH analyses: peripheral blood cells from various healthy human donors and human male cell line CGM1; a male chimpanzee (Pan troglodytes) cell line CRL-1857 from ATCC; a male gorilla (Gorilla gorilla) cell line CRL-1854 from ATCC; and a female orangutan (Pongo pygmaeus) cell line CRL-1850 from ATCC. DNAs from BAC clones were biotinylated by nick translation and hybridized to metaphase cells fixed on slides. Methods for preparation of the slides and probe, hybridization, washing, detection with FITC, fluorescent banding, and analysis are described elsewhere (Trask 1999). We also used FISH techniques for conventional and reciprocal chromosome painting as described elsewhere (Trask et al. 1991; Trask 1999) to identify human chromosomal material contained in the Y hybrid.
| |
WEB SITE REFERENCES |
|---|
|
|
|---|
http://bioinformatics.weizmann.ac.il/GESTALT; GESTALT.
http://cgap.nci.nih.gov/chromosomes/CytSearchForm; Mitelman catalog.
http://ftp.genome.washington.edu/cgi-bin/RepeatMasker; RepeatMasker.
http://genome.ucsc.edu/; UCSC Human Genome Working Draft.
http://www.fhcrc.org/labs/trask/subtelomeres/index.html; Trask laboratory Web site for supplementary information.
http://www.genome.washington.edu/phrap_documentation.html; cross_match.
http://www-genome.wi.mit.edu/cgi-bin/primer/primer3.cgi; Primer3.
http://www.jgi.doe.gov; DOE Joint Genome Institute.
http://www.ncbi.nlm.nih.gov; NCBI genome resources.
http://www.sanger.ac.uk/HGP/; Sanger Centre.
| |
ACKNOWLEDGMENTS |
|---|
We are grateful to Tera Newman for help with the figures and Mitelman catalog queries, Colbey Harris for administrative assistance, and Janet Young and other members of the Trask lab for discussion and comments on earlier drafts of the manuscript. Some information for this paper was derived through use of the Celera Discovery System and Celera Genomics' associated databases. This work was supported in part by grant GM57070 from NIH.
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
| |
FOOTNOTES |
|---|
1 Corresponding author.
E-MAIL btrask{at}fhcrc.org; FAX (206) 667-4023.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.337602.
| |
REFERENCES |
|---|
|
|
|---|