Vol. 12, Issue 1, 1-2, January 2002
COMMENTARY
| |
ARTICLE |
|---|
Whereas the role of genome duplication(s) in yeast and plants has been widely accepted, the hypothesis of genome duplication in early vertebrates (Ohno 1970) remains controversial (Wolfe 2001). According to the current version, the 2R model, there were two rounds of polyploidization: one occurring before the divergence of jawless vertebrates and the other just after (Sidow 1996). Recently, doubt has been raised about the 2R model because the evidence was found to be weaker than previously thought (Wolfe 2001).
For the proponents of the 2R model, the weakness of the evidence may be explained by a combination of rapid gene deletion, sequence diversity, and chromosome rearrangement (Nadeau and Sankoff 1997; Wang and Gu 2000). For the opponents, however, the lack of strong evidence is sufficient to refute the 2R model by applying "Ockham's Razor" (Hughes et al. 2001; Makalowski 2001).
Alternatively, a model of small-scale tandem duplications (TDs) followed by translocations was invoked (Hughes et al. 2001). Moreover, Hughes et al. (2001) used parsimony to test whether the TD hypothesis is "better" than the 2R hypothesis. The basic procedure is to infer the minimum number ($G$) of genetic events needed to explain the current distribution of the genes on human chromosomes under each competing model. Here the genetic events include gene duplications ($D$), losses ($L$), and translocations ($T$), that is, $G_M = D + L + T$, where the subscript $M$ = 2R for the 2R model or TD for the TD model. Under this parsimony criterion, the TD hypothesis is favored if $G_{TD} < G_{2R}$; otherwise, the 2R model is favored. After examining 20 vertebrate gene families, Hughes et al. (2001) showed that in 14 cases the TD hypothesis was more parsimonious than the 2R hypothesis.
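As a minimal illustration of the counting rule above, the sketch below tallies $G_M = D + L + T$ for two competing models and reports which one the criterion favors. The `EventCounts` class, the `favored_model` helper, and the event counts in the usage lines are hypothetical and are not taken from Hughes et al. (2001).

```python
from dataclasses import dataclass


@dataclass
class EventCounts:
    """Minimum numbers of inferred events under one model (hypothetical container)."""
    duplications: int
    losses: int
    translocations: int

    def total(self) -> int:
        # G_M = D + L + T
        return self.duplications + self.losses + self.translocations


def favored_model(td: EventCounts, two_r: EventCounts) -> str:
    """Return "TD" if G_TD < G_2R; otherwise "2R" (ties go to 2R, as in the text)."""
    return "TD" if td.total() < two_r.total() else "2R"


# Made-up counts for a single gene family, for illustration only.
td_counts = EventCounts(duplications=3, losses=0, translocations=3)     # G_TD = 6
two_r_counts = EventCounts(duplications=2, losses=5, translocations=0)  # G_2R = 7
print(favored_model(td_counts, two_r_counts))
```

Note that the comparison weights every event equally, which is the assumption examined in the next paragraph.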
It should be noted that any test based on parsimony has assumptions. The test of Hughes et al. (2001) is valid only if these genetic events, that is, gene duplication, loss, and translocation, occurred at approximately the same evolutionary rate. If so, the model with the smaller $G_M$ value is the one more likely to be true. Without reliable data, however, it is difficult to test whether this assumption holds. Instead, we adopt a test-data approach, that is, we apply the test to genome sequence data for which genome duplication is almost uncontested. We found that the Arabidopsis genome is suitable for this purpose (The Arabidopsis Genome Initiative 2000; Blanc et al. 2000).
Vision et al. (2000) conducted a genome-wide search, resulting in 103 paralog blocks (http://www.igd.cornell.edu/~tvision.arab). Each paralog block has two copies that are located in different chromosomal regions. A duplicate gene pair appears in both copies, whereas a singleton gene appears in only one of them. For most paralog blocks, the number of singleton genes ($S$) is much larger than that of duplicate pairs ($x$). Let $n = S + x$ be the total number of predicted ancestral genes (Vision et al. 2000). Thus, the retention frequency $q = x/n$ provides an estimate of the survival rate of both duplicate genes in a paralog block.
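The retention frequency can be computed directly from the two counts. The function name below is an illustrative assumption; the usage line takes the figures quoted in this commentary for block 10 (254 ancestral genes, 52 of them paired), which implies 202 singletons.

```python
def retention_frequency(pairs: int, singletons: int) -> float:
    """Estimate q = x / n, where n = S + x is the number of ancestral genes."""
    n = singletons + pairs
    if n == 0:
        raise ValueError("a paralog block must contain at least one ancestral gene")
    return pairs / n


# Block 10: x = 52 duplicate pairs, S = 254 - 52 = 202 singleton genes.
q_block10 = retention_frequency(pairs=52, singletons=202)
print(round(q_block10, 3))
```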
Under the model of block duplication (BD), the paralog block was generated by the segmental duplication of one chromosome. Singleton genes within the paralog block are the consequence of gene deletion (Fig. 1A). Some of them may have been translocated from other regions (after duplication), but the count would not be affected. The total number of genetic events for a block is then $G_{BD} = (1-q)n + 1$, that is, the total number of gene losses, $(1-q)n$, plus one BD event.
Under the TD model, gene pairs in the paralog blocks were generated via TDs followed by translocations (Fig. 1B). Because there are $qn$ gene pairs, each of which requires two events (one duplication and one translocation), the total number of genetic events is $G_{TD} = 2qn$. Then, the parsimony test uses the difference
$$\delta = G_{BD} - G_{TD} = (1 - 3q)\,n + 1 \tag{1}$$
Here $\delta > 0$ favors TDs, and $\delta < 0$ favors BD. The sampling variance of $\delta$ is given by $\mathrm{Var}(\delta) = 9n^2\,\mathrm{Var}(q)$, where $\mathrm{Var}(q) = q(1-q)/n$ under the binomial distribution. The statistical significance of rejecting the null hypothesis $\delta = 0$ ($G_{BD} = G_{TD}$) is assessed approximately by the standard z-test.
We have computed $q$ and the difference $\delta = G_{BD} - G_{TD}$ for the 103 paralog blocks (Fig. 2). Surprisingly, the majority (94) of paralog blocks have $\delta > 0$, indicating that the TD model is favored. For instance, block 10 has two homologous regions located on chromosomes 1 and 2, respectively (Vision et al. 2000). There are 254 ancestral genes, among which 52 are paired, resulting in $q = 0.205$, $\delta = 99$, and $z = 5.13$ ($p < 0.01$). In total, 68 paralog blocks show $\delta > 0$ significantly, whereas two blocks show $\delta < 0$ significantly ($p < 0.05$, z-test).
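The per-block test can be sketched as follows, taking the difference to be G_BD minus G_TD with G_BD = (1 - q)n + 1 and G_TD = 2qn, and the binomial variance Var(q) = q(1 - q)/n. The function name and return layout are illustrative assumptions, and the sketch assumes 0 < x < n so that the variance is nonzero. The usage line reproduces the block 10 figures quoted above.

```python
import math


def block_parsimony_test(n: int, x: int) -> tuple[float, int, float]:
    """Return (q, delta, z) for one paralog block.

    n: number of predicted ancestral genes; x: number of duplicate pairs.
    Assumes 0 < x < n, otherwise Var(delta) is zero and z is undefined.
    """
    q = x / n
    g_bd = (n - x) + 1          # (1 - q) * n gene losses plus one block duplication
    g_td = 2 * x                # 2 * q * n: one duplication and one translocation per pair
    delta = g_bd - g_td         # equals (1 - 3q) * n + 1
    var_delta = 9 * n**2 * (q * (1 - q) / n)
    z = delta / math.sqrt(var_delta)
    return q, delta, z


# Block 10: 254 ancestral genes, 52 of them paired.
q, delta, z = block_parsimony_test(n=254, x=52)
print(round(q, 3), delta, round(z, 2))  # 0.205 99 5.13
```

A positive `delta` means the TD model needs fewer events than the BD model for that block.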
If all 103 duplicated blocks are the result of $m$ rounds of genome duplication, the total number of genetic events is $m$ duplication events plus the sum of gene losses over the 103 blocks, that is, $G_R = m + \sum_i (1-q_i)\,n_i$, where $q_i$ and $n_i$ are the retention frequency and the number of ancestral genes in block $i$, respectively. Note that $m$ ranges from 1 (The Arabidopsis Genome Initiative 2000) to 5 (Vision et al. 2000). Because under the TD model the total number of genetic events (duplication + translocation) over all blocks is $G_{TD} = \sum_i 2q_i n_i$, the difference $\delta_R = G_R - G_{TD}$ turns out to be
$$\delta_R = m + \sum_i (1 - 3q_i)\,n_i \tag{2}$$
with $\mathrm{Var}(\delta_R) = 9\sum_i n_i^2\,\mathrm{Var}(q_i)$. From Vision et al. (2000), $\sum_i n_i = 11847$ and $\sum_i q_i n_i = 2794$, resulting in $\delta_R = 3465 + m$ and $\mathrm{Var}(\delta_R) = 151.84$. Thus, for $m$ = 1-5, $z$ = 281.3-281.6, which means that $\delta_R > 0$ is highly significant ($p < 10^{-5}$), and the TD model is strongly favored.
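The genome-wide arithmetic can be checked with a short sketch. It uses only the two sums quoted above (11847 ancestral genes and 2794 duplicate pairs over all blocks) and takes the variance 151.84 as reported rather than recomputing it, because the per-block values are not listed in the text. The function names are illustrative assumptions.

```python
import math


def genome_wide_delta(sum_n: int, sum_pairs: int, m: int) -> int:
    """delta_R = G_R - G_TD = m + sum_i (1 - 3 q_i) n_i = m + sum_n - 3 * sum_pairs."""
    return m + sum_n - 3 * sum_pairs


def z_score(delta: float, variance: float) -> float:
    """Standard z statistic for the null hypothesis delta = 0."""
    return delta / math.sqrt(variance)


SUM_N = 11847          # sum of n_i over the 103 blocks (quoted in the text)
SUM_PAIRS = 2794       # sum of q_i * n_i over the 103 blocks (quoted in the text)
REPORTED_VAR = 151.84  # Var(delta_R) as reported, not recomputed here

for m in range(1, 6):
    d = genome_wide_delta(SUM_N, SUM_PAIRS, m)
    print(m, d, round(z_score(d, REPORTED_VAR), 1))
```

For m = 1 this gives 3466 and z of about 281.3, and for m = 5 it gives 3470 and z of about 281.6, matching the range stated above.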
In summary, when the parsimony test of Hughes et al. (2001) is applied to the Arabidopsis genome sequence data, the TD model is statistically superior to the BD model or the genome duplication model. However, this inference conflicts with the substantial evidence supporting genome (block) duplication(s) in Arabidopsis (Vision et al. 2000). This dilemma is probably due to the fast rate of gene loss after gene (genome) duplication (note that the mean of $q$ is 0.23). In yeast, only ~15% of duplicate pairs were maintained after the genome duplication (Wolfe 2001). Some theoretical models predict that the rate of gene loss should be at least an order of magnitude higher than the rate at which both duplicates survive (Ohta 1988; Walsh 1995). In addition, the blocks with $\delta < 0$ (BD favored) are generally those with the highest retention frequency (Fig. 2).
We conclude that the evolutionary trajectory of gene duplication, loss, and translocation may not follow the parsimony principle formulated by Hughes et al. (2001). Therefore, the potential for misleading results should be fully recognized when the parsimony test (Hughes et al. 2001) is used to test the 2R model in vertebrates. Of course, the parsimony test is only one of the anti-2R arguments in Hughes et al. (2001), so the debate is not over yet.
| |
ACKNOWLEDGMENTS |
|---|
This work is supported by the NIH grant RO1 GM62118 to Xun Gu.
| |
FOOTNOTES |
|---|
1 Corresponding author.
E-MAIL xgu{at}iastate.edu; FAX 515-294-8457.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.214402.
Received September 12, 2001; accepted in revised form November 2, 2001.