Genome Res. 13:1638-1645, 2003
©2003 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/03 $5.00
Letter
Divergence in the Spatial Pattern of Gene Expression Between Human Duplicate Genes
Kateryna D. Makova¹ and Wen-Hsiung Li²
Department of Ecology and Evolution, University of Chicago, Chicago,
Illinois 60637, USA
ABSTRACT
Microarray gene expression data provide a wealth of information for
elucidating the mode and tempo of molecular evolution. In the present study, we
analyze the spatial expression pattern of human duplicate gene pairs by using
oligonucleotide microarray data and study the relationship between coding
sequence divergence and expression divergence. First, we find a strong positive
correlation between the proportion of duplicate gene pairs with divergent
expression (as presence or absence of expression in a tissue) and both
synonymous (KS) and nonsynonymous divergence
(KA). The divergence of gene expression between human
duplicate genes is rapid, probably faster than that between yeast duplicates
in terms of generations. Second, we compute the correlation coefficient
(R) between the expression levels of duplicate genes in different
tissues and find a significant negative correlation between R and
KS. There is also a negative correlation between
R and KA when KA ≤ 0.2.
These results indicate that protein sequence divergence and divergence of
spatial expression pattern are initially coupled. Finally, we compare the
functions of those duplicate genes that show rapid divergence in spatial
expression pattern with the functions of those duplicate genes that show no or
little divergence in spatial expression.
Ever since Ohno (1970), the
evolution of duplicate genes has been a subject of extensive theoretical
modeling and empirical research. Lately, there has been much interest in
whether a positive correlation exists between coding region divergence and
gene expression divergence. In particular, two recent studies
(Wagner 2000; Gu et al. 2002b) used yeast
microarray data to test for such a correlation on a genome-wide
scale. Wagner (2000) explored
the relationship between protein sequence divergence and mRNA expression
divergence among 144 yeast duplicate genes. The expression was measured at
multiple time points in four physiological processes. No significant
correlation was observed, implying decoupling of coding sequence (CDS)
divergence and expression divergence. Gu et al. (2002b) investigated expression
divergence in a larger sample of yeast duplicate genes (400 pairs) and used
the microarray expression data from 14 processes. The expression divergence
between duplicate genes was significantly correlated with their synonymous
divergence (KS) and with their nonsynonymous divergence
(KA) when KA ≤ 0.30, contrary to
the conclusion of Wagner (2000).
In the present study, we investigate the relationship between CDS
divergence and spatial expression divergence among human duplicate genes
(paralogs). To our knowledge, this is the first study that uses microarray
data to analyze the evolution of human gene expression on a genome-wide scale.
Specifically, we focus on the following questions: (1) how quickly do human
paralogs diverge in their expression; (2) does expression divergence increase
with gene sequence divergence, that is, evolutionary time; (3) what are the
functions of gene pairs with rapid divergence in expression; and (4) does the
present study of spatial expression of human paralogs support the conclusion
drawn from the study of temporal expression of yeast paralogs
(Gu et al. 2002b)? It is
believed that transcription regulation is more complex in mammals than in
lower eukaryotes such as yeast
(Huang et al. 1999). We intend
to explore whether this has any implications for the tempo of gene expression
evolution. The studies on yeast investigated temporal expression only, as it
is difficult to study spatial expression in a single-celled organism. Because
there are no comprehensive data on temporal gene expression in humans
(Ly et al. 2000; Cho et al. 2001), we used the
data of Su et al. (2002), who
generated a spatial gene expression profile for human genes by using the U95A
oligonucleotide array (Affymetrix). It is the largest study of spatial
(tissue) expression of human genes available to date.
RESULTS
Identification of Duplicate Genes
The human U95A oligonucleotide array contains 12,387 probes. These include
probes from 7565 human genes with annotated CDSs in GenBank. The other probes
predominantly correspond to ESTs, which were not used in this study. Duplicate
genes with annotated CDSs were identified and grouped into multigene families
by using a rigorous method developed by Gu et al.
(2002a; see Methods). From this
analysis, we estimated that the U95A array contains 875 multigene
families.
A total of 1404 independent duplicate gene pairs were selected for further
analysis, and the KS and KA divergences
between duplicate genes were calculated. The expression data for these gene
pairs in 25 independent and nonredundant tissues were retrieved from
Su et al. (2002).
Proportion of Gene Pairs With Diverged Expression Increases With
Time
To study the dynamics of spatial expression divergence, we calculated the
proportion of gene pairs with diverged expression among all pairs duplicated
at approximately the same time, that is, having the same
KS value. This analysis was limited to 1230 gene pairs for
which at least one member of a pair is expressed in at least one tissue (for
the definition of a gene being expressed, see Methods). Two duplicate genes
are said to have diverged expression in a particular tissue if one gene is
expressed in that tissue and the other is not. We used two definitions of gene
expression divergence. In the first one, a gene pair is said to have diverged
in expression if it shows diverged expression in at least one of the tissues
studied. In the second definition, a gene pair is said to have diverged in
expression if it shows diverged expression in at least two of the
tissues studied. The latter definition is more robust against errors in
microarray typing. Both definitions are conservative because they exclude
cases in which both genes are expressed, in which both genes are not
expressed, or in which one is expressed (or not expressed) and the other is
marginally expressed. These definitions are also conservative in the sense that
they do not take into account quantitative differences in expression. Thus,
they underestimate the divergence in expression. However, they highlight the
evolution of tissue-specific expression. The measure that takes into account
the quantitative differences in expression is described in the next
section.
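The two presence/absence definitions can be made concrete with a short sketch. It is an illustration, not the authors' code: it assumes per-tissue average difference (AD) values already averaged over replicates and uses the AD thresholds given in Methods; the function names are hypothetical.

```python
from typing import Sequence

# Calls follow the thresholds in Methods: AD > 200 means expressed,
# AD < 100 means not expressed, and anything in between is marginal.
# How values of exactly 100 or 200 were treated is not stated; here they
# fall into the marginal class (an assumption).
EXPRESSED_MIN = 200.0
ABSENT_MAX = 100.0


def call_expression(ad: float) -> str:
    """Classify one average-difference (AD) value for one gene in one tissue."""
    if ad > EXPRESSED_MIN:
        return "expressed"
    if ad < ABSENT_MAX:
        return "absent"
    return "marginal"


def diverged_tissue_count(ad_gene1: Sequence[float],
                          ad_gene2: Sequence[float]) -> int:
    """Count tissues in which one paralog is expressed and the other is absent.

    Tissues where both genes are expressed, both are absent, or either call
    is marginal are not counted, which is what makes the measure conservative.
    """
    if len(ad_gene1) != len(ad_gene2):
        raise ValueError("both genes need one AD value per tissue")
    count = 0
    for a, b in zip(ad_gene1, ad_gene2):
        if {call_expression(a), call_expression(b)} == {"expressed", "absent"}:
            count += 1
    return count


def has_diverged(ad_gene1: Sequence[float], ad_gene2: Sequence[float],
                 min_tissues: int = 1) -> bool:
    """First definition: min_tissues=1; second, more robust definition: 2."""
    return diverged_tissue_count(ad_gene1, ad_gene2) >= min_tissues
```

For example, `diverged_tissue_count([850, 40, 300, 150], [900, 500, 20, 60])` returns 2: the second and third tissues are discordant, the first is concordant, and the fourth is skipped because 150 is a marginal call.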
First, we used KS as a proxy of divergence time. A high
positive correlation is observed between the
proportion of gene pairs with diverged expression and KS
(Fig. 1A). This is true for the
proportion of gene pairs with diverged expression in at least one tissue and in at
least two tissues. Strikingly, 73.3% of the gene pairs with an average
KS of only 0.064 already have diverged in expression in at
least one tissue, whereas 56.7% of these gene pairs have diverged in expression in
at least two tissues. These percentages increase to 90.0% and 73.3%,
respectively, for gene pairs with an average KS of 1.2.
Thus, rapid divergence in spatial expression pattern is observed between
duplicate genes. The relationship between divergence time (measured by
KS) and the proportion of gene pairs with diverged
expression is approximately linear.
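Each point in Fig. 1 summarizes a bin of pairs with similar divergence (30 pairs per point for KS, 60 for KA). A minimal sketch of that binning follows; the input format, the function name, and the handling of an incomplete last bin are assumptions, since none of them is specified in the text.

```python
from statistics import mean


def binned_proportions(pairs, bin_size=30):
    """Summarize pairs as (mean divergence, proportion diverged) points.

    pairs: iterable of (divergence, diverged) tuples, where divergence is
    KS (or KA) and diverged is a bool from a presence/absence definition.
    Pairs are sorted by divergence and cut into consecutive bins of
    bin_size; an incomplete final bin is dropped (an assumption).
    """
    ordered = sorted(pairs, key=lambda p: p[0])
    points = []
    for start in range(0, len(ordered) - bin_size + 1, bin_size):
        chunk = ordered[start:start + bin_size]
        mean_div = mean(div for div, _ in chunk)
        frac = sum(1 for _, diverged in chunk if diverged) / bin_size
        points.append((mean_div, frac))
    return points
```

Fitting a straight line through the returned points gives the kind of regression line drawn in Fig. 1.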

Figure 1 The relationship between sequence divergence and the proportion of human
gene pairs with diverged expression. (A) Synonymous divergence
(KS) is used to represent sequence divergence. Each point
represents 30 gene pairs. (B) Nonsynonymous divergence
(KA) is used to represent sequence divergence. Each point
represents 60 gene pairs. Solid diamonds represent the proportion of
gene pairs with diverged expression in at least one tissue, and open
diamonds represent the proportion of gene pairs with diverged expression in
at least two tissues. Solid and dashed lines are the corresponding linear
regressions.
A statistically significant positive correlation is observed between
KA and the proportion of gene pairs with diverged
expression in either at least one or in at least two tissues
(Fig. 1B). However, the
correlation coefficient is smaller than the one observed when
KS is used, probably because KS is a better
proxy of evolutionary time (see Discussion). Again, divergence in gene
expression occurs very rapidly. Indeed, at an average KA
of 0.044, 78.3% of gene pairs have diverged in expression in at least one
tissue, and 60% of them have diverged in expression in at least two tissues.
The proportion of genes with diverged expression increases rapidly and reaches
a plateau at KA = 0.2. At an average
KA of 0.212, almost all gene pairs (98.3%) have diverged
in expression in at least one tissue, and 88.3% of gene pairs have diverged in
expression in at least two tissues. Thus, even when we used
KA as a proxy of evolutionary time, we observed rapid
divergence in gene expression among duplicate genes and a significant
correlation between KA and the proportion of gene pairs
with diverged expression.
Correlation Between CDS Divergence and Expression Divergence
Another way of measuring similarity in expression pattern between two genes
is to compute the Pearson correlation coefficient (R) between the
expression levels of two genes over the tissues studied. As will be explained
in the Discussion, this measure is less desirable than that described above,
but it can also give insights into the dynamics of gene expression divergence.
First, we considered cases in which both copies of a gene pair were expressed
in at least five of the tissues studied. Only 269 gene pairs (group A)
satisfied this criterion. Next, we considered cases in which at least one of
the two copies was expressed in at least five of the tissues studied. This
adds some noise to the calculation of R; however, it allows us to
increase the sample size. A total of 895 gene pairs were selected originally,
but later only 841 gene pairs (group B) were retained for the final analysis
because in the other 54 gene pairs only one gene of the pair was expressed,
resulting in R = 0. We applied the transformation ln[(1 +
R)/(1 - R)] and then carried out ordinary linear
regression of the transformed R on KS (or
KA).
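The correlation and transformation steps can be sketched with the standard library only; the function names are hypothetical. Note that ln[(1 + R)/(1 - R)] equals 2·artanh(R), that is, twice Fisher's z, so it is undefined at R = ±1. Pearson's R is itself undefined when one gene has constant expression across the tissues considered, so such pairs need separate handling.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length vectors with n >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


def transform_r(r: float) -> float:
    """ln[(1 + R)/(1 - R)], identical to 2 * atanh(R)."""
    if not -1.0 < r < 1.0:
        raise ValueError("transform is undefined for |R| >= 1")
    return math.log((1.0 + r) / (1.0 - r))


def ols(x: Sequence[float], y: Sequence[float]):
    """Ordinary least-squares slope and intercept of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx
```

In use, one would compute `z = [transform_r(r) for r in rs]` for the pairs' expression correlations, then `ols(ks_values, z)` for the fitted line and `pearson_r(ks_values, z)` for the correlation reported in the text.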
A significant negative correlation was found between ln[(1 +
R)/(1 - R)] and KS for genes in group A
(R = -0.65, P < 0.0004;
Fig. 2A) and in group B
(R = -0.34, P < 0.0012;
Fig. 2B). To test whether the
transformation changed our conclusion, we also carried out the linear
regression between KS and R (data not shown).
This again resulted in a significant negative correlation for both group A
(R = -0.63, P < 0.0005) and group B (R = -0.31,
P < 0.0164). Thus, the correlation coefficient of gene expression
between duplicate genes decreases approximately linearly with divergence time
as measured by KS.

Figure 2 The relationship between synonymous rate (KS) and the
transformed correlation coefficient of gene expression values between
duplicate genes: in which both genes are expressed in at least five tissues
(24 gene pairs, group A; A) and in which at least one gene is
expressed in at least five tissues (94 gene pairs, group B; B). Only
gene pairs with KS < 1.4 were included.
A weak negative correlation (data not shown) was observed between
KA (KA < 0.70) and ln[(1 +
R)/(1 - R)] for group A (R = -0.26, P <
0.0001) and group B (R = -0.19, P < 0.0001). However,
this correlation becomes stronger for both groups (R = -0.42,
P < 0.0006 for group A and R = -0.38, P <
0.0001 for group B) when only gene pairs with KA < 0.2
are examined (Fig. 3A,B). With
KA > 0.2 (Fig.
3C,D), the correlation is considerably weaker and no longer
statistically significant (R = -0.15, P < 0.0643 for
group A and R = -0.05, P < 0.21 for group B). The choice of
KA < 0.2 as a dividing point is arbitrary; however, the
correlation coefficient changes only slightly, from R = -0.41 for group A
(R = -0.36 for group B) for KA < 0.15 to
R = -0.36 for group A (R = -0.37 for group B) for KA
< 0.25. Therefore, initially there is a coupling between gene expression
divergence and KA.

Figure 3 The relationship between nonsynonymous rate (KA) and
the transformed correlation coefficient of gene expression values between
duplicate genes: 60 gene pairs with KA < 0.2 from group
A (A); 153 gene pairs with KA < 0.2 from group
B (B); 165 gene pairs with 0.2 < KA < 0.7
from group A (C); and 609 gene pairs with 0.2 <
KA < 0.7 from group B (D).
Functions of Gene Pairs With Rapid Divergence or No Divergence in
Expression
It is interesting to look into the functions of duplicate genes that show
rapid divergence in expression. Thus, we investigated the functions of the
duplicate gene pairs with KS < 0.3 and with diverged
expression (as presence or absence of expression in a tissue) in at least 50%
of the tissues studied (we considered only the tissues in which at least one
gene of a pair is expressed). There were 38 such gene pairs
(Table 1). Also, we examined
duplicate gene pairs with KS < 0.3 and a correlation
coefficient of gene expression (R) < 0.5. There were 18 gene pairs
in this group (Table 1).
Interestingly, most of the gene pairs in these two groups overlapped. Thus,
the results from the two measures concur. The functions of these genes were
retrieved manually from LocusLink
(http://www.ncbi.nlm.nih.gov/LocusLink/). The gene pairs in these two groups
encode enzymes (oxidoreductases,
hydrolases, transferases, and an isomerase), proteins of the immune system
(e.g., lymphocyte antigen, cytokine gro-beta, MHC proteins, and
immunoglobulins), transcription factors, structural proteins (e.g.,
amelogenin, keratin, and skeletal muscle protein), and receptors
(Table 1). To determine whether
any of the functions were overrepresented among genes with rapid divergence in
expression, we compared their functions with the functions of the other
duplicate genes using the Gene Ontology database
(Camon et al. 2003). There was
indeed a significantly higher proportion of immune response genes among gene
pairs with rapid divergence in expression compared with other gene pairs in
our study (P < 0.009 for gene pairs with KS
< 0.3 and diverged expression in at least 50% of the studied tissues;
P < 0.001 for gene pairs with KS < 0.3 and
R < 0.5).
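The text reports P values for the overrepresentation of immune-response genes but does not name the statistical test. A one-sided Fisher's exact test on a 2 × 2 table is one standard choice for this kind of comparison. The sketch below, using only the Python standard library, illustrates that assumption and is not the authors' documented procedure; the function name and the row and column labels are hypothetical.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for enrichment in the first row.

    Table layout (hypothetical labels):
                            immune   other
        rapidly diverging      a        b
        other pairs            c        d
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)
    # Sum hypergeometric probabilities for tables at least as extreme as observed.
    return sum(comb(col1, x) * comb(n - col1, row1 - x)
               for x in range(a, min(row1, col1) + 1)) / denom
```

A two-sided test or a chi-square test on the same table would also be defensible; the choice affects the exact P values, not the table construction.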
It is also interesting to look into the function of duplicate genes that
show no or little expression divergence, even though they duplicated a long
time ago. Thus, we investigated gene pairs with KS > 3
and with no divergence in tissue expression (a total of 33 gene pairs;
Table 2). Interestingly, two
thirds of these gene pairs are almost ubiquitously expressed (expressed in 24
to 25 out of the 25 tissues analyzed), and another 15% are expressed in one
tissue only (i.e., tissue-specific). Then we added gene pairs with R
> 0.8 and KS > 3 (a total of six gene pairs). Only
one gene pair is shared between the two groups. The gene pairs that have been
well conserved in expression are enzymes (transferases, hydrolases, and
helicases), transcription factors, membrane-bound proteins (e.g., adducins and
connexins), structural proteins (keratin and tubulin), and proteasome
components (Table 2). However,
as the number of the proteins in each functional class is small, none of these
classes is found to be significantly overrepresented among the gene pairs with
slow divergence in expression, when they are analyzed using the Gene Ontology
database.
DISCUSSION
We found that a large proportion of human duplicate genes have diverged
rapidly in their spatial expression. Assuming that the average synonymous rate
in higher primates is 1.5 × 10⁻⁹ nucleotide substitutions per
site per year (Yi et al. 2002), 73.3% of human paralogs diverge in their
expression in at least one tissue after only about 21 Myr (KS = 0.064). It is
likely that the true proportion of gene pairs with diverged gene expression is
even higher than shown here, because only 25 tissues were analyzed and only a
single (presumably normal) physiological condition was studied. In addition,
the classification of tissues used by Su et al. (2002) does not correspond to
the histological classification. For example, complex organs such as the
pancreas are called tissues in Su et al. (2002), whereas in reality
they are composed of multiple tissues. These organs by themselves are likely
to exhibit a wealth of differential spatial gene expression. We estimate that
the rate of expression divergence in human paralogs is 40 times slower
than that of yeast paralogs (Gu et al. 2002b) if the absolute time of
divergence is considered. However, the generation time is several orders of
magnitude shorter in yeast than in humans. Thus, when calculated per
generation, expression divergence is more rapid in humans than in yeast. This
might be due to more complex transcription regulation in mammals than in
lower eukaryotes (Huang et al. 1999). It could also be because there are more
ways in which such divergence can be manifested; for example, gene expression
in humans is regulated across many tissues, whereas yeast is unicellular.
Alternatively (or additionally), this could be intrinsic to the spatial
pattern of gene expression. A study of temporal gene expression divergence in
humans should help distinguish among these possibilities.
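The conversion from KS to time uses the relation KS = 2rT: after duplication both copies accumulate substitutions independently, so the divergence between them grows at twice the per-lineage rate r. A minimal sketch with a hypothetical function name; with the rate assumed above, the average KS of 0.064 for the youngest bin in Fig. 1A corresponds to roughly 21 Myr.

```python
def divergence_time_myr(ks: float, rate_per_site_per_year: float = 1.5e-9) -> float:
    """Time since duplication, in millions of years, from synonymous divergence.

    Both paralogs diverge from the ancestral sequence, so KS = 2 * r * T,
    which gives T = KS / (2 * r). The default r is the higher-primate
    synonymous rate assumed in the text.
    """
    return ks / (2.0 * rate_per_site_per_year) / 1e6


print(round(divergence_time_myr(0.064), 1))  # 21.3
```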
Expression divergence increases approximately linearly with
KS and, thus, with evolutionary time. Therefore,
as in yeast duplicates (Gu et al. 2002b), gene sequence divergence and
expression divergence are coupled for human duplicates. Interestingly, the
linear relationship between expression divergence and KS, when extrapolated
to time 0, does not pass through the origin. We propose two possible
explanations for this observation. First, it might reflect that expression
divergence is more discrete in nature than sequence divergence, which is
continuous. Second, it might be partly because a duplication might not have
included all the regulatory elements, so that the two duplicates already
differed in expression to some extent right after duplication.
Note that the correlation coefficient (R) was calculated over many
tissues (tissues in which at least one of the duplicates is expressed). Such
pooling of data will include genes that are not relevant to the experiment
under consideration. Such genes may show similar expression patterns, and
thus, their inclusion would tend to increase the correlation of expression and
underestimate the divergence in expression.
Initially, R and KA are coupled
(KA < 0.2). Once KA exceeds 0.2, R is no longer significantly correlated
with KA. Note that
at KA = 0.2, almost all duplicates have already diverged
in their expression in at least one tissue.
In this study, KS and KA, but not
protein sequence divergence (d), were used as proxies of time since
gene duplication. KS is a more appropriate proxy of
divergence time compared with the other two measures because
KS varies substantially less among genes than does
KA or d (Li 1997). Both KA and d are much
affected by selection, which may differ greatly among genes.
KS, on the other hand, is less affected by selection,
particularly in mammals, in which there is no evidence for strong selection on
codon bias (Urrutia and Hurst 2001). However, KS is affected by regional
variation in mutation rate within a genome
(Li 1997; Lercher et al. 2001; Williams and Hurst 2002). As a
result, KS is still variable among genes, which may partly
explain why we do not observe a strong correlation between
KS and expression divergence measured by R.
The expression data obtained by the hybridization of RNA to
oligonucleotide arrays are thought to be more accurate than cDNA
microarray data (Wodicka et al. 1997). The Affymetrix array probes are
designed to represent the unique portions of a gene. Each probe sequence is
scanned against the available genomic sequences, minimizing cross-hybridization
between duplicate genes. This approach has the drawback of excluding recently
duplicated genes from an array, as unique probes cannot be designed for them.
Arrays based on cDNAs are more prone to cross-hybridization of duplicate genes
to the same probe. Nevertheless, our results based on oligonucleotide array
data are in agreement with the results of Gu et al. (2002b), who used mainly
cDNA arrays. Still, microarray data are expected to be quite noisy, which
weakens the correlations inferred in the present study.
It is important to note that cross-hybridization tends to cause the degree of
expression divergence to be underestimated. Therefore, the presence of
cross-hybridization should reinforce rather than contradict our conclusion of
rapid expression divergence between duplicate genes.
Nevertheless, to test the ability of the Affymetrix arrays to discriminate
between the paralogs under study, we performed two tests. First, we compared
the probe sequences between two genes for each duplicate pair. (Each gene was
represented by 16 oligonucleotide probes; each probe was 25 nucleotides long.)
From the original 1404 independent gene pairs selected, only seven had one or
more probes (two to seven probes in each case) with identical (or reverse
complement) sequences. An additional four gene pairs had probes with one
nucleotide mismatch (one to five probes in each case). Thus, it seems that
cross-hybridization between duplicate genes was not a serious problem and did
not significantly affect our results.
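The probe comparison in the first test amounts to checking every probe of one paralog against every probe of the other, directly and as a reverse complement, and flagging exact matches or near-matches with one mismatch. A minimal sketch, assuming equal-length probe strings over A, C, G, T; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length probes."""
    if len(a) != len(b):
        raise ValueError("probes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def shared_probes(probes1, probes2, max_mismatch=0):
    """Return (i, j) index pairs of probes matching within max_mismatch.

    A probe counts as matching if it is within max_mismatch of the other
    probe or of the other probe's reverse complement.
    """
    hits = []
    for i, p in enumerate(probes1):
        p = p.upper()
        for j, q in enumerate(probes2):
            q = q.upper()
            best = min(hamming(p, q), hamming(p, reverse_complement(q)))
            if best <= max_mismatch:
                hits.append((i, j))
    return hits
```

Calling `shared_probes(p1, p2)` on the 16 probes of each paralog finds identical or reverse-complement probes, and `max_mismatch=1` extends the search to one-nucleotide mismatches.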
Second, we considered duplicate pairs that were expressed in multiple
tissues and that showed differing expression in at least one tissue. These
were the cases in which the probes were apparently able to discriminate
between the duplicates to some degree at least. Here the genes were considered
diverged in expression in a tissue if their expression values differed as
average difference (AD) > 200 (see Methods). Most duplicate gene pairs
satisfy this criterion. We tested for a relationship between expression and
sequence divergence in the remaining tissues for these genes. The results
(data not shown) did not differ significantly from the original results,
supporting the conclusion that the correlation is real.
Our investigation of gene pairs that have rapidly diverged in their
expression indicates that the spatial expression pattern typically changes
both in terms of presence or absence in particular tissues and in terms of the
absolute amounts of mRNA transcripts (Table 1). An interesting observation
regarding gene pairs that show no
divergence in their expression over extensive evolutionary time is that they
are usually either ubiquitously expressed or tissue-specific
(Table 2). For these gene
duplicates, both copies are preserved in the genome without a change in their
spatial expression and are most likely maintained by purifying selection. We
speculate that in such cases, it is advantageous to have a higher dosage of
the gene transcript in the cell.
We found a large number of proteins involved in the defense system of an
organism among the duplicate pairs with rapid divergence in spatial
expression. This is consistent with strong selective pressure for
adaptation in such proteins (Hughes and Nei 1988; for review, see Wolfe
and Li 2003).
It is worth noting that only a subset of human duplicate genes has been
included in this study. These were largely the well-characterized genes
that had been discovered before the completion of the Human Genome Project.
This could have introduced a bias, for instance, toward inclusion of
duplicates that have differing functions and that, therefore, may be more
likely to have differing expression patterns than randomly selected duplicate
gene pairs would.
This study examines divergence in one of the phenotypic manifestations of
duplicate genes, namely, divergence in the pattern of spatial expression. It
would be of great interest to investigate the molecular basis of such
divergence, that is, divergence in the regulatory regions of gene
expression.
METHODS
Identification of Duplicate Genes
The GenBank accession numbers for the sequences of the U95A array
(Affymetrix) were downloaded from the Affymetrix Web site
(http://www.affymetrix.com).
The corresponding nucleotide sequences were retrieved by using Batch Entrez.
Then, GenBank entries were parsed, and only the entries with an annotated CDS
(CDS tag) were used in the subsequent analysis.
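The CDS filter can be sketched without third-party libraries by scanning each GenBank flat-file record for a CDS feature key. This illustration is not the authors' parser. It assumes the standard flat-file layout, in which feature keys start in column 6 of the FEATURES table and records end with //, and the function name is hypothetical. In practice, a dedicated parser such as Biopython's SeqIO would be more robust.

```python
import re

# Feature keys start in column 6 (five leading spaces); qualifier lines such
# as '/note="CDS ..."' start in column 22 and therefore do not match.
CDS_LINE = re.compile(r"^ {5}CDS {2,}\S")


def accessions_with_cds(flatfile_text: str) -> list:
    """Return ACCESSION values of records that have at least one CDS feature."""
    keep = []
    for record in flatfile_text.split("\n//"):
        accession = None
        in_features = False
        has_cds = False
        for line in record.splitlines():
            if line.startswith("ACCESSION"):
                fields = line.split()
                if len(fields) > 1:
                    accession = fields[1]
            elif line.startswith("FEATURES"):
                in_features = True
            elif line.startswith("ORIGIN"):
                in_features = False
            elif in_features and CDS_LINE.match(line):
                has_cds = True
        if accession and has_cds:
            keep.append(accession)
    return keep
```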
To identify duplicate gene pairs, we followed the method of Gu et al.
(2002a). Briefly, every protein
was used as the query to search against all other proteins by using FASTA
(E = 10). Two proteins are scored as forming a link if (1) the
FASTA-alignable region between them is >80% of the longer protein, and (2)
the identity I between them (expressed as a fraction) is ≥0.30 if the
alignable region is longer than 150 aa, or
$I \geq 0.01n + 4.8\,L^{-0.32\,[1 + \exp(-L/1000)]}$
(Rost 1999) for all other
protein pairs, in which n = 6 and L is the alignable length
between the two proteins. Proteins with the same sequence, but different
names, were deleted from the database. Clustering was performed by using the
single-linkage clustering algorithm. All protein pairs with identity
(excluding gaps) >97% were manually inspected, and isoforms were deleted.
Each protein was used as the query to search against the database of human
repetitive elements. If the proteins formed a link because of their homology
with the same repetitive element, they were deleted. All steps were repeated
in the second-round grouping to identify gene families.
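The two link conditions and the single-linkage step can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes the FASTA search has already produced, for each protein pair, the fractional identity, the alignable length, and the length of the longer protein. It omits the identical-sequence, isoform, and repetitive-element filters and the second-round grouping, and all function names are hypothetical.

```python
import math


def rost_min_identity(aln_len, n=6):
    """Length-dependent identity cutoff (fraction) for alignments of <= 150 aa.

    I >= 0.01*n + 4.8 * L ** (-0.32 * (1 + exp(-L / 1000)))
    """
    exponent = -0.32 * (1.0 + math.exp(-aln_len / 1000.0))
    return 0.01 * n + 4.8 * aln_len ** exponent


def is_link(identity, aln_len, longer_len):
    """Apply both link conditions; identity is a fraction between 0 and 1."""
    # Condition 1: the alignable region must exceed 80% of the longer protein.
    if aln_len <= 0.8 * longer_len:
        return False
    # Condition 2: flat 30% cutoff above 150 aa, length-dependent cutoff otherwise.
    cutoff = 0.30 if aln_len > 150 else rost_min_identity(aln_len)
    return identity >= cutoff


def single_linkage(proteins, links):
    """Single-linkage clustering via union-find; returns multi-member families."""
    parent = {p: p for p in proteins}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving keeps trees shallow
            p = parent[p]
        return p

    for a, b in links:
        parent[find(a)] = find(b)  # any single link merges the two families
    families = {}
    for p in proteins:
        families.setdefault(find(p), []).append(p)
    # Proteins with no links (singletons) do not form duplicate-gene families.
    return [sorted(m) for m in families.values() if len(m) > 1]
```

At L = 150 aa the length-dependent cutoff evaluates to roughly 0.303, so it joins almost continuously with the flat 30% cutoff used for longer alignments. Union-find yields exactly the single-linkage families, because any chain of links places proteins in the same set.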
The yn00 module (Yang and Nielsen 2000) of PAML (Yang 1997) with default
parameters was used to calculate the number of
synonymous substitutions per synonymous site (KS) and the
number of nonsynonymous substitutions per nonsynonymous site
(KA). Independent pairs of duplicate genes were selected
by using the following procedure. For each multigene family, gene pairs
were sorted by KS in ascending order. The pair with the
smallest KS was selected first. We then proceeded by
selecting independent pairs (pairs that do not contain genes already selected)
with increasing KS.
All gene pairs were aligned using CLUSTALW
(Thompson et al. 1994).
Duplicate genes with KS > 1.4 were excluded because of
the difficulty of obtaining reliable estimates. Likewise, gene pairs with
KA > 0.7 were excluded.
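The greedy selection of independent pairs within one family can be written as follows. It is a sketch of the procedure as described above, with a hypothetical function name and an assumed input format of (gene1, gene2, KS) tuples.

```python
def select_independent_pairs(family_pairs):
    """Greedy selection of non-overlapping pairs within one gene family.

    family_pairs: list of (gene1, gene2, ks) tuples for all pairs in the family.
    Pairs are visited in ascending KS order; a pair is kept only if neither
    gene has already been used by a previously kept pair.
    """
    used = set()
    selected = []
    for g1, g2, ks in sorted(family_pairs, key=lambda t: t[2]):
        if g1 in used or g2 in used:
            continue
        selected.append((g1, g2, ks))
        used.update((g1, g2))
    return selected
```

Because each gene contributes to at most one selected pair, the resulting pairs are statistically independent in the sense used in the text. Starting from the smallest KS favors the most recent duplications within each family.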
Expression Data Analysis
The expression data for the 25 human tissues were retrieved from
http://expression.gnf.org
(Su et al. 2002). Expression
values were averaged among replicates. We followed the method of Su et al.
(2002) in defining expressed
and not expressed genes. For calculating the proportion of gene pairs with
altered expression, an average difference (AD) value of >200 was used to call
a gene expressed in a particular tissue (this corresponds to approximately
three to five copies of mRNA per cell). Similarly, a gene was called not
expressed if AD was <100. Genes with 100 < AD < 200 were called marginally
expressed and were excluded from the analysis. The gene pairs analyzed are
given in Supplemental Table 1, available at www.genome.org.
For studying the relationship between KS (or
KA) and the correlation coefficient of gene expression, we
analyzed only the gene pairs in which either both (group A; Suppl.
Table 2) or at least one (group
B; Suppl. Table 3) of the genes was expressed in at least five tissues (AD
> 200), and only these tissues were considered. The AD values were
log2-transformed. The Pearson correlation coefficient R
was transformed into ln[(1+R)/(1 - R)] to make the
scale more appropriate for the linear regression analysis. The linear
regression was carried out between each pair of KS (or
KA) and the transformed R.
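The group A/B filter and the log2 transform can be sketched as follows. The sketch assumes per-tissue AD values already averaged over replicates, and it keeps the tissues in which at least one of the duplicates is expressed, as stated in the Discussion. AD values from this kind of array can be zero or negative, where log2 is undefined, so the sketch raises them to a floor before transforming. The floor value (20) and the function name are hypothetical, because the paper does not say how such values were handled.

```python
import math


def expression_vectors(ad1, ad2, group="A", min_tissues=5,
                       threshold=200.0, floor=20.0):
    """Return log2 expression vectors for a pair, or None if it fails the filter.

    group "A": both genes expressed (AD > threshold) in >= min_tissues tissues.
    group "B": at least one gene expressed in >= min_tissues tissues.
    Kept tissues: those in which at least one of the two genes is expressed.
    floor: hypothetical lower bound applied before log2, because AD can be
    zero or negative.
    """
    if len(ad1) != len(ad2):
        raise ValueError("both genes need one AD value per tissue")
    expr1 = [a > threshold for a in ad1]
    expr2 = [b > threshold for b in ad2]
    n1, n2 = sum(expr1), sum(expr2)
    if group == "A":
        passes = n1 >= min_tissues and n2 >= min_tissues
    elif group == "B":
        passes = n1 >= min_tissues or n2 >= min_tissues
    else:
        raise ValueError("group must be 'A' or 'B'")
    if not passes:
        return None
    keep = [i for i in range(len(ad1)) if expr1[i] or expr2[i]]
    x = [math.log2(max(ad1[i], floor)) for i in keep]
    y = [math.log2(max(ad2[i], floor)) for i in keep]
    return x, y
```

The two returned vectors are what a Pearson correlation would then be computed on, before the ln[(1 + R)/(1 - R)] transformation and the regression against KS or KA.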
Acknowledgements
We are grateful to Z. Gu, H. Kaessmann, and T. Oakley for comments on the
earlier versions of the manuscript; to the reviewers for many excellent
comments improving our manuscript; and to A. Nekrutenko for help in revising
this manuscript. This study was supported by NIH grants.
The publication costs of this article were defrayed in part by payment of
page charges. This article must therefore be hereby marked
"advertisement" in accordance with 18 USC section 1734 solely to
indicate this fact.
Footnotes
Article and publication are at
http://www.genome.org/cgi/doi/10.1101/gr.1133803.
1 Present address: Department of Biology, Penn State University, 208 Mueller
Lab, University Park, Pennsylvania 16802, USA. 
2 Corresponding author. E-MAIL
whli{at}uchicago.edu;
FAX (773) 702-9740. 
[Supplemental material is available online at www.genome.org.]
REFERENCES
Camon, E., Magrane, M., Barrell, D., Binns, D., Fleischmann, W.,
Kersey, P., Mulder, N., Oinn, T., Maslen, J., Cox, A., et al.
2003. The Gene Ontology Annotation (GOA) Project: Implementation
of GO in SWISS-PROT, TrEMBL, and InterPro. Genome Res.
13:
662-672.[Abstract/Free Full Text]
Cho, R.J., Huang, M., Campbell, M.J., Dong, H., Steinmetz, L.,
Sapinoso, L., Hampton, G., Elledge, S.J., Davis, R.W., and Lockhart, D.J.
2001. Transcriptional regulation and function during the human
cell cycle. Nat. Genet.
27: 48-54.[Medline]
Gu, Z., Cavalcanti, A., Chen, F.-C., Bouman, P., and Li, W.-H.
2002a. Extent of gene duplication in the genomes of
Drosophila, nematode, and yeast. Mol. Biol.
Evol. 19:
256-262.[Abstract/Free Full Text]
Gu, Z., Nicolae, D., Lu, H.H., and Li, W.-H. 2002b.
Rapid divergence in expression between duplicate genes inferred from
microarray data. Trends Genet.
18:
609-613.[CrossRef][Medline]
Huang, L., Guan, R.J., and Pardee, A.B. 1999.
Evolution of transcriptional control from prokaryotic beginnings to eukaryotic
complexities. Crit. Rev. Eukaryot. Gene Expr.
9: 175-182.[Medline]
Hughes, A.L. and Nei, M. 1988. Pattern of nucleotide
substitution at major histocompatibility complex class I loci reveals
overdominant selection. Nature
335:
167-170.[CrossRef][Medline]
Lercher, M.J., Williams, E.J., and Hurst, L.D. 2001.
Local similarity in evolutionary rates extends over whole chromosomes in
humanrodent and mouserat comparisons: Implications for
understanding the mechanistic basis of the male mutation bias. Mol.
Biol. Evol. 18:
2032-2039.[Abstract/Free Full Text]
Li, W.-H. 1997. Molecular
evolution. Sinauer Associates, Sunderland, MA.
Ly, D.H., Lockhart, D.J., Lerner, R.A., and Schultz, P.G.
2000. Mitotic misregulation and human aging.
Science 287:
2486-2492.[Abstract/Free Full Text]
Ohno, S. 1970. Evolution by gene
duplication. Springer-Verlag, New York.
Rost, B. 1999. Twilight zone of protein sequence
alignments. Protein Eng.
12: 85-94.[Abstract/Free Full Text]
Su, A.I., Cooke, M.P., Ching, K.A., Hakak, Y., Walker, J.R.,
Wiltshire, T., Orth, A.P., Vega, R.G., Sapinoso, L.M., Moqrich, A., et al.
2002. Large-scale analysis of the human and mouse transcriptomes.
Proc. Natl. Acad. Sci.
99:
4465-4470.[Abstract/Free Full Text]
Thompson, J.D., Higgins, D.G., and Gibson, T.J. 1994.
CLUSTAL W: Improving the sensitivity of progressive multiple sequence
alignment through sequence weighting, position-specific gap penalties and
weight matrix choice. Nucleic Acids Res.
22:
4673-4680.[Abstract/Free Full Text]
Urrutia, A.O. and Hurst, L.D. 2001. Codon usage bias
covaries with expression breadth and the rate of synonymous evolution in
humans, but this is not evidence for selection.
Genetics 159:
1191-1199.[Abstract/Free Full Text]
Wagner, A. 2000. Decoupled evolution of coding region
and mRNA expression patterns after gene duplication: Implications for the
neutralist-selectionist debate. Proc. Natl. Acad. Sci.
97:
6579-6584.[Abstract/Free Full Text]
Williams, E.J. and Hurst, L.D. 2002. Is the synonymous substitution rate in mammals gene-specific? Mol. Biol. Evol. 19: 1395-1398.
Wodicka, L., Dong, H., Mittmann, M., Ho, M.H., and Lockhart, D.J. 1997. Genome-wide expression monitoring in Saccharomyces cerevisiae. Nat. Biotechnol. 15: 1359-1367.
Wolfe, K.H. and Li, W.-H. 2003. Molecular evolution meets the genomics revolution. Nat. Genet. 33: 255-265.
Yang, Z. 1997. PAML: A program package for phylogenetic analysis by maximum likelihood. CABIOS 13: 555-556.
Yang, Z. and Nielsen, R. 2000. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol. Biol. Evol. 17: 32-43.
Yi, S., Ellsworth, D.L., and Li, W.-H. 2002. Slow molecular clocks in Old World monkeys, apes, and humans. Mol. Biol. Evol. 19: 2191-2198.
WEB SITE REFERENCES
http://www.ncbi.nlm.nih.gov/LocusLink/; LocusLink.
http://www.affymetrix.com; Affymetrix Web site.
http://expression.gnf.org; Expression data for the 25 human tissues.
Received December 23, 2002; accepted in revised form April 25, 2003.
