Vol. 12, Issue 8, 1190-1200, August 2002
LETTER
ABSTRACT
The basic region-leucine zipper (B-ZIP, also written bZIP)
protein motif dimerizes to bind specific DNA sequences. We have
identified 27 B-ZIP proteins in the recently sequenced Drosophila
melanogaster genome. The dimerization specificity of these 27 B-ZIP
proteins was evaluated using two structural criteria: (1) the presence
of attractive or repulsive interhelical g↔e` electrostatic
interactions and (2) the presence of
polar or charged amino acids in the `a' and `d' positions of the
hydrophobic interface. None of the B-ZIP proteins contains only
aliphatic amino acids in the `a' and `d' positions. Only six of the
Drosophila B-ZIP proteins contain a "canonical"
hydrophobic interface like the yeast GCN4, and the mammalian JUN, ATF2,
CREB, C/EBP, and PAR leucine zippers, characterized by asparagine in
the second `a' position. Twelve leucine zippers contain polar amino
acids in the first, third, and fourth `a' positions. Circular
dichroism spectroscopy, used to monitor thermal denaturations of a
heterodimerizing leucine zipper system containing either valine (V) or
asparagine (N) in the `a' position, indicates that the V-N
interaction is 2.3 kcal/mole less stable than an N-N interaction and
5.3 kcal/mole less stable than a V-V interaction. Thus, we propose
that the presence of polar amino acids at novel `a' positions
of Drosophila B-ZIP proteins has led to leucine
zippers that homodimerize rather than heterodimerize.
INTRODUCTION
Basic region-leucine zipper (B-ZIP) transcription
factors bind as dimers to sequence-specific DNA and regulate gene
expression. The transcriptional potential of B-ZIP proteins is often
regulated by posttranslational phosphorylation in response to cellular
signals (Hurst 1995). The recent completion of the Drosophila
melanogaster genome sequence (Adams et al. 2000) provides the
opportunity to identify the complete list of B-ZIP proteins in a
complex eukaryote. Previously, a genomewide analysis using the
Automated InterPro Motif Identification Resource identified a B-ZIP
domain in 29 genes in Drosophila (Rubin et al. 2000). This
number compares with 31 B-ZIP proteins identified in the
Caenorhabditis elegans genome, 17 in the Saccharomyces
cerevisiae genome, 71 in the Arabidopsis thaliana genome
(Riechmann et al. 2000), and 65 in the human genome (Tupler et al.
2001). Knowing all the B-ZIP proteins in a genome allows us to predict
all the dimerization partners of a particular B-ZIP protein, something
that has eluded investigators in the past. A prediction of potential
dimerization partners of B-ZIP proteins should focus the efforts of
Drosophila geneticists as they examine possible dimerization
between B-ZIP containing genes.
When bound to DNA, B-ZIP monomers are long α-helices: the N-terminal
half binds in the major groove to sequence-specific double-stranded DNA,
and the C-terminal half mediates dimerization to form a parallel leucine
zipper coiled coil (Landschulz et al. 1988; Vinson et al. 1989;
Ellenberger et al. 1992) (Fig. 1).
The leucine zipper dimerization domain is typically composed of four to
five heptad repeats of amino acids, with the seven unique positions in
the heptad labeled `a', `b', `c', `d', `e', `f', and `g'
(McLachlan and Stewart 1975). The `g', `a', `d', and `e'
positions are critical for dimerization stability and specificity. The
shorter leucine zippers have less protein sequence flexibility because
amino acids must be optimized for dimerization stability. Longer
leucine zippers allow better regulation of dimerization specificity
because they can contain amino acids that are suboptimal for stability
but favor interaction with a particular partner.
Amino acids in the `a' and `d' positions are typically hydrophobic
and are on the same side of the α-helix, creating a hydrophobic
interface that contributes to dimerization stability (Landschulz et al.
1989; Moitra et al. 1997). Typically, the `d' position is occupied by
leucine and the `a' position by valine. An exception in B-ZIP leucine
zippers is the second heptad `a' position that often contains an
asparagine. Asparagine in the `a' position from one monomer can
hydrogen bond interhelically with asparagine in the `a' position of
the second monomer to promote dimerization and prevent higher order
oligomerization (Harbury et al. 1993). In contrast, asparagine does not
form stable interactions with isoleucine in the `a' position of
leucine zipper proteins, preventing heterodimerization (Zeng et al.
1997). Conversely, charged amino acids in the `a' position inhibit
homodimerization and promote heterodimerization (unpublished data from
Vinson group). An example is the Myc|Max leucine zipper, in which a
Myc homodimer is unstable because of an E in the `a' position.
The g and e positions of the leucine zipper flank the hydrophobic
interface and frequently contain charged amino acids (Cohen and Parry
1990; Vinson et al. 1993). X-ray structures of leucine zipper
coiled-coil proteins reveal interhelical interactions between oppositely
charged amino acids in the g position and the following e` position in
the dimer (O'Shea et al. 1991; Glover and Harrison 1995; Chen et al.
1998; Lavigne et al. 1998; Day and Alber 2000). We refer to this
interaction as g↔e`; the prime (`) indicates a residue on the second
α-helix of the leucine zipper.
Interacting amino acids in the g and e` positions lie across
the hydrophobic interface such that their side-chain methylene groups
pack with amino acids in the `a' and `d' positions of the
hydrophobic core (Alber 1992). The g↔e` interactions between
oppositely charged amino acids are attractive and promote dimerization
(Vinson et al. 1993; Krylov et al. 1994; Zhou et al. 1994; Krylov et
al. 1998), whereas g↔e` interactions between similarly
charged amino acids, for example, E↔E or R↔R, are repulsive and
inhibit homodimerization. For example, in the mammalian FOS protein,
repulsive glutamate g↔e` interactions (E↔E) prevent
homodimerization and thus help drive heterodimerization with JUN
(Nicklin and Casari 1991; O'Shea et al. 1992).
Twelve Drosophila melanogaster B-ZIP genes have been isolated,
including Vri (George and Terracol 1997), sis-A (Erickson and Cline
1993), crc (Hewes et al. 2000), cap`n'collar (cnc) (Mohler et al.
1991), giant (gt) (Capovilla et al. 1992), slbo (Rorth and Montell
1992), pdp1 (Zhang et al. 1990), crebB-17A (Usui et al. 1993), crebA
(Smolik et al. 1992), A3-3 (Heitzeberg 1999), kay, and Jra (Perkins et
al. 1988, 1990; Zhang et al. 1990).
In this study, we have refined the estimate of the number of B-ZIP
proteins in Drosophila melanogaster to 27 members using a combination
of regular expression queries and PSI-BLAST searches and have
subsequently inspected each potential B-ZIP protein for characteristics
that affect dimerization specificity. Mammalian counterparts were
identified for 21 Drosophila B-ZIP proteins, and conservation both
throughout the entire protein and within the B-ZIP domain was evaluated.
Thirteen Drosophila melanogaster leucine zippers contain a conserved
asparagine in the second heptad `a' position, as observed in
mammalian B-ZIP proteins. Eight proteins contain asparagines in the
first, third, or fourth heptad `a' positions. We show
experimentally that the heterotypic interaction between asparagine and
valine in the `a' position is less stabilizing than either homotypic
interaction. Coupling this additional insight into dimerization
specificity with our knowledge of g↔e` interactions, we have
predicted dimerization partners among the Drosophila
melanogaster B-ZIP proteins.
RESULTS
Identifying Drosophila B-ZIP Domains
We searched for the B-ZIP protein motif (Vinson et al. 1989) in the
recently completed Drosophila melanogaster DNA genome sequence
(Adams et al. 2000). Previously, B-ZIP proteins have been identified in
the yeast genome using a query based on the most conserved part of the
B-ZIP motif, the basic region (Fernandes et al. 1997). We have used a
modification of this query (Methods) to identify B-ZIP proteins in
Drosophila melanogaster. Eighteen potential B-ZIP proteins
were identified after searching the 14,100 predicted
Drosophila open reading frames. Because the query represents the basic region without a leucine zipper, each of the 18 sequences (gi# 7290135, 7290320, 7290774, 7291080, 7291250, 7293451, 7294270, 7296965, 7298587, 7298780, 7300970, 7301182, 7301826, 7302191, 7302252, 7302350, 7302542, and 7303798) was inspected for an amphipathic
α-helix located at an invariant distance in the C-terminal direction from the basic region. Three sequences (gi7291080,
gi7302191, and gi7302542) were discarded based on the absence of a
satisfactory leucine zipper or basic region, or the presence of
α-helix-breaking prolines within the motif.
To retrieve additional B-ZIP domains that may not conform precisely to
the basic region query, we chose four sequences at random and subjected
them to PSI-BLAST analysis (Altschul et al. 1997) performed to
convergence. These were gi7290320 and gi729077, both PAR
family members; gi7290135, an ATF3 homolog; and gi7298028. All hits
with E values better than the threshold of 0.001 were compared with the
original set of 15 B-ZIP sequences. Eleven new sequences were
identified (gi7291773, 7294768, 7295189, 7296431, 7297639, 7298028, 7300452, 7302966, 7298025, 7298026, and 7296993). After discarding one
sequence (gi7296431) based on the absence of a satisfactory basic and
leucine zipper region, the expanded set contained 25 sequences.
Three of the 15 original sequences were not reidentified by PSI-BLAST analysis and were thus considered the most distinctive. To identify other less related members of the B-ZIP family, we then used these three outlying sequences (gi7298587, gi7301182, and gi7302350) as queries in PSI-BLAST searches. PSI-BLAST analysis of gi7301182 retrieved only itself, and gi7298587 retrieved known sequences; however, gi7302350 retrieved five novel sequences. Multiple alignment of these sequences allowed four of the five new sequences to be eliminated based on the absence of satisfactory zipper or basic regions, leaving a total of 26 sequences in the set.
A separate regular expression query was performed against the same database, this time using the B-ZIP "regular expression" that contains both basic region and leucine zipper constraints (see Methods). Nine sequences were identified (gi7290320, 7290774, 7292623, 7293451, 7294270, 7295189, 7300970, 7302966, and 7303798), but only one of these (gi7292623, SisA) was a new addition to the existing set of B-ZIP motif proteins. To retrieve distant relatives of SisA, we used it as a query in a PSI-BLAST search. However, the search retrieved only SisA itself. Thus, the final tally for the number of B-ZIP proteins in Drosophila melanogaster is 27.
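The bookkeeping across the four search rounds can be checked with a short tally. The Python sketch below only replays the counts stated above (sequences retrieved and sequences discarded on inspection in each round); the round labels are informal shorthand introduced here, not names used in the study.

```python
# Replay the sequence counts reported for each search round.
# Each entry: (label, sequences added, sequences discarded on inspection).
# The labels are informal shorthand introduced for this illustration.
ROUNDS = [
    ("basic-region query", 18, 3),
    ("PSI-BLAST from four seed sequences", 11, 1),
    ("PSI-BLAST from three outlying sequences", 5, 4),
    ("B-ZIP regular expression (SisA)", 1, 0),
]


def running_totals(rounds):
    """Return the cumulative size of the B-ZIP set after each round."""
    totals = []
    size = 0
    for _label, added, discarded in rounds:
        size += added - discarded
        totals.append(size)
    return totals


if __name__ == "__main__":
    for (label, added, discarded), total in zip(ROUNDS, running_totals(ROUNDS)):
        print(f"{label}: +{added} -{discarded} -> {total}")
```

The running totals reproduce the set sizes quoted in the text: 15, 25, 26, and finally 27.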
Mammalian Homologies
Mammalian counterparts were identified for 21 of the 27 Drosophila B-ZIP sequences using BLAST analysis
with the B-ZIP region as the query. Table 1
presents the Drosophila name, synonyms, and the most related
mammalian B-ZIP protein. In most cases, the most closely
related human and mouse sequences are listed. This listing is intended
to be representative rather than exhaustive.
Six B-ZIP proteins score the highest matches in reciprocal queries against the databases, align over more than 50% of the length of the sequence, and have been tentatively designated orthologs (Table 1). These include Pdp1, an HLF ortholog; CG3136, an ATF4 ortholog; CG12850, an ATF2 ortholog; CG9954 and CG10034, both possible MAF orthologs; Jra, a Jun homolog; and CG6272, a C/EBP homolog.
Table 1 also presents several measures of the relatedness between a Drosophila B-ZIP protein sequence and the closest related human protein sequence. The existence of an identical mouse counterpart to the human sequence is indicated by a # sign in column 4, showing evolutionary conservation within vertebrates. To represent conservation between the homologous Drosophila and human sequences, we calculated % identities for (1) the basic region, (2) the first five heptads of the leucine zipper region, and (3) the `g', `a', `d', and `e' positions of the leucine zipper that are critical for dimerization stability and specificity. The basic regions are more highly conserved than the leucine zipper. Within the leucine zipper, the `g', `a', `d', and `e' positions are more conserved than the entire leucine zipper, indicating that the determinants of dimerization specificity were actively conserved during the divergence of the insects and mammals. CREB is the most conserved B-ZIP domain, with 75% conservation throughout the leucine zipper region.
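As an illustration of the third measure, percent identity restricted to chosen heptad positions could be computed along the following lines. This is a minimal Python sketch, not the procedure used to build Table 1: it assumes two ungapped, equal-length zipper sequences whose first residue sits at a known heptad position, and the function names and the two toy sequences are invented for illustration.

```python
HEPTAD = "abcdefg"


def heptad_register(length, start="g"):
    """Label each residue with its heptad position, beginning at `start`."""
    offset = HEPTAD.index(start)
    return [HEPTAD[(offset + i) % 7] for i in range(length)]


def percent_identity(seq_a, seq_b, positions=HEPTAD, start="g"):
    """Percent identity of two aligned sequences at the selected heptad positions."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    register = heptad_register(len(seq_a), start)
    pairs = [(x, y) for x, y, pos in zip(seq_a, seq_b, register) if pos in positions]
    if not pairs:
        raise ValueError("no residues fall on the requested positions")
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


if __name__ == "__main__":
    # Invented 14-residue toy sequences (two heptads), not real B-ZIP zippers.
    toy_a = "RVEELEKENAQLRS"
    toy_b = "RVAQLEKENSQLKS"
    print(percent_identity(toy_a, toy_b))          # identity over all positions
    print(percent_identity(toy_a, toy_b, "gade"))  # identity at g, a, d, e only
```

For the toy pair, identity at the `g', `a', `d', and `e' positions (7 of 8) is higher than identity over the whole stretch (10 of 14), the same kind of contrast the text describes for the real alignments.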
Figure 2 presents a phylogenetic analysis of an alignment of the Drosophila B-ZIP proteins and their mammalian counterparts based only on their B-ZIP motif protein sequence. Each Drosophila sequence clusters very closely with its mammalian counterpart. This indicates that the Drosophila B-ZIP proteins are more closely related to their human counterpart than they are to other Drosophila B-ZIP proteins. This is not true for the four PAR proteins that are more closely related to each other than they are to any human protein. The five Drosophila sequences lacking any mammalian relative cluster together. These are unusual B-ZIP sequences and the question of whether they are true B-ZIP proteins is considered later in the discussion.
Alignment of the Protein Sequences of the 27 Drosophila B-ZIP Domains
The protein sequence alignment of the 27 identified Drosophila
melanogaster B-ZIP motifs is shown in Figure 3. The sequences begin
four amino acids N-terminal to the conserved asparagine (N) in the
basic region (Vinson et al. 1989) and continue to the natural C terminus
of the protein or until the leucine zipper contains a proline or glycine
that is predicted to terminate the α-helix. We have highlighted `a' and
`d' positions that contain polar or charged amino acids (black boxes),
and the g↔e` interactions are color coded (green, orange,
blue, or red), as described in the figure legend. Figure 4 presents a
schematic of a coiled-coil dimer that graphically describes the color
code used in Figure 3. A similar analysis has been done for the 53
identified human B-ZIP proteins (Vinson et al. 2002).
Surface Charge of Leucine Zippers: `g' and `e' Interactions
We have observed pronounced preferences in the frequency of charged
and polar amino acids in the `a', `d', `e', and `g' positions for each
heptad of the Drosophila B-ZIP proteins (Table 2). For the `g' and `e'
positions, charged amino acids are concentrated in the first four
heptads. The fifth heptad rarely contains either attractive or repulsive
g↔e` interactions and may represent a natural limit for the
length of the dimerization domain of B-ZIP proteins. An exception to
this is CG9415, which contains attractive g↔e` interactions
in the fifth, sixth, and seventh heptads but not in the first four
heptads. An additional indication of the natural limit of the leucine
zipper is the high frequency of α-helix-breaking prolines and
glycines in the fifth and sixth heptad of the leucine zippers (Fig. 3).
Of the 216 `g' and `e' positions in the first four heptads of these
27 proteins (4 heptads × 2 positions per heptad × 27 sequences), 54%
are occupied by a charged residue, equally distributed between basic
and acidic amino acids. There are approximately twice as many arginines
as there are lysines. The shorter aspartic acid is significantly
underrepresented relative to glutamic acid, as has been observed for
other coiled-coil proteins (Cohen and Parry 1990). This likely reflects
the fact that aspartic acid is over 1.0 kcal/mole less stabilizing than
is glutamic acid in the `g' position (Krylov et al. 1998).
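The position count quoted above follows directly from the sampling scheme. In the worked line below, the charged-residue count is an approximation derived here from the rounded 54% figure; it is not a number reported in the text.

```latex
\[
N_{g,e} = 4~\text{heptads} \times 2~\text{positions per heptad} \times 27~\text{sequences} = 216,
\qquad
N_{\text{charged}} \approx 0.54 \times 216 \approx 117 .
\]
```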
Of the 108 possible g↔e` interactions in the first four
heptads, 25% are attractive and only 6% are repulsive. Attractive
g↔e` interactions show a bias in the orientation of the amino
acids. In the first heptad, all attractive g↔e` interactions have the
same polarity: the `g' position contains a basic amino acid,
and the `e' position contains an acidic amino acid (e.g., R↔E or
K↔E). In the second heptad, the orientation of the g↔e` interaction
is reversed (e.g., E↔R, E↔K, or D↔K). The PAR family proteins
exemplify this observation. In the third and fourth heptad, both
orientations of attractive g↔e` interactions are
observed. Only one B-ZIP protein, CG17836, has both attractive and
repulsive g↔e` pairs.
Forty-eight percent of the g↔e` interactions contain only a
single charged amino acid. Leucine zippers with such incomplete
g↔e` interactions will have more promiscuous dimerization activity.
Incomplete interactions do not contribute to the stability of the
homodimer. However, in a heterodimer, they can form complete attractive
g↔e` interactions and contribute to stability.
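The categories used in this section can be expressed as a small classifier. The Python sketch below is an illustration only: the residue sets (D and E as acidic, K and R as basic), the function names, and the category labels are assumptions introduced here, and the caller is responsible for pairing each `g' residue with the e` residue it faces in the dimer.

```python
# Assumed charge classes for this illustration.
ACIDIC = frozenset("DE")
BASIC = frozenset("KR")


def charge(residue):
    """Return -1 for an acidic residue, +1 for a basic one, and 0 otherwise."""
    if residue in ACIDIC:
        return -1
    if residue in BASIC:
        return 1
    return 0


def classify_ge_pair(g_residue, e_residue):
    """Classify one interhelical g/e' pair by the charges of its two residues."""
    g_charge, e_charge = charge(g_residue), charge(e_residue)
    if g_charge == 0 and e_charge == 0:
        return "uncharged"
    if g_charge == 0 or e_charge == 0:
        return "incomplete"  # only one of the two residues is charged
    return "attractive" if g_charge != e_charge else "repulsive"
```

Under these assumptions, R↔E and E↔K are classified as attractive, E↔E and R↔R as repulsive, and a pair such as E↔Q as incomplete.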
Polar or Charged Amino Acids in the `a' and `d' Hydrophobic Interior
All of the Drosophila B-ZIP proteins contain either a polar or charged amino acid in at least one `a' or `d' position in the first four heptads of the leucine zipper. The frequency of polar or charged amino acids in the `a' and `d' position is shown in Table 2. Nineteen proteins contain a polar (N, T, H) and 3 contain a basic (K, R) amino acid in the `a' position of the second heptad, as is frequently observed in mammalian B-ZIP proteins. However, 13 Drosophila B-ZIP proteins (e.g., Pdp1 and CG3136) contain polar amino acids in the `a' position of the first, third, and fourth heptads. These may help prevent heterodimerization with proteins that do not contain a polar amino acid in this position. Charged amino acids are found at the `a' and `d' positions of the leucine zipper in nine B-ZIP proteins, being more frequent in the `a' position. These amino acids should discourage homodimerization. There are more basic amino acids than acidic amino acids in the `a' and `d' positions.
The Energetics of the Valine-Asparagine Interaction in the `a' Position
The large number of polar amino acids in the `a' position of
Drosophila B-ZIP leucine zippers prompted us to examine
whether these amino acids can affect dimerization specificity. Because `a' position amino acids interact interhelically with the same `a'
position of the opposite monomer of the dimer, we needed to use a
heterodimerizing system to address dimerization specificity. We
previously generated a heterodimerizing system that forces dimerization
of leucine zippers (Krylov et al. 1994). This system contains one
monomer in which homodimerization is inhibited by repulsive acidic
g↔e` interactions containing glutamic acid in the `g' and
`e' positions (E↔E) of the third and fourth heptads (previously
named EE34). We refer to this protein as B-EE34(V): the B represents the basic region and the V highlights the valine in
the third `a' position, the amino acid that is changed in this study.
The second monomer in the heterodimerizing system contains arginine in
the `g' and `e' positions of the third and fourth heptads,
resulting in repulsive basic g↔e` interactions (R↔R) in the
potential homodimer (previously named RR34). We refer to this
protein as RR34(V). We replaced the basic region of
RR34(V) with a synthetic acidic amphipathic extension (Krylov
et al. 1995; Olive et al. 1997; Moll et al. 2000) to produce
A-RR34. The acidic amphipathic extension of A-RR34
heterodimerizes with the basic region of EE34,
increasing the stability of the EE34|A-RR34
heterodimer by 2.5 kcal/mole (Moll et al. 2000).
We compared the thermal stability of three heterodimers with either
valine or asparagine in the third `a' position. The first heterodimer
had valine in both third heptad `a' positions, the second had
asparagine in both third heptad `a' positions, and the third had a
valine in the third `a' position of one monomer and an asparagine in
the second monomer of the dimer. The monomers with asparagine in the
third `a' position are B-EE34(N) and A-RR34(N). Comparing the stability of a B-EE34(V) and
A-RR34(N) mixture with the stabilities of
B-EE34(V)|A-RR34(V) and
B-EE34(N)|A-RR34(N) allowed us to determine
whether stability of the valine-asparagine interaction contributes to
dimerization specificity. Table 3 presents
the thermodynamic parameters derived from thermal denaturations, as
assayed by circular dichroism spectroscopy at 222 nm, of these heterodimers, assuming this is a two-state denaturation
process. For the four homodimer denaturations, we find that
valine is more stabilizing than asparagine, as has been observed in
another coiled-coil system (Wagschal et al. 1999). Analytical
ultracentrifugation of the three heterodimer samples in Table 3
(EE34(V)|A-RR34(V), EE34(N)|A-RR34(N), and
EE34(V)|A-RR34(N)) indicates they are dimers (data
not shown). The three mixtures are more stable than the four single
proteins, indicating that the mixtures form heterodimers. The
EE34(V)|A-RR34(V) heterodimer that produces a
valine-valine interaction is 3.0 kcal/mole more stable than the
EE34(N)|A-RR34(N) heterodimer that produces the
asparagine-asparagine interaction. This value is consistent with the
2.8 kcal/mole observed in another guest-host leucine zipper
system comparing valine-valine interactions with
asparagine-asparagine interactions in the `a' position
(Wagschal et al. 1999). The EE34(V)|A-RR34(N)
heterodimer that produces an asparagine-valine interaction is 2.3 kcal/mole less stable than an asparagine-asparagine interaction and
5.3 kcal/mole less stable than a valine-valine interaction. These data
show that the presence of an asparagine will disfavor
heterodimerization with valine and instead drive homodimerization.
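The three reported differences are mutually consistent, which can be seen by writing them out. The notation is introduced here for illustration: ΔG_XY denotes the stability (free energy of unfolding) of the heterodimer with residues X and Y at the two third-heptad `a' positions, so a larger value means a more stable dimer. The last line, which compares the mixed pair with the average of the two matched pairs, is derived here from the reported values and is not a figure stated in the text.

```latex
\begin{align*}
\Delta G_{VV} - \Delta G_{NN} &= 3.0~\text{kcal/mole} \\
\Delta G_{NN} - \Delta G_{VN} &= 2.3~\text{kcal/mole} \\
\Delta G_{VV} - \Delta G_{VN} &= 3.0 + 2.3 = 5.3~\text{kcal/mole} \\
\tfrac{1}{2}\left(\Delta G_{VV} + \Delta G_{NN}\right) - \Delta G_{VN} &= \tfrac{1}{2}\,(5.3 + 2.3) = 3.8~\text{kcal/mole}
\end{align*}
```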
Predicted Dimerization Partners
To predict the dimerization partners for the 27 B-ZIP proteins from
Drosophila, we used a two-step approach. In step 1, we examined the number of attractive and repulsive interhelical
g↔e` interactions in the leucine zipper of each homodimer and
the 26 possible heterodimers. In step 2, we examined the amino acid
composition of the `a' and `d' positions. The presence of polar or
charged amino acids in these positions caused us to modify our
predictions of dimerization specificity based on the g↔e`
interactions determined in step 1.
In Figure 3, the B-ZIP proteins are clustered by the number of
attractive minus the number of repulsive g↔e` interactions in
the homodimer. These values range from three pairs of attractive
g↔e` interactions for the PAR-like proteins to −2 pairs for
the FOS-like protein.
Table 4 lists the predicted dimerization
partners for representatives of each cluster. The number of attractive
interactions for each predicted pair is shown (column 4) and the basis
for each prediction is summarized (column 5).
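Step 1 of this approach, counting attractive minus repulsive g↔e` interactions, might be sketched as follows. This Python example is an illustration under stated assumptions rather than the procedure used for Table 4: each zipper is represented only by a string of its `g' residues and a string of its `e' residues, with the `e' string already shifted so that index i holds the residue that faces `g' residue i; the acidic and basic residue sets and the function names are introduced here.

```python
# Assumed charge classes for this illustration.
ACIDIC = frozenset("DE")
BASIC = frozenset("KR")


def pair_score(g_residue, e_residue):
    """+1 for an attractive pair, -1 for a repulsive pair, 0 otherwise."""
    charges = []
    for residue in (g_residue, e_residue):
        if residue in ACIDIC:
            charges.append(-1)
        elif residue in BASIC:
            charges.append(1)
        else:
            return 0  # uncharged or incomplete pair: no contribution
    return 1 if charges[0] != charges[1] else -1


def dimer_score(zipper_a, zipper_b):
    """Attractive minus repulsive g/e' interactions for a dimer of two zippers.

    Each zipper is a (g_residues, e_residues) tuple of equal-length strings.
    Both interhelical directions are counted: g of A with e' of B, and
    g of B with e' of A.
    """
    g_a, e_a = zipper_a
    g_b, e_b = zipper_b
    forward = sum(pair_score(g, e) for g, e in zip(g_a, e_b))
    reverse = sum(pair_score(g, e) for g, e in zip(g_b, e_a))
    return forward + reverse
```

Because both directions are counted, a homodimer's score is twice its number of interaction pairs. With toy inputs, a zipper with `g' residues `REEE` and `e' residues `ERRR` scores 8 as a homodimer (four attractive pairs), an all-glutamate zipper scores −8 as a homodimer, and the same all-glutamate zipper scores 8 when paired with an all-arginine partner.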
Eight B-ZIP proteins have no attractive or repulsive g↔e`
interactions. The lack of g↔e` interactions and the presence
of a large number of polar or charged amino acids in the `a' and
`d' positions make prediction of dimerization partners for this set
difficult. Only SisA and CG9415 have been listed.
DISCUSSION
Previously, a computational annotation of the Drosophila
melanogaster genome identified 29 genes containing the B-ZIP motif (Rubin et al. 2000). We have reexamined these data and identified 27 members, including 7 members not identified by the automated InterPro
Motif Identification Resource. Twenty-one Drosophila B-ZIP
proteins have B-ZIP regions that are highly related to mammalian B-ZIP
proteins, including the homodimerizing CREB, C/EBP, and PAR proteins,
and the heterodimerizing FOS, JUN, MAF, and NRE2 proteins. Searches
between Drosophila and vertebrate B-ZIP proteins identified
six that are conserved in both the B-ZIP domain and the rest of the
protein. These B-ZIP proteins are putative orthologs and are likely to
perform evolutionarily ancient functions.
Automated Versus Manual B-ZIP Protein Identification
Automated annotation by the InterPro Motif Identification Resource
(http://www.ebi.ac.uk/proteome) (Apweiler et al. 2001) identified 29 genes containing the B-ZIP motif (Rubin et al. 2000). We have
reexamined these data and the Drosophila genome sequence and
identified 27 B-ZIP genes, including 7 new members not previously annotated as B-ZIP proteins.
Eight proteins were identified as B-ZIP proteins by the InterPro Motif Identification Resource that do not pass our criteria of what constitutes a B-ZIP protein. One, CG17894 (gi10726715), is identical to the cnc protein (gi73000970) but has an additional 275 amino acids at the N terminus. The remaining seven proteins in the InterPro listing (CG6129, CG18266, CG9274, CG2848, CG18553, CG11774, and CG11745) are not canonical members of the B-ZIP family based on the following criteria. BLAST analysis using each of these proteins as queries failed to identify any known B-ZIP proteins in the protein database (the search was restricted to Drosophila proteins). A search of each protein against the Conserved Domain Database (CDD, National Center for Biotechnology Information [NCBI]) failed to identify the B-ZIP domain, although other domains were identified. Finally, five of the seven InterPro hits fail to meet the significance thresholds set by the databases for "true" hits with B-ZIP signatures and are therefore "false". Thus, manual query methods for identification of B-ZIP proteins identified six bona fide proteins not found by automated domain identification methods. Furthermore, automated methods identified several putative "false" positives.
Noncanonical B-ZIP Proteins
Both the manual and automated methods that are used to identify the
complete set of B-ZIP proteins in a genome are constrained by our
current lack of understanding of these proteins. The most efficient
method of identifying B-ZIP proteins that are similar to the
well-characterized mammalian B-ZIP proteins is to identify a canonical
basic region and then subsequently identify an amphipathic α-helix
placed at an invariant distance in the C-terminal direction from the basic region. This approach is flawed by its failure to
recognize the possible existence of a class of B-ZIP-like transcription factors in which dimerization is mediated by a leucine zipper but in
which DNA binding is mediated by a novel or less-conserved motif. For
example, mammalian CHOP-10 (Gadd 153) contains a C/EBP-like leucine
zipper but a divergent basic region containing two prolines, and it was
initially thought not to bind DNA (Ron and Habener 1992).
C/EBP|CHOP-10 heterodimers, however, are able to bind novel DNA
elements (Ubeda et al. 1996). These types of B-ZIP proteins are
difficult to identify because there are so many amphipathic α-helices
in the genome. Another example may be CG11774 (gi7299089), one of the
proteins identified by InterPro but not by our manual analysis. This
sequence, and others not discussed, possesses a canonical zipper with
good g↔e` salt bridge interactions, but lacks a convincing
basic region.
Other proteins have an obvious basic region but an ambiguous
amphipathic α-helix. For example, there are putative monomeric proteins containing the basic region that bind to DNA. The skn-1 gene
in C. elegans (Bowerman et al. 1992) has no dimerization motif
but does have a C-terminal four helix bundle to hold the extended
α-helical basic region (Rupert et al. 1998). Additionally, the skn-1
basic region has an N-terminal extension that helps
to stabilize DNA binding (Carroll et al. 1997). Only experiments will
determine whether these noncanonical sequences define novel X-ZIP or
B-X variants of the B-ZIP transcription factor family or whether they
are simply a subset of coiled-coil or basic region-containing proteins.
Structural Features of the B-ZIP Motif
There are several structural features that appear general to the
leucine zipper domain of most B-ZIP motifs in the Drosophila melanogaster genome. The leucine zipper is generally four heptads long. In Drosophila, attractive g↔e` pairs in the
first heptad are always basic-acidic, whereas in the second
heptad, attractive g↔e` pairs are reversed to
acidic-basic. Both orientations are observed in the third
heptad, whereas the fourth heptad g↔e` pairs are
acidic-basic. Arginine is twice as common as lysine in the
`g' and `e' positions. A double-mutant thermodynamic cycle analysis
of g↔e` interactions measures a coupling energy, indicative
of amino acid interactions, of −0.5 kcal/mole for the E↔R
interaction compared with −0.3 kcal/mole for the E↔K interaction.
This indicates that R confers more specific dimerization than does K
(Krylov et al. 1998). The preference of R over K in the `g' and `e'
positions indicates that these positions are used to increase dimerization
specificity instead of stability.
All Drosophila B-ZIP leucine zippers contain either a polar or
a charged amino acid in an `a' or `d' position. A nonaliphatic amino acid in the second heptad `a' position is observed in 25 of the
27 Drosophila melanogaster B-ZIP proteins, with asparagine occurring 13 times. Asparagine in the `a' position has been shown to
limit higher-order oligomerization in the yeast B-ZIP protein GCN4
(Harbury et al. 1993), but it remains obscure why the second heptad
`a' position is so often used for this function. Eight B-ZIP proteins
with asparagines in the first, third, or fourth heptad `a' position
prompted us to determine the energetic contribution of asparagine to
dimerization specificity using a heterodimerizing leucine zipper system. The data
indicate that asparagine prevents heterodimerization with valine. Thus,
it appears that in the Drosophila melanogaster genome,
asparagine has also been used in the first, third, and fourth heptad
`a' position to create leucine zippers that prefer to homodimerize
and not interact with other leucine zippers that contain aliphatic
amino acids in the `a' position, such as Pdp1, CG3136, CrebA, CG954,
and CG9415.
Based on our knowledge of the effects of amino acids in the `g', `a', `d', and `e' positions of the leucine zipper on dimerization stability and specificity, we have predicted the potential dimerization partners for the Drosophila melanogaster B-ZIP proteins. In vertebrates, the CREB, C/EBP, or PAR families have multiple members that can homodimerize and heterodimerize within the subfamily. In contrast, in Drosophila melanogaster, each of these families consists either of a single member or multiple members that we predict will only homodimerize and not heterodimerize, even within the subfamily.
Homodimerizing Proteins: The PAR Proteins
Four Drosophila proteins share the structural features of
PAR proteins. PAR appears to represent the prototypical leucine zipper
sequence found throughout metazoans. In the three known vertebrate PAR
family proteins, the first four heptads of the leucine zipper have
identical attractive g↔e` interhelical interactions, R↔E,
E↔R, E↔R, E↔R. They also have similar hydrophobic interfaces with
an asparagine in the second heptad `a' position. These mammalian
proteins are known to form homodimers and to heterodimerize within the
family (Hunger et al. 1992; Inaba et al. 1992).
An examination of the PAR-related B-ZIP proteins in Drosophila indicates that two structural strategies have been used to generate new leucine zippers that homodimerize but do not heterodimerize with other PAR family members. One strategy is illustrated by Pdp1 that contains an asparagine in the `a' position of the fourth heptad in addition to the asparagine in the `a' position of the second heptad. We have shown in this study that an asparagine in the `a' position prevents heterodimerization with valine. An asparagine-valine interaction in the `a' position is 2.3 or 5.3 kcal/mole less stable than an asparagine-asparagine or a valine-valine interaction, respectively. Thus, Pdp1 will not interact with the other PAR-like proteins that contain an aliphatic amino acid in this position. The large number of polar amino acids in the `a' position of leucine zippers of Drosophila melanogaster indicates that this mechanism has been used to generate new homodimerizing leucine zippers by changing a single amino acid.
A second strategy to produce new homodimerizing leucine zippers is seen in CG4575, in which the third g↔e' pair is reversed from E↔R to R↔E. We calculate that this would destabilize heterodimerization with PAR proteins containing an E↔R salt bridge in the third position by 2.9 kcal/mole (Krylov et al. 1998). Reversal of a single salt bridge in a vertebrate PAR family protein has been shown experimentally to prevent heterodimerization (Moll et al. 2000). Interestingly, in a heterodimer, the energetic cost of combining leucine zippers with an E↔R and an R↔E salt bridge is similar to the cost of forming an asparagine-valine pair, indicating that either strategy is capable of producing a new homodimerizing leucine zipper.
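The effect of reversing one salt bridge can be sketched qualitatively. The example below classifies each interhelical g↔e' pair as attractive, repulsive, or neutral from residue charge alone, and compares homodimers with a heterodimer. It is a simplified illustration rather than the procedure used in this study: the acidic and basic residue sets, the function names, and the two toy zippers are assumptions, and no energies are assigned.

```python
ACIDIC = {"D", "E"}   # assumed negatively charged residues
BASIC = {"K", "R"}    # assumed positively charged residues

def classify_pair(g, e_prime):
    """Classify one g<->e' pair by the sign of the two side-chain charges."""
    charges = []
    for residue in (g, e_prime):
        if residue in ACIDIC:
            charges.append(-1)
        elif residue in BASIC:
            charges.append(1)
        else:
            charges.append(0)
    product = charges[0] * charges[1]
    if product < 0:
        return "attractive"
    if product > 0:
        return "repulsive"
    return "neutral"

def interhelical_pairs(zipper_1, zipper_2):
    """Per heptad, classify both cross-helix pairs (g of one helix, e' of the other)."""
    return [
        (classify_pair(g1, e2), classify_pair(g2, e1))
        for (g1, e1), (g2, e2) in zip(zipper_1, zipper_2)
    ]

# Each heptad is written (g, e'), following the R<->E, E<->R, E<->R, E<->R notation.
PAR_LIKE = [("R", "E"), ("E", "R"), ("E", "R"), ("E", "R")]
# Hypothetical zipper in which only the third pair is reversed, as described for CG4575.
REVERSED_THIRD = [("R", "E"), ("E", "R"), ("R", "E"), ("E", "R")]

for name, a, b in [
    ("PAR-like homodimer", PAR_LIKE, PAR_LIKE),
    ("reversed-third homodimer", REVERSED_THIRD, REVERSED_THIRD),
    ("heterodimer", PAR_LIKE, REVERSED_THIRD),
]:
    print(name, interhelical_pairs(a, b))
```

In this toy model both homodimers retain only attractive pairs, whereas the heterodimer places like charges opposite each other in the third heptad, which is consistent with the destabilization described above.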
Conservation of B-ZIP Proteins Between Drosophila and Humans
Comparisons between Drosophila and vertebrate B-ZIP proteins identified six proteins that are conserved in both the B-ZIP domain and the rest of the protein. These B-ZIP proteins are putative orthologs and are likely to perform evolutionarily ancient functions. Twenty-one Drosophila B-ZIP proteins have B-ZIP regions that are highly related to mammalian B-ZIP proteins, including the homodimerizing CREB, C/EBP, and PAR proteins and the heterodimerizing FOS, JUN, MAF, and NFE2 proteins. We have evaluated whether the amino acids that we predict are critical for regulating dimerization specificity in the Drosophila B-ZIP proteins are conserved in the human homologs. The four positions critical for regulating dimerization specificity (`g', `a', `d', and `e') are more conserved than the entire heptad, indicating that dimerization specificity is actively selected for during evolution. For example, the fourth heptad `a' position asparagine and the third heptad basic-acidic g↔e' pair are conserved throughout evolution in CrebA, the Drosophila homolog of the Oasis family in humans. The histidine found in the fifth heptad of Jra, A3-3, and kay is conserved in their human homologs JUN, ATF3, and FOS.
Six putative Drosophila B-ZIP proteins do not have human counterparts. They also do not have any of the attractive or repulsive g↔e' pairs that are observed in canonical leucine zippers. This indicates either that they are not real B-ZIP proteins or that they are a group of new B-ZIP proteins that have evolved in the insects. The observation that sisA, an insoluble protein, interacts in a yeast two-hybrid screen with two proteins, CG16813 and CG16815 (J. Erickson, pers. comm.), which we have independently identified as putative Drosophila B-ZIP proteins without human homologs, indicates that these proteins heterodimerize, as would be expected for B-ZIP proteins. The function of these proteins in Drosophila sex determination may represent a new function for B-ZIP proteins in the insects.
Dimerization Partner Predictions
Based on our knowledge of the effects of the `g', `a', `d',
and `e' positions of the leucine zipper on dimerization stability and
specificity, we have predicted the potential dimerization partners for
the Drosophila melanogaster B-ZIP proteins. It is likely that
dimerization specificity is influenced by other factors in addition to
the two simple criteria we have taken into account. For example, the
prediction of dimerization partners is complicated by the fact that the
DNA sequence bound by B-ZIP dimers can alter dimerization preference
(Hai and Curran 1991). Nonetheless, it seems reasonable to explore the
idea that these criteria may have some predictive value. For example,
our simple rules lead to predicted interactions between Vri, a member
of the C/EBP family, and kay, a FOS family member. C/EBP-FOS
heterodimers have been observed (Hsu et al. 1994; Ubeda et al. 1996).
Likewise, the well-known interaction between JUN and FOS is also
predicted by these rules. As our understanding of the energetics of
leucine zipper dimerization increases, more valuable predictions will
be possible. In the absence of other predictive information, these
rules may be a practical starting place in formulating a hypothesis for
experimental analysis of possible dimerization partners for the
Drosophila B-ZIP proteins.
| |
METHODS |
|---|
|
|
|---|
Pattern Matching
A database of the translated Drosophila melanogaster genome sequence was created from the data released by Celera (dros_na) and deposited into GenBank. This database consists of 14,100 open reading frames. Two types of regular expressions were used to query the database using the gref utility of the SEALS package (NCBI). The "B-ZIP" regular expression
([RKFS]XXXNXX[ASYK][AVK][RKASQNE][SCFYL] R[RKAIFDNQ]XXXXXXXX[LIVTS]XXX[VRATSC] XX[LYVM]XXX[NKVR]XX[LIY]) was generated from a multiple alignment of representative B-ZIP proteins that included VBP, C/EBP, FOS, P45, CREB, ATF3, GCN4, P18, Sis-A, Cnc, ATF2, and ATF4. Every residue represented at a given position was included in the regular expression. The "BASIC" regular expression
([RQK][NLE][TRK]X[ASY][ASQ]XX[CSFYG][RDL] X[RK][RKL]) was modified from Fernandes et al. (1997). This expression corresponds to the basic region of 30 B-ZIP proteins from various organisms. It was modified at position 2 to include the L found in Yap8p from S. cerevisiae, as well as the more commonly found N. Positions 2, 3, 6, 9, 10, and 13 were modified to include additional residues based on an alignment of CREB sequence from Drosophila melanogaster.
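The two expressions can be tried outside SEALS with a general-purpose regular expression engine. The sketch below is not the gref utility. It assumes that X denotes any residue and that the spaces inside the printed expressions are line-wrapping residue rather than part of the pattern. The helper names and the toy sequence are hypothetical.

```python
import re

def seals_to_regex(pattern):
    """Compile a pattern that uses X for 'any residue' into a Python regex."""
    compact = "".join(pattern.split())  # drop whitespace (assumed line-wrap residue)
    out = []
    in_class = False
    for ch in compact:
        if ch == "[":
            in_class = True
        elif ch == "]":
            in_class = False
        # Outside a bracketed residue set, X becomes a single-residue wildcard.
        out.append("." if ch == "X" and not in_class else ch)
    return re.compile("".join(out))

BASIC = "[RQK][NLE][TRK]X[ASY][ASQ]XX[CSFYG][RDL] X[RK][RKL]"
BZIP = ("[RKFS]XXXNXX[ASYK][AVK][RKASQNE][SCFYL] R[RKAIFDNQ]XXXXXXXX[LIVTS]"
        "XXX[VRATSC] XX[LYVM]XXX[NKVR]XX[LIY]")

def find_hits(pattern, sequence):
    """Return (start, matched_text) for each non-overlapping match in sequence."""
    regex = seals_to_regex(pattern)
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]

# Toy sequence built for illustration; it is not taken from the database.
toy = "MSTAP" + "RNTEAARRSRARK" + "LQRM"
print(find_hits(BASIC, toy))
print(find_hits(BZIP, toy))
```

Scanning a full set of translated open reading frames would amount to calling `find_hits` once per sequence and keeping the non-empty results.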
BLAST and PSI-BLAST Analyses
BLAST analysis (Altschul et al. 1990) was performed on the NCBI web server using short B-ZIP sequences as queries. The ungapped parameter was used to force global alignments. Filtering was turned off. Other options were left at their default settings. PSI-BLAST analysis (Altschul et al. 1997) was done using the SPLAT routine of the SEALS package (Walker and Koonin 1997). The analysis was performed to convergence with an inclusion threshold (-h) of 0.001.
Multiple Alignment and Phylogenetic Tree Analysis
Multiple alignments were performed with Clustal W (Thompson et al. 1994; Higgins et al. 1996) using the default options. The pairwise ordering mode was set to fast and approximate. Phylogenetic analysis was performed using TreeView (v. 1.6.1; http://taxonomy.zoology.gla.ac.uk/rod/treeview.html).
Proteins
The sequence of the 96 amino acid EE34 (Krylov et al. 1994), also named EE34(V) in this manuscript, is
ASMTGGQQMGRDP-LEE-KVFVPDEQKDEKYWTRRKKNNVAAKRSRDARRLKENQTI RAAFLEK
ENTALRT E(V)AELEK EVGRCEN IVSKYETRYGPL. The leucine zipper is separated
into heptads, as presented in Figure 3. The valine in parentheses was
changed to N to create EE34(N). The first 13 amino acids are
from φ10, the next three amino acids are a cloning linker, and the
remaining 80 amino acids comprise the basic region followed by a
leucine zipper of EE34. The protein sequence of A-RR34(V) is ASMTGGQQMGRDP-LEE-
LEQRAEELARE NEELLEKEAEELEQENAELE RAAFLEK ENTALRT R(V)AELRK
RVGRCRN IVSKYKYETRYGPL. The valine in parentheses was changed to N to
create A-RR34(N). The LE in bold is the Xho I site
that is the border between the leucine zipper and the N-terminal acidic
extension. Proteins were expressed in E. coli using the T7
IPTG-inducible system and purified as described previously (Olive et al. 1997).
Circular Dichroism
Circular dichroism (CD) studies were performed using a Jasco J-720 spectropolarimeter. All protein stock solutions were in 12.5 mM potassium phosphate (pH 7.4), 150 mM KCl, and 0.25 mM ethylenediamine tetraacetic acid. Samples containing 2 µM protein and 1 mM dithiothreitol in 1 ml of stock buffer were heated to 65°C for 20 min, cooled to room temperature for 5 min, and transferred to a 5-mm rectangular CD cell.
Thermodynamic Calculations
Melting temperature (Tm) and enthalpy (ΔH) values were determined from denaturation curves, assuming a two-state equilibrium dissociation of α-helical dimers into unfolded monomers using a ΔCp of 2.04 kcal/mole/°C, as described previously (Krylov et al. 1997). ΔG values are reported at 37°C.
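For orientation, a free energy at 37°C can be extrapolated from a fitted Tm and ΔH with the Gibbs-Helmholtz relation. The sketch below shows one common textbook form for a two-state dissociation with a temperature-independent ΔCp. It is not necessarily the exact procedure of Krylov et al. (1997). The function name, the sign convention (dissociation taken as the forward reaction), and the need to supply the dissociation constant at Tm, which depends on total protein concentration, are all assumptions.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def delta_g_dissociation(t_c, tm_c, dh_m, dcp, k_at_tm):
    """Standard free energy of dimer dissociation at t_c (deg C), in kcal/mol.

    tm_c    : melting temperature in deg C
    dh_m    : dissociation enthalpy at Tm, kcal/mol
    dcp     : heat capacity change, assumed constant, kcal/(mol*deg C)
    k_at_tm : dissociation constant at Tm in molar units (set by the
              total protein concentration in a two-state dimer model)
    """
    t = t_c + 273.15
    tm = tm_c + 273.15
    return (dh_m * (1.0 - t / tm)
            + dcp * ((t - tm) - t * math.log(t / tm))
            - R_KCAL * t * math.log(k_at_tm))

# Hypothetical inputs for illustration only; none are values from this study.
print(delta_g_dissociation(t_c=37.0, tm_c=50.0, dh_m=60.0, dcp=1.0, k_at_tm=1e-6))
```

The last term arises because the standard free energy of a bimolecular dissociation is not zero at Tm but equals -RTm ln K(Tm).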
| |
WEB SITE REFERENCES |
|---|
|
|
|---|
http://taxonomy.zoology.gla.ac.uk/rod/treeview.html; tree-drawing software by Rod Page (University of Glasgow) for displaying phylogenies. Programs can be downloaded.
http://www.ebi.ac.uk/proteome; Proteome Analysis database for comprehensive statistical and comparative analyses of the predicted proteomes of fully sequenced organisms.
| |
ACKNOWLEDGMENTS |
|---|
We thank Jim Erickson for communicating unpublished observations.
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
| |
FOOTNOTES |
|---|
4 Corresponding author.
E-MAIL: vinsonc{at}dc37a.nci.nih.gov; FAX (301) 496-8419.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.67902.
| |
REFERENCES |
|---|
|
|
|---|
α-helical coiled coils and bundles: How to design an α-helical protein. Proteins 7: 1-14.