Published online before print May 16, 2002, doi: 10.1101/gr.204902
Vol. 12, Issue 6, 985-995, June 2002
METHODS
ABSTRACT
Messenger RNAs that contain adenylate uridylate-rich elements (AREs), which act as stability determinants, in their 3' untranslated region (UTR) code for key products that regulate early and transient biological responses. We used an integrated computational and laboratory approach to amplify large protein-coding regions of ARE genes, including full-length ones. A statistical analysis of the initiation regions in the 5' UTR of ARE-mRNAs was performed, and on this basis several 5' primers and a single universal 3' primer were designed to target the initiation consensuses and the ARE regions, respectively. Under optimized conditions, the primers enriched and amplified large protein-coding regions of the ARE gene family. The selective amplification of ARE cDNAs was verified by polymerase chain reactions (PCRs) specific to known ARE mRNA molecules and by monitoring the abundance of the non-ARE β-actin signal. A mini-library of the amplified ARE products was constructed to further confirm ARE selection. Different 5' primers selectively generated distinct pools of amplified ARE cDNAs. The biological utility of the method was shown with differential display: the up-regulation of several ARE-mRNAs, including the full-length coding region of the small inducible cytokine A4 (SCYA4) gene, was shown in endotoxin-stimulated monocytic cells. The integrated computational and laboratory approach should enhance the discovery and expression analysis of early and transient response genes.
INTRODUCTION
A subset of the genome that is essential in cellular growth and in early and transient responses to exogenous agents such as inflammatory inducers, growth stimuli, stress, and microbes is the adenylate uridylate (thymidylate)-rich element (ARE)-containing gene family. Our recent analysis showed that the ARE-gene family encodes a large number of previously unrecognized ARE-mRNAs and constitutes as much as 8% of human mRNAs (Bakheet et al. 2001). In addition, the ARE-gene family contains functionally diverse proteins that mediate different biological processes, including cell growth and differentiation, transcription, innate immune response, inflammation, signal transduction, and many others (Bakheet et al. 2001). A common trait of the ARE-mRNAs is that they are expressed early and transiently. A cDNA microarray study using B cell lymphoma and peripheral blood mononuclear cells (Lam 2001) concluded that our computationally extracted ARE motif (Bakheet et al. 2001) was preferentially found in the most unstable mRNAs (half-life <2 hr) and observed with decreasing frequency in stable mRNAs (>8 hr). This is the opposite of non-ARE genes, in which stable mRNAs constituted the majority (60%) of the mRNAs studied (Lam 2001). Stabilization of ARE mRNAs can cause prolonged responses that may subsequently lead to disease states. Certain AREs have been shown to act as instability determinants (Lagnado et al. 1994; Chen and Shyu 1995). For instance, the stable β-globin mRNA was rendered unstable when its 3' UTR was replaced with the GMCSF 3' UTR containing multiple AREs (Shaw and Kamen 1986), whereas the unstable interleukin-1β (IL-1β) mRNA was rendered stable when its AREs were removed (Kastelic et al. 1996). Despite the accumulating evidence for the functional role of AREs in mRNA stability, the repertoire of ARE genes and their regulatory pathways remains largely unknown.
Current approaches to gene discovery and expression profiling have several limitations. Many methods, such as expressed sequence tag (EST) sequencing (Adams et al. 1991), degenerate PCR, serial analysis of gene expression (SAGE) (Velculescu et al. 1995), and conventional differential display (Liang and Pardee 1992), yield only partially informative sequence data. Other methods, such as cDNA and oligonucleotide microarrays (Duggan et al. 1999), require prior knowledge of the sequences and entail overwhelming technical and analytical tasks that arise from genome-wide analysis. In addition, these techniques are biased toward a certain threshold of mRNA abundance. Computational gene prediction from the human genome project also has several limitations, such as the variable accuracy of predicted exon locations and the lack of cDNA clones for studying protein function. Strategies are therefore needed to address these limitations. One solution is to use bioinformatics to guide laboratory methods toward informative protein-coding regions. In this paper, we targeted the ARE gene family, a functionally and structurally related subset of the genome. This work was facilitated by several bioinformatics programs for database sequence retrieval, UTR assembly and alignment, and statistical analysis. We present here a system for gene discovery and expression analysis, integrated with computational means, that amplifies the ARE-mRNA repertoire and its full-length protein-coding sequences.
RESULTS
ARE mRNA Sequence and Initiation Site Analysis
A total of 605 ARE-mRNA sequences that include the 3' UTR, the full-length CDS, and at least 10 bp of 5' UTR were obtained from the AU-rich element-containing mRNA database (ARED) (Bakheet et al. 2001) by use of the Assemble program (GCG). The initiation context sequences in the 5' UTR, that is, those that flank the start codon ATG, were analyzed for all ARE-mRNAs in the database. Initiation regions have been reported to contain conserved elements that are important in translation (Kozak 1987a,b); we therefore chose this region for designing the 5' primers of a PCR-based protocol. The sequences were divided into 16 subsets according to the pattern NATGN, where N = A, C, G, or T, and the truncated 5' UTRs (−7 bp, ATG, +2 bp) within each subset were aligned. Sixteen consensus patterns, at a certainty level of 75% at each position, were derived from the alignments (Table 1). The overall consensus initiation site in the ARE-mRNAs was SSMAMSATGRM at a 50% certainty level at each position.
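The subset-and-consensus step above can be sketched in Python. This is a minimal illustration, not the GCG procedure the authors used: it assumes each sequence has already been trimmed to a 12-bp window (−7..−1, ATG, +1..+2) containing only A, C, G, and T, and it interprets a 75% "certainty level" as the smallest set of bases, taken greedily by frequency, that covers at least 75% of the sequences at a position. The function names and the toy windows are hypothetical.

```python
from collections import Counter, defaultdict

# Map each set of bases to its IUPAC ambiguity code.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def partition_by_start_context(windows):
    """Group 12-bp windows (-7..-1, ATG, +1..+2) into the 16 NATGN subsets.

    Keys follow the Cg/Aa naming: base at -1 in upper case, base at +1 in lower case.
    """
    groups = defaultdict(list)
    for w in windows:
        if len(w) != 12 or w[7:10] != "ATG":
            raise ValueError(f"not a -7..+2 start-codon window: {w!r}")
        groups[w[6] + w[10].lower()].append(w)
    return groups


def consensus(aligned, certainty=0.75):
    """Per column, the smallest set of bases (added greedily by frequency)
    whose combined frequency reaches `certainty`, written as an IUPAC code."""
    codes = []
    for column in zip(*aligned):
        chosen, covered = set(), 0
        for base, n in Counter(column).most_common():
            chosen.add(base)
            covered += n
            if covered / len(column) >= certainty:
                break
        codes.append(IUPAC[frozenset(chosen)])
    return "".join(codes)


# Hypothetical toy windows: four in the Cg subset, one in the Aa subset.
windows = ["GGCCACCATGGC", "AGCAGCCATGGA", "GACACCCATGGC",
           "CCGAAGCATGGA", "TTTAAAAATGAA"]
groups = partition_by_start_context(windows)
print(sorted(groups))           # ['Aa', 'Cg']
print(consensus(groups["Cg"]))  # RRCARCCATGGM
```

With these toy windows, positions that need two bases to reach 75% come out as R or M, while conserved positions remain single bases. The greedy rule is one plausible reading of a per-position certainty threshold; other consensus programs may resolve ties or thresholds differently.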
Statistical analysis of the 16 10-mer (−6, ATG, +1) consensus sequences was performed (Table 2). The most common consensus in initiation regions was the Cg consensus, VVVVRSCATGGM (Table 2), which occurs in 22% of all ARE-mRNA sequences analyzed. Other frequent initiation consensuses were Ca, Ag, and Gg (Tables 1 and 2), each accounting for 9% to 10% of all ARE mRNAs. Not all consensuses were unique to the initiation regions; depending on the consensus, there were varying numbers of internal sites in addition to the initiation region. The most common consensus around any ATG was the Aa consensus (Tables 1 and 2), which occurs in 35% of all ARE-mRNA molecules, whereas the least frequent consensuses were those flanked by a T immediately upstream of the ATG, that is, the Ta, Tc, Tg, and Tt consensuses (Table 2). The highest proportion of initiation sites in any subset was found for the Gc consensus, for which 71% of the sites (initiation plus internal) were initiation sequences. The number of consensus sites per mRNA (total consensus sites/number of mRNAs; Table 2) ranged from 1.0 to 1.65.
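To make the initiation-versus-internal bookkeeping behind Table 2 concrete, the following sketch scans mRNAs for an IUPAC consensus, including overlapping hits, and classifies a hit as an initiation site when its ATG coincides with the annotated start codon. The input format (sequence plus 0-based CDS start), the helper names, and the toy sequences are assumptions for illustration; `sites_per_mrna` follows the definition in the text (total consensus sites divided by the number of mRNAs).

```python
import re

# IUPAC code -> regex character class (DNA alphabet).
IUPAC_RE = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
            "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "B": "[CGT]",
            "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}


def site_stats(consensus, mrnas):
    """Count consensus hits and split them into initiation vs internal sites.

    `mrnas` is a list of (sequence, cds_start) pairs, where cds_start is the
    0-based index of the A of the annotated start ATG.
    """
    atg_offset = consensus.index("ATG")
    # Zero-width lookahead so that overlapping hits are all reported.
    pattern = re.compile("(?=" + "".join(IUPAC_RE[c] for c in consensus) + ")")
    initiation = internal = 0
    for seq, cds_start in mrnas:
        for hit in pattern.finditer(seq):
            if hit.start() + atg_offset == cds_start:
                initiation += 1
            else:
                internal += 1
    total = initiation + internal
    return {
        "initiation": initiation,
        "internal": internal,
        "fraction_initiation": initiation / total if total else 0.0,
        "sites_per_mrna": total / len(mrnas) if mrnas else 0.0,
    }


# Hypothetical toy mRNAs scanned with the SCATGGM core of the Cg consensus.
mrnas = [("GGCCATGGCTTTTTGCCATGGA", 4),  # start-codon hit plus one internal hit
         ("AAGCATGGA", 4),               # start-codon hit only
         ("TTTATGCC", 3)]                # no hit for this consensus
print(site_stats("SCATGGM", mrnas))
```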
Ten nucleotides from each consensus sequence were incorporated into the 5' primers. Each 5' primer was designed with two parts: a variable 10-mer that incorporated the consensus sequence (−6 bp, ATG, +1 bp; Table 1) and a core part that contained fixed sequences. Thus, each of the 16 primers (Tables 1 and 2), together with the single 3' ARE primer, targets a subset of ARE cDNAs in a single PCR reaction.
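The two-part primer layout can be sketched as follows. The fixed core sequence is not given in this section, so `HYPOTHETICAL_CORE` is a placeholder, and placing the core on the 5' side with the consensus 10-mer at the 3' end is an assumption (the 3' end is the part that must anneal to the initiation region). The example 10-mer, VVVRSCATGG, is simply the −6..+1 portion of the Cg consensus quoted above; whether the actual primers were degenerate is not stated here. The `degeneracy` helper shows how many concrete sequences such an IUPAC string would represent.

```python
from itertools import product
from math import prod

# IUPAC code -> concrete bases.
IUPAC_SETS = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
              "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT",
              "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}

# Hypothetical placeholder: the actual fixed core sequence is not given here.
HYPOTHETICAL_CORE = "GACTAGTCAG"


def make_primer(variable_10mer, core=HYPOTHETICAL_CORE):
    """Assumed layout: fixed core on the 5' side, variable 10-mer
    (-6..-1, ATG, +1) at the 3' end, where it primes the initiation region."""
    if len(variable_10mer) != 10 or variable_10mer[6:9] != "ATG":
        raise ValueError("expected a 10-mer with ATG at positions 7-9")
    return core + variable_10mer


def degeneracy(seq):
    """Number of distinct sequences encoded by an IUPAC string."""
    return prod(len(IUPAC_SETS[c]) for c in seq)


def expand(seq):
    """All concrete sequences encoded by an IUPAC string."""
    return ["".join(p) for p in product(*(IUPAC_SETS[c] for c in seq))]


primer = make_primer("VVVRSCATGG")  # hypothetical degenerate 10-mer
print(primer, degeneracy(primer), len(expand(primer)))
```

For this example, the three V positions and the R and S positions give 3 × 3 × 3 × 2 × 2 = 108 distinct sequences, which is the kind of figure one would weigh against primer concentration if degenerate positions were used.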
Taq-Mediated Amplification of ARE cDNA
The monocytic leukemia cell line THP-1 was used as the cellular study model. This tumor cell line corresponds to immature monocytic cells and constitutively produces the ARE mRNAs interleukin-8 (IL-8) and tumor necrosis factor-α (TNF-α), as well as the non-ARE mRNA IL-8 receptor (IL-8R) (Khabar et al. 1997; Murayama et al. 1997; Al-Humidan et al. 1998). In many of the experiments, the cells were treated with both lipopolysaccharide (LPS), a potent inducer of cytokines, and cycloheximide (CHX). CHX blocks protein synthesis, enhances the expression of early response genes, and increases the stability of transient ARE mRNAs (Shaw and Kamen 1986; Reeves and Magnuson 1990).
By use of the universal 3' primer that targets the ARE region and one of the 5' primers, selective amplification of ARE cDNA was achieved under optimized conditions of Taq-derived amplification (termed here ARE-cDNA PCR). The goal of ARE-cDNA PCR is to amplify typical ARE-containing mRNA sequences (ARE mRNAs) while suppressing the amplification of non-ARE mRNA sequences. To verify the selective amplification of ARE cDNAs, we subsequently set up second, specific PCR reactions for several examples of ARE molecules: IL-8, TNF-α, and c-fos mRNAs, each having a different number of ATTTA pentamer repeats. The 5' primers for the specific PCRs were designed at or near the initiation sites, so that amplification favors the large ARE products (i.e., those containing the full-length coding regions) rather than any shorter amplified ARE products. The non-ARE messages of β-actin and IL-8R, which contain a single ARE pentamer in a non-ARE context in the 3' UTR, were used as controls. These rigorous controls were chosen to monitor the stringent selective amplification of typical ARE-cDNAs but not non-ARE-cDNAs. The second, specific PCR was performed under relatively stringent conditions, for example, only 4 ng of ARE cDNA, a low dNTP concentration, and a cycle number within the exponential phase of amplification, to allow semiquantitative comparison and to eliminate or minimize amplification of original cDNA template carried over from the ARE-cDNA PCR. Thus, the specific signals can be attributed to the amplification of products containing the coding region in the ARE-cDNA PCR, as described below.
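Why a cycle number within the exponential phase matters for semiquantitative comparison can be illustrated with a simple discrete logistic model of amplification, a common textbook simplification rather than a model used in this work. Two hypothetical samples with a true 10-fold difference in starting template keep that ratio while amplification is exponential, but the ratio collapses toward 1 as both reactions approach the same plateau. The efficiency and plateau capacity are arbitrary illustrative parameters.

```python
def amplify(n0, cycles, efficiency=0.9, capacity=1e12):
    """Discrete logistic model of PCR: each cycle adds
    efficiency * n * (1 - n / capacity) copies, so growth is close to
    exponential while n << capacity and stalls near the plateau."""
    n = float(n0)
    for _ in range(cycles):
        n += efficiency * n * max(0.0, 1.0 - n / capacity)
    return n


# Two hypothetical samples with a true 10-fold difference in starting template.
high, low = 1e4, 1e3
for cycles in (10, 20, 30, 40, 60):
    ratio = amplify(high, cycles) / amplify(low, cycles)
    print(f"{cycles:2d} cycles: observed ratio {ratio:.2f}")
```

Real reactions plateau for several reasons (primer and dNTP depletion, product reannealing, enzyme limits), so the single capacity term only captures the qualitative behavior.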
Optimum conditions for the ARE-cDNA PCR were initially established through many trials that varied the amount of RNA, the type of reverse transcriptase (RT), the amount of input cDNA, the primer concentration, the type of Taq enzyme, the annealing temperatures, and the start conditions. We observed that optimum selectivity for amplification of the IL-8 ARE-cDNA depended on CHX treatment and on the PCR start condition, with either SuperScript II or Moloney murine leukemia virus (MMLV) reverse transcriptase (Fig. 1a). Optimum selectivity, that is, specific amplification of ARE-cDNA, was verified by an enhanced amplified IL-8 signal together with a minimal or absent β-actin cDNA signal. As expected, CHX treatment of the cells before RNA extraction increased the ARE cDNA signal (Fig. 1a). ARE-cDNA PCR was also optimal with anti-Taq-mediated start conditions when compared with direct hot start or regular PCR. We therefore adopted these optimum conditions for subsequent ARE-cDNA PCRs: CHX treatment of the cultured cells before RNA extraction and anti-Taq-mediated PCR (Fig. 1). Compared with the regular abundance of IL-8 and β-actin cDNA signals observed by RT-PCR (Fig. 1b), the optimized ARE-cDNA PCR achieved an almost complete reversal of abundance. The strong specific IL-8 signal was not attributable to the original cDNA (40 ng) used in the ARE-cDNA PCR and carried over into the second, specific PCR (Fig. 1c). The residual, minimal β-actin signal apparently came from original cDNA carried over into the second, specific PCR and not from the ARE-cDNA PCR (Fig. 1c). The β-actin cDNA amplification can also be eliminated by diluting the ARE-cDNA PCR, for example 1/10, into a second ARE-cDNA PCR (data not shown). In addition, PCR with primers specific to the non-ARE IL-8 receptor mRNA yielded no IL-8 receptor signals (data not shown).
The effect of the initial annealing temperature on the amplification of cDNAs with different ARE structures was studied (Fig. 2). Small differences in the ARE annealing temperature during the first four cycles had significant effects in the case of IL-8, which has discontinuous multiple nonamers, TTATTTAWW (Fig. 2a), but not in the case of TNF-α, which has continuous overlapping multiple nonamers (Fig. 2b). In other words, IL-8 has two overlapping pentameric repeats (ATTTATTTA), whereas TNF-α has five. The normally abundant β-actin signal was largely suppressed in all lanes because the optimized conditions from the previous experiments were used. With regard to the selectivity of the amplified IL-8 products, the optimum ARE annealing temperature for the ARE-cDNA PCR was 35°C, compared with 40°C and 32.5°C (Fig. 2a). In contrast, variations in the initial annealing temperature (32.5-40°C) had no significant effect on the amplification of cDNA with continuous multiple AREs, as in the case of TNF-α (Fig. 2b). Thus, the 15-mer ARE 3' primer, which contains two overlapping nonamers, appeared to anneal to multiple targets of two overlapping nonamers (more than three pentameric repeats), leading to enhanced amplification of TNF-α. Unlike the case with IL-8 cDNA, amplification at a lower initial temperature still led to significant TNF-α signals because of the presence of multiple (more than six) partial continuous and overlapping repeats (Fig. 2b). Temperatures as high as 45°C were also slightly tolerated in the case of TNF-α cDNA amplification; amplification decreased dramatically at 50°C, and no specific amplification was seen at 55°C (data not shown). The minimum number of cycles required for an optimum PCR signal for TNF-α and IL-8 was 25; increasing the number of cycles did not further improve the signals, indicating that amplification had moved beyond the exponential phase at cycle numbers higher than 25 (Fig. 2, lanes 1-3). In all of the experiments, genomic DNA contamination was monitored by the absence of larger PCR products, because the primers for the specific PCRs were designed to span more than one exon.
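The distinction between discontinuous and continuous overlapping ARE repeats can be quantified by counting motif occurrences with a zero-width lookahead, so that overlapping pentamers and nonamers are all counted. The sketch uses the ATTTA pentamer and the TTATTTAWW nonamer named in the text; the sequences are synthetic examples, not the real IL-8 or TNF-α 3' UTRs, and the counts by themselves say nothing about annealing behavior.

```python
import re

PENTAMER = "ATTTA"
NONAMER = "TTATTTA[AT][AT]"  # TTATTTAWW from the text, W = A or T


def count_overlapping(seq, motif_regex):
    """Count motif occurrences, including overlapping ones (zero-width lookahead)."""
    return sum(1 for _ in re.finditer(f"(?={motif_regex})", seq))


def are_profile(utr):
    """Pentamer and nonamer counts for a 3' UTR given as RNA or DNA."""
    dna = utr.upper().replace("U", "T")
    return {"pentamers": count_overlapping(dna, PENTAMER),
            "nonamers": count_overlapping(dna, NONAMER)}


# Synthetic sequences, not actual IL-8 or TNF-alpha UTRs.
print(are_profile("ATTTATTTA"))              # two overlapping pentamers, no full nonamer
print(are_profile("UUAUUUAUUUAUUUAUUUAUU"))  # continuous overlapping repeats
```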
Mini-Libraries and Random Cloning and Sequencing
To show the utility of the computationally facilitated ARE-cDNA PCR, and to analyze and confirm the amplified sequences, we adopted a random cloning and sequencing approach. We constructed a mini-library in which the amplified ARE products from ARE-cDNA PCR were cloned into pUC19 or pCR2.1 vectors, and clones were randomly picked for sequencing. We also used gel-format differential display, with modifications, to display the amplified ARE cDNAs (details below). Sequence data confirmed the presence of AREs of different lengths that depended on the initial annealing temperature. Most of these partial cDNA fragments are novel (57%) or uncharacterized (23 of the 30 cDNAs), as determined by BLAST searches against the GenBank and EST databases (Table 3). This also indicates that the method enriches transient, rare ARE-mRNAs. Several of the sequences corresponded to known cDNAs (Table 3) having a 15-bp ARE pattern consisting of one to four overlapping nonamers; among them were the typical ARE-mRNAs interleukin-1β (IL-1β), c-fos, and plasminogen activator inhibitor protein (PIA2). In addition, there were several matches to the human dbEST database.
Although the mini-library from the ARE-cDNA PCR was constructed to validate ARE selectivity, we also assessed the insert size distribution in a sample of 10 randomly picked clones. Although insert size can be influenced by library construction methods and size exclusion procedures, the mini-library had an average insert size of 0.9 kb, with a range of 800-1100 bp among nine clones; one clone had an insert of 400 bp. This average insert size and range are comparable or superior to those reported by others for PCR-based mini-libraries (Bertiol et al. 1994; Peterson et al. 1998).
Primer-Target Annealing Characteristics and Size Distribution of Amplified ARE-cDNA Products
The annealing specificity of our primers at different annealing temperatures (35°C, 40°C, and 45°C) was assessed using the optimized ARE-cDNA PCR conditions, including anti-Taq start conditions, which minimized primer-target mispriming when compared with regular or hot start conditions (data not shown). We analyzed 26 target sites for the 3' and 5' primers (from 13 ARE-PCR products) that were retrieved by BLAST search and found to be portions of seven known genes, six EST records, and seven hits in the human genome project database with no characterized cDNA/EST, among the 30 sequences in Table 3. The annealed stretch of the ARE 3' primer was longer than that of the 5' primer, with approximately two to three A/T bases for each G/C base in the 5' primer. At least the eight 3'-terminal bases of both the 5' and the ARE 3' primers fully matched the target region, whereas mismatches occurred only toward the 5' end (Fig. 3a). Mispriming was also reduced at higher annealing temperatures (Fig. 3a). Figure 3a also confirms the findings of the specific PCRs performed with IL-8, TNF-α, and c-fos: the higher the annealing temperature, the higher the proportion of amplified cDNAs with longer ARE stretches, whereas cDNAs with shorter stretches were not efficiently amplified. We therefore performed most subsequent ARE-cDNA PCRs at 40°C to 42.5°C, which allowed amplification of a 13-bp ARE pattern with at least two overlapping ATTTA repeats.
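The two primer-target observations above, full matching over at least the eight 3'-terminal bases and roughly two to three A/T bases per G/C base, can be illustrated with two small helpers. `three_prime_identity` counts consecutive matches from the primer's 3' end against a site written in the primer's orientation. `wallace_tm` applies the Wallace rule (about 2°C per A/T pair and 4°C per G/C pair), a rough approximation for short oligonucleotides that is swapped in here for illustration and is not stated to be what the authors used; nearest-neighbor models are more accurate. Under this rule two A/T pairs contribute as much as one G/C pair, which is consistent with the ARE primer needing a longer annealed stretch. All sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def three_prime_identity(primer, site):
    """Consecutive matching bases counted from the primer's 3' end.

    `site` is the target region written in the same orientation as the primer,
    i.e. the sequence the primer would equal if perfectly matched.
    """
    n = 0
    for p, s in zip(reversed(primer), reversed(site)):
        if p != s:
            break
        n += 1
    return n


def wallace_tm(duplex):
    """Wallace rule: about 2 degC per A/T pair and 4 degC per G/C pair."""
    at = duplex.count("A") + duplex.count("T")
    gc = duplex.count("G") + duplex.count("C")
    return 2 * at + 4 * gc


# Hypothetical 5' primer vs. a site with mismatches only near the primer's 5' end.
print(three_prime_identity("ACGTACCACCATGG", "TTTTACCACCATGG"))

# An ARE-type primer is the reverse complement of the sense-strand ARE stretch.
are_primer = reverse_complement("TTATTTATTTA")  # synthetic ARE stretch
print(are_primer, wallace_tm(are_primer))
print(wallace_tm("AT") == wallace_tm("G"))      # two A/T pairs ~ one G/C pair
```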
In addition to the efficient amplification, described above, of ARE products containing the full-length coding regions of the IL-8, TNF-α, and c-fos cDNAs (1.0, 1.2, and 1.8 kb, respectively), the size distribution of the ARE-PCR products was assessed. Ethidium bromide-stained agarose gels showed a smear of ARE-cDNA products of up to 3.0 kb when the extension time was increased to 3 min (Fig. 3b). The agarose gel smear was further evaluated using radioactively labeled ARE-PCR products (Fig. 3c). The amplified ARE-cDNA products ranged from 200 bp to more than 1.65 kb with the standard ARE-cDNA PCR (Fig. 3c, lane 1). The proofreading polymerase pfu, an enzyme suitable for generating long PCR products, was also added in small amounts (0.1 U) to the ARE-cDNA PCR to improve the size distribution, and the extension time was increased to 3.5 min. As a result, the size distribution favored longer ARE-PCR products, ranging from 0.5 kb to more than 5 kb (Fig. 3c, lane 2). Column purification, applied before gel electrophoresis to remove excess dNTPs, primers, and short PCR products, resulted in size distributions from ~300 bp to 1.5 kb and from 0.8 kb to >5 kb in the absence or presence of pfu, respectively (Fig. 3c, lanes 3 and 4). Note that the agarose gel smear is useful for assessing size distribution but not for visualizing discrete PCR products, as is possible with polyacrylamide gel electrophoresis (PAGE), which is limited to shorter DNA sizes; PAGE differential display was used in the next experiment.
Specificity of ARE-PCR Subsets and Biological Utility
The specificity of the 5' primers for distinct subsets of the 16 amplified ARE pools was verified using two 5' primers that amplify distinct subsets of the ARE-cDNAs. This was shown by amplification of c-fos (Fig. 4a) with the 5' primer Ga but not with the 5' primer Cg (Table 1). The optimum initial annealing temperature for c-fos, which contains two overlapping nonamers, was 40°C (Fig. 4a). The specificity of the 5' primers was also verified by specific amplification of TNF-α with primer Ca but not with primer Ag (data not shown).
Long-range differential display (Fig. 4b) also showed that the 5' primers Ca and Gt (Table 1) produced distinct patterns of amplified ARE-cDNAs. In this experiment, we used RNA from untreated, LPS + CHX-treated, and phorbol myristate acetate (PMA)-treated cells. Many bands were up-regulated in response to LPS and CHX. Several bands of known sequence identity, namely IL-1β, c-fos, plasminogen activator inhibitor protein (PIA), and small inducible cytokine A4 (SCYA4), were overexpressed as a result of LPS + CHX treatment; each contains substantial sequence information. In particular, the full-length CDS of SCYA4 (a 278-bp CDS that is part of a 450-bp band also containing a portion of the 3' UTR) was displayed on the gel, indicating the feasibility of targeting full-length coding regions of small mRNA molecules in long-range differential display gels. Although we have not confirmed the differential expression of these ARE genes, data in the literature using the same cell line and inducers as in our study (Collart et al. 1987; Schwartz and Bradshaw 1992; Kastelic et al. 1996) support their differential expression.
DISCUSSION
The ARE-gene family comprises a large component of the human transcriptome, 8% of the total mRNA population (Bakheet et al. 2001), and encodes proteins that are involved in transient biological processes and are important in several disease states. In this study, we devised an integrated computational and laboratory approach not only to selectively amplify ARE-cDNAs but also to target a large proportion of the protein-coding region of the ARE-cDNA family. The ARE-cDNA PCR was used in several pilot applications to show its utility, namely, mini-library construction, EST generation, and differential display. The computational approach to targeting large, including full-length, protein-coding regions of the ARE-cDNA family relies on designing one universal primer that targets ARE regions (the ARE 3' primer) and several 5' primers that target consensus sequences largely unique to regions near the 5' UTR, including the initiation sites. The ARE primer is based on the computationally derived 13-bp pattern that is specific to the 3' UTR and not to coding regions (Bakheet et al. 2001) or the 5' UTR (our unpublished observations), thus allowing priming downstream of the coding region. Note that we used a simple computational method to derive the ARE consensuses (Bakheet et al. 2001), which may not capture the subtleties of other computational models, such as matrix, profile, or Markov models. Our consensuses in the initiation regions contained the conserved pattern CACCATGG in 30% of total ARE mRNAs, similar to the Kozak sequence CRCCATG (Kozak 1987a,b). It is also similar to the pattern derived from a larger list in the TransTerm database, CAMCATGGC (Dalphin et al. 1999). The overall consensus initiation site in the ARE-mRNAs was SSMAMSATGRM at a 50% certainty level at each position; in comparison, the initiation consensus of nonclustered random human sequences was SSSRMSATGRM (Dalphin et al. 1999). We noted that the initiation context consensus also occurs inside the coding regions. The internal sites occur predominantly toward the initiation codon and the 5' UTR. The presence in the 5' UTR of more than one ATG codon that includes the Kozak sequence has been noted recently (Suzuki et al. 2000). The presence of internal initiation consensuses may indicate possible alternative translation sites leading to different protein isoforms. Alternative translation initiation sites have been experimentally determined in many mRNA species (van der Velden and Thomas 1999).
Previous arbitrary PCR/RNA fingerprinting protocols that target gene families, including zinc fingers, 3'-ARE-containing UTR regions, and MADS-box coding sequences (Asson-Batres et al. 1994; Fischer et al. 1995; Johnson et al. 1996; Gonsky et al. 1997; Dominguez et al. 1998), had limitations that our ARE-cDNA procedure avoids. For example, in studies that targeted G-protein coupled receptors (Lopez-Nieto and Nigam 1996; Consalez et al. 1999), a statistically designed primer set targeting the protein-coding regions of mammalian G-protein coupled receptors needed 496 primer pairs to span 77.7%. By contrast, the computational approach detects more than 80% of the two subsets of ARE-mRNAs, those containing two and three ARE pentamers, with a defined set of only 16 primers. Procedures by others targeted the short regions between the 3' ends and the AREs (Asson-Batres et al. 1994; Gonsky et al. 1997; Dominguez et al. 1998), using 3' primers to the polyA tail and 5' primers to ARE regions; the resulting products are largely AT-rich regions that are characteristically homologous and redundant and have a restricted size distribution. Moreover, none of these approaches addresses the computational targeting of large or full-length protein-coding regions described in this paper.
Although most of the ARE-cDNA PCR reactions in our approach yielded a mixture of two size populations of coding regions (full and truncated), the amplified ARE products primed by the 5' primer at internal sites still contained a significant proportion of the coding region. The large fragments can be used for constructing ARE-cDNA libraries, for microarray expression approaches, and for isolating full-length cDNAs using 5' RACE or human genome project data. The substantial coding information generated by our ARE-cDNA PCR protocol is an improvement over other approaches, such as restriction fragment length polymorphism (RFLP) PCR (Fischer et al. 1995), SAGE (Velculescu et al. 1995), and restriction enzyme-digested cDNA amplification (READS) (Prashar and Weissman 1996), which yield products with much less sequence information. The informative protein-coding regions can be compared with known genes in the databases for homology. Amplified fragments shorter than 1.5 kb can be used with long-range differential display techniques, whereas conventional differential display is limited to ~600 bp and is primarily suited to smaller ARE products (<600 bases); longer products can therefore also be shortened with restriction enzymes and analyzed with a protocol such as RFLP differential display (Fischer et al. 1995; Kato 1995; Ivashuta et al. 1999). Unlike specific domain-targeting PCR approaches that focus on fewer genes, such as those used in conjunction with differential display (Fischer et al. 1995; Johnson et al. 1996), this method targets a broader family of genes because of the presence of AREs, yet this family is not as complex as the overall expressed genome repertoire.
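Shortening long ARE-cDNA products so that they fall within the ~600-bp window of conventional differential display can be sketched as an in-silico digestion. The sketch assumes a single recognition site scanned on one strand, which is adequate for palindromic sites such as GATC (cut before the site, as for MboI); the enzyme choice and the synthetic product are illustrative and are not taken from the cited RFLP differential display protocols.

```python
import re


def digest(seq, site="GATC", cut_offset=0):
    """Cut `seq` at every occurrence of `site`; cut_offset is the cut position
    within the site (0 = before the site, e.g. ^GATC as for MboI)."""
    cuts = sorted({m.start() + cut_offset for m in re.finditer(f"(?={site})", seq)})
    bounds = [0] + [c for c in cuts if 0 < c < len(seq)] + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def displayable(fragments, max_len=600):
    """Fragments short enough for conventional differential display (~600 bp)."""
    return [f for f in fragments if len(f) <= max_len]


# Synthetic 1,004-bp product with a single GATC site at position 700.
product_seq = "A" * 700 + "GATC" + "C" * 300
fragments = digest(product_seq)
print([len(f) for f in fragments], [len(f) for f in displayable(fragments)])
```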
The ARE-cDNA approach was largely successful in the selective amplification of ARE-cDNA, as verified by (1) selective amplification of IL-8, TNF-α, and c-fos cDNAs and (2) the presence of AREs, revealed by sequencing, in the amplified ARE-cDNA fragments that were either excised from differential display gels or randomly picked from pUC19 mini-libraries. Thermoprofiling control of the initial annealing temperature allows extra flexibility in targeting subsets of ARE-cDNAs on the basis of the number of AREs. None of the cDNA fragments lacked AREs, indicating the specificity of the ARE-cDNA PCR. This is probably because the mispriming frequently encountered with arbitrary PCR and mRNA fingerprinting was substantially minimized by the use of anti-Taq and annealing temperatures at or above 40°C. Anti-Taq has been used successfully to enhance the specificity of primer annealing in PCR (Morrison et al. 1998). A limitation commonly inherent in PCR protocols is that longer cDNAs are amplified less efficiently than shorter cDNAs. Our approach showed that the IL-8, TNF-α, and c-fos cDNAs, whose predicted ARE-cDNA PCR products (1.0, 1.2, and 1.8 kb, respectively) include the full-length coding regions, were efficiently and selectively amplified, as verified with specific primers whose 5' primer was designed at or near the initiation sites. In addition, electrophoresis of amplified ARE-cDNA products showed a size distribution of up to 3.0 kb when the PCR extension step was increased to 3 min; longer PCR products (>4 kb) were also generated when pfu polymerase was used.
An additional advantage of the ARE-cDNA PCR is its ability to detect rare genes. About half of the sequences obtained from the pUC19/pCR2.1 mini-libraries and from bands excised from differential display gels did not match any mRNA/cDNA or EST database entries. This indicates that the ARE-cDNA PCR targets rare messages that are otherwise masked by overexpressed genes in many techniques or are not normally represented in conventional cDNA libraries. The previously known cDNAs include IL-1β, c-fos, plasminogen activator inhibitor protein (PIA2), small inducible cytokine A4 (SCYA4), hypoxia-induced factor alpha, amyloid A4 protein, and diacylglycerol delta kinase (all have typical ARE stretches), of which IL-1β, c-fos, and PIA2 have previously been characterized as ARE-mRNAs (Chen et al. 1994; Kastelic et al. 1996; Maurer et al. 1999).
Differential display results indicated strong up-regulation of several typical ARE mRNAs, including IL-1β, following treatment with the potent cytokine inducer LPS and the protein synthesis inhibitor CHX. IL-1β is a typical ARE mRNA with three ARE clusters that is known to be up-regulated by LPS (Kastelic et al. 1996). The presence of the ARE-cDNA bands despite treatment with the protein synthesis inhibitor CHX reflects the expected biology of ARE-mRNAs, many of which encode transient and early response proteins whose induction is independent of the synthesis of other gene products. The full-length CDS of SCYA4 (a 278-bp CDS contained in a 450-bp band that also includes a portion of the 3' UTR) was displayed on the gel, indicating the feasibility of targeting full-length coding regions by the computational/laboratory strategy. Although we have not confirmed the differential expression of these ARE genes, data in the literature support the differential expression of IL-1β, c-fos, and PIA2 in monocytic cells (Collart et al. 1987; Schwartz and Bradshaw 1992; Kastelic et al. 1996). PMA, which is known to differentiate THP-1 cells into monocytes (Tsuchiya et al. 1982), also induced differential expression compared with control cells, although fewer bands were observed than with LPS and CHX treatment. PIA2 was also expressed in response to PMA, in accord with PMA induction of PIA2 in monocytic cell lines (Gyetko et al. 1988).
The ARE-cDNA PCR method can be used as both a discovery and an expression tool. As a discovery tool for ARE genes, ARE-cDNA PCR has several advantages over gene prediction from the human genome: (1) the amplified ARE products reflect modulated transcripts and tissue specificity, whereas predicted genes do not; (2) the sequences are likely more accurate than predicted genes; and (3) no downstream laboratory verification of exon accuracy and expression is needed, as it is with predicted genes. Unlike expression-profiling approaches that depend on existing clones of known sequences, such as cDNA microarrays, ARE-cDNA PCR coupled with cDNA microarrays yields information on both known and novel genes. In contrast with techniques for discovering modulated transcripts, such as subtractive hybridization and differential screening of cDNA libraries, which cannot be used for mRNA profiling, ARE-cDNA PCR can be used for mRNA profiling when coupled with differential display, RFLP differential display, and microarrays. Many techniques may be biased toward a certain threshold of mRNA abundance and require relatively large RNA inputs (Duggan et al. 1999). In the ARE-cDNA PCR, as little as 40 ng of total RNA can be used for a single PCR reaction, which makes it useful for tissues in which the amount of RNA is a major concern. SAGE (Velculescu et al. 1995) is a combined discovery and expression profiling tool, but it is limited by the difficulty of identifying genes exactly from short sequence tags and by the lack of physical clones. The described method circumvents many of these limitations, in addition to the multiple other potential uses of the ARE-cDNA PCR method.
In brief, the computationally facilitated PCR approach described here, with its potential applications in gene discovery and expression analysis, reduces the limitations of other technologies and offers novel and significant improvements. The ARE-cDNA PCR is broadly applicable because the ARE-mRNA repertoire is diverse; it spans many biological processes, including but not limited to cellular growth, differentiation, immune response, inflammation, and cardiovascular toning. Thus, the method should constitute a valuable tool for discovering novel genes and for pinpointing candidate genes that are dysregulated, for example, in cancer but not in normal states. In addition, as more sequences are discovered, the method will ultimately help to elucidate the diversity, complexity, and potential involvement of expressed ARE genes in human disease.
METHODS
Sequence Retrieval and Analysis
Sequence retrieval and analysis were performed using the GCG
Wisconsin Package (Genetics Computer Group [GCG]/Oxford Molecular Co.) and source code written in PERL (Practical Extraction and Report Language). A human minimally redundant AU-rich element (ARE)-containing mRNA/cDNA database (ARED), containing 895 sequences, was
previously constructed from GenBank Release 113 (National Center for
Biotechnology Information, NCBI) using GCG and custom PERL code
(Bakheet et al. 2001).
A 12-bp region comprising the 7 bp upstream of the start of the
coding sequence (CDS) in the 5' UTR (-7 bp), the ATG start codon, and
the 2 bp after the start codon (+2 bp) was extracted from each ARED
sequence using the Assemble program (GCG). The truncated 5' UTR list
was divided into 16 subsets according to the pattern NATGN, using the
FindPattern program (GCG). The Pileup program (GCG), which uses a
progressive pairwise alignment algorithm, was used to align the
truncated 5' UTR sequences in each of the 16 subsets; its output was
written to a multiple sequence format (MSF) file.
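The windowing and NATGN binning steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' GCG/PERL pipeline: each record is assumed to be a plain (name, sequence, CDS start) tuple, where the CDS start is the 0-based index of the A of the ATG, and the helper names `initiation_window`, `natgn_key`, and `bin_by_natgn` are hypothetical.

```python
from collections import defaultdict
from typing import Optional

UPSTREAM = 7    # bases before the ATG (-7 .. -1)
DOWNSTREAM = 2  # bases after the ATG (+1, +2)

def initiation_window(seq: str, cds_start: int) -> Optional[str]:
    """Return the 12-bp window (-7 bp + ATG + 2 bp), or None if it does not fit."""
    seq = seq.upper().replace("U", "T")  # accept mRNA or cDNA letters
    begin = cds_start - UPSTREAM
    end = cds_start + 3 + DOWNSTREAM
    if begin < 0 or end > len(seq) or seq[cds_start:cds_start + 3] != "ATG":
        return None
    return seq[begin:end]

def natgn_key(window: str) -> str:
    """Subset label: base at -1, ATG, base at +1 (one of 4 x 4 = 16 labels)."""
    return window[UPSTREAM - 1] + "ATG" + window[UPSTREAM + 3]

def bin_by_natgn(records):
    """Group (name, window) pairs into NATGN subsets; skip records without a full window."""
    subsets = defaultdict(list)
    for name, seq, cds_start in records:
        window = initiation_window(seq, cds_start)
        if window is not None:
            subsets[natgn_key(window)].append((name, window))
    return subsets
```

Because every window has the same length with the ATG at a fixed offset, the windows within a subset are already in register; the original workflow nonetheless aligned them with Pileup.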
The MSF file was edited for use as input to the
Consensus program (GCG), which calculated nucleotide
frequencies at the positions flanking ATGN or NATGN; the resulting consensus patterns
were used to design the 5' primers. All 16 consensuses were subsequently used as patterns in the FindPattern program (GCG) to search for hits in the
overall ARE mRNA/cDNA database, which comprised the 16 subsets. ARE
motif positions were deduced from the FindPattern output
(GCG); fragment lengths (from the beginning of the CDS to the AREs) were calculated and statistically analyzed in Excel (Microsoft).
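The consensus and fragment-length calculations might look like the sketch below. It is a simplified stand-in for the GCG Consensus/FindPattern steps, not a reproduction of them: it stacks the fixed-length windows position by position, takes the most frequent base at each position, and, as a labeled assumption, uses the canonical AUUUA pentamer (written as DNA, ATTTA) as the ARE motif, since the exact FindPattern patterns are not given here. All function names are hypothetical.

```python
import re
import statistics
from collections import Counter

# Assumption: the AUUUA pentamer (as DNA) stands in for the ARE patterns
# actually searched with FindPattern.
ARE_MOTIF = re.compile("ATTTA")

def position_frequencies(windows):
    """Fraction of A, C, G, T at each position of equal-length, in-register windows."""
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    table = []
    for i in range(length):
        counts = Counter(w[i] for w in windows)
        table.append({base: counts[base] / len(windows) for base in "ACGT"})
    return table

def majority_consensus(windows):
    """Most frequent base per position; ties go to the alphabetically first base."""
    return "".join(max("ACGT", key=freqs.get) for freqs in position_frequencies(windows))

def fragment_length(seq, cds_start, motif=ARE_MOTIF):
    """Bases from the CDS start to the end of the first downstream ARE match, or None."""
    match = motif.search(seq.upper().replace("U", "T"), cds_start)
    return None if match is None else match.end() - cds_start

def summarize(lengths):
    """Count, mean, and median of the fragment lengths that were found."""
    found = [n for n in lengths if n is not None]
    return {"n": len(found), "mean": statistics.mean(found),
            "median": statistics.median(found)}
```

Measuring to the end of the motif reflects the assumption that the 3' ARE primer anneals across the ARE itself; measuring to the motif start would shorten each length by the motif width.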
Primer Design
Primers for PCR were synthesized by the genomics facility at King Faisal Specialist Hospital and Research Center or by GIBCO-BRL. The 3' ARE primer was designed to target a 15-mer ARE sequence; its sequence is 5'-GGCGGATCCGGGCTAAATAWATAAATWA-3', and it contains a BamHI site (GGATCC). Sixteen 5' primers were designed to incorporate the 10-mer initiation context consensuses generated by the analysis above. In addition to the desired 5'-end sequences, all primers were elongated with a GC-rich sequence incorporating restriction sites to facilitate cloning and in vitro transcription. The upstream primers share a common 5'-end sequence followed by the variable 10-mer consensus pattern: 5'-ACGACTCACTATAGGAACAGA + 10-mer-3'. Oligo 6.0 (Molecular Biology Insights, Inc.) was used to verify the physicochemical properties of the oligonucleotides for suitability in PCR.
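The primer layout above can be checked programmatically. The sketch below, a minimal illustration rather than the authors' procedure, expands the degenerate W positions of the ARE primer (W = A or T), confirms the BamHI site, reverse-complements the 15-mer target-binding part, and assembles an upstream primer from the common tail. The helper names and the example 10-mer `CCACCATGGC` are hypothetical; the actual 16 consensuses are not listed in this section.

```python
from itertools import product

# A few IUPAC nucleotide codes (W = A or T is the one the ARE primer uses).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT", "S": "CG",
         "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "N": "ACGT"}
COMPLEMENT = str.maketrans("ACGTWSRYKMN", "TGCAWSYRMKN")

ARE_PRIMER = "GGCGGATCCGGGCTAAATAWATAAATWA"  # 3' ARE primer, from the text
BAMHI_SITE = "GGATCC"
UPSTREAM_TAIL = "ACGACTCACTATAGGAACAGA"      # common 5' tail of the upstream primers

def expand(degenerate: str) -> list[str]:
    """All concrete sequences encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in degenerate))]

def reverse_complement(seq: str) -> str:
    """Reverse complement, keeping IUPAC ambiguity codes (W stays W, R <-> Y, ...)."""
    return seq.translate(COMPLEMENT)[::-1]

def upstream_primer(consensus_10mer: str) -> str:
    """Hypothetical helper: common tail + 10-mer initiation consensus (5' -> 3')."""
    if len(consensus_10mer) != 10:
        raise ValueError("expected a 10-mer initiation consensus")
    return UPSTREAM_TAIL + consensus_10mer

if __name__ == "__main__":
    print(len(expand(ARE_PRIMER)), "concrete ARE primer variants")
    print("ARE-binding part, as sense strand:", reverse_complement(ARE_PRIMER[13:]))
    print("Example upstream primer:", upstream_primer("CCACCATGGC"))
```

Read as mRNA, the reverse complement of the 15-mer (TWATTTATWTATTTA, i.e., UWAUUUAUWUAUUUA) contains two AUUUA pentamers, which is consistent with the primer targeting ARE sequences in the 3' UTR.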
Cells and Reagents
The monocytic cell line THP-1 was obtained from the American Type Culture Collection (ATCC) and grown in RPMI 1640 supplemented with 10% FBS (low-endotoxin FBS; HyClone, UT) and antibiotics (Sigma Chemical Corp.). The cells (1 × 10⁶ cells/mL in 25-cm² flasks) were treated with 10 µg/mL CHX and 10 µg/mL lipopolysaccharide (LPS; Sigma).
RNA and cDNA Synthesis
Total RNA was extracted from cells by the guanidine isothiocyanate method using Tri Reagent (Molecular Research Center). In many of the experiments, RNA was treated with DNase I (10 U per reaction; Promega), followed by chloroform extraction, precipitation, and resuspension in diethyl pyrocarbonate (DEPC)-treated water.
The RT reaction was performed using 1 µg of total RNA, a 2 µM cocktail of the anchored primers oligo(dT)₁₂A, oligo(dT)₁₂C, and oligo(dT)₁₂G, 50 µM each dNTP (Perkin Elmer), 40 U RNasin (Promega), and 200 U of MMLV RT or SuperScript II (GIBCO BRL). The samples were then heated to inactivate the RT.
ARE-cDNA PCR
The cDNA (40 ng) was amplified in a 20-µL reaction using 1 U of Taq DNA polymerase (AmpliTaq, Perkin Elmer) that had been treated with anti-Taq antibody (Clontech). In certain experiments, direct hot-start PCR was used instead. The cDNA was amplified with the single universal 3' primer (the ARE primer) and each of the 5' primers specific to the initiation context consensus sequences described above. PCR was performed with primers at 1 µM final concentration, 10 µM each dNTP, 1.5 mM MgCl₂, and 1X PCR buffer (Perkin Elmer). Thermal cycling on a GeneAmp PCR System 9600 (Perkin Elmer) began at 95°C for 2 min to inactivate the anti-Taq antibody. This was followed by four cycles of denaturation at 94°C for 1 min, annealing at a variable "controlled stringency" temperature for 2 min, and extension at 72°C for 2-3 min. The controlled-stringency annealing temperature was set between 35°C and 45°C, depending on the ARE motif repeats targeted in the ARE cDNA. These four initial cycles were followed by 30 high-stringency cycles of 94°C for 1 min, a fixed annealing temperature of 60°C for 2 min, and extension at 72°C for 2-3 min.
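The two-phase cycling program above can be written down as data, which makes the constraints (a 35-45°C controlled-stringency anneal, a 2-3 min extension) explicit and checkable before a run. This is a minimal sketch, not a thermocycler file format; the `Step`/`Stage` classes, the function names, and the 40°C and 120 s defaults are assumptions chosen for illustration, and ramp times are ignored.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    temp_c: float
    seconds: int

@dataclass(frozen=True)
class Stage:
    cycles: int
    steps: tuple

def are_cdna_program(low_anneal_c: float = 40.0, extension_s: int = 120) -> list:
    """First-round ARE-cDNA PCR: hold, 4 controlled-stringency cycles, 30 high-stringency cycles."""
    if not 35.0 <= low_anneal_c <= 45.0:
        raise ValueError("controlled-stringency annealing must be between 35 and 45 °C")
    if not 120 <= extension_s <= 180:
        raise ValueError("extension must be 2-3 min")
    return [
        Stage(1, (Step(95, 120),)),  # inactivate the anti-Taq antibody
        Stage(4, (Step(94, 60), Step(low_anneal_c, 120), Step(72, extension_s))),
        Stage(30, (Step(94, 60), Step(60, 120), Step(72, extension_s))),
    ]

def hold_minutes(program) -> float:
    """Total time spent at set temperatures, in minutes (ramping excluded)."""
    return sum(s.cycles * sum(step.seconds for step in s.steps) for s in program) / 60

if __name__ == "__main__":
    program = are_cdna_program(low_anneal_c=38.0)
    print(sum(stage.cycles for stage in program[1:]), "cycles,",
          hold_minutes(program), "min at temperature")
```

Keeping the low-stringency anneal as a parameter mirrors the text, where it is tuned to the number of ARE motif repeats being targeted.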
In some experiments, a two-step PCR protocol was used to increase the specificity and yield of ARE-cDNA PCR products for subsequent applications. Briefly, 1/100 of the first PCR reaction was subjected to a second round of high-stringency PCR: 95°C for 1 min, a fixed annealing temperature of 60°C for 1 min, and extension at 72°C for 2-3 min. The same primers were used at 1 µM final concentration, with 50 µM dNTPs, 1 U Taq, and 1X PCR buffer.
Size Distribution of the Amplified ARE-cDNA Products
Two approaches were used to assess the size distribution of
the amplified ARE-cDNA products. In the first, the two-step PCR
protocol (above) was performed and the products were visualized by
ethidium bromide staining in agarose gels. In the second, ARE-cDNA PCR
was performed in the presence of 0.1 µL (α-³²P)dCTP (3000 Ci/mmole)
per 20-µL reaction. In some of these experiments, Pfu polymerase
(Stratagene, Inc.; 0.1 U per reaction) was used in a mixture with Taq
(1 U). Southern blotting was performed using Zeta Probe nitrocellulose
membranes. The blots were exposed to x-ray film for 30 min, and the
autoradiograms were then developed.
Second Specific PCR
PCRs specific for the ARE cDNAs IL-8, TNF-α, and c-fos and for
non-ARE cDNAs were performed. β-Actin and IL-8 receptor PCRs were
performed under relatively stringent conditions to eliminate or
minimize amplification of cDNA carried over from the original
ARE-cDNA PCR tubes, so that enhanced signals for specific targets could
be attributed to the ARE-cDNA PCR. Specifically, 2 µL of the ARE-cDNA
PCR reaction was amplified with 0.4 µM primers, 50 µM each dNTP,
1X PCR buffer, and 1 U of Taq DNA polymerase per reaction.
The PCR products for IL-8, TNF-α, c-fos, and β-actin are 289 bp,
548 bp, 624 bp, and 642 bp, respectively. To monitor genomic DNA
contamination, the primers were designed to span intronic sequences, so
that genomic PCR products of 1216, 1450, 2130, and 1320 bp,
respectively, would indicate contamination.
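Because each primer pair spans an intron, a band can be labeled as cDNA-derived or genomic from its size alone. The lookup below is a minimal sketch using the sizes given in the text; the function name and the 5% fractional tolerance are hypothetical choices, and real size calls also depend on the gel and ladder used.

```python
# Expected sizes (bp) from the text: (cDNA product, intron-spanning genomic product).
EXPECTED_BP = {
    "IL-8": (289, 1216),
    "TNF-alpha": (548, 1450),
    "c-fos": (624, 2130),
    "beta-actin": (642, 1320),
}

def classify_band(target: str, observed_bp: float, tolerance: float = 0.05) -> str:
    """Label a band as 'cDNA', 'genomic', or 'unexpected' for the given target.

    tolerance is a hypothetical fractional window around each expected size.
    """
    cdna_bp, genomic_bp = EXPECTED_BP[target]
    if abs(observed_bp - cdna_bp) <= tolerance * cdna_bp:
        return "cDNA"
    if abs(observed_bp - genomic_bp) <= tolerance * genomic_bp:
        return "genomic"
    return "unexpected"

if __name__ == "__main__":
    print(classify_band("c-fos", 2100))  # a band near 2130 bp suggests genomic DNA
```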
The sequences of the primer pairs used were as follows: IL-8 sense: ATG
ACT TCC AAG CTG GCC GTG GCT; IL-8 antisense: T CTC AGC CCT
CTT CAA AAA CTT CTC; TNF-α sense: CTT CTG CCT GCT GCA CTT TGG A;
TNF-α antisense: TCC CAA AGT AGA CCT GCC CAG A; c-fos sense: GGG GAT
AGC CTC TCT TAC TAC CAC; c-fos antisense: GCT GCA TAG AAG GAC CCA GAT
AG; β-actin sense: ATC TGG CAC CAC ACC TTC TAC AAT GAG CTG CG; and
β-actin antisense: CGT CAT ACT CCT GCT TGC TGA TCC ACA. The 5'
primers were designed at or near the initiation sites so that
amplification favors the larger, rather than the shorter, amplified
ARE products. The cycling profile was 25 cycles of 94°C for 1 min,
60°C for 1 min, and 72°C for 1 min, in a total reaction volume of
20 µL. The PCR products were resolved on 2% agarose gels and
visualized with ethidium bromide. A 100-bp size ladder (GIBCO) was
used to verify the sizes of the PCR products.
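For a quick sanity check of the listed primers (length, GC content, approximate melting temperature), a short script suffices. This is a rough stand-in, not what Oligo 6.0 computes: it uses the simple GC-content approximation Tm = 64.9 + 41 × (GC − 16.4) / N, which ignores salt, primer concentration, and nearest-neighbor effects, so the values are only for coarse comparison between primers. The helper names are hypothetical.

```python
# Primer sequences from the text (triplet spacing is removed before analysis).
PRIMERS = {
    "IL-8 sense": "ATG ACT TCC AAG CTG GCC GTG GCT",
    "IL-8 antisense": "T CTC AGC CCT CTT CAA AAA CTT CTC",
    "TNF-alpha sense": "CTT CTG CCT GCT GCA CTT TGG A",
    "TNF-alpha antisense": "TCC CAA AGT AGA CCT GCC CAG A",
    "c-fos sense": "GGG GAT AGC CTC TCT TAC TAC CAC",
    "c-fos antisense": "GCT GCA TAG AAG GAC CCA GAT AG",
    "beta-actin sense": "ATC TGG CAC CAC ACC TTC TAC AAT GAG CTG CG",
    "beta-actin antisense": "CGT CAT ACT CCT GCT TGC TGA TCC ACA",
}

def clean(seq: str) -> str:
    """Strip spacing and normalize case."""
    return seq.replace(" ", "").upper()

def gc_count(seq: str) -> int:
    """Number of G and C bases."""
    return sum(base in "GC" for base in clean(seq))

def basic_tm(seq: str) -> float:
    """Rough Tm (°C) from GC content only: 64.9 + 41 * (GC - 16.4) / N."""
    s = clean(seq)
    return 64.9 + 41 * (gc_count(s) - 16.4) / len(s)

if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        s = clean(seq)
        print(f"{name:22s} {len(s):3d} nt  GC {100 * gc_count(s) / len(s):5.1f}%"
              f"  Tm ~{basic_tm(s):5.1f} °C")
```

With a 60°C annealing step in the protocol, primers whose rough estimate falls well below that value would be the first to re-examine with a proper thermodynamic tool.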
Construction of Mini-Libraries, Random Cloning, and Sequencing
ARE-cDNA PCR products were cloned into either the pCR2.1 vector (TA cloning kit; Invitrogen) or the pUC19 vector (T-7 blue blunt-end cloning system; Novagen) according to the manufacturer's instructions. Positive colonies were randomly picked and propagated in LB medium. Plasmids were extracted with the Qiagen Miniprep kit. Sequencing was initially performed with the Taq dye terminator Cycle Sequencing Ready Reaction Kit (Applied Biosystems, ABI) on an automated fluorescent DNA sequencer 373A (ABI) according to the manufacturer's instructions, using vector-specific primers flanking the PCR products, such as the M13 forward and reverse primers. PCR products (2 µL) were treated with 0.2 U exonuclease I (New England Biolabs) and 0.2 U shrimp alkaline phosphatase in a 10-µL final volume to degrade excess primers and dNTPs, respectively. These reactions were performed for 1 hr at 37°C and terminated by heat (85°C, 15 min). Templates (2.5 µL each for the M13R and M13F reactions) were mixed with the DYEnamic ET sequencing reagent premix (Pharmacia), along with 10 pM of the M13 sequencing primers. Cycling was performed according to the manufacturer's instructions. Sequencing reactions were precipitated, washed, air-dried, and resuspended in loading buffer. Samples were injected and run according to the manufacturer's instructions (Amersham-Pharmacia).
Differential Display, Long Range Differential Display, and Cloning of Differential Display Products
The cDNAs (40 ng/mL) were amplified as described above for ARE-cDNA PCR, but with the addition of 0.5 µM (³⁵S)dATP (1200 Ci/mmole). Aliquots (4 µL) of the PCR reactions were loaded on a 6% urea-denaturing polyacrylamide gel and electrophoresed for 2.5 hr in TBE buffer. Unless otherwise indicated, a long-range sequencing gel apparatus (Genomyx LR, Beckman) was used, with a 4.5% urea-denaturing polyacrylamide gel and 16 hr of electrophoresis to resolve long PCR fragments. Gels were fixed and dried and exposed to x-ray film for 48 hr. Bands were excised from the gel and rehydrated in 100 µL of 10 mM Tris-HCl (pH 8.0) and 1 mM EDTA for 15 min at 25°C, followed by elution at 95°C for 15 min. The eluted PCR products were precipitated with sodium acetate/ethanol, centrifuged, and resuspended in 10 µL water. A 5-µL aliquot was reamplified with the same 5' and 3' primer set originally used in the ARE-cDNA PCR, 100 µM each dNTP, and 30 cycles of 94°C for 45 sec, 60°C for 1 min, and 72°C for 2 min, followed by a final extension at 72°C for 7 min. Aliquots of the PCR products were visualized on 2% agarose or low-melting-point agarose gels to confirm reamplification and product size. The PCR products were cloned into pCR2.1 or pUC19 as described above.
ACKNOWLEDGMENTS
We thank Dr. Brian Meyer and his team at the Genomics Service Facility for their helpful assistance. We also thank Dr. Meyer for critical review of the manuscript.
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
FOOTNOTES
3 Corresponding author.
E-MAIL khabar{at}kfshrc.edu.sa; FAX 966-1-442-7858.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.204902. Article published online before print in May 2002.
REFERENCES
Received July 10, 2001; accepted in revised form March 28, 2002.