Published online before print
July 19, 2002, 10.1101/gr.473902
Vol. 12, Issue 8, 1231-1245, August 2002
LETTER
ABSTRACT
In a previous proteomic study of the human spliceosome, we identified 42 spliceosome-associated factors, including 19 novel ones. Using enhanced mass spectrometric tools and improved databases, we now report identification of 311 proteins that copurify with splicing complexes assembled on two separate pre-mRNAs. All known essential human splicing factors were found, and 96 novel proteins were identified, of which 55 contain domains directly linking them to functions in splicing/RNA processing. We also detected 20 proteins related to transcription, which indicates a direct connection between this process and splicing. This investigation provides the most detailed inventory of human spliceosome-associated factors to date, and the data indicate a number of interesting links coordinating splicing with other steps in the gene expression pathway.
INTRODUCTION
Biogenesis of proteins in eukaryotes is a multistep process that involves the concerted action of several complex machineries. Multiprotein complexes containing RNA polymerase II are involved in transcribing genes into pre-messenger RNA. Most human genes contain introns that are removed by splicing, a process orchestrated and catalyzed by the large multiprotein/RNA complex termed the spliceosome. Polyadenylation of the mRNA is also catalyzed by a complex processing machinery before mRNAs are exported to the cytosol, where translation by ribosomes takes place. Although much is known about the individual processes in protein biogenesis, how the separate steps are integrated is much less clear.
The spliceosome is composed of five small nuclear RNAs (snRNAs), namely U1, U2, U4, U5, and U6 snRNA, as well as many protein factors (Staley and Guthrie 1998). Some of these proteins are tightly associated with the snRNAs, forming small nuclear ribonucleoproteins (snRNPs) that are thought to assemble in a stepwise manner onto the pre-mRNA to form the spliceosome. Work over the last decade has elucidated the temporal sequence of recognition of the splice sites by the respective snRNPs and protein factors (Hastings and Krainer 2001). Interestingly, the view of stepwise assembly of the spliceosome has recently been challenged in favor of a more concerted mechanism involving preformed spliceosomes (Stevens et al. 2002). Besides the snRNP subunits, a large number of non-snRNP proteins are known, which perform various functions during the splicing reaction. For example, multiple members of the DEAD-box helicase family are thought to control RNA base-pairing interactions at different stages of spliceosome assembly and catalysis, whereas members of the SR motif family are believed to act as link factors promoting protein-protein interactions during spliceosome assembly. In all, ~100 different proteins have been linked to splicing through biochemical and/or genetic evidence (for review, see Will and Lührmann 1997). However, it remains unclear how complete this list might be.
As a systematic alternative to the traditional characterization of single splicing factors, the spliceosome can be purified and its components identified collectively using modern proteomic techniques. Initially, heterogeneous nuclear ribonucleoprotein (hnRNP) complexes assembled on mammalian pre-mRNA (Calvio et al. 1995) and subunits of the yeast spliceosome were purified and analyzed by mass spectrometric methods (Neubauer et al. 1997; Gottschalk et al. 1998). Subsequently, our groups performed the first large-scale analysis of a human multiprotein complex on in vitro-assembled spliceosomes (Neubauer et al. 1998) using 2D gel electrophoresis followed by nanoelectrospray (Wilm et al. 1996) mass spectrometric analysis. The relation of many of the newly discovered proteins to splicing was verified by fusing them to the green fluorescent protein, transiently expressing them in human epithelial (HeLa) cells, and showing that they colocalized in vivo with known splicing factors. Further biochemical studies have confirmed a role in splicing for all of the novel proteins analyzed so far in our laboratories, showing the specificity of the spliceosome purification method (Ajuh et al. 2000, 2001; Rappsilber et al. 2001; Lallena et al. 2002).
Other protein complexes have since been studied by similar methods, combining protein affinity purification with mass spectrometry and database searches (Wigge et al. 1998; Zachariae et al. 1998; Rout et al. 2000; Gavin et al. 2002; Ho et al. 2002). Recently, our groups have reported the identification and analysis of 271 proteins in the human nucleolus, the largest study of an organelle to date (Andersen et al. 2002a; Fox et al. 2002).
Mass spectrometric methods and human sequence databases have continued to improve dramatically in recent years, allowing both increased sensitivity and higher throughput. It is now possible to analyze mixtures of hundreds or even thousands of peptides by liquid chromatography coupled to tandem mass spectrometry (LC MS/MS; Griffin and Aebersold 2001; Washburn et al. 2001). Improved software and databases containing most human genes, known or putative, are also now available, allowing automated data processing of the large volume of acquired mass spectra.
Building on these advances, we decided to revisit the large-scale analysis of human spliceosome complexes using these enhanced, state-of-the-art techniques. In the present study, splicing complexes formed on two separate pre-mRNAs were purified, but the resulting proteins were not separated by gel electrophoresis; rather, they were analyzed by automated tandem mass spectrometry of crude peptide mixtures resulting from digestion of the entire protein mixture. Using differential mass range pulsing on a quadrupole time-of-flight instrument, a total of 311 proteins were identified. In addition to all the factors reported in our previous spliceosome study and all other essential human splicing factors known, we discovered a further 96 novel proteins about which little or no previous biological information existed. Many of these proteins have a domain structure implicating them directly in splicing/RNA processing. Surprisingly, a number of proteins involved in transcription and cellular regulatory mechanisms copurified with the spliceosome, indicating some form of coupling of these processes to splicing.
RESULTS
Proteomic Analysis of the Human Spliceosome
Preparation of the Spliceosome
A mixture of spliceosomal complexes was assembled on biotinylated, radioactively labeled RNA (see Methods). In contrast to our previous investigation, two standard splicing substrates, adenovirus (AD1) and β-globin (AL4) transcripts, were used in separate experiments. After incubation, which led to the formation of active spliceosomes and assembly intermediates, samples were subjected to gel filtration and affinity selection of the biotinylated pre-mRNA on streptavidin beads. To identify proteins binding to the beads directly, we performed gel filtration of the nuclear extract without labeled RNA and biotin affinity selection of the same fractions as above. The protein mixture was then applied to a short, one-dimensional sodium dodecyl sulfate (SDS)-polyacrylamide gel electrophoresis (PAGE) gel that allowed removal of SDS, washing, rebuffering, and efficient digestion according to protocols previously described (Shevchenko et al. 1996).
Mass Spectrometric Analysis of the Spliceosome
The peptides bound to the high-performance liquid chromatography (HPLC) column were eluted with an organic gradient into the ion source of a quadrupole time-of-flight (PE-Sciex) instrument capable of high resolution and mass accuracy. A test run with 10% of the material indicated that the material was sufficient for three LC MS/MS experiments. Therefore, for each of the substrates, we performed three separate, identical chromatographic runs, each with a third of the material. The pulsing ability of the QSTAR instrument (Chernushevich 2000
Data Analysis and Verification
More than 7000 ion peaks were fragmented. After acquisition, fragment mass lists were generated under script control (Analyst, PE-Sciex), combined across all six experiments, and submitted to automated database searches using the Mascot search engine (Perkins et al. 1999).
Relative Abundance of the Different Classes of Proteins
For the interpretation of the results in a large-scale study involving hundreds of proteins and very high sensitivity, it is important to obtain some measure of quantification. The mass spectrometric signal for any given peptide is determined by many factors, most importantly its ionizability in electrospray. Therefore, direct quantification of proteins in an LC MS/MS experiment is difficult. However, there is a general correlation between the number of peptides sequenced per protein and the amount of protein present in the mixture. Because larger proteins can give rise to more peptides, we defined a protein abundance index (PAI), which represents the number of peptides identified divided by the number of theoretically observable tryptic peptides. Figure 2 plots the index for the seven different protein classes, which are explained below.
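The index defined above can be sketched in a few lines. This is only an illustration under stated assumptions: the text does not say how "theoretically observable" peptides were counted, so the length bounds (`min_len`, `max_len`) and the simple cleavage rule (after K or R, not before P) are hypothetical choices, and the example sequence is invented.

```python
def tryptic_peptides(sequence):
    """In silico trypsin digest: cleave after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide without K/R
    return peptides


def observable_peptides(sequence, min_len=6, max_len=30):
    """Keep peptides within an assumed (hypothetical) observable length window."""
    return [p for p in tryptic_peptides(sequence) if min_len <= len(p) <= max_len]


def protein_abundance_index(n_identified, sequence, min_len=6, max_len=30):
    """PAI = identified peptides / theoretically observable tryptic peptides."""
    n_observable = len(observable_peptides(sequence, min_len, max_len))
    if n_observable == 0:
        raise ValueError("no observable tryptic peptides for this sequence")
    return n_identified / n_observable


# Invented toy sequence, not a protein from the study.
toy_sequence = "MASTEELKGGLLVVAARPSSTTYYKAAK"
toy_pai = protein_abundance_index(1, toy_sequence)
```

Because the denominator grows with protein length, a large protein identified with the same number of peptides as a small one receives a lower index, which is the normalization the text describes.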
Categorization of Spliceosome-Associated Factors
We first subtracted 19 proteins (see Methods) from the list of identified proteins, for the following reasons: 16 proteins, mainly keratins, were also identified in the background fraction (beads only). One protein appears in a separate database entry as the C-terminal part of one of these 16 background proteins and was also removed. Finally, one protein that, like the keratins, is abundant in human keratinocytes and one protein with a very similar domain structure were also discarded. The remaining 292 proteins identified in this large-scale analysis of the human spliceosome were grouped into seven functional categories (Fig. 3). The low number of cytoskeletal, nuclear matrix, and heat-shock proteins, which are usually highly abundant in less specific protein purifications, indicates that our spliceosomal preparation is highly specific.
Known Splicing Factors, hnRNPs, and Other RNA-Processing Proteins
More than 40 percent of the identified proteins have a known function related either directly to splicing or more generally to RNA processing (Fig. 3; Table 1). Encouragingly, all the spliceosomal proteins identified in our previous proteomic characterization were also identified here. All of the core snRNP proteins (Sm proteins) that are present in U1, U2, U4, and U5 snRNPs were identified. These proteins are small and thus difficult to detect by standard 2D gel electrophoresis, having required separate one-dimensional gel analysis in our previous study (Neubauer et al. 1998
Novel Proteins
In addition to the expected splicing and RNA-processing factors, we discovered a large group of novel proteins with no known function (Table 2). We have also included proteins in this category that were previously observed in large-scale screens or that were cloned because of their homology to a known factor, if no further biological information was available (18 and 5 proteins, respectively). These 96 proteins were submitted for homology and domain searches. Interestingly, this analysis resulted in 55 proteins with sequence similarity to known splicing factors or domains that implicate them in RNA processing.
Proteins with a Function in Transcription
Table 3 lists proteins with a known function in transcription or translation, that is, the cellular processes that occur upstream and downstream of splicing during gene expression. Among the transcription factors, we found two subunits of RNA polymerase II, members of the histone H2A and H2B families, histone acetylase and deacetylase. Several initiation factors were also present.
Ribosomal Proteins and Associated Factors
A number of ribosomal proteins, particularly from the 40S subunit, were identified in the preparation. We furthermore identified several proteins from the signal recognition particle, which binds to the nascent protein chain, as well as elongation factor 1.
Proteins with Other Previously Described Functions
Seven proteins with potential regulatory roles were identified. Among these were several signaling-related proteins and cell cycle-associated proteins. The remaining proteins include nucleoprotein TPR, a component of the nuclear pore complex; several cytoskeleton-associated factors; and nucleolin, which is an abundant component of the nucleolus.
DISCUSSION
Mass Spectrometric Analysis of the Spliceosome
In this study we have used enhanced MS technology to characterize the protein composition of human spliceosomes. To ensure maximum coverage of basal human splicing factors, we analyzed a combination of active spliceosomes and intermediate splicing complexes that formed on each of two separate pre-mRNA substrates derived, respectively, from adenovirus (AD1) and β-globin (AL4) transcripts. Spliceosomes assembled in vitro were purified, and the resulting protein mixture was enzymatically degraded to peptides, which were analyzed by liquid chromatography coupled on-line with mass spectrometric sequencing.
A quadrupole time-of-flight instrument provided high resolution and high mass accuracy in the peptide analysis. More than 7000 ion peaks were fragmented in six separate runs. Based on these high-quality data, a total of 311 proteins were identified unambiguously by a combination of automated database search and manual interpretation of peptide fragmentation spectra (Fig. 1). This surprisingly large number of factors includes 125 proteins involved in RNA processing, 71 proteins involved in other, previously described functions, and 96 proteins that have not been functionally described before.
The larger number of proteins found in the present study compared with our previous study (Neubauer et al. 1998) is partly owing to the increased sensitivity of the enhanced proteomics methods now available and partly to the less stringent wash conditions used in this study. The fact that two substrates were used, furthermore, helped to identify additional factors. Human sequence databases have also improved dramatically over the last few years. The previous study used a combination of 2D gel electrophoresis and nanoelectrospray mass spectrometry. The much higher throughput provided by on-line tandem mass spectrometric peptide sequencing combined with automated database searching made it realistic to deal with thousands of peptide fragmentation spectra and even allowed multiple analysis conditions.
Figure 4 shows the calculated positions of the identified proteins in a plot of isoelectric point versus molecular weight (labeled virtual 2D gel). About 40% of the proteins fall outside of the coordinates of a standard 2D gel. For example, the Sm and Lsm proteins are too small, and many other RNA-processing proteins are too basic or too large to be represented on a 2D gel. Note that two proteins, which are outside the box in Figure 4, had migrated anomalously in the previous analysis such that they had been found at positions inside the coordinates of the previously analyzed 2D gels.
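A virtual 2D gel of this kind places each protein by its calculated isoelectric point and molecular weight. The sketch below shows one way such coordinates might be computed from sequence alone; it is not the method used for Figure 4, which the text does not describe. The pKa table is one approximate set among several in common use, and the gel window (`GEL_PI_RANGE`, `GEL_MW_RANGE`) is a hypothetical placeholder rather than the coordinates of the actual gels.

```python
# Average residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528

# Approximate pKa values; this particular set is an assumption, and other
# published tables shift the computed pI slightly.
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM, PKA_C_TERM = 8.6, 3.6

# Hypothetical window of a standard 2D gel (pI units and Da).
GEL_PI_RANGE = (3.0, 10.0)
GEL_MW_RANGE = (10_000.0, 200_000.0)


def molecular_weight(sequence):
    """Average molecular weight: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def net_charge(sequence, ph):
    """Net charge at a given pH from Henderson-Hasselbalch terms."""
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    negative = 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    for aa in sequence:
        if aa in PKA_POSITIVE:
            positive += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            negative += 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return positive - negative


def isoelectric_point(sequence, tolerance=0.001):
    """Bisect on pH 0-14 for the point where the net charge crosses zero."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid   # still positively charged: pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2.0


def on_standard_gel(sequence):
    """True if the calculated (pI, MW) point falls inside the assumed window."""
    pi = isoelectric_point(sequence)
    mw = molecular_weight(sequence)
    return (GEL_PI_RANGE[0] <= pi <= GEL_PI_RANGE[1]
            and GEL_MW_RANGE[0] <= mw <= GEL_MW_RANGE[1])
```

Under these assumptions, very small proteins such as the Sm proteins would fall below the molecular weight bound and strongly basic proteins above the pI bound, which is the pattern described for the proteins outside the box.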
We observed a wide variation in the apparent quantity of the spliceosomal proteins (see Fig. 2). The more abundant factors were identified with dozens of sequenced peptides, whereas some of the least abundant factors were identified on the basis of a single peptide. This variation reflects not only different stoichiometry in the different spliceosomal complexes that were purified, but also the differential response of the peptides in the analytical method used. To obtain a rough visualization of the abundance of different proteins, we defined a simple protein abundance index (PAI) as the ratio between the sequenced peptides of a protein and the total number of tryptic peptides predicted from the protein sequence (see Methods). Although the PAI in the form presented here is by no means an accurate measure of protein amount, it can be used as a guide for relative classification into abundant and less abundant proteins. For example, the novel proteins G10 protein homolog (EDG-2) and hypothetical protein ENSP00000292314 have a very high index and as such would be excellent candidates for detailed functional studies even though they lack sequence similarities to proteins previously found in splicing/RNA processing. Other proteins with a high index and sequence similarity to known splicing/RNA-processing proteins are the hypothetical proteins similar to U5 snRNP 200 kD, the hypothetical protein similar to U2 snRNP A′, the cyclophilin CGI-124 protein, and the RRM domain-containing Arsenite-resistance protein 2. Among the proteins involved in transcription, Interleukin enhancer-binding factor 2, which binds to the RNA-processing protein Interleukin enhancer-binding factor 3, and the nuclease-sensitive element binding protein 1 also appear to be abundant.
We originally expected that the proteins identified in our previous investigation would have been the most abundant of the much larger number of proteins identified here. However, the average PAI of that group was only moderately higher (0.85 compared with 0.61; data not shown), and many of the previously identified proteins were of low abundance, as indicated by the present analysis. This may reflect the fact that 2D gel electrophoresis with subsequent nanoelectrospray peptide sequencing is also very sensitive for the subgroup of proteins that are readily focused and visualized on the gel.
Proteins associated with the two different substrates were largely identical, especially for the core spliceosomal components. Differences in those components mainly occurred for proteins that were identified with very few peptides, indicating that these were missed in the other purification. However, there were also significant differences in non-core splicing proteins that appear to be unrelated to the analysis and that may have functional significance. Among the clearest differences were the FUSE-binding proteins (FBP) 1, 2, and 3, which are unique to the AL4 substrate. FBPs bind to the single-stranded far upstream element (FUSE) upstream of the c-myc gene. In addition to their transcriptional role, FBP1 and its closely related siblings FBP2 and FBP3 have been reported to bind RNA and participate in various steps of RNA processing, transport, or catabolism. Interestingly, the insulin-like growth factor (IGF)-II mRNA-binding protein 3 was also detected exclusively on AL4; it is known to recognize c-myc and IGF-II mRNA and to regulate their expression posttranscriptionally. These substrate-specific factors will be the subject of a future investigation. Altogether, 79 factors were unique to the AL4 substrate and 44 to the AD1 substrate.
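The comparison between substrates amounts to set differences over the two protein lists, with the caveat that a protein seen with very few peptides in one purification may simply have been missed in the other. A minimal sketch of that reasoning follows; the peptide counts and the `few_peptides` cutoff are invented for illustration, and only the FBP assignment to AL4 is taken from the text.

```python
def substrate_differences(ad1_counts, al4_counts, few_peptides=2):
    """Split substrate-specific proteins into robust and possibly-missed groups.

    Each argument maps protein name -> number of sequenced peptides.
    The few_peptides cutoff is a hypothetical threshold, not one from the study.
    """
    report = {"shared": sorted(set(ad1_counts) & set(al4_counts))}
    for name, own, other in (("AD1", ad1_counts, al4_counts),
                             ("AL4", al4_counts, ad1_counts)):
        unique = set(own) - set(other)
        report[name] = {
            # Many peptides on one substrate, none on the other: likely a real difference.
            "robust": sorted(p for p in unique if own[p] > few_peptides),
            # Few peptides: the protein may just have escaped detection elsewhere.
            "possibly_missed": sorted(p for p in unique if own[p] <= few_peptides),
        }
    return report


# Toy input with invented peptide counts.
ad1 = {"core factor X": 12, "low-abundance factor Y": 1}
al4 = {"core factor X": 10, "FBP1": 6, "FBP2": 5, "FBP3": 4}
differences = substrate_differences(ad1, al4)
```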
Discussion of Identified Factors
Significantly, all known U1, U2, and U6 proteins were identified in this large-scale study (Table 1). Virtually all of the other known spliceosomal proteins were also observed, which includes the SR proteins that were not detected in our previous study. Five proteins with a described role in U4/U6 and U4/U6 · U5 snRNPs were not identified with either substrate tested. It is possible that these factors are present in the samples but were missed because of low abundance, weak affinity, or other technical reasons affecting detection in this system. Alternatively, it is possible that these proteins are not, in fact, stable components of the spliceosomes formed on the pre-mRNAs analyzed.
A large proportion of the proteins that were detected have a known function in RNA processing. In addition to the known splicing factors, we identified 20 hnRNP proteins, some of which are also implicated in splicing. Likewise, there are several proteins in the category of other RNA-processing proteins that function in splicing.
Table 2 lists 96 novel proteins present in the spliceosomal preparation. At first, this appears to be a surprisingly large number, considering that the spliceosome has been studied intensively for many years. However, we note that a recent analysis of human nucleolar proteins showed that >30% of the factors detected were novel despite more than two hundred years of research into nucleoli (Andersen et al. 2002a). More than half of the novel spliceosome-associated proteins detected here either showed strong similarity to known splicing factors or had domains such as RRM, DEAD box, and/or PWI that implicate them in RNA processing. Also, a cyclophilin, USA-CyP (Horowitz et al. 2002), has been shown to act in the spliceosome. With six novel proteins that are likely members of the cyclophilin-type peptidyl-prolyl cis-trans isomerases, this family may play an even larger role in splicing than previously thought.
Interestingly, these novel proteins also show a similar abundance pattern to the known splicing factors (Fig. 2). The fact that these proteins were identified in a spliceosomal preparation, combined with the bioinformatic evidence linking them to splicing, suggests that these proteins are likely to be bona fide splicing factors.
Given the large proportion of proteins implicated in splicing or related RNA-processing activities, it is likely that many of the remaining 41 novel proteins are also involved in these functions. A detailed analysis of all these factors is beyond the scope of this study but will be addressed in future work.
Further studies will be required to assess which of the newly identified spliceosome-associated proteins are directly involved in splicing and which are involved in other activities relating to the synthesis, processing, localization, or transport of nascent mRNA. In this regard, it is interesting that our parallel analysis of host cell factor (HCF), identified here as a spliceosome protein, shows that it is required for splicing in vivo and in vitro (P. Ajuh and A.I. Lamond, in prep.). However, it is likely that not all of the novel spliceosome proteins are directly required for the catalysis of splicing. Rather, we favor the interpretation that the splicing machinery works in the context of a larger series of activities required for the production and cytoplasmic export of mature mRNA. Thus, some of the factors identified may have roles in affecting other related RNA processing, editing, and transport events. Consistent with this idea, the proteins detected include multiple components of the 3' cleavage and polyadenylation machinery (Minvielle-Sebastia 1999) as well as the double-stranded RNA-specific adenosine deaminase (DRADA) RNA-editing enzyme that is known to associate with a nuclear protein complex (Zhang 2001). The mRNA export machinery was represented by the proteins Aly (Zhou et al. 2000), Tap (Gruter et al. 1998), hHpr1 (Strasser et al. 2002), and possibly the nuclear pore protein TPR (Bangs et al. 1998; Frosst et al. 2002). Thus, our data provide further support for direct linkage between splicing and mRNA export (Reed and Hurt 2002).
We also identified a number of ribosomal proteins involved in protein translation in the cytosol. At present, we know of no direct evidence linking ribosomal proteins to splicing functions. The ribosomal proteins likely copurified owing to direct binding to the RNA bait, but alternatively may have bound to the mRNA export complex, components of which we have identified here. Thus, the significance of the ribosomal protein data needs to be evaluated cautiously until further studies can be carried out to test their potential link to spliceosomes.
Interestingly, a number of transcription-related proteins including subunits of RNA polymerase II and other transcription factors were identified in this analysis. This finding indicates a tight coupling of transcription with splicing, consistent with recent in vivo and biochemical data indicating such a link (for review, see Bentley 2002). For example, it is already known that CA150, which was identified in our preparation, can bind to RNA polymerase II and SF1 (Goldstrohm et al. 2001). Scaffold attachment factor B can bind to RNA polymerase II and SR proteins (Nayler et al. 1998) and thus also represents a possible direct link between transcription and splicing. In this context, it is interesting to note that Protein inhibitor of activated STAT1 (PIAS1) likewise has the characteristics of a scaffold-attachment factor and has a speckled nuclear localization (Tan et al. 2002), which is typical for splicing factors.
Although it is known that splicing can be tightly regulated, for example, in many instances of alternative splicing (Graveley 2002), less is known about the mechanisms involved in this regulation. In this regard, it is interesting that a number of putative regulatory proteins were found in association with the spliceosome. Three Death-box-containing proteins, one of which is a novel protein, may link the spliceosome to apoptosis. These proteins may, however, have a function similar to hHpr1, a Death-box-containing protein acting in mRNA export. Two other proteins that were found, protein phosphatase II inhibitor, a protein co-immunoprecipitating with SPF30 (J. Rappsilber and M. Mann, unpubl.), and poly(ADP-ribosyl)transferase, indicate other leads into the regulatory mechanisms of the splicing process that will be followed up in future studies. It is also possible that splicing activity in vivo may be regulated during the cell cycle. Consistent with this idea, certain spliceosomal proteins were initially found as cell cycle mutants or have been defined by their homology to cell cycle proteins, for example, the human splicing factor CDC5-like protein (Ajuh et al. 2000). This possible link to the cell cycle may be supported here by the presence of cyclins A1 and K in our spliceosomal preparations. It will be interesting to determine whether either of these factors can act on substrates associated with the spliceosome.
Prospects
We have shown here that the use of enhanced, state-of-the-art proteomic methods facilitates a more detailed characterization of the human spliceosome than was previously possible, as it combines high sensitivity with rapid analysis. This opens up the prospect of detailed proteomic studies addressing the dynamics of the spliceosome, for example, in regulation and in differential splicing, particularly if methods for direct quantification of the proteins can also be used.
Bearing in mind that some of the splicing factors were identified with only one peptide and that we used only two separate substrate RNAs and specific purification conditions, we do not expect the present study to have delivered a final list of spliceosomal proteins. It will be interesting to study alternative purification methods for isolating spliceosomes, including different washing stringencies and different pre-mRNA substrates, to identify even more splicing factors.
There is supporting evidence for functions in splicing for many of the novel factors that we have identified here (see Table 2). For the factors without any domains or sequence identity that links them to splicing, future localization and/or functional studies will be performed to address their putative role in splicing.
The regulatory proteins associated with the spliceosome also prompt multiple new experimental possibilities to study the regulation of splicing both in vivo and in vitro, showing the utility of large-scale proteomic studies as a launch pad for the design of functional studies in molecular cell biology.
METHODS
Purification of the Human Spliceosome
Human complexes were prepared essentially as described, but using less stringent wash conditions (Reed 1990; Calvio et al. 1995; Neubauer et al. 1998). Briefly, a mixture of spliceosomal complexes was assembled on biotinylated, radioactively labeled RNA. Two splicing substrates, adenovirus (AD1) and β-globin (AL4) transcripts, were used in separate experiments. The substrates were each biotin-labeled and incubated under splicing conditions with HeLa nuclear extracts in 1-mL reactions at 30°C for 1 h, forming both active spliceosomes and assembly intermediates. After incubation the samples were immediately loaded onto a 2.5 × 75-cm S-500 gel filtration column, and pooled fractions from the spliceosome peak were affinity-selected on streptavidin beads (Calvio et al. 1995). Proteins bound to the beads were washed three times in wash buffer (100 mM NaCl, 20 mM Tris-HCl at pH 7.5), then eluted in 0.3 mL of elution buffer (2% SDS, 20 mM Tris-HCl at pH 7.5, 20 mM DTT). Eluted proteins were precipitated with 1 mL of methanol together with 12 µg of slipper limpet glycogen carrier and finally resuspended in 50 µL of elution buffer. This procedure was repeated 12 times, and the resulting samples were pooled separately for each of the pre-mRNA substrates. Based on the staining with Coomassie blue, we estimate that each fraction contained ~6-10 µg of protein in total.
For the background control, nuclear extract was incubated without labeled RNA, followed by gel filtration as described above. Beads were mixed with the fractions that corresponded to the ones that contained labeled RNA in the above-described experiment. Beads were washed, and the bound material was eluted as above.
Sample Preparation for LC MS/MS
After purification, the volume of the pooled samples was reduced in vacuo; 15% glycerin, 100 mM dithiothreitol, and Bromophenol blue were added; and the samples were run on a 7.5% SDS-PAGE gel and stained with Coomassie blue. The lightly stained area containing the total, unseparated spliceosomal protein mixture was excised, then the proteins were in-gel reduced, alkylated, and digested using trypsin following described procedures (Shevchenko et al. 1996). Peptides were extracted using first 70 µL of acetonitrile then 100 µL of 50% acetonitrile/2.5% acetic acid/0.01% heptafluorobutyric acid. Extracts were combined with the respective supernatants and filtered, and the volume was reduced in vacuo to ~25 µL.
LC MS/MS Analysis
Vydac 218MSB3 bulk material (3-µm prototype reversed phase
material, a generous gift from Grace Vydac) was packed into
pulled fused silica capillaries (PicoTip, New Objective) with a
100-µm ID and an 8-µm tip opening. Particles formed a
self-assembled particle frit (SAP-frit) at the tapered end according to
the principle of the stone arch bridge (Ishihama et al. 2002
). Peptides
were loaded using a sample loop. The following gradient was used:
buffer A (5% acetic acid/0.02% heptafluoro butyric acid) to buffer B (80% acetonitrile/5% acetic acid/0.02% heptafluoro butyric acid), having the profile: B7%
B15% (0
10 min),
B15%
B35% (10
70 min), B35%
B50% (70
80
min), B50%
B80% (80
85 min), B80% (90 min). The amount
of material was estimated to be sufficient for three analyses using two
initial LC MS/MS analyses of 10% of the sample. Subsequently, three
identical LC separations were performed with the significant difference
that the MS analysis software (Analyst, MDS-Sciex) was
instructed to select only precursors in a certain mass range
(m/z = 350-550,
m/z = 550-750, or
m/z = 750-1400, respectively) for fragmentation.
This was matched by pulsed extraction of fragments, enhancing on
m/z = 400, 600, or 800, respectively, as described
previously (Andersen et al. 2002b
). Tandem mass spectra were acquired
for 1.5 sec, and fragmented peptides were excluded from sequencing for
120 sec. The background control was less complex and contained less material and was therefore only run with one LC MS/MS analysis, pulsed
in the central region and with a precursor selection window of
m/z = 350-1400. Scripts in Analyst created peak
lists on the basis of the recorded fragmentation spectra.
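The gradient profile and the three precursor selection windows described above can be sketched in a few lines of Python. This is only an illustration, not the instrument method or the Analyst scripts used in the study. It assumes that each gradient segment is linear, that buffer B is held at 80% between 85 and 90 min, and that a precursor lying exactly on a window boundary is assigned to the lower window; the function names are hypothetical.

```python
from bisect import bisect_right

# (time in min, % buffer B) breakpoints read from the gradient description.
# Linear segments and the 85-90 min hold at 80% B are assumptions.
GRADIENT = [(0, 7), (10, 15), (70, 35), (80, 50), (85, 80), (90, 80)]

# Precursor selection windows (m/z) of the three segmented runs.
WINDOWS = [(350, 550), (550, 750), (750, 1400)]


def percent_b(t_min):
    """Return % buffer B at time t_min by linear interpolation."""
    if t_min <= GRADIENT[0][0]:
        return float(GRADIENT[0][1])
    if t_min >= GRADIENT[-1][0]:
        return float(GRADIENT[-1][1])
    times = [t for t, _ in GRADIENT]
    i = bisect_right(times, t_min) - 1
    (t0, b0), (t1, b1) = GRADIENT[i], GRADIENT[i + 1]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


def run_for_precursor(mz):
    """Return the index (0, 1, or 2) of the run that would select this
    precursor, or None if it lies outside all windows.
    Boundary values go to the lower window (an assumption)."""
    for index, (low, high) in enumerate(WINDOWS):
        if low <= mz <= high:
            return index
    return None
```

Splitting the precursor range across three otherwise identical separations means that each run spends its sequencing time on a narrower set of candidate ions, which is why the three windows together cover the same m/z = 350-1400 range used in the single background run.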
Data Analysis
The combined peak lists of all eight runs contained the information
on 7019 fragmentation spectra. This list was searched against the
International Protein Index (IPI) database
(http://www.ebi.ac.uk/IPI/IPIhelp.html) using Mascot
(Matrix Science) on our in-house server. The most prominently
identified peptides were then used to recalibrate the data, and the
search was repeated to yield the initial list of identified proteins.
All protein entries that were identified with at least three
high-scoring peptide-query matches (individual Mascot
scores above 32) and where the peptides were ranked as the top
candidates were accepted as identified. All others were inspected
manually as described in Results. In cases of ambiguity, the
corresponding fragmentation spectrum was opened in
Inspector (MDS-Proteomics) and manually interpreted to
yield a peptide sequence tag (Mann and Wilm 1994), which was then
searched against the IPI database using PepSea
(MDS-Proteomics). The following proteins were regarded as contaminants
on the basis of their occurrence in a blank purification (no
biotinylated pre-mRNA added; data not shown): Von Ebner's gland
protein (SWISS-PROT: P31025); Lysozyme C precursor (SWISS-PROT:
P00695); dermcidin (SWISS-PROT: P81605); NY-REN-6 antigen
(ENSP00000255069); trypsin (XP_094996); keratin 1 (SWISS-PROT: P04264);
similar to keratin 1 (ENSP00000301445); keratin 2a (SWISS-PROT:
P35508); similar to keratin 2a (ENSP00000252247); keratin 5 (ENSP00000252242); keratin, type II cytoskeletal 6F (SWISS-PROT:
P48669); keratin 9 (ENSP00000246662); similar to keratin, type I
cytoskeletal 10 (SWISS-PROT: P13645); keratin 10 (TREMBL: Q14664);
keratin 14 (SWISS-PROT: P02533); keratin 16 (ENSP00000301653). Also,
Huntington-interacting protein HYPA/FBP11 (ENSP00000288690) was
considered to be a contaminant, because together with NY-REN-6 antigen
it is a fragment of formin-binding protein 3 (NP_061255). Other
proteins identified here were also classified as contaminants on the
following basis: S100 calcium-binding protein A7 (SWISS-PROT: P31151)
based on its high expression in keratinocytes (Rasmussen et al. 1992)
and the two hypothetical proteins ENSP00000295258 and ENSP00000271816
based on their domain structure, which is very similar to
calcium-binding protein A7.
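The acceptance rule and the contaminant filter described above can be illustrated with a short Python sketch. It is not the script used in the study: the input format (one dictionary per peptide-query match with `accession`, `score`, and `rank` keys) and the function name are hypothetical, and the contaminant set lists only some of the SWISS-PROT accessions given in the text.

```python
from collections import defaultdict

SCORE_THRESHOLD = 32  # individual Mascot score must be above this value
MIN_MATCHES = 3       # high-scoring, top-ranked matches needed for acceptance

# Subset of the contaminant accessions listed in the text (illustrative only).
CONTAMINANTS = {"P31025", "P00695", "P81605", "P04264",
                "P35508", "P48669", "P13645", "P02533"}


def classify_proteins(matches):
    """Split protein accessions into (accepted, needs_manual_inspection).

    `matches` is an iterable of dicts with the hypothetical keys
    'accession', 'score', and 'rank' (1 = top candidate for the spectrum).
    """
    qualifying = defaultdict(int)
    seen = set()
    for match in matches:
        accession = match["accession"]
        seen.add(accession)
        if match["score"] > SCORE_THRESHOLD and match["rank"] == 1:
            qualifying[accession] += 1

    accepted, manual = [], []
    for accession in sorted(seen):
        if accession in CONTAMINANTS:
            continue  # dropped regardless of how well it scored
        if qualifying[accession] >= MIN_MATCHES:
            accepted.append(accession)
        else:
            manual.append(accession)
    return accepted, manual
```

Entries that fall short of the automatic criteria are not rejected here; they are returned separately, mirroring the manual inspection step described in the text.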
The PAI is defined here as the number of sequenced peptides (fragmentation spectra assigned with a significant score and as the top match to an individual identified protein) divided by the number of calculated, observable peptides of that protein. Readily observable tryptic peptides were taken to be those in the mass range 800 to 2400 Da. Fragmentation spectra matching the same peptide sequence but differing in charge state, modification state, or number of missed cleavage sites were counted separately. For this reason, the index can be >1. The index reflects not only the abundance of the protein in the sample but also its response to the measurement procedure. The latter is a complicated function of the efficiency of digestion, peptide solubility, extraction, ionization, and fragmentation for each protein and its peptides. In the future, more sophisticated versions of the PAI could take an increasing number of such factors into account.
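A minimal Python sketch of this calculation is given below. It is an illustration under stated assumptions rather than the procedure used in the study: trypsin is assumed to cleave after K or R except before P, observable peptides are counted from fully cleaved, unmodified peptides using monoisotopic masses, and all function names are hypothetical.

```python
# Monoisotopic residue masses in Da (unmodified residues; an assumption,
# since the text does not say how modified residues were treated).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_peptides(sequence):
    """Fully cleaved tryptic peptides: cut after K or R, but not before P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        following = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and following != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide in Da."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER


def observable_peptide_count(sequence, low=800.0, high=2400.0):
    """Number of tryptic peptides within the readily observable mass range."""
    return sum(low <= peptide_mass(p) <= high
               for p in tryptic_peptides(sequence))


def pai(sequenced_peptides, sequence):
    """Sequenced peptides divided by observable peptides.

    Returns None when the protein has no observable peptide.
    """
    observable = observable_peptide_count(sequence)
    if observable == 0:
        return None
    return sequenced_peptides / observable
```

Because the numerator counts charge states, modification states, and missed-cleavage forms separately while the denominator in this sketch counts each fully cleaved peptide once, a protein with one observable peptide sequenced in two charge states gives an index of 2.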
| |
WEB SITE REFERENCES |
|---|
|
|
|---|
http://srs.embl-heidelberg.de:8000/; access to SWISS-PROT.
http://www.ebi.ac.uk/IPI/IPIhelp.html; International Protein Index (IPI) database.
http://www.ensembl.org; access to ENSEMBL.
http://www.pil.sdu.dk; complete list of peptides.
| |
ACKNOWLEDGMENTS |
|---|
We thank our colleagues in the Protein Interaction Laboratory for fruitful discussions. Jens Andersen helped in devising the analysis strategy, and Leonard Foster developed the scripts and algorithms that we used here in data handling, especially in parsing the output of Mascot to retrieve the list of identified peptides. Carmen de Hoog is acknowledged for critical reading of the manuscript. Work in M.M.'s laboratory is supported by a generous grant from the Danish National Research Foundation to the Center of Experimental Bioinformatics. A.I.L. is a Wellcome Trust Principal Research Fellow and is funded by a Wellcome Trust Programme grant. J.R. is a Marie Curie Fellow.
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
| |
FOOTNOTES |
|---|
3 Corresponding author.
E-MAIL mann{at}bmb.sdu.dk; FAX 45 6593 3929.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.473902. Article published online before print in July 2002.
| |
REFERENCES |
|---|
|
|
|---|