Published online before print
October 25, 2006, 10.1101/gr.5052606 Genome Res. 16:1414-1421, 2006 ©2006 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/06 $5.00 OPEN ACCESS ARTICLE
Letter
A highly divergent gene cluster in honey bees encodes a novel silk family
CSIRO Entomology, Canberra ACT 2601, Australia
The pupal cocoon of the domesticated silkmoth Bombyx mori is the best known and most extensively studied insect silk. It is not widely known that Apis mellifera larvae also produce silk. We have used a combination of genomic and proteomic techniques to identify four honey bee fiber genes (AmelFibroin1–4) and two silk-associated genes (AmelSA1 and 2). The four fiber genes are small, each consist of a single exon, and are clustered on a short genomic region where the open reading frames are GC-rich amid low-GC intergenic regions. The genes encode similar proteins that are highly helical and predicted to form unusually tight coiled coils. Despite the similarity in size, structure, and composition of the encoded proteins, the genes have low primary sequence identity. We propose that the four fiber genes have arisen from gene duplication events but have subsequently diverged significantly. The silk-associated genes encode proteins likely to act as a glue (AmelSA1) and to be involved in silk processing (AmelSA2). Although the silks of honey bees and silkmoths both originate in larval labial glands, the silk proteins are completely different in their primary, secondary, and tertiary structures, as is the genomic arrangement of the genes encoding them. This implies independent evolutionary origins for these functionally related proteins.
Many holometabolous insects produce a silken cocoon in the late larval stage for protection during pupal metamorphosis. The pupal cocoon of the domesticated silkmoth, Bombyx mori, is the best known and most extensively studied among insect silks; it is less widely known that honey bee larvae also construct silken cocoons within which they pupate. Successive layers of silk coat the brood cell walls, accumulating with each larval generation to make up a significant proportion of the hive. Because beeswax is a thermoplastic material that loses strength and stiffness with increasing temperature, the accumulated silk is thought to give the comb tensile strength and mechanical integrity (Hepburn and Kurstjens 1988
B. mori silk is a composite of a 390-kDa protein (H-fibroin) associated with two smaller proteins, the 25-kDa L-fibroin and the 25-kDa P25 fibroin (Zhou et al. 2000
Honey bee and B. mori silks are both produced from modified salivary glands known as labial glands. The labial glands of Hymenoptera and Lepidoptera are thought to be homologous, originating from the lateral ventral placodes of the labial ectoderm (Julien et al. 2004
X-ray diffraction patterns obtained from native silk fibers from honey bee larvae have an
Identification of novel silk proteins
Honey bee silk genes are expressed in the final instar, specifically in the labial (modified salivary) gland. We constructed a cDNA library from the labial gland of late final instar larvae and obtained sequence information from 82 randomly selected clones. These cDNAs clustered into 46 groups, with 38 clusters represented by a single cDNA. The most abundant cluster comprised 13% of the analyzed library, and the other clusters represented by more than a single cDNA varied in abundance from 2% to 11%. A summary of the most abundant cDNAs is shown in Table 1. A full listing of the 102 ESTs and proteins identified in the larval labial gland can be found in Supplemental Table I.
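The cluster statistics above (number of clusters, singletons, and percent abundance) follow from simple counting once each clone has been assigned to a cluster. The sketch below assumes a hypothetical input of one cluster label per sequenced clone. It does not show the clustering step itself, which groups overlapping cDNA sequences. The labels are toy values, not the study's data.

```python
from collections import Counter


def summarize_clusters(cluster_ids):
    """Summarize EST clusters from one cluster label per sequenced clone."""
    sizes = Counter(cluster_ids)
    total = len(cluster_ids)
    # Clusters represented by exactly one clone
    singletons = sum(1 for n in sizes.values() if n == 1)
    # Percent of all analyzed clones in each cluster, most abundant first
    abundance = {label: 100.0 * n / total for label, n in sizes.most_common()}
    return {
        "clones": total,
        "clusters": len(sizes),
        "singletons": singletons,
        "abundance_pct": abundance,
    }


# Toy labels for illustration only (not the study's data)
labels = ["A"] * 5 + ["B"] * 3 + ["C", "D"]
summary = summarize_clusters(labels)
print(summary["clusters"], summary["singletons"])  # 4 2
print(summary["abundance_pct"]["A"])  # 50.0
```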
The silk proteins were identified by matching the mass spectrometry peptide fragmentation patterns obtained after tryptic digestion of honey bee silk with in silico predictions from three data sets: cDNAs obtained from the labial gland, all protein sequences predicted by the honey bee genome sequencing project, and a database of translations in the six possible reading frames of each contiguous genomic DNA sequence provided by the bee genome project (Amel_3.0 release). Eight sequences matching tryptic peptides from silk were identified. Six of these proteins (and no others) were identified in silk fiber after treatment to remove nonfibrous proteins. Sequences corresponding to these six silk proteins were identified in the honey bee genome data sets: five corresponded to the existing automated protein annotations GB15233, GB12184, GB12348, GB17818, and GB19585. The sixth sequence matched a protein encoded within a large open reading frame spanning Contig2271 and Contig2272. The peptide hits to Contig2271 and 2272 are shown in Supplemental Figure I. As expected for extracellular proteins, all six proteins contained classical 19-residue secretory signal peptides, identified by SignalP 3.0 (Bendtsen et al. 2004).
The silk was not fully digested by the trypsin treatment described above, so proteins remaining in the undissolved silk could have gone unidentified. Since silk is produced by the labial gland, the silk proteins were expected to be abundant in gland extracts. The gland, complete with lumen contents, was dissected from final instar larvae and extracted with SDS, and the proteins were separated by SDS polyacrylamide gel electrophoresis (SDS-PAGE) (Fig. 1). Bands were excised from the gel and analyzed by in-gel trypsin digestion and mass spectrometry. Consistent with expectations, the major Coomassie blue-stained bands visible after SDS-PAGE were identical to the proteins found in the native silk.
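A six-frame translation database like the one described above can be sketched as follows. This is a minimal illustration that assumes the standard genetic code and DNA containing only A, C, G, T, or N. The function names are hypothetical, and the bee genome project's actual pipeline is not described in the text. Stop codons are kept as `*`, so downstream code could split each frame into ORF-like fragments before peptide matching.

```python
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Assumes input contains only A, C, G, T or N
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def translate(seq):
    """Translate whole codons; codons containing ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Return translations of the three forward and three reverse frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


frames = six_frame_translation("ATGGCTGCAGCCTAA")
print(frames["+1"])  # MAAA*
print(frames["-1"])  # LGCSH
```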
The mass spectral fragments obtained from single bands often matched more than one silk protein, and all the silk proteins were identified in more than one band (Fig. 1). We suggest that this reflects partial polymerization of the silk proteins in the labial gland. Less intense bands were identified as typical "house-keeping" proteins that are abundant in most tissues (see Supplemental Table I for a full listing of proteins identified).
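Matching peptide fragmentation patterns against protein databases starts from an in silico tryptic digest of each candidate sequence. The sketch below is minimal and makes several assumptions:

- It uses the common simplified cleavage rule: cut after K or R unless the next residue is P.
- It ignores missed cleavages, modifications, and fragment ion masses. Real search engines model all of these.
- The `min_len` cutoff is an illustrative parameter.

```python
import re

# Simplified trypsin rule: cleave C-terminal to K or R unless the next residue is P.
# Splitting on a zero-width pattern requires Python 3.7 or later.
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_digest(protein, min_len=6):
    """Return fully cleaved tryptic peptides of at least min_len residues."""
    peptides = TRYPSIN_SITE.split(protein.upper())
    # Drops short peptides, including the empty string left after a C-terminal K/R
    return [p for p in peptides if len(p) >= min_len]


print(tryptic_digest("MAAKPLLLAAKAEELRGGGGGGK", min_len=1))
# ['MAAKPLLLAAK', 'AEELR', 'GGGGGGK']
```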
Five of the silk proteins correspond to the most abundant clones from the labial gland cDNA library (GB15233, GB12184, GB12348, GB17818, GB19585). The predicted TATA boxes for these genes are close to the initiator methionine (55–57 nt), resulting in short untranslated regions. The energy cost of transcription has selected for short introns in highly expressed genes (Castillo-Davis et al. 2002).
Overall, we have identified six genes encoding silk proteins. Structural analysis of the encoded proteins (see below) indicates that these genes fall into three groups:

(1) Four small genes corresponding to BeeBase annotations GB17818, GB19585, GB12184, and GB12348. They encode fibrous proteins of 30–34 kDa that we have termed AmelFibroin1–4 (Apis mellifera fibroin), respectively.

(2) A small gene corresponding to GB15233 that we have termed AmelSA1 (Apis mellifera silk-associated). It encodes a possible glue of 42 kDa.

(3) A large gene spanning Contig2271 and 2272 that we have named AmelSA2 (Apis mellifera silk-associated). It encodes a high molecular weight (500 kDa) protein whose role in the silk, if any, is currently unclear.
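The distance between a predicted TATA box and the initiator ATG can be illustrated with a simple consensus scan. This sketch rests on several assumptions, because the study names neither the promoter predictor nor the distance convention it used:

- Motif: an exact match to the TATAWAW consensus (W = A or T).
- Distance: measured from the first base of the motif to the A of the ATG.
- Method: practical predictors use position weight matrices rather than a regular expression, so this scan is only an approximation.

```python
import re

# Hypothetical exact-match TATA consensus TATAWAW (W = A or T);
# the lookahead lets overlapping matches be reported.
TATA_BOX = re.compile(r"(?=(TATA[AT]A[AT]))")
MOTIF_LEN = 7


def tata_distances(seq, atg_index):
    """Distances (nt) from the first base of each TATA-like motif lying wholly upstream of the ATG."""
    seq = seq.upper()
    return [
        atg_index - m.start()
        for m in TATA_BOX.finditer(seq)
        if m.start() + MOTIF_LEN <= atg_index
    ]


promoter = "GG" + "TATAAAA" + "C" * 40 + "ATGGCC"
print(tata_distances(promoter, promoter.find("ATG")))  # [47]
```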
Four of the honey bee silk genes are encoded in a high GC gene cluster with a similar, nonsilk gene
Immediately upstream (1487 nt) of the silk gene cluster is another, slightly longer gene (1737 nt) with a somewhat elevated GC content (45%) and a short CpG island (Fig. 2). This gene is represented by a single cDNA in the labial gland library, and the protein (GB12085) is present at low levels in gland extracts (Supplemental Table I) but is not found in the cocoon silk. The protein has structural similarities to the AmelFibroin silk fiber proteins (see structural analysis below). We have named this open reading frame AmelFibroin-rel (and the protein AFrel, Amel Fibroin-related) because this gene has probably arisen through gene duplication events from a common ancestor shared with the AmelFibroins.
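GC content and CpG islands are commonly assessed with windowed statistics. This sketch uses the widely cited Gardiner-Garden and Frommer style criteria as defaults: a window of 200 nt, GC above 50%, and an observed/expected CpG ratio above 0.6. These thresholds are assumptions for illustration. They are not necessarily the criteria behind Fig. 2.

```python
def gc_content(seq):
    """Percent G + C."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: CpG count * length / (C count * G count)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the dinucleotide count
    return seq.count("CG") * len(seq) / (c * g)


def cpg_island_windows(seq, window=200, step=1, min_gc=50.0, min_oe=0.6):
    """Start positions of windows passing the assumed GC and CpG o/e thresholds."""
    seq = seq.upper()
    starts = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        if gc_content(w) > min_gc and cpg_obs_exp(w) > min_oe:
            starts.append(start)
    return starts


print(gc_content("GGCCAATT"))  # 50.0
print(cpg_obs_exp("CGCG"))  # 2.0
```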
The high GC content of the open reading frames in this cluster is partly the result of the abundance (29%–33%) of the GCX codon, which encodes alanine. However, the alanine content of the gene products is not sufficient to account completely for the high GC content. Previously, genomic regions of GC bias have been associated with highly expressed genes in the human and Drosophila genomes (Versteeg et al. 2003).
The high alanine content also contributes to a strong nucleotide bias:

- First codon position: high use of G (50%–57%) and low use of C (9%–11%).
- Second codon position: high use of C (43%–46%) and low use of G (8%–12%).

The same trends are found in AmelFibroin-rel, although the bias is weaker (40% G and 11% C at the first codon position; 31% C and 9% G at the second). Despite the strong nucleotide bias at the first and second codon positions, there is no GC bias at the third position. Accordingly, codon usage in the AmelFibroin genes shows no strong bias.
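Position-specific nucleotide usage and the frequency of GCN (alanine) codons can be computed directly from an in-frame coding sequence. This minimal sketch assumes the input starts at the first codon and ignores any trailing partial codon. The helper names are hypothetical.

```python
from collections import Counter


def codons(cds):
    """Split an in-frame coding sequence into whole codons (trailing bases dropped)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def position_composition(cds):
    """Percent A, C, G, T at codon positions 1, 2 and 3."""
    cods = codons(cds)
    result = {}
    for pos in range(3):
        counts = Counter(codon[pos] for codon in cods)
        result[pos + 1] = {base: 100.0 * counts[base] / len(cods) for base in "ACGT"}
    return result


def gcn_percent(cds):
    """Percent of codons of the form GCN, i.e. alanine codons."""
    cods = codons(cds)
    return 100.0 * sum(c.startswith("GC") for c in cods) / len(cods)


comp = position_composition("GCTGCAGCGAAA")
print(comp[1]["G"], comp[2]["C"], comp[3]["A"])  # 75.0 75.0 50.0
print(gcn_percent("GCTGCAGCGAAA"))  # 75.0
```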
The structure of the honey bee silk proteins is a novel coiled coil
The secondary structure algorithms PROFsec (Rost and Sander 1993
The amino acid composition of the AmelFibroin protein heptad repeats is quite distinctive compared with that of classical coiled coils. The silk proteins are unusual in two respects:

- High alanine content at positions a and d: between 43% and 71% alanine occupancy at either position (Table 2).
- A high frequency of heptads with alanine at both the a and d positions: 27%–38% of heptads.

In naturally occurring coiled coils, alanine is among the least favored hydrophobic residues at these positions (Woolfson 2005
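Heptad occupancy statistics like those in Table 2 can be derived once each residue has been assigned a heptad position (a to g), for example by a coiled-coil predictor such as MARCOIL. This sketch assumes a hypothetical input format: a register string of the same length as the sequence, with any residue outside the predicted heptads marked by a character other than a to g. It starts a new heptad at every a position. The example sequence is a toy, not a silk protein.

```python
def alanine_core_stats(seq, register):
    """Alanine occupancy at heptad positions a and d, and percent of heptads with Ala at both.

    register: one heptad letter (a to g) per residue; any other character marks a
    residue outside the predicted heptads (hypothetical input format).
    """
    if len(seq) != len(register):
        raise ValueError("sequence and register must have equal length")
    ala = {"a": 0, "d": 0}
    total = {"a": 0, "d": 0}
    heptads, current = [], {}
    for residue, pos in zip(seq.upper(), register):
        if pos == "a" and current:
            heptads.append(current)  # a new heptad begins at each a position
            current = {}
        if pos in ("a", "b", "c", "d", "e", "f", "g"):
            current[pos] = residue
        if pos in ala:
            total[pos] += 1
            ala[pos] += residue == "A"
    if current:
        heptads.append(current)
    # Only heptads containing both core positions count toward "both"
    complete = [h for h in heptads if "a" in h and "d" in h]
    both = sum(1 for h in complete if h["a"] == "A" and h["d"] == "A")
    return {
        "a": 100.0 * ala["a"] / total["a"] if total["a"] else 0.0,
        "d": 100.0 * ala["d"] / total["d"] if total["d"] else 0.0,
        "both": 100.0 * both / len(complete) if complete else 0.0,
    }


print(alanine_core_stats("AEEAKKEAEELKKE", "abcdefgabcdefg"))
# {'a': 100.0, 'd': 50.0, 'both': 50.0}
```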
The high level of alanine in the core of the honey bee silk coiled coil regions may be expected to result in close spacing of the helices and the formation of very tight coiled coils. The X-ray diffraction pattern from honey bee silk fibrils was fitted to a tetrameric coiled coil model with an unprecedentedly short major helix radius R0 = 5.2 Å (Atkins 1967
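To see why R0 = 5.2 Å implies tightly packed helices, consider an idealized model. Assume a fourfold-symmetric tetramer whose four helix axes sit, in cross-section, at the corners of a square of circumradius R0. This idealization ignores irregularities in the supercoil. Under it, the axis-to-axis distances are:

```latex
\begin{aligned}
d_{\text{adjacent}} &= 2R_0 \sin\!\left(\tfrac{\pi}{4}\right) = \sqrt{2}\,R_0 \approx 1.414 \times 5.2~\text{\AA} \approx 7.4~\text{\AA},\\
d_{\text{diagonal}} &= 2R_0 = 10.4~\text{\AA}.
\end{aligned}
```

An adjacent-helix separation of about 7.4 Å under these assumptions is consistent with the argument that small alanine side chains in the core permit close helix packing.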
The AmelFibroin-rel product, AFrel, was also predicted to be 72%
Generally, positions other than a and d in the silk fiber protein heptads are populated by charged and polar residues, as expected for coiled coils. However, alanine is also quite abundant at these positions in the silk proteins (see Supplemental Table II). An increase in hydrophobicity outside positions a and d, in particular at positions e and g, is indicative of multistranded coiled coils, in which more surface area per helix is buried and positions e and g contribute to core hydrophobic stability (Krammerer 1997
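One way to quantify increased hydrophobicity at positions e and g is to average a hydropathy scale over each heptad position. This sketch uses the Kyte and Doolittle (1982) scale, which appears in the reference list. Whether the authors applied this scale to the heptad analysis is an assumption. The register is assumed to be supplied as a string with one heptad letter (a to g) per residue. The example is a toy sequence.

```python
# Kyte-Doolittle (1982) hydropathy values
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy_by_position(seq, register):
    """Mean Kyte-Doolittle hydropathy at each heptad position a to g."""
    sums, counts = {}, {}
    for residue, pos in zip(seq.upper(), register):
        # Skip residues outside the heptads and nonstandard residue codes
        if pos in "abcdefg" and residue in KYTE_DOOLITTLE:
            sums[pos] = sums.get(pos, 0.0) + KYTE_DOOLITTLE[residue]
            counts[pos] = counts.get(pos, 0) + 1
    return {pos: sums[pos] / counts[pos] for pos in sorted(sums)}


profile = mean_hydropathy_by_position("AEEAKKEAEELKKE", "abcdefgabcdefg")
print(round(profile["d"], 2), round(profile["e"], 2))  # 2.8 -3.9
```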
Divergence of honey bee silk genes
The AmelFibroin genes share several characteristics: genetic location, length, nucleotide usage, and the proteins' amino acid composition and secondary and tertiary structure. These shared features suggest that the genes are paralogs. Despite this similarity, the genes have low primary sequence similarity and encode proteins with low primary sequence similarity (Table 3). It was difficult to obtain convincing alignments using conventional alignment algorithms. However, we were able to use the proteins' secondary structure predictions to align the genes manually, as described in the Methods section. The protein sequence alignment (translated from the nucleotide alignment) is shown in Figure 4. The best estimate of phylogenetic relatedness between the four AmelFibroin genes is shown graphically in Figure 5.
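Pairwise identity values like those in Table 3 depend on how gapped columns are treated. This minimal sketch counts identity over columns where neither sequence has a gap. Other conventions divide by alignment length or by the shorter sequence and give different numbers. The function names and toy sequences are hypothetical.

```python
from itertools import combinations


def percent_identity(aligned1, aligned2, gap="-"):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned1) != len(aligned2):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned1, aligned2) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def identity_matrix(alignment):
    """Pairwise identities for a dict of name -> aligned sequence."""
    return {
        (a, b): percent_identity(alignment[a], alignment[b])
        for a, b in combinations(sorted(alignment), 2)
    }


toy = {"seq1": "AK-LE", "seq2": "AKQLD", "seq3": "GK-LE"}
print(identity_matrix(toy))
# {('seq1', 'seq2'): 75.0, ('seq1', 'seq3'): 75.0, ('seq2', 'seq3'): 50.0}
```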
AmelFibroin-rel probably shares a common ancestor with the AmelFibroin genes, given its genomic colocation, genetic structure, and very similar protein amino acid composition and heptad substructure. The inability to align AmelFibroin-rel to the AmelFibroin genes suggests that AmelFibroin-rel is the most distantly related member of the gene family. Although it is found in the labial gland, AFrel is not found in the silk. Two ESTs from AmelFibroin-rel have been isolated from the honey bee brain, an organ that does not produce silk (BB170026B10F06 and BB170026A20D06 from the adult bee brain library, BB17). Coiled coils can form large regular structures, as proposed for the honey bee silks. They can also mediate more dynamic interactions in transcription factors, receptors, and signaling molecules. The differential expression of AmelFibroin-rel suggests that AFrel may play a regulatory rather than structural role in silk production, consistent with its divergent sequence. AFrel is predicted to contain a 22-residue signal peptide. However, the k-NN algorithm suggests that the protein is more likely to be associated with the mitochondria or targeted to the nucleus than secreted extracellularly.
One silk protein behaves like a glue but is unrelated to known sericins
The mature silk is composed of fibrous threads that are glued together to form sheets, so we expect to find at least one silk protein that acts to glue the coiled coils together. The predicted structure of AmelSA1 is mainly amorphous (52%), and, as the
The genetic organization of AmelSA1 is much simpler than that of the B. mori sericin genes. The silkworm sericins are encoded by two genes composed of multiple exons that are differentially spliced to generate proteins with different characteristics (Garel et al. 1997).
The sericins are so named because their most abundant residue is the nonessential amino acid serine. In contrast, the most abundant amino acids in AmelSA1 are the essential amino acids leucine (17%) and lysine (17%). Incorporating essential amino acids into nonrecycled proteins beyond what is obtained in the diet involves a high metabolic cost. The high levels of essential residues in the apparently abundant AmelSA1 therefore suggest that these residues are functionally important. The silk fiber proteins incorporate 11%–16% acidic residues in the noncore positions of their heptads, so the lysine residues in AmelSA1 may be involved in electrostatic interactions with the exposed surfaces of the coiled coils. Leucine is a bulky, hydrophobic amino acid, so these residues may facilitate interactions of the silk sheets with the wax environment within the beehive.
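Residue percentages such as the 17% leucine and 17% lysine reported for AmelSA1, and the share of acidic residues, are simple composition counts. This sketch assumes the input is a protein sequence in one-letter code. The text does not say whether the signal peptide was removed before counting, so that step is left to the caller. The example sequence is a toy.

```python
from collections import Counter


def composition(protein):
    """Percent of each residue in a one-letter protein sequence, most common first."""
    protein = protein.upper()
    counts = Counter(protein)
    return {aa: 100.0 * n / len(protein) for aa, n in counts.most_common()}


def charge_fractions(protein):
    """Percent acidic (D + E) and basic (K + R) residues; histidine is not counted."""
    comp = composition(protein)
    acidic = comp.get("D", 0.0) + comp.get("E", 0.0)
    basic = comp.get("K", 0.0) + comp.get("R", 0.0)
    return acidic, basic


print(composition("LKLKAAEE"))  # {'L': 25.0, 'K': 25.0, 'A': 25.0, 'E': 25.0}
print(charge_fractions("LKLKAAEE"))  # (25.0, 25.0)
```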
AmelSA2
PSI-BLAST analysis of the protein sequence identified a similarity to the protein Nestin. The highest match was to a Nestin fragment (EMBL:AF110498), with a BLAST expectation value of 7e-94 over 355 residues, using default settings in PredictProtein. Nestin is a type VI intracellular intermediate filament (IF) protein. It partially coassembles with other IF components to form heterodimeric coiled coils, leaving a long tail composed of highly charged peptide repeats in solution (Steinert et al. 1999
Conclusions
Tissue and silk preparation
A. mellifera larvae and brood comb were obtained from domestic hives. The labial gland was dissected from late fifth-instar A. mellifera larvae immersed in phosphate-buffered saline. The posterior end of the dissected gland was immediately transferred to RNAlater (Ambion) and stored at 4°C. The anterior end of the gland, the lumen of which contained silk proteins, was placed in LDS sample loading buffer (Invitrogen) including reductant (10% NuPage reducing agent, Invitrogen) and stored at −20°C for subsequent protein analysis.
Brood comb was washed extensively in warm water to remove water-soluble contaminants and then in three extensive chloroform washes to remove wax, producing raw silk. Further washing, analogous to the treatment used to remove sericins from B. mori silk cocoons, involved boiling the silk in aqueous 0.05% Na2CO3 for 30 min (Yamada et al. 2001
cDNA library construction
Silk and labial gland protein analysis
Alignment of the AmelFibroin sequences
AmelFibroin3 and AmelFibroin4 were aligned first, because MARCOIL gave only one heptad prediction for the proteins encoded by these sequences. AmelFibroin2 was then brought into the alignment, followed by AmelFibroin1. At each step we chose the highest scoring alignment corresponding to any of the three subset heptad translation predictions in those sequences (Supplemental Table III). AmelFibroin-rel could not be aligned using this method.
We thank the Baylor College of Medicine Human Genome Sequencing Center for making the Apis mellifera and Tribolium castaneum gene sequences publicly available before publication. We acknowledge the financial support of the Grains Research and Development Corporation. We also thank Stephen Trowell for advice on the manuscript, Dennis Anderson for the supply of bees and fascinating discussions on their biology, and John Trueman for invaluable advice and help with alignment analysis.
1 Corresponding author.
E-mail tara.sutherland{at}csiro.au; fax +61 2 6246 4000. [Supplemental material is available online at www.genome.org. The sequence data from this study have been submitted to the honey bee BeeBase Official_Gene_Set_1 under accession nos. GB17818, GB19585, GB12184, GB12348, and GB15233.] Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.5052606.
Akai, H., Imai, T., and Tsubouchi, K. 1987. Fine-structural changes of liquid silk in silk gland during the spinning stage of Bombyx larvae. J. Seric. Sci. Jpn. 56: 131–137.
Atkins, E.D.T. 1967. A four-strand coiled coil model for some insect fibrous proteins. J. Mol. Biol. 24: 139–141.
Bendtsen, J.D., Nielsen, H., von Heijne, G., and Brunak, S. 2004. Improved prediction of signal peptides: SignalP 3.0. J. Mol. Biol. 340: 783–795.
Castillo-Davis, C.I., Mekhedov, S.L., Hartl, D.L., Koonin, E.V., and Kondrashov, F.A. 2002. Selection for short introns in highly expressed genes. Nat. Genet. 31: 415–418.
Chenna, R., Sugawara, H., Koike, T., Lopez, R., Gibson, T.J., Higgins, D.G., and Thompson, J.D. 2003. Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res. 31: 3497–3500.
Compton, S.J. and Jones, C.G. 1985. Mechanism of dye response and interference in the Bradford protein assay. Anal. Biochem. 151: 369–374.
Delorenzi, M. and Speed, T. 2002. An HMM model for coiled coil domains and a comparison with PSSM-based predictions. Bioinformatics 18: 617–625.
Do, C.B., Mahabhashyam, M.S.P., Brudno, M., and Batzoglou, S. 2005. ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Res. 15: 330–340.
Edgar, R.C. 2004. MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32: 1792–1797.
Garel, A., Deleage, G., and Prudhomme, J.C. 1997. Structure and organization of the Bombyx mori Sericin 1 gene and of the Sericins 1 deduced from the sequence of the Ser 1B cDNA. Insect Biochem. Mol. Biol. 27: 469–477.
Hepburn, H.R. and Kurstjens, S.P. 1988. The combs of honeybees as composite materials. Apidologie 19: 25–36.
Jay, S.C. 1964. The cocoon of the honey bee, Apis mellifera L. Can. Entomol. 96: 784–792.
Julien, E., Coulon-Bublex, M., Garel, A., Royer, C., Chavancy, G., Prudhomme, J.C., and Couble, P. 2004. Silk gland development and regulation of silk protein genes. In Comprehensive molecular insect science (ed. L. Gilbert), Vol. 2, pp. 369–384. Pergamon Press, Oxford.
Kneller, D.G., Cohen, F.E., and Langridge, R. 1990. Improvements in protein secondary structure prediction by an enhanced neural network. J. Mol. Biol. 214: 171–182.
Krammerer, R.A. 1997.
Kyte, J. and Doolittle, R.F. 1982. A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157: 105–132.
Lee, C., Grasso, C., and Sharlow, M. 2002. Multiple sequence alignment using partial order graphs. Bioinformatics 18: 452–464.
Liu, J. and Lu, M. 2002. An alanine-zipper structure determined by long range intermolecular interactions. J. Biol. Chem. 277: 48708–48713.
Lucas, F. and Rudall, K.M. 1968. Extracellular fibrous proteins: The silks. In Comprehensive biochemistry (eds. M. Florkin and E.H. Stotz), Vol. 26B, pp. 475–558. Elsevier, Amsterdam.
Lupas, A.N. and Gruber, M. 2005. The structure of
McClelland, J.L. and Rumelhart, D.E. 1988. Explorations in parallel distributed processing. MIT Press, Cambridge, MA.
Notredame, C., Higgins, D., and Heringa, J. 2000. T-Coffee: A novel method for multiple sequence alignments. J. Mol. Biol. 302: 205–217.
Nunes, F.M.F., Valente, V., Sousa, J.F., Cunha, M.A.V., Pinheiro, D.G., Maia, R.M., Araujo, D.D., Costa, M.C.R., Martins, W.K., Carvalho, A.F., et al. 2004. The use of open reading frame ESTs (ORESTES) for analysis of the honey bee transcriptome. BMC Genomics 5: 84.
Rost, B. and Sander, C. 1993. Prediction of protein secondary structure at better than 70% accuracy. J. Mol. Biol. 232: 584–599.
Rudall, K.M. 1962. Silk and other cocoon proteins. In Comparative biochemistry (eds. M. Florkin and H.S. Mason), pp. 397–433. Academic Press, New York.
Silva-Zacarin, E.C., Silva de Moraes, R.L., and Taboga, S.R. 2003. Silk formation mechanisms in the larval salivary glands of Apis mellifera. J. Biosci. 28: 753–764.
Steinert, P.M., Chou, Y.H., Prahlad, V., Parry, D.A., Marekov, L.N., Wu, K.C., Jang, S.I., and Goldman, R.D. 1999. A high molecular weight intermediate filament-associated protein in BHK-21 cells is nestin, a type VI intermediate filament protein. Limited co-assembly in vitro to form heteropolymers with type III vimentin and type IV
Stenoien, H.K. and Stephan, W. 2005. Codon mRNA stability is not associated with levels of gene expression in Drosophila melanogaster but shows a negative correlation with codon bias. J. Mol. Evol. 61: 306–314.
Subramanian, A.R., Weyer-Menkhoff, J., Kaufmann, M., and Morgenstern, B. 2005. DIALIGN-T: An improved algorithm for segment-based multiple sequence alignment. BMC Bioinformatics 6: 66.
Swofford, D.L. 2002. PAUP*. Phylogenetic analysis using parsimony (*and other methods). Sinauer Associates, Sunderland, Massachusetts.
Takahashi, Y., Gehoh, M., and Yuzuriha, K. 1999. Structure refinement and diffuse streak scattering of silk (Bombyx mori). Int. J. Biol. Macromol. 24: 127–138.
Versteeg, R., van Schaik, B.D.C., van Batenburg, M.F., Roos, M., Monajemi, R., Caron, H., Harmen, J., Bussemaker, H.J., and van Kampen, A.H.C. 2003. The human transcriptome map reveals extremes in gene density, intron length, GC content, and repeat pattern for domains of highly and weakly expressed genes. Genome Res. 13: 1998–2004.
Wheeler, W.C., Gladstein, D.S., and De Laet, J. 2003. POY, Phylogeny reconstruction via direct optimization of DNA and other data. Version 3.0. http://research.amnh.org/scicomp/projects/poy.php.
Woolfson, D.N. 2005. The design of coiled coil structures and assemblies. In Fibrous proteins: Coiled coils, collagen and elastomers (eds. D.A.D. Parry and J.M. Squire), pp. 79–112. Elsevier Academic Press, San Diego, California.
Yamada, H., Nakao, H., Takasu, Y., and Tsubouchi, K. 2001. Preparation of undegraded native molecular fibroin solution from silkworm cocoons. Mater. Sci. Eng. C. 14: 41–46.
Zhou, C.Z., Confalonieri, F., Medina, N., Zivanovic, Y., Esnault, C., Yang, T., Jacquet, M., Janin, J., Duguet, M., and Perasso, R. 2000. Fine organization of Bombyx mori fibroin heavy chain gene. Nucleic Acids Res. 28: 2413–2419.
Received December 13, 2005; accepted in revised form March 23, 2006.