Vol. 9, Issue 11, 1026-1039, November 1999
The Family of Caenorhabditis elegans Tyrosine Kinase Receptors: Similarities and Differences with Mammalian Receptors
1 Laboratoire d'Oncologie Moléculaire, U.119 Institut National de la Santé et de la Recherche Médicale (INSERM), Marseille, France; 2 Institut Paoli-Calmettes, Marseille, France; 3 Laboratoire de Plasticité et Evolution du Génome, U.119 INSERM, Marseille, France
Transmembrane receptors with tyrosine kinase activity (RTKs) constitute a superfamily of proteins that is present in all metazoans and is involved in the control and regulation of cellular processes. They have been the focus of numerous studies and are a good subject for comparative analyses of multigene families in different species aimed at understanding metazoan evolution. The genome sequence of the nematode worm Caenorhabditis elegans is now available. This offers a good opportunity to study the superfamily of nematode RTKs in its entirety and to compare it with its mammalian counterpart. We show that the C. elegans RTKs constitute various groups with different phylogenetic relationships to mammalian RTKs. A group of four RTKs shows structural similarity to the three mammalian receptors for the vascular endothelial growth factors. Another group comprises RTKs with a short extracellular region, a feature not known in mammals. The genes encoding these RTKs are clustered on chromosome II with other gene families, including genes encoding chitinase-like proteins. Most of the C. elegans RTKs have no direct orthologous relationship with any mammalian RTK, which illustrates the importance of the separate evolution of the different phyla. [The sequences in this paper have been submitted to GenBank under the following accession numbers: AF188748, AF188749, AF188750, and AF188751.]
Understanding the processes that have governed metazoan
genome evolution is an important issue in modern
biology, one largely nourished by the wealth of data generated by
genome projects devoted to the analysis of selected key species.
Comparison of the data obtained in several species has so far led to
the general belief that the number of genes per genome doubled
after the separation of Protostomia and Deuterostomia and increased
three- or fourfold in the chordate phylum after the separation from
echinoderms. Such an increase in gene number could be due to
tetraploidization events, as first proposed by Ohno (1970).
Investigating various species of triploblastic metazoans will provide
information about their common or specific evolution. In particular,
the availability of the genome sequence of the nematode worm Caenorhabditis elegans
allows, for the first time, the closest possible look at
a present-day metazoan genome and its putative transcriptome and proteome
(The C. elegans Sequencing Consortium 1998).
We are particularly interested in examining multigene families in
C. elegans and comparing them with their mammalian homologs. In
this paper we have studied the tyrosine kinase receptors (RTKs). RTKs
are type I transmembrane proteins involved in the control and
regulation of several key cellular processes during development and in
adult life. They consist of an extracellular ligand-binding
region of variable size made of various domains, a hydrophobic
transmembrane domain, and an intracellular region bearing a kinase domain
(TK), which is sometimes split into two parts (TK1 and TK2). Based on
their structure and ligand-binding specificities, several classes of
RTKs have been distinguished in mammals. In three classes, the
extracellular region is made solely of immunoglobulin-like domains. To
date, two groups of RTKs have been described in C. elegans:
orthologs of RTKs known in mammals (e.g., DAF-2, EGL-15, KIN-8, LET-23,
and VAB-1) (Table 1) and RTKs identified only in
C. elegans (e.g., KIN-15 and KIN-16) (Morgan and Greenwald
1993).
RTKs are an excellent model for studying the evolution of cellular
processes in metazoans: first, they form large families of modular
proteins; second, they are major components of the intercellular
regulatory pathways that are specific to metazoans; and third, they
belong to the protein kinases, the most widespread contingency-generating
agents of eukaryotic cells (Kirschner and Gerhart
1998).
We describe here the organization of 21 new C. elegans genes that encode putative RTKs phylogenetically related to various mammalian RTK classes.
Identification of RTK Genes by Exploration and Annotation of the C. elegans Sequence
The availability of the complete genomic sequence of the nematode C. elegans allowed the identification of new putative RTK genes. We used two searches: a keyword search of the Wormpep database, and a TBLASTN search of the genomic sequence. For the TBLASTN search, the queries were either the entire sequence or the tyrosine kinase domain sequence of receptors belonging to 13 classes of RTK described in mammals (see Table 1). The protein sequences indexed in the databases as containing a tyrosine kinase domain were checked for the presence of this domain by SMART search. Only sequences with a canonical tyrosine kinase domain were retained; domains of the serine/threonine type or of undetermined specificity were excluded. Before further analysis, we annotated the available sequences and corrected them where necessary (see Methods). Figures 1 and 2 show the distribution in the C. elegans genome of the genes cited in the text. A total of 28 protein sequences, including the corrected T17A3.1, T17A3.8, W04G5.6A, W04G5.6B, and R09D1.12 sequences (see Methods), were thus available for subsequent analysis (Tables 1 and 2).
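The screening steps described in this section and in Methods amount to a chain of filters applied to each recovered sequence. The following is a minimal Python sketch of that logic. It assumes each candidate has already been annotated by the domain and localization tools. The `Candidate` record, its field names, and the example entries are hypothetical and do not reproduce the actual output formats of Pfam, SMART, or PSORT II.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical summary of one recovered sequence; the fields stand in for
    # the results of the Pfam/SMART and PSORT II analyses described in Methods.
    name: str
    length_aa: int
    kinase_specificity: str  # "tyrosine", "serine/threonine", or "undetermined"
    has_atp_site: bool       # consensus ATP-binding site present
    is_transmembrane: bool   # predicted receptor-type (membrane) protein


MIN_LENGTH_AA = 400  # only sequences longer than 400 amino acids were analyzed


def is_putative_rtk(c: Candidate) -> bool:
    """Return True if a candidate passes every screening criterion."""
    return (
        c.length_aa > MIN_LENGTH_AA
        and c.kinase_specificity == "tyrosine"  # canonical tyrosine kinase domain only
        and c.has_atp_site
        and c.is_transmembrane                  # drop nonreceptor tyrosine kinases
    )


def screen(candidates):
    """Return the names of candidates that pass the screen, in input order."""
    return [c.name for c in candidates if is_putative_rtk(c)]


# Illustrative, made-up entries (not real C. elegans predictions).
examples = [
    Candidate("cand_rtk", 1100, "tyrosine", True, True),
    Candidate("cand_short", 350, "tyrosine", True, True),
    Candidate("cand_ser_thr", 900, "serine/threonine", True, True),
    Candidate("cand_no_atp", 800, "tyrosine", False, True),
    Candidate("cand_cytoplasmic", 1000, "tyrosine", True, False),
]
print(screen(examples))  # ['cand_rtk']
```

Each criterion is written as a separate condition. This makes it easy to see which test would remove a given sequence.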
Phylogenetic Analysis of RTKs in C. elegans
A phylogenetic tree based on the alignment of the tyrosine kinase
domain of RTKs was constructed; the sequences used were those
identified by the C. elegans Sequencing Consortium (1998). As seen in Figure 3, DAF-2, EGL-15, LET-23, and VAB-1 had the expected phylogenetic relationships with their known mammalian orthologs: the insulin and IGF1 receptors, the fibroblast growth factor receptors, the receptors of the EGF receptor/ERBB family, and the ephrin receptors, respectively (Table 1). KIN-8 and the F11D5.3 putative protein shared a common ancestor with the mammalian ROR, TRK, MUSK, and discoidin receptor families. F11E6.8 shared a common ancestor with the MET/RON receptor family. The mammalian RTKs from classes III, IV, and V (Ig RTKs) grouped together and with class IX and X RTKs.
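The tree in Figure 3 was inferred by neighbor joining (Saitou and Nei 1987; see Methods). The Python sketch below shows the core loop of that method on a small toy distance matrix, for readers unfamiliar with it. It assumes that pairwise distances have already been computed from the aligned kinase domains. It returns an unrooted tree in Newick format and leaves out several steps of the actual analysis: the choice of distance measure, rooting with an outgroup (human ABL1 in this study), and bootstrapping. The function name and the toy matrix are illustrative, not taken from this study.

```python
from itertools import combinations


def neighbor_joining(labels, matrix):
    """Build an unrooted tree (Newick string) from a symmetric distance matrix."""
    # Distances keyed by cluster label; each joined cluster is labeled by its Newick text.
    d = {a: {b: float(matrix[i][j]) for j, b in enumerate(labels)}
         for i, a in enumerate(labels)}
    active = list(labels)
    while len(active) > 2:
        n = len(active)
        # Net divergence r(i): summed distance from i to all other active clusters.
        r = {a: sum(d[a][b] for b in active if b != a) for a in active}
        # Join the pair minimizing the Q criterion (n - 2) d(i,j) - r(i) - r(j).
        i, j = min(combinations(active, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from the new internal node to i and to j.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"({i}:{li:g},{j}:{lj:g})"
        # Distances from the new node to every remaining cluster.
        d[u] = {}
        for k in active:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        active = [k for k in active if k not in (i, j)] + [u]
    a, b = active
    # The last two clusters are connected by a single edge.
    return f"({a},{b}:{d[a][b]:g});"


labels = ["a", "b", "c", "d", "e"]
toy = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
tree = neighbor_joining(labels, toy)
print(tree)  # ((c:4,(a:2,b:3):3),(d:2,e:1):2);
```

The toy matrix is additive, so the branch lengths in the output reproduce every input distance exactly. Real sequence distances are only approximately additive.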
Twenty-one known or predicted C. elegans RTKs did not share a
recent common ancestor with mammalian RTKs, and some of them formed
distinct phylogenetic families (Fig. 3, shaded boxes). A grouping of 13 RTKs was
supported by a bootstrap value of >900/1000. These RTKs are designated LERF and SERF in Figure 3, for the long and short extracellular region families, respectively, and are described below. Among these 13 RTKs
were the KIN-15 and KIN-16 kinases (Morgan and Greenwald 1993).
Structural Features and Organization of C. elegans RTKs
Based on structural features, the 28 RTKs encoded by these known or predicted sequences could be divided into two groups. In the first group, we classified the seven known or potential orthologs of mammalian RTKs (DAF-2, EGL-15, LET-23, VAB-1, KIN-8, F11D5.3, and F11E6.8). Based on a similar domain architecture and on the phylogenetic analysis, F11D5.3 (Table 1) may be considered the potential ortholog of the collagen receptors DDR and TKT. These receptors, also called discoidin-type RTKs, have factor V-factor VIII domains in their extracellular region. The sequence of the tyrosine kinase domain of F11E6.8 (Table 1) is very similar to that of class VI RTKs; however, the predicted protein is very short, with an extracellular region of only 40 amino acids that lacks a signal peptide. A search of the genomic sequence did not reveal any similarity to the extracellular region of the MET family of RTKs. We considered this putative protein to be an incorrect GENEFINDER prediction and a potential ortholog of class VI RTKs. No significant match and no modular architecture corresponding to RTKs from the other classes were found (Table 1).
Twenty-one predicted or characterized proteins with no extensive
similarity to a particular class of mammalian RTKs constitute the
second group. When their sequences and domain compositions were compared,
three different types could be identified (Table 2; Fig. 3).
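The >900/1000 support cited for the LERF/SERF grouping is a bootstrap proportion. Such a proportion is obtained in three steps: the alignment is resampled column by column with replacement, a tree is rebuilt from each pseudo-replicate, and the grouping is counted each time it reappears. The following is a minimal Python sketch of the resampling step only. It assumes the alignment is held as a dict of equal-length gapped strings. Tree building and clade counting are left out, and the function names and toy sequences are hypothetical.

```python
import random


def bootstrap_replicates(alignment, n_replicates, seed=0):
    """Yield pseudo-alignments made by sampling alignment columns with replacement.

    The same sampled columns are used for every sequence, so sites stay aligned.
    """
    rng = random.Random(seed)
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    for _ in range(n_replicates):
        columns = [rng.randrange(length) for _ in range(length)]
        yield {name: "".join(alignment[name][c] for c in columns) for name in names}


def support(times_clade_recovered, n_replicates):
    """Bootstrap support as a fraction, e.g. 900 of 1000 replicates gives 0.9."""
    return times_clade_recovered / n_replicates


# Made-up aligned fragments, for illustration only.
toy_alignment = {"seq1": "MKVLA-GT", "seq2": "MKILA-GT", "seq3": "MRVLSQGT"}
replicates = list(bootstrap_replicates(toy_alignment, n_replicates=3))
for rep in replicates:
    print(rep)
print(support(900, 1000))  # 0.9
```

Whole columns are resampled rather than individual residues. This keeps every pseudo-replicate a valid alignment, which is what the tree-building step requires.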
C. elegans Putative RTKs with LERF Resemble Mammalian VEGFRs
The sequences of the extracellular region of the proteins with a
LERF were aligned with those of the different classes of RTK containing five or
seven immunoglobulin-like domains. The highest percentage of identity
was obtained in the alignment with the extracellular regions of class V
RTKs. These Ig RTKs are the three VEGFRs (Mustonen and Alitalo 1995).
RTKs are characterized by the presence of a highly hydrophobic
transmembrane domain made of an [...]
Genomic Distribution of the Genes Encoding the New Predicted RTKs
The four genes encoding the LERF RTKs are located in adjacent pairs on chromosome III and on chromosome X (Fig. 2B, D). T17A3.1 and T17A3.8, the two genes on chromosome III (Fig. 2B), are oriented in opposite directions, 6.5 kb apart. On chromosome X, F59F3.1 and F59F3.5 are in the same transcriptional orientation; GENEFINDER predicts no gene in the 4.5-kb region separating them (Fig. 2D). The genes encoding the SERF RTKs are mostly found on chromosome II, in two clusters separated by 0.5 Mb (Fig. 2A). One cluster maps at approximate genetic position 0.99 and covers the M176 and R09D1 cosmids. It includes the kin-15, kin-16, R09D1.13, and R09D1.12 genes. The second cluster maps at genetic position 1.47 and covers the ZK938 and C08H9 cosmids. It includes the ZK938.5, C08H9.5, and C08H9.8 genes. Three genes of this type are not located on chromosome II: W04G5.6A and W04G5.6B, on chromosome I (Fig. 1), and M01B2.1, on chromosome V (Fig. 2C). A major feature of the RTK genes located in the two clusters on chromosome II is their tandem organization. In the first cluster, the kin-15 and kin-16 coding sequences are separated by 529 bp and are transcribed in the same direction (Fig. 2A). Another pair of genes, R09D1.13 and R09D1.12, are also transcribed in the same direction but are 7.8 kb apart. In the second cluster, the ZK938.5 and C08H9.8 RTK genes are also organized in tandem. Similarly, on chromosome I, the W04G5.6A and W04G5.6B genes are in the same transcriptional orientation and separated by only 0.7 kb.
Most of the C. elegans RTK Sequences Are Transcribed
To estimate how many of the predicted RTK genes correspond to expressed genes, we looked for ESTs in the DNA Data Bank of Japan. Seventeen genes have corresponding ESTs or complete cDNAs in this database (Table 2).
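The tandem and opposite arrangements described above depend on the strands of neighboring genes. The intergenic distances (such as the 529 bp between kin-15 and kin-16) depend on their coordinates. The following is a minimal Python sketch that derives both. It assumes 1-based inclusive coordinates, with the first gene lying to the left of the second. The `Gene` record and the coordinates in the example are hypothetical, not taken from the C. elegans annotation.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int   # leftmost base on the chromosome (1-based, inclusive)
    end: int     # rightmost base (inclusive)
    strand: str  # "+" or "-"


def classify_pair(left: Gene, right: Gene):
    """Return (arrangement, intergenic distance in bp) for two neighboring genes."""
    if left.end >= right.start:
        raise ValueError("genes overlap or are not ordered left to right")
    gap = right.start - left.end - 1  # bases strictly between the two genes
    if left.strand == right.strand:
        arrangement = "tandem"        # same transcriptional orientation
    elif left.strand == "-":
        arrangement = "divergent"     # 5' ends face each other (<- ->)
    else:
        arrangement = "convergent"    # 3' ends face each other (-> <-)
    return arrangement, gap


# Hypothetical genes, for illustration only.
g1 = Gene("gene1", 1_000, 4_000, "+")
g2 = Gene("gene2", 4_500, 8_000, "+")
g3 = Gene("gene3", 20_000, 23_000, "-")
print(classify_pair(g1, g2))  # ('tandem', 499)
print(classify_pair(g2, g3))  # ('convergent', 11999)
```

The text does not say whether the oppositely oriented T17A3.1/T17A3.8 pair is divergent or convergent. Distinguishing the two cases therefore requires gene coordinates and strands.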
We used reverse transcription-polymerase chain reaction (RT-PCR) to look for expression of the SERF and LERF genes that lack a corresponding EST or cDNA in the database. ESTs already exist for the LERF gene F59F3.1. In addition, we found that two other LERF genes (F59F3.5 and T17A3.1) and one SERF gene (R09D1.12) are expressed. Thus, at least 20 of the 28 putative RTK-encoding genes are transcribed.
Other Multigenic Families Are Located on Chromosome II
The two clusters on chromosome II containing genes predicted to
encode RTKs also contain a large number of genes encoding chitinases
(on cosmids M176, R09D1, and T19H5). The most prominent example of repeated duplication of transcribed
sequences in this region is the chitinase family. Of the 40 chitinase or predicted chitinase-like genes (Table
3), 25 are located on chromosome II along with SERF
genes. Three other chitinase-like genes, T10D4.3, F15A4.8, and T13H5.3, map to chromosome II but far from the tyrosine
kinase-chitinase-like clusters, at position [...]
Based on both amino acid sequence differences and three-dimensional
structure (Henrissat 1990) [...]
Identification of Novel RTKs in C. elegans
A combined keyword and alignment (BLAST) search of the C. elegans genome allowed the identification of 28 putative protein-coding sequences with similarity to RTKs. Seven of these sequences were known RTKs (DAF-2, EGL-15, KIN-8, KIN-15, KIN-16, LET-23, and VAB-1) and 21 were new putative RTKs. Among the 28 RTKs, a first group of seven proteins is represented by the C. elegans orthologs of mammalian RTKs (Table 1). Two protein predictions identified in this study, F11D5.3 and F11E6.8, represent candidate orthologs of mammalian discoidin receptors and of receptors of the MET family, respectively. Among a second group of 21 putative proteins, eight are very
heterogeneous in their extracellular region; they lack similarity to
any extracellular domain described to date or contain domains never
described in mammalian RTKs, such as an LDL-a domain (Table 2). A
subgroup of 13 putative RTKs was subdivided into two types according to
the structure of their non-tyrosine kinase portion: nine molecules
containing a SERF were of the first type, and four putative RTKs
containing a LERF were of the second type. Together they constitute a
separate subgroup that does not share a recent common ancestor with any
particular mammalian RTK class. We cannot exclude that this is due
to rapid evolution of the genes, as is thought to be the case for
two-thirds of protein-coding genes in C. elegans (Mushegian et
al. 1998).
SERF RTKs have only 20-80 amino acids in their extracellular region.
The previously described (Morgan and Greenwald 1993) [...]
None of the putative LERF proteins had been described previously in C. elegans. They resemble the three mammalian VEGFRs. Like the VEGFRs, they contain the following regions:
- an extracellular region composed of seven type C Ig domains, the fourth being a pseudodomain because it lacks the typical cysteine residues;
- a transmembrane domain;
- an intracellular region composed of two tyrosine kinase subdomains separated by a kinase insert.
The phylogenetic tree built on the alignment of the tyrosine kinase domains suggests that the two RTK types described here (LERF and SERF) evolved in close relation (Fig. 3). The SERF RTKs may have arisen through the loss of most of the extracellular region from an ancestor with seven immunoglobulin-like domains. Further local duplications in cis or trans would then have expanded this family to nine members. The phylogenetic analysis shows short branches and a high bootstrap value, which suggests that some of these duplications could be recent (C08H9.5 and ZK938.5 genes; see Fig. 3).
RTKs Resembling Mammalian VEGFRs Exist in C. elegans
Up to now, VEGFRs have only been described in vertebrates. Such molecules have not yet been recognized in Mollusca or Arthropoda; however, no genome from these phyla has yet been completely sequenced. In mammals, the three VEGFRs described to date participate in the
development and architectural organization of both the circulatory and
blood systems during embryogenesis and are involved in the regulation
of cell permeability (Mustonen and Alitalo 1995).
Possible Scheme of Evolution of RTKs in the C. elegans Lineage
Because of the remarkable conservation of domain architecture
between C. elegans LERF RTKs and mammalian VEGFRs, convergent evolution is an unlikely explanation for the existence of VEGFR-like RTKs in C. elegans, although it cannot be ruled out. Based on the phylogenetic
analysis and chromosomal localization, we propose that the four LERF
RTKs derive from a common ancestor through a series of cis,
then trans, and finally cis duplications. They do not
group with the mammalian VEGFRs, which favors an independent
expansion of these families after the separation from the last common
ancestor of Protostomia and Deuterostomia. The egl-15 gene,
which encodes an FGFR, is located, like F59F3.1 and F59F3.5, on an
overlapping cosmid on chromosome X, at position 15.79 (Fig. 2D). This
suggests that the original linkage group that contained duplicated Ig
RTK genes may have been maintained in the nematode lineage. In mammals, by contrast,
this linkage group has been disrupted: genes encoding class IV RTKs, on the one
hand, and class III and VI RTKs, on the other, are separated (Rosnet and Birnbaum 1993).
The nine genes coding for the SERF RTKs are clustered on chromosomes I, II (in close proximity to chitinase-like genes), and V. From the relationships of these genes and their genome distribution, we propose an evolutionary scenario involving both large-scale and local duplications. We hypothesize that a pair of tyrosine kinase-chitinase genes was present in an "ancestral" linkage group that was the precursor of a tyrosine kinase-chitinase-like cluster located on chromosome II. A second precursor cluster was generated in close proximity to the first. Successive local duplications in cis allowed the expansion of tyrosine kinase and chitinase-like genes. In this scenario, each cluster on chromosome II evolved separately, through an early duplication of an ancestral linkage group followed by local amplification of genes; the topology of the chitinase-like and tyrosine kinase gene trees supports this view. The phylogenetic analysis of the chitinase-like protein sequences revealed that the molecules located in each of the two chromosome II clusters form a separate branch. Both branches group separately from the other chitinase genes of different species. Among the chitinase-like genes located on the other chromosomes, only M01B2.6 branches with the genes of one chromosome II cluster. M01B2.6 lies on the same cosmid as the M01B2.1 SERF RTK gene, which supports the hypothesis of a partial trans duplication. For chitinase-like proteins, as for kinases, some duplication events
could be recent, suggesting that the concerted expansion process may be
ongoing. The presence of genes encoding olfactory receptors in the same
clusters is interesting. Robertson (1998) [...]
Chitinases are glycosyl hydrolases that catalyze the degradation of chitin, a polymer of N-acetylglucosamine. In plants and insects, they are an important inducible host defense mechanism against fungi. Chitinase-like proteins from C. elegans could likewise be involved in defense against parasites. In that case, the expansion of the chitinase family would depend on positive selection, and passive "hitchhiking" could explain the concomitant expansion of the RTK genes. However, some of the predicted chitinase-like sequences encode proteins that lack the chitin-binding domain or the catalytic domain, and their role remains unknown.
A large region containing multiple copies of RcC9, RcD1, Rc35, and
Rc123 DNA repetitive elements is located between the two clusters of
tyrosine kinase-chitinase genes on chromosome II. These elements,
whose internal sequence organization resembles that of
minisatellite sequences described in mammals, tend to
cluster at certain positions on the chromosomes of C. elegans (Naclerio et al. 1992).
Conclusions
The identification and comparison of orthologs of human proteins in the nematode may yield interesting clues about the respective functions of these proteins in the two species. Furthermore, the particular organization of some gene families in C. elegans, such as the kinase and chitinase genes, may shed light on the mechanisms that have driven their rapid evolution by duplication and, more generally, on mechanisms that influence genome evolution. These mechanisms may be linked to positive selection, if the resulting proteins are involved in defense. Alternatively, they may be linked to the instability of a genomic region caused by repetitive elements. Both mechanisms may have acted together, with evolutionary pressure favoring expansion over contraction of the region. In metazoan evolution, each lineage, having evolved independently over
long periods of time, shows some degree of specificity with regard to
genome organization (Ruddle 1997).
Database Searches
To identify putative RTKs in C. elegans, two approaches were used. First, we searched the latest release (no. 16) of the Wormpep database with different combinations of the keywords "elegans," "tyrosine," "kinase," and "receptor," using the Entrez search tool (http://www.ncbi.nlm.nih.gov/Entrez/). We thus identified the GENEFINDER predictions indexed as putative RTKs or only as tyrosine kinases. Second, we ran a C. elegans-specific TBLASTN search (http://www.sanger.ac.uk/Projects/C_elegans/blast_server.shtml) with tyrosine kinase domain sequences as queries. This identified some genomic regions that code for such a domain but had been missed by GENEFINDER. Sequences longer than 400 amino acids recovered by the two approaches were further analyzed for motif or domain composition using Pfam (http://www.sanger.ac.uk/Software/Pfam/) and SMART (http://coot.embl-heidelberg.de/SMART/). This eliminated some serine/threonine kinases and proteins of undetermined specificity that had been recovered in the initial searches. Then, to eliminate potential nonreceptor tyrosine kinases, the sequences were analyzed with the PSORT II program (http://psort.nibb.ac.jp). Two sequences (C08H9.8 and T01G5.1) lacking a consensus ATP-binding site were discarded. For the C24G6.2 gene, there are two alternative predictions of putative transcripts. C24G6.2A codes for a protein with a large extracellular region containing fibronectin type III domains. C24G6.2B codes for a protein with a short extracellular region lacking any described domain. Information on C. elegans cosmids, genes, and gene products (Wormpep database) is available through the ACeDB database (http://www.sanger.ac.uk/Projects/C_elegans/webace_front_end.shtml). Searches for expressed sequences were done in the DNA Data Bank of Japan, maintained by Yuji Kohara (http://www.ddbj.nig.ac.jp/).
Sequence Analysis and Alignment
Sequence similarity searches were done using the BLASTP and TBLASTN algorithms (Altschul et al. 1990).
Annotations
Annotation 1. The T17A3.1 predicted protein contains a truncated tyrosine kinase domain (TK1 only). We identified a region with significant similarity to both the kinase insert (KI) and the TK2 domain, predicted by GENEFINDER as the F40G9.13 gene. This region lies downstream of the putative stop codon of the predicted T17A3.1, in the neighboring cosmid F40G9 (see Fig. 2B). We considered T17A3.1 an erroneous prediction and corrected the protein sequence by adding the missing KI and TK2 region identified on cosmid F40G9. A short region with weak similarity to the extracellular region of F59F3.1 was also identified on cosmid F40G9, but in the opposite orientation to the T17A3.1 transcript. We considered this region, which lacks a GENEFINDER prediction, to be a pseudogene.
Annotation 2. In the region of the T17A3.8 transcript, we identified two short sequences, not predicted by GENEFINDER, that are highly similar to the sequence encoding the putative extracellular region of F59F3.1. We corrected the deduced protein sequence of T17A3.8 by including these two sequences. A BLAST search of the genomic sequences also revealed an exon, not predicted by GENEFINDER, that codes for a region downstream of the ATP-binding site. On the same cosmid, the gene prediction T17A3.10 is oriented in the same direction as T17A3.8 (see Fig. 2B). It is similar to the region between amino acid positions 450 and 850 of the putative RTKs F59F3.1 and F59F3.5. We considered it a probable pseudogene because we were unable to identify upstream or downstream sequences with similarity to the rest of an RTK molecule.
Annotation 3. When we used the Pfam domain recognition tool, we found that one of the genes, namely W04G5.6, contained sequences encoding two tyrosine kinase domains that, when analyzed by TBLASTN, turned out to be very similar.
We considered these two regions to belong to two different transcripts, transcribed in the same direction, that GENEFINDER had predicted as a single gene. They are hereafter referred to as W04G5.6A and W04G5.6B (see Fig. 1).
Annotation 4. Sequence analysis of the cDNA fragment amplified by RT-PCR with the R09D1.12 primer pair showed that exon four of the R09D1.12 gene is 75 bp longer at its 3' end than predicted. We corrected the protein sequence by adding the corresponding 25 amino acids in the tyrosine kinase domain and used it for the phylogenetic analysis.
Phylogenetic Analyses
Phylogenetic trees were inferred using neighbor-joining algorithms (Saitou and Nei 1987).
For the phylogenetic analysis of RTKs, we used only the tyrosine kinase domain of the proteins. Because the length of the kinase insert region varies between proteins, kinase inserts were removed, as were large gaps. The tyrosine kinase domain was defined as follows. For vertebrate tyrosine kinases, the sequences used were those defined by Pfam. For C. elegans kinases, the domain was identified by SMART and aligned with the existing Pfam alignment of the vertebrate sequences. This alignment can be found on our Web site (http://olan.marseille.inserm.fr/u119/home.html). The tree was rooted using the kinase domain of human ABL1, a cytoplasmic tyrosine kinase. For the phylogenetic analysis of chitinase-like proteins, we used the chitinase domains of well-characterized chitinases and of chitinase-like proteins from different species of mammals, insects, and worms. A plant chitinase was used to root the tree.
Expression Analyses
RT reactions were performed on 2 µg of total RNA extracted from mixed-stage worm preparations by lithium chloride precipitation (MacLeod et al. 1981). The following conditions were used for PCR amplification:
- initial denaturation at 95°C for 5 min;
- 35 cycles of 30-sec denaturation at 95°C, 30-sec annealing at 55°C (for R09D1.12 and T17A3.1) or 52°C (for F59F3.5), and 1-min extension at 72°C;
- a final extension at 72°C for 10 min.
The fragments were gel-purified, cloned using the pGEM-T Easy kit (Promega), and sequenced at Génome Express (Grenoble, France) on an automated sequencer (Applied Biosystems 373).
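The PCR program above can be written as data. This makes the annealing temperature for each primer pair explicit and gives the total time spent at the programmed temperatures: 5 min + 35 × (30 s + 30 s + 60 s) + 10 min = 85 min, excluding ramping between temperatures. The following is a bookkeeping sketch in Python. The `Step` record and the helper names are hypothetical; only the temperatures, times, and cycle number come from the text.

```python
from dataclasses import dataclass


@dataclass
class Step:
    label: str
    temp_c: float
    seconds: int


# Annealing temperature for each primer pair, as given in the text.
ANNEALING_C = {"R09D1.12": 55, "T17A3.1": 55, "F59F3.5": 52}
N_CYCLES = 35


def pcr_program(gene):
    """Return (initial steps, cycled steps, number of cycles, final steps) for a primer pair."""
    initial = [Step("initial denaturation", 95, 5 * 60)]
    cycle = [
        Step("denaturation", 95, 30),
        Step("annealing", ANNEALING_C[gene], 30),
        Step("extension", 72, 60),
    ]
    final = [Step("final extension", 72, 10 * 60)]
    return initial, cycle, N_CYCLES, final


def hold_time_seconds(initial, cycle, n_cycles, final):
    """Total time spent at the programmed temperatures (ramping excluded)."""
    return (sum(s.seconds for s in initial)
            + n_cycles * sum(s.seconds for s in cycle)
            + sum(s.seconds for s in final))


for gene in ANNEALING_C:
    program = pcr_program(gene)
    annealing_c = program[1][1].temp_c
    minutes = hold_time_seconds(*program) / 60
    print(gene, annealing_c, minutes)
```

Only the annealing step differs between primer pairs, so the three programs have the same total duration.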
This work has been supported by INSERM and Institut Paoli-Calmettes. We thank J. Ewbank, D. Maraninchi, C. Mawas, N. Pujol, and M.J. Santoni for helpful discussions and encouragement. The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
The KIN-8 receptor tyrosine kinase has been renamed CAM-1 (Forrester et al. 1999. Nature 400: 881-885).
4 Corresponding author.
E-MAIL birnbaum{at}marseille.inserm.fr; FAX 33 4 91 26 03 64.
Received May 4, 1999; accepted in revised form August 17, 1999. 9:1026-1039 ©1999 by Cold Spring Harbor Laboratory Press ISSN 1088-9051/99 $5.00