Vol. 11, Issue 6, 1126-1142, June 2001
RESOURCES
ABSTRACT
Recent genetic analyses in worms, flies, and mammals illustrate the importance of bioactive peptides in controlling numerous complex behaviors, such as feeding and circadian locomotion. To pursue a comprehensive genetic analysis of bioactive peptide signaling, we have scanned the recently completed Drosophila genome sequence for G protein-coupled receptors sensitive to bioactive peptides (peptide GPCRs). Here we describe 44 genes that represent the vast majority, and perhaps all, of the peptide GPCRs encoded in the fly genome. We also scanned for genes encoding potential ligands and describe 22 bioactive peptide precursors. At least 32 Drosophila peptide receptors appear to have evolved from common ancestors of 15 monophyletic vertebrate GPCR subgroups (e.g., the ancestral gastrin/cholecystokinin receptor). Six pairs of receptors are paralogs, representing recent gene duplications. Together, these findings shed light on the evolutionary history of peptide GPCRs, and they provide a template for physiological and genetic analyses of peptide signaling in Drosophila.
INTRODUCTION
The recent publication of entire genomes for the worm, the fly, and
the human has initiated the era of functional
genomic analysis. Experience to date indicates
that such analysis involves multiple stages, in which
improvements are recorded as the databases are completed and analytic
programs become more precise (Reese et al. 2000), and as more
comparative information is made available (Sonnhammer et al. 1997). G
protein-coupled receptors (GPCRs) provide sensitivity to a variety of
environmental, developmental, and physiological signals. They display a
uniform topology with seven transmembrane (TM) domains and represent
one of the largest recognizable groups of proteins (Bockaert and Pin 1999).
Here we have organized all genomic sequences that encode Drosophila GPCRs to identify and classify those devoted to
peptide hormone and neuropeptide ligands (peptide GPCRs).
Given the availability of the human and mouse genomic sequences, what
can we gain by a thorough analysis of Drosophila peptide GPCRs? We propose two reasons to motivate such efforts. First, our
understanding of GPCR signaling mechanisms appears incomplete. Recent
advances have indicated means by which GPCR signaling potential may be
increased, including receptor oligomerization and association with a
variety of accessory proteins (Bockaert and Pin 1999), and receptor
translocation to the nucleus (Chen et al. 2000). There is great need,
therefore, to address new hypotheses of GPCR signaling mechanisms in
vivo. For this purpose, it will be very helpful to use the powerful
tools for genetic analysis that are afforded by model organisms such as
Drosophila. The second reason we favor the pursuit of
Drosophila GPCRs invokes the success of genetic analysis in
another model genetic system, Caenorhabditis elegans. In the
past few years, rapid progress has been made in the analysis of insulin
signaling in the worm; C. elegans insulin regulates
metabolism, development, and longevity by mechanisms that are similar
to the endocrine regulation of metabolism and fertility by mammalian
insulin (Kimura et al. 1997; Tissenbaum and Ruvkun 1998). In addition,
the genetic analysis has extended our understanding of insulin
signaling by revealing novel molecular features that may be variant in
diabetic pedigrees (Ogg et al. 1997; Ogg and Ruvkun 1998). Although
insulin binds to a different class of receptors, it is likely that the
same rapid development of new information will accompany the genetic
analysis of peptide GPCRs in Drosophila.
In a recent review, Brody and Cravchik (2000) began the process of
categorizing Drosophila GPCRs by describing ~100 genes, including 21 receptors for classical neurotransmitters and
neuromodulators (biogenic amines, related compounds, and purines) and
26-30 peptide receptor genes. We have extended that analysis by
re-searching the original genomic sequences for peptide receptors (we
found one additional GPCR) and by improving the annotations of 20 previously predicted genes. We classified receptor genes according to
phylogenetic trees constructed with the aid of the Pfam 7 TM databases
(Bateman et al. 2000). In addition, we refined the Drosophila
GPCR classifications by incorporating information we deduced by
examining gene organizations. Through this analysis, we expanded the
set of known and candidate peptide GPCRs from ~30 to ~45.
Finally, to gain a sense of the potential peptide ligands, we assembled
a list of 22 Drosophila genes known or predicted to encode
bioactive peptides that may activate these receptors. Together, these
results shed light on the evolutionary history of neuropeptide
signaling. They also are intended to aid in future efforts to analyze
peptide receptor function in development, physiology, and behavior by
using the power of Drosophila genetics.
RESULTS AND DISCUSSION
We searched the Drosophila melanogaster genome sequence
with the goal of identifying all peptide GPCRs. Initially, we scanned the gene annotations developed jointly by the Berkeley
Drosophila Genome Project (BDGP) and Celera Genomics for all
putative GPCRs (Adams et al. 2000; Brody and Cravchik 2000). Based on
BLASTP scores obtained with each sequence, we excluded
entries that are likely to represent nonpeptide GPCRs (neurexins, HE6-
and methuselah-related proteins, rhodopsins, developmental genes, taste
and odorant receptors, and receptors for biogenic amines and other
small neurotransmitters). The remaining set of 44 known or putative
peptide receptors and unclassified GPCRs was retained for further
analysis (Table 1).
To gauge the completeness of this set, we scanned the
Drosophila genome for additional, nonannotated peptide GPCRs
in three ways. First, we scanned a set of annotations of GPCR genes
obtained through a GENSCAN search of the entire
Drosophila genome (K. Scott and L. Vosshall, pers. comm.).
This list contained one receptor sequence, BG:BACR48G21.1
(BACR48G21.1), which had not been annotated previously. Second, we
performed BLASTP searches using a "GPCR query set,"
which included the previously cloned or annotated peptide, amine, and
related/unclassified Drosophila GPCRs as well as ~200
sequences representing a diverse set of Family A GPCRs (unless
indicated, we used the nomenclature for GPCR Families and Groups given
by Kolakowski [1994] [see http://www.gcrdb.uthscsa.edu/]). All of
the putative peptide receptors listed in Table 1 (including BACR48G21.1) were detected with several queries (for the vast majority,
more than 30 times). However, this analysis did not reveal any
additional candidate peptide GPCRs. Finally, we used the GPCR query set
to perform TBLASTN searches of the Celera/BDGP whole
genome shotgun sequence. As with the BLASTP survey, the
TBLASTN search yielded scaffold sequences corresponding to
all of the GPCRs on our list, but no additional candidate peptide
receptors. The whole genome shotgun sequence currently represents
~98% of the Drosophila euchromatic genome (Adams et al. 2000).
Therefore, we conclude that the set of 45 cloned and candidate
peptide GPCRs is essentially complete.
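The multi-query check described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used for the survey: the input format (tuples of query, subject, and E value, for example parsed from tabular BLAST output), the function names, and the cutoffs are all assumptions made for the example.

```python
from collections import defaultdict


def detections_per_subject(hits, max_evalue=1e-5):
    """Count how many distinct queries detect each subject sequence.

    `hits` is an iterable of (query_id, subject_id, evalue) tuples.
    The E-value cutoff is an illustrative assumption, not a value
    taken from the text.
    """
    queries = defaultdict(set)
    for query_id, subject_id, evalue in hits:
        # Ignore self-hits and hits weaker than the cutoff.
        if query_id != subject_id and evalue <= max_evalue:
            queries[subject_id].add(query_id)
    return {subject: len(qs) for subject, qs in queries.items()}


def novel_candidates(counts, known_ids, min_queries=2):
    """Subjects detected by several queries that are not already listed."""
    return sorted(
        subject
        for subject, n in counts.items()
        if n >= min_queries and subject not in known_ids
    )
```

Under these assumptions, a search that recovers every listed receptor with many queries and returns an empty list from `novel_candidates` corresponds to the outcome reported here: all known receptors detected, no additional candidates.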
We focused on sequences encoding the seven TM domains. More than 50%
of the BDGP/Celera annotations for GPCRs in this set (23 out of 43)
were missing sequences representing one or more TM domains or, in the
case of two receptors (CG4187 and CG5042), an N-terminal domain
containing conserved, leucine-rich repeats (Table 1). In three cases,
the correct gene sequences were published previously (Li et al. 1991;
Ashburner et al. 1999; Birgül et al. 1999). We revised the remaining
20 incorrect or incomplete annotations using software-based gene
prediction methods and manual inspection. For six GPCRs, there were two
to three neighboring annotations that contained nonoverlapping GPCR
sequence motifs. In each of these cases, we did not detect open reading
frames encoding conserved TM domains in the intervening genomic
sequence. Therefore, we merged these sequences to generate single,
revised annotations. Additionally, for CG5911, we detected two adjacent
sets of exons encoding alternative versions of TM4-7, including a
conserved splice acceptor site in TM4. Thus, CG5911 appears to encode
two distinct receptor isoforms (50% identical in TM5-7) through
alternative splicing. Although final confirmation of these predictions
will require direct sequencing of cDNAs, we conclude that our revised annotations are of sufficiently high quality to perform phylogenetic analysis, based on the presence of well conserved motifs (Baldwin et al. 1997; Tams et al. 1998) throughout the TM domains of each of these receptors.
After assembling the list of 45 cloned and candidate peptide GPCRs, we
classified these proteins based on BLASTP scores, on the
locations of these receptors on representative phylogenetic trees for
each GPCR Family (A and B), and on the degree to which these locations
were supported by bootstrap analysis. We examined the genomic location
of each peptide GPCR as well as all biogenic amine and small
transmitter GPCRs to identify linked (possibly paralogous) genes. To
detect conserved gene organizations, we noted the intron locations and
phasing for each cloned and candidate peptide GPCR within the TM
regions, as well as for a few related vertebrate GPCRs. The intron
analysis of the vertebrate GPCRs was not comprehensive, in part because
>93% of vertebrate GPCR genes lack introns within the coding sequence
(Gentles and Karlin 1999). Finally, based on the results of the above
tests, we were able to place most of the Family A receptors in one of seven
alignments, each of which contained one or more related receptor subgroups.
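As an illustration of the intron comparison, the sketch below computes intron phase (the number of coding nucleotides preceding the intron, modulo 3) and reports introns that two genes share at the same alignment column with the same phase. The function names and the codon-to-alignment-column mappings are hypothetical; the text does not specify how positions were compared.

```python
def intron_phases(exon_cds_lengths):
    """Return (coding_offset, phase) for each intron in a gene.

    `exon_cds_lengths` lists the number of coding nucleotides in each
    exon, in transcript order. Phase is the number of coding bases
    preceding the intron, modulo 3 (0 means the intron lies between
    two codons).
    """
    introns = []
    offset = 0
    for length in exon_cds_lengths[:-1]:  # no intron follows the last exon
        offset += length
        introns.append((offset, offset % 3))
    return introns


def shared_introns(introns_a, introns_b, to_column_a, to_column_b):
    """Introns found at the same alignment column with the same phase.

    `to_column_a` and `to_column_b` map a codon index in each protein
    to a column of a shared protein alignment (hypothetical mappings
    supplied by the caller).
    """
    a = {(to_column_a[offset // 3], phase) for offset, phase in introns_a}
    b = {(to_column_b[offset // 3], phase) for offset, phase in introns_b}
    return sorted(a & b)
```

Requiring both the position and the phase to match is the stricter criterion; introns that are merely in similar positions would need a tolerance on the column index, which this sketch does not include.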
We found strong evidence supporting the classification of 32 receptors
as peptide GPCRs (Table 2). In addition,
there were two receptors that are clear orthologs of the orphan
receptor, LGR7, a member of a receptor clade containing several peptide GPCRs. For seven additional receptors, we found weaker evidence to
indicate that they are peptide GPCRs. Finally, we regard four of the
receptors as unclassifiable. Interestingly, we found at least six pairs
of paralogs (variants generated by gene duplications or other
processes), most of which appear to be related to common ancestors of
vertebrate GPCR subgroups (rather than derived independently from
vertebrate paralogs; e.g., Fig. 1A,D [see below]). Based on the
presence of ESTs and/or cDNAs (Table 1), at least 25% of the genes
in Table 1 are expressed. Pseudogenes are rare in Drosophila
(Petrov and Hartl 2000), and based on strong sequence conservation
in the TM domains, most of the remaining genes are likely to be
expressed as well. Thus, we conclude that there are 39-41 peptide
GPCRs in Drosophila, with an additional four GPCRs that may
later be included in this category. The following sections describe
this analysis in detail and provide a listing of potential cognate ligands.
Overview of the Drosophila Peptide GPCRs
Together, the set of known and candidate Drosophila peptide GPCRs contains representatives of at least 15 monophyletic vertebrate GPCR subgroups. Family A/Group III-B contains the largest number of Drosophila peptide GPCRs (at least 19; Table 2). These include 17 Drosophila representatives of six vertebrate GPCR subgroups: the gastrin/cholecystokinin, neurokinin, neuropeptide FF and hypocretin/orexin, neuropeptide Y, bombesin/gastrin releasing peptide, and neurotensin receptors (and the neurotensin-related receptors for neuromedin U, growth hormone secretagogue, and thyrotropin releasing hormone). Family A/Group V also contained a large number of Drosophila peptide GPCRs (11; Table 2), representing seven vertebrate GPCR subgroups: the galanin, somatostatin/opioid, gonadotropin releasing hormone, oxytocin/vasopressin, and glycoprotein hormone receptors, as well as two subgroups represented by vertebrate orphan receptors (LGR4-6 and LGR7). Finally, there were five Drosophila peptide GPCRs that belong to Family B. Four of these receptors belong to one of two vertebrate GPCR subgroups: the calcitonin and corticotropin releasing factor receptors. Thus, a large majority of the Drosophila and vertebrate neuropeptide signaling pathways appear to share common evolutionary origins. It remains to be seen whether the functions of these signals have been similarly conserved.
Family A/Group III-B: Gastrin/Cholecystokinin Receptors
Cholecystokinin (CCK) and gastrin are related neuroendocrine
peptides that act through two closely related families of receptors (type A, CCKR, and type B, GASR). These receptors likely evolved from a
common ancestor (Johnsen 1998). Two Drosophila GPCRs (CG6857, CG6881) displayed strong evidence of evolutionary kinship with this
receptor subgroup (Table 2). On the subgroup-specific tree (Fig. 1A),
CG6857 and CG6881 (as well as
Xenopus laevis CCKR) were located at the base of the tree,
before the branches leading to the CCKR and GASR receptors. Therefore,
it appears that the fly receptors diverged from a common ancestor of
the CCKR and GASR lineages. Consistent with this interpretation,
CG6857 and CG6881 are closely linked genes (~30 kb
apart), and they each display the strongest sequence similarity with
each other (by BLASTP and on the phylogenetic trees; Table
2, Fig. 1A). Therefore, they likely arose through a gene duplication
event rather than independently from the CCKR and/or GASR lineages. CG6857 and CG6881 both share an intron (same position
and same phase) in TM3 (Table 3) with genes
encoding both CCKR (accession #AF015959-AF015963) and human GASR
(L10822). Likewise, all of these genes have an intron in a similar
position within the highly variable cytoplasmic loop between TM5 and
TM6. This conservation of introns further indicates that CG6857 and
CG6881 are members of the CCKR/GASR receptor subgroup.
Family A/Group III-B: Neurokinin Receptors
The neurokinin (tachykinin) receptors (NKRs) are a monophyletic
group of GPCRs that are also closely related to the orexin/hypocretin receptors (OXRs), the neuropeptide FF/AF receptor (NFFR), and a class
of orphan, glucocorticoid-induced receptors (GCRCs). We found six
Drosophila members of this subgroup: CG5811 (Li et al. 1992b;
St-Onge et al. 2000), CG10626, TAKR86C (Monnier et al. 1992), TAKR99D
(Li et al. 1991, 1992b), CG10823, and BACR48G21.1. On the
subgroup-specific tree (Fig. 1B), TAKR86C and TAKR99D were located near
the base of a branch leading to NK1R-NK3R, which indicates that the two
fly proteins (and the two C. elegans orthologs) arose before
the diversification of the vertebrate NKRs. TAKR86C and TAKR99D are
located together on the subgroup tree (with stable fly neurokinin
receptor, STKR; Fig. 1B), and in BLASTP searches, each
detects the other with the lowest P values (Table 2).
Moreover, the Takr86C (Rosay et al. 1995) and Takr99D
genes share two introns in the same position and with the same phase (Table 3). Thus, TAKR86C and TAKR99D appear to be paralogs, and they
are therefore likely to share similar ligands and functional properties.
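The reciprocal criterion used here (each receptor detects the other with the lowest P value) can be expressed as a short sketch. The input tuples and function names are assumptions for illustration only; no actual BLASTP scores are reproduced.

```python
def best_hits(hits):
    """Map each query to the non-self subject with the lowest P value.

    `hits` is an iterable of (query_id, subject_id, p_value) tuples.
    """
    best = {}
    for query, subject, p_value in hits:
        if query == subject:
            continue  # a sequence always matches itself best
        if query not in best or p_value < best[query][1]:
            best[query] = (subject, p_value)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_pairs(hits):
    """Pairs of sequences that are each other's best hit."""
    top = best_hits(hits)
    pairs = {
        tuple(sorted((query, subject)))
        for query, subject in top.items()
        if top.get(subject) == query
    }
    return sorted(pairs)
```

A reciprocal best hit between two genes of the same genome is only suggestive of paralogy; in the analysis above it is combined with tree position and shared intron positions before a pair is called paralogous.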
Two additional Drosophila receptors, CG5811 and CG10626, are
related to the true NKRs. However, based on BLASTP and phylogenetic analysis (Table 2, Fig. 1B), each of these receptors appears to be the ortholog of a NKR-related class of GPCRs that to date
have been identified only in invertebrates. CG10626 is closely related
to the tick NKR (LKR; Holmes et al. 2000) and the snail lymnokinin
receptor (LSR) (Table 2), and these three receptors are located on a
single branch of the subgroup-specific tree (Fig. 1B). Likewise, CG5811
displays strong sequence similarity with GRL106, a snail NKR-like
protein (Table 2). The branching pattern of this portion of the NKR
subgroup tree is unstable (Fig. 1B). However, CG5811 and
CG10626 have two introns that are in similar locations and
display the same phasing (Table 3). Thus, these genes appear to be
paralogs that diverged independently of the true NKRs. Consistent with
this interpretation, the midpoint root of the NKR subgroup tree is
located between the branch leading to the true NKRs and the branches of
the tree leading to CG5811, CG10626, and the related GPCRs.
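Midpoint rooting places the root halfway along the longest path between any two leaves of the tree. The sketch below illustrates that idea on an unrooted tree stored as a nested dictionary of branch lengths; the data structure and function names are assumptions, and the tree-building software actually used is not stated in this section.

```python
def farthest(tree, start):
    """Return (node, distance, path) for the node farthest from `start`."""
    best = (start, 0.0, [start])
    stack = [(start, None, 0.0, [start])]
    while stack:
        node, parent, dist, path = stack.pop()
        if dist > best[1]:
            best = (node, dist, path)
        for neighbor, length in tree[node].items():
            if neighbor != parent:  # do not walk back along the same edge
                stack.append((neighbor, node, dist + length, path + [neighbor]))
    return best


def midpoint_root(tree):
    """Locate the midpoint of the longest leaf-to-leaf path.

    `tree` is an unrooted tree given as {node: {neighbor: branch_length}}.
    Returns (u, v, d): the root lies on edge u-v, at distance d from u.
    """
    any_node = next(iter(tree))
    end_a, _, _ = farthest(tree, any_node)   # one end of the longest path
    _, total, path = farthest(tree, end_a)   # the other end, with the path
    half = total / 2.0
    walked = 0.0
    for u, v in zip(path, path[1:]):
        length = tree[u][v]
        if walked + length >= half:
            return u, v, half - walked
        walked += length
    raise ValueError("tree has no edges")
```

A midpoint root assumes roughly equal rates of change on both sides of the tree, so it is best read as consistent with, rather than proof of, the proposed split between the two receptor lineages.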
Finally, there are two additional GPCRs, CG10823 and BACR48G21.1, that
display moderate to weak homology with the NKRs. By BLASTP, CG10823 displays strongest homology with the vertebrate neuropeptide FF/neuropeptide AF receptor (NFFR), the putative mammalian RF-amide-related peptide receptor (OT7T022; Hinuma et al. 2000), as well as CG5811 (Table 2). In addition, the
CG10823 gene has an intron that is located in the same
position (and phase) as one of the two introns shared by
Takr86C and Takr99D (Table 3). On the
subgroup-specific tree, CG10823 is located near the base of a branch
leading to the orexin/hypocretin receptors (OXRs), OT7T022 and NFFR
(Fig. 1B). Therefore, CG10823 appears to have arisen from a common
ancestor of these vertebrate relatives. Finally, BACR48G21.1 also
appears to be a member of the NKR subgroup. However, this relationship
was not well supported by the phylogenetic analysis (Table 2), and
additional sequence data will be required to evaluate this finding.
Family A/Group III-B: Neuropeptide Y Receptors
The receptors for the neuropeptide Y (NPY) family of peptides
(NPYRs) and the prolactin releasing peptide (PRPR) form a subgroup of
related GPCRs (Hinuma et al. 1998; Hoyle 1999). Four
Drosophila proteins, CG1147, CG7395, CG12610, and CG13995,
appear to be members of this subgroup (Table 2). On the
subgroup-specific tree, the position of the root was unclear (Fig. 1C).
In addition, although the branching pattern for the portions of the
tree containing the vertebrate NPYR receptors (except NY2R) was stable,
the rest of the tree was not clearly resolved. In the
BLASTP analysis and on the phylogenetic trees (Table 2;
Fig. 1C), CG1147 showed the strongest sequence homology with a class of
receptors that includes a C. elegans orphan GPCR (C25G6.5) and
the vertebrate neuropeptide Y Y2 receptors (NY2Rs). CG7395, which also
displays strong general sequence homology with the other members of
this subgroup, appears to be most closely related to a diversified group of orphan NPYR-like C. elegans receptors. In contrast,
CG12610 appears to be most closely related to PRPR. The fourth
Drosophila receptor in this group, CG13995, was located on the
Group III portion of the full Family A tree, which consists almost
exclusively of peptide GPCRs. However, CG13995 did not show strong
evidence of homology with any specific class of peptide GPCRs (Table
2). Interestingly, the CG13995 gene shares an intron in TM3
(same position and phase) with CG12610. Therefore, we propose
that CG13995 is distantly related to the NPYR subgroup. Finally, it has
been suggested that CG5811 is a NPYR-like receptor, despite its greater sequence similarity with the NKRs (see above), based on the activation of functionally expressed CG5811 by NPY and related peptides (at micromolar concentrations) and the lack of activation by vertebrate neurokinins (Li et al. 1992b). However, in competitive displacement experiments with CG5811 (St-Onge et al. 2000), PQGRF-amide-like peptides (e.g., NPFF and Lymnaea cardioexcitatory peptide)
displayed IC50s in the subnanomolar range. Thus, CG5811 does
not appear to be a member of the NPYR subgroup.
Family A/Group III-B: Bombesin/Gastrin Releasing Peptide Receptors
The bombesin-like neuropeptides, which include bombesin, gastrin
releasing peptide (GRP), and neuromedin B (NMB), exert a wide variety
of physiological actions in the CNS and the periphery through a class
of related receptors (Sun et al. 2000). These receptors include the
GRP-preferring receptor (GRPR), the neuromedin B-preferring receptor
(NMBR), and an orphan class of receptors, characterized by bombesin
receptor subtype 3 (BRS3). There are two Drosophila GPCRs,
CG14484 and CG14593, that belong to this phylogenetic subgroup (Table
2). On the subgroup-specific tree, the three types of vertebrate
bombesin/GRP receptors formed a clade, whereas CG14484 and CG14593
branch out from the base of the tree (Fig. 1D). Therefore, it appears
that the fly receptors diverged from a common ancestor of the
vertebrate bombesin/GRP receptor lineages. The organizations of the
CG14484 and CG14593 genes are similar; each has one
intron in the same position and phase, and there are two additional
introns in similar positions (Table 3). Thus, CG14484 and CG14593
appear to be paralogs. Together, these results indicate that CG14484
and CG14593 are bombesin/GRP receptors; to our knowledge, this is the
first clear molecular evidence for bombesin/GRP signaling in invertebrates.
Family A/Group III-B: Growth Hormone Secretagogue, Neurotensin, Neuromedin U, and Thyrotropin Releasing Hormone Receptors
The receptors for neurotensin (NTR), neuromedin U (NMUR),
thyrotropin releasing hormone (TRFR), and growth hormone secretagogue (GHSR) form a large and diverse subgroup of GPCRs (Fujii et al. 2000).
Among these, NTR, GHSR, and NMUR display strong sequence similarity,
whereas TRFR is more distantly related. At least seven Drosophila GPCRs appear to be members of this subgroup:
CG8784, CG8795, CG9918, CG14575, CG5911A, CG5911B, and CG14003 (Table 2). An additional seven GPCRs (CG2114, CG5936, CG6986, CG8985, CG13229,
CG13803, and CG16726) are all most closely related to a large set of
related orphan receptors that had been identified previously only in
C. elegans (C. Bargmann, pers. comm.). These orphan GPCRs
fall into at least three classes, and there are one to three
Drosophila GPCRs in each class (Fig. 1E). The three receptors in class A (CG8985, CG13229, and CG13803) display strong sequence homology. In addition, CG8985 and CG13803 are linked
genes (~30 kb apart), and they share an intron (Table 3). Thus, the
Drosophila class A receptors appear to be paralogs. All three
classes display weak sequence similarity with TRFR, NTR, and GHSR,
indicating that this entire family of orphan receptors may be derived
from an ancestor of these vertebrate receptors and therefore may encode peptide GPCRs. However, confirmation of such a relationship will require functional analysis of one or more members of these orphan GPCR classes.
CG8784 and CG8795 are two of the seven Drosophila GPCRs displaying the strongest sequence similarity with this vertebrate subgroup, and they appear to be paralogs. They display strong sequence similarity with each other (Table 2; Fig. 1E). Moreover, the CG8784 and CG8795 genes are closely linked (~10 kb apart) and share four introns with identical positions and phasing (Table 3). Similarly, CG9918 and CG14575 each share one intron with CG8784/CG8795 (Table 3), indicating that all four of these receptors are closely related. Their closest vertebrate homologs are NMUR, GHSR, and NTR, based on BLASTP analysis and on their positions in the phylogenetic trees (Table 2; Fig. 1E). Consistent with this finding, the shared intron located in the TM6 domain of CG8784 and CG8795 is also found in the same position and with the same phasing in the pufferfish GHSR gene (AF082211). However, the branching pattern for the subgroup-specific tree was unstable, and the evolutionary relationships among these receptors are unclear. Three additional receptors, CG5911A and CG5911B (generated by putative alternative splicing of the CG5911 gene) and CG14003, also displayed moderate to weak sequence homology with this subgroup and appear to be most closely related to vertebrate TRFR.
Family A/Group V: Galanin/Allatostatin and Opioid/Somatostatin Receptors
There were four Drosophila receptors, AlstR
(Birgül et al. 1999; Lenz et al. 2000a), CG7285, CG10001 (Lenz et al.
2000b), and CG13702, that displayed strong sequence similarity with
galanin, somatostatin, and opioid receptors (Table 2). Because these
three classes of vertebrate receptors display extensive sequence
similarity, we grouped them together to construct a subgroup-specific
tree (Fig. 2A). The root of this tree is
located between the branch leading to the galanin receptors and the
branch leading to the somatostatin and opioid receptors. CG7285 and
CG13702 were located on the branch containing all of the somatostatin
and opioid receptors and related orphan GPCRs. The opioid receptors
form a clade, and two groups of somatostatin receptors also form clades
(SSR1/4 in one and SSR2/3/5 in the other). The remaining branches on
this side of the tree are unstable. Together, these results indicate that CG7285 and CG13702 are orthologous to the vertebrate somatostatin and opioid receptors, although it is not clear whether they diverged from a common ancestor or from a point deeper within the tree. CG7285
and CG13702 appear to be paralogs; they display strong sequence
homology (Table 2; Fig. 2A), and they are encoded by linked genes
(~90 kb apart) that share an intron with the same location and
phasing (Table 3).
The allatostatin receptor, AlstR (Birgül et al. 1999), and
CG10001 were located on the portion of the tree containing all of the
galanin receptors (Fig. 2A), indicating that AlstR and CG10001
are Drosophila orthologs of the mammalian galanin receptors. This finding is in agreement with an earlier phylogenetic analysis of
AlstR (Birgül et al. 1999). The AlstR and CG10001
genes share an intron at the same location and with the same phasing
(Table 3; Lenz et al. 2000b). Thus, AlstR and CG10001
appear to be paralogs and are likely to share many functional
properties. Interestingly, immunocytochemical studies, using
anti-porcine galanin and anti-porcine galanin message-associated
peptide, as well as receptor autoradiography studies using
125I-porcine galanin, showed the presence of galanin-like
peptides in several locations in the adult CNS of blowflies,
including the fan-shaped body of the central complex and a ring of
cells in the medulla (Lundquist et al. 1991, 1993; Johard et al.
1992). Similar patterns of staining in the fan-shaped body and medulla have been obtained in Drosophila with a specific monoclonal
anti-allatostatin antibody (Yoon and Stay 1995). These comparative
data provide additional support for the conclusion that
AlstR and CG10001 are closely related to the vertebrate
galanin receptors (cf. Birgül et al. 1999; Lenz et al. 2000b).
Family A/Group V: Gonadotropin Releasing Hormone, Vasopressin, and Oxytocin Receptors
The receptors for gonadotropin releasing hormone (GRHR) and the
receptors for vasopressin (VPR) and oxytocin (OXYR) belong to two
closely related clades of GPCRs (Hoyle 1999). In Drosophila, there are three GPCRs that belong to this subgroup: CG6111, CG10698, and Dm-GRHR (Table 2; Hauser et al. 1998). The branching pattern near
the base of the subgroup-specific tree was unstable (Fig. 2B), and the
evolutionary history of this subgroup is unclear. However, when the
tree is midpoint rooted, Dm-GRHR and CG10698 branch from the side of
the tree leading to the vertebrate GRHRs, and CG6111 branches from the
side of the tree leading to VPR, OXYR, and related GPCRs. These results
are in agreement with the results of BLASTP analysis.
Moreover, the Dm-GRHR gene shares an intron near TM4
(identical location and phasing) with the rat GRHR gene
(U92471) (Hauser et al. 1998); CG10698 also shares this intron.
Thus, Drosophila appears to have two GRHR-like receptors and
one VPR/OXYR-like receptor.
Family A/Group V (Type 1c): Glycoprotein Hormone Receptors
Four glycoprotein hormones have been identified in mammals:
thyroid-stimulating hormone (TSH) and the gonadotropins,
follicle-stimulating hormone (FSH), choriogonadotropin (CG), and
luteinizing hormone (LH). These four hormones bind to a subgroup of
receptors (the LGRs) that all bear a characteristic, large, N-terminal
"ectodomain" that participates in the binding of the large
glycoprotein ligands (Hsu et al. 2000) (type 1c receptors; Bockaert and
Pin 1999). Four Drosophila receptors, CG4187, CG5042, and the
proteins encoded by the Fsh (Hauser et al. 1997) and
rk (Ashburner et al. 1999; Eriksen et al. 2000) genes, display
sequence similarity with the LGRs, including the N-terminal ectodomain
(Table 2). On the subgroup-specific tree (Fig. 2C), there were three
distinct clades (cf. Hsu et al. 2000). The first includes LGR7, a
Lymnaea ortholog (SLGR), CG4187, and CG5042. The second
includes LGR4-LGR6, and the third includes the glycoprotein hormone
receptors (LSHR, FSHR, and TSHR). Fsh is located at the base
of a branch leading to the glycoprotein hormone receptors, indicating
that this gene may have evolved from a common ancestor of LSHR, FSHR,
and TSHR. Three additional receptors, C. elegans LGR (NLGR),
sea anemone LGR (ALGR), and rk, were grouped only weakly with
the glycoprotein hormone receptors; the branching pattern of this
portion of the tree was unstable. Therefore, these could not be
assigned to any one class of LGRs on the basis of the phylogenetic analysis alone.
Within the ectodomain, all of the LGRs contain a variable number of
leucine-rich repeats and a functionally important hinge region located
between the leucine-rich repeats and the seven-TM core. At the borders
of the hinge region, there are two sequences that are diagnostic of the
three different subclasses of LGRs (Table 4; Hsu et al. 2000). These groupings are
also supported by BLASTP analysis of the
ectodomains (data not shown). These sequences support the placement of
Fsh in the subfamily of glycoprotein hormone receptors.
Placement of CG4187 and CG5042 in the LGR7 clade is supported by
BLASTP analysis of the ectodomains (data not shown) and
the presence of subgroup-specific hinge sequences (Table 4). Unlike the
other two subgroups of LGRs, the ectodomains of LGR7 and snail LGR have
low density lipoprotein (LDL) receptor-like cysteine-rich motifs at the
N terminus (Tensen et al. 1994; Hsu et al. 2000). CG4187 and CG5042
also each contain at least one LDL motif (Table 4). The function of the
LDL motif is unclear, but it indicates a possible role for
lipoprotein-like molecules in neuronal G protein-mediated signal
transduction (Tensen et al. 1994). Alternatively, given the presence of
leucine-rich repeats, these receptors may bind to glycoproteins.
Although phylogenetic analysis of the LGRs did not place rk in any of the three subgroups of LGRs, analysis of the ectodomain indicates that this receptor is orthologous to LGR4-6. This is based on the presence of hinge sequences most similar to LGR4-6 and on BLASTP analysis (data not shown). The other members of this family are orphan receptors. However, the presence of the leucine-rich repeats suggests that these proteins may also bind to glycoproteins.
Family B/Group I: Calcitonin and Diuretic Hormone Receptors
In addition to the 40 proteins in Family A (the rhodopsin-like
receptors), there are 5 Drosophila peptide GPCRs in Family B (the secretin-like receptors). Based on BLASTP analysis and their positions on the phylogenetic tree (Fig.
3), at least four of these receptors (CG4395,
CG8422, CG12370, and CG17415) belong to Group I. This group contains
the receptors for calcitonin (CALR), calcitonin gene related peptide
(CGRR), corticotropin releasing factor (CRFR and CRF2), and diuretic
hormone (DIHR). The position of the fifth Drosophila peptide
GPCR in this family (CG13758) is unclear, and it may be a member of
Group I, II, or III. CG8422 and CG12370 appear to be paralogs, and they
are orthologous to the DIHRs. These receptors belong to a clade
containing CRFR and CRF2, which indicates that the ancestor to the
insect DIHRs evolved from a common ancestor of the vertebrate
corticotropin releasing factor receptors (Fig. 3). In contrast, CG4395
and CG17415 are most closely related to CALR and CGRR, although the
bootstrap scores more deeply located within this branch of the tree
were not strong enough to determine whether CALR and CGRR diverged before or after the related Drosophila receptors. We did not
find evidence for well-defined GPCR-associated proteins (e.g., RAMPs [Bockaert and Pin 1999] and RCPs [Evans et al. 2000]).
Drosophila Genes Encoding Neuropeptides and Peptide Hormones
We wished to compare the number of peptide GPCRs with the number of
neuropeptides present (or suspected to exist) in Drosophila. Based on the literature and on some genomic analysis, we have assembled
a list of 22 Drosophila neuropeptide genes (Table 5). These genes are either known or
predicted to encode bioactive neuropeptides and peptide hormones. Eight
of these, which encode neuropeptides described for Drosophila
or other arthropods, were described previously only in gene annotations
generated by Celera/BDGP and in a parallel survey that was
published recently (Vanden Broeck 2001). An additional peptide listed
by Vanden Broeck (2001), "IFa", was not included, because the
precursor did not match our criteria for putative neuropeptide genes.
Because neuropeptide-encoding precursors do not display the multiple,
uniform characteristics found in GPCRs, we are certain to have missed
many peptide genes and thus consider this list incomplete. However,
assuming a 1 : 1 ratio of neuropeptide and peptide hormone genes to
peptide GPCRs, these 22 genes appear to encode the ligands for at least
50% of the Drosophila peptide GPCRs that we have described.
This may be an underestimate, given the fact that many of these
neuropeptide genes encode multiple peptides. In addition to these 22 neuropeptide genes, we list several insect peptides and peptide
hormones known in other insects and for which Drosophila
homologs have been inferred by observation or simply by conjecture.
Although the structures of these genes are not yet available, they are
included here to permit consideration of all plausible ligands for the
identified peptide GPCRs.
Ligands for Family A Peptide GPCRs
There are multiple genes encoding potential ligands for the Drosophila NKR-like receptors. CG14734 produces neurokinin-like peptides (Siviter et al. 2000
Ligands for Family B Peptide GPCRs
We speculate that the corticotropin releasing factor (CRF)-related peptides encoded by CG8348 and CG13094 (similar to Locusta diuretic hormone; Coast 1996
Peptide Genes Still Awaiting Identification
There are several insect neuropeptides and peptide hormones that have not as yet been cloned in Drosophila. These include three large protein hormones
(PTTH, bursicon, and the anterior retraction factor [ARF])
that are known to exist in Drosophila but
currently lack molecular definition. At least two of these proteins,
PTTH and bursicon, are glycoprotein hormones (Fraenkel et al. 1966
α- and specific
β-subunits (Hsu et al. 2000
| |
METHODS |
|---|
|
|
|---|
Peptide GPCR Sequence Acquisition
To identify all predicted Drosophila GPCRs, we first
scanned the gene annotations developed jointly by the Berkeley
Drosophila Genome Project (BDGP) and Celera Genomics for all
proteins predicted to contain domains matching seven-TM motifs (Adams
et al. 2000; Brody and Cravchik 2000). We rejected sequences identified
recently as odorant receptors by a committee representing scientists
working in the field (Drosophila Odorant Receptor Nomenclature
Committee 2000). Each remaining cloned and candidate receptor gene was
used as a BLASTP search query of the database of
predicted Drosophila proteins using the BDGP server
(http://www.fruitfly.org/) and/or of the "non-redundant"
database of all proteins using the NCBI server
(http://www.ncbi.nlm.nih.gov/). Sequences were not considered
further if the resulting top-scoring proteins yielded P values
for nonpeptide receptors (and associated orphan receptors) that were at
least 10-fold greater than the top P value for a putative
peptide receptor. Three sequences (CG18314, CG12796, CG13579) generated
a smaller range of P values following BLASTP searches on the BDGP server. Nevertheless, analysis of these proteins using the NCBI server yielded hits that were exclusively amine/small neurotransmitter receptors (or orphan receptors). These sequences therefore are
likely to encode nonpeptide receptors, and they also were excluded.
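The exclusion rule above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used: the function name, the `(receptor_class, p_value)` hit format, and the class labels are hypothetical, and the sketch reads "10-fold greater" as "at least 10-fold more significant" (that is, a P value at least 10-fold smaller), which is an interpretation of the text rather than something it states.

```python
def looks_nonpeptide(hits, fold=10.0):
    """Return True if a query should be excluded as a likely nonpeptide receptor.

    hits: iterable of (receptor_class, p_value) pairs for the top-scoring
    BLASTP hits, where receptor_class is "peptide" or "nonpeptide"
    (hypothetical labels; orphan receptors would need manual assignment).
    """
    hits = list(hits)
    best_peptide = min((p for c, p in hits if c == "peptide"), default=None)
    best_nonpeptide = min((p for c, p in hits if c == "nonpeptide"), default=None)

    if best_nonpeptide is None:
        return False          # no nonpeptide hits at all: keep the query
    if best_peptide is None:
        return True           # only nonpeptide hits: exclude the query
    # Assumed reading: exclude when the best nonpeptide hit is at least
    # `fold` times more significant (smaller P) than the best peptide hit.
    return best_nonpeptide * fold <= best_peptide
```

Borderline cases, such as the three sequences with a smaller range of P values, would still need the kind of manual follow-up described in the text.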
We first scanned a set of GPCR sequence annotations obtained through a
GENSCAN search of the complete Drosophila genome
sequence (see Vosshall et al. 1999) and identified based on sequence
similarity to GPCRs in the NCBI nonredundant protein database (K. Scott
and L. Vosshall, pers. comm.). For the BLASTP and
TBLASTN analyses, we assembled a "GPCR query set," which included the previously annotated peptide, amine, and
related/unclassified Drosophila GPCRs as well as ~200 sequences
representing a diverse set of Family A (rhodopsin receptor-like family)
GPCRs from the Pfam database (7TM-1; http://pfam.wustl.edu/). These
sequences were used as queries for BLASTP and
TBLASTN searches on the BDGP server, using the predicted
proteins and the Celera/BDGP whole genome shotgun sequence datasets,
respectively. To expedite the latter search, we assumed that
TBLASTN hits to genomic sequences that were already on our
list were due to the detection of previously annotated GPCR genes.
GPCR Alignments
We used the hidden Markov model-based protein alignments contained
in Version 5.5 of Pfam (September 2000; Bateman et al. 2000)
as a template for the manual alignment of the Drosophila cloned and candidate peptide receptors. The alignments were viewed using ClustalX (Version 1.8; Thompson et al. 1997), and, in some cases, this program was used to help resolve the alignment of
variable regions (e.g., between TM domains 4 and 5). We used these
alignments to build phylogenetic trees and also to detect missing or
incorrect sequences in the gene annotations.
The N-terminal and C-terminal non-TM sequences in GPCRs tend to be
poorly conserved, making accurate alignment difficult, and the seven-TM
core region is sufficient for the subclassification of these proteins
(Strader et al. 1994). Therefore, for Family A receptors, we deleted
sequences N-terminal to the conserved GNXXLV motif (single-letter amino
acid code) in TM1 and C-terminal to the conserved NPXIY motif in TM7.
For Family B receptors (secretin receptor family), we deleted sequences
flanking the X10GX3S motif in TM1 and the
QGX2V X4CX5X motif in TM7.
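The Family A trimming step can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the motifs are treated as exact regular expressions (with X as any residue), whereas the conserved motifs in real receptors are degenerate and the trimming described here was done on manually edited alignments. The function name and the toy sequence are hypothetical.

```python
import re

# Family A motifs from the text, with X written as "." (any residue).
TM1_MOTIF = re.compile(r"GN..LV")   # GNXXLV in TM1
TM7_MOTIF = re.compile(r"NP.IY")    # NPXIY in TM7


def trim_family_a_core(seq):
    """Keep the region from the TM1 motif through the TM7 motif.

    Returns None when either motif is not found as an exact match,
    which in practice would signal the need for manual inspection.
    """
    tm1 = TM1_MOTIF.search(seq)
    if tm1 is None:
        return None
    # Only look for the TM7 motif downstream of the TM1 motif.
    tm7 = TM7_MOTIF.search(seq, tm1.end())
    if tm7 is None:
        return None
    return seq[tm1.start():tm7.end()]


# Toy sequence (not a real receptor): flanking residues are removed.
core = trim_family_a_core("MKTAGNLLLVAAAAWWWNPFIYSSRK")
```

A sequence for which the function returns `None` is not necessarily outside Family A; it may simply carry a variant of one motif.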
Correction of Annotations
To locate missing TM domains among the putative peptide receptor annotations, we scanned for potential coding exons in flanking genomic sequence using the GENSCAN server at MIT (http://genes.mit.edu/GENSCAN.html), and the FGENES (gene prediction) and FEX (exon prediction) programs on the Baylor College of Medicine (BCM Search Launcher) server (http://www.hgsc.bcm.tmc.edu/). We also scanned for potential mRNA splice sites using the SPL program on the BCM Search Launcher server and by manual inspection of potential open reading frames displayed using MacVector (Genetics Computer Group, Madison, WI). The DNA sequences for all of the predicted donor and acceptor splice sites were NN|GT and AG|NN, respectively. Finally, we examined neighboring gene annotations to identify duplicate annotations of single GPCR genes. The annotations were judged to be complete when each of the TM domains displayed features that were clearly recognizable among closely related receptors. Except for the LGR subgroup of receptors (see Results and Discussion), which all share a large and subgroup-specific N-terminal domain, we did not evaluate the quality of the annotations for the N-terminal and C-terminal non-TM regions.
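The splice-site criterion mentioned above (donor NN|GT, acceptor AG|NN) amounts to requiring that each predicted intron begins with GT and ends with AG. A minimal Python sketch of that check follows; the function name, the 0-based end-exclusive coordinates, and the toy sequence are assumptions for illustration and do not reproduce the SPL program or the manual inspection that was used.

```python
def has_canonical_splice_sites(genomic, intron_start, intron_end):
    """Check that genomic[intron_start:intron_end] starts with GT and ends with AG.

    Coordinates are 0-based and end-exclusive (an assumption of this sketch).
    """
    intron = genomic[intron_start:intron_end].upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


# Toy example: exon "ATGGCC", intron "GTAAGTTTTCAG", exon "GATTAA".
toy_genomic = "ATGGCC" + "GTAAGTTTTCAG" + "GATTAA"
ok = has_canonical_splice_sites(toy_genomic, 6, 18)
```

A check like this only confirms the dinucleotides at the intron boundaries; it says nothing about reading frame or about whether the resulting TM domain resembles those of related receptors.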
Tree Building
We classified the cloned and candidate peptide GPCRs based on five criteria. First, we noted the highest scoring BLASTP hits obtained on the NCBI server (Table 2). Second, we constructed alignments of Family A and Family B receptors, including all of the Drosophila peptide GPCRs identified above, for the purpose of generating full phylogenetic trees for each family. For Family A, we included mostly complete (TM1-TM7) sequences representing each of the five receptor groups, as well as sequences representing each of the various subgroups of receptors (e.g., the three types of galanin receptors) and representative orphan receptors contained within the full list of Pfam 7TM-1 (Family A) GPCRs. For Family B, we included all of the Group I-III receptors and a representative set of Family B, Group IV receptors within the full list of Pfam 7TM-2 GPCRs. After manual editing of the alignments, we constructed neighbor-joining phylogenetic trees for each family using ClustalX, using the correction for multiple substitutions provided by the software, followed by bootstrap analysis (1000 replicates).
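The distance and resampling steps behind a bootstrapped neighbor-joining analysis can be sketched in Python as follows. Several things here are assumptions rather than a description of ClustalX internals: the correction for multiple substitutions is taken to be Kimura's approximate formula, d = -ln(1 - p - 0.2p^2), and, to keep the example short, no tree is built. Instead, the sketch counts how often each sequence is the query's nearest neighbor across bootstrap replicates, which is a simplified stand-in for clade support on a neighbor-joining tree. All function names are hypothetical.

```python
import math
import random


def p_distance(a, b):
    """Fraction of differing residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0  # nothing comparable: treated as maximally distant (sketch choice)
    return sum(x != y for x, y in pairs) / len(pairs)


def kimura_distance(p):
    """Kimura's approximate correction for multiple substitutions (assumed formula)."""
    inner = 1.0 - p - 0.2 * p * p
    return -math.log(inner) if inner > 0 else float("inf")


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement to make one replicate."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def nearest_neighbor_support(alignment, query, replicates=1000, seed=0):
    """Count, per sequence, the replicates in which it is closest to the query.

    Ties go to the sequence listed first in the alignment.
    """
    rng = random.Random(seed)
    counts = {name: 0 for name in alignment if name != query}
    for _ in range(replicates):
        rep = bootstrap_replicate(alignment, rng)
        nearest = min(
            counts,
            key=lambda n: kimura_distance(p_distance(rep[query], rep[n])),
        )
        counts[nearest] += 1
    return counts
```

The input is a dictionary mapping sequence names to equal-length aligned strings. A full analysis would build a neighbor-joining tree from the corrected distance matrix of every replicate and report how often each clade recurs.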
For the subsequent subgroup-specific trees, we attempted to include all
complete TM1-TM7 sequences belonging to each subgroup (as well as some
partial sequences). These were identified by scanning the full Pfam
7TM-1 alignment and the GPCRDB listing of available GPCR sequences
(http://www.gpcr.org/7tm/), and by performing BLASTP
searches with the cloned and candidate Drosophila peptide
GPCRs as well as other representatives of each subgroup. After manual
editing of the alignments, neighbor-joining trees were constructed
and bootstrap analysis was performed as above. A set of 26 indoleamine (biogenic amine) receptors, which form a monophyletic group
(Kolakowski 1994), was used as an outgroup for the purpose of rooting
the subgroup-specific trees. When the position of the root was unclear,
the outgroup was omitted. All alignments, revised annotations, and
unabridged versions of the trees are located at
http://thalamus.wustl.edu/flyGPCR/peptideGPCR.html. In addition, the
revised annotations have been submitted to FlyBase (http://flybase.bio.indiana.edu/).
| |
ACKNOWLEDGMENTS |
|---|
This work was supported by National Institutes of Health Grant NS21749 and the Human Frontier Science Program Organization (P.H.T.). We thank Sean Eddy for helpful discussions, Kirstin Scott and Leslie Vosshall for sharing Drosophila GPCR sequence data, Lin Yang and Dori Sztipanovits for technical assistance, and Aguan Wei for comments on the manuscript. We also thank Cori Bargmann and Kemal Payza for sharing unpublished results.
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
| |
NOTE ADDED IN PROOF |
|---|
We have identified two additional ESTs for peptide GPCRs: AT008361
(CG1147) and AT0019640 (CG13229). Also, based on discussions with Jan
Veenstra (Universite Bordeaux) we now add two additional peptide genes
to our list: the SIFamide gene (currently listed as part of
CG4681; IFa, Vanden Broeck 2001) and the hugin
gene (CG6371).
| |
FOOTNOTES |
|---|
1 E-MAIL hewesr{at}thalamus.wustl.edu; FAX (314) 362-3446.
2 E-MAIL taghertp{at}pcg.wustl.edu; FAX (314) 362-3446.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.169901.
| |
REFERENCES |
|---|
|
|
|---|
-carbon template for the transmembrane helices in the rhodopsin family of G-protein-coupled receptors. J. Mol. Biol. 272: 144-164.