Genome Res. 14:580-590, 2004
©2004 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/04 $5.00

Letter

Genomic Analysis of the Nuclear Receptor Family: New Insights Into Structure, Regulation, and Evolution From the Rat Genome

1 Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
2 Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas 77030, USA
3 Huffington Center on Aging, Department of Otolaryngology, Baylor College of Medicine, Houston, Texas 77030, USA
4 Department of Microbiology and Molecular Genetics, University of Texas Medical School, Houston, Texas 77225, USA
Completion of the Rattus norvegicus genome sequence enabled a global inventory and analysis of the nuclear receptors (NRs) in three mammalian species. Forty-nine NR members were found in mouse and 48 in human. Forty-seven were found in the rat, with gaps in the assembly at the locations expected for the other two. Pairwise comparisons of their distribution in rat, mouse, and human identified 11 syntenic NR gene blocks, including three small clusters of two or three closely related genes, each spanning 40 kb to 1700 kb. The exon structure of the ligand-binding domain suggests that exon shuffling has played a role in the evolution of this family. An invariant splice junction in all members of the NR family except LXR suggests a functional role for the intron. The ligand-binding domains of PXR and CAR are among the most divergent in the family. Their higher nucleotide substitution rates may be related to the central role played by these two NRs in the metabolism of foreign compounds and may have resulted from limited positive selection.
Nuclear receptors (NRs) are transcription factors that regulate gene expression in the nucleus in response to various extracellular and intracellular signals (Tsai and O'Malley 1994
NRs share a similar modular domain structure, which includes, from N-terminus to C-terminus, the variable modulatory A/B domain, the DNA-binding domain (DBD), the hinge D-region, the ligand-binding domain (LBD), and an F-domain that is not found in all NRs. The DBD contains two zinc fingers in tandem that encompass
NRs constitute one of the largest groups of transcription factors in animals. Twenty-one NR genes are identified in the complete sequence of the Drosophila melanogaster genome (Adams et al. 2000
With the draft rat genome sequence available (Rat Genome Sequencing Project Consortium 2004
Nuclear Receptor Inventory in Rat, Mouse, and Human Genomes

The presence of six NR domains was examined in the rat, mouse, and human genomic sequences using GENEWISEDB (see Methods). The numbers of NR domains identified in the three genomes are summarized in Table 1 (see Supplemental Table 1 for a detailed inventory and genomic coordinates). Grouping and subsequent assignment of these domains to different NRs by BLAST revealed that most of the known mammalian NR genes are present in the current three genome sequences (Suppl. Table 1, Fig. 1); however, the sequences encoding several receptors are partially or completely missing in the rat and mouse genomes. The absence of the sequences encoding Rev-erbβ (NR1D2) and PNR (NR2E3) and the LBD of TLX (NR2E1) in the rat genome and the DBD of LXRβ (NR1H2) in the mouse genome can be explained by gaps in these two assemblies at the expected syntenic locations. The final tally of complete or partially identified NR genes was 48 for human, 49 for mouse, and 47 for rat.
The inventory also includes "domain singletons": genomic sequences that encode an NR domain but have no nearby sequences, or assembly gaps, that could complete an NR gene (Suppl. Table 1). They do not share sequence similarity with the single-domain DAX-1 (NR0B1) and SHP (NR0B2), two NRs known to lack a DBD.
Although some domain singletons might be a result of false positive identifications, others defy so quick a dismissal and remain puzzling. For example, a 522-bp sequence identified on human chromosome 16 encodes a partial A/B domain of the glucocorticoid receptor (GR, NR3C1), and is 95% identical to a portion of the first coding exon of GR. The observation that the intron downstream of the first coding exon of GR harbors a potentially active family-Y Alu element (Batzer et al. 1990
NR pseudogenes (
Four pseudogenes were detected in the mouse genome and three in the rat genome. Although there are two LRH1 pseudogenes in both the mouse and rat genomes, it is likely that the two sets were created independently because there are no syntenic pairings, and they have marked differences in their sequence features (data not shown).
Genomic Distribution of Nuclear Receptors
Three tightly linked NR gene clusters stand out within the syntenic blocks: cluster i, composed of TRα (NR1A1), RARα (NR1B1), and Rev-erbα (NR1D1) from block VII; cluster ii, composed of TRβ (NR1A2), Rev-erbβ (NR1D2), and RARβ (NR1B2) from block VIII; and cluster iii, composed of SF1 (NR5A1) and GCNF1 (NR6A1), a subset of block X. They span 270 kb, 1700 kb, and 40 kb, respectively, in the rat genome. Salient features of clusters i and ii in the human and rat genomes were described previously (Laudet et al. 1992 with Rev-erb (Lazar et al. 1989 and Rev-erb do not share terminal exons (Koh and Moore 1999
The genome sequences bring details of this organization into focus. The gene order, spacing, and orientations are different in the extant clusters i and ii (Fig. 2). Although TR and Rev-erb maintain the same tail-to-tail orientation, the pair is inverted relative to RAR in the two clusters. Among these six genes, only TR
Given the propensity of chromosomal rearrangements to scatter the majority of the NR genes, it is interesting that both clusters remained closely linked, suggesting that natural selection favors the clusters. All other syntenic groups of NR genes found here belong to a set of large syntenic blocks shared by the rat, mouse, and human genomes and may simply reflect the current state of chromosomal organization on the whole-genome scale. Studies of segmental duplication suggest that recent duplication events have contributed little to the evolution of the NRs in human, mouse, and rat, as no NR genes or their functional domains are found in the large duplicated regions in the human and rat genomes (Bailey et al. 2002
Phylogenetic Analysis
SHP (small heterodimer partner) and DAX-1 (dosage-sensitive sex and AHC critical region on the X, gene 1) of the NR0B group were thought to possibly represent an ancient gene structure (cf. Guo et al. 1996 The LBDs of most NR members have changed little since the divergence of humans and rodents. This is manifested in the tree as extremely short terminal branch lengths, that is, those branches representing the last common ancestor of the three species. However, three groups (NR1I2-3, NR1H5, and NR0B1-2; see Fig. 3A, shaded groups) were significantly more divergent among the three species. Nucleotide substitution analysis revealed that the synonymous rates in the LBDs of CAR (NR1I3) and PXR (NR1I2) are average for the family, whereas the nonsynonymous rates were 6.4 and 3.7 times higher than the average, respectively (Suppl. Fig. 1).
Evaluation of the terminal branch lengths of all NR members revealed cases where the rat sequence was closer to human than the mouse was: RAR
There was too little variation in the
The KA/KS ratios of the LBDs indicate that the NRs are subject to strong purifying selection. No positive selection was detected by Student's t-test. However, because the KA/KS ratios of the LBDs of PXR and CAR were 4.0 and 5.6 times greater than the averages, respectively, these two domains may have experienced limited positive selection in the context of NR evolution. For PXR and CAR, the increased KA/KS ratios in the LBDs could be more readily explained by their biological functions. PXR, an orphan NR preferentially expressed in the liver and intestine, responds to potentially harmful chemicals by activating the expression of cytochrome P-450 genes crucial for the detoxification of a wide variety of structurally diverse xenobiotics and endobiotics (Kliewer et al. 1998
We investigated the structural implications of the LBD sequence variation in the PXR group. Thirty-three variable sites in the multiple sequence alignment of the LBDs of PXR from human, mouse, rat, rhesus, pig, rabbit, dog, chicken, and zebrafish were mapped on the tertiary structure of the LBD of the human PXR (Watkins et al. 2003
By contrast, the longest α-helix, α10, has only four variable sites, all extending toward the protein interior. The tertiary structure of the PPAR-RXR heterodimer (Gampe Jr. et al. 2000) indicates that α10 is involved in the interaction with RXR. α10 probably functions similarly in other heterodimeric partners of RXR, including PXR. Thus variation of the outward face of PXR α10 may be constrained by this important function.
Exon Structures of DBD and LBD
Eight patterns are evident in the DBD splice junctions (Fig. 5A). The junction is located at various positions in between the two zinc finger motifs in four of the eight groups. It is located in the first zinc finger motif in the NR2B1-3, NR2C1&2 group, and it is located at different positions within the second zinc finger in NR2A1&2, NR2F6, and NR2E1 groups. The splice junction was lost from NR1H2&3, NR2F1&2, and NR6A1. Because these do not form a monophyletic group in the tree (Fig. 3A), the intron was probably lost in three separate events. Members of subfamilies NR1 and NR3 show little variation in junction location, whereas subfamily NR2 has several variants. In two cases members of different subfamilies shared the same splice junction: NR1 and NR5, and NR1I and NR4. This result, taken together with the phylogenetic results described above, may suggest a complex evolutionary relationship between the subfamilies NR1, 4, and 5 (see Phylogenetic Analysis above). Alternatively, there could be preferred sites for acquiring introns. Elucidation of the principles governing the dynamics of intron acquisition and change over long evolutionary timescales is needed to understand these relationships.
The LBD was less conserved, overall, than the DBD compared across the whole family (alignments in Suppl. Fig. 2). Within it, three sequence motifs were identified (see Methods), although none of those were as conserved as the zinc finger motifs in the DBD. Motif I, spanning
Up to four splice junctions were found in the peptide sequences of the region of the LBDs to which our analysis was confined (see the Pfam profiles used to identify LBD, and Methods). The locations of the four introns were confined to distinct regions of the LBD as defined by the aforementioned structural motifs. The first is within motif I; second, between motifs I and II; third, within motif II; and fourth, downstream from motif II. Introns were lost multiple times at regions 1, 2, and 4. Moreover, the precise location of introns 1, 2, and 4 was variable. In distinct contrast, intron 3 was invariant in that it was present in all of the NRs except LXR
The conserved LBD splice junction was likely to have originated early in the family and was subsequently conserved in evolution: it was also observed in the LBD of the Danio rerio SVP46 (NR2F5, data not shown). The selective pressure maintaining the splice junction could arise from conservation of amino acid sequence. The aforementioned aspartic acid codon is split by this phase-1 splice junction. However, motif II is much less conserved than the zinc finger motifs or motif I, and some NR sub-families have neither the aspartic acid nor a glutamic acid at the splice junction. Thus, some sequence or structural motif in the NR mRNA involved in its regulation, processing, or stability may be the determinant of the conservation of this splice junction. Because LXR
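The notion of a phase-1 junction that splits a codon can be made concrete with a small sketch. It assumes the usual convention that an intron's phase is the number of coding nucleotides upstream of it, modulo 3 (phase 0 falls between codons, phase 1 after the first base of a codon, phase 2 after the second). The function names and the exon lengths in the usage note are hypothetical and are not taken from the NR gene models analyzed here.

```python
from itertools import accumulate
from typing import List, Optional, Sequence


def intron_phases(cds_exon_lengths: Sequence[int]) -> List[int]:
    """Return the phase (0, 1, or 2) of each intron in a coding sequence.

    cds_exon_lengths holds only the coding portion of each exon, in
    transcript order. The intron after exon i has phase equal to the
    cumulative coding length up to and including exon i, modulo 3.
    """
    totals = list(accumulate(cds_exon_lengths))
    # The last total is the end of the coding sequence, not an intron.
    return [t % 3 for t in totals[:-1]]


def split_codon_index(cds_exon_lengths: Sequence[int],
                      intron_number: int) -> Optional[int]:
    """Return the 0-based index of the codon interrupted by an intron.

    intron_number is 1-based. A phase-0 intron lies between codons,
    so None is returned in that case.
    """
    upstream = sum(cds_exon_lengths[:intron_number])
    return upstream // 3 if upstream % 3 else None
```

For hypothetical coding exon lengths of 100, 50, and 62 nucleotides, `intron_phases([100, 50, 62])` gives phase 1 for the first intron and phase 0 for the second, and `split_codon_index([100, 50, 62], 1)` identifies codon 33 (0-based) as the one split by the first intron.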
Comparison of the 26 different splicing patterns of the sequence encoding the LBD (Fig. 5B) conveys the sense that large-scale sequence changes, intron loss or gain, and exon addition or substitution played an important role in shaping the evolution of this family. The loss or gain of the first, second, or fourth introns in the LBD occurred within many NRs. Large-scale innovative changes in the coding sequence of the LBD region may have contributed significantly to the rise of some new NR genes. FXR Further variation in LBD splice junction patterns may exist in other isoforms, and thus a full accounting of all isoforms, in these and other species, will be important.
Conclusion
Paralogous NR family members exhibit a variety of different exon structures in both their DBDs and LBDs. Among the variation, the conserved location of the splice junction in the second motif of the LBD stands out as a peculiar phenomenon. It may prove to be a more reliable signature for the NR genes than the C4 zinc finger. Very similar findings are reported in other gene families, for example, the chemoreceptor superfamily (Robertson et al. 2003
Identification of Nuclear Receptor Genes in Human, Mouse, and Rat Genomes

Six structural and functional domains specific for members of the NR family were obtained from Pfam (Bateman et al. 2002 α-helix 3 and extended through α-helix 10 (Wurtz et al. 1996
The mRNA and protein sequences of 62 representative NRs (Robinson-Rechavi et al. 2001
The human, mouse, and rat genomic sequences used in this study were human genome build 34 of June 2003, mouse genome build of February 2003, and rat genome build of April 2003. To take advantage of parallel computing, each of these three genomes was partitioned into 750-kb segments with 2-kb overlaps. Only domains of the NRs identified at the previous step with stringently high E-values were searched in the genomic sequences using GENEWISEDB (Wise 2.2.0; Birney and Durbin 2000
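The partitioning step described above can be sketched as follows. This is an illustration under assumptions, not the authors' pipeline: it treats "750-kb segments with 2-kb overlaps" as windows of 750,000 bp in which each window shares its last 2,000 bp with the start of the next, and it uses 0-based half-open coordinates. The function name `partition` is hypothetical.

```python
from typing import Iterator, Tuple


def partition(length: int,
              segment: int = 750_000,
              overlap: int = 2_000) -> Iterator[Tuple[int, int]]:
    """Yield 0-based half-open (start, end) windows covering [0, length).

    Consecutive windows share `overlap` bases, so a feature shorter than
    the overlap that straddles a boundary is fully contained in at least
    one window.
    """
    if not 0 <= overlap < segment:
        raise ValueError("overlap must be non-negative and smaller than segment")
    step = segment - overlap
    start = 0
    while start < length:
        end = min(start + segment, length)
        yield start, end
        if end == length:
            break  # the final window reached the end of the sequence
        start += step
```

Each window can then be searched independently, which is what makes the search easy to run in parallel; hits found twice in an overlap region would need to be merged afterwards using their genomic coordinates.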
Domains identified in each genome were grouped together based on their orientation and the coordinates of their genomic locations, and were compared to the 62 NR protein sequences using the best BLASTP hit as the identity. The GENEWISEDB search results were also parsed to create custom annotation tracks in the UCSC genome browser (http://genome.ucsc.edu/
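Grouping domain hits by orientation and genomic coordinates might look like the sketch below. The `DomainHit` record, the `group_hits` function, and the `max_gap` threshold are all hypothetical; the text does not state the distance criterion that was actually used, so the 100-kb default is only a placeholder.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DomainHit:
    chrom: str
    strand: str  # "+" or "-"
    start: int
    end: int
    domain: str  # e.g. "DBD" or "LBD"


def group_hits(hits: List[DomainHit], max_gap: int = 100_000) -> List[List[DomainHit]]:
    """Group hits on the same chromosome and strand that lie close together.

    Hits are sorted by position; a hit joins the current group when it is
    on the same chromosome and strand and starts within max_gap bases of
    the furthest end seen so far in that group.
    """
    groups: List[List[DomainHit]] = []
    for hit in sorted(hits, key=lambda h: (h.chrom, h.strand, h.start)):
        if groups:
            current = groups[-1]
            same_locus = (current[-1].chrom == hit.chrom
                          and current[-1].strand == hit.strand)
            group_end = max(h.end for h in current)
            if same_locus and hit.start - group_end <= max_gap:
                current.append(hit)
                continue
        groups.append([hit])
    return groups
```

Each resulting group is a candidate NR gene; in the procedure described above, its identity would then be assigned from the best BLASTP hit against the reference NR protein set.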
Pseudogenes were sought among NR genes that had more than one copy in a genome, provided that the mRNA sequence of the gene or of one of its orthologs was available. The mRNA sequence was aligned using TBLASTN and BLAT (Kent 2002
Statistical Test for Clustering of Nuclear Receptor Genes in the Rat Genome
Sequence Analyses
Corrected but unaligned LBD peptide sequences were searched for conserved sequence motifs (http://blocks.fhcrc.org/
CLUSTALW correctly aligned the residues corresponding to motifs I and III but not motif II. In particular, the subfamily NR0B alignment was greatly improved using motif II as a guide; minor adjustments were required in some other subfamilies. Phylogenetic tree reconstruction of both the protein and DNA alignments was performed using an implementation of the neighbor-joining method in the PAUP*4.0 software package (Swofford 2003
KA/KS of every orthologous gene pair was calculated as the measure of sequence evolution (Li et al. 1985
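Once KA and KS have been estimated for each orthologous pair, the comparisons reported in the Results (for example, a ratio several times greater than the family average) reduce to simple arithmetic. The sketch below shows only that downstream step; it does not implement the substitution-rate estimation of Li et al. (1985). The function names are hypothetical, and it assumes the family average is the plain mean over all genes, including the gene being compared.

```python
from statistics import mean
from typing import Dict, Tuple


def ka_ks_ratios(rates: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Map each gene to KA/KS, given (KA, KS) per orthologous pair.

    Pairs with KS == 0 are skipped because the ratio is undefined.
    """
    return {gene: ka / ks for gene, (ka, ks) in rates.items() if ks > 0}


def fold_over_mean(ratios: Dict[str, float]) -> Dict[str, float]:
    """Express each ratio as a multiple of the mean ratio across all genes."""
    average = mean(ratios.values())
    return {gene: ratio / average for gene, ratio in ratios.items()}
```

A ratio well below 1 is the usual signature of purifying selection, so a gene can show a several-fold elevation over the family mean while its KA/KS still remains below 1.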
Splice Junction Analysis
We thank Hugh M. Robertson for helpful discussions. A.J.C., R.B.L., and F.A.P. were supported in part by the NIH NURSA orphan receptor program, grant #U19DK62434-01. This work was supported by grants 5-U54 HG02051 (Human/MB) from the NHGRI and 5-U54 HG02345 (Rat) from the NHGRI/NHLBI to R.A.G. The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.2160004.
5 Corresponding author. [Supplemental material is available online at www.genome.org.]
Adams, M.D., Celniker, S.E., Holt, R.A., Evans, C.A., Gocayne, J.D., Amanatides, P.G., Scherer, S.E., Li, P.W., Hoskins, R.A., Galle, R.F., et al. 2000. The genome sequence of Drosophila melanogaster. Science 287: 2185-2195.
Bailey, J.A., Gu, Z., Clark, R.A., Reinert, K., Samonte, R.V., Schwartz, S., Adams, M.D., Myers, E.W., Li, P.W., and Eichler, E.E. 2002. Recent segmental duplications in the human genome. Science 297: 1003-1007.
Bateman, A., Birney, E., Cerruti, L., Durbin, R., Etwiller, L., Eddy, S.R, Griffiths-Jones, S., Howe, K.L., Marshall, M., and Sonnhammer, E.L. 2002. The Pfam protein families database. Nucleic Acids Res. 30: 276-280.
Batzer, M.A., Kilroy, G.E., Richard, P.E., Shaikh, T.H., Desselle, T.D., Hoppens, C.L., and Deininger, P.L. 1990. Structure and variability of recently inserted Alu family members. Nucleic Acids Res. 18: 6793-6798.
Birney, E. and Durbin, R. 2000. Using GeneWise in the Drosophila annotation experiment. Genome Res. 10: 547-548.
Bonnelye, E., Vanacker, J.M., Desbiens, X., Begue, A., Stehelin, D., and Laudet, V. 1994. Rev-erb
Boudet, N., Aubourg, S., Toffano-Nioche, C., Kreis, M., and Lecharny, A. 2001. Evolution of intron/exon structure of DEAD helicase family genes in Arabidopsis, Caenorhabditis, and Drosophila. Genome Res. 11: 2101-2114.
Eddy, S. 1998. Profile hidden Markov models. Bioinformatics 14: 755-763.
Forrest, D., Sjoberg, M., and Vennstrom, B. 1990. Contrasting developmental and tissue-specific expression of
Gampe Jr., R.T., Montana, V.G., Lambert, M.H., Miller, A.B., Bledsoe, R.K., Milburn, M.V., Kliewer, S.A., Willson, T.M., and Xu, H.E. 2000. Asymmetry in the PPAR
Giguere, V. 1999. Orphan nuclear receptors: From gene to function. Endocr. Rev. 20: 689-725.
Greschik, H., Wurtz, J.-M., Hublitz, P., Kohler, F., Moras, D., and Schule, R. 1999. Characterization of the DNA-binding and dimerization properties of the nuclear orphan receptor germ cell nuclear factor. Mol. Cell. Biol. 19: 690-703.
Guo, W., Burris, T.P., Zhang, Y.H., Huang, B.L., Mason, J., Copeland, K.C., Kupfer, S.R., Pagon, R.A., and McCabe, E.R. 1996. Genomic sequence of the DAX1 gene: An orphan receptor responsible for X-linked adrenal hypoplasia congenita and hypergonadotropic hypogonadism. J. Clin. Endocrinol. Metab. 81: 2481-2486.
Gurates, B., Amsterdam, A., Tamura, M., Yang, S., Zhou, J., Fang, Z., Amin, S., Sebastian, S., and Bulun, S.E. 2003. WT1 and DAX-1 regulate SF-1-mediated human P450arom gene expression in gonadal cells. Mol. Cell Endocrinol. 208: 61-75.
Henikoff, S., Henikoff, J.G., Alford, W.J., and Pietrokovski, S. 1995. Automated construction and graphical presentation of protein blocks from unaligned sequences. Gene 163: GC17-26.
Johansson, L., Thomsen, J.S., Damdimopoulos, A.E., Spyrou, G., Gustafsson, J.A., and Treuter, E. 2000. The orphan nuclear receptor SHP utilizes conserved LXXLL-related motifs for interactions with ligand-activated estrogen receptors. Mol. Cell. Biol. 20: 1124-1133.
Kent, W.J. 2002. BLAT: The BLAST-like Alignment Tool. Genome Res. 12: 656-664.
Kliewer, S.A., Moore, J.T., Wade, L., Staudinger, J.L., Watson, M.A., Jones, S.A., McKee, D.D., Oliver, B.B., Willson, T.M., Zetterstrom, R.H., et al. 1998. An orphan nuclear receptor activated by pregnanes defines a novel steroid signaling pathway. Cell 92: 73-82.
Koh, Y.S. and Moore, D.S. 1999. Linkage of the nuclear hormone receptor genes NR1D2, THRB, and RARB: Evidence for an ancient, large-scale duplication. Genomics 57: 289-292.
Laudet, V. 1997. Evolution of the nuclear receptor superfamily: Early diversification from an ancestral orphan receptor. J. Mol. Endocrinol. 19: 207-226.
Laudet, V., Hanni, C., Coll, J., Catzeflis, F., and Stehelin, D. 1992. Evolution of the nuclear receptor gene superfamily. EMBO J. 11: 1003-1013.
Lazar, M.A., Hodin, R.A., Darling, D.S., and Chin, W.W. 1989. A novel member of the thyroid/steroid hormone receptor family is encoded by the opposite strand of the rat c-erbA
Lehmann, J.M., McKee, D.D., Watson, M.A., Willson, T.M., Moore, J.T., and Kliewer, S.A. 1998. The human orphan nuclear receptor PXR is activated by compounds that regulate CYP3A4 gene expression and cause drug interactions. J. Clin. Invest. 102: 1016-1023.
Li, W.-H., Wu, C.I., and Luo, C.C. 1985. A new method for estimating synonymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes. Mol. Biol. Evol. 2: 150-174.
Maglich, J.M., Sluder, A., Guan, X., Shi, Y., McKee, D.D., Carrick, K., Kamdar, K., Willson, T.M., and Moore, J.T. 2001. Comparison of complete nuclear receptor sets from the human, Caenorhabditis elegans and Drosophila genomes. Genome Biol. 2: RESEARCH0029.
Maglich, J.M., Stoltz, C.M., Goodwin, B., Hawkins-Brown, D., Moore, J.T., and Kliewer, S.A. 2002. Nuclear pregnane X receptor and constitutive androstane receptor regulate overlapping but distinct sets of genes involved in xenobiotic detoxification. Mol. Pharmacol. 62: 638-646.
Mangelsdorf, D.J., Thummel, C., Beato, M., Herrlich, P., Schutz, G., Umesono, K., Blumberg, B., Kastner, P., Mark, M., Chambon, P., et al. 1995. The nuclear receptor superfamily: The second decade. Cell 83: 835-839.
Moore, L.B., Parks, D.J., Jones, S.A., Bledsoe, R.K., Consler, T.G., Stimmel, J.B., Goodwin, B., Liddle, C., Blanchard, S.G., Willson, T.M., et al. 2000. Orphan nuclear receptors constitutive androstane receptor and pregnane X receptor share xenobiotic and steroid ligands. J. Biol. Chem. 275: 15122-15127.
Nuclear Receptors Committee. 1999. A unified nomenclature system for the nuclear receptor superfamily. Cell 97: 1-20.
Otte, K., Kranz, H., Kober, I., Thompson, P., Hoefer, M., Haubold, B., Remmel, B., Voss, H., Kaiser, C., Albers, M., et al. 2003. Identification of farnesoid X receptor
Rat Genome Sequencing Project Consortium. 2004. Genome sequence of the Brown Norway Rat yields insights into mammalian evolution. Nature (in press).
Robertson, H.M., Warr, C.G., and Carlson, J.R. 2003. Molecular evolution of the insect chemoreceptor superfamily in Drosophila melanogaster. Proc. Natl. Acad. Sci. 100: 14537-14542.
Robinson-Rechavi, M., Carpentier, A.-S., Duffraisse, M., and Laudet, V. 2001. How many nuclear hormone receptors in the human genome? Trends Genet. 17: 554-556.
Sluder, A.E. and Maina, C.V. 2001. Nuclear receptors in nematodes: Themes and variations. Trends Genet. 17: 206-213.
Swofford, D.L. 2003. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4. Sinauer Associates, Sunderland, MA.
Tsai, M.J. and O'Malley, B.W. 1994. Molecular mechanisms of action of steroid/thyroid receptor superfamily members. Annu. Rev. Biochem. 63: 451-486.
Tuzun, E., Bailey, J.A., and Eichler, E.E. 2004. Recent segmental duplications in the working draft assembly of the brown Norway rat. Genome Res. (this issue).
Wang, L.H., Tsai, S.Y., Cook, R.G., Beattie, W.G., Tsai, M.J., and O'Malley, B.W. 1989. COUP transcription factor is a member of the steroid receptor superfamily. Nature 340: 163-166.
Watkins, R.E., Wisely, G.B., Moore, L.B., Collins, J.L., Lambert, M.H., Williams, S.P., Willson, T.M., Kliewer, S.A., and Redinbo, M.R. 2001. The human nuclear xenobiotic receptor PXR: Structural determinants of directed promiscuity. Science 292: 2329-2333.
Watkins, R.E., Davis-Searles, P.R., Lambert, M.H., and Redinbo, M.R. 2003. Coactivator binding promotes the specific interaction between ligand and the pregnane X receptor. J. Mol. Biol. 331: 815-828.
Wurtz, J.M., Bourguet, W., Renaud, J.P., Vivat, V., Chambon, P., Moras, D., and Gronemeyer, H. 1996. A canonical structure for the ligand-binding domain of nuclear receptors. Nat. Struct. Biol. 3: 87-94.
Zar, J.H. 1984. The Poisson distribution and randomness. Biostatistical analysis, 2nd ed. Prentice-Hall, Inc., Englewood Cliffs, NJ.
Zhang, M. and Chiang, J.Y. 2001. Transcriptional regulation of the human sterol 12
http://genome.ucsc.edu/; UCSC genome bioinformatics.
http://blocks.fhcrc.org/; blocks WWW server.
Received November 12, 2003; accepted in revised form December 10, 2003.