Vol. 9, Issue 5, 449-456, May 1999
LETTER
ABSTRACT
DSPG3, the human homolog of chick PG-Lb, is a member of the small leucine-rich repeat proteoglycan (SLRP) family, which also includes decorin, biglycan, fibromodulin, and lumican. In contrast to the other SLRPs, DSPG3 is predominantly expressed in cartilage. In this study, we have determined that the human DSPG3 gene is composed of seven exons: exon 2 includes the start codon, exons 4-7 code for the leucine-rich repeats, exons 3 and 7 contain the potential glycosaminoglycan attachment sites, and exon 7 contains the potential N-glycosylation sites and the stop codon. We have identified two polymorphisms: a 19-nucleotide insertion/deletion in intron 1 and a tetranucleotide (TATT)n repeat in intron 5. Analysis of 1.6 kb of upstream promoter sequence of DSPG3 reveals three TATA boxes, one of which is 20 nucleotides upstream of the transcription start site. The transcription start site precedes the translation start site by 98 nucleotides. There are 14 potential binding sites for SOX9, a transcription factor present in cartilage, in the promoter and first intron of DSPG3. We have examined the evolution of the SLRP gene family and found that gene products clustered together in the evolutionary tree are encoded by genes with similar genomic structures. Hence, it appears that the majority of the introns in the SLRP genes were inserted after the differentiation of the SLRP genes from an ancestral gene that was most likely composed of 2-3 exons.
[The sequence data described in this paper have been submitted to GenBank under accession nos. AF031658 and U63814.]
INTRODUCTION
DSPG3 is the human homolog of chick PG-Lb, an extracellular matrix proteoglycan originally isolated from epiphyseal cartilage (Shinomura et al. 1983; Deere et al. 1996). DSPG3 (PG-Lb) is a member of the small leucine-rich repeat proteoglycan (SLRP) family, which also includes decorin, biglycan, fibromodulin, and lumican (Krusius and Ruoslahti 1986; Fisher et al. 1989; Oldberg et al. 1989; Shinomura and Kimata 1992; Deere et al. 1996). The core proteins of the SLRPs are composed of 6-10 tandem repeats of 24 amino acid residues that are rich in leucine (for review, see Kobe and Deisenhofer 1994). The leucine-rich repeats (LRRs) are preceded by four cysteines and followed by two cysteines that are presumed to form disulfide bonds on either side of the LRRs. Related LRR glycoproteins include prolargin (PRELP), osteoglycin (formerly known as osteoinductive factor), and osteomodulin (Madisen et al. 1990; Bengtsson et al. 1995; Grover et al. 1996; Ohno et al. 1996).
The SLRPs are clustered in a few chromosomal regions. Decorin, lumican, and DSPG3 map to human chromosome 12q21-q22 (McBride et al. 1990; Danielson et al. 1993; Vetter et al. 1993; Chakravarti et al. 1995; Grover et al. 1995; Deere et al. 1996). Fibromodulin and PRELP are localized to human chromosome 1q32 (Sztrolovics et al. 1994; Grover et al. 1996). Currently, only one SLRP, biglycan, maps to human chromosome Xq28 (McBride et al. 1990; Fisher et al. 1991; Traupe et al. 1992).
The SLRPs have related genomic structures. The fibromodulin, lumican, and PRELP genes are each composed of three exons; in each case, the first intron immediately precedes the translation start site, whereas the second intron lies in the last LRR (Antonsson et al. 1993; Grover et al. 1995, 1996). The decorin and biglycan genes are both composed of eight exons, and the positions of two of their introns correspond to those identified in fibromodulin, lumican, and PRELP (Fisher et al. 1991; Danielson et al. 1993; Vetter et al. 1993). The other five introns lie within the LRRs, at identical sites in the two genes.
DSPG3, in contrast to the other SLRPs, is predominantly expressed in cartilage (Shinomura and Kimata 1992; Deere et al. 1996; Kurita et al. 1996). Several important extracellular matrix proteins are expressed primarily in cartilage: collagen types II, IX, X, and XI, aggrecan, and link protein (for reviews, see Heinegard and Oldberg 1989; Hall and Newman 1991). Promoter studies of these genes have identified regions that may be important for their cartilage-specific transcription, including a binding site for SOX9 in the first intron of the type II collagen gene (Nishimura et al. 1989; Rhodes and Yamada 1995; Thomas et al. 1995; Krebsbach et al. 1996; Lefebvre et al. 1997). SOX9 is an HMG (high-mobility group) transcription factor. Mutations in SOX9 cause campomelic dysplasia, a skeletal dysplasia, which suggests that SOX9 is important for the regulation of genes in normal cartilage development (Foster et al. 1994; Wagner et al. 1994).
In this study we have delineated the genomic structure of human DSPG3 and compared its exon/intron boundaries with those of lumican and decorin. We have also sequenced the promoter region of human DSPG3 and identified transcriptional elements that may be important for the cartilage-specific expression of DSPG3. In addition, using available protein sequence data, we have performed the most complete evolutionary analysis of the SLRP gene family to date.
RESULTS AND DISCUSSION
Genomic Structure of DSPG3
The human DSPG3 gene is composed of seven exons and spans more than 12 kb (Table 1). This structure is conserved in murine PG-Lb (Iwata et al. 1998). Exon 1 consists of 5'-untranslated sequence, and the start codon is present in the second exon. Exons 4-7 encode the LRRs. Exons 3 and 7 contain the potential glycosaminoglycan attachment sites (codons 64, 96, and 320), and exon 7 includes the consensus N-glycosylation sites (codons 283 and 302), the stop codon, and the 3'-untranslated region. Most introns are ~1 kb in size; the exceptions are intron 1 (2.2 kb) and intron 2, whose size as determined by Southern blot analysis is at least 5 kb (data not shown). All of the splice donor and acceptor sites follow the consensus GT-AG rule. Two intronic polymorphisms with low heterozygosity were identified: an insertion/deletion of 19 nucleotides (TTGAACATCTGGCAGCAAT, nucleotides 3747-3765) in intron 1 (heterozygosity = 0.21) and a tetranucleotide (TATT)n repeat in intron 5 (heterozygosity = 0.39) (Table 2) (GenBank accession nos. AF031658 and U63814, respectively).
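The text does not state which heterozygosity estimator was used. As a minimal sketch, assuming the reported values are expected heterozygosity (Nei's gene diversity, H = 1 - sum of p_i squared over allele frequencies p_i), they could be computed from allele calls as below. The function names and the toy genotypes are hypothetical and are not data from this study.

```python
from collections import Counter


def expected_heterozygosity(alleles):
    """Nei's gene diversity, H = 1 - sum(p_i**2), from a flat list of allele calls.

    Hypothetical helper: the estimator actually used in the study is not stated.
    """
    counts = Counter(alleles)
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def observed_heterozygosity(genotypes):
    """Fraction of individuals whose two alleles differ (genotypes given as pairs)."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


# Toy example (not study data): a biallelic insertion/deletion locus.
toy_genotypes = [("ins", "ins"), ("ins", "del"), ("ins", "ins"), ("del", "ins")]
toy_alleles = [allele for pair in toy_genotypes for allele in pair]
print(expected_heterozygosity(toy_alleles))   # 6 ins, 2 del: 1 - (0.75**2 + 0.25**2) = 0.375
print(observed_heterozygosity(toy_genotypes))  # 2 of 4 individuals heterozygous = 0.5
```

For a biallelic locus such as the 19-nucleotide insertion/deletion, H reduces to 2pq; a small-sample unbiased version multiplies H by n/(n - 1), where n is the number of sampled chromosomes.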
Promoter Sequence of DSPG3
The sequence of the promoter and first intron of DSPG3 has been deposited in GenBank (accession no. AF031658). The start site of transcription is 98 nucleotides upstream of the start codon in the mRNA sequence, as demonstrated by a ribonuclease protection assay (data not shown). The transcription start site is 20 nucleotides downstream of a TATA box (Fig. 1). Several potential transcription factor-binding sites are present in the promoter sequence. The most notable, (A/T)(A/T)CAA(A/T)G, is the consensus DNA-binding site for HMG domain transcription factors, including SOX9 (Grosschedl et al. 1994; Sudbeck et al. 1996; Lefebvre et al. 1997). This site occurs 4 times in the promoter region and 10 times in the first intron of DSPG3, and several of these sites are conserved in murine PG-Lb (Iwata et al. 1998). Mutations in SOX9 cause campomelic dysplasia, a skeletal dysplasia associated with sex reversal and lethality (Foster et al. 1994; Wagner et al. 1994). These mutations suggest that SOX9 is an important transcription factor in cartilage as well as in testis development. DSPG3 is primarily expressed in cartilage, and SOX9 may play an important role in the tissue-specific expression of this gene. Support for this hypothesis comes from the cartilage-specific type II collagen gene, COL2A1, which was recently shown to contain a SOX9-binding site in the first intron that is necessary for correct tissue-specific expression (Bell et al. 1997; Lefebvre et al. 1997). Further studies will be necessary to determine whether the potential SOX9-binding sites are required for DSPG3 expression in cartilage.
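Because the HMG consensus is a degenerate 7-bp motif, counting candidate sites amounts to a pattern search. The sketch below is a hypothetical helper, not the software used by the authors: it assumes that sites are counted on both strands, reports overlapping matches via a regex lookahead, and uses an invented test sequence rather than the DSPG3 promoter.

```python
import re

# Consensus (A/T)(A/T)CAA(A/T)G on the given strand; the lookahead allows overlapping hits.
FORWARD = re.compile(r"(?=([AT][AT]CAA[AT]G))")
# Reverse complement of the consensus, C(A/T)TTG(A/T)(A/T), i.e. a site on the other strand.
REVERSE = re.compile(r"(?=(C[AT]TTG[AT][AT]))")


def find_hmg_sites(seq):
    """Return sorted (1-based start, strand, matched 7-mer) tuples for every consensus hit."""
    seq = seq.upper()
    hits = [(m.start() + 1, "+", m.group(1)) for m in FORWARD.finditer(seq)]
    hits += [(m.start() + 1, "-", m.group(1)) for m in REVERSE.finditer(seq)]
    return sorted(hits)


# Invented sequence with one site on each strand.
print(find_hmg_sites("GGAACAATGCCCATTGTTGG"))
# [(3, '+', 'AACAATG'), (12, '-', 'CATTGTT')]
```

Applied to the deposited promoter and intron 1 sequence, a scan of this kind is one way to obtain per-region site counts, although the exact tally depends on whether both strands are counted.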
Evolutionary Analysis of the SLRP Gene Family
In this study we have compared the genomic structures of the human SLRP genes and related proteins to determine whether their introns share placement (Fig. 2). The newly determined genomic structure of DSPG3 shows that the first intron is present in the 5'-untranslated region, whereas the last (sixth) intron is present in the last LRR, a pattern similar to the genomic structures of the other SLRPs. However, the placement of introns 2-5 in the LRR region of DSPG3 shows no correspondence with introns 2-6 of decorin/biglycan (which are identical to each other), which also lie in the region encoding the LRRs (Fig. 2). In addition, the last (seventh) LRR of DSPG3 aligns with the seventh LRR of decorin and biglycan, not with their last (tenth) LRR. Therefore, the genomic structures of DSPG3 and decorin/biglycan appear to have evolved independently, but with a shared tendency for introns to occur in the LRR-encoding region. The other major group of SLRP genes (fibromodulin, lumican, and PRELP) has three exons. As in DSPG3, the first intron occurs upstream of the start site, whereas the second occurs in the final LRR. Note, however, that the final LRR of DSPG3 is repeat seven (LRRs 8-10 are absent or deleted), and that the exact position of the final intron differs between biglycan/decorin and fibromodulin/lumican/PRELP. Moreover, the final intron in fibromodulin is shifted 1 bp relative to that of lumican and PRELP. It therefore seems that the ancestral SLRP gene had one intron upstream of the start codon and that most, if not all, of the additional introns were introduced separately in at least three lineages: DSPG3, biglycan/decorin, and fibromodulin/lumican/PRELP. This observation is in accord with the introns-late hypothesis (Stoltzfus et al. 1997).
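Statements such as "shifted 1 bp relative to" compare intron positions after projecting them onto a common protein alignment. As a minimal sketch, an intron can be placed at 3 × (alignment column) + phase, where phase 0 means the intron lies just before the codon and phases 1 and 2 mean it interrupts the codon after its first or second base. The helper names, this phase convention, and the toy alignment are assumptions for illustration.

```python
def residue_to_column(aligned, residue_index):
    """Map a 0-based residue index in the ungapped protein to its 0-based
    column in the gapped alignment row ('-' marks gaps)."""
    seen = -1
    for column, char in enumerate(aligned):
        if char != "-":
            seen += 1
            if seen == residue_index:
                return column
    raise IndexError("residue index is beyond the end of the sequence")


def intron_alignment_position(aligned, codon_index, phase):
    """Intron position in nucleotide units of the alignment.

    codon_index: 0-based codon (residue) in or before which the intron lies.
    phase: 0 = before the codon, 1 = after its first base, 2 = after its second.
    """
    if phase not in (0, 1, 2):
        raise ValueError("phase must be 0, 1, or 2")
    return 3 * residue_to_column(aligned, codon_index) + phase


# Toy alignment (invented): both introns fall at the same aligned valine codon,
# but in different phases, so their positions differ by 1 bp.
gene_a = "MKL-VLS"
gene_b = "MK-LVLS"
shift = intron_alignment_position(gene_b, 3, 1) - intron_alignment_position(gene_a, 3, 0)
print(shift)  # 1
```

Working in nucleotide units of the alignment, rather than in residue numbers of each protein, is what makes positions comparable across genes whose LRR regions contain different insertions and deletions.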
To analyze further the evolutionary history of the SLRP gene family, protein sequences from SLRPs and related LRR genes were aligned. Because the alignment was uncertain in the amino- and carboxy-terminal regions, only the LRR-containing region bounded by the four amino-terminal and two carboxy-terminal cysteines was used in the analysis (see Fig. 2; the full alignment is available on request). Table 3 summarizes data on the SLRPs and related genes used in this analysis. Neighbor-joining trees (Saitou and Nei 1987) were built from both p-distance (no correction for multiple substitutions) and Poisson-corrected distance matrices, each calculated with either pairwise or complete deletion of gapped sites (Kumar et al. 1993). Because all four trees were very similar, only the tree based on p-distance with complete deletion (that is, excluding every site at which one or more sequences has a gap) is shown in Figure 3.
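The four distance settings combine two independent choices: how gaps are handled (complete versus pairwise deletion) and whether multiple substitutions are corrected for (p-distance versus the Poisson correction d = -ln(1 - p)). The sketch below illustrates these standard formulas; it is not MEGA's code, it assumes gaps are written as "-", and the toy alignment is invented.

```python
import math


def p_distance(a, b):
    """Proportion of differing residues among sites where neither sequence has a gap."""
    compared = differences = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # pairwise deletion: skip gapped sites for this pair only
        compared += 1
        differences += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def poisson_distance(p):
    """Poisson correction for multiple substitutions: d = -ln(1 - p)."""
    if p >= 1.0:
        raise ValueError("Poisson distance is undefined for p >= 1")
    return -math.log(1.0 - p)


def complete_deletion(alignment):
    """Drop every column in which any sequence has a gap."""
    keep = [i for i in range(len(alignment[0])) if all(s[i] != "-" for s in alignment)]
    return ["".join(s[i] for i in keep) for s in alignment]


def distance_matrix(alignment, complete=True, poisson=False):
    """Symmetric distance matrix under one of the four settings described above."""
    seqs = complete_deletion(alignment) if complete else alignment
    n = len(seqs)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = p_distance(seqs[i], seqs[j])
            matrix[i][j] = matrix[j][i] = poisson_distance(p) if poisson else p
    return matrix


toy = ["ACDE-", "ACDF-", "A-DFG"]  # invented alignment rows, not SLRP data
print(distance_matrix(toy, complete=True)[0][1])   # 1 difference in 3 shared columns
print(distance_matrix(toy, complete=False)[0][1])  # 1 difference in 4 columns: 0.25
```

Complete deletion compares every pair over the same set of columns, whereas pairwise deletion retains more data per pair at the cost of pairs being measured over different sites.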
The tree shows that SLRP genes with similar genomic structures group together. DSPG3 (PG-Lb) and osteoglycin appear to have evolved separately from biglycan/decorin and lumican/fibromodulin/PRELP/osteomodulin. Our data agree with previously constructed dendrograms (Bengtsson et al. 1995; Iozzo 1997, 1998; Sommarin et al. 1998). However, this is the first study to include sequence data from multiple species, allowing us to analyze the conservation of these genes as a family.
Interestingly, in all four trees (Fig. 3; data not shown), the chicken decorin gene is more closely related to the human and bovine decorin genes than is the murine decorin gene. The strong bootstrap support for this unexpected grouping (97%, Fig. 3) suggests that an ancestral decorin gene was probably duplicated before the bird/reptile/mammal divergence, and that the cloned mouse gene represents one paralog, whereas the chicken, bovine, and human genes represent the other. Somatic cell mapping of decorin revealed two signals on chromosome 12 (McBride et al. 1990), also suggesting that another SLRP gene highly homologous to decorin may be present in that chromosomal region. This gene may prove to be the human ortholog of the cloned mouse decorin gene.
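Bootstrap percentages such as the 97% quoted above are obtained by resampling alignment columns with replacement, rebuilding the tree for each pseudo-replicate, and counting how often a grouping recurs. The sketch below reduces this to four taxa, where the neighbor-joining criterion simplifies to choosing the pairing {i, j}|{k, l} with the smallest d_ij + d_kl. It assumes a gap-free alignment and p-distances, and its helpers and sequences are hypothetical. The published analysis instead ran 1000 full neighbor-joining replicates in MEGA.

```python
import random
from itertools import combinations

# The three possible unrooted splits of four taxa (indices 0-3).
SPLITS = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]


def p_dist(a, b):
    """Proportion of differing sites (gap-free, equal-length sequences)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def quartet_split(seqs):
    """Split chosen by neighbor joining for four taxa: minimum d_ij + d_kl.
    Ties go to the first split listed in SPLITS."""
    d = {pair: p_dist(seqs[pair[0]], seqs[pair[1]]) for pair in combinations(range(4), 2)}
    return min(SPLITS, key=lambda split: d[split[0]] + d[split[1]])


def bootstrap_support(seqs, split, replicates=1000, seed=1):
    """Fraction of column-resampled pseudo-replicates that recover `split`."""
    rng = random.Random(seed)
    length = len(seqs[0])
    hits = 0
    for _ in range(replicates):
        columns = [rng.randrange(length) for _ in range(length)]
        resampled = ["".join(s[c] for c in columns) for s in seqs]
        hits += quartet_split(resampled) == split
    return hits / replicates


# Invented sequences: taxa 0 and 1 share one residue pattern, taxa 2 and 3 another.
toy = ["LKNLQL", "LKNLQV", "IRSLEV", "IRSLEL"]
print(quartet_split(toy))  # ((0, 1), (2, 3))
print(bootstrap_support(toy, ((0, 1), (2, 3))))
```

Bootstrap support measures how consistently the data favor a grouping under resampling; it is not a P value, which is why high support is read as strong rather than statistically significant evidence.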
In this study we have identified and sequenced the intron/exon borders of human DSPG3 and determined that the gene is composed of seven exons. Two intronic polymorphisms were identified and characterized. We have also cloned and sequenced the promoter and first intron of DSPG3 and identified several putative transcription factor-binding sites, including potential SOX9-binding sites. Further analysis of the transcriptional elements in DSPG3 will be necessary to determine the mechanisms involved in its specific regulation and expression in cartilage.
We have compared the genomic structures of DSPG3 and other members of the SLRP gene family and have shown that the introns within the LRRs must have arisen separately in DSPG3 and decorin/biglycan. Our evolutionary analysis of the SLRP gene family confirms this hypothesis: SLRP genes with similar gene structures were more closely related to each other than to the other SLRP genes. It appears that the ancestral SLRP gene was composed of two (or possibly three) exons and that additional introns were inserted in DSPG3, decorin/biglycan, and (probably) fibromodulin/lumican/PRELP. In addition, there appears to have been intron slippage in the intron upstream of the start codon and in the second fibromodulin intron. It will be interesting to see how additional data (genomic structure, chromosomal location) on the osteoglycin and osteomodulin genes shape the proposed evolutionary scheme, and whether a second decorin gene is present.
METHODS
Identification of Cosmids Containing DSPG3
cDNA template was amplified with primers hepn3/hepn2 (Deere et al. 1996). This PCR product was labeled by random priming and used to probe a dot blot of chromosome 12-specific cosmids following standard procedures. Cosmids 167H5, 24C10, 231B8, 133C5, 204F1, 196B7, 61B9, and 207C11 were positive for the DSPG3 probe.
Identification and Sequencing of the Intron/Exon Borders of Human DSPG3
cDNA primers were used to amplify genomic DNA from cosmid 167H5 to identify the locations of introns in the gene. The primer sets used were hepn3/hepn15, hepn1/239861, hepn4/hepn6, and hepn5/hepn8 (Deere et al. 1996; Table 4). Intron/exon borders were sequenced directly from cosmid 167H5 with a series of primers (Table 4) on an ABI automated sequencer. Each region was sequenced at least twice in two separate laboratories. The resulting sequence was analyzed with the GCG database system (Genetics Computer Group 1994).
|
Analysis of Polymorphic Repeats
The 19-bp insertion/deletion in intron 1 was amplified from 120 unrelated Caucasian individuals with PCR primers (forward, 5'-TCTTCACCTATAAAATGGTATGACA-3'; and reverse, 5'-TCTTCATTTTTCAAGCTTTCC-3') under standard conditions (Sambrook et al. 1989). The PCR products were analyzed on 6% acrylamide gels.
PCR primers (forward, 5'-TTTGCTGTCATTGACTACC-3'; and reverse, 5'-GCGAAACCATGTCTCTAC-3') were designed to amplify the tetranucleotide repeat (TATT)n in intron 5 of DSPG3, with a predicted PCR product size of 275 bp. DNA from 56 unrelated individuals was amplified following standard procedures (Sambrook et al. 1989). The samples were analyzed on 6% denaturing polyacrylamide gels and silver-stained with the GelCode System (Pierce).
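Because the repeat unit is 4 bp, alleles of the (TATT)n marker should differ from the predicted 275-bp product by whole multiples of 4 bp. As a minimal sketch, assuming band sizes have already been estimated against a size ladder and that the predicted product corresponds to the sequenced reference allele, bands could be converted to repeat-unit offsets as below. The helper names and band sizes are hypothetical.

```python
REFERENCE_PRODUCT_BP = 275  # predicted product size given in the text
REPEAT_UNIT_BP = 4          # length of the TATT unit


def repeat_offset(product_bp, reference_bp=REFERENCE_PRODUCT_BP, unit_bp=REPEAT_UNIT_BP):
    """Number of TATT units gained (+) or lost (-) relative to the predicted product.

    Raises ValueError when the size difference is not a whole number of repeat
    units, which would point to a sizing error or a non-standard allele.
    """
    delta = product_bp - reference_bp
    if delta % unit_bp != 0:
        raise ValueError(f"{delta} bp is not a multiple of {unit_bp} bp")
    return delta // unit_bp


def genotype_from_sizes(size_1, size_2):
    """Sorted pair of repeat offsets for one individual's two bands
    (pass the same size twice for a homozygote showing a single band)."""
    return tuple(sorted((repeat_offset(size_1), repeat_offset(size_2))))


print(genotype_from_sizes(283, 271))  # (-1, 2) for these hypothetical band sizes
```

Genotypes expressed this way can be fed directly into a heterozygosity calculation for the locus.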
Sequencing the Promoter Region of Human DSPG3
Cosmid 167H5 does not contain the promoter region of DSPG3; therefore, cosmid 207C11 was used to sequence the promoter region. Cosmid 207C11 was digested with EcoRI and subcloned into the EcoRI site of pBluescript SK(+). Clones were then sequenced with a series of primers (Table 4) on an ABI automated sequencer. The promoter sequence was verified in a separate laboratory by amplifying and sequencing the promoter region from genomic DNA. The sequence was analyzed with the GAP program from the GCG database system (Genetics Computer Group 1994).
Ribonuclease Protection Assay
Primers hepn46 (5'-GAATTTGTTACAGATGAGG-3') and hepn47 (5'-GCAAGTATAAAAACTTACCT-3') were used to amplify the first exon and 313 bp of the promoter region. The PCR product was cloned into the pGEM-T vector (Promega), and the clone was linearized with NcoI to serve as the template for the probe. The probe was transcribed with SP6 polymerase and gel-purified. The ribonuclease protection assays were performed with the RPA II kit from Ambion, Inc.
Evolutionary Analysis of the SLRP Gene Family
Published protein sequences of the human, bovine, murine, and chicken SLRP genes and related proteins were collected, and the region between the first and last cysteines flanking the leucine-rich repeats was aligned with the LINEUP, PILEUP, and PRETTY programs from the GCG database system (Genetics Computer Group 1994). The proteins included DSPG3 (PG-Lb), osteoglycin, biglycan, decorin, lumican, fibromodulin, PRELP, and osteomodulin (Table 3). Alignments were verified against TBLASTN output from a search of the GenBank database with the DSPG3 protein sequence (Genetics Computer Group 1994). Phylogenetic trees were built with the Molecular Evolutionary Genetics Analysis program, version 1.01 (MEGA) (Kumar et al. 1993). Four different neighbor-joining trees were built from p-distance and Poisson-corrected distance matrices, with both complete and pairwise deletion, and assessed with 1000 bootstrap replicates (Saitou and Nei 1987; Kumar et al. 1993).
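For readers unfamiliar with the method, neighbor joining (Saitou and Nei 1987) can be sketched in a few lines. This is a generic textbook version, not MEGA's implementation: it takes a symmetric distance matrix, repeatedly joins the pair that minimizes Q(i, j) = (n - 2)d(i, j) - r_i - r_j (r_i being the row sum for taxon i), and returns an unrooted Newick string. The taxon names and distances in the example are invented.

```python
def neighbor_joining(names, dist):
    """Return an unrooted Newick tree (with branch lengths) for three or more taxa.

    names: list of taxon labels; dist: symmetric distance matrix (list of lists).
    Negative branch lengths, which NJ can produce on non-additive data,
    are left as computed.
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)
    d = [row[:] for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]
        # Pick the pair (i, j) minimizing the Q criterion (first pair wins ties).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node to i and j.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        keep = [k for k in range(n) if k not in (i, j)]
        # Distances from the new internal node to every remaining node.
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[idx]] for idx, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [f"({nodes[i]}:{li:.4f},{nodes[j]}:{lj:.4f})"]
    # Three nodes remain: resolve the final star.
    a, b, c = nodes
    la = (d[0][1] + d[0][2] - d[1][2]) / 2
    lb = (d[0][1] + d[1][2] - d[0][2]) / 2
    lc = (d[0][2] + d[1][2] - d[0][1]) / 2
    return f"({a}:{la:.4f},{b}:{lb:.4f},{c}:{lc:.4f});"


# Invented additive distances generated from the tree ((A:1,B:2):1,(C:3,D:1)).
taxa = ["A", "B", "C", "D"]
dist = [
    [0, 3, 5, 3],
    [3, 0, 6, 4],
    [5, 6, 0, 4],
    [3, 4, 4, 0],
]
print(neighbor_joining(taxa, dist))
# (C:3.0000,D:1.0000,(A:1.0000,B:2.0000):1.0000);
```

On additive distances such as these, neighbor joining recovers the generating topology and branch lengths exactly; on real alignments the distances are only approximately additive, which is one reason the four distance settings were compared.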
ACKNOWLEDGMENTS
We thank Dr. Raju Kucherlapati for providing the chromosome 12-specific cosmids. This research was supported by a Schissler Foundation Fellowship to M.D., Shriners Hospital for Children grants 15,955 and 15,957 to J.T.H., and grant support from The Academy of Finland and The Ulla Hjelt Fund to A.C. and J.L.D. The Department of Biomathematics at M.D. Anderson Cancer Center is supported by grant NCI CA-16672.
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
FOOTNOTES
8 Corresponding author.
E-MAIL jhecht{at}ped1.med.uth.tmc.edu; FAX (713) 500-5689.
REFERENCES
α1(II) collagen gene is sufficient for expression in cartilage and binds nuclear proteins that are selectively expressed in chondrocytes. Mol. Cell. Biol. 16: 4512-4523.
α1(II) collagen gene. Mol. Cell. Biol. 17: 2336-2346.
Received July 10, 1998; accepted in revised form March 15, 1999.