Vol. 10, Issue 8, 1095-1102, August 2000
REPORT
ABSTRACT
A mouse locus called Lgn1 determines differences in macrophage permissiveness for the intracellular replication of Legionella pneumophila. The only regional candidate genes for this phenotype difference lie within a cluster of closely linked paralogs of the Neuronal Apoptosis Inhibitory Protein (Naip) gene. Previous genetic and physical mapping of the Lgn1 phenotype narrowed it to an interval containing only Naip2 and Naip5, suggesting that there is not complete functional overlap among the mouse Naip loci. In order to gather more information about polymorphisms among the Naip genes of the 129 mouse haplotype, we have determined the genomic sequence of a substantial portion of the 129 Naip gene array. We have constructed an evolutionary model for the expansion of the Naip gene array from a single progenitor Naip gene. This model predicts the presence of two distinct families of Naip paralogs: Naip1/2/3 and Naip4/5/6/7. Unlike the divergences among all the other Naip paralogs, the splits among Naip4, Naip5, Naip6, and Naip7 occurred relatively recently. The high degree of sequence conservation within the Naip4/5/6/7 family increases the likelihood of functional overlap among these genes.
[The sequence data described in this paper have been submitted to the GenBank data library under accession nos. AF242431-AF242435.]
INTRODUCTION
Macrophages isolated from C57BL/6J and A/J mice exhibit differences in permissiveness for intracellular replication of L. pneumophila (Yamamoto et al. 1988). This phenotype difference segregates as a single-gene trait in crosses between C57BL/6J and A/J and maps to a locus on distal chromosome 13 (Yamamoto et al. 1991; Yoshida et al. 1991; Dietrich et al. 1995; Beckers et al. 1995). Detailed physical mapping of this locus, called Lgn1, reveals that it contains a series of 50- to 80-kb highly homologous direct repeats and that a cluster of Naip gene paralogs maps inside these direct repeats (Scharf et al. 1996; Growney et al. 2000).
The region of the human genome that is orthologous to the mouse Lgn1 region also contains a series of highly homologous repeated segments. The human spinal muscular atrophy (SMA) region has what appears to be an inverted duplication of some 500 kb (Lefebvre et al. 1995). This amplified genomic segment contains several transcriptionally active genes, including copies of survival motor neuron (SMN); NAIP; general transcription factor II H, polypeptide 2 (GTF2H2); and small EDRK-rich factor 1 (SERF1) (reviewed by Growney et al. 2000). However, the only gene in common between the amplified segments from the mouse and human Lgn1/SMA intervals is Naip/NAIP (Growney et al. 2000).
The fact that the mouse and human Lgn1/SMA regions both have divergently organized sets of closely linked repeats indicates that these amplified segments originated independently in the mouse and human lineages. This observation raises the question of whether the amplification of Naip/NAIP in either mouse or human has any functional significance. Although most of the mouse Naip paralogs are transcriptionally active and encode similar but not identical proteins, it is not known whether these transcripts provide redundant or diverse functions (Huang et al. 1999). These questions about the functionality of the mouse Naip loci are important to the identification of the Lgn1 mutation because the current critical interval for the Lgn1 phenotype contains two different transcriptionally active Naip genes (Naip2 and Naip5) (Growney and Dietrich 2000; Huang et al. 1999).
Mapping and sequence analysis of the mouse Lgn1 interval suggest that the Naip genes have arisen through a series of distinct amplification events emanating from a single ancestral Naip. This model of the origins of the mouse Naip array relies heavily on the sequences (Fig. 1A) of a single exon from the clustered Naip paralogs to build a phylogenetic tree (Growney et al. 2000). A more rigorous basis for determining the relationships of the mouse Naip genes would be to compare their entire genomic sequences.
In this paper, we report the complete annotated sequence of 26f17, a 220-kb bacterial artificial chromosome (BAC) clone that contains the three Naip genes on the centromere-distal side of the array in the 129 haplotype (Naip1, Naip3, and Naip6) (Fig. 1A; Growney et al. 2000). In addition, we present three large annotated fragments of genomic sequence from 9045, a 75-kb P1 clone mapping to the central portion of the 129 Naip array (Fig. 1A; Growney et al. 2000). Our analysis of these genomic sequences has provided additional markers to refine the map of the Lgn1 interval (Growney and Dietrich 2000) and allowed us to refine the previously reported model of the origins of the mouse Naip array.
RESULTS
Genomic Sequence Determination
The 220-kb BAC clone 26f17 was roughly mapped to the distal side of the Lgn1 region by others (Diez et al. 1997). Subsequent precise mapping of the clone identified it as an ideal template for sequencing the Lgn1 interval because it covered a large extent of the distal side of the Naip gene array (Fig. 1A; Growney et al. 2000). Our prior map information about this clone suggested that it was likely to contain multiple copies of Naip gene sequences, so we used a tiered strategy for the sequence assembly (see Methods; Endrizzi et al. 1999).
The final sequence assembly of this clone consists of two contiguous sequences covering 117,791 bp and 90,650 bp (GenBank accession nos. AF242431 and AF242432). We could not complete the sequence across the remaining gap with certainty because it was composed of a 300-bp simple sequence repeat. We were able to link the two contiguous sequences using the polymerase chain reaction (PCR), and our estimate of the total sequence length (208,448 bp) suggests an extremely small gap of only 7 bp (Fig. 1B). The two consensus sequences were derived from 3960 sequencing reactions, with every base in the consensus representing data from at least one sequencing reaction on each strand. The average per-base sequencing redundancy is over fivefold. The sequence assembly was analyzed extensively for consistency with known restriction digest and PCR amplification patterns from clone and genomic DNA, indicating that the sequence represents both the clone and the genomic structure with fidelity (data not shown).
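The gap estimate follows from subtracting the two contig lengths from the estimated total length. A minimal sketch of that arithmetic, using only the figures quoted above (the function and variable names are illustrative and not part of the original analysis):

```python
# Contig lengths and estimated total length as reported in the text for BAC 26f17.
CONTIG_LENGTHS_BP = (117_791, 90_650)
ESTIMATED_TOTAL_BP = 208_448


def implied_gap_bp(contig_lengths, estimated_total):
    """Return the unsequenced length implied by a total-length estimate."""
    return estimated_total - sum(contig_lengths)


sequenced_bp = sum(CONTIG_LENGTHS_BP)
gap_bp = implied_gap_bp(CONTIG_LENGTHS_BP, ESTIMATED_TOTAL_BP)
print(f"sequenced: {sequenced_bp} bp, implied gap: {gap_bp} bp")
```

The same subtraction applies to any clone for which an independent size estimate (for example, from a PCR product spanning the gap) is available.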
P1 clone 9045 was identified by us several years ago and subsequently mapped with precision into the center of the Naip array (Fig. 1A) in 129 (Scharf et al. 1996; Growney et al. 2000). We chose to sequence this clone because its position in the center of the Naip array meant that it could reveal significant discrepancies from our model of the origin of this repeat. We used a strategy for sequence assembly similar to the one we used for 26f17.
The final sequence assemblies for 9045 consist of three contiguous sequences totaling 72,460 bp (GenBank Accession nos. AF242433-AF242435). The holes in the sequence represent areas that are difficult to sequence because they contain microsatellite sequences. However, we measured the size of the remaining gaps in the sequence using PCR and found them to be quite small (Fig. 1C). The three consensus sequences were derived from a total of 1355 sequencing reactions and as with 26f17, every base in the sequence represents data from each strand. The average per-base sequencing redundancy is approximately fivefold. The total size of the known sequence and our estimates of the gap sizes are in accordance with our estimates of the size of 9045 from NotI digestion and pulsed field gel analysis (data not shown).
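The consistency checks against restriction digests mentioned for both clones amount to comparing observed fragment sizes with those predicted from the consensus sequence. The sketch below illustrates one way such a prediction could be computed; it is not the software used in this work. It assumes a single gap-free sequence and a palindromic recognition site (so only one strand needs to be searched). The function names, the EcoRI toy example, and the 5% size tolerance are all hypothetical.

```python
import re


def virtual_digest(sequence, site, cut_offset, circular=False):
    """Predict sorted fragment lengths from cutting `sequence` at every `site`.

    `cut_offset` is the number of bases into the site at which the top
    strand is cut (e.g. 1 for EcoRI, G^AATTC). Assumes a palindromic site.
    """
    seq = sequence.upper()
    # A lookahead pattern also finds overlapping occurrences of the site.
    pattern = f"(?={re.escape(site.upper())})"
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, seq)]
    if not cuts:
        return [len(seq)]
    if circular:
        # On a circle, the last cut wraps around to the first.
        fragments = [b - a for a, b in zip(cuts, cuts[1:])]
        fragments.append(len(seq) - cuts[-1] + cuts[0])
    else:
        bounds = [0] + cuts + [len(seq)]
        fragments = [b - a for a, b in zip(bounds, bounds[1:])]
    return sorted(f for f in fragments if f > 0)


def digests_agree(predicted, observed, rel_tol=0.05):
    """True if both digests have the same number of fragments of similar size."""
    if len(predicted) != len(observed):
        return False
    pairs = zip(sorted(predicted), sorted(observed))
    return all(abs(p - o) <= rel_tol * p for p, o in pairs)


# Toy sequence with two EcoRI sites; real input would be the clone consensus.
toy = "AAA" + "GAATTC" + "TTTTT" + "GAATTC" + "CC"
linear_fragments = virtual_digest(toy, "GAATTC", 1)
circular_fragments = virtual_digest(toy, "GAATTC", 1, circular=True)
print(linear_fragments, circular_fragments)
```

Gel-based size estimates are imprecise, which is why the comparison uses a relative tolerance rather than exact equality. A real check would also need to account for the cloning vector and for the unsequenced gaps.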
Discovery and Annotation of Genes in 26f17 and 9045
We have used several methods to discover and annotate genes in our new genomic sequences. Because we knew that the clones were going to contain Naip gene loci, the first and most straightforward annotation relied on aligning known Naip cDNA sequences to the clones (Fig. 1).
Naip Loci in 26f17:
Naip1. The distal-most Naip gene in the cluster, Naip1, spans 45 kb and has 16 exons, including an exon 2 in its 5' untranslated region (UTR), which is a sequence found only in Naip1, Naip3, and Naip2 (see below; Endrizzi et al. 1999
Naip sequences (see below; Growney and Dietrich 2000
Naip locus.
Unfortunately, our sequence of 26f17 does not extend into the region where these Naips should reside. Nevertheless, a marker called D13Die30, that specifically amplifies Naips from genomic DNA, maps proximally to Naip6 (Growney et al. 2000
Naip loci
from our assembly of 9045 (see below). The only ortholog of
Naip6 contained in the C57BL/6J genome has been excluded from
the Lgn1 interval (Growney and Dietrich 2000).
Naip Loci in 9045:
Naip7. Naip7, which spans approximately 30 kb, has many similarities to Naip5 and Naip6, including the number of exons and the presence of repeated microsatellite markers characteristic of the central Naip array. In addition, it is similar to Naip6 but diverges from Naip5 in that it has a Naip juxtaposed at its 3' end. As we noted for Naip6, it is possible that this gene is transcriptionally active, since cDNAs from a relative of this locus in another mouse strain have been isolated (Huang et al. 1999).
Naips. We have sequenced portions of two different Naip loci in 9045. From these two partial Naip sequences, we discerned two important features. First, the Naip loci, which span approximately 20 kb, begin with an exon 7 that is juxtaposed extremely close to the exon 16 of the adjacent Naip. Second, the marker content of the Naip loci is similar to that of Naip3, as can be seen by the presence of D13Die36, the size of its intron 13, and the absence of an exon 10. All these data point strongly to the possibility that Naips are recently diverged relatives of Naip3. However, one significant difference between Naip3 and the Naips is seen in exon 11, which is present in only a fragmentary form in the Naips.
In addition to aligning our sequences with cDNAs known to map into the interval, we subjected them to a series of homology searches and gene prediction programs using the Genotator and Genotator-Browser packages (Harris 1997).
Alignments of Mouse Naip Sequences
Given the sequence relatedness of the mouse Naip gene loci, it is likely that they all share a single common progenitor. We aligned the known mouse Naip sequences to shed light on the events that have taken place since the divergence from a single Naip gene (see Methods). The data from these alignments are presented in Figure 2 and Table 1.
Inspection of Figure 2A, in which the alignments of the Naip genes are represented as a Percent Identity Plot (PIP), shows that Naip5, Naip6, and Naip7 are extremely closely related to each other, confirming either that they are the result of recent gene duplications or that they are subject to homogenization via gene conversion. Similarly, Naip1, Naip2, and Naip3 share extensive alignments with each other, indicating that they are closely related (Fig. 2A; Table 1). The amount of alignment and levels of homology between the two groups of paralogs suggest an early duplication of an ancestral Naip, leading to the progenitors of what can be called the Naip1/2/3 and Naip4/5/6/7 families (Fig. 2A; Table 1). Even though we do not have genomic sequence for Naip4, we have included it in the Naip4/5/6/7 group based on prior published data demonstrating a high degree of similarity in marker content (Growney et al. 2000).
Although the amplification of the Naip5, Naip6, and Naip7 gene loci seems to be a recent event (as demonstrated by their extremely high level of sequence conservation and their virtually complete alignment that is broken only by the insertion of interspersed repeat elements), the amplification and divergence of the Naip1, Naip2, and Naip3 loci appear to have happened longer ago (as suggested by their lower level of sequence conservation and alignment). Our analysis of the overall conservation of alignments between the Naip1/2/3 sequences suggests that Naip2 diverged from Naip1/3 before a more recent split between Naip1 and Naip3 (Fig. 2A; Table 1).
Our alignments of the Naip3 locus confirmed our suspicion that the Naip loci are extremely close relatives of Naip3. No other Naip locus exhibited such extensive alignment and high level of sequence identity (Fig. 2B). This suggests that the formation of the Naip loci occurred after the split between Naip1 and Naip3. Similarly, because the structure of the Naip loci is identical throughout the central Naip repeat, the formation of the Naip loci likely occurred before or as part of the amplifications that created Naip5, Naip6, and Naip7. We summarized our interpretation of these data in a model of expansion of the mouse Naip array in Figure 3.
DISCUSSION
The arrangement of highly related genes in closely linked clusters is commonly seen in mammalian genomes. Broadly speaking, these arrays are of two types: those whose members have acquired important divergent functions and those whose members are redundant in function. Examples of closely linked gene families whose members have divergences in function are seen in the cases of the color-vision genes and the beta-globins (Nathans et al. 1986; Yokoyama et al. 1993; Fritsch et al. 1980; Hardies et al. 1984). Similarly, there are examples of the occurrence of closely linked gene copies that are redundant in function, such as is seen in the observed amplification of ribosomal RNA genes in various organisms and in the cellular acquisition of resistance to chemotherapeutic agents (Nath and Bollon 1977; Raymond et al. 1990).
The mouse Naip gene cluster is interesting because we currently do not know if it represents an example of functional diversity, functional redundancy having some important phenotypic consequence, or even perhaps a fixation of an amplification that has no functional impact on the organism. Furthermore, the mouse Naip cluster is interesting because one of the members of this family must play an important role in determining the permissiveness of macrophages to the intracellular replication of L. pneumophila (Growney et al. 2000). In light of these unanswered questions, we have determined the genomic sequence of substantial portions of the mouse Naip gene array from 129 in an attempt to measure the relatedness of all the Naip genes.
In our analyses of these genomic sequences, we have definitively ascertained that the mouse Naip gene cluster can be divided into two families: the Naip1/2/3 family and the Naip4/5/6/7 family. The sequence relations of the members of these two families suggest that the Naip4/5/6/7 family members have diverged from each other relatively recently and may, as a consequence, share more functional relatedness than the members of the Naip1/2/3 family. However, since the molecular functions of each of the mouse Naip paralogs have been incompletely described, the sequence data alone cannot be used to make definitive statements about potential similarities or differences in function.
Nevertheless, two lines of additional evidence indicate that the functions of the different mouse Naip paralogs can be separated from each other. First, the recent report of a knockout of the Naip1 gene illustrates a function of this gene in neuronal survival during physiological insult (Holcik et al. 2000). It is unclear whether the inability of the other Naip gene paralogs to compensate for the loss of Naip1 function has to do with differences in the molecular activity of the Naip proteins, with an overall diminishment of Naip function, or with some tissue specificity in expression of the Naip paralogs.
The second line of evidence in favor of divergent functions of the mouse Naip genes comes from our knowledge of the genetic map position of the mouse Legionella susceptibility locus (Lgn1). Lgn1 has been mapped to an interval that includes only Naip2 and Naip5, suggesting that the other Naip paralogs cannot compensate for a mutation in one of these genes (Growney and Dietrich 2000). Unfortunately, based on the current information, it is impossible to tell which of the two remaining candidates is responsible for the Lgn1 phenotype.
Remaining unanswered is the broader question of whether the differences in Naip/NAIP gene content in the mouse and human genomes indicate differences in gene function between the two species. Based on previously published data, it seems that there is only a single human NAIP locus that produces an intact, translationally competent transcript (Roy et al. 1995). Unfortunately, critical pieces of information about the human region are missing or unclear.
For example, while it is well documented that differences in the structure of the SMA region exist among human individuals, only a few haplotypes have been mapped in detail (Lefebvre et al. 1995; Roy et al. 1995). The situation is further complicated by the fact that human genomic libraries consist of clones from at least two different haplotypes. Given that assembling a sensible map of the mouse Lgn1 region was extremely difficult in a situation where only one haplotype was being assembled, the complexity of making a consistent human map from mixed haplotype libraries presents even more of a challenge (Growney et al. 2000; Growney and Dietrich 2000).
Indeed, it remains possible that there is more variation in the number
of functional NAIP sequences among human individuals than had
been previously believed because of the technical difficulties involved
in mapping the region. In addition, the extent of human variation in
permissiveness to Legionella replication is currently unknown,
making any cross-species structure-function comparisons impossible.
Because of the complexities of mapping and studying the human interval, it seems likely that the mouse will serve as a springboard for progress into understanding the origins and functional diversity of the Naip array. Not only can the structures of the Naip array be well described in inbred mouse strains, but we and others are making significant progress in elucidating the functional roles of these genes in a variety of processes. With regard to identifying the Lgn1 gene, it is most likely that further comparative sequencing in search of causative mutations in Naip2 or Naip5 and/or attempts to complement the phenotype will resolve the matter. These experiments are currently underway in our laboratory.
METHODS
Sequencing
The strategy used for determining the sequence of clones that contain multiple copies of highly related regions was described extensively elsewhere (Endrizzi et al. 1999). Here, we briefly describe the technical aspects of the sequencing.
BAC DNA Isolation
We isolated BAC (26f17) DNA from 100 ml overnight cultures (LB with 12.5 µg/ml chloramphenicol) following Research Genetics' BAC miniprep protocol. We isolated P1 (9045) DNA from 500 ml overnight cultures (LB with 50 µg/ml kanamycin) using Qiagen's Large Construct Kit.
Library Construction
We sheared 10 µg of BAC DNA in 50 µl of 1X Mung Bean buffer (New England Biolabs) using a sonicator and made the fragment ends blunt by incubating 0.5 µl of Mung Bean nuclease with the sheared DNA for 30 min at 30° C. We ran total DNA through a 1% low-melt agarose gel (FMC) in 1X TAE buffer at 1.5 V/cm for 16 hr alongside a 1 kb DNA ladder (GIBCO). We excised DNA fragments in the range of 3.5 to 4.5 kb, extracted them with buffer-saturated phenol and, after ethanol precipitation, resuspended them in 20 µl dH2O. We quantified the size-selected DNA against a low mass ladder (GIBCO) using an agarose gel. We ligated 150 ng of blunt-end murine DNA to 50 ng of dephosphorylated, SmaI blunt-cut pUC18 vector (Pharmacia) at 14° C for 16 hr and used 2 µl of the ligation reaction for transforming DH5α ultracompetent Escherichia coli cells (GIBCO).
Sequencing Template Preparation
We picked colonies by hand and inoculated them in 96 deep-well plates containing 1.25 ml of TB plus ampicillin (50 µg/ml final). Cultures grew at 37° C for 20 hr while shaking at 225 rpm. We isolated plasmids using a 96-well alkali lysis protocol (Edge Biosystems) and resuspended them in 30 µl of 1 mM Tris-Cl.
Sequencing Reactions
We sequenced 500 ng of template using ABI Big Dye terminator chemistry (Perkin Elmer) according to the manufacturer's specifications. We performed the reaction in an MJ Research thermal cycler (PTC-225). We purified reactions with 96-well filter plates (Edge), dried samples in a Speedvac evaporator, and stored the samples at -20° C until resuspending in loading buffer. We used both an ABI 377 and an ABI 3700 for detection. We extracted DNA sequences using Bass, Grace, and Trout (Whitehead/MIT) for ABI 377 data and ABI Data Collection software (Perkin Elmer) for ABI 3700 data.
Assembly
We imported approximately 4X coverage for each genomic clone in sequence reads from both ends of 4-kb subclones into a Gap4 database (Staden 1996).
Long PCR to Obtain Gap-spanning Fragments
We chose primers for long PCR using Primer 0.5 on consensus sequence from the ends of assembled contiguous sequences for which we had no linking subclones (Lincoln et al. 1991).
Confirmation of Sequence
To check the sequence assembly for errors, we compared the restriction digest pattern of each clone to a virtual digest of the consensus sequence. In all cases, the predictions were consistent with the digest pattern (data not shown).
Analysis and Annotation of the Sequence
Alignment with Known cDNA Sequences
We assembled sequences of Naip cDNAs (Huang et al. 1999
Genotator
After the assembly was complete, we utilized Genotator/Genotator Browser (Harris 1997
Alignments with Mouse Paralogous Sequences
Sequences were aligned using a program called Blastz (Schwartz et al. 2000
1; gap of length
k,
6-0.2k) and the Chaining option, which forces
aligned regions to have the same order and orientation in the two sequences.
Display of Alignments
For overviews of the alignment results, we used a visual representation called the percent identity plot (PIP) (Oeltjen et al. 1997).
ACKNOWLEDGMENTS
We thank Victor Boyartchuk, James Watters, and Rebecca Mosher for critical evaluation of the manuscript and Jeremiah Scharf and Lou Kunkel for helpful discussions. This work was supported by a grant from the Muscular Dystrophy Association to W.F.D., who is an assistant investigator of the Howard Hughes Medical Institute. W.M. was supported by grant LM05110 from the National Library of Medicine.
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
FOOTNOTES
4 Present address: Whitehead Institute/MIT Center for Genome Research, Cambridge, MA 02142.
5 Corresponding author.
E-MAIL dietrich{at}rascal.med.harvard.edu; FAX (617) 432-3993.
REFERENCES
Received March 15, 2000; accepted in revised form June 2, 2000.