Vol. 12, Issue 12, 1871-1884, December 2002
LETTER
ABSTRACT
The diversity of the largest group of plant disease resistance genes, the nucleotide binding site-leucine-rich repeat (NBS-LRR) genes, was examined in cereals following polymerase chain reaction (PCR) cloning and database mining. NBS-LRR genes in rice are a large and diverse class with more than 600 genes, at least three to four times the complement of Arabidopsis. Most occur in small families containing one or a few cross-hybridizing members. Unlike in Arabidopsis and other dicots, the class of NBS-LRR genes coding for a Toll and mammalian interleukin-1 receptor (TIR) domain was not amplified during the evolution of the cereals. Genes coding for TIR domains are present in the rice genome, but have diverged from the NBS-LRR genes. Most cereal genes are similar in structure to the members of the non-TIR class of dicots, although many do not code for a coiled-coil domain in their amino termini. One unique class of cereal genes, with ~50 members, codes for proteins similar to the N-termini and NBS domains of resistance genes but does not code for LRR domains. The resistance gene repertoire of grasses has changed from that of dicots in their independent evolution since the two groups diverged. It is not clear whether this reflects a difference in downstream defense signaling pathways.
[Supplemental material is available online at www.genome.org. The sequence data from this study have been submitted to GenBank under accession nos. AF516886-AF516895.]
INTRODUCTION
Plants use a variety of disease-resistance genes to detect the presence of pathogens and induce defense responses. The largest class of these genes codes for proteins with nucleotide binding site (NBS) and leucine-rich repeat (LRR) domains (Bent 1996; Hammond-Kosack and Jones 1997; Hulbert et al. 2001). The Col0 ecotype of Arabidopsis has been estimated to carry ~150 genes coding for NBS-LRR proteins, or more if genes coding for truncated versions of the protein are considered, and the rice genome was estimated to carry even more (Meyers et al. 2002). No function other than disease resistance has yet been assigned to this large class of genes.
NBS-LRR genes in plants are typically divided into two classes depending on whether they code for a TIR domain at their N terminus (a domain with homology to the intracellular domains of the Drosophila Toll and mammalian interleukin-1 receptors). The TIR group genes are composed of an N-terminal TIR domain, a central NBS domain, and a C-terminal LRR region. This group of genes has been observed only in dicot plant species (Meyers et al. 1999; Pan et al. 2000a; Goff et al. 2002). The non-TIR group is sometimes referred to as the coiled-coil (CC) group because its members typically have CC domains at their N termini.
The sequences of the central portion of NBS-LRR genes, including the NBS domain, have been used extensively to identify and to classify these genes. This domain is popular for several reasons: the NBS domain has some conserved amino acid motifs that assist in cloning these genes via PCR amplification and in recognizing them in databases; the conserved motifs assist in aligning the sequences for phylogenetic analyses; and classification of NBS-LRR genes by their NBS region sequences accurately predicts whether they belong to the TIR or non-TIR class (Meyers et al. 1999; Pan et al. 2000a).
While LRR regions typically appear to be under strong diversifying selection pressure, NBS domains do not, or at least not to the same extent (Parniske et al. 1997; McDowell 1998; Meyers et al. 1998; Sun et al. 2001). This likely reflects the role of the LRR region in recognition of constantly evolving pathogen ligands and a role for the NBS domain in recognition signaling (Ellis et al. 1999; Ellis et al. 2000; Dodds et al. 2001). Analysis of the L locus of flax has demonstrated that the TIR domain can also play a role in determining pathogen recognition and that it may be under diversifying selection like the LRR (Luck et al. 2000). LRR regions can also be difficult to use for comparative sequence analysis because even closely related genes often show size polymorphisms, making alignment difficult.
There are now sufficient sequences available from the cereals, especially rice, to reveal the diversity and general nature of NBS-LRR genes and related genes in cereal genomes. The present study was conducted to characterize the numbers and structures of NBS-LRR genes in rice for comparison with those of dicot species. Comparisons of these genes with those of other cereal species can be used to identify possible orthologs. We also describe a collection of probes that can be used for mapping and isolating these genes in different cereal crops.
RESULTS
Isolation of NBS Clones for a Probe Collection
Over 150 sequences were isolated by PCR amplification and cloning of
the NBS region of rice and maize NBS-LRR genes using primers designed
from conserved regions of known cereal NBS-LRR genes, or from
sequences that were mined from the databases (see below). Most primers
were designed to match the conserved P-loop motif marking the
N-terminal end of the NBS domain and a conserved MHD motif that occurs
near the beginning of the LRR. This allowed isolation of the
whole NBS region of the gene, corresponding to ~900 nucleotides. The
majority of the clones were amplified from the rice cultivar
Nipponbare. Nondegenerate primers designed from rice sometimes
amplified maize DNA but usually worked poorly and provided only six
unique clones from maize. Many primer pairs amplified several closely
related genes. When these related gene fragments were used as probes in
gel-blot hybridization experiments, they typically detected identical
restriction fragments (data not shown). To make a collection of gene
probes that would each identify different families of R genes (Table
1), only sequences that
were sufficiently different from previously collected sequences were
retained. An arbitrary cutoff of 75% amino acid identity was used to
represent sequences from different families, because sequences with
less identity than this typically identified different fragments in
genomic hybridization experiments (data not shown). Using this
criterion, the cloned rice probes were derived from 96 different
NBS-LRR families.
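The 75% identity cutoff described above can be illustrated with a small calculation. The following is a minimal sketch, assuming that the two protein sequences have already been aligned (gaps written as `-`) and that identity is computed over the columns where neither sequence has a gap. The function names and the strict "greater than" comparison are illustrative assumptions, not the procedure used in the study.

```python
FAMILY_CUTOFF = 75.0  # percent amino acid identity, the cutoff named in the text


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are skipped (an assumption of this sketch)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def same_family(aligned_a: str, aligned_b: str,
                cutoff: float = FAMILY_CUTOFF) -> bool:
    """True when the pair exceeds the identity cutoff (hypothetical helper)."""
    return percent_identity(aligned_a, aligned_b) > cutoff
```

With toy strings such as `"GKTT"` and `"GKTA"`, three of four positions match, so the pair sits exactly at 75% and would not be placed in the same family under the strict comparison assumed here.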
Collection of NBS Sequences by Database Mining
The initial stages of NBS-LRR gene sequence collection consisted of
mining sequences (mostly gene fragments) from GenBank and the Monsanto
Rice Genome Sequence Database. Searches for rice sequences in the
GenBank nonredundant and the high throughput genomic sequence databases
were performed in March 2002 using a variety of predicted protein
sequences from monocot and dicot NBS-LRR genes as queries. Examination
of all sequences with TBLASTN expect values below 1e-4 yielded 216 rice bacterial
artificial chromosomes (BACs) with one or more genes predicted to code
for NBS-LRR proteins. An additional 61 gene fragments were identified,
mostly from PCR-amplified genomic fragments including parts of
NBS-coding sequences. The Monsanto database contained 259 Mb of
assembled sequence from the Japonica rice cultivar Nipponbare. We were
able to collect 144 sequences that code for the complete NBS area, from
the P-loop to the MHD motif, from this database. A very large Indica (line 93-11) rice database (Yu et al. 2002), consisting of assembled whole-genome shotgun sequences of >360 Mb of the rice genome, was also searched for NBS-LRR sequences. Approximately 560 different sequence
contigs containing NBS-LRR sequences were identified in this database.
Most of these contigs coded for at least one sequence of the complete
NBS-coding area of the gene (P-loop to MHD) and 253 predicted
full-length genes were identified. Altogether, including sequences from
the NBS-coding probe collection and the GenBank, Monsanto, and Indica
databases, over 1080 sequences coding for complete NBS areas were
examined. Pairwise comparisons of these sequences allowed the
identification of identical sequences from different databases and
their classification into different families. Genes were grouped into
354 different families, where members of a family share >75% amino
acid sequence identity with other members. Efforts were made to isolate
at least one full-length coding region for each family of NBS-LRR
genes. These predicted coding regions can be searched or examined at
(http://coding.plantpath.ksu.edu/blast/blastNBS.html).
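Grouping sequences into families from pairwise comparisons can be sketched as a clustering step. The example below is a hedged illustration that assumes single-linkage grouping: two sequences end up in the same family if they are connected by a chain of pairs with >75% identity. The text does not state which linkage rule was applied, and the sequence names and identity values are hypothetical.

```python
from typing import Dict, List, Tuple


def group_into_families(
    names: List[str],
    identities: Dict[Tuple[str, str], float],
    cutoff: float = 75.0,
) -> List[List[str]]:
    """Single-linkage grouping of sequences using a union-find structure."""
    parent = {name: name for name in names}

    def find(x: str) -> str:
        # Follow parent links to the representative, shortening paths on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), pct in identities.items():
        if pct > cutoff:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two families

    families: Dict[str, List[str]] = {}
    for name in names:
        families.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in families.values())


# Hypothetical pairwise identities (percent) for four toy sequences.
toy_names = ["seqA", "seqB", "seqC", "seqD"]
toy_identities = {
    ("seqA", "seqB"): 88.0,
    ("seqB", "seqC"): 79.5,
    ("seqA", "seqC"): 70.0,
    ("seqC", "seqD"): 41.0,
}
toy_families = group_into_families(toy_names, toy_identities)
```

In this toy input, seqA and seqC fall below the cutoff when compared directly but are joined through seqB, which is the behavior that distinguishes single linkage from a rule requiring every pair in a family to exceed the cutoff.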
Estimation of NBS-LRR Gene Number and Copy Number of Different Families
Approximately 560 sequences predicted to code for NBS domains of
NBS-LRR genes were identified in the Indica database. To project a
total number of NBS-LRR genes in the rice genome, an estimate of the
number of NBS-LRR genes missing from the Indica database was made. The
predicted amino acid sequences of the 96 cloned NBS fragment probes
were used in TBLASTN searches with the Indica sequences. The 96 sequences represent a diverse collection of genes (see below) and
should represent an unbiased estimate of the coverage of the whole
genome. Most of the sequences used in the search were from the Japonica
cultivar Nipponbare, so that identification of Indica alleles for
specific sequences was sometimes ambiguous. In most cases, near-perfect
matches were identified; 74 of the 96 clones matched Indica sequences
with 98% or better identity for at least 150 amino acids (Table 1). In
other cases, presumed alleles were identified but sequence identity was
lower. For example, two sequences, rNBS21 and rNBS70, which were
estimated to exist in single copies in the genomes (below), were
matched by single sequences in the Indica database that were ~95%
identical. If sequences with identities of 95% or better amino acid
identity are considered probable alleles, then the Indica database
carries alleles for all but 14 of the 96 sequences tested (85%). If
the 560 NBS-coding sequences in this database represent 85% of the NBS-LRR genes in the genome, the estimated number of genes would be ~660, which is very close to the estimate of Goff et al. (2002).
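The projection above is a simple coverage correction, and the arithmetic can be written out as a short sketch. The function name and parameter names are illustrative only; the inputs (560 observed sequences, 96 probes, 14 probes without a probable allele) are the figures given in the text.

```python
from typing import Tuple


def project_gene_number(observed: int, probes_total: int,
                        probes_missing: int) -> Tuple[float, float]:
    """Return (coverage fraction, projected total) from a probe-based coverage estimate."""
    if not 0 <= probes_missing < probes_total:
        raise ValueError("probes_missing must be in [0, probes_total)")
    coverage = (probes_total - probes_missing) / probes_total
    projected = observed / coverage
    return coverage, projected


coverage, projected = project_gene_number(observed=560, probes_total=96,
                                          probes_missing=14)
print(f"coverage = {coverage:.1%}, projected genes = {projected:.0f}")
```

Using the unrounded fraction 82/96 gives a projection of about 656 genes, while the rounded 85% gives about 659; both agree with the ~660 quoted in the text.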
The average copy number of 73 of the rice NBS-region clones was
estimated by hybridizing to DNAs of four rice varieties, each cut with
four different enzymes (Fig. 1). The rice
varieties examined included one Indica line (IR64), two Japonica types
(Azucena and Gihobyeo) and cultivar Milyang23 derived from an Indica X
Japonica cross. All of the probes hybridized to all four cultivars
demonstrating that both rice subspecies generally carry the same
families of NBS-LRR genes. The ~30 probes that appeared to detect
single-copy genes typically revealed one or two bands in most enzyme
digests of all four cultivars, although lanes in some digests were
sometimes lacking bands, probably because small fragments migrated off
the gel. Multiple-copy probes usually detected similar numbers of fragments in the different cultivars, with one exception. The rNBS41
probe identified an estimated five genes in the Azucena and Gihobyeo
cultivars, but single genes in the other two. This copy number
difference was also apparent when the sequence databases were examined.
The Indica database carried a single gene that was highly similar to
rNBS41, while the Monsanto database carried six sequences with >70%
amino acid identity. The probe therefore detects a family that has
become amplified in some lines, possibly most Japonica types. The
number of hybridizing restriction fragments is probably not an accurate
reflection of genomic copy number for the genes with several copies. In
our experience with the maize Rp1 and Rp3 gene
families, lines with 10 to 20 family members generally show fewer
distinct fragments with most enzymes (Webb et al. 2002).
Phylogenetic Analysis of the NBS Region of Rice NBS-LRR Genes
As mentioned above, the rice sequences were grouped into 354 different families by sequence similarity. The predicted amino acid
sequences of the NBS regions of one member of each of these families
were aligned for phylogenetic analyses (Fig. 2). Among the rice genes aligned were three conferring known resistance phenotypes: the Xa1 (Yoshimura et al. 1998), Pib (Wang et al. 1999), and Pi-ta (Bryan et al. 2000) genes. Four other cereal genes with demonstrated or suspected resistance phenotypes were also included for comparison; these were barley Mla1 (Zhou et al. 2000), and single members of the maize Rp1, putative Rp3 (Webb et al. 2002), and PIC19 (Collins et al. 1998) gene families. The different
rice sequences formed many distinct clades in the phylogenetic analysis, forming 117 groups when bootstrap values of >75% were used
to define the groups. Some of the clades were composed of single
families or even single genes. For example, the rNBS1 and rNBS69
sequences each form a distinct branch on the tree (Fig. 2) and detect a
single gene in gel-blot hybridization experiments (Table 1). Other
families formed distinct groups with long branch lengths. These have
apparently arisen from ancient duplication events where the different
families have diverged considerably in sequence, but still show good
homology in conserved regions. Other rice genes are grouped into
nondistinct subgroups with different branch lengths indicating a range
of different times for the duplication and divergence of their family
members. The maize and barley sequences were dispersed on the tree into
different clades of rice sequences. Similar results were found when
other cereal sequences were included in the analysis (not shown). This
would be expected if these groups of resistance genes had already
differentiated when the different cereal lineages separated.
Architectural Diversity in Cereal NBS-LRR Genes
Full-length sequences of cereal NBS-LRR genes were compared to examine their structural diversity. Sequences compared included several cereal genes for which full-length transcripts had been characterized, including the known resistance genes Rp1, Mla1, Xa1, Pib, and Pi-ta, and a full-length rice cDNA AB017914. Full-length transcripts for two additional maize gene families, the putative rp3 family and the PIC19 family, were also isolated. Other sequences included coding regions predicted from genomic sequences. These include annotated sequences obtained from GenBank and gene predictions from genomic sequences by GENSCAN and FGENESH. To represent the full range of diversity of NBS-LRR sequences in rice, we examined a full-length coding sequence for most rice NBS-LRR gene families included in the phylogenetic analysis. Predicted full-length members were identified for >250 genes.
N-terminal Domain Structures
Most of the N-terminal regions in the cereal genes ranged from 200-250 amino acids from the start of the coding region to the beginning of the NBS domain (P-loop), similar to most non-TIR genes in dicots. Because the non-TIR genes from dicots typically have CC motifs, the cereal sequences were examined for this domain structure. Using the COILS program (threshold set to 0.9), CC motifs were apparent in only 47 of the 100 randomly selected sequences. The Paircoil program predicted even fewer CC domains with a threshold of 0.5. The predicted CC domains were poorly conserved in sequence and in their position, occurring in the beginning, middle, or ends of the N-terminal regions. Most, but not all, of the N-terminal sequences could be aligned reasonably well using alignment programs like ClustalX because of their sequence similarity. To look for conserved aspects of the sequences that were common to all the genes, we examined them with the MEME and Block Maker programs. One conserved sequence motif, designated nT (for non-TIR), was identified, which occurred in nearly all of the rice NBS-LRR genes examined. The nT motif (WVxxIRELAYDIEDIVDxY) was usually located ~130 amino acids before the P-loop. The N-terminal region of the Xa1 gene was unusual. Initial analysis indicated the region was relatively long compared to the other genes, coding for a predicted 327 amino acids before the P-loop. A CD (Conserved Domain) search of GenBank (deploys Pfam and Smart databases and NCBI collections) predicted that it codes for a zinc-finger, DNA-binding domain (gnl|Smart|smart00614, score = 69.7 bits [169], expect = 4e-13). The zinc-finger domain corresponds to residues 140-188 of the predicted amino acid sequence, within a relatively typical N-terminal domain and before the NBS domain. Database searches found two other genes with similar amino termini. The genes clustered together with Xa1 in the phylogenetic analyses based on NBS-coding sequences (Fig. 
2; sequences rNBS19 and AL606660-5). The two genes flank a gene that is highly similar (presumably allelic) to Xa1 within a 63-kb interval of a BAC clone in the GenBank HTGS database (AL606660). The amino acid sequences of both genes align well with Xa1 but both have diverged, showing only 54% (rNBS19) and 47% (AL606660-5) amino acid identity in the NBS region. Both genes were predicted to code for zinc-finger, DNA-binding motifs (expected probabilities = 9e-07 and 3e-06).
The LRR and C-Terminal Regions
The leucine-rich repeat regions of the rice NBS-LRR genes were quite variable in size and sequence. The repeats in most of the genes were imperfect, with few repeats conforming to a consensus sequence. In some, like Pib (Wang et al. 1999
Intron Positions in the NBS Regions
Introns in the NBS region of cereal NBS-LRR genes have important practical implications for identifying resistance gene sequences by PCR amplification with degenerate primers or identifying them in genomic sequence databases and distinguishing potentially functional genes from pseudogenes. Intron positions can also be used to support phylogenetic interpretation of the relationships between the genes. In a survey of 20 characterized dicot NBS-LRR resistance genes, only the Arabidopsis Rpp8/Hrt gene family had introns in the NBS domain. Three of the characterized cereal resistance genes have introns in their NBS region, that is, Mla1 (Zhou et al. 2000
NBS Regions With Unusual Structures
As described by Wang et al. (1999)
A Novel Class of Rice Genes Has No LRR-Coding Region
While mining NBS-LRR sequences from GenBank, we identified a gene family in rice with a different structure than the known NBS-LRR resistance genes. The most striking difference is that none of the family members possesses an LRR domain. We found a total of 32 genes on five BAC clones from the GenBank database. Eleven of the genes reside on a 202-kb interval spanned by two BAC clones (AC079843 and AC074283) on chromosome 10. The distances between the genes in these two overlapping BACs range from 5.5 to 52 kb. The first three genes are in the opposite orientation compared to the other eight genes. Another nine genes were closely spaced, in the same orientation, on a 43-kb interval of rice chromosome 1 (AP003292). A single member was found on another BAC clone (AP000570) on chromosome 1. Eleven additional genes were found in the same orientation on a 50-kb interval of a chromosome 7 BAC clone (AP003810). Only one of the genes was interrupted by a gap in the sequence of this clone. Searches of the Monsanto database with these genes identified four additional genes with sequence and structural similarity. A search of the Indica database found 24 genes coding for proteins with 97-100% sequence identity to those found in the Japonica databases, and these were therefore considered possible alleles. Fourteen additional genes from the Indica database were <90% identical to any of the Japonica genes. It is difficult to determine the degree of genomic clustering or the map positions of genes from the Monsanto and Indica databases, as they were identified on smaller, unmapped sequence contigs. In total, 50 genes in this class were identified (Fig. 3). Five partial gene sequences were also observed on some of the smaller sequence contigs that were not identical to any of these 50, indicating that there are more than 50 genes in this class. The predicted coding regions were composed of single exons and ranged from 385 to 556 amino acids. 
One sequence (from Indica contig 32812) was predicted to code for only 258 amino acids, but may be a pseudogene because it appears truncated at both the N and C termini. Two other genes from GenBank (AP003292-4 and AC079843-5) and four genes in the Indica database were predicted to have introns, but on closer inspection were found to be likely pseudogenes. At least some of the genes are transcribed, as rice ESTs (BE229855, AU096505, AU166590, AU063352, and AA754293) matching five of the genomic sequences were found in GenBank.
Rice Genes Similar to TIR Coding Sequences
To determine if the rice genome carries any genes with TIR domains, we searched databases with a consensus TIR sequence designed from the tobacco N, flax L and Arabidopsis Rpp5 genes. One sequence was identified in both the Monsanto (OSM12752) and GenBank HTGS (AP003932 and AP003866) databases. A GenScan analysis predicted a coding region composed of three exons coding for 196, 21, and 29 amino acids separated by introns of 121 and 644 nucleotides. A presumably allelic sequence was identified in the Indica database that coded for a protein with only three amino acid differences. An alignment with the TIR domains of N, L, RPP5, TOLL and a human Toll-like receptor gene showed similarity throughout the whole predicted protein (not shown). The sequence apparently represents an expressed gene in cereals, as a barley EST (accession no. BI948029) was identified that was very similar to the rice gene (72% amino acid identity). The gene is also similar (50% identity for 144 amino acids) to a predicted Arabidopsis protein (accession no. AAG52286). The Arabidopsis protein is also small (199 amino acids) and composed mainly of a TIR domain. A second class of genes was identified that code for divergent TIR and NBS domains. In total, three genes from this family and one pseudogene were identified. The first was identified in the GenBank (AP000364; protein_id = BAB61209.1) and Monsanto databases (OSM1850). The Indica rice sequence Contig4057 also codes for most of a protein (except the first 106 amino acids) that is identical in sequence to this predicted protein. A second coding region was carried on overlapping GenBank sequences (AP003256 and AP003274) and on contig OSM15552 and was >99% identical to a sequence coded by a presumed allele on the Indica sequence Contig2492. An additional gene was coded by the Indica sequence Contig17995, and Contig5477 appears to code for a pseudogene with a stop codon and at least one small deletion. 
A sequence similar to this latter locus was present in the Japonica rice sequences of the Monsanto database, but it could not be determined if it was a potentially functional allele because the coding region was incomplete. The NBS regions of these genes code for motifs similar to most of the conserved motifs in NBS-coding domains of R genes (Table 3), but their sequences diverge from the R genes after the GLPL motif. The three genes were predicted to encode functional proteins ranging from 986 to 1002 amino acids. The genes code for ~165-168 amino acids before their TIR-like domain and this N-terminal region is the least conserved region of the gene (Fig. 5). One feature common to the N-terminal regions of these genes is that they are serine-rich, with 14.5%-23% serine residues. Most of the remainder of the coding regions are more conserved, including the C-terminal domain of 250-300 amino acids after the predicted NBS domain. No known protein-coding domains were detected in the C-terminal region, but several highly conserved sequences were apparent among the genes. The C-terminal 275 amino acids of the three genes range from 12%-14.5% leucine. This presents the possibility that these sequences evolved from a degenerate LRR, but the pattern of leucines matches that of a leucine-rich repeat poorly.
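Composition figures such as the serine and leucine percentages above are straightforward to compute from a predicted protein sequence. The following is a minimal sketch; the function names are illustrative and the example sequence is a made-up toy string, not one of the rice proteins.

```python
def residue_percent(protein: str, residue: str) -> float:
    """Percentage of positions in `protein` occupied by the one-letter `residue`."""
    seq = protein.upper().replace("*", "")  # drop a trailing stop symbol if present
    if not seq:
        raise ValueError("empty protein sequence")
    return 100.0 * seq.count(residue.upper()) / len(seq)


def c_terminal_percent(protein: str, residue: str, length: int) -> float:
    """Composition of the last `length` residues (the whole protein if shorter)."""
    seq = protein.upper().replace("*", "")
    return residue_percent(seq[-length:], residue)


toy_protein = "MSSASLKSTS"  # hypothetical 10-residue sequence
serine_pct = residue_percent(toy_protein, "S")
leucine_pct = residue_percent(toy_protein, "L")
```

For a real predicted protein, a call such as `c_terminal_percent(protein, "L", 275)` would correspond to the leucine content of the C-terminal 275 amino acids discussed in the text.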
A Phylogeny of Rice Genes Based on NBS Region Sequences
To examine the evolutionary relationships of the different types of NBS-coding sequences in rice, the sequences of the rice nT-NBS genes and TIR-NBS genes were compared to representative NBS-LRR genes from rice and other species (Fig. 6). The rice NBS-LRR genes were selected to represent a diversity of different clades from the previous analysis of these genes (Fig. 2). Amino acid sequences utilized from the nT-NBS and TIR-NBS genes were limited to the region between the P-loop and the GLPL motif because of the limited identity between classes outside this region. For comparative purposes, several NBS-LRR resistance genes from dicots also were used, including four representative genes from the TIR class (Arabidopsis Rpp1, flax L6 and M, and tobacco N) and four genes from the non-TIR subclass (Arabidopsis Rps5 and Rpm1, tomato I2 and potato Rx). Two Arabidopsis genes that were related to the rice TIR-NBS genes also were included. Trees based on distance and parsimony had similar topologies.
Similarity Between Rice NBS-LRR Genes and Those From Other Cereals
To estimate the extent to which the genomic sequences include most NBS-LRR gene families, we selected 61 rice gene fragments from GenBank that potentially code for partial NBS-coding domains and used them to search the available genomic sequences. Forty-nine (80%) of the NBS gene fragments matched highly similar genomic sequences (≥95% amino acid identity) and all but one matched sequences with >85% sequence
identity. The remaining sequence identified a less closely related
family member, with 77% amino acid identity. This indicates that
members of nearly all of the NBS-LRR families are represented in the
available genomic sequences, although the gene fragments used in the
similarity searches are clearly not a random sample of the NBS-LRR
genes. The rice sequences in our clone collection and in the GenBank,
Monsanto, and Indica databases should therefore represent most of the
NBS-LRR genes in rice, or at least carry members of most NBS-LRR gene families.
A similar approach was used to examine the extent of similarity of rice
NBS-LRR genes to those of other cereal species. Forty-seven fragments
that potentially code NBS genes from different cereal species were used
to identify the most similar rice sequences in the genomic sequence
databases (Table 4). Comparisons of
the 47 sequences to one another revealed 28 groups or families with >75% amino acid identity within the families. The different members of these groups generally showed similar levels of identity to the same
rice sequence. The different cereal genes exhibited a range of sequence
similarities to the rice genes. Two of the maize sequences (mNBS2 and
AF056161) showed 84% and 85% amino acid identity with rice sequences,
and one wheat (AF087521) sequence showed 85% identity. In contrast,
many of the 28 families did not identify likely orthologs.
Surprisingly, 10 of the 28 families showed amino acid identities of <60%.
DISCUSSION
As the largest class of disease resistance genes, the NBS-LRR genes
play a critical role in defending plants from a multitude of pathogens
and pests. The availability of nearly complete genomic sequences of two
distantly related plant species, rice and Arabidopsis, allows
comparative evolutionary analyses of these genes. Previous analyses of
available cereal sequences have implied that the cereal NBS-LRR genes
may be more homogeneous in their domain architecture than similar genes
in dicots. The dicot genes can be divided into two distinct groups,
those coding for a TIR domain at their N-termini and those without the
TIR domain. The TIR class NBS-LRR genes account for the largest
proportion of the NBS-LRR genes in the Arabidopsis genome
(http://niblrrs.ucdavis.edu/At_RGenes/), but this class has
not yet been found in cereals. We failed to detect these sequences
after examining roughly 620 Mb of assembled rice genomic sequence from
two different genotypes of the estimated 430-Mb rice genome. They also
are absent in cereal EST databases. It therefore seems the TIR class is
not only rare in cereals, but probably absent. The presence of these
sequences in gymnosperm databases, like the Pinus taeda EST
sequence database (e.g., GenBank accession no. BI077056) indicates that
this class of gene was present in the progenitors of grass species,
but lost in the grass family (Meyers et al. 2002). It is likely that there were a very small number of TIR class genes in early angiosperms, and that their numbers amplified in the progenitors of modern dicots as they became more dependent on them for defense against pathogens. On the other hand, our results and those of others (Cannon et al. 2002)
have shown that specific monocot and dicot nT-NBS-LRR genes cluster
together in phylogenetic analysis as expected if several members of
this class had already diverged in early angiosperms. The number of
genes in this class has expanded to ~600 or more in rice, compared
to ~50 in Arabidopsis
(http://niblrrs.ucdavis.edu/At_RGenes/).
One class of NBS-LRR-related gene was identified in cereals that may be specific to cereal, or monocot, genomes. The nT-NBS class shows similarity to the N-terminal half of nT-NBS-LRR genes but has no LRR domain. Other genes related to nT-NBS-LRR genes without LRR domains were observed in cereal genomes and are also present in Arabidopsis (Meyers et al. 2002). For example, the rp1-pd5 gene is a transcribed member of the Rp1 family of maize and is 99% identical to the Rp1-D gene, but is truncated before the LRR coding region (Sun et al. 2001). The nT-NBS
family is different from these sequences in several respects. As a
group, their sequences have diverged considerably from the other
NBS-LRR genes, evidence that they have evolved independently from
these genes for most of their evolution. The family is also different in that it is monophyletic and contains no known members that code for
LRR domains. Although the family appears very rare, or missing, in
dicot genomes, it is a very old gene family as evidenced by the
extensive sequence divergence among members. Over 50 members exist in
rice, making the family roughly the size of the nT-NBS-LRR family in
Arabidopsis.
Some limited structural heterogeneity was observed in the cereal NBS-LRR genes, as several genes with duplicated or novel domains were observed. For example, the Xa1 gene carries sequences near its N terminus that are predicted to code for a zinc-finger, DNA-binding domain. The N-terminal domains of the vast majority of the genes were quite homogeneous, that is, they were typically small with at least one highly conserved region. Many code for predicted CC domains, although this is not a distinguishing feature of this class of genes. While the structures of these genes are fairly conserved, they are extremely diverse in sequence. The C-terminal regions of many of the genes are barely recognizable as LRR domains. Even the NBS regions of the genes have diverged extensively and classification of the genes based on the sequence of this region reveals over 100 distinct clades. Some of these clades consist of one or a few genes in the rice genome, while others have amplified into large groups with varying degrees of similarity. Much of the divergence among these genes apparently occurred before the different cereal species separated, as NBS coding sequences of other cereals typically cluster within the rice clades.
While typical TIR class NBS-LRR genes do not appear to be present in cereal species, genes related to this class are present. One rice gene was identified, with strong homology to a barley EST, which coded for a TIR domain but no other recognizable domains. A second class, with at least three genes in rice, coded for divergent TIR and NBS domains. The C-terminal domain of these genes may have been derived from an LRR-like domain but was unique. Homologs of both of these classes of genes were observed in the Arabidopsis genome. In fact, sequence affinities between the rice TIR-NBS genes and two similar Arabidopsis genes provide evidence that two members of this gene class were present when monocots and dicots diverged.
The structural differences between NBS-LRR genes in Arabidopsis are partially correlated with their dependence on certain other disease-signaling components (Glazebrook 2001; Austin et al. 2002). For example, a functional Eds1 gene is required for resistance mediated by the TIR-NBS-LRR genes Rps4 and Rpp5 but is not required for several non-TIR class genes, such as Rps2 and Rpm1 (Aarts et al. 1998). The predominance of the non-TIR class in cereals might indicate that the cereal R genes signal through fewer or simpler pathways. On the other hand, the Mla1 and Mla6 genes are highly similar in sequence and structure, but differ in their requirements for the Rar1 and Sgt1 gene products (Zhou et al. 2000; Azevedo et al. 2002; Halterman et al. 2002). It seems likely, therefore, that the different defense signaling pathways that cereal R genes utilize depend on factors other than obvious differences in structural domains.
NBS-coding sequences from other cereals exhibit a surprising range of similarity to the rice sequences. Some maize and wheat sequences exhibit 85% amino acid sequence identity to rice genes, while 10 of 28 families showed <60% sequence identity. It is possible that the rice orthologs of some of these families are missing from the available rice databases, but most of the rice genes are recently duplicated, and it is unlikely that all the sequences would be missing for very many families. The low sequence identity would also explain why we and others (Leister et al. 1998) have found that many of the rice sequences cross-hybridize weakly to other cereal species. This may be an indication that the resistance genes are evolving at very different rates. Alternatively, it could result from the loss of resistance genes or gene families in different species lineages (Michelmore and Meyers 1998; Cannon et al. 2002). If resistance genes are commonly lost from species lineages, comparative mapping experiments might frequently mistake similar sequences for orthologs when they are actually more distantly related paralogs. This may explain why the initial comparative mapping experiments with resistance genes in cereals have implied that their relative map positions may be less conserved than those of other types of genes (Leister et al. 1998). The present collection of NBS coding clones should provide sufficient probes for more detailed comparative mapping experiments, allowing a more extensive test of relative levels of synteny. Examining the presence or absence of the different NBS-LRR gene families in the different grass species, and estimating their copy numbers, will shed light on the evolutionary dynamics of resistance-gene gain and loss in cereal genomes. The sequences identified in the present study provide a framework for classification of additional cereal genes.
| |
METHODS |
|---|
|
|
|---|
Sequence Acquisition
Cereal resistance gene sequences were obtained either from cloned PCR products (below) or by searching various databases with amino acid sequences of specific resistance genes as queries. Initial queries were performed with known resistance genes. Additional queries were done with more unique sequences after initial cladistic analyses. Two criteria were used to determine whether sequences were retained for phylogenetic and structural analysis: (1) whether they were NBS-LRR sequences as indicated by conserved domains (domains in Table 3); legitimate NBS-LRR genes often had <30% amino acid identity to the sequences used in the initial queries and sometimes had TBLASTN scores <1e-4, but sequences with lower scores were often apparent pseudogenes with interrupted coding regions; and (2) whether they were <75% identical in amino acid sequence to already collected sequences. To determine this, a local database of the predicted proteins of these sequences was sequentially searched by BLASTP with each of the new sequences. This allowed identical or highly similar sequences to be identified and classified into families. The local database was updated with the new sequences periodically. Databases searched include the GenBank nonredundant and high-throughput genomic sequence databases (http://www.ncbi.nlm.nih.gov/blast/index.html), the Monsanto rice genomic database (www.rice-research.org), and the Indica rice genomic database posted by the Beijing Genomics Institute (http://btn.genomics.org.cn/rice). The final search of the above-mentioned databases was performed in May 2002.
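The second retention criterion amounts to a sequential redundancy filter, which can be sketched as follows. This is only an illustration under stated assumptions, not the procedure actually run for this study: the function names are hypothetical, and `toy_identity` is a crude ungapped comparison standing in for the percent identity that a BLASTP search would report.

```python
from typing import Callable, Dict, List, Tuple

def toy_identity(a: str, b: str) -> float:
    """Hypothetical stand-in for a BLASTP percent identity: ungapped,
    position-by-position comparison over the shorter sequence."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / n

def collect_representatives(
    candidates: List[Tuple[str, str]],
    identity: Callable[[str, str], float] = toy_identity,
    threshold: float = 75.0,
) -> Tuple[List[Tuple[str, str]], Dict[str, List[str]]]:
    """Process (name, protein) pairs in order. A sequence is retained as a
    new representative if it is < threshold % identical to every sequence
    already retained; otherwise it joins the family of its best match."""
    retained: List[Tuple[str, str]] = []
    families: Dict[str, List[str]] = {}
    for name, seq in candidates:
        best_name, best_id = None, 0.0
        for rep_name, rep_seq in retained:
            pid = identity(seq, rep_seq)
            if pid > best_id:
                best_name, best_id = rep_name, pid
        if best_name is not None and best_id >= threshold:
            families[best_name].append(name)   # redundant: assign to family
        else:
            retained.append((name, seq))       # novel: new representative
            families[name] = [name]
    return retained, families

# Made-up peptide fragments, for illustration only.
demo = [("a", "GKTTLAQAVY"), ("b", "GKTTLAQAVF"), ("c", "MMMMMMMMMM")]
kept, fams = collect_representatives(demo)
```

Because the filter is greedy, the result depends on the order in which sequences are added, which is also true of any sequential search against a growing local database.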
Cloning and Characterization of Maize and Rice NBS Region Sequences
Primers were designed from the NBS-coding sequences obtained from the database searches. Forward primers were designed so that they ended at the beginning of the P-loop of the NBS region of each gene. The reverse primers were designed about 900 bp after the P-loop, at the end of the NBS region where the amino acid sequence "MHD" is moderately conserved. PCR products of 0.9-1 kb in size were cloned into the pCR2.1-TOPO cloning vector from Invitrogen and sequenced in the Kansas State University DNA sequencing facilities. Maize inbred line B73 and the Japonica rice variety Nipponbare were used to generate RNA or DNA templates. Genomic DNA was usually used as a template, but RNA was used to amplify several NBS sequences to verify intron positions.
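The primer placement described above can be illustrated with a short sketch. It assumes that the coordinates of the P-loop start and of the end of the MHD-coding region are already known for a template, and it uses a fixed, hypothetical primer length of 20 nt; none of these names or values come from the study itself.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def design_primers(template: str, ploop_start: int, mhd_end: int,
                   primer_len: int = 20):
    """Return (forward, reverse, product_len).

    The forward primer is the primer_len bases that end immediately before
    ploop_start (0-based index of the first P-loop base). The reverse primer
    is the reverse complement of the primer_len bases that end at mhd_end
    (exclusive index just past the MHD-coding region)."""
    fwd_start = ploop_start - primer_len
    rev_start = mhd_end - primer_len
    if fwd_start < 0 or mhd_end > len(template) or rev_start < ploop_start:
        raise ValueError("coordinates do not leave room for both primers")
    forward = template[fwd_start:ploop_start].upper()
    reverse = reverse_complement(template[rev_start:mhd_end])
    product_len = mhd_end - fwd_start
    return forward, reverse, product_len

def in_size_window(product_len: int, low: int = 900, high: int = 1000) -> bool:
    """True if a product falls in the 0.9-1 kb range selected for cloning."""
    return low <= product_len <= high

# Artificial template, for illustration only.
template = "A" * 30 + "C" * 900 + "G" * 30
fwd, rev, size = design_primers(template, ploop_start=30, mhd_end=950)
```

Products amplified from genomic DNA would exceed the predicted size wherever an intron lies between the two primers, which is one reason to compare genomic and RNA-derived products.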
Full-length coding sequences were isolated for two previously isolated maize NBS clones, PIC13 and PIC19. Genomic clones homologous to the probes were isolated from libraries made from the maize lines B73 (for PIC19) and Rp3-A-R168 (for PIC13). The PIC13 probe is thought to represent a member of the Rp3 gene family (Collins et al. 1998; Webb et al. 2002). The NBS-LRR genes were sequenced following subcloning into a pUC19 vector. Transcripts corresponding to the genes were isolated by RACE PCR using the 5′ and 3′ RACE System for Rapid Amplification of cDNA Ends from Gibco Invitrogen Corporation.
Genomic Hybridizations
Cloned fragments of rice NBS-LRR genes were used as probes on genomic blots of rice to estimate the genomic copy numbers of each of the different families. Five micrograms of genomic DNA from each of four rice cultivars (Azucena, IR64, Gihobyeo, and Milyang23) were digested with four restriction endonucleases (EcoRI, EcoRV, HindIII, and XbaI), separated on 0.8% TBE agarose gels, and blotted prior to hybridization. Probe labeling, hybridization, and signal detection were performed using the ECL Direct Nucleic Acid Labeling and Detection System from Amersham Pharmacia Biotech. Following hybridization, blots were washed at a moderate stringency of 0.5X SSC (75 mM NaCl and 7.5 mM sodium citrate) at 65°C.
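The wash concentrations and the size of the blot design follow from simple arithmetic, sketched below. The sketch assumes the standard 1X SSC composition of 150 mM NaCl and 15 mM sodium citrate, and it assumes that each cultivar was digested separately with each enzyme; the function and variable names are illustrative.

```python
from itertools import product

# Standard 1X SSC composition in mM (assumed, not stated in the text).
SSC_1X_MM = {"NaCl": 150.0, "sodium citrate": 15.0}

def ssc_concentrations(fold: float) -> dict:
    """Solute concentrations (mM) of an SSC buffer at the given strength."""
    return {solute: mm * fold for solute, mm in SSC_1X_MM.items()}

cultivars = ["Azucena", "IR64", "Gihobyeo", "Milyang23"]
enzymes = ["EcoRI", "EcoRV", "HindIII", "XbaI"]

# One digest per cultivar-enzyme combination, assuming single digests.
digests = list(product(cultivars, enzymes))

wash = ssc_concentrations(0.5)
```

Under these assumptions a 0.5X wash contains 75 mM NaCl and 7.5 mM sodium citrate, matching the values given above, and each probe is hybridized to 16 digests.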
Bioinformatic Programs Used for Phylogenetic Studies
The bioinformatic programs used in this study are listed in Table 5. All parameters in these programs were set to default except that 'Arabidopsis' was specified as the organism in GenScan, and 'Monocots' was specified in FGENESH.
|
| |
WEB SITE REFERENCES |
|---|
|
|
|---|
http://www.ncbi.nlm.nih.gov/; National Center for Biotechnology Information.
http://niblrrs.ucdavis.edu/At_RGenes/; database of Arabidopsis NBS-LRR encoding disease resistance gene homologs.
http://www.rice-research.org/; Monsanto Rice Genome Sequence Database.
http://btn.genomics.org.cn/rice/; Indica rice database from Beijing Institute of Genomics.
| |
ACKNOWLEDGMENTS |
|---|
The authors wish to thank Blake Meyers and Richard Michelmore for valuable discussions. This work was supported by NSF Plant Genome grant 9975971.
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
| |
FOOTNOTES |
|---|
1 Corresponding author.
E-MAIL shulbrt{at}plantpath.ksu.edu; FAX (785) 532-5692.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.454902.
| |
REFERENCES |
|---|
|
|
|---|