Genome Res. 14:1932-1937, 2004
©2004 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/04 $5.00
Letter
Characterization of the Maize Endosperm Transcriptome and Its Comparison to the Rice Genome
1 Waksman Institute, Rutgers, The State University of New Jersey, Piscataway, New Jersey 08854, USA
2 Department of Genetics, Development & Cell Biology, Iowa State University, Ames, Iowa 50011, USA
3 Department of Plant Science, University of Arizona, Tucson, Arizona 85721, USA
4 Munich Information Center for Protein Sequences, Institute for Bioinformatics, GSF Research Center for Environment and Health, Neuherberg, Germany
The cereal endosperm is a major organ of the seed and an important component of the world's food supply. To understand the development and physiology of the endosperm of cereal seeds, we focused on the identification of genes expressed at various times during maize endosperm development. We constructed several cDNA libraries to identify full-length clones and subjected them to a twofold enrichment. A total of 23,348 high-quality sequence-reads from 5'- and 3'-ends of cDNAs were generated and assembled into a unigene set representing 5326 genes with paired sequence-reads. Additional sequencing yielded a total of 3160 (59%) completely sequenced, full-length cDNAs. Of the 5326 unigenes, 4139 (78%) can be aligned with 5367 predicted rice genes and, when only the "best hit" is taken, mapped to 3108 positions on the rice genome. The 22% of unigenes not present in rice indicate a rapid change of gene content between rice and maize in only 50 million years. Differences in rice and maize gene numbers also suggest that maize has lost a large number of duplicated genes following tetraploidization. The larger number of gene copies in rice suggests that as many as 30% of its genes arose from gene amplification, which would extrapolate to a significant proportion of the estimated 44,027 candidate genes of its entire genome. Functional classification of the maize endosperm unigene set indicated that more than a fourth of the novel functionally assignable genes found in this study are involved in carbohydrate metabolism, consistent with the role of the endosperm as a storage organ.
Comparative genetic mapping has shown that the chromosomes of many grass species exhibit extensive synteny (Helentjaris et al. 1988 ová et al. 2004
Although the sequences of rice chromosomes permit us to use computational prediction programs to locate all the genes, predicted genes need to be verified by the identification and characterization of expressed sequence tags (ESTs). Furthermore, the tissue-specificity of predicted genes needs to be investigated based on where genes are expressed and by their allelic variants. For instance, one of the important organs of cereal grain is the endosperm. Endosperm is a nutritive tissue that is used by the germinating embryo as an energy source. Because of its high nutritional content, it is also a major food source for humans and animals. Among cereals, the endosperm of maize has been extensively studied because of its large size, and many mutations are known that affect its development and, through it, kernel appearance. Studies based on analysis of EMS (ethyl methane sulphonate) mutagenesis suggested there are at least 300 genes in maize that can cause a visible endosperm phenotype (Neuffer and Sheridan 1980).
In an effort to identify and relate the maize and rice endosperm transcriptomes, we constructed several full-length cDNA libraries using maize endosperm from early-to-middle mature growth stages (4-6 d after pollination [DAP] and 7-23 DAP). The libraries were subjected to both normalization and subtraction. Out of 5504 unique ESTs, 3160 (59%) represented completely sequenced cDNAs. All sequences have been placed onto rice chromosome sequences to provide positional information and to assess gene amplification in rice. Furthermore, functional assignments of total maize versus maize endosperm-specific cDNAs have shown that novel cDNAs contain motifs typical for carbohydrate metabolism.
Library Construction, Normalization, and Subtraction
A total of six libraries were constructed, four from mRNA of the 4-6-DAP tissue and two from the 7-23-DAP tissue (Table 1). To evaluate the quality (QC) of these libraries, initially a few plates (96-well) from each of the six libraries were sequenced. Based on insert size, only three out of the six libraries (endosperm_3, 4, and 5) were found to meet the required standards, and were, therefore, further analyzed (Supplemental Fig. A). To overcome representation of abundant cDNAs, 48 plates of the QC-passed libraries were arrayed on high-density filters and probed with labeled cDNAs made from the corresponding mRNA preparation. Autoradiographs were read to select against strong hybridizing clones, and the low-abundance clones were rearrayed in fresh 96-well plates for sequencing. This normalization step resulted in nearly 40% enrichment for two of the three libraries, endosperm_3 (31 plates) and endosperm_5 (32 plates). There was a less striking effect on the third (endosperm_4) library, which was directly processed for sequencing without rearraying. All clones were sequenced from both ends. Because the sequence analysis indicated that even after normalization, redundant cDNAs represented a substantial portion of the EST collection (see below), we introduced an additional enrichment step. Pools of sequenced cDNAs were used as probes to hybridize against new filters of arrayed clones from two libraries that represented the early and late immature endosperm RNA preparations. Therefore, 96 plates of endosperm_3 and 48 plates of endosperm_5 were hybridized with both cDNA probes and pooled plasmid probes; the low-abundance clones were rearrayed in 23 plates for endosperm_3 and 13 plates for endosperm_5. Clones from these plates were again sequenced from both ends.
Full-Length cDNAs
Out of a total of 35,520 attempts, 23,348 reads (66% success) were of high quality (>200 bp, Q20). Earlier samples were run on ABI 3700 DNA sequencers, later ones on an ABI 3730xl; as a consequence, the average read length improved from 576 bp to 830 bp. Of the total, 12,659 (54%) were derived from the 5'-end, whereas 10,689 (46%) were from the 3'-end of cDNAs. The number of 3'-end sequences was somewhat lower, because the poly(A) tract caused a slightly higher failure rate. In total, 8939 clones had sequences both from the 5'- and the 3'-ends, of which 5455 clones (61%) with overlapping 5'- and 3'-ends represented complete cDNAs, whereas 3484 (39%) had bigger insert sizes that are not completely covered yet. Clustering the 5455 full-length cDNAs resulted in 2198 (40%) unique sequences, thereby proving the robustness of the above-mentioned enrichment procedures. Assembling all 5'-end sequences resulted in 5504 non-redundant cDNAs.
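The percentages quoted in this paragraph are simple proportions of the reported counts. The short Python sketch below recomputes them as a consistency check; the variable names and the `pct` helper are ours and purely illustrative, and only the counts themselves are taken from the text.

```python
# Counts reported in the text; variable names are illustrative only.
attempts = 35_520
high_quality = 23_348
five_prime = 12_659
three_prime = 10_689
paired_clones = 8_939
overlapping = 5_455
unique_full_length = 2_198


def pct(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


stats = {
    "read success": pct(high_quality, attempts),
    "5' reads": pct(five_prime, high_quality),
    "3' reads": pct(three_prime, high_quality),
    "overlapping pairs": pct(overlapping, paired_clones),
    "non-overlapping pairs": pct(paired_clones - overlapping, paired_clones),
    "unique among full-length": pct(unique_full_length, overlapping),
}

for name, value in stats.items():
    print(f"{name}: {value}%")
```

The 5'- and 3'-end read counts sum to the total number of high-quality reads, and each recomputed percentage agrees with the rounded figure given in the text.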
To investigate whether these cDNAs encode full-length ESTs, we have compared our EST sequences to a database containing protein information as described under Methods. According to the results from the comparisons with our initial 1334 sequences, ~98% contained the ATG start codon in the appropriate place, with >50 nt before the ATG start codon (data not shown). Given that such a significant portion of our cDNA collection contained the ATG start codon, it is reasonable to assume that the isolation of mRNA and the construction protocol of the cDNA libraries were of sufficient quality to yield full-length cDNAs and maintain them as full-length clones in Escherichia coli. Alignment of protein sequences to cDNAs also proved useful in determining their predicted lengths, which enabled us to estimate gap sizes between 5'- and 3'-ends of the longer cDNA clones (data not shown). This analysis suggested that a large number of cDNAs could be completely sequenced with just one round of primer walking. Therefore, a set of 3357 primers was designed to carry out an additional cycle of sequencing for clones that had gaps in the center of their sequence, thereby yielding another 992 full-length cDNAs, bringing the total to 3160 completely sequenced clones. The size distribution of the maize endosperm-expressed cDNAs was compared with that of the set of full-length cDNAs from rice (The Rice Full-Length cDNA Consortium 2003
Functional Classification of the Unigene Set
Comparison of Maize Endosperm Unigenes With ESTs in GenBank/MaizeGDB
To investigate how many of the 5504 endosperm-specific unigenes were novel transcripts, they were searched against all the current maize ESTs available in GenBank/MaizeGDB. Out of 397,000 maize ESTs, a total of 49,991 unigenes were assembled as described under Methods (Table 2). Comparison of the two unigene sets provided a total of 677 (12.3%) novel cDNAs (BLASTN E-value 1e-20). Because the current ESTs are derived from endosperm tissues, these 677 novel ESTs most likely represent endosperm-specific or at least endosperm-preferred expressed unigenes. Nevertheless, it is interesting to note that more than a fourth of the novel functionally assignable ESTs belong to the category of "Metabolism," consistent with the tissue-specificity of the endosperm. A list of all 677 cDNAs is provided as online information in Supplemental Table B.
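The novelty screen described above can be illustrated with a minimal Python sketch. It assumes that a unigene counts as "novel" when it has no BLASTN hit against the public unigene set at or below the quoted E-value of 1e-20; the text gives only the threshold, so the direction of the comparison is our reading. The identifiers and hit values are invented for illustration and are not data from the study.

```python
# Hypothetical BLASTN results: (endosperm unigene, public unigene, E-value).
# These identifiers and values are invented for illustration.
hits = [
    ("endo_001", "pub_17", 1e-85),
    ("endo_002", "pub_42", 3e-08),  # weak hit only, does not pass the cutoff
    ("endo_003", "pub_99", 1e-20),  # exactly at the cutoff, counted as matched
]
all_unigenes = ["endo_001", "endo_002", "endo_003", "endo_004"]

E_VALUE_CUTOFF = 1e-20  # threshold quoted in the text


def novel_unigenes(unigenes, blast_hits, cutoff=E_VALUE_CUTOFF):
    """Return unigenes that have no hit at or below the E-value cutoff."""
    matched = {query for query, _subject, evalue in blast_hits if evalue <= cutoff}
    return [u for u in unigenes if u not in matched]


novel = novel_unigenes(all_unigenes, hits)
print(novel)

# Proportion reported in the text: 677 novel cDNAs out of 5504 unigenes.
reported_pct = round(100 * 677 / 5504, 1)
print(f"{reported_pct}% of 5504 unigenes reported as novel")
```

In this toy input, `endo_002` (weak hit) and `endo_004` (no hit) are returned as novel, and the reported proportion recomputes to the 12.3% given in the text.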
Recently a genome-wide analysis of a very large number of ESTs was done in tomato (Van der Hoeven et al. 2002
Organization of Rice Genes Homologous to Maize Endosperm mRNAs
EST sequencing has been considered an efficient way of gene discovery in many common species (http://www.ncbi.nlm.nih.gov/dbEST). For maize alone, there have been two major large-scale EST sequencing projects (Gai et al. 2000
At this time, it is still difficult to predict the size of the maize endosperm transcriptome. One would expect that many mutant genes might not have a phenotype and, therefore, belong to a class other than the 300 essential genes in maize endosperm (Neuffer and Sheridan 1980
Comparative mapping to the complete genome sequence of a close maize relative, the rice genome, serves two purposes: one, it adds functional annotation to the rice genome; and two, it leads to the prediction of the location of these genes in the maize genome for those chromosomal regions that are syntenic in both. We expected that most if not all maize endosperm-expressed genes could be detected in rice. However, from 5326 repeat-free maize unigenes, only 4139 (78%) of them could be mapped to 3108 locations on the rice genome. It is not surprising to find 1020 fewer map positions in rice than the total number of maize genes. The reason for this is that maize originated by the hybridization of two progenitors that split from the sorghum progenitor 11.9 Mya (Swigo
On the other hand, the large percentage of conserved genes between maize and rice could be accounted for in two ways. The endosperm tissues of maize and rice have a very similar function and storage capacity, although rice seeds are much smaller than maize kernels. A large percentage of the endosperm transcriptome encodes many common cellular functions and may represent a large fraction of the total plant transcriptome. If we ask the question how many rice genes have homology to maize endosperm ESTs, then the number of 5367 was much higher than the 4139 maize endosperm cDNA matches. The difference of 1239 genes in the rice genome would suggest that 30% of the rice genes are part of gene families that could vary significantly in size and would exceed even the proportion of 25% previously determined for a single chromosome (Rice Chromosome 10 Sequencing Consortium 2003
Library Construction
Two sets of endosperm tissue were harvested from maize inbred line W22, one at 4-6 DAP and the second pooled at 2-d intervals, from 7-23 DAP, during the summer of 2001 and used to purify RNA using mRNA isolation system II (Promega). The intactness of RNA was checked by gel electrophoresis/blot hybridization, and the purity was determined by its A260/A280 ratio. The cDNA libraries were constructed using a cDNA synthesis kit (Cat #200401-5) from Stratagene. Clones were plated on LB medium containing X-GAL and IPTG, and white colonies were selected by robotic transfer into 96-well microtiter plates. Random sampling of isolates was used to determine insert sizes by agarose gel electrophoresis.
Filter Hybridization
EST Sequencing and Assembly
Computational Analysis
For comparison to rice, maize cDNA sequences were related to the 12 pseudomolecules of the rice genome generated from the BAC/PAC sequences of the IRGSP (http://www.tigr.org/tdb/e2k1/osa1/pseudomolecules/info.shtml). The coding sequences of all the putative genes were predicted using the FGENESH program with the monocot trained program. The unigene set of maize ESTs was screened for potential repeat sequences using the cereal repeat database. The repeat-free EST sequences were then subjected to homology searches against the database of the predicted rice coding sequences.
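The last two steps of this procedure (repeat screening followed by a homology search, with only the "best hit" retained for mapping) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the search program and its scoring are not specified in this section, so a generic numeric alignment score is assumed, and every identifier and value below is hypothetical.

```python
# Hypothetical inputs; none of these identifiers come from the study.
repeat_flagged = {"zm_003"}  # unigenes that matched the repeat database

# (maize unigene, predicted rice gene, alignment score); score type is assumed.
hits = [
    ("zm_001", "os_chr1_g10", 250.0),
    ("zm_001", "os_chr5_g77", 410.0),
    ("zm_002", "os_chr5_g77", 380.0),
    ("zm_003", "os_chr2_g05", 900.0),  # dropped: query is repeat-flagged
    ("zm_004", "os_chr9_g31", 120.0),
]


def best_hits(search_hits, excluded):
    """Keep the top-scoring rice gene for each repeat-free maize unigene."""
    best = {}
    for query, subject, score in search_hits:
        if query in excluded:
            continue  # repeat screen: skip flagged unigenes entirely
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _score) in best.items()}


mapping = best_hits(hits, repeat_flagged)

# All rice genes hit by any repeat-free unigene versus best-hit positions only.
all_rice_genes = {s for q, s, _score in hits if q not in repeat_flagged}
best_positions = set(mapping.values())

print(f"{len(mapping)} unigenes aligned to {len(all_rice_genes)} rice genes")
print(f"{len(best_positions)} distinct best-hit positions")
```

In this toy example, three repeat-free unigenes align to three rice genes but collapse to two best-hit positions because two unigenes share the same best hit. The same distinction separates the number of rice genes with homology to the maize unigenes from the number of map positions obtained when only the best hit is counted.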
We thank G. Fuks, S. Kavchok, G. Keizer, A.B. Nelson, S. Young, and V. Zohovetz for technical assistance. This work was supported by NSF grant 0077676.
5 Present address: Agricultural Plant Stress Research Center, Chonnam National University, Kwangju 500-757, Korea
6 Present address: Turku Center for Biotechnology, Tykistoekatu 6, Turku, Finland.
7 Corresponding author. Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.2780504. [Supplemental material is available online at www.genome.org. The sequence data from this study have been submitted to GenBank under accession nos. CA398264-CA405362 and CD43287-CD44042.]
Ahn, S. and Tanksley, S.D. 1993. Comparative linkage maps of the rice and maize genomes. Proc. Natl. Acad. Sci. 90: 7980-7984.
Becraft, P.W. 2001. Cell fate specification in the cereal endosperm. Semin. Cell Dev. Biol. 12: 387-394.
Engel, M., Chaboud, A., Dumas, C., and McCormick, S. 2003. Sperm cells of Zea mays have a complex complement of mRNAs. Plant J. 34: 697-707.
Ewing, B., Hillier, L., Wendl, M.C., and Green, P. 1998. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8: 175-185.
Feng, Q., Zhang, Y., Hao, P., Wang, S., Fu, G., Huang, Y., Li, Y., Zhu, J., Liu, Y., Hu, X., et al. 2002. Sequence and analysis of rice chromosome 4. Nature 420: 316-320.
Frishman, D., Albermann, K., Hani, J., Heumann, K., Metanomski, A., Zollner, A., and Mewes, H.W. 2001. Functional and structural genomics using PEDANT. Bioinformatics 17: 44-57.
Gai, X., Lal, S., Xing, L., Brendel, V., and Walbot, V. 2000. Gene discovery using the maize genome database ZmDB. Nucleic Acids Res. 28: 94-96.
Gale, M.D. and Devos, K. 1998. Plant comparative genetics after 10 years. Science 282: 656-659.
Goff, S.A., Ricke, D., Lan, T.H., Presting, G., Wang, R., Dunn, M., Glazebrook, J., Sessions, A., Oeller, P., Varma, H., et al. 2002. A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 296: 92-100.
Helentjaris, T., Weber, D., and Wright, S. 1988. Identification of the genomic locations of duplicate nucleotide sequences in maize by analysis of restriction fragment length polymorphism. Genetics 118: 353-363.
Huang, X. and Madan, A. 1999. CAP3: A DNA sequence assembly program. Genome Res. 9: 868-877.
Lai, J., Ma, J., Swigo
Neuffer, G. and Sheridan, W.F. 1980. Defective kernel mutants of maize. I. Genetic and lethality studies. Genetics 95: 929-944.
Qiu, F., Guo, L., Wen, T., Liu, F., Ashlock, D., and Schnable, P. 2003. DNA sequence-based "bar codes" for tracking the origins of expressed sequence tags from a maize cDNA library constructed using multiple mRNA sources. Plant Physiol. 133: 475-481.
Rice Chromosome 10 Sequencing Consortium. 2003. In-depth view of structure, activity, and evolution of rice chromosome 10. Science 300: 1566-1569.
The Rice Full-Length cDNA Consortium. 2003. Collection, mapping, and annotation of over 28,000 cDNA clones from japonica rice. Science 301: 376-379.
Rudd, S., Mewes, H.-W., and Mayer, K.F.X. 2003. Sputnik: A database platform for comparative plant genomics. Nucleic Acids Res. 31: 128-132.
Sasaki, T., Matsumoto, T., Yamamoto, K., Sakata, K., Baba, T., Katayose, Y., Wu, J., Niimura, Y., Cheng, Z., Nagamura, Y., et al. 2002. The genome sequence and structure of rice chromosome 1. Nature 420: 312-316.
Scanlon, M. and Myers, A. 1998. Phenotypic analysis and molecular cloning of discolored-1 (dsc1), a maize gene required for early kernel development. Plant Mol. Biol. 37: 483-493.
Scanlon, M., Stinard, P., James, M., Myers, A., and Robertson, D. 1994. Genetic analysis of 63 mutations affecting maize kernel development isolated from Mutator stocks. Genetics 136: 281-294.
Segal, G., Song, R., and Messing, J. 2003. A new opaque variant of maize by a single dominant RNAi-inducing transgene. Genetics 165: 387-397.
Song, R. and Messing, J. 2002. Contiguous genomic DNA sequence comprising the 19-kDa-zein gene family from Zea mays. Plant Physiol. 130: 1626-1635.
____. 2003. Gene expression of a gene family in maize based on noncollinear haplotypes. Proc. Natl. Acad. Sci. 100: 9055-9060.
Song, R., Llaca, V., and Messing, J. 2002. Mosaic organization of orthologous sequences in grass genomes. Genome Res. 12: 1549-1555.
Swigo
Van der Hoeven, R., Ronning, C., Giovannoni, J., Martin, G., and Tanksley, S. 2002. Deductions about the number, organization, and evolution of genes in the tomato genome based on analysis of a large expressed sequence tag collection and selective genomic sequencing. Plant Cell 14: 1441-1456.
Vieira, J. and Messing, J. 1982. The pUC plasmids, an M13mp7 derived system for insertion mutagenesis and sequencing with synthetic universal primers. Gene 19: 259-268.
Wu, J., Maehara, T., Shimokawa, T., Yamamoto, S., Harada, C., Takazaki, Y., Ono, N., Mukai, Y., Koike, K., Yazaki, J., et al. 2002. A comprehensive rice transcript map containing 6591 expressed sequence tag sites. Plant Cell 14: 525-535.
Yu, J., Hu, S., Wang, J., Wong, G.K., Li, S., Liu, B., Deng, Y., Dai, L., Zhou, Y., Zhang, X., et al. 2002. A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296: 79-92.
http://endosperm.org/; segregation analysis of mutant endosperm phenotypes.
http://rgp.dna.affrc.go.jp; International Rice Genome Sequencing Project, IRGSP.
http://www.ncbi.nlm.nih.gov/dbEST; NCBI EST sequencing.
http://www.softberry.com; FGENESH.
http://www.tigr.org/tdb/e2k1/osa1/pseudomolecules/info.shtml; TIGR, pseudomolecules for all chromosomes.
Received April 10, 2004; accepted in revised format July 28, 2004.