Genome Res. 14: 1916-1923, 2004. ©2004 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/04 $5.00

Letter

Close Split of Sorghum and Maize Genome Progenitors

…ová1,6

1 Waksman Institute of Microbiology, Rutgers University, Piscataway, New Jersey 08854, USA
2 Department of Biological Sciences and Genetics Program, West Lafayette, Indiana 47907, USA
It is generally believed that maize (Zea mays L. ssp. mays) arose as a tetraploid; however, the two progenitor genomes cannot be unequivocally traced within the genome of modern maize. We have taken a new approach to investigate the origin of the maize genome. We isolated and sequenced large genomic fragments from the regions surrounding five duplicated loci in the maize genome and their orthologous loci in sorghum, and then compared these sequences with the orthologous regions in the rice genome. Within the studied segments, we identified 11 genes that were conserved in location, order, and orientation. We performed phylogenetic and distance analyses and examined the patterns of estimated divergence times between sorghum and maize orthologs and between the two maize homoeologs. Our results support a tetraploid origin of maize. This analysis also indicates that the ancestral sorghum genome and the two maize progenitor genomes diverged contemporaneously, about 11.9 million years ago (Mya). On the basis of a putative conversion event detected for one of the genes, tetraploidization must have occurred before 4.8 Mya and therefore preceded the major expansion of the maize genome by gene amplification and retrotransposition.
Maize (Zea mays L. ssp. mays), from the grass tribe Andropogoneae, is an agronomically important crop and also a traditional genetic model. The theory that maize is a tetraploid first arose from the fact that maize has a haploid chromosome number of 10 (2n = 20), whereas many closely related grasses, such as Coix aquatica, Saccharum sp., and Erianthus sp., have only five chromosomes in the haploid nucleus (Celarier 1956
Three models can explain the large-scale duplications in the maize genome: segmental duplication (multiple independent duplications within a genome), autotetraploidy (intraspecific genome duplication), and allotetraploidy (interspecific genome hybridization). Gaut and Doebley (1997
Isolation of BAC Clones and Their Sequencing

Probes for six loci (c1/pl1, tbp1/2, r1/b1, orp1/2, tb1/2, and Zmfie1/2) were used to screen maize and sorghum BAC libraries. These loci are located on seven of the 10 maize chromosomes (Fig. 1). Their locations show no bias toward centromeric or telomeric regions, but they do reflect the segmental nature of the syntenic regions of the maize genome. In sorghum, each of the six probes identified only a single contig of overlapping BAC clones, indicating that these six genes are not duplicated in the sorghum genome. In maize, the clones hybridizing with each probe belonged to two separate BAC contigs, demonstrating that these six loci are duplicated in maize. A total of 18 BAC clones, six from sorghum and 12 from maize, were initially sequenced (Table 1). Two of the six loci, the genes encoding the FIE-like and ORP proteins, are physically linked, so the six loci represent five chromosomal regions. Because of the low gene density in maize, optimal alignment between sorghum and maize required us to select an additional six maize BAC clones to extend the contigs. Table 1 lists all 24 sequenced BACs, which together provide more than 3 Mb of DNA.
Annotation of the BAC Clones

We predicted 62 protein-coding genes in the six sorghum BACs (total length 681 kb), an average gene density (avg) of 11 kb/gene. Within the 18 maize BACs (2554 kb), we identified 74 putative genes (avg = 34.5 kb/gene). Genes were embedded between large blocks of retrotransposons, including nested retrotransposons, as reported for other loci in maize (SanMiguel et al. 1996 […] 798 kb), avg = 11.4 kb/gene, much lower than the predicted avg of 6.5 kb/gene (Rice Chromosome 10 Sequencing Consortium 2003 […]

Among the genes identified in the five chromosomal regions, only 11 genes, present as orthologous gene pairs in maize (Fig. 1), are also collinear in sorghum and rice. Four of these 11 gene duplicates (homoeologs), r1/b1, c1/pl1, tb1/tb2, and orp1/orp2, can be distinguished (except for tb2) by maize mutant phenotypes that segregate as single Mendelian traits, as is typical for the disomic nature of maize. No mutant phenotypes are known for the other seven gene duplicates. Although we found homologs for all seven genes, only three have homologs that encode proteins of known function. The tbp1/2 duplicate is a homolog of the TATA box-binding protein; grf1 is homologous to a putative growth-regulating factor 1 gene (rice homolog AAF17567 [GenBank], e < 1E-74); and gp apparently encodes a homolog of a putative glutathione peroxidase (barley homolog CAB59895 [GenBank], e < 2.8E-82). The remaining four genes (pg1, pg2, pg3, and pg4) show considerable sequence similarity to Arabidopsis proteins of unknown function (pg1: At3g07660, e < 1E-30; pg2: At5g12080, e < 1E-100; pg3: At5g22090, e < 2.9E-11; pg4: At4g39410, e < 6E-08).
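The average gene densities quoted above are simply total sequenced length divided by the number of predicted genes. The following minimal sketch reproduces that arithmetic from the sorghum and maize totals given in this section. The function name `kb_per_gene` is illustrative only.

```python
def kb_per_gene(total_kb: float, n_genes: int) -> float:
    """Average gene density expressed as kilobases of sequence per gene."""
    if n_genes <= 0:
        raise ValueError("n_genes must be positive")
    return total_kb / n_genes

# Totals reported in this section.
sorghum_density = kb_per_gene(681, 62)   # six sorghum BACs
maize_density = kb_per_gene(2554, 74)    # 18 maize BACs

print(f"sorghum: {sorghum_density:.1f} kb/gene")
print(f"maize:   {maize_density:.1f} kb/gene")
print(f"ratio:   {maize_density / sorghum_density:.1f}x")
```

By this measure, the maize BACs carry roughly three times as much sequence per gene as the sorghum BACs. This is in line with the large retrotransposon blocks described above.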
In the orp1/2 region (Fig. 1), we identified the Zmfie1/Zmfie2 genes, encoding a protein that is homologous to the Arabidopsis FIE protein (Ohad et al. 1999
Phylogenetic Analyses

Phylogenetic analyses were carried out to resolve the relationships among the sorghum genome and the two subgenomes of maize. However, the analysis was confounded by a short internode. Only three of the 11 genes (grf1, orp1/2, and r1/b1) recover a relationship supported by bootstrap values >85% (Fig. 2). To determine whether the topology of the maximum-likelihood (ML) gene tree differs significantly from a trichotomous tree (in which the three genomes diverge from the same node), we applied the likelihood ratio test (LRT). Using a significance level adjusted by the Bonferroni correction, we found that the orp1/2 and r1/b1 gene trees differ significantly from the trichotomous tree. However, these two gene trees have different topologies (Fig. 2).
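The trichotomy test compares the log-likelihood of the ML gene tree with that of a tree whose short internal branch is collapsed. The minimal sketch below makes several assumptions. First, it assumes the two log-likelihoods have already been computed, for example in PAUP* or PAML. Second, it assumes one degree of freedom for the single collapsed branch. Third, the gene names and log-likelihood values are hypothetical and are not values from this study. Because the null hypothesis places a branch length on its boundary (zero), some authors use a 50:50 mixture of chi-square distributions with 0 and 1 df instead, which halves the p-value. This sketch uses the plain 1-df distribution, which is the more conservative choice.

```python
import math

def lrt_pvalue_df1(lnl_null: float, lnl_alt: float) -> float:
    """P-value of a likelihood ratio test with one degree of freedom.

    The statistic is 2 * (lnL_alt - lnL_null). For a chi-square variable with
    1 df, P(X > x) = erfc(sqrt(x / 2)), so no statistics library is needed.
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat <= 0.0:
        return 1.0  # the resolved tree fits no better than the trichotomy
    return math.erfc(math.sqrt(stat / 2.0))

def bonferroni_flags(pvalues, alpha=0.05):
    """Flag each test as significant at the Bonferroni level alpha / m."""
    threshold = alpha / len(pvalues)
    return [p < threshold for p in pvalues]

# Hypothetical (lnL_trichotomy, lnL_resolved) pairs for three genes.
fits = {
    "gene_a": (-2510.4, -2498.1),
    "gene_b": (-1873.2, -1872.5),
    "gene_c": (-3120.0, -3112.9),
}
pvals = {g: lrt_pvalue_df1(l0, l1) for g, (l0, l1) in fits.items()}
# With three tests the threshold is 0.05 / 3; with 11 genes it would be 0.05 / 11.
flags = bonferroni_flags(list(pvals.values()))
for (gene, p), sig in zip(pvals.items(), flags):
    print(f"{gene}: p = {p:.2e}, significant = {sig}")
```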
Sequence Divergence and Nucleotide Substitution

Synonymous and nonsynonymous distances between gene orthologs are shown in Table 2. Across genes, the synonymous distance varies 2.8-fold between the two maize homoeologs and 3.2-fold between maize and sorghum orthologs. Unusually low divergence at nonsynonymous sites was found between the two maize tbp1/2 homoeologs. The nonsynonymous distance between maize and sorghum orthologs varies 15.5-fold. Plots of the estimated synonymous distances (Fig. 3A) show that the standard deviations of the three pairwise distances overlap for every gene except pg2 and r1/b1. We therefore performed Z-tests for each gene of the null hypothesis (H0) that the three gene orthologs (one from sorghum and two from maize) diverged within a short time period, so that their pairwise synonymous distances are equal. At a significance level adjusted by the Bonferroni method, H0 could not be rejected for any gene except r1/b1.
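For each gene, H0 states that three pairwise synonymous distances are equal: the two sorghum-to-maize distances and the distance between the two maize homoeologs. The minimal sketch below runs a two-sided Z-test for one pair of those distances. It assumes each distance comes with an estimated variance. The covariance term defaults to zero, which ignores shared branches; this is an assumption of the sketch, not the study's procedure. The numbers are hypothetical.

```python
import math

def z_test_equal_distances(d1, var1, d2, var2, cov12=0.0):
    """Two-sided Z-test of H0: d1 == d2 for two estimated distances.

    Var(d1 - d2) = Var(d1) + Var(d2) - 2 Cov(d1, d2).
    Returns (z, p), with p = erfc(|z| / sqrt(2)) from the standard normal.
    """
    var_diff = var1 + var2 - 2.0 * cov12
    if var_diff <= 0.0:
        raise ValueError("variance of the difference must be positive")
    z = (d1 - d2) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Hypothetical synonymous distances (dS) and their variances for one gene.
z, p = z_test_equal_distances(d1=0.142, var1=0.0004, d2=0.121, var2=0.0003)
print(f"z = {z:.2f}, p = {p:.3f}")
```

In the study, the resulting p-values were judged against a Bonferroni-adjusted significance level rather than a fixed 0.05.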
We also compared estimated distances across genes, both between sorghum and maize and between the maize homoeologs. Both homogeneity tests (Gaut and Doebley 1997 […] χ2 = 100.49, P < 0.001; χ2 = 97.97, P < 0.001, respectively). The observed heterogeneity may be caused by variable divergence times of the compared sequences, by unequal rates of substitution across genes, or by both. Rate heterogeneity at synonymous sites was detected for the pg2, orp1/2, and tb1/tb2 genes; therefore, these three gene pairs were not used for estimating divergence times. Nonsynonymous substitution rates varied more dramatically among lineages. Only three gene pairs (pg2, pg3, and pg4) showed rate homogeneity across lineages, and all other genes exhibited rate heterogeneity for at least one pair of sequences.
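One common way to test whether an estimated quantity, such as a synonymous distance, is homogeneous across k genes is an inverse-variance-weighted chi-square statistic (Cochran's Q) with k - 1 degrees of freedom. The sketch below computes that statistic to illustrate the general idea. It is not necessarily the exact homogeneity test of Gaut and Doebley (1997) used here, and the input values are hypothetical.

```python
def heterogeneity_q(distances, variances):
    """Cochran-style Q statistic for homogeneity of k estimates.

    Each estimate is weighted by 1 / variance. Under H0 (all estimates share
    one true value), Q is approximately chi-square with k - 1 degrees of freedom.
    Returns (Q, degrees_of_freedom).
    """
    if len(distances) != len(variances) or len(distances) < 2:
        raise ValueError("need at least two estimates with matching variances")
    weights = [1.0 / v for v in variances]
    pooled = sum(w * d for w, d in zip(weights, distances)) / sum(weights)
    q = sum(w * (d - pooled) ** 2 for w, d in zip(weights, distances))
    return q, len(distances) - 1

# Hypothetical dS values (sorghum vs. maize) and variances for five genes.
q, df = heterogeneity_q([0.11, 0.14, 0.09, 0.21, 0.13],
                        [0.0002, 0.0003, 0.0002, 0.0005, 0.0003])
print(f"Q = {q:.2f} on {df} df")
```

For reference, the 0.05 critical value of chi-square with 4 df is about 9.49. A larger Q means the genes disagree more than their sampling variances allow.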
Assuming that rice diverged from an ancestor of sorghum and maize at
Estimation of Divergence Time
From cytology to genetic and molecular mapping, there is considerable evidence that the maize genome contains extensive chromosomal duplications. The large-scale genome duplication in maize could have resulted from segmental duplication or from whole-genome duplication by autotetraploidy or allotetraploidy. On the basis of a bimodal distribution of synonymous distances between duplicated genes, Gaut and Doebley (1997 […] 20.5 Mya. Using sequence divergence between sorghum and maize for two genes (mdh and waxy), they further suggested that the sorghum genome and one of the duplicated maize regions might be more closely related to each other than the two duplicated maize regions are. At that time, sorghum sequences orthologous to maize gene duplicates were unavailable, and it was unknown whether the gene duplicates in maize were true orthologs.
To establish the orthology of gene duplicates across taxa, it was necessary to isolate the maize gene duplicates together with physically closely linked genes (microcollinear regions). To this end, we sequenced large duplicated chromosomal fragments at five different loci located on seven different maize chromosomes (Fig. 1; Table 1). Gene islands and blocks of retrotransposons proved quite variable in length between the homoeologous regions of maize. We also note that tb1 and tbp1 on maize chromosome 1L are about 7 cM apart (www.agron.missouri.edu). Their orthologs in the rice genome both lie on chromosome 3, about 2 Mb apart, indicating that microcollinearity between maize and rice is probably preserved over this interval. However, the analysis of >4 Mb of DNA across taxa yielded only 11 genes that were conserved between the two duplicated regions of maize and the sorghum and rice genomes. Greater conservation was found between the sorghum and rice segments, whereas both maize regions had undergone extensive genomic rearrangements. A more detailed investigation of gene mobility and gene/transposon organization can be found in Lai et al. (2004
Sequences of the 11 orthologous genes shared by the maize duplicated regions and the rice and sorghum genomes were subjected to several analyses. First, phylogenetic analyses demonstrated a close relationship among the two maize progenitor genomes and the sorghum genome, indicating a "near-instantaneous" speciation of the three genomes (a trichotomy). Both tests applied (the LRT and the Z-test) showed that the r1/b1 gene tree departs significantly from a trichotomy. Because the r1/b1 genes were extensively studied (Purugganan and Wessler 1994
In our second analysis, we examined the patterns of estimated distances for each pair of sequences from maize and sorghum. In contrast to Gaut and Doebley (1997
Furthermore, we compared estimated distances between sorghum and maize and also between maize homoeologs across genes. Both
If maize is indeed a tetraploid with 10 haploid chromosomes, then how did sorghum arrive at the same number of chromosomes as maize? One possibility would be that sorghum is a tetraploid as well. There is evidence from QTL and RFLP studies that the genome of sorghum (Sorghum bicolor) contains large duplicated regions (Pereira et al. 1994
Here, we demonstrate that the two progenitor genomes of maize and the sorghum genome diverged from each other
BAC Clone Isolation and Sequencing

High-density filters for maize (Zea mays L. cv B73) BAC libraries (Yim et al. 2002
BAC DNA was isolated using the Large Construct Kit (Qiagen, Inc.). The purified BAC DNA was physically sheared and then ligated into a pUC vector for shotgun libraries (Song et al. 2001
Sequence Analysis
To reduce the error in our rate and distance estimates, we used several approaches. First, we selected strictly orthologous genes to investigate the evolutionary history of the genomes. Second, we estimated substitution rates with a codon-based likelihood model that accounts for biases in transition/transversion rates and in codon frequencies. Third, unlike Gaut and Doebley (1997
Estimation of the Rate of Synonymous Substitution
where $V_{AD} = \mathrm{Var}(d_{AD})$, etc., is the estimated variance of the distance $d_{AD}$, etc. The variance $\mathrm{Var}(r_{S,ABC})$ of $r_{S,ABC}$ is given by
where $\mathrm{Cov}(d_{AD}, d_{BD})$ can be approximated by $\mathrm{Var}(d_{OD})$, where O is the common ancestor of A and B (Nei et al. 1985
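The covariance approximation stated above arises because the distances from A and from B to D share the path from their common ancestor O to D. The minimal sketch below shows how that approximation enters the variance of the sum and of the difference of the two distances; the difference is the quantity examined in a relative-rate test. The expression for the rate estimate itself is not reproduced here. The function names and all numbers are hypothetical.

```python
import math

def shared_path_variances(v_ad, v_bd, v_od):
    """Variances of d_AD + d_BD and d_AD - d_BD.

    Cov(d_AD, d_BD) is approximated by Var(d_OD), where O is the common
    ancestor of A and B (the two distances share the O-to-D path).
    """
    cov = v_od
    var_sum = v_ad + v_bd + 2.0 * cov
    var_diff = v_ad + v_bd - 2.0 * cov
    return var_sum, var_diff

def relative_rate_z(d_ad, d_bd, v_ad, v_bd, v_od):
    """Z statistic for equal rates in lineages A and B relative to outgroup D."""
    _, var_diff = shared_path_variances(v_ad, v_bd, v_od)
    if var_diff <= 0.0:
        raise ValueError("covariance approximation gave a non-positive variance")
    return (d_ad - d_bd) / math.sqrt(var_diff)

# Hypothetical values: A and B are ingroup sequences, D is an outgroup sequence.
print(shared_path_variances(v_ad=0.0010, v_bd=0.0012, v_od=0.0008))
print(relative_rate_z(d_ad=0.52, d_bd=0.47, v_ad=0.0010, v_bd=0.0012, v_od=0.0008))
```

Ignoring the shared-path covariance would inflate the variance of the difference and deflate the variance of the sum. This is why the approximation matters for both rate estimates and relative-rate tests.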
Calculation of Divergence Times
For the divergence time t of two ingroup sequences, we have
When, as is the present case,
and
one can approximate the term $\mathrm{Var}(d_{AB}/r_{S,AB})$ by the following formula:
with
where the covariances $\mathrm{Cov}(d_{AB}, d_{AD})$ and $\mathrm{Cov}(d_{AB}, d_{BD})$ can be approximated by $\mathrm{Var}(d_{OA})$ and $\mathrm{Var}(d_{OB})$, with O the common ancestor of A and B (Nei et al. 1985
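The divergence-time equations themselves are not reproduced in this version of the text. As an illustration only, the sketch below assumes the standard molecular-clock relation t = d_AB / (2 r), with r standing in for the per-lineage synonymous rate written r_S,AB above. It approximates Var(d_AB / r) by a first-order (delta-method) expansion in which Cov(d_AB, r) is supplied by the caller; in the study, that covariance is built from the approximations just described. The function names and all numeric inputs are hypothetical.

```python
import math

def ratio_variance(x, var_x, y, var_y, cov_xy=0.0):
    """First-order (delta-method) approximation to Var(X / Y)."""
    ratio = x / y
    return ratio ** 2 * (var_x / x ** 2 + var_y / y ** 2 - 2.0 * cov_xy / (x * y))

def divergence_time(d_ab, var_d_ab, r, var_r, cov_d_r=0.0):
    """Divergence time t = d_AB / (2 r) and its standard error.

    Assumes a molecular clock with the same rate r in both lineages.
    Units of t follow the units of r (substitutions/site/Myr gives Myr).
    """
    t = d_ab / (2.0 * r)
    var_t = ratio_variance(d_ab, var_d_ab, r, var_r, cov_d_r) / 4.0
    return t, math.sqrt(var_t)

# Hypothetical inputs: d_AB in substitutions/site, r in substitutions/site/Myr.
t, se = divergence_time(d_ab=0.10, var_d_ab=0.0001, r=0.005, var_r=1e-7)
print(f"t = {t:.1f} Myr (SE {se:.2f})")
```

A positive covariance between d_AB and the rate estimate reduces the variance of the ratio. This is why the covariance terms above are not simply dropped.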
We thank Dr. H.K. Dooner for discussions of emerging results, B.S. Gaut and J.F. Doebley for critical comments on the manuscript, and G. Fuks, Dr. Arvind Bharti, A. Bronzino Nelson, S. Kavchok, G. Keizer, and S. Young for technical assistance. This work was supported by NSF grant 9975618.
3 Present address: Department of Genetics, University of Georgia, Athens, GA 30602, USA.
4 Present address: Department of Biological Sciences, Michigan Tech University, MI 49931, USA.
5 Present address: Analytical and Genomic Technologies, Crop Genetics R&D, DuPont Agriculture & Nutrition, Wilmington, DE 19880, USA.
6 These authors contributed equally to this work.
7 Corresponding author. Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.2332504.
Ahn, S. and Tanksley, S.D. 1993. Comparative linkage maps of the rice and maize genomes. Proc. Natl. Acad. Sci. 90: 7980-7984.
Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389-3402.
Celarier, R.P. 1956. Cytotaxonomy of the Andropogoneae. 1. Subtribes Dimeriinae and Saccharinae. Cytologia 21: 272-291.
Ewing, B., Hillier, L., Wendl, M.C., and Green, P. 1998. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8: 175-185.
Felsenstein, J. 1985. Confidence limits on phylogenies: An approach using the bootstrap. Evolution 39: 783-791.
Gale, M.D. and Devos, K. 1998. Plant comparative genetics after 10 years. Science 282: 656-659.
Gaut, B.S. and Doebley, J.F. 1997. DNA sequence evidence for the segmental allotetraploid origin of maize. Proc. Natl. Acad. Sci. 94: 6809-6814.
Gaut, B.S., Le Thierry d'Ennequin, M., Peek, A.S., and Sawkins, M.C. 2000. Maize as a model for the evolution of plant nuclear genomes. Proc. Natl. Acad. Sci. 97: 7008-7015.
Gómez, M.I., Islam-Faridi, M.N., Zwick, M.S., Czeschin Jr., D.G., Hart, G.E., Wing, R.A., Stelly, D.M., and Price, H.J. 1998. Tetraploid nature of Sorghum bicolor (L.) Moench. J. Hered. 89: 188-190.
Goodman, M.M., Stuber, C.W., Newton, K., and Weissinger, H.H. 1980. Linkage relationships of 19 enzyme loci in maize. Genetics 96: 697-710.
Gordon, D., Abajian, C., and Green, P. 1998. Consed: A graphical tool for sequence finishing. Genome Res. 8: 195-202.
Graur, D. and Li, W.H. 2000. Rates and patterns of nucleotide substitution. In Fundamentals of molecular evolution (2nd ed.), pp. 99-164. Sinauer Associates, Sunderland, MA.
Helentjaris, T., Weber, D., and Wright, S. 1988. Identification of the genomic locations of duplicate nucleotide sequences in maize by analysis of restriction fragment length polymorphism. Genetics 118: 353-363.
Hu, J., Anderson, B., and Wessler, S.R. 1996. Isolation and characterisation of rice R genes: Evidence for distinct evolutionary path in rice and maize. Genetics 142: 1021-1031.
Karper, R.E. and Chisholm, A.T. 1936. Chromosome numbers in Sorghum. Am. J. Bot. 23: 369-374.
Kimura, M. and Ohta, T. 1974. On some principles governing molecular evolution. Proc. Natl. Acad. Sci. 71: 2848-2852.
Kishimoto, N., Higo, H., Abe, K., Arai, S., Saito, A., and Higo, K. 1994. Identification of the duplicated segments in rice chromosomes 1 and 5 by linkage analysis of cDNA markers of known functions. Theor. Appl. Genet. 88: 722-726.
Lai, J., Ma, J., Swigo […]
Li, W.H. 1997. Rates and patterns of nucleotide substitution. In Molecular evolution, pp. 177-213. Sinauer Associates, Sunderland, MA.
Lin, Y.-R., Schertz, K.F., and Paterson, A.H. 1995. Comparative analysis of QTLs affecting plant height and maturity across the Poaceae, in reference to an interspecific sorghum population. Genetics 141: 391-411.
Mason-Gamer, R.J., Weil, C.F., and Kellogg, E.A. 1998. Granule-bound starch synthase: Structure, function, and phylogenetic utility. Mol. Biol. Evol. 15: 1658-1673.
McClintock, B. 1984. The significance of responses of the genome to challenge. Science 226: 792-801.
McMillin, D.E. and Scandalios, J.G. 1980. Duplicated cytosolic malate dehydrogenase genes in Zea mays. Proc. Natl. Acad. Sci. 77: 4866-4870.
Mehra, P.N. and Sharma, M.L. 1975. Cytological studies in some central and eastern Himalayan grasses I. The Andropogoneae. Cytologia 40: 61-74.
Moore, G., Devos, K., Wang, Z., and Gale, M.D. 1995. Grasses, line up and form a circle. Curr. Biol. 5: 737-739.
Muse, S.V. and Gaut, B.S. 1994. A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates with application to the chloroplast genome. Mol. Biol. Evol. 11: 715-724.
Nagamura, Y., Inoue, T., Antonio, B.A., Shimano, T., Kajiya, H., Shomura, A., Lin, S.Y., Kuboki, Y., Harushima, Y., Kurata, N., et al. 1995. Conservation of duplicated segments between rice chromosome-11 and chromosome-12. Breed. Sci. 45: 373-376.
Nei, M. and Gojobori, T. 1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3: 418-426.
Nei, M., Stephens, J.C., and Saitou, N. 1985. Methods for computing the standard errors of branching points in an evolutionary tree and their application to molecular data from humans and apes. Mol. Biol. Evol. 2: 66-85.
Ohad, N., Yadegari, R., Margossian, L., Hannon, M., Michaeli, D., Harada, J.J., Goldberg, R.B., and Fischer, R.L. 1999. Mutations in FIE, a WD polycomb group gene, allow endosperm development without fertilization. Plant Cell 11: 407-416.
Paterson, A., Lin, Y., Li, Z., Schertz, K., Doebley, J., Pinson, S., Liu, S., Stansel, J., and Irvine, J. 1995. Convergent domestication of cereal crops by independent mutations at corresponding genetic loci. Science 269: 1714-1718.
Paterson, A., Lan, T., Reischmann, K., Chang, C., Lin, Y., Liu, S., Burow, M., Kowalski, S., Katsar, C., DelMonte, T., et al. 1996. Toward a unified genetic map of higher plants, transcending the monocot-dicot divergence. Nat. Genet. 14: 380-382.
Pereira, M.G., Lee, M., Bramel-Cox, P., Woodman, W., Doebley, J., and Whitkus, R. 1994. Construction of an RFLP map in sorghum and comparative mapping in maize. Genome 37: 236-243.
Peschke, V.M., Phillips, R.L., and Gengenbach, B.G. 1987. Discovery of a transposable element activity among progeny of tissue culture-derived maize plants. Science 238: 804-807.
Posada, D. and Crandall, K.A. 1998. MODELTEST: Testing the model of DNA substitution. Bioinformatics 14: 817-818.
Purugganan, M.D. and Wessler, S.R. 1994. Molecular evolution of the plant R regulatory gene family. Genetics 138: 849-854.
Ramakrishna, W., Dubcovsky, J., Park, Y.J., Busso, C., Emberton, J., SanMiguel, P., and Bennetzen, J.L. 2002. Different types and rates of genome evolution detected by comparative sequence analysis of orthologous segments from four cereal genomes. Genetics 162: 1389-1400.
Rhoades, M.M. 1951. Duplicated genes in maize. Am. Nat. 85: 105-110.
Rice Chromosome 10 Sequencing Consortium. 2003. In-depth view of structure, activity, and evolution of rice chromosome 10. Science 300: 1566-1569.
SanMiguel, P., Tikhonov, A., Jin, Y.K., Motchoulskaia, N., Zakharov, D., Melake-Berhan, A., Springer, P.S., Edwards, K.J., Lee, M., and Avramova, Z. 1996. Nested retrotransposons in the intergenic regions of the maize genome. Science 274: 765-768.
Simillion, C., Vandepoele, K., Van Montagu, M.C., Zabeau, M., and Van De Peer, Y. 2002. The hidden duplication past of Arabidopsis thaliana. Proc. Natl. Acad. Sci. 99: 13627-13632.
Song, R. and Messing, J. 2003. Gene expression of a gene family in maize based on non-collinear haplotypes. Proc. Natl. Acad. Sci. 100: 9055-9060.
Song, R., Llaca, V., Linton, E., and Messing, J. 2001. Sequence, regulation, and evolution of the maize 22-kD zein gene family. Genome Res. 11: 1817-1825.
Song, R., Llaca, V., and Messing, J. 2002. Mosaic organization of orthologous sequences in grass genomes. Genome Res. 12: 1549-1555.
Swofford, D.L. 1998. PAUP*: Phylogenetic analysis using parsimony (*and other methods), version 4. Sinauer Associates, Sunderland, MA.
Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F., and Higgins, D.G. 1997. The ClustalX windows interface: Flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25: 4876-4882.
Vandepoele, K., Simillion, C., and Van de Peer, Y. 2003. Evidence that rice and other cereals are ancient aneuploids. Plant Cell 15: 2192-2202.
Vision, T.J., Brown, D.G., and Tanksley, S.D. 2000. The origins of genomic duplications in Arabidopsis. Science 290: 2114-2117.
Wendel, J.F., Goodman, M.M., and Stuber, C.W. 1985. Mapping data for 34 isozyme loci currently being studied. Maize Genet. Coop. News Lett. 59: 90.
Wendel, J.F., Stuber, C.W., Edwards, M.D., and Goodman, M.M. 1986. Duplicated chromosomal segments in Zea mays L.: Further evidence from Hexokinase isozymes. Theor. Appl. Genet. 72: 178-185.
Wolfe, K.H., Gouy, M., Yang, Y.W., Sharp, P.M., and Li, W.-H. 1989. Date of the monocot-dicot divergence estimated from chloroplast DNA sequence data. Proc. Natl. Acad. Sci. 86: 6201-6205.
Yang, Z. 1997. PAML: A program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13: 555-556.
Yim, Y.S., Davis, G.L., Duru, N.A., Musket, T.A., Linton, E.W., Messing, J., McMullen, M.D., Soderlund, C.A., Polacco, M.L., Gardiner, J.M., et al. 2002. Characterization of three maize bacterial artificial chromosome libraries toward anchoring of the physical map to the genetic map using high-density bacterial artificial chromosome filter hybridization. Plant Physiol. 130: 1686-1696.
http://genes.mit.edu/GENSCAN.html; gene prediction.
www.agron.missouri.edu; maize map.
www.softberry.com; gene prediction.
http://www.hyphy.org/; software tool.
Received January 4, 2004; accepted in revised form April 6, 2004.