Published online before print
February 14, 2006, 10.1101/gr.4602906 Genome Res. 16:441-450, 2006 ©2006 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/06 $5.00
Resources
A global assembly of cotton ESTs
1 Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, Iowa 50011, USA
2 Arizona Genomics Computational Laboratory, BIO5 Institute, University of Arizona, Tucson, Arizona 85721, USA
3 Arizona Genomics Institute, Department of Plant Sciences, University of Arizona, Tucson, Arizona 85721, USA
4 CSIRO Plant Industry, Canberra City ACT 2601, Australia
5 Department of Plant Sciences, University of California–Davis, Davis, California 95616, USA
6 Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Shanghai, 200032, China
7 United States Department of Agriculture–Agricultural Research Service, Stoneville, Mississippi 38776, USA
8 United States Department of Agriculture–Agricultural Research Service, Lubbock, Texas 79415, USA
9 Department of Biology, Texas Tech University, Lubbock, Texas 79409, USA
10 Department of Crop Science and Department of Botany, North Carolina State University, Raleigh, North Carolina 27695, USA
11 Bioinformatics Core Facility, Michigan State University, East Lansing, Michigan 48824, USA
12 Institute of Genetics and Developmental Biology, Beijing, 100101, China
13 Plant Genome Mapping Laboratory, University of Georgia, Athens, Georgia 30602, USA
14 Oklahoma Agricultural Experiment Station, Oklahoma State University, Stillwater, Oklahoma 74078, USA
Approximately 185,000 Gossypium EST sequences comprising >94,800,000 nucleotides were amassed from 30 cDNA libraries constructed from a variety of tissues and organs under a range of conditions, including drought stress and pathogen challenges. These libraries were derived from allopolyploid cotton (Gossypium hirsutum; AT and DT genomes) as well as its two diploid progenitors, Gossypium arboreum (A genome) and Gossypium raimondii (D genome). ESTs were assembled using the Program for Assembling and Viewing ESTs (PAVE), resulting in 22,030 contigs and 29,077 singletons (51,107 unigenes). Further comparisons among the singletons and contigs led to recognition of 33,665 exemplar sequences that represent a nonredundant set of putative Gossypium genes containing partial or full-length coding regions and usually one or two UTRs. The assembly, along with its UniProt BLASTX hits, GO annotation, and Pfam analysis results, is freely accessible as a public resource for cotton genomics. Because ESTs from diploid and allotetraploid Gossypium were combined in a single assembly, we were in many cases able to bioinformatically distinguish duplicated genes in allotetraploid cotton and assign them to either the A or D genome. The assembly and associated information provide a framework for future investigation of cotton functional and evolutionary genomics.
Cotton is the world's most important fiber plant, being grown in more than 80 countries with a record forecast of 119.8 million 480-pound bales in world production during the 2004–2005 growing season (United States Department of Agriculture–Foreign Agricultural Service [USDA–FAS] 2005).
Most modern cotton varieties are forms of G. hirsutum, or Upland cotton, although three other species are also utilized to a lesser extent: Gossypium barbadense, Gossypium arboreum, and Gossypium herbaceum. G. barbadense and G. hirsutum are allotetraploids, each containing both an AT and a DT genome (Skovsted 1934).
EST sequencing projects have been completed or are under way for many plant species. These projects have provided useful tools for intragenomic comparisons (Schlueter et al. 2004).
Here we report the sequencing, clustering, and analysis of 30 EST libraries generated by an international consortium of research groups. While many of these libraries are relatively small and from specialized tissues or growth conditions, we included two larger cDNA libraries (floral and seedling) from G. raimondii (D genome) and the previously mentioned A-genome cDNA fiber library. Our strategy was to simultaneously include EST sequences from allopolyploid (AD genome) cotton and species representing its two progenitor genomes (A, D genomes), thereby facilitating the identification of duplicated AT and DT (i.e., homoeologous) transcripts for numerous genes. The resulting assembly enables an examination of sequence divergence within a well-defined system of diploid and polyploid plant species on an unprecedented scale, provides insight into gene expression in numerous different tissues and environmental conditions, and sets the stage for the development of a cotton oligonucleotide microarray with deep genomic coverage.
EST assembly
A total of 185,198 EST sequences from 30 cDNA libraries were collected from 14 different research groups across the globe (Table 1). These libraries were constructed from a variety of tissues and organs under a range of conditions, including drought stress and pathogen challenges, and include representation of allopolyploid cotton as well as its two diploid progenitors. Most cDNA libraries were derived from G. hirsutum and were relatively small (from 576 to 8643 ESTs). Collectively, these G. hirsutum EST collections comprised 38% of the total used in the assembly. The remaining ESTs were derived from three, more deeply sampled cDNA libraries generated from the two diploids (one library from 7–10 dpa fiber of G. arboreum and two libraries of G. raimondii), comprising 24% and 38% of the total number of ESTs, respectively.
Of that initial set of ESTs, 153,969 were selected as input for the global EST assembly based on length, complexity, and sequence quality (see Methods). Nearly all of the cDNA clones of diploid libraries were sequenced from both the 5'-end and 3'-end of the transcript as were portions of other G. hirsutum libraries. After the EST selection process, a total of 87,697 clones were included as input into the assembly pipeline, where 41% of the 153,959 selected ESTs had a mate-pair (a cDNA clone was sequenced in both directions). Individual ESTs were assembled using the Program for Assembling and Viewing ESTs (PAVE). A conservative philosophy was used to align the ESTs and form a consensus sequence, that is, aligned portions of ESTs must share 95% sequence identity with <20% of overhanging sequence. Hence, alleles, homoeologs, orthologs, and paralogs were only combined into the same contig if they have a low level of divergence. Most alleles and homoeologs generally were expected to coalesce into the same contig, except the relatively rare cases of alternatively spliced transcripts. When the assembly was based on less stringent sequence similarity, it resulted in massive contigs that were joined because of similar domains (data not shown).
The PAVE assembly process yielded 22,030 contigs and 29,077 singletons (51,107 unigenes) in 40.4 Mb of transcribed sequence with an average length of 791 bp (SD = 374). The number of ESTs in a contig ranged from two to 714, with a median of three sequences per contig (Fig. 1); 10,624 contigs contained forward and reverse sequence pairs from at least one cDNA clone. As expected, contigs with four or more EST members exhibited a higher percentage of mate-pairs (51%) than contigs with two (37%) or three (37%) EST members. The assembly of the ESTs into contigs used multiple libraries from three different Gossypium species (Fig. 2), of which 60% of the contigs (13,268) had EST members from more than one library and 40% of the contigs had EST members from more than one species. The values of these two numbers suggested that interspecific nucleotide variation did not have much of an effect on the global assembly process. However, other factors, such as RNA quality, library construction, indels, paralogy, differential gene expression, and systematic sequencing errors may have played a role in the EST assembly, resulting in library biases among the EST members of a contig (Supplemental Table 1). The extent that library bias reflected technical issues and not differential gene expression was unknown.
Several aspects of the assembly were evaluated to assess its quality: (1) the frequency of chimeric contigs; (2) the frequency of mate-pairs in the same contig; (3) phylogenetic analysis using known genes and their relationships; and (4) the amount of redundancy among the assembly's contigs and singletons.
In an ideal assembly, only ESTs transcribed from a single gene are conjoined into a single contig. However, spurious EST-contig associations can be generated through the complexity of multigene families (along with the attendant issue of paralogy) and technical errors such as EST misnaming, resulting in chimeric contigs. A straightforward means of visualizing spurious associations is to inspect contigs containing the largest number of EST sequences (Supplemental Table 2). On average, these 20 well-sampled contigs (from 136 to 714 members) contained forward and reverse sequence pairs spanning 91% of the respective contig length, suggesting that nearly the entire length of these contigs could be attributed to a single cDNA clone (i.e., single gene). Perhaps a better indication of spurious associations could be found within contigs having a poorly sampled interior region. In well-sampled contigs, most of the consensus sequence was represented by four or more individual ESTs. A possible spurious association may be where three or fewer ESTs tie together two flanking sequence segments containing four or more member ESTs. Such occurrences resulted in a "barbell" shape of the contig's EST alignment. The present assembly contained 1397 such contigs (6% of contigs), and a few of these may represent large genes that simply had poor sampling of the internal sequences. However, a subset of these contigs (n = 100) had a pair of ESTs belonging to a single cDNA clone (sequenced in both directions probably representing the 5' and 3' boundaries of a gene) and also had at least one EST member whose 5'-end did not overlap this clonal pair; rather, it (and usually other sequences) was erroneously tied to the contig by a few other ESTs bridging the two regions. Both types of these contigs were flagged as "suspicious" contigs in PAVE, and based on this annotation it is possible for researchers to exclude these sequences while using PAVE.
The overall distribution of the forward-reverse EST pairs also provided general insight regarding the assembly quality. From the clones sequenced in both directions, 65% of the sequence pairs had both directional reads in the same contig, 11% of the sequence pairs had both reads in different contigs, 17% of sequence pairs had one in a contig and another as a singleton, and 7% of sequence pairs had both reads as singletons, although this final percentage may partially reflect insert size and transcript frequency rather than the assembly process. The fact that only two-thirds of the forward-reverse EST pairs had both directional reads in the same contig may be explained by a combination of a conservative percent-identity parameter during the assembly process, short EST reads (or a long gene), low frequency of rare transcripts, and misnaming.
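The mate-pair breakdown described above (same contig, different contigs, contig plus singleton, both singletons) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in PAVE: the function names, the pair list, and the EST-to-contig mapping are hypothetical, and an EST absent from the mapping is assumed to be a singleton.

```python
from collections import Counter


def classify_mate_pairs(pairs, est_to_contig):
    """Tally forward/reverse read pairs by where each read ended up.

    pairs: iterable of (forward_est_id, reverse_est_id) tuples (hypothetical IDs).
    est_to_contig: dict mapping EST id -> contig id; an EST that is missing
    from the dict (or mapped to None) is treated as a singleton (assumption).
    """
    counts = Counter()
    for fwd, rev in pairs:
        c_fwd = est_to_contig.get(fwd)
        c_rev = est_to_contig.get(rev)
        if c_fwd is None and c_rev is None:
            counts["both_singletons"] += 1
        elif c_fwd is None or c_rev is None:
            counts["contig_and_singleton"] += 1
        elif c_fwd == c_rev:
            counts["same_contig"] += 1
        else:
            counts["different_contigs"] += 1
    return counts


def as_percentages(counts):
    """Convert category counts to percentages of all pairs."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: 100.0 * n / total for category, n in counts.items()}


if __name__ == "__main__":
    toy_pairs = [("a.f", "a.r"), ("b.f", "b.r"), ("c.f", "c.r"), ("d.f", "d.r")]
    toy_map = {"a.f": "C1", "a.r": "C1", "b.f": "C1", "b.r": "C2", "c.f": "C3"}
    print(as_percentages(classify_mate_pairs(toy_pairs, toy_map)))
```

A low share of pairs in the "same_contig" category does not by itself indicate misassembly; as noted above, short reads, long genes, and rare transcripts can all separate mates.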
A phylogenetic approach was also used to assess EST assembly quality (Close et al. 2004).
While there appeared to be a low percentage of chimeric contigs in the assembly, it was more difficult to assess whether the number of contigs and singletons could be accurately reduced by further refinement. Ideally, each unigene in the assembly will correspond to a single version of an expressed gene; however, some level of gene redundancy often remains in EST assemblies when conservative parameters are used (Whitfield et al. 2002).
These 33,665 exemplar sequences represent a nonredundant set of putative Gossypium genes containing partial or full-length coding regions and, usually, one or two identifiable UTRs. The coding and UTR regions were identified using ESTScan (Iseli et al. 1999).
Gene annotation and Pfam
The exemplar sequences were also analyzed for their protein domains to assess assignment to characterized protein families; 1815 protein domains meeting a Pfam cutoff threshold of <1e-10 were identified in 6797 (20%) exemplar sequences (Fig. 3). Here, the Pfam cutoff threshold of 1e-10 was used because the conserved, characterized Pfam domains are "average domains" from many divergent species (Bateman et al. 2004).
Identification of putative homoeologs
The number of contigs that had one, two, three, or all four of the relevant sequence types is shown in Table 2. For 309 contigs, A, D, AT, and DT sequences were each identified. For 1870 contigs, ESTs from either one or both genomes of allopolyploid cotton were not identified. For the remaining 1966 ortholog-containing contigs, ESTs were found from only one of the two diploid species and its orthologous counterpart in G. hirsutum (i.e., A and AT, or D and DT). Because of the deep sampling of cDNA libraries from G. arboreum and G. raimondii, gene discovery was particularly rich in these species, leading to the detection of 3928 and 8626 contigs, respectively, for which only sequences from that species were recovered.
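The contig categories tallied in this paragraph can be illustrated with a short sketch. It assumes that every EST in a contig already carries one of four genome labels ("A", "D", "AT", "DT"); how G. hirsutum ESTs receive their AT or DT label is not modeled here, and the category names and function names are hypothetical rather than taken from PAVE.

```python
from collections import Counter


def classify_contig(labels):
    """Assign a contig to a category based on the genome labels of its ESTs.

    labels: iterable of strings drawn from "A", "D", "AT", "DT" (assumed to be
    assigned upstream; this sketch does not infer them).
    """
    present = set(labels)
    if {"A", "D"} <= present:
        # Both diploid progenitors are represented.
        if {"AT", "DT"} <= present:
            return "all_four"
        return "A_and_D_incomplete_tetraploid"
    if {"A", "AT"} <= present or {"D", "DT"} <= present:
        # One diploid plus its orthologous counterpart in G. hirsutum.
        return "one_diploid_with_homoeolog"
    if present == {"A"}:
        return "A_only"
    if present == {"D"}:
        return "D_only"
    return "other"


def tabulate(contigs):
    """Count contigs per category; contigs maps contig id -> list of labels."""
    return Counter(classify_contig(labels) for labels in contigs.values())


if __name__ == "__main__":
    toy = {
        "C1": ["A", "D", "AT", "DT"],
        "C2": ["A", "D", "AT"],
        "C3": ["D", "DT", "DT"],
        "C4": ["A", "A"],
        "C5": ["AT"],
    }
    print(tabulate(toy))
```

In this scheme the first two categories together correspond to contigs holding both an A and a D sequence, which is the set the text counts as 309 + 1870 contigs.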
Orthologous and homoeologous sequences were occasionally split among two or more contigs or singletons because of imperfect contig assembly. To identify these cases, the pools of unigenes (above) were further examined to identify more cases of either orthology or homoeology. Within each pool of unigenes, all possible pairwise alignments were made for all contigs (or singletons) containing A and D ESTs. Alignments having 95% or greater sequence similarity (total or coding) and fewer than five gaps were designated as putative ortholog pairs. Using these criteria, 1464 additional pairs of putatively orthologous sequences were identified in the assembly, along with the position and composition of genome-diagnostic single nucleotide polymorphisms (SNPs) and small insertions or deletions (indels) that distinguish the A- and D-genome ESTs (Table 2). These ESTs were joined into a single contig, along with their counterparts from G. hirsutum, increasing the number of putatively orthologous pairs from 2179 to 3643. Within this orthologous set, polymorphisms between the A (and AT where possible) and D (and DT where possible) sequences were recorded, resulting in 2342 orthologous gene pairs distinguished by 10,000 SNPs and indels. The numerical difference between the total number of orthologous loci and those with distinguishing polymorphisms was mostly due to cases in which the A and/or D EST transcripts had little to no overlap within the contig, or lack of polymorphism in the region of overlap.
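The pairwise criterion described above (at least 95% similarity and fewer than five gaps) can be sketched as follows. The sketch assumes the two sequences have already been aligned by some external tool and are supplied as equal-length strings with "-" marking gaps; it interprets "gaps" as runs of consecutive gap columns, which is an assumption, and it computes identity over ungapped columns only. Function names are hypothetical.

```python
def alignment_stats(aln_a, aln_d):
    """Return (percent identity, number of gap runs, list of mismatches).

    aln_a, aln_d: pre-aligned A-genome and D-genome sequences of equal
    length, with '-' as the gap character (assumed input format).
    Mismatches are reported as (column index, A base, D base) and stand in
    for candidate genome-diagnostic SNPs.
    """
    if len(aln_a) != len(aln_d):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    gap_runs = 0
    mismatches = []
    in_gap = False
    for pos, (a, d) in enumerate(zip(aln_a.upper(), aln_d.upper())):
        if a == "-" or d == "-":
            if not in_gap:
                gap_runs += 1  # count each run of gap columns once
            in_gap = True
            continue
        in_gap = False
        compared += 1
        if a == d:
            matches += 1
        else:
            mismatches.append((pos, a, d))
    identity = 100.0 * matches / compared if compared else 0.0
    return identity, gap_runs, mismatches


def is_putative_ortholog_pair(aln_a, aln_d, min_identity=95.0, max_gaps=5):
    """Apply the thresholds from the text: >= 95% identity, fewer than 5 gaps."""
    identity, gap_runs, _ = alignment_stats(aln_a, aln_d)
    return identity >= min_identity and gap_runs < max_gaps


if __name__ == "__main__":
    a = "ACGT" * 10
    d = list(a)
    d[3] = "A"            # one substitution
    d[10] = d[11] = "-"   # one two-column deletion
    d = "".join(d)
    print(alignment_stats(a, d), is_putative_ortholog_pair(a, d))
```

Whether a mismatch is truly genome-diagnostic would still need to be checked against additional ESTs from each genome, since a single read can carry a sequencing error.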
A global collection of cotton EST sequences and unigene collection
EST assemblies have previously been published for cotton (G. hirsutum and G. arboreum), but these have either been limited to one library of cotton fiber (Arpat et al. 2004
The assembly process resulted in a collection of 51,107 unigenes, which were further reduced by sequence similarity using BLASTN to 33,665 Gossypium exemplar sequences (http://agcol.arizona.edu/pave/cotton/). This set of exemplar sequences represents a nonredundant collection of cotton genes, and the total number of genes was close to the number of expected genes in diploid Gossypium. Wortman et al. (2003
Because multiple libraries were used in the assembly, the EST collection provides a starting point for comparisons of expression differences between specific tissue treatments, environmental conditions, stress challenges, or plant organs (Supplemental Table 1). Statistical methods have been developed to correlate transcript frequency among libraries with differential gene expression (Claverie 1999).
Cotton ESTs as a foundation for expression profiling
Because ESTs from diploid and allotetraploid Gossypium were combined in a single assembly and because the genomic origin of the diploid ESTs is known, we were often able to bioinformatically determine the genomic origin of ESTs from allotetraploid cotton. Intra- and intercontig polymorphisms were identified between and within putative genes, resulting in 3644 orthologous loci, whereas only 2052 loci had ESTs represented from one or both of the homoeologs. This expanded set of orthologous genes may provide novel resources for quantifying homoeologous transcript levels in allotetraploid cotton. Using single-strand conformational polymorphisms (SSCPs) to separate similarly sized sequences containing SNPs, Adams et al. (2003
The Gossypium EST assembly presented here provides an unprecedented look at the cotton transcriptome and contributes tools for cotton genetics and genomics efforts. The unigene set (contigs and singletons) has been reduced to a set of
Plant material, RNA extraction, and cDNA libraries
Various methods were used to create cDNA libraries and perform subtractive hybridization to produce the ESTs reported in this study. Details for 28 of these libraries, including cloning vectors, sequencing methods, and library normalization (if applicable) and GenBank accession numbers have either been published (Wu et al. 2002
EST sequencing and processing
EST assembly and accessibility
The orientation of unigenes was determined by maximizing a weighted score derived from ESTScan (Lottaz et al. 2003).
Putative Gene Ontology and Pfam domain analysis
Identification of G. hirsutum homoeologs
We thank J. Mesterhazy and S. Aluru for use of the Mountain cluster at ISU for Pfam analysis. We also gratefully acknowledge the support of the National Science Foundation Plant Genome Program.
[Supplemental material is available online at www.genome.org. The ESTs from GR_Ea and GR_Eb were deposited in GenBank under accession nos. CO069431–CO100583 and CO100584–CO132899.] Article published online ahead of print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.4602906.
15 Corresponding author.
Adams, K.L., Cronn, R., Percifield, R., and Wendel, J.F. 2003. Genes duplicated by polyploidy show unequal contributions to the transcriptome and organ-specific reciprocal silencing. Proc. Natl. Acad. Sci. 100: 4649–4654.
Adams, K.L., Percifield, R., and Wendel, J.F. 2004. Organ-specific silencing of duplicated genes in a newly synthesized cotton allotetraploid. Genetics 168: 2217–2226.
Alba, R., Fei, Z., Payton, P., Liu, Y., Moore, S.L., Debbie, P., Cohn, J., D'Ascenzo, M., Gordon, J.S., Rose, J.K.C., et al. 2004. ESTs, cDNA microarrays, and gene expression profiling: Tools for dissecting plant physiology and development. Plant J. 39: 697–714.
Apweiler, R., Bairoch, A., Wu, C.H., Barker, W.C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez, R., Magrane, M., et al. 2004. UniProt: The universal protein knowledgebase. Nucleic Acids Res. 32: D115–D119.
The Arabidopsis Genome Initiative. 2000. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796–815.
Arpat, A., Waugh, M., Sullivan, J.P., Gonzales, M., Frisch, D., Main, D., Wood, T., Leslie, A., Wing, R., and Wilkins, T. 2004. Functional genomics of cell elongation in developing cotton fibers. Plant Mol. Biol. 54: 911–929.
Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., et al. 2000. Gene Ontology: Tool for the unification of biology. Nat. Genet. 25: 25–29.
Bateman, A., Coin, L., Durbin, R., Finn, R.D., Hollich, V., Griffiths-Jones, S., Khanna, A., Marshall, M., Moxon, S., Sonnhammer, E.L.L., et al. 2004. The Pfam protein families database. Nucleic Acids Res. 32: D138–D141.
Camon, E., Magrane, M., Barrell, D., Lee, V., Dimmer, E., Maslen, J., Binns, D., Harte, N., Lopez, R., and Apweiler, R. 2004. The Gene Ontology Annotation (GOA) database: Sharing knowledge in Uniprot with Gene Ontology. Nucleic Acids Res. 32: D262–D266.
Cedroni, M.L., Cronn, R.D., Adams, K.L., Wilkins, T.A., and Wendel, J.F. 2003. Evolution and expression of MYB genes in diploid and polyploid cotton. Plant Mol. Biol. 51: 313–325.
Chou, H.H. and Holmes, M.H. 2001. DNA sequence quality trimming and vector removal. Bioinformatics 17: 1093–1104.
Chou, H.-H., Hsia, A.-P., Mooney, D.L., and Schnable, P.S. 2004. Picky: Oligo microarray design for large genomes. Bioinformatics 20: 2893–2902.
Claverie, J.-M. 1999. Computational methods for the identification of differential and coordinated gene expression. Hum. Mol. Genet. 8: 1821–1832.
Close, T.J., Wanamaker, S.I., Caldo, R.A., Turner, S.M., Ashlock, D.A., Dickerson, J.A., Wing, R.A., Muehlbauer, G.J., Kleinhofs, A., and Wise, R.P. 2004. A new resource for cereal genomics: 22K barley GeneChip comes of age. Plant Physiol. 134: 960–968.
Cronn, R.C., Small, R.L., Haselkorn, T., and Wendel, J.F. 2002. Rapid diversification of the cotton genus (Gossypium: Malvaceae) revealed by analysis of sixteen nuclear and chloroplast genes. Am. J. Bot. 89: 707–725.
Dowd, C., Wilson, I.W., and McFadden, H. 2004. Gene expression profile changes in cotton root and hypocotyl tissues in response to infection with Fusarium oxysporum f. sp. vasinfectum. Mol. Plant Microbe Inter. 17: 654–667.
Edgar, R.C. 2004. MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32: 1792–1797.
Endrizzi, J.E., Turcotte, E.L., and Kohel, R.J. 1985. Genetics, cytology, and evolution of Gossypium. Adv. Genet. 23: 271–375.
Ewing, B., Hillier, L., Wendl, M.C., and Green, P. 1998. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8: 175–185.
Ewing, R.M., Kahla, A.B., Poirot, O., Lopez, F., Audic, S., and Claverie, J.-M. 1999. Large-scale statistical analyses of rice ESTs reveal correlated patterns of gene expression. Genome Res. 9: 950–959.
Felsenstein, J. 2004. PHYLIP (Phylogeny Inference Package) version 3.6. Department of Genome Sciences, University of Washington, Seattle, http://evolution.genetics.washington.edu/phylip.html.
Fulton, T.M., Van der Hoeven, R., Eannetta, N.T., and Tanksley, S.D. 2002. Identification, analysis, and utilization of conserved ortholog set markers for comparative genomics in higher plants. Plant Cell 14: 1457–1467.
Greller, L.D. and Tobin, F.L. 1999. Detecting selective expression of genes and proteins. Genome Res. 9: 282–296.
Haigler, C.H., Zhang, D., and Wilkerson, C.G. 2005. Biotechnological improvement of cotton fibre maturity. Physiol. Plant. 124: 285–294.
Huang, X. and Madan, A. 1999. CAP3: A DNA sequence assembly program. Genome Res. 9: 868–877.
Hughes, A. and Friedman, R. 2004. Expression patterns of duplicate genes in the developing root in Arabidopsis thaliana. J. Mol. Evol. 60: 247–256.
International Rice Genome Sequencing Project. 2005. The map-based sequence of the rice genome. Nature 436: 793–800.
Iseli, C., Jongeneel, C.V., and Bucher, P. 1999. ESTScan: A program for detecting, evaluating, and reconstructing potential coding regions in EST sequence. Proc. Int. Conf. Intell. Syst. Mol. Biol. 138: 48.
Ji, S.-J., Lu, Y.-C., Feng, J.-X., Wei, G., Li, J., Shi, Y.-H., Fu, Q., Liu, D., Luo, J.-C., and Zhu, Y.-X. 2003. Isolation and analyses of genes preferentially expressed during early cotton fiber development by subtractive PCR and cDNA array. Nucleic Acids Res. 31: 2534–2543.
Jiang, C., Wright, R.J., El-Zik, K.M., and Paterson, A.H. 1998. Polyploid formation created unique avenues for response to selection in Gossypium (cotton). Proc. Natl. Acad. Sci. 95: 4419–4424.
Kalyanaraman, A., Aluru, S., Kothari, S., and Brendel, V. 2003. Efficient clustering of large EST data sets on parallel computers. Nucleic Acids Res. 31: 2963–2974.
Kawasaki, S., Borchert, C., Deyholos, M., Wang, H., Brazille, S., Kawai, K., Galbraith, D., and Bohnert, H.J. 2001. Gene expression profiles during the initial phase of salt stress in rice. Plant Cell 13: 889–906.
Kim, H.J., William, M.Y., and Triplett, B.A. 2002. A novel expression assay system for fiber-specific promoters in developing cotton fibers. Plant Mol. Biol. Rep. 20: 7–18.
Lacape, J.-M., Nguyen, T.-B., Courtois, B., Belot, J.-L., Giband, M., Gourlot, J.-P., Gawryziak, G., Roques, S., and Hau, B. 2005. QTL analysis of cotton fiber quality using multiple Gossypium hirsutum x Gossypium barbadense backcross generations. Crop Sci. 45: 123–140.
Lazo, G.R., Chao, S., Hummel, D.D., Edwards, H., Crossman, C.C., Lui, N., Matthews, D.E., Carollo, V.L., Hane, D.L., You, F.M., et al. 2004. Development of an expressed sequence tag (EST) resource for wheat (Triticum aestivum L.): EST generation, unigene analysis, probe selection and bioinformatics for a 16,000-locus bin-delineated map. Genetics 168: 585–593.
Li, X.-B., Cai, L., Cheng, N.-H., and Liu, J.-W. 2002. Molecular characterization of the cotton GhTUB1 gene that is preferentially expressed in fiber. Plant Physiol. 130: 666–674.
Lottaz, C., Iseli, C., Jongeneel, C.V., and Bucher, P. 2003. Modeling sequencing errors by combining Hidden Markov models. Bioinformatics 19: ii103–ii112.
Meyers, B.C., Galbraith, D.W., Nelson, T., and Agrawal, V. 2004. Methods for transcript profiling in plants. Be fruitful and replicate. Plant Physiol. 135: 637–652.
Michalek, W., Weschke, W., Pleissner, K.-P., and Graner, A. 2002. EST analysis in barley defines a unigene set comprising 4,000 genes. Theor. Appl. Genet. 104: 97–103.
Mochida, K., Yamazaki, Y., and Ogihara, Y. 2003. Discrimination of homoeologous gene expression in hexaploid wheat by SNP analysis of contigs grouped from a large number of expressed sequence tags. Mol. Gen. Genomics 270: 371–377.
Orford, S.J. and Timmis, J.N. 1998. Specific expression of an expansin gene during elongation of cotton fibers. Biochim. Biophys. Acta 1398: 342–346.
Orford, S.J., Carney, T.J., Olenicky, N.S., and Timmis, J.N. 1999. Characterization of a cotton gene expressed late in fibre cell elongation. Theor. Appl. Genet. 98: 757–764.
Pavy, N., Laroche, J., Bousquet, J., and Mackay, J. 2005. Large-scale statistical analysis of secondary xylem ESTs in pine. Plant Mol. Biol. 57: 203–224.
Rabinowicz, P.D., Citek, R., Budiman, M.A., Numberg, A., Bedell, J.A., Lakey, N., O'Shaughnessy, A.L., Nacimiento, L.U., McCombie, W.R., and Martienssen, R.A. 2005. Differential methylation of genes and repeats in land plants. Genome Res. 15: 1431–1440.
Rong, J., Abbey, C., Bowers, J.E., Brubaker, C.L., Chang, C., Chee, P.W., Delmonte, T.A., Ding, X., Garza, J.J., Marler, B.S., et al. 2004. A 3347-locus genetic recombination map of sequence-tagged sites reveals features of genome organization, transmission and evolution of cotton (Gossypium). Genetics 166: 389–417.
Ronning, C.M., Stegalkina, S.S., Ascenzi, R.A., Bougri, O., Hart, A.L., Utterbach, T.R., Vanaken, S.E., Riedmuller, S.B., White, J.A., Cho, J., et al. 2003. Comparative analyses of potato expressed sequence tag libraries. Plant Physiol. 131: 419–429.
Schlueter, J.A., Dixon, P., Granger, C., Grant, D., Clark, L., Doyle, J., and Shoemaker, R. 2004. Mining EST databases to resolve evolutionary events in major crop species. Genome 47: 868–876.
Senchina, D.S., Alvarez, I., Cronn, R.C., Liu, B., Rong, J., Noyes, R.D., Paterson, A.H., Wing, R.A., Wilkins, T.A., and Wendel, J.F. 2003. Rate variation among nuclear genes and the age of polyploidy in Gossypium. Mol. Biol. Evol. 20: 633–643.
Skovsted, A. 1934. Cytological studies in cotton. II. Two interspecific hybrids between Asiatic and New World cottons. J. Genet. 28: 407–424.
Small, R.L. and Wendel, J.F. 2000a. Phylogeny, duplication, and intraspecific variation of Adh sequences in new world diploid cottons (Gossypium L., Malvaceae). Mol. Phylo. Evol. 16: 73–84.
Small, R.L. and Wendel, J.F. 2000b. Copy number lability and evolutionary dynamics of the Adh gene family in diploid and tetraploid cotton (Gossypium). Genetics 155: 1913–1926.
Small, R.L. and Wendel, J.F. 2002. Differential evolutionary dynamics of duplicated paralogous Adh loci in allotetraploid cotton (Gossypium). Mol. Biol. Evol. 19: 597–607.
Stajich, J.E., Block, D., Boulez, K., Brenner, S.E., Chervitz, S.A., Dagdigian, C., Fuellen, G., Gilbert, J.G.R., Korf, I., Lapp, H., et al. 2002. The Bioperl toolkit: Perl modules for the life sciences. Genome Res. 12: 1611–1618.
Stekel, D.J., Git, Y., and Falciani, F. 2000. The comparison of gene expression from multiple cDNA libraries. Genome Res. 10: 2055–2061.
Suo, J., Liang, X., Pu, L., Zhang, Y., and Xue, Y. 2003. Identification of GhMYB109 encoding a R2R3 MYB transcription factor that expressed specifically in fiber initials and elongating fibers of cotton (Gossypium hirsutum L.). Biochim. Biophys. Acta 1630: 25–34.
USDA–FAS. 2005. Cotton: World markets and trade. United States Department of Agriculture–Foreign Agricultural Service. FC-07-05, http://www.fas.usda.gov/cotton/circular/2005/07/CottonWMT.pdf.
Vettore, A.L., da Silva, F.R., Kemper, E.L., Souza, G.M., da Silva, A.M., Ferro, M.I.T., Henrique-Silva, F., Giglioti, E.A., Lemos, M.V.F., Coutinho, L.L., et al. 2003. Analysis and functional annotation of an expressed sequence tag collection for tropical crop sugarcane. Genome Res. 13: 2725–2735.
Wendel, J.F. 1995. Cotton. In Evolution of crop plants (eds. N. Simmonds and J. Smartt), pp. 358–366. Longman, London.
Wendel, J.F. and Cronn, R.C. 2003. Polyploidy and the evolutionary history of cotton. Adv. Agronomy 78: 139–186.
Whitfield, C.W., Band, M.R., Bonaldo, M.F., Kumar, C.G., Liu, L., Pardinas, J.R., Robertson, H.M., Soares, M.B., and Robinson, G.E. 2002. Annotated expressed sequence tags and cDNA microarrays for studies of brain and behavior in the honey bee. Genome Res. 12: 555–566.
Wilkins, T.A. and Smart, L.B. 1996. Isolation of RNA from plant tissue. In A laboratory guide to RNA: Isolation, analysis, and synthesis (ed. P.A. Krieg), pp. 21–41. Wiley-Liss, New York.
Wisman, E. and Ohlrogge, J. 2000. Arabidopsis microarray service facilities. Plant Physiol. 124: 1468–1471.
Wortman, J.R., Haas, B.J., Hannick, L.I., Smith Jr., R.K., Maiti, R., Ronning, C.M., Chan, A.P., Yu, C., Ayele, M., Whitelaw, C.A., et al. 2003. Annotation of the Arabidopsis genome. Plant Physiol. 132: 461–468.
Wright, R.J., Thaxton, P.M., El-Zik, K.M., and Paterson, A.H. 1998. D-subgenome bias of Xcm resistance genes in tetraploid Gossypium (cotton) suggests that polyploid formation has created novel avenues for evolution. Genetics 149: 1987–1996.
Wu, Y., Llewellyn, D.J., and Dennis, E.S. 2002. A quick and easy method for isolating good-quality RNA from cotton (Gossypium hirsutum L.) tissues. Plant Mol. Biol. Rep. 20: 213–218.
Zhang, D., Hrmova, M., Wan, C.-H., Wu, C., Balzen, J., Cai, W., Wang, J., Densmore, L.D., Fincher, G.B., Zhang, H., et al. 2004. Members of a new group of chitinase-like genes are expressed preferentially in cotton cells with secondary walls. Plant Mol. Biol. 54: 353–372.
Zhao, G. and Liu, J. 2001. Isolation of a cotton RGP gene: A homolog of reversibly glycosylated polypeptide highly expressed during fiber development. Biochim. Biophys. Acta 1574: 370–374.
Zuo, K., Wang, J., Wu, W., Chai, Y., Sun, X., and Tang, K. 2005. Identification and characterization of differentially expressed ESTs of Gossypium barbadense infected by Verticillium dahliae with suppression of subtractive hybridization. Mol. Biol. 39: 191–199.
Received August 25, 2005; accepted in revised format November 14, 2005.