Genome Res. 13:2030-2041, 2003
©2003 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/03 $5.00
Letter
Conserved Noncoding Sequences in the Grasses
1 Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, California 94720, USA
2 College of Natural Resources, University of California, Berkeley, Berkeley, California 94720, USA
3 Torrey Mesa Research Institute, Syngenta Corporation, San Diego, California, USA
As orthologous genes from related species diverge over time, some sequences are conserved in noncoding regions. In mammals, large phylogenetic footprints, or conserved noncoding sequences (CNSs), are known to be common features of genes. Here we present the first large-scale analysis of plant genes for CNSs. We used maize and rice, maximally diverged members of the grass family of monocots. Using a local sequence alignment set to deliver only significant alignments, we found one or more CNSs in the noncoding regions of the majority of genes studied. Grass genes have dramatically fewer and much smaller CNSs than mammalian genes. Twenty-seven percent of grass gene comparisons revealed no CNSs. Genes functioning in upstream regulatory roles, such as transcription factors, are greatly enriched for CNSs relative to genes encoding enzymes or structural proteins. Further, we show that a CNS cluster in an intron of the knotted1 homeobox gene serves as a site of negative regulation. We show that CNSs in the adh1 gene do not correlate with known cis-acting sites. We discuss the potential meanings of CNSs and their value as analytical tools and evolutionary characters. We advance the idea that many CNSs function to lock in gene regulatory decisions.
Regions of DNA coding for protein are expected to exhibit sequence conservation between related species over evolutionary time due to the functional constraints of protein structure. It has become apparent during the past five years that noncoding sequence also exhibits functional constraints. Conservation outside of coding exons can be detected by cross-species orthologous gene comparisons. Such regions of noncoding DNA displaying strong sequence homology among distantly related organisms have been described as "phylogenetic footprints" (Gumucio et al. 1996
Various algorithms and stringency schemes are currently employed to
identify CNSs in mammals. Dubchak and coworkers
(2000
Individual CNSs in mouse-human gene comparisons have been shown to
function based on transgenic CNS knock-out experiments
(Loots et al. 2000
In mammals, using a human versus mouse comparison, Jareborg and coworkers
(1999
The grass family (Poaceae) is a monophyletic taxon of approximately 10,000
species, including all the major grain crops, such as Oryza sativa
(rice) and Zea mays (maize). All but the most basal grasses are
thought to have originated from a common ancestor approximately 50 million
years ago (Mya; Kellogg 2001
Development of Bioinformatic Tools
In order to identify regions of high conservation between gene pairs, we developed software to identify and display CNSs in a graph format. Although most mammalian studies identified conserved regions based on a global sequence alignment followed by an analysis of percent nucleotide identity within a window of a defined number of nucleotides (e.g., Vista; Dubchak et al. 2000
There are two elements to our software package. The first element, the
"CNS Blaster" uses a Perl-based script to read a GenBank text file
with CDS (coding sequence) annotation and a second text file containing the
orthologous genomic comparison sequence, both chosen by the user. Coding exons
are masked from comparison in the annotated sequence. The software then
employs BLAST 2 SEQUENCES (bl2seq) to generate files containing the CNS
analysis data, with parameters set exactly as by Kaplinsky and coworkers
(2002
For this study, we focused on sequences having a bl2seq statistical score
of at least the significance of a 15-nucleotide (nt) identical match between
the compared sequences. Kaplinsky and coworkers
(2002
Fifty-Two Maize-Rice Orthologous Gene CNS Descriptions
Grass CNS Statistics
CNS-Rich Genes Tend to Be Upstream Regulatory Genes
We calculated CNS density in grass genes by dividing the total number of bases occupied by CNS by the total number of noncoding bases studied per gene (Supplemental Table 4). Table 3B presents these CNS density data organized by type of gene. The results show that genes encoding regulatory proteins have a higher CNS density than genes of other functions.
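The density calculation described above is simple enough to state as code. The following is a minimal sketch, not the software used in this study; the function name and the example values are hypothetical, and it assumes the CNSs of a gene do not overlap, so that their lengths can simply be summed.

```python
def cns_density(cns_lengths, noncoding_bases):
    """Return CNS density as a percentage of the noncoding bases studied.

    cns_lengths     -- lengths (bp) of the CNSs found for one gene
                       (assumed non-overlapping; an assumption of this sketch)
    noncoding_bases -- total noncoding bases compared for that gene
    """
    if noncoding_bases <= 0:
        raise ValueError("noncoding_bases must be positive")
    return 100.0 * sum(cns_lengths) / noncoding_bases


# Hypothetical gene: two CNSs of 17 and 18 bp in 6000 bp of noncoding sequence.
print(f"{cns_density([17, 18], 6000):.2f}%")
```

A gene with no CNSs gives a density of zero, which matches how the CNS-free comparisons are counted in the text.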
Grass Versus Mammalian CNS Frequencies, Sizes, and Densities
Since grass genes encoding housekeeping enzyme functions have, in general, few and small CNSs, but all mouse-human gene comparisons revealed numerous, large CNSs, we expected that a comparison of mammalian housekeeping orthologs would reveal comparatively abundant CNSs. To test this hypothesis, we chose three human-mouse enzyme-encoding genes whose functions are also represented on our grass list (Table 1). The grass gene glyceraldehyde-3-phosphate dehydrogenase (gpa1) has only two CNSs and a density of 2.49%; however, the homologous mammalian gene GAPDH (human J04038; mouse NW_000265) has 15 CNSs, five of them over 70 bp in length, at a density of 12.3%. Whereas the grass comparison for alcohol dehydrogenase (adh1) produced two CNSs, both under 20 bp in length, for a density of 0.27%, a homologous gene from mammals (ADH1; human NT_016354; mouse NT_039242) revealed 15 CNSs, four of them over 50 bp, and a noncoding space that is 3.6% CNS. Finally, we compared two pairs of genes encoding enzymes involved in sugar homeostasis: sucrose synthase (sus1) in grasses and glucose-6-phosphatase (G6PC; human NT_010755; mouse NT_03952) in mammals. The grass gene comparison produced six CNSs, none longer than 24 bp in length, and CNSs constituted 2.37% of the gene's noncoding sequence. The mammalian gene pair produced over 50 CNSs, with one 403 bp and another 174 bp in length, at a density slightly higher than 10%.
A very low proportion of grass noncoding sequence is conserved between rice
and maize: Approximately 2% of the noncoding nucleotides used in this study
are conserved, as identified by our parameters (Suppl. Table 4). Such a low
percentage of conserved noncoding sequence content in plants is in sharp
contrast to the situation in the mammalian genome (first observed by
Kaplinsky et al. 2002
A Cluster of CNSs in a knotted1 Transcription Factor Gene
Intron Corresponds to a Region Where Transposon Insertions Result in Ectopic
Expression
To further investigate the potential regulatory nature of the kn1
third intron CNSs, we sought to address the question: Is the 5' portion
of the intron, which has numerous CNSs, enriched for known plant transcription
factor (TF) binding motifs relative to the 3' portion, which has very
few CNSs? We submitted the maize kn1 intron sequence to the PlantCARE
Web application
(http://oberon.rug.ac.be:8080/PlantCARE/index.html;
Lescot et al. 2002
Conserved Elements in the Promoter of adh1 Do Not Include
Known cis-Acting Binding Sites Nor a Potential Scaffold Attachment
Region
ADH1 is expressed constitutively in certain vascular cells, root caps,
scutellum, pollen mother cells, and the generative cell of the male
gametophyte, is anaerobically and auxin-inducible in root and mesocotyl
parenchyma, but is not expressed in leaves (for review, see
Freeling and Bennett 1985
Because the TATA, CAAT, and ARE regions, for which protein binding is certain, coincide with no CNSs, we lowered our CNS statistical significance cutoff to the level of a 13/13 exact match. We did not find hits in the general area within 200 bp from the start of transcription where these sites exist. The DNase-I-hypersensitive sites include almost the entire promoter, and so lack specific diagnostic power. The putative scaffold attachment region is not a CNS. In general, areas of the Adh1-F promoter known to bind proteins are not CNSs, and those protected by bound molecules in footprint experiments show a poor correlation with CNS1 or other, less significant phylogenetic footprints. Although it is possible that the S1 hypersensitive region is related to CNS1, which is 5 bp proximal, CNSs are not obvious beacons to the sites where proteins bind the adh1 promoter.
CNS Discovery Made Unexpected Contributions to Grass Gene
Annotation
In this analysis of 52 grass gene pairs for conserved noncoding sequences, we found that, in general, those genes with the largest number, size, and density of CNSs are transcription factors or other regulatory genes. Those genes with few or no CNSs are generally enzyme-encoding or structural protein-encoding. Although most genes had from zero to four conserved elements, 11 genes had from five to 35 CNSs. We were unable to identify any CNSs using our parameters in 27% of the genes studied. The CNSs identified were usually small segments between 15 and 19 bp in length, although some CNSs were much larger, on the order of 50-100 or more bp. CNSs were approximately as likely to occur in upstream regions as in introns or downstream regions. The short size and infrequency of grass CNSs are consistent with the promoter comparisons of Gau and Moose (2003
We know of no study in any taxon that identifies a subpopulation of genes
as "richer" in noncoding components. It is not apparent from
previous work that upstream regulatory genes (such as developmental
transcription factors that are only "on" at specific times and
places) should be thought of as more complex than worker genes (such as
enzymes) that are used at many developmental times and places. On the
contrary, genes that are available for transcriptional regulation at many
different times and places in an organism are known to have complex promoters
with many, often reiterated, binding sites for the many different specific
transcription factors used to exert control
(Quinn 1996
Our most CNS-complex genes (lrs1, te1, gn1, kn1), along with the
previously reported lg1
(Kaplinsky et al. 2002
In One Case, a Cluster of CNSs May Bind Negative Regulatory
Factors
Investigating the potential molecular function of kn1 CNSs, we
found that the 13 CNSs in maize-rice kn1-intron3 are also
present in intron3 of the kn1 ortholog in barley, a
member of yet another grass subfamily, as was expected from the previous
work of Kaplinsky and coworkers
(2002
Comparisons of global and local alignments in this 13-CNS region, and local alignments done at a variety of stringencies, found that about half of these CNSs are discrete, in that they are surrounded by gaps, whereas about half are surrounded by sequence that can be aligned but is less well conserved. Thus, examination of the structure of this CNS-rich region of kn1-intron3 (Fig. 5 and associated text) does not clearly answer the question: Are these conserved CNSs 13 independent modules, one big module, or some combination? Future studies could look for factors binding the kn1 third intron CNSs and may elucidate any potential role in the tissue-specific regulation of gene expression.
Are CNSs Transcription Factor Binding Sites?
In contrast to mammalian studies, plant orthologous gene comparisons return a significant fraction having no detectable conserved segments above the threshold level. Because all of the sequences compared presumably represent functional alleles, and because genes require cis-acting binding sites for expression, any conserved regulatory sequences must fall below the 15/15 significance threshold of our parameters. Interestingly, the known regions of protein binding in the adh1 promoter did not coincide with a promoter CNS, indicating that the promoter regions responsible for tissue-specific, anaerobic response, hormone response, and constitutive expression do not produce a large enough footprint to have been conserved between maize and rice since their common ancestor. It is possible that we were unable to detect enrichment for transcription factor binding motifs in CNSs due to the rudimentary nature of current knowledge of such sites in plants. Indeed, the majority of sites listed in the PlantCARE database are 4 bp in length, and only constitute a few hundred known sites. Further, very small sequences are extremely common in coding and noncoding DNA, and the presence of functional sites is often difficult to distinguish due to the high number of sequence matches for nonfunctional sites. Since the "noise" at any element length below approximately 10 bp is expected to be high, our results were expected. Therefore, future attempts to correlate CNSs with DNA-binding motifs should utilize a larger database of sites and focus primarily on the larger, less frequent motifs, or clusters of such motifs.
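The "noise" argument above can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only and was not part of the analysis: it assumes a random sequence with independent, equally frequent bases, in which a given k-mer is expected about (L - k + 1) / 4^k times on one strand of a sequence of length L. The function name and the 6000-bp example length are hypothetical, and counting both strands by doubling ignores palindromic motifs.

```python
def expected_chance_matches(motif_length, sequence_length, both_strands=True):
    """Expected number of chance occurrences of one specific motif.

    Assumes independent, equiprobable bases (a simplification; real
    noncoding DNA has compositional bias and repeats).
    """
    positions = max(sequence_length - motif_length + 1, 0)
    strands = 2 if both_strands else 1
    return strands * positions / 4 ** motif_length


# Compare short motifs with CNS-sized matches in 6000 bp of sequence.
for k in (4, 6, 10, 15):
    print(k, expected_chance_matches(k, 6000))
```

Under these assumptions a specific 4-bp motif is expected dozens of times by chance in a few kilobases, whereas a specific 10-bp or 15-bp match is expected far less than once, which is the contrast the paragraph draws between short binding motifs and CNS-sized matches.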
Mammalian and Grass Genes Differ in CNS Content
It is intriguing that mammalian CNS studies find vastly greater tracts of conserved noncoding sequences among genes of all functional classes. We showed that for the homologous glyceraldehyde-3-phosphate dehydrogenase, alcohol dehydrogenase, and sugar homeostasis "worker" genes in grasses and mammals, the maize-rice comparisons revealed fewer and shorter CNSs than the human-mouse comparisons. Overall, we found that the average grass gene has about three CNSs whereas the typical mammalian gene has between 15 and 20. Whereas plant genes generally have short (20-bp) CNSs that range to approximately 100 bp in length, mammalian genes routinely have CNSs over 100 bp. If mammalian genes' noncoding sequences have been subject to selection over more numerous and much larger regions than grass genes, then the functional constraints on noncoding regions are much weaker in grasses than in mammals. Perhaps grass genomes have undergone a particularly rapid evolutionary process during periods of high transposon activity and genome restructuring. Such a process might be expected to contribute to the evolution of chromosome-level structure, but seems to have little effect on the relatively small stretches of DNA surrounding and encoding genes. In all eight cases where more than one gene was found on our maize sequence, both genes were found together in rice, and relative exon-intron sizes were similar as well.
Are CNSs Signatures of Developmental Complexity?
A simple definition of complexity involves the product of the number of
parts and the number of connections between those parts. Because mammals have
so many more stem-cell populations, organs, and organ systems than do grasses,
and because plant cells are cemented inside walls, which prohibits making new
connections by rotating or migrating, mammals must surely be the more complex
organisms. Mammalian cells follow strict developmental pathways and
differentiate terminally, events undoubtedly necessitating the permanent
modification of chromatin state, often associated with the binding of
regulatory proteins to DNA (Li
2002
It is possible that mammalian RNA transcripts are richer in information content than those of grasses, allowing for the production of numerous specialized proteins from a single gene. In such a scenario, mammalian introns and UTRs would be rich in conserved instructions for the proper function of the splicing machinery. Alternative generation and splicing of RNA transcripts has been shown to be common in mammals, generating different transcripts depending on developmental or other cues, whereas alternative splicing in plants is relatively uncommon. Perhaps binding at CNSs facilitates this capacity, in which mammalian genes are the more complex.
Perhaps CNSs are involved in transcriptional regulation, but do not serve
as traditional transcription factor binding sites. Some may perform roles in
binding or producing regulatory RNAs (microRNAs). Alternatively, some CNSs may
serve not in the stepwise binding of individual proteins, but in the
assembly of large multiprotein complexes that perform complex regulatory
functions. In such a case, a large CNS might not be a "binding
site," in the classical sense but rather a "structural
template" for the formation of chromatin-level control factors, as
proposed by Kaplinsky et al.
(2002
One Particularly Robust Idea for the Function of the CNS Difference
Between Grasses and Mammals
Limitations
Currently, the high-quality rice genome (the public Rice Genome Project) is
incomplete, and maize, sorghum, and other grasses have very limited stretches
of genomic sequence completed. All cross-species comparisons are by necessity
limited by the length and quality of the available genomic sequences. We are
not currently able to assess the possibility that plant genes may be regulated
by distant, conserved elements, such as is the case with the mammalian
Because our maize genomic sequences were generally limited to the space surrounding a single gene, we were only able to use synteny to certify orthology for eight of our 52 genes. For these genes the true ortholog was also the best hit in rice. Analysis of the other 47 genes required that the maize gene be compared to its best homolog in all of the rice genomic sequence available, which included two whole genome contig databases, plus the collection of contigs produced by the Rice Genome Project. We also checked to see whether average exon nucleotide sequence identities were near 85% (75%-90%), which is what we generally find between syntenic maize and rice genes. We are confident that the comparisons presented are orthologs, and derive from a common ancestor about 50 Mya. It is possible that some gene pairs may share sequence similarity because they are members of a diverged gene family, but trace back to an ancestor more ancient than the first grass. This situation could result, for example, from rapid evolution of an ortholog or from a deletion removing an ortholog, leaving only distant homologs for comparison. As more long genomic sequences become available from rice and other grasses, synteny will be increasingly valuable to identify unambiguous orthologs.
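The exon identity check mentioned above can be illustrated as follows. This is a hedged sketch and not the procedure used here: it assumes the two exon sequences have already been aligned to equal length with "-" marking gaps, and it excludes gapped columns from the calculation, which is one of several possible conventions. The function names are hypothetical; the 75%-90% window is the range quoted in the text.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent nucleotide identity over the ungapped columns of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are not scored
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def within_expected_range(identity, low=75.0, high=90.0):
    """True if an exon identity falls in the range typical of syntenic pairs."""
    return low <= identity <= high


# Hypothetical 10-bp aligned exon fragments differing at one position.
identity = percent_identity("ACGTACGTAC", "ACGTTCGTAC")
print(identity, within_expected_range(identity))
```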
At the bioinformatics level, the number of CNSs detected is limited by the
BLAST parameters (the stringency of the local alignment). Although we define
CNS size, position, and sequence based on our standard settings, it is always
possible to obtain more or fewer "hits" by altering the settings
(Kaplinsky et al. 2002
Practical Considerations
CNSs as Tools for the Study of Gene Evolution and the Origin of
Novelty
Maize is an excellent model system with which to test hypotheses about the
consequences of duplication, with a well documented history of
tetraploidization (Ahn and Tanksley
1993
All source code for the computational analysis presented here is available upon request. Our blaster and viewer work well but presently require the use of the Unix command line.
CNS BLAST and Gene Annotation Viewer Software
Before running either program, one must first start with an annotated sequence (typically a maize sequence from GenBank). A cutoff of 3000 bases upstream and downstream of the coding region is used to define the noncoding sequence associated with a given maize gene. This annotated sequence is used to probe other available databases, here rice, via BLASTN, and to manually select the sequence, or sequences, considered strongest from the BLAST results. At this point, user involvement ends and a Perl-based program takes these two sequences and begins analysis. Before beginning analysis, the program must identify the strand orientation of the compared genes. In all cases for this study, the maize sequence is in the plus orientation. The program analyzes the plus/plus and plus/minus matches in a bl2seq output. If the plus/minus matches carry more weight than the plus/plus matches, the program generates the reverse complement of the rice sequence, and reruns BLAST.
The next step is to establish a rough alignment of the two sequences beginning with the start of translation of the gene product. If both sequences are annotated, the program records the exon indices for both maize and rice and proceeds. However, if the rice sequence is not annotated, then the maize annotation must be used to predict the coding regions in the rice ortholog. Using the previously generated BLAST result, which should have created sequences in the same orientation, the start and end indices of the maize coding region are used to predict the rice boundaries. If the corresponding rice indices are found, then the program records the indices and continues. If only one index is returned, then the missing rice index is estimated by adding or subtracting the length of the coding region of maize.
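The orientation and boundary-estimation logic described above might be sketched as follows. This is an illustration in Python, not the Perl program itself, and several details are assumptions: the text does not define how matches "carry more weight", so the sketch sums alignment scores per strand; hits are represented as hypothetical (strand, score) pairs already parsed from bl2seq output; and "the length of the coding region of maize" is taken to be the span between the maize start and end indices.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def needs_reverse_complement(hits):
    """Decide whether the rice sequence should be flipped.

    hits -- iterable of (strand, score) pairs, where strand is "plus" or
            "minus" relative to the maize (plus) sequence.
    Weight is assumed to be the summed score of the hits on each strand.
    """
    plus = sum(score for strand, score in hits if strand == "plus")
    minus = sum(score for strand, score in hits if strand == "minus")
    return minus > plus


def estimate_rice_coding_span(rice_start, rice_end, maize_coding_length):
    """Fill in a missing rice boundary from the maize coding-region length.

    Returns (start, end), or None when neither boundary was found and the
    simpler secondary procedure would be needed instead.
    """
    if rice_start is not None and rice_end is not None:
        return rice_start, rice_end
    if rice_start is not None:
        return rice_start, rice_start + maize_coding_length
    if rice_end is not None:
        return rice_end - maize_coding_length, rice_end
    return None


# Hypothetical parsed hits: the minus-strand matches outweigh the plus-strand one.
hits = [("plus", 30.0), ("minus", 50.0), ("minus", 20.0)]
rice = "ATGCN"
if needs_reverse_complement(hits):
    rice = reverse_complement(rice)
print(rice, estimate_rice_coding_span(100, None, 500))
```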
If the genes are not similar enough to include matches that cover the start and end indices of the maize gene, then a simpler, less accurate secondary procedure is executed. This simpler alignment takes the first BLAST result hit representing a coding region in the maize gene, and discerns the distance of the maize sequence from the start and end indices of the maize gene in question. Using those distances, the program guesses where the rice coding region starts and stops. Although the last two techniques are much less exact, they provide a rough outline that helps in understanding the relative position of the retrieved CNSs.
The last step for the Perl program is the retrieval of CNSs. The program begins by substituting an 'n' for every coding nucleotide in the annotated sequence, thereby masking the coding regions from the subsequent bl2seq search. The bl2seq is rerun, returning all hits in noncoding regions. The program keeps only matches greater than 15 nucleotides in length in order to enrich the results for the most significant regions of similarity.
The second aspect of this software package is the viewer, which is designed to take the information generated above and create an easy-to-understand interface, as described earlier. The viewer is programmed in Java and utilizes the Swing package, available free as part of the Java Foundation Classes (JFC). The applet itself takes in a flat file and parses it; each gene is then represented as an Object containing Objects of coding regions, noncoding regions, and CNSs. Information is then retrieved from each Object by the user, primarily through mouse interaction. Java was chosen for several reasons. First, it allows for interoperability across most platforms running the Java 2 Runtime Environment. Second, it easily and quickly permits user interaction. Finally, one can convert the applet into an application while conserving most of the original code, for use on auxiliary or private databases.
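The masking and filtering steps of the CNS retrieval described in this section can be sketched as below. This is an illustrative Python sketch rather than the Perl code; the function names and the dictionary layout of a hit are hypothetical. Coding regions are taken as 1-based inclusive coordinates, as in GenBank CDS features. The length cutoff is treated as inclusive (15 nt or longer), an assumption made because 15-bp CNSs are reported elsewhere in the text; the threshold is a parameter so either reading can be used.

```python
def mask_coding(sequence, coding_regions):
    """Replace every coding nucleotide with 'n'.

    coding_regions -- list of (start, end) pairs, 1-based and inclusive.
    """
    masked = list(sequence)
    for start, end in coding_regions:
        first = max(start - 1, 0)          # convert to 0-based index
        last = min(end, len(masked))       # clip to the sequence length
        for i in range(first, last):
            masked[i] = "n"
    return "".join(masked)


def keep_long_hits(hits, min_length=15):
    """Keep only hits whose aligned length is at least min_length."""
    return [hit for hit in hits if hit["length"] >= min_length]


# Hypothetical 10-bp sequence with one coding region at positions 3-5.
print(mask_coding("ACGTACGTAC", [(3, 5)]))

# Hypothetical parsed noncoding hits from the rerun comparison.
hits = [{"length": 14}, {"length": 15}, {"length": 22}]
print([hit["length"] for hit in keep_long_hits(hits)])
```

Masking with 'n' rather than deleting the coding bases keeps all coordinates unchanged, so the positions of any retrieved CNSs still refer to the original annotated sequence.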
The initial window of the Gene Annotation Viewer presents a thumbnail view of information regarding the accession numbers of the compared sequences and provides an overview of the CNSs identified, their sequences, their orientation, and their cross-species nucleotide conservation. Most CNSs are indicated by a black line connecting orthologs, these sequences being in the +/+ orientation; red lines connect +/- alignments. A toggle switch allows the user to choose which CNSs will be displayed on the graphical view. The graphical viewer itself has a zoom function, and upon dragging across the gene schematic, displays the nucleotide sequence text desired. Exons, introns, and CNSs can be selected individually for text-based web applications such as BLASTN, BLASTX, and open reading frame finders. In the Annotation Viewer, an annotated sequence is colored dark blue, and unannotated sequence is marked with a single light blue rectangle to indicate the predicted position of the corresponding gene's coding region, including introns. The software does not annotate the comparison sequence; therefore, it does not display exon structure of an unannotated input sequence.
Manual Adjustment of CNSs
We thank members of the Freeling Laboratory for contributions in the early stages of this project. Karen Osmont, Noriko Inada, Damon Lisch, Keith Slotkin, Maggie Woodhouse, George Theodoris, Stanley Lee, and Randall Tyers provided sequences or analyses toward our data set. We thank Virginia Walbot for comments on plant plasticity, Randall Tyers for critical review of the manuscript, and Nancy Nelson for her valuable assistance. We are grateful for the thoughtful comments of two anonymous reviewers. D.C.I. was supported by a National Science Foundation Graduate Research Fellowship. This work was supported by the Plant and Microbial Biology-Syngenta Collaborative Research Agreement, UC-Berkeley. The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.1280703. [Supplemental material is available online at www.genome.org.]
4 Present address: Syngenta Biotechnology, Research Triangle Park, NC
27713, USA.
5 Corresponding author.
Ahn, S. and Tanksley, S.D. 1993. Comparative linkage maps of rice and maize genomes. Proc. Natl. Acad. Sci. 90: 7980-7984.
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215: 403-410.
Bennetzen, J.L. 2000. Comparative sequence analysis of plant nuclear genomes: Microcolinearity and its many exceptions. Plant Cell 12: 1021-1029.
Blanchette, M. and Tompa, M. 2002. Discovery of regulatory elements by a computational method for phylogenetic footprinting. Genome Res. 12: 739-748.
Bolouri, H. and Davidson, E.H. 2002. Modeling DNA sequence-based cis-regulatory gene networks. Dev. Biol. 246: 2-13.
Bucher, P. 1999. Regulatory elements and expression profiles. Curr. Opin. Struct. Biol. 9: 400-407.
Colinas, J., Birnbaum, K., and Benfey, P. 2002. Using cauliflower to find conserved noncoding regions in Arabidopsis. Plant Physiol. 129: 451-454.
Devos, K.M. and Gale, M.D. 2000. Genome relationships: The grass model in current research. Plant Cell 12: 637-646.
Dubchak, I., Brudno, M., Loots, G.G., Pachter, L., Mayor, C., Rubin, E.M., and Frazer, K.A. 2000. Active conservation of noncoding sequences revealed by three-way species comparisons. Genome Res. 10: 1304-1306.
Ferl, R.J., Nick, H.S., and Laughner, B.H. 1987. Architecture of a plant promoter: S1 nuclease hypersensitive features of maize Adh1. Plant Mol. Biol. 8: 299-307.
Force, A.M., Lynch, M., Pickett, F.B., Amores, A., Yan, Y.-L., and Postlethwait, J. 1999. Preservation of duplicate genes by complementary, degenerate mutations. Genetics 151: 1531-1545.
Freeling, M. 1992. A conceptual framework for maize leaf development. Dev. Biol. 153: 44-58.
Freeling, M. and Bennett, D.C. 1985. Maize Adh1. Annu. Rev. Genet. 19: 297-323.
Gau, H. and Moose, S.P. 2003. Conserved noncoding sequences among cultivated cereal genomes identify candidate regulatory sequence elements and patterns of promoter evolution. Plant Cell 15: 1143-1158.
Gaut, B.S. and Doebley, J.F. 1997. DNA sequence evidence for the segmental allotetraploid origin of maize. Proc. Natl. Acad. Sci. 94: 6809-6814.
Goff, S.A., Ricke, D., Lan, T-H., Presting, G., Wang, R., Dunn, M., Glazebrook, J., Sessions, A., Oeller, P., Varma, H., et al. 2002. A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 296: 92-114.
Göttgens, B., Gilbert, J.G.R., Barton, L.M., Grafham, D., Rogers, J., Bentley, D.R., and Green, A.R. 2001. Long-range comparison of human and mouse SCL loci: Localized regions of sensitivity to restriction endonucleases correspond precisely with peaks of conserved noncoding sequences. Genome Res. 11: 87-97.
Greene, B., Walko, R., and Hake, S. 1994. Mutator insertions in an intron of the maize knotted1 gene result in dominant suppressible mutations. Genetics 138: 1275-1285.
Gumucio, D.L., Shelton, D.A., Zhu, W., Millinoff, D., Gray, T., Bock, J.H., Slightom, J.L., and Goodman, M. 1996. Evolutionary strategies for the elucidation of cis and trans factors that regulate the developmental switching programs of the
Hardison, R.C. 2000. Conserved noncoding sequences are reliable guides to regulatory elements. Trends Genet. 16: 369-372.
Hardison, R.C., Oeltjen, J., and Miller, W. 1997. Long human-mouse sequence alignments reveal novel regulatory elements: A reason to sequence the mouse genome. Genome Res. 7: 959-966.
Hogenesch, J.B., Ching, K.A., Batalov, S., Su, A.I., Walker, J.R., Zhou, Y., Kay, S.A., Schultz, P.G., and Cooke, M.P. 2001. A comparison of the Celera and Ensembl predicted gene sets reveals little overlap in novel genes. Cell 106: 413-415.
Jareborg, N., Birney, E., and Durbin, R. 1999. Comparative analysis of noncoding regions of 77 orthologous mouse and human gene pairs. Genome Res. 9: 815-824.
Kaplinsky, N.J., Braun, D.M., Penterman, J., Goff, S.A., and Freeling, M. 2002. Utility and distribution of conserved noncoding sequences in the grasses. Proc. Natl. Acad. Sci. 99: 6147-6151.
Kellogg, E.A. 2001. Evolutionary history of the grasses. Plant Physiol. 125: 1198-1205.
Lescot, M., Déhais, P., Thijs, G., Marchal, K., Moreau, Y., Van de Peer, Y., Rouzé, P., and Rombauts, S. 2002. PlantCARE, a database of plant cis-acting regulatory elements and a portal to tools for in silico analysis of promoter sequences. Nucleic Acids Res. 30: 325-327.
Levy, S., Hannenhalli, S., and Workman, C. 2001. Enrichment of regulatory signals in conserved noncoding sequence. Bioinformatics 17: 871-877.
Lewis, E.B. 1951. Pseudoallelism and gene evolution. Cold Spring Harbor Symp. Quant. Biol. 16: 159-174.
Li, E. 2002. Chromatin modification and epigenetic reprogramming in mammalian development. Nat. Rev. Genet. 3: 662-673.
Loots, G.G., Locksley, R.M., Blankespoor, C.M., Wang, Z.E., Miller, W., Rubin, E.M., and Frazer, K.A. 2000. Identification of a coordinate regulator of interleukins 4, 13, and 5 by cross-species sequence comparisons. Science 288: 136-140.
Loots, G.G., Ovcharenko, I., Pachter, L., Dubchak, I., and Rubin, E. 2002. rVista for comparative sequence-based discovery of functional transcription factor binding sites. Genome Res. 12: 832-839.
Lynch, M. and Force, A. 2000. The probability of duplicate gene preservation by subfunctionalization. Genetics 154: 459-473.
Nakano, R., Matsumura, T., Sakakibara, H., Sugiyama, T., and Hase, T. 1997. Cloning of maize ferredoxin III gene: Presence of a unique repetitive nucleotide sequence within an intron found in the 5'-untranslated region. Plant Cell Physiol. 38: 1167-1170.
Paul, A.-L. and Ferl, R.J. 1991. In vivo footprinting reveals unique cis-elements and different modes of hypoxic induction in maize Adh1 and Adh2. Plant Cell 3: 159-168.
Paul, A.-L. and Ferl, R.J. 1993. Osmium tetroxide footprinting of a scaffold attachment region in the maize Adh1 promoter. Plant Mol. Biol. 22: 1145-1151.
Paul, A.-L., Vasil, V., Vasil, I.K., and Ferl, R.J. 1987. Constitutive and anaerobically induced DNase-I-hypersensitive sites in the 5' region of the maize Adh1 gene. Proc. Natl. Acad. Sci. 84: 799-803.
Pennacchio, L.A., Olivier, M., Hubacek, J.A., Cohen, J.C., Cox, D.R., Fruchart, J-C., Krauss, R.M., and Rubin, E.M. 2001. An apolipoprotein influencing triglycerides in humans and mice revealed by comparative sequencing. Science 294: 169-173.