Published online before print July 15, 2004, 10.1101/gr.1953904
Genome Res. 14:1562-1574, 2004
©2004 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/04 $5.00
Letter
Clustering of DNA Sequences in Human Promoters
Peter C. FitzGerald,1 Andrey Shlyakhtenko,2 Alain A. Mir,2 and Charles Vinson2,3
1 Genome Analysis Unit, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
2 Laboratory of Metabolism, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
We have determined the distribution of each of the 65,536 DNA sequences that are eight bases long (8-mer) in a set of 13,010 human genomic promoter sequences aligned relative to the putative transcription start site (TSS). A limited number of 8-mers have peaks in their distribution (cluster), and most cluster within 100 bp of the TSS. The 156 DNA sequences exhibiting the greatest statistically significant clustering near the TSS can be placed into nine groups of related sequences. Each group is defined by a consensus sequence, and seven of these consensus sequences are known binding sites for the transcription factors (TFs) SP1, NF-Y, ETS, CREB, TBP, USF, and NRF-1. One sequence, which we named Clus1, is not a known TF binding site. The ninth sequence group is composed of the strand-specific Kozak sequence that clusters downstream of the TSS. An examination of the co-occurrence of these TF consensus sequences indicates a positive correlation for most of them except for sequences bound by TBP (the TATA box). Human mRNA expression data from 29 tissues indicate that the ETS, NRF-1, and Clus1 sequences that cluster are predominantly found in the promoters of housekeeping genes (e.g., ribosomal genes). In contrast, TATA is more abundant in the promoters of tissue-specific genes. This analysis identified eight DNA sequences in 5082 promoters that we suggest are important for regulating gene expression.
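The positional counting described in the abstract can be illustrated with a minimal Python sketch. It assumes the promoters are equal-length strings already aligned so that the same index corresponds to the same offset from the TSS. The function names (`kmer_position_counts`, `peak_window`) and the window-fraction score are hypothetical choices made for illustration; the abstract does not specify the clustering statistic, so this should not be read as the authors' method.

```python
from collections import defaultdict


def kmer_position_counts(promoters, k=8):
    """Count, for every k-mer, how often it starts at each aligned position.

    Assumes all promoters have the same length and share one coordinate
    system relative to the TSS (an assumption of this sketch).
    """
    length = len(promoters[0])
    n_positions = length - k + 1
    counts = defaultdict(lambda: [0] * n_positions)
    for seq in promoters:
        if len(seq) != length:
            raise ValueError("promoters must be aligned to equal length")
        seq = seq.upper()
        for i in range(n_positions):
            kmer = seq[i:i + k]
            # Skip k-mers containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[kmer][i] += 1
    return counts


def peak_window(position_counts, window=20):
    """Return (start, fraction) for the window holding the most occurrences.

    The fraction of all occurrences that fall in the best window is a crude,
    illustrative stand-in for a clustering score. Ties go to the leftmost
    window.
    """
    total = sum(position_counts)
    if total == 0:
        return 0, 0.0
    best_start, best_sum = 0, -1
    last_start = max(1, len(position_counts) - window + 1)
    for start in range(last_start):
        window_sum = sum(position_counts[start:start + window])
        if window_sum > best_sum:
            best_start, best_sum = start, window_sum
    return best_start, best_sum / total


# Toy example with 4-mers: TATA occurs at the same index in every sequence.
toy = ["AAAATATAGG", "CCCCTATACC", "GGGGTATATT"]
toy_counts = kmer_position_counts(toy, k=4)
toy_peak = peak_window(toy_counts["TATA"], window=2)
```

A k-mer spread evenly across the promoter yields a small fraction for any window, whereas one that clusters yields a fraction near 1. Judging significance would still require a background model, which this sketch omits.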
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.1953904. Article published online before print in July 2004.
3 Corresponding author. E-mail Vinsonc@dc37a.nci.nih.gov; fax (301) 496-8419.
