Genome Res. 13:2381-2390, 2003
©2003 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/03 $5.00
A Biophysical Approach to Transcription Factor Binding Site Discovery
Marko Djordjevic1, Anirvan M. Sengupta2, and Boris I. Shraiman2,3
1 Department of Physics, Columbia University, New York, New York 10025, USA
2 Department of Physics and BioMaPS Institute, Rutgers University, Piscataway, New Jersey 08854, USA
Identification of transcription factor binding sites within regulatory segments of genomic DNA is an important step toward understanding the regulatory circuits that control gene expression. Here, we describe a novel bioinformatics method that bases classification of potential binding sites explicitly on an estimate of the sequence-specific binding energy of a given transcription factor. The method also estimates the chemical potential of the factor, which defines the threshold of binding. In contrast with the widely used information-theoretic weight matrix method, the new approach correctly describes saturation in the transcription factor/DNA binding probability. This results in a significant improvement in the number of expected false positives, particularly in the ubiquitous case of low-specificity factors. In the strong binding limit, the algorithm is related to the support vector machine approach to pattern recognition. The new method is used to identify likely genomic binding sites for the E. coli transcription factors collected in the DPInteract database. In addition, for CRP (a global regulatory factor), the likely regulatory modality (i.e., repressor or activator) of predicted binding sites is determined.
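The role of the chemical potential as a binding threshold, and the saturation of the binding probability, can be illustrated with a minimal sketch. It assumes an additive energy model, E(S) as a sum of per-position contributions, and a two-state equilibrium occupancy of the form 1 / (1 + exp(E(S) - mu)), with energies measured in units of kT. The matrix values and the function names below are hypothetical and chosen only for illustration; they are not taken from the article.

```python
import math

# Hypothetical energy matrix for a 4-bp site: energy contribution (in units
# of kT) of each base at each position. Values are invented for illustration.
ENERGY_MATRIX = [
    {"A": 0.0, "C": 2.0, "G": 2.0, "T": 1.0},
    {"A": 1.5, "C": 0.0, "G": 2.5, "T": 2.0},
    {"A": 2.0, "C": 2.0, "G": 0.0, "T": 1.0},
    {"A": 0.0, "C": 1.0, "G": 2.0, "T": 2.0},
]


def binding_energy(site, matrix=ENERGY_MATRIX):
    """Additive binding energy E(S): sum of per-position contributions."""
    if len(site) != len(matrix):
        raise ValueError("site length must match matrix length")
    return sum(column[base] for column, base in zip(matrix, site))


def binding_probability(energy, mu):
    """Saturating occupancy 1 / (1 + exp(E - mu)); energies in units of kT."""
    return 1.0 / (1.0 + math.exp(energy - mu))


def is_predicted_site(site, mu, matrix=ENERGY_MATRIX):
    """Classify a sequence as a binding site when E(S) lies below mu."""
    return binding_energy(site, matrix) < mu
```

Under these assumptions, every sequence with energy well below mu is bound with probability close to 1 regardless of how far below the threshold it lies. That is the saturation effect, which a score that depends only on sequence statistics does not capture.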
3 Corresponding author. E-MAIL shraiman{at}physics.rutgers.edu; FAX (805) 893-4111.
[Supplemental material is available online at www.genome.org. The complete list of predicted sites may be found at http://www.biomaps.rutgers.edu/bioinformatics/QPMEME.htm.]
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.1271603.
4 For brevity, from now on we refer to the free energy of binding simply as binding energy. In biophysical literature, the commonly used notation for this quantity would be G(S) rather than E(S).
5 Although it is convenient to refer to TFs with variable binding sites as low-specificity factors, it must be remembered that the variability of binding sites is likely to be the result of these TFs being present at higher concentration than high-specificity factors, as opposed to having intrinsically weaker sequence dependence of the TF/DNA interaction.
6 This provides a possible explanation for the case of FNR (see Fig. 4), where we remarked that the chemical potential we deduced from the search may be too low. This could happen if the experiments that generated the collection of FNR binding sites did not include the physiological condition of maximal FNR activation.
