Published online before print
April 10, 2006, 10.1101/gr.4866006
Genome Res. 16:656-668, 2006
©2006 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/06 $5.00
Methods
Genome-wide computational prediction of transcriptional regulatory modules reveals new insights into human gene expression
Mathieu Blanchette1,5,
Alain R. Bataille2,
Xiaoyu Chen1,
Christian Poitras2,
Josée Laganière3,
Céline Lefèbvre3,
Geneviève Deblois3,
Vincent Giguère3,
Vincent Ferretti4,
Dominique Bergeron2,
Benoit Coulombe2 and
François Robert2,5
1 McGill Centre for Bioinformatics, Montreal, Quebec, Canada H3A 2B4;
2 Institut de Recherches Cliniques de Montréal, Montreal, Quebec, Canada H2W 1R7;
3 Molecular Oncology Group, Department of Medicine, Oncology and Biochemistry, McGill University, Montreal, Quebec, Canada H3A 1A1;
4 McGill University and Genome Quebec Innovation Center, Montreal, Quebec, Canada H3A 1A4
The identification of regulatory regions is one of the most important and challenging problems in the functional annotation of the human genome. In higher eukaryotes, transcription-factor (TF) binding sites are often organized in clusters called cis-regulatory modules (CRMs). While the prediction of individual TF-binding sites is a notoriously difficult problem, CRM prediction has proven to be somewhat more reliable. Starting from a set of predicted binding sites for more than 200 TF families documented in Transfac, we describe an algorithm relying on the principle that CRMs generally contain several phylogenetically conserved binding sites for a few different TFs. The method allows the prediction of more than 118,000 CRMs within the human genome. Using ChIP-chip, a subset of these is shown to be bound in vivo by TFs. Analysis of the predicted CRMs reveals, among other things, that CRM density varies widely across the genome, with CRM-rich regions often located near genes encoding transcription factors involved in development. Predicted CRMs show a surprising enrichment near the 3' end of genes and in regions far from genes. We document the tendency for certain TFs to bind modules located in specific regions with respect to their target genes and identify TFs likely to be involved in tissue-specific regulation. The set of predicted CRMs, which is made available as a public database called PReMod (http://genomequebec.mcgill.ca/PReMod), will help analyze regulatory mechanisms in specific biological systems.
5 Corresponding authors.
E-mail blanchem@mcb.mcgill.ca; fax (514) 398-3387.
E-mail francois.Robert@ircm.qc.ca; fax (514) 987-5743.
[Supplemental material is available online at www.genome.org.]
Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.4866006
6 Since PhastCons was designed to detect any type of region under selective pressure, many of its noncoding predictions are likely to have other, nonregulatory functions.
7 Note that the formula for moduleScore is actually an approximation of the true P-value, for the following reasons: (1) since competition for space between different tags is not modeled, the computed P-values of the total scores of the 2nd, 3rd, 4th, and 5th tags are slightly conservative; (2) since the totalScores are discrete variables (but with a very large number of possible values), the approximation with a continuous uniform distribution introduces a small error; (3) since the moduleScore is obtained by selecting the best of five P-values, a multiple hypothesis testing correction should be applied. However, since we are mostly interested in the ranking of modules, this correction would make no difference.
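The ranking argument in point (3) can be illustrated with a minimal sketch. It assumes, purely for illustration, a Šidák-style correction for the best of five P-values, which would be exact only if the five P-values were independent; the note above does not say which correction would be used. The function names and the P-values below are hypothetical and are not taken from the article. Because the correction is a strictly increasing function of the raw P-value, the order of the modules is unchanged.

```python
import math


def sidak_adjust(p, k=5):
    """Sidak-style correction for the smallest of k P-values.

    Assumption for illustration only: the k P-values are independent.
    """
    if p >= 1.0:
        return 1.0
    # Equivalent to 1 - (1 - p)**k, computed stably for very small p.
    return -math.expm1(k * math.log1p(-p))


def ranking(scores):
    """Module indices ordered from most to least significant."""
    return sorted(range(len(scores)), key=lambda i: scores[i])


# Hypothetical best-of-five P-values for four modules (made-up numbers).
raw = [3e-6, 0.02, 1e-9, 0.4]
adjusted = [sidak_adjust(p) for p in raw]

# The corrected values are larger, but the ordering is identical.
same_order = ranking(raw) == ranking(adjusted)
```

Any other strictly monotone correction would preserve the ranking in the same way; only the absolute P-values would change.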
8 Only a small number of maximal lengths could be tried, as the calculation of the TotalScore P-values is computationally expensive and depends on that length.
