Published online before print
April 10, 2006, 10.1101/gr.4842106 Genome Res. 16:669-677, 2006 ©2006 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/06 $5.00
Resource
A comprehensive catalog of human KRAB-associated zinc finger genes: Insights into the evolutionary history of a large family of transcriptional repressors
1 Genome Biology and 2 Microbial Systems Divisions, Biosciences, Lawrence Livermore National Laboratory, Livermore, California 94550, USA
Krüppel-type zinc finger (ZNF) motifs are prevalent components of transcription factor proteins in all eukaryotes. KRAB-ZNF proteins, in which a potent repressor domain is attached to a tandem array of DNA-binding zinc-finger motifs, are specific to tetrapod vertebrates and represent the largest class of ZNF proteins in mammals. To define the full repertoire of human KRAB-ZNF proteins, we searched the genome sequence for key motifs and then constructed and manually curated gene models incorporating those sequences. The resulting gene catalog contains 423 KRAB-ZNF protein-coding loci, yielding alternative transcripts that altogether predict at least 742 structurally distinct proteins. Active rounds of segmental duplication, involving single genes or larger regions and including both tandem and distributed duplication events, have driven the expansion of this mammalian gene family. Comparisons between the human genes and ZNF loci mined from the draft mouse, dog, and chimpanzee genomes not only identified 103 KRAB-ZNF genes that are conserved in mammals but also highlighted a substantial level of lineage-specific change; at least 136 KRAB-ZNF coding genes are primate specific, including many recent duplicates. KRAB-ZNF genes are widely expressed and clustered genes are typically not coregulated, indicating that paralogs have evolved to fill roles in many different biological processes. To facilitate further study, we have developed a Web-based public resource with access to gene models, sequences, and other data, including visualization tools to provide genomic context and interaction with other public data sets.
The human genome contains 30,000 genes (Lander et al. 2001
At least one-third of mammalian ZNF proteins include an effector motif called the Krüppel-associated box, or KRAB, which serves to recruit histone deacetylase complexes to regions surrounding the DNA-binding sites (Bellefroid et al. 1991).
To derive a complete catalog of the KRAB-ZNF gene family, we computationally analyzed and manually curated all segments of the human genome containing KRAB and ZNF domains. These efforts revealed 423 protein-coding genes with alternative transcripts that predict the existence of at least 742 distinct proteins, as well as 341 pseudogene sequences. Analyses of this gene set and comparisons to predicted genes in other mammalian genomes permitted a genome-wide assessment of the mechanisms through which this gene family has evolved. Results of this study can be accessed from a public Web site (http://znf.llnl.gov/) that will be updated as additional data become available.
Assembling the human KRAB-ZNF catalog and Web-based resource
KRAB-ZNF genes exist as simple modular structures with one or more KRAB-effector domains and a tandem array of zinc-finger motifs encoded within distinct 5' and 3' exons, respectively (Shannon and Stubbs 1998).
Based on RNA evidence and HMM-identified motifs, we generated models for alternate transcripts arising from each locus and identified overlaps with publicly available known and predicted genes. Three hundred thirty-four HMM-based models overlapped known loci; manual annotation produced 669 transcript models for these genes, including 495 models that we extended or corrected and 157 public models that were not modified. In addition to the known genes, we identified 89 KRAB-ZNF loci capable of encoding full-length proteins that are not described in public databases (see Methods). Altogether we annotated 423 loci encoding proteins with both effector (KRAB and/or SCAN) and zinc-finger domains (Table 1; Supplemental Table S1); for simplicity, we will hereafter refer to the collection as KRAB-ZNF genes.
In addition to KRAB-ZNF loci, we also identified 254 genes with noncanonical structures, e.g., encoding ZNF-only, KRAB-only, or SCAN-only proteins as their only potential protein product. Loci of this type were annotated as genes only when supported by mRNA evidence; these genes may indeed correspond to functional family members. However, in the following discussions we will focus primarily on the 423 loci capable of encoding proteins with both effector (SCAN, KRAB, or both) and zinc-finger domains. To publicly share the data arising from this analysis, we created a Web-based resource (http://znf.llnl.gov/) that provides access to full descriptions and sequences of all curated KRAB-ZNF gene models and pseudogene loci. Interfaces for searching and browsing the database, including an added track within the UCSC Genome Browser (Kent et al. 2002
Alternate splicing and pseudogenes
We also identified 227 gene fragments and 39 full-length pseudogenes, based on evidence of multiple stop codons, frameshifts, and lack of proper splice junctions. Sequence comparisons confirmed that most pseudogenes arose from neighboring loci by partial-gene duplication events (data not shown), although gene remnants may also be left behind after lineage-specific deletions (Hamilton et al. 2003
Gene clustering and evolution
To identify evolutionary relationships, we constructed a phylogenetic tree based on KRAB-A-encoding nucleotide sequences (Fig. 1; Supplemental Table S2). As suggested by previous studies focused on subsets of genes, evolutionary relatedness is typically associated with physical proximity in this family. However, the complete family tree also shows that unrelated KRAB-ZNF genes are physically intermixed at several clustered sites. Therefore, although tandem in situ duplication events have represented the major mechanism of new gene creation in the KRAB-ZNF family, distributed duplication and, possibly, post-duplication rearrangement events have also played a prominent role. It is uncertain at present what effect gene conversion may have had on the evolution of these genes.
Most genes containing the KRAB-A motif also include the KRAB-B modulator, or less common modulators KRAB-b, KRAB-BL, or KRAB-C (Table 1). These associations appear within separate clades in the KRAB-A-based tree, indicating that these distinct motifs arose and were expanded within specific families (Fig. 1). Unlike genes with specific types of KRAB modulators, the SCAN-containing genes do not group together in one evolutionary clade (Fig. 1, red circles). This pattern could be explained if the SCAN-KRAB-ZNF combination is ancient, with a history of frequent loss of one or the other effector domain during the expansion of the gene family. This kind of history would be consistent with the commingling of related genes with different combinations of SCAN and KRAB effector motifs that we observed in several clusters (Table 2). However, it is also possible that the SCAN-KRAB combination arose more than once, as has recently been proposed (Looman et al. 2002
Phylogenetic analyses also highlighted relatedness between clusters and among cluster members and isolated loci distributed at distant chromosomal sites. For example, genes from the large 19p12 cluster, which is known to be primate specific (Bellefroid et al. 1995
Relationships in certain groups show that some distributed duplicates may subsequently give rise to tandem copies, suggesting one way that new lineage-specific clusters may have been seeded over evolutionary time. For example, seven KRAB-ZNF genes and one pseudogene sequence distributed in HSA4, 8, 11, and 12 show >96% nucleotide sequence identity over >70-kb duplications (ZNF705A paralogs) (highlighted in gray in Fig. 1; Supplemental Table S2). The high degree of similarity between these large segments indicates that most of the duplication events occurred
Paralogs, orthologs, and recent primate duplications
Included in this set of 197 loci are KRAB-ZNF genes located in the HSA19p12 cluster, which dates back to early primate evolution and underwent a significant expansion
KRAB-ZNF gene expression, cluster position, and evolutionary history
We have identified and curated 423 human loci capable of encoding complete KRAB-ZNF proteins, including 89 novel loci; many of the genes are alternatively spliced to encode predicted protein isoforms with potentially very different functional properties. For example, inclusion of a KRAB-B domain has been shown to enhance repressor activity of KRAB-A proteins (Vissing et al. 1995
As noted in previous reports, most of the 423 human KRAB-ZNF genes reside in large familial clusters (Fig. 1, Table 2; Rousseau-Merck et al. 1992).
Based on comparisons between the curated human gene set and gene models from draft mouse, dog, and chimpanzee genomes, we present a preliminary classification of human KRAB-ZNF genes according to their degree of conservation or lineage specificity. Although information regarding specific orthologous relationships will change as nonhuman draft sequences are improved, the overall picture of gene repertoire diversity can be clearly discerned. Only 103 of the 423 human KRAB-ZNF genes can be grouped in unambiguous 1:1 orthologous relationships in primate, canine, and rodent lineages; by contrast, at least 136 loci, or nearly one-third of the total human KRAB-ZNF gene set, are primate specific, having arisen since the emergence of Old and New World monkeys. Since regulatory functions are known for only a handful of KRAB-ZNF proteins, the cumulative impact of lineage-specific gain, loss, and divergence of these genes on primate biology remains a matter of conjecture. However, their sheer numbers, their wide range of tissue-specific expression, and their dynamic evolutionary history predict that KRAB-ZNF genes have played a significant role in shaping both primate-specific and deeply conserved traits. A more complete understanding of the functions of the KRAB-ZNF family will be essential for deciphering pathways of vertebrate evolutionary diversity and for building accurate models of gene regulation and its role in human disease.
Genome searches and initial data analysis
Human KRAB-A, KRAB-B, KRAB-b, KRAB-C, and SCAN protein sequences were collected from RefSeq (the National Center for Biotechnology Information mRNA reference sequence collection) (Pruitt et al. 2000
In addition, DNA sequences from exons immediately preceding known KRAB-A exons were used to search hg17 chromosomal sequences by using BLAST (Altschul et al. 1990).
The human HMM matrices were also used to search the chimp (panTro1), mouse (mm6), and dog (canFam1) six-frame genome translations, and putative loci were generated based on proximity and orientation as above. Crude protein sequences for these nonhuman loci were generated by extending from motif coordinates N- and C-terminally until a translational stop signal was encountered, eliminating overlapping sequences from adjacent motifs, and joining all collected sequences for each locus.
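The crude-protein step described above (extend each motif hit to the flanking stop signals, drop overlaps between adjacent motifs, join the pieces) can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes, for simplicity, that all motif hits for a locus lie in a single translated frame given as a string with `*` marking stops; the function names `extend_to_stops` and `crude_protein` and the toy sequence are hypothetical.

```python
def extend_to_stops(frame, start, end):
    """Extend the half-open interval [start, end) until a stop ('*') on each side.

    frame is one translated reading frame (assumed representation:
    a string of residues with '*' marking translational stops).
    """
    # Last stop upstream of the motif; rfind returns -1 if none, giving 0.
    left = frame.rfind("*", 0, start) + 1
    # First stop downstream of the motif; fall back to the end of the frame.
    right = frame.find("*", end)
    if right == -1:
        right = len(frame)
    return left, right


def crude_protein(frame, motifs):
    """Join the stop-to-stop segments around each motif hit, without overlaps.

    motifs is a list of (start, end) motif coordinates in `frame`
    (hypothetical input layout).
    """
    spans = sorted(extend_to_stops(frame, s, e) for s, e in motifs)
    merged = []
    for left, right in spans:
        if merged and left < merged[-1][1]:
            # Overlaps the previous segment (e.g. adjacent fingers in the
            # same open stretch): keep a single copy of the shared sequence.
            merged[-1] = (merged[-1][0], max(merged[-1][1], right))
        else:
            merged.append((left, right))
    return "".join(frame[l:r] for l, r in merged)


# Toy frame: one KRAB-like hit and two adjacent finger-like hits.
toy_frame = "AA*MKRABQQ*TTZNFZNFPP*GG"
toy_hits = [(4, 8), (13, 16), (16, 19)]
toy_protein = crude_protein(toy_frame, toy_hits)
```

In a real six-frame search the effector and finger motifs of one locus may fall in different frames, so a fuller version would track the frame of each hit and merge only within a frame.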
Gene annotation and database curation
Adjacent genes were considered "clustered" if the intergenic sequence separating two KRAB-ZNF genes was <200 kb, even in cases where unrelated genes were found between the ZNF loci (a situation only rarely encountered). This distance cutoff was selected based on a distribution of intergenic distances between KRAB-ZNF genes in the annotation database (for additional details, see legend of Supplemental Fig. S1). Manual inspection of the clustering confirmed that the 200-kb criterion resulted in coclustering of all major groups of neighboring, related genes. Only two pairs of paralogs that could be considered to form familial clusters were omitted by the 200-kb cutoff; these genes (LLNL1035 and ZNF705B, and ZNF705C and LLNL1103), derived from unusually large duplicons, were counted as clusters in the final analysis.
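The 200-kb clustering rule can be sketched in a few lines of Python. This is an illustrative sketch only, not the pipeline used in the study: the tuple layout `(name, chrom, start, end)`, the function name `cluster_genes`, and the toy coordinates are assumptions made for the example. Only the <200 kb intergenic cutoff comes from the text.

```python
from collections import defaultdict

MAX_GAP_BP = 200_000  # cutoff stated in the text: intergenic distance < 200 kb


def cluster_genes(genes, max_gap=MAX_GAP_BP):
    """Group genes whose intergenic gaps are below max_gap.

    genes: iterable of (name, chrom, start, end) tuples (hypothetical layout).
    Returns a list of groups, each a list of gene names in genomic order.
    Groups of size 1 correspond to isolated loci; the caller can filter them.
    """
    by_chrom = defaultdict(list)
    for name, chrom, start, end in genes:
        by_chrom[chrom].append((start, end, name))

    groups = []
    for chrom in sorted(by_chrom):
        current = []
        prev_end = None
        for start, end, name in sorted(by_chrom[chrom]):
            # Start a new group when the gap to the previous gene is too large.
            if current and start - prev_end >= max_gap:
                groups.append(current)
                current = []
                prev_end = None
            current.append(name)
            # Track the furthest end seen so nested genes do not shrink the gap.
            prev_end = end if prev_end is None else max(prev_end, end)
        groups.append(current)
    return groups


# Toy coordinates, invented for illustration.
toy_genes = [
    ("A", "chr19", 1_000, 5_000),
    ("B", "chr19", 100_000, 120_000),
    ("C", "chr19", 400_000, 410_000),
    ("D", "chr4", 10, 20),
]
toy_groups = cluster_genes(toy_genes)
```

Pairs such as the ZNF705 paralogs, which exceed the cutoff but were still counted as clusters, would need to be added by hand after a pass like this, as the text describes.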
Comparative genomic analyses
Evolutionary analysis
Selection of microarray probe sets and expression clustering
By use of Cluster v. 2.11 (Eisen et al. 1998
We thank Ivan Ovcharenko for advice on programming and genome searching methods, and David Goodstein and Astrid Terry at the Joint Genome Institute for advice on Apollo and gene annotation. We also thank Colleen Elso, Jutta Kollet, Jason Raymond, and Alice Yamada for critical reviews of the manuscript and Web site. This work was performed under the auspices of the U.S. Department of Energy by the University of California, Lawrence Livermore National Laboratory (LLNL) under contract no. W-7405-Eng-48. The project (04-ERD-084) was funded by the Laboratory Directed Research and Development Program at LLNL.
3 Present address: Louisiana State University, Baton Rouge, LA.
E-mail stubbs5{at}llnl.gov; fax (925) 422-2099. [Supplemental material is available online at www.genome.org.] Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.4842106
Abrink M., Ortiz J.A., Mark C., Sanchez C., Looman C., Hellman L., Chambon P., Losson R. 2001. Conserved interaction between distinct Kruppel-associated box domains and the transcriptional intermediary factor 1
Altschul S.F., Gish W., Miller W., Myers E.W., Lipman D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215: 403–410.
Ayyanathan K., Lechner M.S., Bell P., Maul G.G., Schultz D.C., Yamada Y., Tanaka K., Torigoe K., Rauscher III F.J. 2003. Regulated recruitment of HP1 to a euchromatic gene induces mitotically heritable, epigenetic gene silencing: A mammalian cell culture model of gene variegation. Genes & Dev. 17: 1855–1869.
Bailey J.A., Yavor A.M., Massa H.F., Trask B.J., Eichler E.E. 2001. Segmental duplications: Organization and impact within the current human genome project assembly. Genome Res. 11: 1005–1017.
Bellefroid E.J., Poncelet D.A., Lecocq P.J., Revelant O., Martial J.A. 1991. The evolutionarily conserved Kruppel-associated box domain defines a subfamily of eukaryotic multifingered proteins. Proc. Natl. Acad. Sci. 88: 3608–3612.
Bellefroid E.J., Marine J.C., Ried T., Lecocq P.J., Riviere M., Amemiya C., Poncelet D.A., Coulie P.G., de Jong P., Szpirer C., et al. 1993. Clustered organization of homologous KRAB zinc-finger genes with enhanced expression in human T lymphoid cells. EMBO J. 12: 1363–1374.
Bellefroid E.J., Marine J.C., Matera A.G., Bourguignon C., Desai T., Healy K.C., Bray-Ward P., Martial J.A., Ihle J.N., Ward D.C. 1995. Emergence of the ZNF91 Kruppel-associated box-containing zinc finger gene family in the last common ancestor of anthropoidea. Proc. Natl. Acad. Sci. 92: 10757–10761.
Berg J.M. 1997. Letting your fingers do the walking. Nat. Biotechnol. 15: 323.
Cannizzaro L.A., Aronson M.M., Thiesen H.J. 1993. Human zinc finger gene ZNF23 (Kox16) maps to a zinc finger gene cluster on chromosome 16q22, and ZNF32 (Kox30) to chromosome region 10q23-q24. Hum. Genet. 91: 383–385.
Choo Y. and Klug A. 1994. Selection of DNA binding sites for zinc fingers using rationally randomized DNA reveals coded interactions. Proc. Natl. Acad. Sci. 91: 11168–11172.
Chung H.R., Schafer U., Jackle H., Bohm S. 2002. Genomic expansion and clustering of ZAD-containing C2H2 zinc-finger genes in Drosophila. EMBO Rep. 3: 1158–1162.
Collins T., Stone J.R., Williams A.J. 2001. All in the family: The BTB/POZ, KRAB, and SCAN domains. Mol. Cell. Biol. 21: 3609–3615.
Eichler E.E., Hoffman S.M., Adamson A.A., Gordon L.A., McCready P., Lamerdin J.E., Mohrenweiser H.W. 1998. Complex
Eisen M.B., Spellman P.T., Brown P.O., Botstein D. 1998. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. 95: 14863–14868.
Friedman J.R., Fredericks W.J., Jensen D.E., Speicher D.W., Huang X.P., Neilson E.G., Rauscher III F.J. 1996. KAP-1, a novel corepressor for the highly conserved KRAB repression domain. Genes & Dev. 10: 2067–2078.
Gebelein B. and Urrutia R. 2001. Sequence-specific transcriptional repression by KS1, a multiple-zinc-finger-Kruppel-associated box protein. Mol. Cell. Biol. 21: 928–939.
Goodman M., Porter C.A., Czelusniak J., Page S.L., Schneider H., Shoshani J., Gunnell G., Groves C.P. 1998. Toward a phylogenetic classification of primates based on DNA evidence complemented by fossil evidence. Mol. Phylogenet. Evol. 9: 585–598.
Hamilton A.T., Huntley S., Kim J., Branscomb E., Stubbs L. 2003. Lineage-specific expansion of KRAB zinc-finger transcription factor genes: Implications for the evolution of vertebrate regulatory networks. Cold Spring Harb. Symp. Quant. Biol. 68: 131–140.
Hamilton A.T., Huntley S., Tran-Gyamfi M., Baggott D.M., Gordon L., Stubbs L. 2006. Evolutionary expansion and divergence in the ZNF91 subfamily of primate-specific zinc finger genes. Genome Res. (this issue).
Hoffmann A., Ciani E., Boeckardt J., Holsboer F., Journot L., Spengler D. 2003. Transcriptional activities of the zinc finger protein Zac are differentially controlled by DNA binding. Mol. Cell. Biol. 23: 988–1003.
Huntley S., Hamilton A., Kim J., Branscomb E., Stubbs L. Tandem gene family expansion and genomic diversity. In Comparative genomics: A guide to the analysis of eukaryotic genomes (ed. M.D. Adams). Humana Press, New York (in press).
Kent W.J., Sugnet C.W., Furey T.S., Roskin K.M., Pringle T.H., Zahler A.M., Haussler D. 2002. The human genome browser at UCSC. Genome Res. 12: 996–1006.
Kim C.A. and Berg J.M. 1995. Serine at position 2 in the DNA recognition helix of a Cys2-His2 zinc finger peptide is not, in general, responsible for base recognition. J. Mol. Biol. 252: 1–5.
Knochel W., Poting A., Koster M., el Baradi T., Nietfeld W., Bouwmeester T., Pieler T. 1989. Evolutionary conserved modules associated with zinc fingers in Xenopus laevis. Proc. Natl. Acad. Sci. 86: 6097–6100.
Krebs C.J., Larkins L.K., Khan S.M., Robins D.M. 2005. Expansion and diversification of KRAB zinc-finger genes within a cluster including regulator of sex-limitation 1 and 2. Genomics 85: 752–761.
Lander E.S., Linton L.M., Birren B., Nusbaum C., Zody M.C., Baldwin J., Devon K., Dewar K., Doyle M., FitzHugh W., et al. 2001. Initial sequencing and analysis of the human genome. Nature 409: 860–921.
Li W.H. 1997. Molecular evolution. Sinauer Associates, Sunderland, MA.
Looman C., Abrink M., Mark C., Hellman L. 2002. KRAB zinc finger proteins: An analysis of the molecular mechanisms governing their increase in numbers and complexity during evolution. Mol. Biol. Evol. 19: 2118–2130.
Looman C., Hellman L., Abrink M. 2004. A novel Kruppel-associated box identified in a panel of mammalian zinc finger proteins. Mamm. Genome 15: 35–40.
Lynch M. and Force A. 2000. The probability of duplicate gene preservation by subfunctionalization. Genetics 154: 459–473.
Margolin J.F., Friedman J.R., Meyer W.K., Vissing H., Thiesen H.J., Rauscher III F.J. 1994. Kruppel-associated boxes are potent transcriptional repression domains. Proc. Natl. Acad. Sci. 91: 4509–4513.
Mark C., Abrink M., Hellman L. 1999. Comparative analysis of KRAB zinc finger proteins in rodents and man: Evidence for several evolutionarily distinct subfamilies of KRAB zinc finger genes. DNA Cell Biol. 18: 381–396.
Messina D.N., Glasscock J., Gish W., Lovett M. 2004. An ORFeome-based analysis of human transcription factor genes and the construction of a microarray to interrogate their expression. Genome Res. 14: 2041–2047.
Miller J.C. and Pabo C.O. 2001. Rearrangement of side-chains in a Zif268 mutant highlights the complexities of zinc finger–DNA recognition. J. Mol. Biol. 313: 309–315.
Mombaerts P. 1999. Odorant receptor genes in humans. Curr. Opin. Genet. Dev. 9: 315–320.
Moore M., Klug A., Choo Y. 2001. Improved DNA binding specificity from polyzinc finger peptides by using strings of two-finger units. Proc. Natl. Acad. Sci. 98: 1437–1441.
Oh H.J., Li Y., Lau Y.F. 2005. Sry associates with the heterochromatin protein 1 complex by interacting with a KRAB domain protein. Biol. Reprod. 72: 407–415.
Ohlsson R., Renkawitz R., Lobanenkov V. 2001. CTCF is a uniquely versatile transcription regulator linked to epigenetics and disease. Trends Genet. 17: 520–527.
Pengue G. and Lania L. 1996. Kruppel-associated box-mediated repression of RNA polymerase II promoters is influenced by the arrangement of basal promoter elements. Proc. Natl. Acad. Sci. 93: 1015–1020.
Pruitt K.D., Katz K.S., Sicotte H., Maglott D.R. 2000. Introducing RefSeq and LocusLink: Curated human genome resources at the NCBI. Trends Genet. 16: 44–47.
Rambaut A. 1996. Se-Al: Sequence Alignment Editor. http://iubio.bio.indiana.edu/soft/iubionew/molbio/dna/analysis/Pist/main.html
Rousseau-Merck M.F., Tunnacliffe A., Berger R., Ponder B.A., Thiesen H.J. 1992. A cluster of expressed zinc finger protein genes in the pericentromeric region of human chromosome 10. Genomics 13: 845–848.
Saitou N. and Nei M. 1987. The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4: 406–425.
Sander T.L. and Morris J.F. 2002. Characterization of the SCAN box encoding RAZ1 gene: Analysis of cDNA transcripts, expression, and cellular localization. Gene 296: 53–64.
Sander T.L., Stringer K.F., Maki J.L., Szauter P., Stone J.R., Collins T. 2003. The SCAN domain defines a large family of zinc finger transcription factors. Gene 310: 29–38.
Schmidt D. and Durrett R. 2004. Adaptive evolution drives the diversification of zinc-finger binding domains. Mol. Biol. Evol. 21: 2326–2339.
Schoenherr C.J. and Anderson D.J. 1995. The neuron-restrictive silencer factor (NRSF): A coordinate repressor of multiple neuron-specific genes. Science 267: 1360–1363.
Shannon M. and Stubbs L. 1998. Analysis of homologous XRCC1-linked zinc-finger gene families in human and mouse: Evidence for orthologous genes. Genomics 49: 112–121.
Shannon M., Hamilton A.T., Gordon L., Branscomb E., Stubbs L. 2003. Differential expansion of zinc-finger transcription factor loci in homologous human and mouse gene clusters. Genome Res. 13: 1097–1110.
Strausberg R.L., Feingold E.A., Klausner R.D., Collins F.S. 1999. The mammalian gene collection. Science 286: 455–457.
Su A.I., Wiltshire T., Batalov S., Lapp H., Ching K., Block D., Zhang J., Soden R., Hayakawa M., Kreiman G., et al. 2004. A gene atlas of the mouse and human protein-encoding transcriptomes. Proc. Natl. Acad. Sci. 101: 6062–6067.
Swofford D.L. 2002. PAUP: Phylogenetic analysis using parsimony. Sinauer Associates, Sunderland, MA.
Tanaka K., Tsumaki N., Kozak C.A., Matsumoto Y., Nakatani F., Iwamoto Y., Yamada Y. 2002. A Kruppel-associated box-zinc finger protein, NT2, represses cell-type-specific promoter activity of the
Thompson J.D., Gibson T.J., Plewniak F., Jeanmougin F., Higgins D.G. 1997. The CLUSTAL_X windows interface: Flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25: 4876–4882.
Uhrberg M. 2005. The KIR gene family: Life in the fast lane of evolution. Eur. J. Immunol. 35: 10–15.
Venter J.C., Adams M.D., Myers E.W., Li P.W., Mural R.J., Sutton G.G., Smith H.O., Yandell M., Evans C.A., Holt R.A., et al. 2001. The sequence of the human genome. Science 291: 1304–1351.
Vissing H., Meyer W.K., Aagaard L., Tommerup N., Thiesen H.J. 1995. Repression of transcriptional activity by heterologous KRAB domains present in zinc finger proteins. FEBS Lett. 369: 153–157.
Williams A.J., Blacklow S.C., Collins T. 1999. The zinc finger-associated SCAN box is a conserved oligomerization domain. Mol. Cell. Biol. 19: 8526–8535.
Wu Y., Yu L., Bi G., Luo K., Zhou G., Zhao S. 2003. Identification and characterization of two novel human SCAN domain-containing zinc finger genes ZNF396 and ZNF397. Gene 310: 193–201.
Zhang H.B., Liu D.P., Liang C.C. 2002. The control of expression of the
Zheng L., Pan H., Li S., Flesken-Nikitin A., Chen P.L., Boyer T.G., Lee W.H. 2000. Sequence-specific transcriptional corepressor function for BRCA1 through a novel zinc finger protein, ZBRK1. Mol. Cell 6: 757–768.
Received October 21, 2005; accepted in revised form March 6, 2006.