Published online before print October 31, 2007, doi: 10.1101/gr.6554007
Genome Res. 17: 1787–1796, 2007 ©2007 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/07 $5.00
OPEN ACCESS ARTICLE
Resource

Sequence-based estimation of minisatellite and microsatellite repeat variability

1 FAS Center for Systems Biology, Harvard University, Cambridge, Massachusetts 02138, USA;
2 Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA;
3 Centre of Microbial and Plant Genetics, Department of Molecular and Microbial Systems, Katholieke Universiteit Leuven, Faculty of Applied Bioscience and Engineering, B-3001 Leuven (Heverlee), Belgium
Variable tandem repeats are frequently used for genetic mapping, genotyping, and forensics studies. Moreover, variation in some repeats underlies rapidly evolving traits or certain diseases. However, mutation rates vary greatly from repeat to repeat, and as a consequence, not all tandem repeats are suitable genetic markers or interesting unstable genetic modules. We developed a model, "SERV," that predicts the variability of a broad range of tandem repeats in a wide range of organisms. The nonlinear model uses three basic characteristics of the repeat (number of repeated units, unit length, and purity) to produce a numeric "VARscore" that correlates with repeat variability. SERV was experimentally validated using a large set of different artificial repeats located in the Saccharomyces cerevisiae URA3 gene. Further in silico analysis shows that SERV outperforms existing models and accurately predicts repeat variability in bacteria and eukaryotes, including plants and humans. Using SERV, we demonstrate significant enrichment of variable repeats within human genes involved in transcriptional regulation, chromatin remodeling, morphogenesis, and neurogenesis. Moreover, SERV allows identification of known and candidate genes involved in repeat-based diseases. In addition, we demonstrate the use of SERV for the selection and comparison of suitable variable repeats for genotyping and forensic purposes. Our analysis indicates that tandem repeats used for genotyping should have a VARscore between 1 and 3. SERV is publicly available from http://hulsweb1.cgr.harvard.edu/SERV/.
Virtually all prokaryotic and eukaryotic genomes contain significant portions of tandem repeats, that is, stretches of DNA that are repeated head to tail. Tandem repeats are further classified into "microsatellites," which have repeat units containing up to 9 nucleotides (nt), and "minisatellites," with longer repeated units. The close proximity of multiple (nearly) identical DNA sequences causes frequent recombination or slippage events, generating new alleles that differ in the number of repeat units. Their instability makes tandem repeats ideally suited for fingerprinting, genotyping, and forensic analyses.
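The unit-length cutoff between microsatellites and minisatellites, and the idea that new alleles differ in their number of repeat units, can be expressed as two small helpers. This is a minimal illustrative sketch: the names `classify_repeat` and `count_perfect_units` are hypothetical, and the unit counter only handles perfect (100% pure) arrays, whereas natural repeats often contain mismatches.

```python
# Hypothetical helpers for illustration; not part of SERV or any published tool.
MICROSATELLITE_MAX_UNIT_NT = 9  # units of up to 9 nt are treated as microsatellites


def classify_repeat(unit: str) -> str:
    """Classify a tandem repeat by the length of its repeated unit."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    if len(unit) <= MICROSATELLITE_MAX_UNIT_NT:
        return "microsatellite"
    return "minisatellite"


def count_perfect_units(array: str, unit: str) -> int:
    """Count exact, head-to-tail copies of `unit` from the start of `array`."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    copies = 0
    while array.startswith(unit, copies * len(unit)):
        copies += 1
    return copies


# Two made-up alleles that differ by one CA unit, as a single slippage event would produce.
allele_a = "CA" * 12
allele_b = "CA" * 13
print(classify_repeat("CA"), count_perfect_units(allele_a, "CA"), count_perfect_units(allele_b, "CA"))
```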
Because of their variability and their sequence simplicity, repeats have traditionally been considered as nonfunctional parasitic "junk" DNA (Orgel and Crick 1980
Apart from these negative consequences of repeat variability, hypermutable repeats may also have a beneficial role. Variable repeats located in certain key genes make these genes hypervariable, allowing swift adaptive evolution of certain traits while maintaining low mutation rates in the rest of the genome (Rando and Verstrepen 2007
Whereas most tandem repeats are unstable compared with nonrepeated DNA, the mutation rates vary widely from repeat to repeat. Most repeat mutation rates are about 10- to 10,000-fold higher than those of nonrepeated regions and lie between 10⁻³ and 10⁻⁶ per cellular generation (Verstrepen et al. 2005
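To make these per-generation rates more concrete, the probability that a repeat acquires at least one mutation over n generations is 1 − (1 − μ)^n. The sketch below assumes a constant rate μ and independent generations, which is a simplification; the 1,000-generation horizon is an arbitrary illustrative choice.

```python
def prob_at_least_one_mutation(rate: float, generations: int) -> float:
    """P(>= 1 mutation within `generations` generations), assuming a constant
    per-generation rate and independence between generations."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be a probability")
    return 1.0 - (1.0 - rate) ** generations


# The two bounds of the range quoted in the text (per cellular generation).
for mu in (1e-3, 1e-6):
    p = prob_at_least_one_mutation(mu, 1000)
    print(f"mu = {mu:g}: P(at least one mutation in 1000 generations) = {p:.4f}")
```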
Repeats appear to be evenly distributed across the genome, and repeats located near meiotic hot spots are not noticeably more polymorphic than those located in recombination cold spots (Richard and Dujon 2006
Several algorithms are available to detect tandem repeats, including ETANDEM (Rice et al. 2000
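For readers unfamiliar with how repeat finders work, the simplest possible approach is an exhaustive scan for perfect repeats. The sketch below is a naive illustration under that assumption; it is not ETANDEM, TRF, or mreps, which tolerate mismatches and indels and use far more efficient algorithms. The function name and the `max_unit` and `min_units` defaults are arbitrary.

```python
def find_perfect_tandem_repeats(seq: str, max_unit: int = 20, min_units: int = 3):
    """Naive scan for perfect (100% pure) tandem repeats.

    Returns a list of (start, unit, n_units) tuples. At each position the
    shortest unit that is repeated at least `min_units` times head to tail is
    reported, and the scan then jumps past the whole array.
    """
    hits = []
    i = 0
    while i < len(seq):
        found = None
        for k in range(1, max_unit + 1):
            unit = seq[i:i + k]
            if len(unit) < k:  # not enough sequence left for this unit length
                break
            copies = 1
            # Extend the array while the next k bases are an exact copy of the unit.
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            if copies >= min_units:
                found = (i, unit, copies)
                break  # keep the shortest qualifying unit
        if found:
            hits.append(found)
            i += len(found[1]) * found[2]
        else:
            i += 1
    return hits


print(find_perfect_tandem_repeats("GGCACACACATT"))
```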
While these simple models are quite capable of accurately predicting the variability of repeats closely resembling the limited training data set, their performance has not been validated for other repeats or other species, making them of only limited use for genome-wide analyses (O'Dushlaine and Shields 2006

Because of the large variation in repeat mutation rates, results obtained from repeat-based genotyping and forensics studies largely depend on the exact repeat(s) used. The lack of any standards makes it impossible to compare studies and sometimes even leads to flawed conclusions. Here, we describe the development of a general nonlinear model capable of predicting repeat variability for all types of tandem repeats (microsatellites and minisatellites) in a wide range of organisms spanning the major kingdoms of life. We demonstrate that the model outperforms existing models and that it can be used to identify and characterize potentially interesting (variable) repeats for genotyping, forensics, or functional studies.
Genome-wide detection of variable tandem repeats

Existing models to predict repeat variability were based on small, specific data sets and used simple (linear) algorithms. As a result, while these models are quite capable of predicting variability for the limited data sets they were trained on, they are not suited as a general method to predict the variability of a broad range of repeats in a broad range of organisms. Therefore, we decided to use more complex models and large, unbiased training and validation data sets that represent the full spectrum of naturally occurring tandem repeats. To obtain such expansive data sets, we first developed a method to detect and compare orthologous tandem repeats in large (whole-genome) sequences. Repeat data sets were assembled for yeast (Saccharomyces cerevisiae), primates (Homo sapiens), insects (Drosophila melanogaster), plants (Arabidopsis thaliana), and bacteria (Neisseria meningitidis and Mycobacterium tuberculosis). For each data set, repeats were detected and compared between several closely related strains or species and subsequently categorized as variable (if the number of repeat units differed between the compared strains/species) or nonvariable (if the number of repeat units was constant in all strains or species; see Methods for details).
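The variable/nonvariable labelling rule described above can be written down directly. This is a minimal sketch: it assumes the orthologous repeat loci have already been matched across strains and that each locus is summarized by its number of repeat units per strain; the function, locus, and strain names are hypothetical.

```python
def classify_variability(unit_counts: dict) -> str:
    """Label an orthologous repeat locus 'variable' if its number of repeat
    units differs between any of the compared strains/species, else 'nonvariable'."""
    counts = list(unit_counts.values())
    if len(counts) < 2:
        raise ValueError("the locus must be present in at least two strains/species")
    return "variable" if len(set(counts)) > 1 else "nonvariable"


# Hypothetical loci with unit counts in three strains.
loci = {
    "locus_1": {"strain_A": 12, "strain_B": 12, "strain_C": 14},
    "locus_2": {"strain_A": 5, "strain_B": 5, "strain_C": 5},
}
for name, counts in loci.items():
    print(name, classify_variability(counts))
```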
As anticipated, this procedure generated large data sets containing an unbiased collection of naturally occurring repeats. For example, the S. cerevisiae data set comprises 2743 conserved repeat loci, of which 242 were categorized as variable between three S. cerevisiae strains. The data indicate just how different tandem repeats can be. The unit length ranges from 2 to 81 nt, with some repeats having as many as 80 units. Moreover, the repeats found by this procedure seem to agree very well with manually curated smaller data sets. For example, our M. tuberculosis data set comprised 20 out of 21 repeats found by Le Flèche et al. (2002)
Generation of a predictive model for repeat variability

The final model (SERV; http://hulsweb1.cgr.harvard.edu/SERV/) uses three basic characteristics of a tandem repeat (number of units, unit length, and purity) as input variables. On the basis of these variables, SERV generates a continuous output (referred to as "VARscore"). The VARscore serves as a continuous estimation of repeat variability, with larger VARscores correlating with higher predicted repeat variability. Visualization of the model (Supplemental Fig. S1) shows the intuitive relation between the input variables and the predicted variability (VARscore) of the corresponding repeat. The single most important factor determining a repeat's predicted variability is the number of units, with more repeat units leading to higher predicted variability. Increased repeat purity or unit length also leads to higher predicted variability, although the effect is smaller. These intuitive conclusions are further supported by our experimental analyses (Fig. 1).
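The three SERV inputs can be derived from a repeat array and its consensus unit. In the sketch below, purity is assumed to be the percentage of array positions that match the consensus unit in phase; this is a simplified, hypothetical definition (TRF, for instance, scores matches from an alignment that also allows insertions and deletions), and the VARscore function itself is not reproduced here.

```python
def repeat_features(array: str, consensus: str) -> dict:
    """Hypothetical extraction of (number of units, unit length, purity).

    Purity here = percentage of positions in `array` whose base equals the
    consensus base at the same phase; indels are not modelled.
    """
    k = len(consensus)
    if k == 0 or not array:
        raise ValueError("array and consensus must be non-empty")
    matches = sum(1 for i, base in enumerate(array) if base == consensus[i % k])
    return {
        "n_units": len(array) / k,  # may be fractional for a partial final unit
        "unit_length": k,
        "purity": 100.0 * matches / len(array),
    }


# A CAG array with one A substitution in the fourth unit.
print(repeat_features("CAGCAGCAGCAACAG", "CAG"))
```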
SERV accurately predicts repeat variability in various genomes

To evaluate the performance of the model, we compared our tandem repeat variability predictions with those of the few other existing methods, using five whole-genome data sets obtained from different groups of organisms (human/primate, insects, plants, and two bacterial species).
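Comparisons of this kind are usually scored with the area under the ROC curve (AUC), which is also the criterion named in Model development for variable selection. The sketch below computes AUC through its rank interpretation: the probability that a randomly chosen variable repeat receives a higher score than a randomly chosen nonvariable one, with ties counted as one half. The function name and the example scores and labels are invented for illustration, and the quadratic loop is only meant for small sets.

```python
def roc_auc(scores, labels):
    """AUC = P(score of a variable repeat > score of a nonvariable repeat),
    counting ties as 1/2. Labels: 1 = variable, 0 = nonvariable."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one variable and one nonvariable repeat")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up VARscores for six repeats and their observed labels.
print(roc_auc([2.4, 1.1, 0.3, -0.5, 1.5, -1.2], [1, 1, 0, 0, 0, 0]))
```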
Since the models developed by Wren et al. (2000)
The method developed by Näslund et al. (2005)
The model developed by Denoeud et al. (2003)

Overall, these results show that SERV systematically outperforms existing methods on a wide spectrum of species. Moreover, instead of classifying repeats as variable or nonvariable, the model produces a continuous output (VARscore), allowing a complete ranking of all repeats in a data set according to their predicted variability. It is important to note that most existing models were not intended to predict repeat variability over a broad spectrum of repeat categories. Hence, our study does not discredit their usefulness for the goals for which they were developed. In fact, when SERV is used to predict the variability of the limited sets of repeats on which these other models were trained, the respective specific model always (slightly) outperforms SERV, although the difference is not statistically significant (Supplemental Table S1).
VARscore correlates with experimental repeat mutation rates

In total, we constructed 30 repeats that cover the parameter space of natural repeats found in the yeast genome (unit lengths of 2, 10, and 20 nt; number of units between 2 and 50; and purity between 62.5% and 100%) (Fig. 1). For each different repeat, we performed at least three independent fluctuation analyses to estimate the mutation rates. The results indicate that the three parameters used in our model (i.e., number of repeat units, unit length, and repeat purity) indeed influence mutation rates. Regression shows an exponential relation between these parameters and mutation rates (Fig. 1C). Furthermore, when all VARscores for these repeats are plotted against their mutation rates, it becomes clear that VARscores indeed correlate well with mutation rates, especially when taking experimental errors and the diversity of the set of artificial repeats into account (R² = 0.66, P = 4 × 10⁻⁸; Fig. 1D). In summary, the VARscore of a repeat correlates with its mutation rate, confirming that VARscores can be used to rank different repeats according to their predicted variability. We now explore a few different applications of this analysis.
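An exponential relation such as rate = A · 10^(b·n) becomes linear after taking log10 of the mutation rate, so it can be fitted by ordinary least squares and summarized with R². The sketch below uses synthetic, noise-free rates generated from an assumed exponential law (not the measured rates of Fig. 1) purely to show the procedure; the hypothetical `linear_fit` helper could equally be applied to VARscores plotted against (log-transformed) mutation rates.

```python
import math


def linear_fit(x, y):
    """Ordinary least-squares fit y = intercept + slope * x; returns (slope, intercept, r2)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Synthetic example: rate = 1e-7 * 10**(0.05 * units), i.e. an exponential law in the number of units.
units = [5, 10, 20, 30, 40, 50]
rates = [1e-7 * 10 ** (0.05 * u) for u in units]
slope, intercept, r2 = linear_fit(units, [math.log10(r) for r in rates])
print(f"log10(rate) = {intercept:.2f} + {slope:.3f} * units (R^2 = {r2:.3f})")
```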
VARscore as a benchmarking tool for variable repeats used as markers in fingerprinting
We decided to use SERV to check the predicted variability of both marker sets. As shown in Figure 2, there are striking differences in the VARscores of the two sets of markers. Indeed, the scores for Leclerc et al.'s markers (Leclerc et al. 2004
This analysis again demonstrated the correlation between a repeat's VARscore and its instability. Hence, VARscores can be used as a criterion to select repeat loci suitable for genotyping and fingerprinting. On the basis of our analyses, we would recommend using repeats with a VARscore of at least 1, but lower than 2 (for divergent strains/species) or 3 (for closely related strains or individuals).
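The recommendation above translates into a simple filter. The sketch assumes VARscores have already been computed (for example with the SERV web tool) and uses a hypothetical function name and made-up locus names and scores; the thresholds are the ones stated in the text.

```python
def is_suitable_marker(varscore: float, closely_related: bool) -> bool:
    """VARscore of at least 1, but below 3 for closely related strains or
    individuals, or below 2 for divergent strains/species."""
    upper = 3.0 if closely_related else 2.0
    return 1.0 <= varscore < upper


candidate_scores = {"locus_1": 0.4, "locus_2": 1.7, "locus_3": 2.6, "locus_4": 4.1}
for related in (True, False):
    chosen = [name for name, score in candidate_scores.items() if is_suitable_marker(score, related)]
    print("closely related" if related else "divergent", chosen)
```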
Human genes involved in transcriptional regulation and morphogenesis are enriched for variable repeats
The authors of previous studies have already mapped the occurrence of coding repeats in the human genome (Denoeud et al. 2003

We analyzed gene ontology for four groups of genes: (1) all human genes, (2) genes with tandem repeats, (3) the top 25% of genes ranked by VARscore, and (4) the top 15% of genes ranked by VARscore. Results for functional categories that show significant enrichment in the top 15% of VARscores are reported in Table 2. The table shows that, for every significant functional class, the proportion of genes belonging to that class increases with increasing VARscore. To validate these predictions, we used human expressed sequence tag (EST) data to investigate whether the repeats in these genes indeed vary among transcripts isolated from different individuals (see Methods for details). The variability of the repeats in these EST sequences confirms the predictions made by SERV: as shown in the last column of Table 2, gene categories enriched for genes containing repeats with high VARscores also show significant enrichment in variable ESTs.
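Gene Ontology enrichment of this kind is commonly assessed with a one-sided hypergeometric (Fisher) test per category, followed by a false-discovery-rate correction. The reference list cites both BABELOMICS (Al-Shahrour et al. 2005) and the Benjamini and Hochberg (1995) procedure, but the exact test used is not shown in this section, so the sketch below is an assumption; the function names and the counts in the example are made up.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric: M genes in total, n of them in the
    category, N genes drawn (e.g. the top-ranked set)."""
    total = comb(M, N)
    upper = min(n, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(max(k, 0), upper + 1)) / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up FDR control), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up counts: 2,000 annotated genes, 100 in a GO category,
# 300 genes in the top-ranked set, 25 of which fall in the category.
p_example = hypergeom_sf(25, 2000, 100, 300)
print(f"one-sided enrichment p = {p_example:.3g}")
print(benjamini_hochberg([p_example, 0.04, 0.30]))
```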
Two main Gene Ontology (GO) classes that show enrichment for potentially variable repeats stand out: transcriptional regulation and development. Highly polymorphic tandem repeats in genes involved in transcriptional regulation (such as transcription factors) could lead to modified transcription activities and thus swift evolution (Caburet et al. 2004
Other development classes also emerge from our data set, including genes involved in neurogenesis and brain development. Genes containing intragenic trinucleotide repeats have indeed been linked to these phenomena (Karlin and Burge 1996
VARscore allows identification of genes involved in repeat-based diseases
This prompted us to investigate whether SERV would allow us to identify other candidate genes that might be linked to genetic diseases. We therefore compiled a table of all repeat-containing human genes and ranked the list according to the VARscore of the repeats (Supplemental Table S3). Some of the highest-ranking genes are already known to contain polymorphic repeats, for example, the cartilage-specific proteoglycan gene AGC1. However, for many genes in the list, repeat polymorphisms and/or their possible phenotypic effects have not yet been described. One such group of candidates is the MUC (mucin) gene family. Although they are currently not considered to underlie repeat-based diseases, size variation in MUC genes has been associated with progression of immunoglobulin A nephropathy (Li et al. 2006

Needless to say, not all genes containing hypervariable coding repeats will lead to disease. Supplemental Table S3 may therefore also allow the identification of specific genes involved in the fast evolution of certain traits caused by the high mutation rates of these intragenic repeats.
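A gene-level ranking like the one in Supplemental Table S3 needs a rule for genes that contain several repeats. The sketch below assumes each gene is scored by its highest-scoring (most variable) repeat; that aggregation rule, the function name, and the gene identifiers are hypothetical, not taken from the paper.

```python
def rank_genes_by_varscore(repeat_scores):
    """Rank genes by the VARscore of their most variable repeat.

    `repeat_scores` is an iterable of (gene, varscore) pairs, one per repeat;
    scoring a gene by its maximum is an assumption made for this sketch.
    """
    best = {}
    for gene, score in repeat_scores:
        if gene not in best or score > best[gene]:
            best[gene] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


print(rank_genes_by_varscore([("GENE_A", 0.5), ("GENE_B", 2.0), ("GENE_A", 3.1)]))
```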
Our analysis shows that three basic characteristics of a given tandem repeat, namely number of repeated units, unit length, and repeat purity, are major determinants of its (in)stability. While other factors, such as GC content and entropy, may also exert some effect on repeat stability, the influence of the three factors used in our model is very intuitive. First and foremost, repeat variability increases exponentially with increasing number of repeat units. This observation confirms some of the pioneering work of Petes and coworkers, who found an exponential relation between number of units and mutation rates (Sia et al. 1997

The availability of a model to predict repeat variability has several applications, some of which were demonstrated in this paper. Despite the widespread use of variable tandem repeats in genotyping and forensics, results vary widely depending on which set of repeats is chosen. The lack of any standards makes it impossible to compare studies and sometimes even leads to flawed conclusions. Analysis of the VARscore of repeats used in different studies may help to compare and interpret paradoxical results and conclusions. Moreover, SERV also allows researchers looking for new microsatellite markers for genotyping or forensics to estimate whether a given repeat would be a suitable marker and is likely to show variation between closely related (but nonidentical) individuals, strains, or species. From our analyses, it seems that only repeats displaying positive VARscores may be suited, with ideal markers showing VARscores above 1 but below 3. Another use of the VARscore is the identification of hypervariable repeats in genomes for functional studies. As it becomes increasingly clear that changes in some repeats may have profound phenotypic consequences, researchers are trying to identify new examples of this phenomenon.
The ability to discriminate between repeats with low and high variability may be an important tool to select specific repeats from the large pool of candidates in the genome. Our basic analysis of the human genome demonstrates the usefulness of the VARscore to identify the genes known to be involved in repeat-dependent diseases such as Huntington's disease and ataxia, as well as to compile a list of candidate genes containing hypervariable repeats, which might lead to certain diseases. Not all repeat variation leads to diseases. Instead, variation in repeat number might provide the basis for phenotypic diversity, thus allowing swift evolution of certain traits. While this has only been demonstrated for a limited number of examples, our analysis indicates that repeats may also play such a role in humans. Here, repeats are enriched in genes involved in transcription and organismal development, including such key processes as brain development. Is it possible that so-called "junk DNA" underlies the swift evolution of the primate brain?
Data set assembly and analysis of repeat variability

To obtain an expansive and unbiased data set, the complete S. cerevisiae nuclear genome (S288C sequence 2006 from the Saccharomyces Genome Database [SGD]; E.L. Hong, R. Balakrishnan, K.R. Christie, M.C. Costanzo, S.S. Dwight, S.R. Engel, D.G. Fisk, J.E. Hirschman, M.S. Livstone, R. Nash, et al.; http://www.yeastgenome.org/) was scanned for tandem repeats using the TRF algorithm (Benson 1999
Model development
All models were trained on a balanced training data set comprising 320 of all naturally occurring repeats in the S. cerevisiae genome (training data set). To select the most relevant repeat characteristics for inclusion in the final model, we applied a forward variable selection procedure using LS-SVMs with an RBF kernel. The selection criterion we used was the AUC performance on the remaining 2423 repeats in the S. cerevisiae genome (validation data set). The model parameters, that is, the regularization parameter
d (d = 3; purity, unit length, and number of units), and corresponding binary class labels y_k ∈ {−1, +1} (label "+1" for variable repeats; "−1" otherwise), model parameters and bias term b, continuous predicted values y(x), and the kernel function using an RBF kernel calculated as
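For orientation, the standard LS-SVM classifier of Suykens et al. (2002) with an RBF kernel K(x, z) = exp(−‖x − z‖²/σ²) obtains the bias b and support values α_k by solving a single linear system, and produces the continuous output y(x) = Σ_k α_k y_k K(x, x_k) + b. The pure-Python sketch below implements that textbook formulation on a toy data set with three made-up, standardized features per repeat; the function names and the values of γ (regularization) and σ² (kernel width) are illustrative, and this is not the trained SERV model or its hyperparameters.

```python
import math


def rbf_kernel(x, z, sigma2):
    """RBF kernel K(x, z) = exp(-||x - z||^2 / sigma^2)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / sigma2)


def solve_linear(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting (A is not modified)."""
    n = len(A)
    M = [list(row) + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def train_lssvm(X, y, gamma, sigma2):
    """Solve the LS-SVM classifier system
        [0  y^T              ] [b    ]   [0]
        [y  Omega + I / gamma] [alpha] = [1]
    with Omega_kl = y_k * y_l * K(x_k, x_l). Returns (b, alpha)."""
    n = len(X)
    A = [[0.0] * (n + 1) for _ in range(n + 1)]
    for k in range(n):
        A[0][k + 1] = float(y[k])
        A[k + 1][0] = float(y[k])
        for l in range(n):
            A[k + 1][l + 1] = y[k] * y[l] * rbf_kernel(X[k], X[l], sigma2)
        A[k + 1][k + 1] += 1.0 / gamma
    solution = solve_linear(A, [0.0] + [1.0] * n)
    return solution[0], solution[1:]


def lssvm_output(x, X, y, alpha, b, sigma2):
    """Continuous output y(x) = sum_k alpha_k * y_k * K(x, x_k) + b; its sign gives the class."""
    return sum(a * yk * rbf_kernel(x, xk, sigma2) for a, yk, xk in zip(alpha, y, X)) + b


# Toy data: hypothetical standardized (purity, unit length, number of units) values.
X = [[1.0, 1.0, 1.0], [1.2, 0.9, 1.1], [-1.0, -1.0, -0.9], [-0.8, -1.1, -1.0]]
y = [1, 1, -1, -1]  # +1 = variable, -1 = nonvariable
b, alpha = train_lssvm(X, y, gamma=10.0, sigma2=1.0)
print(lssvm_output([1.1, 1.0, 1.0], X, y, alpha, b, 1.0))
print(lssvm_output([-0.9, -1.0, -1.0], X, y, alpha, b, 1.0))
```

Solving the full linear system directly is only practical for small training sets; it is used here because it mirrors the textbook equations one to one.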
Analysis of repeats in human coding regions
To identify variable repeats in EST sequences, we used the UniGene clusters associated with each of these human transcripts. We then applied the methodology described in O'Dushlaine et al. (2005)
Enrichment of variable repeats in genes that are associated with genetic diseases was calculated using the Genetic Association Database (Becker et al. 2004
Analysis of P. vivax repeats
Experimental validation of model
We thank Gerald Fink, Marcelo Vinces, Chris Brown, Bodo Stern, Sharad Ramanathan, Amir Karger, William Ritchie, An Jansen, Frank De Smet, Xander Warnez, and Kathleen Marchal for their useful comments and suggestions. Research in the lab of K.V. is supported by NIH NIGMS grant 5P50GM068763-04 and the Human Frontier Science Program Young Investigator Award RGY79/2007. N.P. is a Henri Benedictus Fellow of the King Baudouin Foundation and the Belgian American Educational Foundation (BAEF). T.P. acknowledges the financial support of the Harvard College Research Program for undergraduate researchers (HCRP) and the Bauer summer program for undergraduate students.
4 These authors contributed equally to this work.
E-mail kverstrepen{at}cgr.harvard.edu; fax (617) 495-2196. [Supplemental material is available online at www.genome.org.] Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.6554007
Al-Shahrour, F., Minguez, P., Vaquerizas, J.M., Conde, L., and Dopazo, J. 2005. BABELOMICS: A suite of web tools for functional annotation and analysis of groups of genes in high-throughput experiments. Nucleic Acids Res. 33: W460–W464. doi: 10.1093/nar/gki456.
Baldus, S.E., Engelmann, K., and Hanisch, F.G. 2004. MUC1 and the MUCs: A family of human mucins with impact in cancer biology. Crit. Rev. Clin. Lab. Sci. 41: 189–231.
Becker, K.G., Barnes, K.C., Bright, T.J., and Wang, S.A. 2004. The genetic association database. Nat. Genet. 36: 431–432.
Benjamini, Y. and Hochberg, Y. 1995. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57: 289–300.
Benson, G. 1999. Tandem repeats finder: A program to analyze DNA sequences. Nucleic Acids Res. 27: 573–580.
Berry, M., Ellingham, R.B., and Corfield, A.P. 2004. Human preocular mucins reflect changes in surface physiology. Br. J. Ophthalmol. 88: 377–383.
Bowen, S., Roberts, C., and Wheals, A.E. 2005. Patterns of polymorphism and divergence in stress-related yeast proteins. Yeast 22: 659–668.
Brachmann, C.B., Davies, A., Cost, G.J., Caputo, E., Li, J.C., Hieter, P., and Boeke, J.D. 1998. Designer deletion strains derived from Saccharomyces cerevisiae S288C: A useful set of strains and plasmids for PCR-mediated gene disruption and other applications. Yeast 14: 115–132.
Butler, J.M. 2006. Genetics and genomics of core short tandem repeat loci used in human identity testing. J. Forensic Sci. 51: 253–265.
Caburet, S., Vaiman, D., and Veitia, R.A. 2004. A genomic basis for the evolution of vertebrate transcription factors containing amino acid runs. Genetics 167: 1813–1820.
Caburet, S., Cocquet, J., Vaiman, D., and Veitia, R.A. 2005. Coding repeats and evolutionary "agility." Bioessays 27: 581–587.
De Smet, F., De Brabanter, J., Van den Bosch, T., Pochet, N., Amant, F., Van Holsbeke, C., Moerman, P., De Moor, B., Vergote, I., and Timmerman, D. 2006. New models to predict depth of infiltration in endometrial carcinoma based on transvaginal sonography. Ultrasound Obstet. Gynecol. 27: 664–671.
Denoeud, F. and Vergnaud, G. 2004. Identification of polymorphic tandem repeats by direct comparison of genome sequence from different bacterial strains: A web-based resource. BMC Bioinformatics 5: 4. doi: 10.1186/1471-2105-5-4.
Denoeud, F., Vergnaud, G., and Benson, G. 2003. Predicting human minisatellite polymorphism. Genome Res. 13: 856–867.
Ellegren, H. 2004. Microsatellites: Simple sequences with complex evolution. Nat. Rev. Genet. 5: 435–445.
Fidalgo, M., Barrales, R.R., Ibeas, J.I., and Jimenez, J. 2006. Adaptive evolution by mutations in the FLO11 gene. Proc. Natl. Acad. Sci. 103: 11228–11233.
Fondon, J.W. and Garner, H.R. 2004. Molecular origins of rapid and continuous morphological evolution. Proc. Natl. Acad. Sci. 101: 18058–18063.
Gatchel, J.R. and Zoghbi, H.Y. 2005. Diseases of unstable repeat expansion: Mechanisms and common principles. Nat. Rev. Genet. 6: 743–755.
Imwong, M., Sudimack, D., Pukrittayakamee, S., Osorio, L., Carlton, J.M., Day, N.P., White, N.J., and Anderson, T.J. 2006. Microsatellite variation, repeat array length, and population history of Plasmodium vivax. Mol. Biol. Evol. 23: 1016–1018.
Karlin, S. and Burge, C. 1996. Trinucleotide repeats and long homopeptides in genes and proteins associated with nervous system disease and development. Proc. Natl. Acad. Sci. 93: 1560–1565.
Kolpakov, R., Bana, G., and Kucherov, G. 2003. mreps: Efficient and flexible detection of tandem repeats in DNA. Nucleic Acids Res. 31: 3672–3678.
Le Flèche, P., Fabre, M., Denoeud, F., Koeck, J.L., and Vergnaud, G. 2002. High resolution, on-line identification of strains from the Mycobacterium tuberculosis complex based on tandem repeat typing. BMC Microbiol. 2: 37. doi: 10.1186/1471-2180-2-37.
Leclerc, M.C., Durand, P., Gauthier, C., Patot, S., Billotte, N., Menegon, M., Severini, C., Ayala, F.J., and Renaud, F. 2004. Meager genetic variability of the human malaria agent Plasmodium vivax. Proc. Natl. Acad. Sci. 101: 14455–14460.
Levdansky, E., Romano, J., Shadkchan, Y., Sharon, H., Verstrepen, K.J., Fink, G.R., and Osherov, N. 2007. Coding tandem repeats generate diversity in Aspergillus fumigatus genes. Eukaryot. Cell 6: 1380–1391.
Li, G., Zhang, H., Lv, J., Hou, P., and Wang, H. 2006. Tandem repeats polymorphism of MUC20 is an independent factor for the progression of immunoglobulin A nephropathy. Am. J. Nephrol. 26: 43–49.
Lopes, J., Ribeyre, C., and Nicolas, A. 2006. Complex minisatellite rearrangements generated in the total or partial absence of Rad27/hFEN1 activity occur in a single generation and are Rad51 and Rad52 dependent. Mol. Cell. Biol. 26: 6675–6689.
Näslund, K., Saetre, P., von Salome, J., Bergstrom, T.F., Jareborg, N., and Jazin, E. 2005. Genome-wide prediction of human VNTRs. Genomics 85: 24–35.
O'Dushlaine, C.T. and Shields, D.C. 2006. Tools for the identification of variable and potentially variable tandem repeats. BMC Genomics 7: 290. doi: 10.1186/1471-2164-7-290.
O'Dushlaine, C.T., Edwards, R.J., Park, S.D., and Shields, D.C. 2005. Tandem repeat copy-number variation in protein-coding regions of human genes. Genome Biol. 6: R69. doi: 10.1186/gb-2005-6-8-r69.
Orgel, L.E. and Crick, F.H. 1980. Selfish DNA: The ultimate parasite. Nature 284: 604–607.
Paques, F. and Haber, J.E. 1999. Multiple pathways of recombination induced by double-strand breaks in Saccharomyces cerevisiae. Microbiol. Mol. Biol. Rev. 63: 349–404.
Rando, O.J. and Verstrepen, K.J. 2007. Timescales of genetic and epigenetic inheritance. Cell 128: 655–668.
Rice, P., Longden, I., and Bleasby, A. 2000. EMBOSS: The European molecular biology open software suite. Trends Genet. 16: 276–277.
Richard, G.F. and Dujon, B. 2006. Molecular evolution of minisatellites in hemiascomycetous yeasts. Mol. Biol. Evol. 23: 189–202.
Russell, B., Suwanarusk, R., and Lek-Uthai, U. 2006. Plasmodium vivax genetic diversity: Microsatellite length matters. Trends Parasitol. 22: 399–401.
Schroeder, J.A., Masri, A.A., Adriance, M.C., Tessier, J.C., Kotlarczyk, K.L., Thompson, M.C., and Gendler, S.J. 2004. MUC1 overexpression results in mammary gland tumorigenesis and prolonged alveolar differentiation. Oncogene 23: 5739–5747.
Sherman, F., Fink, G.R., and Hicks, J. 1991. Methods in yeast genetics. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
Sia, E.A., Kokoska, R.J., Dominska, M., Greenwell, P., and Petes, T.D. 1997. Microsatellite instability in yeast: Dependence on repeat unit size and DNA mismatch repair genes. Mol. Cell. Biol. 17: 2851–2858.
Stranger, B.E., Forrest, M.S., Dunning, M., Ingle, C.E., Beazley, C., Thorne, N., Redon, R., Bird, C.P., de Grassi, A., Lee, C., et al. 2007. Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science 315: 848–853.
Suykens, J.A.K., Van Gestel, T., De Brabanter, J., De Moor, B.L.R., and Vandewalle, J. 2002. Least squares support vector machines. World Scientific, Singapore.
Thomas, E.E. 2005. Short, local duplications in eukaryotic genomes. Curr. Opin. Genet. Dev. 15: 640–644.
Verstrepen, K.J., Reynolds, T.B., and Fink, G.R. 2004. Origins of variation in the fungal cell surface. Nat. Rev. Microbiol. 2: 533–540.
Verstrepen, K.J., Jansen, A., Lewitter, F., and Fink, G.R. 2005. Intragenic tandem repeats generate functional variability. Nat. Genet. 37: 986–990.
Viguera, E., Canceill, D., and Ehrlich, S.D. 2001. Replication slippage involves DNA polymerase pausing and dissociation. EMBO J. 20: 2587–2595.
Voynov, V., Verstrepen, K.J., Jansen, A., Runner, V.M., Buratowski, S., and Fink, G.R. 2006. Genes with internal repeats require the THO complex for transcription. Proc. Natl. Acad. Sci. 103: 14423–14428.
Wierdl, M., Dominska, M., and Petes, T.D. 1997. Microsatellite instability in yeast: Dependence on the length of the microsatellite. Genetics 146: 769–779.
Wren, J.D., Forgacs, E., Fondon, J.W., Pertsemlidis, A., Cheng, S.Y., Gallardo, T., Williams, R.S., Shohet, R.V., Minna, J.D., and Garner, H.R. 2000. Repeat polymorphisms within gene regions: Phenotypic and evolutionary implications. Am. J. Hum. Genet. 67: 345–356.
Received March 28, 2007; accepted in revised format August 29, 2007.