Genome Res. 16:536-541, 2006
©2006 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/06 $5.00
OPEN ACCESS ARTICLE
Resource
An isochore map of human chromosomes
Laboratory of Molecular Evolution, Stazione Zoologica Anton Dohrn, 80121 Naples, Italy
Isochores are large DNA segments (≫300 kb on average) that are characterized by an internal variation in GC well below the full variation seen in the mammalian genome. Precisely defining isochores in terms of size and composition, as well as mapping them on human chromosomes, has, however, remained a largely unsolved problem. Here we used a very simple approach to segment the human chromosomes de novo, based on assessments of GC and its variation within and between adjacent regions. We obtained complete coverage of the human genome (neglecting the remaining gaps) by 3200 isochores, which may be visualized as the ultimate chromosomal bands. Isochores visibly belong to five families characterized by different GC levels, as expected from previous investigations. Since we previously showed that isochores are tightly linked to basic biological properties such as gene density, replication timing, and recombination, the new level of detail provided by the isochore map will help the understanding of genome structure, function, and evolution.
Well before genome sequencing, ultracentrifugation in Cs2SO4 density gradients in the presence of sequence-specific ligands (e.g., Ag+) was shown to lead to a high resolution of mammalian DNAs according to base composition (Corneo et al. 1968
A quarter of a century after the original studies that had defined the approximate sizes and compositions of isochores as well as the compositions and relative amounts of isochore families, it was reported that isochores could not be identified in the draft sequence of the human genome (Lander et al. 2001
Almost 50 years ago, calf thymus DNA, the standard eukaryotic DNA, was shown to be remarkably more heterogeneous in base composition than bacterial DNAs (Meselson et al. 1957
Isochores were shown to be tightly linked to basic biological properties, such as gene density, replication timing, and recombination (see Bernardi 2004
Scanning of GC profiles
If one scans the GC profiles (Fig. 1, see gatefold) of human chromosomes from any starting point using a fixed window of 100 kb, one finds a mosaic of sequences ranging from 200 kb to several megabases that are characterized by different GC levels and by a remarkable compositional homogeneity. The critical 100-kb window size used in this work was chosen because plots of average standard deviations of GC against window size show the existence of a plateau that begins around 100 kb and extends to over 500 kb. This plateau has long been known (Macaya et al. 1976
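The fixed-window scan described above can be illustrated with a minimal sketch. The function name `window_gc`, the choice to skip unsequenced `N` positions, and the use of non-overlapping windows are assumptions made for illustration; they are not taken from the authors' software.

```python
def window_gc(sequence, window=100_000):
    """Return the GC percentage of each full, non-overlapping window.

    Assumption: positions other than A, C, G, T (e.g. N in assembly gaps)
    are ignored; a window made only of such positions yields None.
    """
    seq = sequence.upper()
    levels = []
    # Step through the sequence one full window at a time; a trailing
    # partial window is dropped so that all windows have equal length.
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        at = chunk.count("A") + chunk.count("T")
        if gc + at == 0:
            levels.append(None)  # window lies entirely in a gap
        else:
            levels.append(100.0 * gc / (gc + at))
    return levels
```

Calling `window_gc(chromosome_sequence)` with the default 100-kb window would give one GC level per window, which is the kind of profile the text refers to.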
Window sizes shorter than 100 kb showed standard deviations that were much higher than the plateaus, especially in GC-rich isochores, because of the contribution of different specific sequences (interspersed repeats, CpG islands, exons, introns, etc.). Indeed, this fact prevents the definition of isochores and isochore borders at sizes lower than 100 kb. When applied to randomly chosen fragments from the human genome, the same procedure yields much higher standard deviations, which reach a plateau around 4.5%–5% GC (Fig. 2, top curve; see also Macaya et al. 1976
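A rough way to reproduce the idea of a standard deviation versus window size curve is sketched below. It assumes a gap-free sequence and uses the population standard deviation from Python's `statistics` module; the function names are hypothetical and the averaging over many isochores used in the paper is not reproduced here.

```python
from statistics import pstdev


def gc_percent(chunk):
    """GC percentage of a gap-free sequence chunk."""
    gc = chunk.count("G") + chunk.count("C")
    return 100.0 * gc / len(chunk)


def sd_by_window_size(sequence, window_sizes):
    """Map each window size to the standard deviation of window GC levels.

    Assumption: the sequence is one region (for example, a candidate
    isochore) cut into full, non-overlapping windows of each size.
    """
    seq = sequence.upper()
    result = {}
    for w in window_sizes:
        levels = [gc_percent(seq[i:i + w])
                  for i in range(0, len(seq) - w + 1, w)]
        if len(levels) >= 2:  # a spread needs at least two windows
            result[w] = pstdev(levels)
    return result
```

Plotting the returned values against window size for a real region would show whether the spread levels off, which is the plateau behavior the text describes.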
Using our approach, we found that 85% of the genome consists of isochores with an average standard deviation equal to
Isochore borders
Isochore borders were identified on the basis of marked compositional differences that ranged from 2.7% to 6.3% GC for isochores belonging to different families, the average value being 3.9% GC (Fig. 3). As already mentioned, isochore borders were localized to within 100 kb and, indeed, a more precise definition is practically not possible.
H3 isochores were always flanked by GC-poorer isochores, and L1 isochores were always flanked by GC-richer isochores, as expected. These were also the predominant situations found in the cases of H2 or H1 isochores and of L2 isochores, respectively. However, these families also exhibited "transition isochores" in several cases, where one flanking isochore was higher, the other lower (see Supplemental Fig. S1). Very large GC differences at borders (such as L1/H3 borders) were rare, thus leading to the formation of blocks of isochores from closer families (e.g., L1/L2). These blocks correspond essentially to chromosomal bands at an 850-band resolution as defined at the cytogenetic level. In some cases, single isochores correspond to chromosomal bands at this resolution (see Table 1 for examples).
Number and size of isochores
Isochore families
The isochore pattern is, expectedly, different from chromosome to chromosome (see Fig. 1). However, when isochores are pooled in bins of 1% GC (Fig. 5), isochore families stand out. This is evident for isochore families L1, L2, and H1, but also visible for the H2 and H3 families, which are present in small amounts in the genome. The relative amounts of DNA in isochore families were 19%, 37%, 31%, 11%, and 3% for L1, L2, H1, H2, and H3 isochores, respectively, again in fair agreement with previous results (Macaya et al. 1976
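The pooling of isochores into 1% GC bins can be sketched as follows. The input format, a list of `(mean_gc_percent, size_bp)` pairs, and the function name are assumptions for illustration; the sketch reports the share of DNA per bin and does not assign family boundaries.

```python
import math
from collections import defaultdict


def dna_share_by_gc_bin(isochores, bin_width=1.0):
    """Return the percentage of total DNA falling in each GC bin.

    isochores: iterable of (mean_gc_percent, size_bp) pairs (assumed format).
    Keys of the result are the lower edges of the bins.
    """
    totals = defaultdict(int)
    for gc, size in isochores:
        # Lower edge of the bin that contains this isochore's GC level.
        bin_start = math.floor(gc / bin_width) * bin_width
        totals[bin_start] += size
    genome = sum(totals.values())
    return {b: 100.0 * s / genome for b, s in sorted(totals.items())}
```

Weighting each isochore by its size, rather than counting isochores, matches the text's phrasing of "relative amounts of DNA".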
The present findings, while confirming the isochore features previously established, push our knowledge further by quantifying the size, GC levels, standard deviations, and coordinates of isochores on the human genome map. Moreover, these findings also indicate that isochores may be visualized as the ultimate banding patterns of the chromosomes in warm-blooded vertebrates, and that they are arranged in blocks, corresponding to chromosomal bands at the standard 850-band resolution.
It seems appropriate here to briefly summarize two major points of interest concerning isochores. From a practical viewpoint, isochores allowed us to gain an insight into the genome organization of warm-blooded vertebrates (and of other organisms) (Bernardi 2004
From a more general point of view, the present results raise one major question concerning the origin and maintenance of GC-rich isochores, which are a common, characteristic property of the genomes of warm-blooded vertebrates. We now know that the GC-rich (and gene-rich) isochores are the result of GC increases in the corresponding gene-rich regions of cold-blooded vertebrates, which are much less GC rich (see Bernardi 2004
Isochore mapping
The entire chromosomal sequences of the finished human genome assembly (UCSC release hg17) (Kent et al. 2002) [...] (Pavlíček et al. 2002; Pačes et al. 2004) [...] 100 kb (see Results). This observation has a simple biological interpretation: The higher variances observed for smaller windows correspond to well-known intra-isochore, gene-scale mosaicisms that are created, for example, by individual exons, introns, CpG islands, 3'-untranslated regions, interspersed repeats, and scaffold/matrix attachment regions. The array of GC levels of the 100-kb windows in each chromosome was scanned for jumps that were detectable on the basis of mean GC differences, and/or of differences in fluctuation levels with respect to subsegments ( 100 kb). As a guideline, we focused on jumps of at least 1%–2% GC between adjacent candidate segments, although in rare cases smaller jumps were justified because of differences in variability between the two segments (see Fig. 3). The results obtained via this simple procedure, which involved only properties of bulk DNA and no annotated features, third codon positions, and so on, demonstrate the existence of a complete covering of the human genome sequence by isochores that fulfill the properties initially established by ultracentrifugation experiments.
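The scan for jumps between adjacent segments can be illustrated with a deliberately simplified, greedy sketch. It is not the authors' procedure, which also weighed differences in fluctuation levels; here a new segment starts whenever a window's GC level departs from the running mean of the current segment by at least `min_jump`. The function name and the default of 1.5% GC (a value inside the 1%–2% guideline) are assumptions.

```python
def segment_by_gc_jumps(levels, min_jump=1.5):
    """Split a list of window GC levels into segments at mean-GC jumps.

    Returns a list of (start_index, end_index_exclusive, mean_gc) tuples.
    Assumption: a greedy left-to-right scan with one fixed threshold.
    """
    if not levels:
        return []
    segments = []
    start = 0
    total = levels[0]  # running sum of GC levels in the current segment
    for i in range(1, len(levels)):
        mean = total / (i - start)
        if abs(levels[i] - mean) >= min_jump:
            # Close the current segment and open a new one at window i.
            segments.append((start, i, mean))
            start = i
            total = levels[i]
        else:
            total += levels[i]
    segments.append((start, len(levels), total / (len(levels) - start)))
    return segments
```

With 100-kb windows, multiplying the indices by 100,000 would convert segment boundaries into approximate chromosomal coordinates, consistent with borders being localized only to within 100 kb.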
Remarks on statistical inference
We thank Gabriel Macaya and Giacomo Bernardi for helpful discussions, and David Haussler for comments.
1 Corresponding author. E-mail bernardi@szn.it; fax 39 081 2455807.
[Supplemental material is available online at www.genome.org.]
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.4910606
Freely available online through the Genome Research Open Access option.
Alvarez-Valin F., Lamolle G., Bernardi G. 2002. GC3 and mutation biases in the human genome. Gene 300: 161–168.
Bernardi G. 1965. Chromatography of nucleic acids on hydroxyapatite. Nature 206: 779–783.
Bernardi G. 1995. The human genome: Organization and evolutionary history. Annu. Rev. Genet. 29: 445–476.
Bernardi G. 2001. Misunderstandings about isochores. Gene 276: 3–13.
Bernardi G. 2004. Structural and evolutionary genomics. Natural selection in genome evolution. Elsevier, Amsterdam.
Bernardi G. and Bernardi G. 1986. Compositional constraints and genome evolution. J. Mol. Evol. 24: 1–11.
Bernardi G., Olofsson B., Filipski J., Zerial M., Salinas J., Cuny G., Meunier-Rotival M., Rodier F. 1985. The mosaic genome of warm-blooded vertebrates. Science 228: 953–958.
Britten R.J. and Kohne D.E. 1968. Repeated sequences in DNA. Science 161: 529–540.
Clay O. and Bernardi G. 2001a. The isochores in human chromosomes 21 and 22. Biochem. Biophys. Res. Commun. 285: 855–856.
Clay O. and Bernardi G. 2001b. Compositional heterogeneity within and among isochores in mammalian genomes. II. Some general comments. Gene 276: 25–31.
Clay O. and Bernardi G. 2005. How not to search for isochores: A reply to Cohen et al. Mol. Biol. Evol. 22: 2315–2317.
Cohen N.T., Dagan L., Graur D. 2005. GC composition of the human genome: In search of isochores. Mol. Biol. Evol. 22: 1260–1272.
Corneo G., Ginelli E., Soave C., Bernardi G. 1968. Isolation and characterization of mouse and guinea pig satellite DNAs. Biochemistry 7: 4373–4379.
Cruvellier S., Jabbari K., Clay O., Bernardi G. 2004. Compositional gene landscapes in vertebrates. Genome Res. 14: 886–892.
Cuny G., Soriano P., Macaya G., Bernardi G. 1981. The major components of the mouse and human genomes: Preparation, basic properties and compositional heterogeneity. Eur. J. Biochem. 111: 227–233.
Eyre-Walker A. and Hurst L.D. 2001. The evolution of isochores. Nat. Rev. Genet. 2: 549–555.
Federico C., Scavo C., Cantarella C.D., Motta S., Saccone S., Bernardi G. 2006. Gene-rich and gene-poor chromosomal regions have different locations in the interphase nuclei of cold-blooded vertebrates. Chromosoma (in press).
Filipski J., Thiery J.P., Bernardi G. 1973. An analysis of the bovine genome by Cs2SO4–Ag+ density gradient centrifugation. J. Mol. Biol. 80: 177–197.
Galtier N., Piganeau G., Mouchiroud D., Duret L. 2002. GC-content evolution in mammalian genomes: The biased gene conversion hypothesis. Genetics 159: 907–911.
Gojobori T., Li W.H., Graur D. 1982. Patterns of nucleotide substitution in pseudogenes and functional genes. J. Mol. Evol. 18: 360–369.
Häring D. and Kypr J. 2001. No isochores in the human chromosomes 21 and 22? Biochem. Biophys. Res. Commun. 280: 567–573.
International Human Genome Sequencing Consortium. 2004. Finishing the euchromatic sequence of the human genome. Nature 431: 931–945.
Kent W.J., Sugnet C.W., Furey T.S., Roskin K.M., Pringle T.H., Zahler A.M., Haussler D. 2002. The human genome browser at UCSC. Genome Res. 12: 996–1006.
Lander E.S., Linton L.M., Birren B., Nusbaum C., Zody M.C., Baldwin J., Devon K., Dewar K., Doyle M., FitzHugh W., et al. 2001. Initial sequencing and analysis of the human genome. Nature 409: 860–921.
Li W. 2002. Are isochore sequences homogeneous? Gene 300: 129–139.
Li W., Bernaola-Galván P., Carpena P., Oliver J.L. 2003. Isochores merit the prefix iso. Comput. Biol. Chem. 27: 5–10.
Macaya G., Thiery J.P., Bernardi G. 1976. An approach to the organization of eukaryotic genomes at a macromolecular level. J. Mol. Biol. 108: 237–254.
Melodelima C., Gueguen L., Piau D., Gautier C. 2005. Prediction of human isochores using a hidden Markov model. JOBIM 2005: 427–434.
Meselson M., Stahl F.W., Vinograd J. 1957. Equilibrium sedimentation of macromolecules in density gradients. Proc. Natl. Acad. Sci. 43: 581–588.
Mouchiroud D., D'Onofrio G., Aïssani B., Macaya G., Gautier C., Bernardi G. 1991. The distribution of genes in the human genome. Gene 100: 181–187.
Nekrutenko A. and Li W.H. 2001. Assessment of compositional heterogeneity within and between eukaryotic genomes. Genome Res. 10: 1986–1995.
Nishio Y., Nakamura Y., Kawarabayasi Y., Usuda Y., Kimura E., Sugimoto S., Matsui K., Yamagishi A., Kikuchi H., Ikeo K., et al. 2003. Comparative complete genome sequence analysis of the amino acid replacements responsible for the thermostability of Corynebacterium efficiens. Genome Res. 13: 1572–1579.
Oliver J.L., Carpena P., Román-Roldán R., Mata-Balaguer T., Mejias-Romero A., Hackenberg M., Bernaola-Galván P. 2002. Isochore chromosome maps of the human genome. Gene 300: 117–127.
Oliver J.L., Carpena P., Hackenberg M., Bernaola-Galván P. 2004. IsoFinder: Computational prediction of isochores in genome sequences. Nucleic Acids Res. 32: W287–W292.
Pačes et al. 2004.
Pavlíček et al. 2002.
Saccone S., Federico C., Andreozzi L., D'Antoni S., Bernardi G. 2002. Localization of the gene-richest and the gene-poorest isochores in the interphase nuclei of mammals and birds. Gene 300: 169–178.
Santini S. and Bernardi G. 2005. Organization and base composition of tilapia Hox genes: Implications for the evolution of Hox clusters in fish. Gene 346: 51–61.
Silverman B.W. 1986. Density estimation for statistics and data analysis. Chapman and Hall/CRC, Boca Raton, FL.
Smith N.G.C. and Eyre-Walker A. 2001. Synonymous codon bias is not caused by mutation bias in G+C rich genes in human. Mol. Biol. Evol. 18: 982–986.
Thiery J.P., Macaya G., Bernardi G. 1976. An analysis of eukaryotic genomes by density gradient centrifugation. J. Mol. Biol. 108: 219–235.
Yunis J.J., Tsai M.Y., Willey A.M. 1977. Molecular organization and function of the human genome. In Molecular structure of human chromosomes (ed. J.J. Yunis). Academic Press, New York.
Zerial M., Salinas J., Filipski J., Bernardi G. 1986. Gene distribution and nucleotide sequence organization in the human genome. Eur. J. Biochem. 160: 479–485.
Zoubak S., Clay O., Bernardi G. 1996. The gene distribution of the human genome. Gene 174: 95–102.
Received November 11, 2005; accepted in revised form December 28, 2005.