Published online before print May 12, 2004, 10.1101/gr.1475304
Genome Res. 14:1199-1205, 2004. ©2004 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/04 $5.00
Resources
Large-Scale Integration of Human Genetic and Physical Maps
1 Polymorphism Research Laboratory, Department of Psychiatry, University of California at San Diego, La Jolla, California 92093-0603, USA; 2 Department of Biology, University of California at San Diego, La Jolla, California 92093-0603, USA
Genetic maps are used routinely in family-based linkage studies to identify the rough location of genes that influence human traits and diseases. Unlike physical maps, genetic maps are based on the amount of recombination occurring between adjacent loci rather than on the actual number of bases separating them. Genetic maps are constructed by statistically characterizing the number of crossovers observed in parental meioses leading to the transmission of alleles to offspring. Factors such as the number of meioses observed, the heterozygosity of and physical distance between the loci studied, and the statistical methods used can affect the construction and reliability of a genetic map. As is well known, poorly constructed genetic maps can have adverse effects on linkage mapping studies. With the availability of sequence-based maps, as well as genetic maps generated by different research groups (such as the Marshfield and deCODE groups), one can investigate the compatibility and properties of different maps. We have integrated information from the most current human genome sequence data (UCSC genome assembly Human July 2003) with 8399 microsatellite markers used in the Marshfield and deCODE maps to reconcile these maps. Our efforts resulted in updated sex-specific genetic maps.
The use of genetic maps for identifying genes that influence human traits and diseases has a long and illustrious history in the medical and biological sciences. Indeed, as pointed out in a recent historical perspective on human biological research by Lander and Weinberg (2000
Unfortunately, reconciling the locus positions dictated by genetic and physical maps is not trivial. Large-scale sequencing and the ordering of loci from multiple sequence reads can introduce errors of all sorts, most simply caused by sequencing errors and by missing sequence, or "gaps," between sequences (see, e.g., Lander et al. 2001
As a result of problems associated with the construction of human physical and genetic maps, the positions of loci differ not only between physical and genetic maps, but also among different physical maps and among different genetic maps. Because genetic maps play a crucial role in pedigree-based meiotic (or "linkage") mapping strategies for gene discovery, it is important to assess how well they can be reconciled with physical maps. In addition, it has been well documented that misspecification of genetic maps can have negative effects on linkage analyses (see, e.g., Daw et al. 2000
Some researchers have considered the quality and reconcilability of genetic and physical maps (see http://cedar.genetics.soton.ac.uk/public_html/LDB2000), but most of these efforts have focused on a single chromosome or genomic region (see, e.g., Tapper et al. 2001
Most importantly, we have developed a comprehensive and integrated sex-specific genetic map that could be used in, for example, multipoint linkage analyses, or that could help focus further sequencing and marker-ordering efforts. Our map includes all of the Marshfield and deCODE markers and is based primarily on the genetic positions of loci in the deCODE map because of the large number of meioses studied to develop that map. The high-resolution deCODE map is based on 1257 meiotic events in two-generation pedigrees and is the most accurate genetic map available to date. However, the deCODE map includes only 5136 microsatellites, whereas the older, less accurate Marshfield map, which was calculated from only 188 meioses in three-generation pedigrees, includes 8325 microsatellite markers. Our map uses up-to-date physical sequence information to interpolate the positions of markers that are not included in the deCODE map but whose positions can be estimated relative to markers on that map. A method for querying markers in this map using a simple Excel spreadsheet query macro is available as Supplemental material from the authors' Web site (http://elcapitan.ucsd.edu/hyper/).
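A minimal sketch of the interpolation idea follows, assuming simple linear interpolation of genetic position (cM) against physical position (bp) between the nearest flanking markers that do have a deCODE position on the same chromosome. The function name `interpolate_cm` and the `(physical_bp, genetic_cM)` anchor layout are hypothetical, and the authors' own procedure may treat edge cases (for example, markers beyond the outermost anchors) differently.

```python
from bisect import bisect_left


def interpolate_cm(phys_pos, anchors):
    """Linearly interpolate a genetic position (cM) for a marker at phys_pos (bp).

    anchors: list of (physical_bp, genetic_cM) tuples for markers that already
    have a position on the reference genetic map (assumed here to be deCODE),
    sorted by physical position and all on the same chromosome.
    Returns None when phys_pos is not flanked by two anchors (no extrapolation).
    """
    if len(anchors) < 2:
        return None
    positions = [bp for bp, _ in anchors]
    i = bisect_left(positions, phys_pos)
    if i < len(positions) and positions[i] == phys_pos:
        return anchors[i][1]  # marker coincides with an anchor marker
    if i == 0 or i == len(positions):
        return None  # outside the anchored span
    (bp_left, cm_left), (bp_right, cm_right) = anchors[i - 1], anchors[i]
    frac = (phys_pos - bp_left) / (bp_right - bp_left)
    return cm_left + frac * (cm_right - cm_left)
```

For example, a marker at 2,000,000 bp flanked by anchors at 1,000,000 bp (2.0 cM) and 3,000,000 bp (6.0 cM) would be placed at 4.0 cM. Because the integrated map is sex-specific, the same function would be applied once with female and once with male anchor positions.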
Chromosome Positions
Of the 7737 Marshfield markers with available physical positions (UCSC July 2003), 45 had chromosomal assignments that did not match the assignment provided by the physical map (Table 1). Of these 45 markers, 28 are "Utah" markers, which were typed on only a few meioses (four of the eight CEPH families used to construct the Marshfield genetic maps). Only one of these 45 markers was considered in the construction of the deCODE map, and only three are used in the routine linkage mapping panels provided by the Marshfield Clinic. Three markers (PLA2, GC, and FB7F11) have been mapped by radiation hybrids to the chromosome assigned by the physical map, suggesting that the initial chromosomal positions provided by the Marshfield map were likely wrong. Finally, some of these markers have chromosomal assignments on the physical map that have changed as the physical map has been updated (Table 1, cf. columns 9-11), suggesting a lack of confidence in their location as dictated by available sequence information. For comparison, we also provide positional information obtained from the Celera database for these 45 markers. For a subset of these markers, it is quite possible that the sequence-based position is itself wrong.
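The chromosome check described above amounts to comparing two marker-to-chromosome lookups. The sketch below assumes each map has been loaded into a Python dictionary keyed by marker name; the function name and the marker names in the example are hypothetical.

```python
def chromosome_mismatches(genetic_chrom, physical_chrom):
    """Return markers whose genetic-map chromosome differs from the physical map.

    Both arguments map marker name -> chromosome label (e.g. "7", "X").
    Markers absent from either map are skipped, mirroring the restriction to
    markers with an available physical position.
    Returns sorted (marker, genetic_chrom, physical_chrom) tuples.
    """
    shared = genetic_chrom.keys() & physical_chrom.keys()
    return sorted(
        (name, genetic_chrom[name], physical_chrom[name])
        for name in shared
        if genetic_chrom[name] != physical_chrom[name]
    )
```

For instance, `chromosome_mismatches({"M1": "1", "M2": "2"}, {"M1": "1", "M2": "5"})` returns `[("M2", "2", "5")]`. Running the same comparison against successive assembly releases would show which assignments have shifted between updates.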
Marker Order
To assess marker order, we considered the number of markers implicated in a "block" of markers in which order is misspecified. To clarify, consider a string of markers whose correct order is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, but whose assumed order is 1, 2, 3, 9, 7, 4, 5, 6, 8, 10, 11, 12: the block of markers 9, 7, 4, 5, 6, 8 is "order misspecified." This definition captures entire blocks, so that if, say, multipoint linkage methods were used to extract identity-by-descent allele-sharing information among relative pairs, the calculations would be wrong even though some of the markers in the block are in the correct relative order (i.e., markers 4, 5, and 6 are not out of order but do sit in a block that contains order misspecification). Table 2 describes, on a chromosome-by-chromosome basis, the percentage of markers examined from the Marshfield and deCODE genetic maps that exhibit block order problems, the average and maximum block length, and the percentages of markers that are misplaced by >2 cM and >5 cM. These parameters were calculated by ordering the markers according to their physical position and identifying all markers that are out of order relative to their Marshfield and deCODE map locations, respectively. For these markers, "true" map positions were interpolated from the physical and genetic map positions of the markers immediately preceding and following each misspecified block, analogous to the procedure described in the Methods section for the comprehensive genetic map. This procedure yielded especially large values for Chromosomes 16 and 17 on the Marshfield map and for Chromosomes 5 and 7 on the deCODE map. These large values are partly caused by a few markers with relatively large disagreement between their physical and genetic map positions, which, under our definition of blocks, produce a large maximum block length.
Furthermore, because we used markers that were adjacent to the blocks as a basis for the estimation of the "true" genetic position of the markers in these blocks, a large block length negatively influences the accuracy of the estimated marker positions as well as the number of markers misplaced at a certain distance.
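The block definition above can be made concrete with a short sketch. It assumes the markers on one chromosome are listed in their assumed (genetic-map) order and labeled by their 0-based rank in physical order; a block closes at the first position where the markers seen so far are exactly the first ranks in physical order. This is our illustration of the definition, not necessarily the procedure the authors implemented, and the function name is hypothetical.

```python
def misordered_blocks(ranks):
    """Find "order-misspecified" blocks of markers.

    ranks: for each marker, listed in the order assumed by the genetic map,
    its 0-based rank in physical (sequence) order; must be a permutation of
    0..n-1. The prefix ranks[0..i] can be cut off cleanly when it contains
    exactly the ranks {0..i}, i.e. when its running maximum equals i.
    Segments between such cut points that hold more than one marker contain
    an order error and are returned as blocks (lists of ranks, assumed order).
    """
    blocks, start, running_max = [], 0, -1
    for i, r in enumerate(ranks):
        running_max = max(running_max, r)
        if running_max == i:  # prefix is exactly {0..i}: a clean cut point
            if i > start:     # multi-marker segment, so something is misordered
                blocks.append(ranks[start:i + 1])
            start = i + 1
    return blocks


if __name__ == "__main__":
    # Example from the text, with 1-based labels equal to the physical order.
    assumed = [1, 2, 3, 9, 7, 4, 5, 6, 8, 10, 11, 12]
    blocks = misordered_blocks([m - 1 for m in assumed])
    print([[m + 1 for m in b] for b in blocks])  # [[9, 7, 4, 5, 6, 8]]
```

From the returned blocks, the fraction of markers lying in blocks (`sum(map(len, blocks)) / len(ranks)`) and the maximum block length (`max(map(len, blocks), default=0)`) correspond to two of the per-chromosome quantities reported in Table 2. The sketch also shows why a single badly placed marker inflates the maximum block length: every marker between its physical and assumed positions is swept into the same block.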
In general, order misspecifications on the genetic maps that involve markers within a small region of the genome are likely to arise from the difficulty of resolving recombination events between closely spaced markers with a finite number of meioses. Although it is clear that the deCODE map contains fewer order problems than the earlier Marshfield map, this is most likely because Kong et al. (2002
Interlocus Distances
Comprehensive Genetic Map
From our efforts at examining the reconcilability and comparability of the Marshfield and deCODE genetic maps along with the latest version of the available human genome sequence, we developed a comprehensive and integrated map (see Methods). An extract is shown in Table 3. The comprehensive genetic map includes a total of 8399 markers with information on their physical positions (UCSC assembly July 2003) and their Marshfield and deCODE recombination rates, as well as interpolated deCODE values for 2838 markers. A summary is shown in Table 4. This map, with macros to facilitate marker searches, is available for download from http://elcapitan.ucsd.edu/hyper/.
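The authors distribute an Excel macro for marker searches; purely as an illustration of the kind of query such a table supports, the sketch below selects the markers in a physical window of one chromosome. The `MarkerRecord` layout and field names are hypothetical, and a full table would also carry separate female and male positions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MarkerRecord:
    # Hypothetical row layout for one marker; field names are illustrative.
    name: str
    chrom: str
    phys_bp: int                    # physical position on the July 2003 assembly
    marshfield_cm: Optional[float]  # None if the marker is not on the Marshfield map
    decode_cm: Optional[float]      # observed or interpolated deCODE position


def markers_in_region(records: List[MarkerRecord], chrom: str,
                      start_bp: int, end_bp: int) -> List[MarkerRecord]:
    """Return markers on `chrom` with start_bp <= phys_bp <= end_bp, in physical order."""
    hits = [r for r in records
            if r.chrom == chrom and start_bp <= r.phys_bp <= end_bp]
    return sorted(hits, key=lambda r: r.phys_bp)
```

Sorting by physical position, rather than by either genetic map, keeps the output stable even where the Marshfield and deCODE orders disagree.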
The study of recombination rates, as well as their exploitation in human gene-mapping studies, has received a great deal of attention, especially since the publication of the draft of the human genome sequence and the announcement of the human Haplotype Map Initiative (see, e.g., Dawson et al. 2002
Map misspecification can have serious negative consequences for gene-mapping studies. Incorrect physical maps can complicate the late stages of positional cloning efforts, whereas incorrect genetic maps can complicate initial linkage analyses. Our analysis involved all available markers in the Marshfield (Broman et al. 1998
One potentially easy way to verify that the genetic map used for a particular linkage study is reliable is to assess the linkages between the markers empirically using the family data at hand (i.e., to assess linkage among the markers themselves, using programs such as CRI-MAP or ASPEX, rather than linkage between the markers and a putative trait-influencing locus). One can then compare the assumed map with the map derived from the data. Although published genetic maps, such as the Marshfield map (188 meioses in three-generation pedigrees) and the deCODE map (1257 meioses in two-generation families), might be based on more meiotic events than are available in a given linkage study, such an exercise is well worth the effort, as it can reveal problems other than incorrect map positions, such as genotyping errors.
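Dedicated programs such as CRI-MAP estimate inter-marker distances from likelihoods computed over whole pedigrees. As a much simpler sketch of the same idea, assuming phase-known meioses so that recombinants between two adjacent markers can be counted directly, the estimated recombination fraction can be converted to a map distance and compared with the published interval. The function names are hypothetical, and which map function to apply is an assumption here; both the Haldane and Kosambi functions are shown.

```python
import math


def recombination_fraction(recombinant, informative):
    """Proportion of informative (phase-known) meioses showing a crossover."""
    if informative <= 0:
        raise ValueError("need at least one informative meiosis")
    return recombinant / informative


def haldane_cm(theta):
    """Haldane map function: d = -0.5 * ln(1 - 2*theta) Morgans, returned in cM."""
    if not 0 <= theta < 0.5:
        raise ValueError("theta must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * theta)


def kosambi_cm(theta):
    """Kosambi map function: d = 0.25 * ln((1 + 2*theta) / (1 - 2*theta)) Morgans, in cM."""
    if not 0 <= theta < 0.5:
        raise ValueError("theta must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * theta) / (1.0 - 2.0 * theta))
```

For example, 3 recombinants among 100 informative meioses give θ = 0.03, which is about 3.00 cM under Kosambi and about 3.09 cM under Haldane. A study-specific estimate far from the interval on the assumed map would flag the pair for closer inspection of marker order or genotypes; with few meioses, the sampling error of θ should be kept in mind before drawing conclusions.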
In pursuing empirical studies of map reliability with a given data set, it may be worthwhile to consider the use of sex-specific recombination rates. It is well known that males and females differ greatly in recombination rates (Broman et al. 1998
Discrepancies between genetic and physical maps will likely diminish as more and more polymorphic loci are mapped. In addition, further genetic linkage analyses of DNA markers (e.g., those used in genetic mapping studies) will provide greater confidence in marker order and interlocus distances. The genetic map that resulted from our own efforts uses physical (sequence) information to interpolate the positions of markers that are not available on either the Marshfield or the deCODE map, and as such is likely to be more comprehensive and accurate than either map alone. Also, as the Haplotype Map Initiative unfolds, greater insight into recombination hot spots will emerge that can be further reconciled with different sorts of maps.
One final note on the issue of genetic and physical maps concerns their ubiquity. It is assumed that genetic maps capture recombination rates that are somewhat universal in that chromosomes are organized and recombined in roughly the same way among individuals. This assumed ubiquity is, in fact, what motivates researchers pursuing linkage analyses to use publicly available genetic maps in the first place. In addition, the potential ubiquity of the "block" structure of the human genome, which is motivating the human Haplotype Map Initiative, also assumes that, for example, genomic recombination hot spots, chromosomal sites for heavy gene conversion, and mutation hot spots are universal (see, e.g., Phillips et al. 2003
Integration of Genetic and Physical Map Information
To construct an "integrated map" including the newest physical positions for microsatellite markers used in the Marshfield (Broman et al. 1998
Constructing a New "Comprehensive Genetic Map"
This work was supported in part by the following NIH grants: the NHLBI Family Blood Pressure Program (FBPP; HL64777-01); the NHLBI hypertension SCOR program (HL54998); the NIH Pharmacogenetics Network (HL69758-01); and The Consortium on the Genetics of Schizophrenia (COGS; MH06557-01A1). The authors thank Tiffany Greenwood, John Kelsoe, and Daniel O'Connor for critical discussions of this work. The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.1475304. Article published online before print in May 2004.
3 Corresponding author. [Supplemental material is available online at www.genome.org and http://elcapitan.ucsd.edu/hyper/.]
Abecasis, G.R., Cherny, S.S., Cookson, W.O., and Cardon, L.R. 2002. MERLIN: Rapid analysis of dense genetic maps using sparse gene flow trees. Nat. Genet. 30: 97-101.
Almasy, L. and Blangero, J. 1998. Multipoint quantitative-trait linkage analysis in general pedigrees. Am. J. Hum. Genet. 62: 1198-1211.
Broman, K.W., Murray, J.C., Sheffield, V.C., White, R.L., and Weber, J.L. 1998. Comprehensive human genetic maps: Individual and sex-specific variation in recombination. Am. J. Hum. Genet. 63: 861-869.
Cullen, M., Perfetto, S.P., Klitz, W., Nelson, G., and Carrington, M. 2002. High-resolution patterns of meiotic recombination across the human major histocompatibility complex. Am. J. Hum. Genet. 71: 759-776.
Daw, E.W., Thompson, E.A., and Wijsman, E.M. 2000. Bias in multipoint linkage analysis arising from map misspecification. Genet. Epidemiol. 19: 366-380.
Dawson, E., Abecasis, G.R., Bumpstead, S., Chen, Y., Hunt, S., Beare, D.M., Pabial, J., Dibling, T., Tinsley, E., Kirby, S., et al. 2002. A first-generation linkage disequilibrium map of human chromosome 22. Nature 418: 544-548.
DeWan, A.T., Parrado, A.R., Matise, T.C., and Leal, S.M. 2002. The map problem: A comparison of genetic and sequence-based physical maps. Am. J. Hum. Genet. 70: 101-107.
Gabriel, S.B., Schaffner, S.F., Nguyen, H., Moore, J.M., Roy, J., Blumenstiel, B., Higgins, J., DeFelice, M., Lochner, A., Faggart, M., et al. 2002. The structure of haplotype blocks in the human genome. Science 296: 2225-2229.
Goring, H.H. and Terwilliger, J.D. 2000. Linkage analysis in the presence of errors III: Marker loci and their map as nuisance parameters. Am. J. Hum. Genet. 66: 1298-1309.
Hackett, C.A. and Broadfoot, L.B. 2003. Effects of genotyping errors, missing values and segregation distortion in molecular marker data on the construction of linkage maps. Heredity 90: 33-38.
Kong, A., Gudbjartsson, D.F., Sainz, J., Jonsdottir, G.M., Gudjonsson, S.A., Richardsson, B., Sigurdardottir, S., Barnard, J., Hallbeck, B., Masson, G., et al. 2002. A high-resolution recombination map of the human genome. Nat. Genet. 31: 241-247.
Kruglyak, L., Daly, M.J., Reeve-Daly, M.P., and Lander, E.S. 1996. Parametric and nonparametric linkage analysis: A unified multipoint approach. Am. J. Hum. Genet. 58: 1347-1363.
Lander, E.S. and Weinberg, R.A. 2000. Genomics: Journey to the center of biology. Science 287: 1777-1782.
Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., et al. 2001. Initial sequencing and analysis of the human genome. Nature 409: 860-921.
Lynn, A., Koehler, K.E., Judis, L., Chan, E.R., Cherry, J.P., Schwartz, S., Seftel, A., Hunt, P.A., and Hassold, T.J. 2002. Covariation of synaptonemal complex length and mammalian meiotic exchange rates. Science 296: 2222-2225.
Matise, T.C., Porter, C.J., Buyske, S., Cuttichia, A.J., Sulman, E.P., and White, P.S. 2002. Systematic evaluation of map quality: Human chromosome 22. Am. J. Hum. Genet. 70: 1398-1410.
Ott, J. 1999. Analysis of human genetic linkage. The Johns Hopkins University Press, Baltimore, MD.
Phillips, M.S., Lawrence, R., Sachidanandam, R., Morris, A.P., Balding, D.J., Donaldson, M.A., Studebaker, J.F., Ankener, W.M., Alfisi, S.V., Kuo, F.S., et al. 2003. Chromosome-wide distribution of haplotype blocks and the role of recombination hot spots. Nat. Genet. 33: 382-387.
Schork, N.J. and Greenwood, T.A. 2004. Inherent bias toward the null hypothesis in conventional multipoint nonparametric linkage analysis. Am. J. Hum. Genet. 74: 306-316.
Tapper, W.J., Morton, N.E., Dunham, I., Ke, X., and Collins, A. 2001. A sequence-based integrated map of chromosome 22. Genome Res. 11: 1290-1295.
Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W., Mural, R.J., Sutton, G.G., Smith, H.O., Yandell, M., Evans, C.A., Holt, R.A., et al. 2001. The sequence of the human genome. Science 291: 1304-1351.
Yu, A., Zhao, C., Fan, Y., Jang, W., Mungall, A.J., Deloukas, P., Olsen, A., Doggett, N.A., Ghebranious, N., Broman, K.W., et al. 2001. Comparison of human genetic and sequence-based physical maps. Nature 409: 951-953.
http://cedar.genetics.soton.ac.uk/public_html/LDB2000; The New Genetic Location Database.
http://elcapitan.ucsd.edu/hyper/; NIH hypertension PPG home page.
http://genome.ucsc.edu/; UCSC Genome Bioinformatics Home.
http://myscience.appliedbiosystems.com/; Celera Discovery System.
http://research.marshfieldclinic.org/genetics/; Welcome to the Center for Medical Genetics.
http://www.agre.org; Autism Genetic Resource Exchange.
Received April 28, 2003;
accepted in revised format January 14, 2004.