Published online before print
October 5, 2007, 10.1101/gr.6861907 Genome Res. 17:1665-1674, 2007 ©2007 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/07 $5.00
Methods
PennCNV: An integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data
1 Department of Genetics, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA; 2 Department of Biostatistics, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA; 3 Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA; 4 Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA
Comprehensive identification and cataloging of copy number variations (CNVs) is required to provide a complete view of human genetic variation. The resolution of CNV detection in previous experimental designs has been limited to tens or hundreds of kilobases. Here we present PennCNV, a hidden Markov model (HMM) based approach, for kilobase-resolution detection of CNVs from Illumina high-density SNP genotyping data. This algorithm incorporates multiple sources of information, including total signal intensity and allelic intensity ratio at each SNP marker, the distance between neighboring SNPs, the allele frequency of SNPs, and the pedigree information where available. We applied PennCNV to genotyping data generated for 112 HapMap individuals; on average, we detected 27 CNVs for each individual with a median size of 12 kb. Excluding common rearrangements in lymphoblastoid cell lines, the fraction of CNVs in offspring not detected in parents (CNV-NDPs) was 3.3%. Our results demonstrate the feasibility of whole-genome fine-mapping of CNVs via high-density SNP genotyping.
Copy number variation (CNV) refers to duplication or deletion of a segment of DNA sequence compared to a reference genome assembly. Several large-scale studies have reported the presence of copy number variation in humans, suggesting that CNVs may account for a significant proportion of human phenotypic variation, including disease susceptibility (Feuk et al. 2006 3 kb), permitting kilobase-resolution detection of CNVs.
Several technical advantages in the Illumina Infinium platform make it highly suitable for high-resolution CNV detection. The assay combines specific hybridization of genomic DNA to arrayed probes with allele-specific primer extension and signal amplification, thus achieving a high signal-to-noise ratio in genotype calling (Gunderson et al. 2005
Conventional methods for CNV identification on the Illumina platform involve examination of intensity signals (implemented in the LOH-plus module of the BeadStudio software), which identifies copy number changes by calculating the mode of B Allele Frequency for SNPs in a sliding window along the chromosome. While simple to implement, the sliding window approach has limited and relatively coarse boundary resolution for detected CNVs. A recently described algorithm, QuantiSNP, incorporates the log R Ratio and B Allele Frequency simultaneously in a hidden Markov model (HMM) framework (Colella et al. 2007
Here we present an integrated HMM algorithm, called "PennCNV," to detect CNVs with high resolution using the Illumina Infinium assay. To better reflect the distribution of the intensity data, we constructed accurate models for log R Ratio and B Allele Frequency and developed more realistic models for state transition between different copy number states. In addition, PennCNV incorporates the population allele frequency for each SNP and the distance between adjacent SNPs. Several studies have demonstrated the heritability of CNVs (Locke et al. 2006
The HMM modeling strategy
To develop a strategy for detection of CNVs using the Illumina Infinium high-density SNP genotyping platform (Peiffer et al. 2006
To exploit all available information for each SNP to its full potential, PennCNV incorporates several components into a hidden Markov model (HMM), including the LRR, the BAF, the distance between neighboring SNPs, and the population frequency of the B allele (Fig. 2). Both the LRR and BAF values can be displayed and exported from BeadStudio, provided that an appropriate clustering file with canonical cluster positions for each SNP is available. The distance between neighboring SNPs determines the probability of having a copy number state change between them. Each SNP has two alleles, referred to as the A and B alleles; we therefore use the term "population frequency of B allele" to differentiate it from the BAF term, which measures the allelic intensity ratio. The values for population frequency of B allele for all SNPs are compiled from a large set of individuals with mixed ethnic backgrounds and of normal phenotypes; the likelihood of the copy number genotypes for each copy number state is then determined.
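As an illustration of how inter-SNP distance could enter the model, the sketch below lets the probability of a state change grow with distance and saturate at a ceiling. The exponential form, the function name `change_probability`, and the values of `p_max` and `scale_bp` are assumptions made for this example only; they are not taken from the text above.

```python
import math


def change_probability(distance_bp, p_max=0.01, scale_bp=100_000):
    """Probability of a copy number state change between adjacent SNPs.

    Assumed form: p_max * (1 - exp(-d / scale_bp)). Both p_max and
    scale_bp are hypothetical values chosen for illustration.
    """
    if distance_bp < 0:
        raise ValueError("distance must be non-negative")
    return p_max * (1.0 - math.exp(-distance_bp / scale_bp))


# Closely spaced SNPs are unlikely to sit in different states;
# widely spaced SNPs approach the ceiling p_max.
for d in (1_000, 10_000, 100_000, 1_000_000):
    print(d, round(change_probability(d), 6))
```

Under this kind of parameterization, two SNPs a few hundred base pairs apart are strongly encouraged to share a copy number state, while SNPs separated by a large gap are treated as nearly independent.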
Since the majority of CNVs in offspring should be inherited from either parent (Locke et al. 2006
Comparative analysis of CNV detection on HapMap individuals
Multiple genome-wide studies using array-CGH have shown that chromosome rearrangements tend to occur in genomic regions exhibiting segmental duplication (Iafrate et al. 2004
To assess the performance of PennCNV, we next compared the CNV calls in the Illumina HumanHap550 data with those published in a recent study that examined the global variations of CNVs using HapMap individuals on two different platforms: the Whole Genome TilePath array (WGTP) and the Affymetrix 500K Early Access array (500K_EA) (Redon et al. 2006
The use of family information in CNV calling and validation
We believe that the vast majority of CNVs in offspring are inherited from parents (Locke et al. 2006 Firstly, with a strict criterion, we examined whether given CNVs in offspring could be detected in their parents with identical boundaries and found that 41.0% (for WGTP), 88.0% (for Affymetrix 500K_EA), and 47.4% (for Illumina HumanHap550) of the CNV calls are not inherited from parents, that is, they are CNV-NDPs. This criterion favors the WGTP platform because it has substantially fewer probes. We next applied a relaxed evaluation criterion, requiring that more than half of the base pairs in the offspring CNV overlap with a parental CNV or vice versa. With this criterion, 27.1%, 20.4%, and 25.2% of offspring CNVs from the WGTP, 500K_EA, and HumanHap550 platforms are CNV-NDPs, respectively. Our comparative analysis indicates that false-positive or false-negative calls are highly prevalent in CNV detection algorithms regardless of platform or evaluation criteria and underscores the importance of using Mendelian inheritance for validating CNV calling results and for accurate detection of CNVs.
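The strict and relaxed criteria can be made concrete with a short sketch. It assumes that each CNV is represented as a closed `(start, end)` interval in base pairs, that all intervals lie on the same chromosome, and that the overlap is evaluated against one parental CNV at a time; the function names are hypothetical and copy number type is ignored for brevity.

```python
def overlap_bp(a, b):
    """Number of base pairs shared by two closed intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def length_bp(cnv):
    """Length of a closed interval in base pairs."""
    return cnv[1] - cnv[0] + 1


def inherited_strict(child, parental_cnvs):
    """Strict criterion: a parental CNV has identical boundaries."""
    return any(child == p for p in parental_cnvs)


def inherited_relaxed(child, parental_cnvs):
    """Relaxed criterion: more than half of the offspring CNV overlaps
    a parental CNV, or more than half of the parental CNV overlaps
    the offspring CNV."""
    for p in parental_cnvs:
        shared = overlap_bp(child, p)
        if shared > 0.5 * length_bp(child) or shared > 0.5 * length_bp(p):
            return True
    return False


child = (1_000, 2_000)
parents = [(1_400, 3_000)]
print(inherited_strict(child, parents))   # boundaries differ
print(inherited_relaxed(child, parents))  # 601 of 1001 bp are shared
```

An offspring CNV for which neither function returns true would be counted as a CNV-NDP under the corresponding criterion.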
The PennCNV algorithm applied to data from the Illumina HumanHap550 platform allows detection of a large number of small-scale CNVs (median size: 13 kb, in comparison to 204 kb for the WGTP platform and 81 kb for the 500K_EA platform). To assess the effect of CNV length on calling accuracy, we analyzed a subset of larger CNVs, those containing >10 SNPs (median size: 69 kb), detected by the PennCNV algorithm. We found that 17.7% of offspring CNVs are CNV-NDPs with the relaxed criterion, indicating that CNV-NDPs are mainly small-size CNVs. In addition, half of the CNV-NDPs actually fall within immunoglobulin regions (see below), thus
We next examined the performance of PennCNV by incorporating family information into the calling algorithm (Table 2). After using family information, the total number of CNV calls increased for HapMap CEU + YRI offspring (from 624 to 752) and for parents (from 1393 to 1619), indicating more sensitive CNV detection. In addition, 8.4% of offspring CNVs are CNV-NDPs using the strict criterion, while 4.3% of offspring CNVs are CNV-NDPs using the relaxed criterion, indicating a significant reduction of CNV-NDPs after application of family information (Supplemental Fig. 5). Assuming that the vast majority of offspring CNVs are inherited from parents, we can use family-based CNV calls as a reference set to give an indirect estimate of the false-positive rate and sensitivity of PennCNV in the absence of family data: 618 out of 624 offspring CNVs detected without the use of family information are also detected by family-based PennCNV, indicating a false-positive rate of 1.0% and a sensitivity of 82.2%. Similarly, using parental CNV data, we estimate that the false-positive rate is 0.2% and the sensitivity is 86%. We caution that these are indirect measures of algorithm performance and may be biased by the underlying assumption.
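The indirect false-positive rate and sensitivity quoted for offspring follow directly from the three counts given above (624 calls without family information, 618 of them confirmed by family-based calling, and 752 family-based calls in total). The short calculation below reproduces that arithmetic; the function name is our own.

```python
def indirect_rates(n_without_family, n_shared, n_with_family):
    """Treat family-based calls as the reference set.

    False-positive rate: calls made without family information that the
    family-based calling does not confirm, as a fraction of those calls.
    Sensitivity: confirmed calls as a fraction of all family-based calls.
    """
    false_positive_rate = (n_without_family - n_shared) / n_without_family
    sensitivity = n_shared / n_with_family
    return false_positive_rate, sensitivity


fp, sens = indirect_rates(n_without_family=624, n_shared=618, n_with_family=752)
print(f"false-positive rate: {fp:.1%}")  # 6 / 624
print(f"sensitivity: {sens:.1%}")        # 618 / 752
```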
Overall our analysis indicates that the use of family information significantly improves the sensitivity of CNV detection and reduces CNV-NDPs. To examine whether our results from the HapMap individuals would apply to other study cohorts, we analyzed 40 additional trios from another ongoing study (AGRE cohort). Similar to the results on the HapMap cohort, the use of family information leads to a 24% increase of the number of CNV calls in offspring, and a 22% increase of CNV calls in parents. After using family information, the fraction of CNV-NDPs decreases from 55% to 10.1% using the strict criterion and decreases from 36% to 5.8% using the relaxed criterion. Comparing CNV calls generated with and without family information by PennCNV, we estimate that the false-positive rate is 0.8% and the sensitivity is 81.1%. Therefore, results from analysis of the AGRE cohort are in concordance with those of the HapMap individuals.
The use of family information in CNV characterization
Family information can also be used to extract more biological knowledge from detected CNVs, such as inferring the parental origin of predicted de novo CNVs. To illustrate this, consider a scenario in which the father and mother genotypes at a SNP marker are AA and AB, respectively, and the PennCNV algorithm identifies a de novo deletion in the offspring encompassing this SNP. If the offspring genotype call is BB (or if the B Allele Frequency indicates that the actual genotype is B when the genotype call is "No Call"), we can infer that the de novo event happened on the paternal chromosome. Similarly, when the father, mother, and offspring genotypes are AA, BB, and AA, respectively, we can infer that the de novo event happened on the maternal chromosome. We illustrate this idea using a de novo CNV (located at 3p26, with 50 SNPs encompassing 97 kb) detected by the family-based PennCNV algorithm in the AGRE cohort (Supplemental Table 2). By manually examining the B Allele Frequency values for the 50 SNPs within the CNV region in all family members (13 SNPs are informative for this analysis), we were able to unambiguously determine that the de novo event occurred on the paternal chromosome. In addition, the fact that 13/50 SNPs show Mendelian inconsistency and that all 13 SNPs support the paternal origin of the de novo event provides an additional level of validation for the predicted de novo CNV.
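The reasoning for a de novo deletion can be written as a small rule: the single allele remaining in the offspring must have come from one parent, so the deletion lies on the chromosome inherited from the other parent. The sketch below applies that rule per SNP and tallies the informative markers. Genotypes are assumed to be two-letter strings over A and B, the offspring is assumed to appear homozygous within the deletion, and the function name and example genotypes are hypothetical.

```python
from collections import Counter


def deletion_origin(father, mother, child):
    """Infer on which parental chromosome a de novo deletion occurred.

    Returns "paternal", "maternal", or None when the SNP is
    uninformative or inconsistent with a hemizygous deletion.
    """
    child_alleles = set(child)
    if len(child_alleles) != 1:
        return None  # a heterozygous call is not expected inside a deletion
    allele = child_alleles.pop()
    from_father = allele in father
    from_mother = allele in mother
    if from_mother and not from_father:
        return "paternal"  # surviving allele is maternal
    if from_father and not from_mother:
        return "maternal"  # surviving allele is paternal
    return None


# Hypothetical trio genotypes (father, mother, child) at four SNPs.
snps = [("AA", "AB", "BB"), ("AA", "BB", "AA"), ("AB", "AB", "AA"), ("BB", "AA", "AA")]
tally = Counter(deletion_origin(f, m, c) for f, m, c in snps)
print(tally)
```

In a real region, agreement among all informative SNPs (as with the 13 SNPs described above) is what makes the inferred origin convincing; a mixed tally would instead suggest a miscalled CNV or genotyping error.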
Identification of CNV breakpoints
We developed an HMM-based algorithm for kilobase-resolution detection of CNVs using whole-genome SNP genotyping data. Comparison with previously published CNV calls generated on the same HapMap individuals indicates that our algorithm is capable of identifying fine-scale genetic structure of CNVs with a median size of 12 kb, which is an order of magnitude smaller than previous experimental studies but concurs with several in silico studies (Conrad et al. 2006
The key to the performance of a CNV-calling algorithm is the ability to exploit all sources of available information to their full potential. Compared to the BeadStudio LOH-plus algorithm (Illumina) and the QuantiSNP algorithm (Colella et al. 2007 Although the PennCNV algorithm was developed specifically for data generated on the Illumina Infinium platform, it could be extended to other similar SNP genotyping platforms. There are several unique features of the Illumina data processing procedure, including the use of a group of reference samples (rather than a single reference sample) for SNP-specific signal adjustments and the use of "B Allele Frequency" for allelic intensity ratio calculation. These treatments reduce the variances of signal measures across SNPs, and make different markers more comparable to each other. In addition, these treatments also allow detection and modeling of various CNV events, such as heterosomic chromosome deletions and copy-neutral LOH. Therefore, when allele-specific signal data from a large group of reference samples are available for other genotyping platforms, it is desirable to generate similar measures as the Illumina platform, which can then be directly analyzed by PennCNV.
Our modeling procedure treats each SNP position as equally likely to be within a CNV region. However, different SNPs have different prior probability based on whether they are located within a common CNV region, thus these prior probabilities can be potentially used to improve the prediction algorithm. The prior probabilities for all SNPs can be estimated from a large set of reference samples and can then be used to construct SNP-specific state transition matrices. Alternatively, an improved algorithm can take into account the fact that some chromosomes have more CNVs than others, or that the centromeric and telomeric regions tend to have more CNVs (Nguyen et al. 2006
There are several limitations for interpreting CNV-calling data from Illumina high-density SNP genotyping arrays. These arrays were constructed using HapMap data and contain primarily tag SNPs (Steemers and Gunderson 2007 In conclusion, our study demonstrates the feasibility of genome-wide CNV fine-mapping via high-density SNP genotyping technology. With the accumulation of high-density SNP genotyping data on many more individuals, we are compiling a large set of common CNVs in the human genome across populations, and we plan to fine-map the breakpoints for many of them, especially those predicted to be functionally important. This collection of common CNVs would be essential in completing the map of human genetic variation and would greatly advance our basic understanding of the dynamic human genome.
Inference of log R Ratio (LRR) and B Allele Frequency (BAF)
For each SNP, its two alleles are referred to as the A and B alleles using a set of specific naming rules (see http://www.illumina.com/downloads/TopBot_TechNote.pdf). The raw signal intensity values measured for the A and B alleles are then subject to a five-step normalization procedure using the signal intensity of all SNPs (see Illumina white paper at https://icom.illumina.com/icom/software.ilmn). This procedure produces the X and Y values for each SNP, representing the experiment-wide normalized signal intensity on the A and B alleles, respectively. Two additional measures are then calculated for each SNP, where R = X + Y refers to the total signal intensity, and θ = arctan(Y/X)/(π/2) refers to the relative allelic signal intensity ratio.
As a normalized measure of total signal intensity, the log R Ratio (LRR) value for each SNP is then calculated as LRR = log2(Robserved/Rexpected), where Rexpected is computed from linear interpolation of canonical genotype clusters (Peiffer et al. 2006
θAA, θAB, and θBB are the θ values for three canonical genotype clusters generated from a large set of reference samples. The transformation from θ to BAF values adjusts for different chemical characteristics of each SNP so that values for different SNPs are more comparable to each other.
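The quantities in this section can be computed in a few lines. In the sketch below, R, θ, and LRR follow the definitions given in the text. The BAF function is a piecewise-linear interpolation between the canonical cluster positions, which is our reading of the approach attributed to Peiffer et al. (2006) rather than a formula shown in this text; the cluster positions used in the example are hypothetical, and the expected intensity is passed in directly instead of being interpolated from cluster data.

```python
import math


def r_and_theta(x, y):
    """Total intensity R = X + Y and theta = arctan(Y/X) / (pi/2)."""
    # atan2 equals arctan(y/x) for non-negative intensities and handles x == 0.
    return x + y, math.atan2(y, x) / (math.pi / 2)


def log_r_ratio(r_observed, r_expected):
    """LRR = log2(R_observed / R_expected)."""
    return math.log2(r_observed / r_expected)


def baf_from_theta(theta, theta_aa, theta_ab, theta_bb):
    """Assumed piecewise-linear mapping from theta to BAF in [0, 1]."""
    if theta < theta_aa:
        return 0.0
    if theta < theta_ab:
        return 0.5 * (theta - theta_aa) / (theta_ab - theta_aa)
    if theta < theta_bb:
        return 0.5 + 0.5 * (theta - theta_ab) / (theta_bb - theta_ab)
    return 1.0


# Hypothetical SNP with canonical clusters at theta = 0.05, 0.50, 0.95.
r, theta = r_and_theta(x=1.0, y=1.0)
print(r, theta)
print(baf_from_theta(theta, 0.05, 0.50, 0.95))
print(log_r_ratio(r_observed=1.0, r_expected=2.0))
```

With equal A and B intensities, θ sits at the heterozygous cluster and the BAF is 0.5; an observed intensity at half the expected value gives an LRR of -1.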
Hidden Markov model for CNV detection
Let {ri, bi, zi} denote the log R ratio, B allele frequency, and copy number state at SNP i (1
Hidden copy number states
Emission probability of log R ratio
φ(·; µr,z, σr,z) is the density function of a normal distribution with mean µr,z and standard deviation σr,z. Here the uniform distribution is used to model both random fluctuation of signal measures in chemical assays and the possible genome misannotation and misassembly.
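A minimal sketch of such an emission model is a two-component mixture: a small uniform component that absorbs outliers, plus a normal component centered on the expected log R ratio of the hidden state. The exact form of the mixture, the weight `uniform_weight`, the LRR range, and the per-state means and standard deviations below are all assumptions made for illustration, not values from this text.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and s.d. sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def lrr_emission(r, mu, sigma, uniform_weight=0.01, lrr_range=(-5.0, 5.0)):
    """Assumed mixture: uniform background plus state-specific normal."""
    low, high = lrr_range
    uniform_density = 1.0 / (high - low) if low <= r <= high else 0.0
    return uniform_weight * uniform_density + (1.0 - uniform_weight) * normal_pdf(r, mu, sigma)


# Hypothetical (mean, s.d.) of the log R ratio for two copy number states.
states = {"one_copy": (-0.66, 0.28), "two_copies": (0.0, 0.16)}
for name, (mu, sigma) in states.items():
    print(name, lrr_emission(-0.6, mu, sigma))
```

The uniform component keeps the likelihood of an extreme value from collapsing to nearly zero under every state, so a single aberrant SNP does not force a spurious state change.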
Emission probability of B allele frequency
Specific treatment for chromosome X
Transition probabilities of hidden states
Parameter estimation and CNV calling
A posteriori CNV validation using family information
Let
Since we initially analyze the parents and the offspring separately, it is possible that they have different CNV boundaries (Supplemental Fig. 1). In this situation, we can partition the entire combined CNV region into several smaller blocks. For example, for the scenario in the second row and the second column in Supplemental Figure 1 that contains three blocks, the posterior probability of the trio state is
All CNVs used in this study are detected using the human May 2004 genome assembly as the reference genome assembly. The PennCNV software is available from http://www.neurogenome.org/cnv/penncnv. Several support programs for processing raw genotyping data and for functionally annotating CNVs are also included. The CNV calls are publicly available for downloading from the Web site. In addition, we provide custom-made tracks for visualizing CNVs in the UCSC Genome Browser.
We wish to thank the patients and their families who donated blood samples to the Children's Hospital of Philadelphia (CHOP), and acknowledge the technical staff at the Center for Applied Genomics at CHOP for producing the genotypes used for analyses. We also thank the Autism Genetic Resource Exchange (AGRE) Consortium6 and the participating AGRE families for the resources they provided. The Autism Genetic Resource Exchange is a program of Cure Autism Now and is supported, in part, by grant MH64547 from the National Institute of Mental Health to Daniel H. Geschwind (PI). We also thank Junhyong Kim for critical reading of the manuscript. This work is supported by a seed grant from the Penn/CHOP Center for Autism Research, by NIH grant R01 MH604687 and a NARSAD Distinguished Investigator Award to M.B., by the Pennsylvania Commonwealth HRFF, and by the Children's Hospital of Philadelphia.
5 Corresponding author.
E-mail bucan{at}pobox.upenn.edu; fax (215) 573-2041. [Supplemental material is available online at www.genome.org. The PennCNV software is available from http://www.neurogenome.org/cnv/penncnv.] Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.6861907
6 The AGRE Consortium: Dan Geschwind, UCLA, Los Angeles, CA; Maja Bucan, University of Pennsylvania, Philadelphia, PA; W. Ted Brown, N.Y.S. Institute for Basic Research in Developmental Disabilities, Staten Island, NY; Rita M. Cantor, UCLA School of Medicine, Los Angeles, CA; John N. Constantino, Washington University School of Medicine, St. Louis, MO; T. Conrad Gilliam, University of Chicago, Chicago, IL; Martha Herbert, Harvard Medical School, Boston, MA; Clara Lajonchere, Cure Autism Now/Autism Speaks, Los Angeles, CA; David H. Ledbetter, Emory University, Atlanta, GA; Christa Lese-Martin, Emory University, Atlanta, GA; Janet Miller, Cure Autism Now/Autism Speaks, Los Angeles, CA; Stanley F. Nelson, UCLA School of Medicine, Los Angeles, CA; Gerard D. Schellenberg, University of Washington, Seattle, WA; Carol A. Samango-Sprouse, George Washington University, Washington, DC; Sarah Spence, UCLA, Los Angeles, CA; Matthew State, Yale University, New Haven, CT; Rudolph E. Tanzi, Massachusetts General Hospital, Boston, MA.
Aardema, M.J., Crosby, L.L., Gibson, D.P., Kerckaert, G.A., and LeBoeuf, R.A. 1997. Aneuploidy and consistent structural chromosome changes associated with transformation of Syrian hamster embryo cells. Cancer Genet. Cytogenet. 96: 140–150.
Bailey, J.A., Yavor, A.M., Massa, H.F., Trask, B.J., and Eichler, E.E. 2001. Segmental duplications: Organization and impact within the current human genome project assembly. Genome Res. 11: 1005–1017.
Baum, L.E., Petrie, T., Soules, G., and Weiss, N. 1970. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann. Math. Statist. 41: 164–171.
Carter, N. 2007. Methods and strategies for analyzing copy number variation using DNA microarrays. Nat. Genet. 39: S16–S21.
Colella, S., Yau, C., Taylor, J.M., Mirza, G., Butler, H., Clouston, P., Bassett, A.S., Seller, A., Holmes, C.C., and Ragoussis, J. 2007. QuantiSNP: An objective Bayes Hidden-Markov Model to detect and accurately map copy number variation using SNP genotyping data. Nucleic Acids Res. 35: 2013–2025.
Conrad, D.F. and Hurles, M. 2007. The population genetics of structural variation. Nat. Genet. 39: S30–S36.
Conrad, D.F., Andrews, T.D., Carter, N.P., Hurles, M.E., and Pritchard, J.K. 2006. A high-resolution survey of deletion polymorphism in the human genome. Nat. Genet. 38: 75–81.
Eichler, E.E., Nickerson, D.A., Altshuler, D., Bowcock, A.M., Brooks, L.D., Carter, N.P., Church, D.M., Felsenfeld, A., Guyer, M., Lee, C., et al. 2007. Completing the map of human genetic variation. Nature 447: 161–165.
Feuk, L., Carson, A.R., and Scherer, S.W. 2006. Structural variation in the human genome. Nat. Rev. Genet. 7: 85–97.
Fiegler, H., Redon, R., Andrews, D., Scott, C., Andrews, R., Carder, C., Clark, R., Dovey, O., Ellis, P., Feuk, L., et al. 2006. Accurate and reliable high-throughput detection of copy number variation in the human genome. Genome Res. 16: 1566–1574.
Freeman, J.L., Perry, G.H., Feuk, L., Redon, R., McCarroll, S.A., Altshuler, D.M., Aburatani, H., Jones, K.W., Tyler-Smith, C., Hurles, M.E., et al. 2006. Copy number variation: New insights in genome diversity. Genome Res. 16: 949–961.
Geschwind, D.H., Sowinski, J., Lord, C., Iversen, P., Shestack, J., Jones, P., Ducat, L., and Spence, S.J. 2001. The Autism Genetic Resource Exchange: A resource for the study of autism and related neuropsychiatric conditions. Am. J. Hum. Genet. 69: 463–466.
Gunderson, K.L., Steemers, F.J., Lee, G., Mendoza, L.G., and Chee, M.S. 2005. A genome-wide scalable SNP genotyping assay using microarray technology. Nat. Genet. 37: 549–554.
Hinds, D.A., Kloek, A.P., Jen, M., Chen, X., and Frazer, K.A. 2006. Common deletions and SNPs are in linkage disequilibrium in the human genome. Nat. Genet. 38: 82–85.
Iafrate, A.J., Feuk, L., Rivera, M.N., Listewnik, M.L., Donahoe, P.K., Qi, Y., Scherer, S.W., and Lee, C. 2004. Detection of large-scale variation in the human genome. Nat. Genet. 36: 949–951.
Ishkanian, A.S., Malloff, C.A., Watson, S.K., DeLeeuw, R.J., Chi, B., Coe, B.P., Snijders, A., Albertson, D.G., Pinkel, D., Marra, M.A., et al. 2004. A tiling resolution DNA microarray with complete coverage of the human genome. Nat. Genet. 36: 299–303.
Kent, W.J. 2002. BLAT—The BLAST-like alignment tool. Genome Res. 12: 656–664.
Khaja, R., Zhang, J., MacDonald, J.R., He, Y., Joseph-George, A.M., Wei, J., Rafiq, M.A., Qian, C., Shago, M., Pantano, L., et al. 2006. Genome assembly comparison identifies structural variants in the human genome. Nat. Genet. 38: 1413–1418.
Kuhn, R.M., Karolchik, D., Zweig, A.S., Trumbower, H., Thomas, D.J., Thakkapallayil, A., Sugnet, C.W., Stanke, M., Smith, K.E., Siepel, A., et al. 2007. The UCSC Genome Browser Database: Update 2007. Nucleic Acids Res. 35: D668–D673.
Locke, D.P., Sharp, A.J., McCarroll, S.A., McGrath, S.D., Newman, T.L., Cheng, Z., Schwartz, S., Albertson, D.G., Pinkel, D., Altshuler, D.M., et al. 2006. Linkage disequilibrium and heritability of copy-number polymorphisms within duplicated regions of the human genome. Am. J. Hum. Genet. 79: 275–290.
Marioni, J.C., Thorne, N.P., and Tavare, S. 2006. BioHMM: A heterogeneous hidden Markov model for segmenting array CGH data. Bioinformatics 22: 1144–1146.
McCarroll, S.A. and Altshuler, D. 2007. Copy-number variation and association studies of human disease. Nat. Genet. 39: S37–S42.
McCarroll, S.A., Hadnott, T.N., Perry, G.H., Sabeti, P.C., Zody, M.C., Barrett, J.C., Dallaire, S., Gabriel, S.B., Lee, C., Daly, M.J., et al. 2006. Common deletion polymorphisms in the human genome. Nat. Genet. 38: 86–92.
Mills, R.E., Luttig, C.T., Larkins, C.E., Beauchamp, A., Tsui, C., Pittard, W.S., and Devine, S.E. 2006. An initial map of insertion and deletion (INDEL) variation in the human genome. Genome Res. 16: 1182–1190.
Nguyen, D.Q., Webber, C., and Ponting, C.P. 2006. Bias of selection on human copy-number variants. PLoS Genet. 2: e20. doi: 10.1371/journal.pgen.0020020.
Peiffer, D.A., Le, J.M., Steemers, F.J., Chang, W., Jenniges, T., Garcia, F., Haden, K., Li, J., Shaw, C.A., Belmont, J., et al. 2006. High-resolution genomic profiling of chromosomal aberrations using Infinium whole-genome genotyping. Genome Res. 16: 1136–1148.
Redon, R., Ishikawa, S., Fitch, K.R., Feuk, L., Perry, G.H., Andrews, T.D., Fiegler, H., Shapero, M.H., Carson, A.R., Chen, W., et al. 2006. Global variation in copy number in the human genome. Nature 444: 444–454.
Risin, S., Hopwood, V.L., and Pathak, S. 1992. Trisomy 12 in Epstein-Barr virus-transformed lymphoblastoid cell lines of normal individuals and patients with nonhematologic malignancies. Cancer Genet. Cytogenet. 60: 164–169.
Risin, S., Fujimaki, T., Mestriner, C.A., Brown, N.M., Hopwood, V.L., Fidler, I.J., and Pathak, S. 1993. Clonal expansion of cells with trisomy of chromosomes 12 and X in an EBV-transformed lymphoblastoid cell line and establishment of a tumorigenic monoclonal cell line (48,XX,+X,+12). Cytogenet. Cell Genet. 62: 54–55.
Scherer, S.W., Lee, C., Birney, E., Altshuler, D., Eichler, E.E., Carter, N., Hurles, M., and Feuk, L. 2007. Challenges and standards in integrating surveys of structural variation. Nat. Genet. 39: S7–S15.
Sebat, J., Lakshmi, B., Troge, J., Alexander, J., Young, J., Lundin, P., Maner, S., Massa, H., Walker, M., Chi, M., et al. 2004. Large-scale copy number polymorphism in the human genome. Science 305: 525–528.
Sharp, A.J., Locke, D.P., McGrath, S.D., Cheng, Z., Bailey, J.A., Vallente, R.U., Pertz, L.M., Clark, R.A., Schwartz, S., Segraves, R., et al. 2005. Segmental duplications and copy-number variation in the human genome. Am. J. Hum. Genet. 77: 78–88.
Simon-Sanchez, J., Scholz, S., Fung, H.C., Matarin, M., Hernandez, D., Gibbs, J.R., Britton, A., de Vrieze, F.W., Peckham, E., Gwinn-Hardy, K., et al. 2007. Genome-wide SNP assay reveals structural genomic variation, extended homozygosity and cell-line induced alterations in normal individuals. Hum. Mol. Genet. 16: 1–14.
Steemers, F.J. and Gunderson, K.L. 2007. Whole genome genotyping technologies on the BeadArray platform. Biotechnol. J. 2: 41–49.
Tuzun, E., Sharp, A.J., Bailey, J.A., Kaul, R., Morrison, V.A., Pertz, L.M., Haugen, E., Hayden, H., Albertson, D., Pinkel, D., et al. 2005. Fine-scale structural variation of the human genome. Nat. Genet. 37: 727–732.
Viterbi, A.J. 1967. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans. Inf. Theory 13: 260–269.
Wong, K.K., deLeeuw, R.J., Dosanjh, N.S., Kimm, L.R., Cheng, Z., Horsman, D.E., MacAulay, C., Ng, R.T., Brown, C.J., Eichler, E.E., et al. 2007. A comprehensive analysis of common copy-number variations in the human genome. Am. J. Hum. Genet. 80: 91–104.
Received June 29, 2007; accepted in revised format September 5, 2007.