Published online before print
January 31, 2007, 10.1101/gr.5686107 Genome Res. 17:368-376, 2007 ©2007 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/07 $5.00
Methods
High-resolution copy number analysis of paraffin-embedded archival tissue using SNP BeadArrays
1 Department of Pathology, Leiden University Medical Center, 2333 ZA Leiden, The Netherlands; 2 Department of Medical Statistics, Leiden University Medical Center, 2333 ZA Leiden, The Netherlands; 3 Department of Molecular Cell Biology, Leiden University Medical Center, 2333 ZA Leiden, The Netherlands; 4 Complex Genetics Section, DBG-Department of Medical Genetics, University Medical Centre, 3508 AB Utrecht, The Netherlands
High-density SNP microarrays provide insight into the genomic events that occur in diseases like cancer through their capability to measure both LOH and genomic copy numbers. Where currently available methods are restricted to the use of fresh frozen tissue, we now describe the design and validation of copy number measurements using the Illumina BeadArray platform and the application of this technique to formalin-fixed, paraffin-embedded (FFPE) tissue. In fresh frozen tissue from a set of colorectal tumors with numerous chromosomal aberrations, our method measures copy number patterns that are comparable to values from established platforms, like Affymetrix GeneChip and BAC array-CGH. Moreover, paired comparisons of fresh frozen and FFPE tissues showed nearly identical patterns of genomic change. We conclude that this method enables the use of paraffin-embedded material for research into both LOH and numerical chromosomal abnormalities. These findings make the large pathological archives available for genomic analysis, which could be especially relevant for hereditary disease where fresh material from affected relatives is rarely available.
Genomic copy number variations (CNVs) and allelic imbalances are common characteristics of cancer and other diseases (Rajagopalan and Lengauer 2004
Genome-wide SNP array CNV and LOH profiles have been reported for two different SNP typing platforms: Affymetrix GeneChip arrays and Illumina BeadArrays (Oliphant et al. 2002 In this study, we have developed a method to measure DNA copy numbers from FFPE tumors on Illumina BeadArrays and compared the outcome to copy number profiles from fresh frozen tumors. Tumors from different hospitals were included, from which both normal and tumor FFPE tissue, fresh frozen tumor, and normal leukocyte DNA were available. We determined reliability and reproducibility for all types of tissue and compared copy number patterns from fresh frozen tumor with FFPE tumor.
For the reliable detection of regions with CNVs, accurate normalization algorithms are essential to identify only real aberrations. For GeneChips, several algorithms have been reported (Lieberfarb et al. 2003 We show here that the signal intensity values for BeadArrays can be used to create reliable copy number profiles from FFPE colorectal tumors with very high reproducibility between experiments, high concordance with frozen tissue from the same tumor, and a high degree of agreement to other methods.
Copy number calculations
Genotypes from BeadArrays were computed from allele-specific signal intensities (Fan et al. 2003
Channel properties
Background correction
Background correction is often an essential step in data processing. Since the scanning software does not provide direct estimates of background intensity for each measurement, we tested three types of background estimation based on observations of the signal properties. First, the background intensity can be estimated as the minimal signal intensity in a channel. Second, the first mode of the intensity histogram can be used. In samples without CNVs, SNPs have three possible states per allele: zero, one, or two copies. For probes that have one or two copies in the sample under investigation, the variability of the measured intensity is determined by the PCR, hybridization, the measurement properties of that probe, and noise. For probes that have zero copies, the variability is only determined by noise. A simulation of this model, with Gaussian distributions for probe properties and noise, is shown in Figure 1B. The distribution of the zero alleles shows a narrow, distinguishable peak, implying that the first mode of the signal can be used as an estimation of the background signal.
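The zero/one/two-copy model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the simulation behind Figure 1B: the background level, probe gain, and noise parameters are invented for the example, and `first_mode_background` is a hypothetical helper that takes the first sufficiently high local peak of a lightly smoothed histogram.

```python
import numpy as np

def first_mode_background(intensities, bins=200, min_fraction=0.5):
    """Return the center of the first (lowest-intensity) histogram peak.

    A bin counts as a peak when its smoothed count is a local maximum and
    reaches at least `min_fraction` of the highest smoothed count.
    """
    counts, edges = np.histogram(np.asarray(intensities, dtype=float), bins=bins)
    smooth = np.convolve(counts, np.ones(3) / 3.0, mode="same")  # 3-bin moving average
    threshold = min_fraction * smooth.max()
    for i in range(len(smooth)):
        left = smooth[i - 1] if i > 0 else -np.inf
        right = smooth[i + 1] if i < len(smooth) - 1 else -np.inf
        if smooth[i] >= threshold and smooth[i] >= left and smooth[i] >= right:
            return 0.5 * (edges[i] + edges[i + 1])
    raise ValueError("no histogram peak found")

# Simulated single channel; all parameter values below are assumptions.
rng = np.random.default_rng(0)
n = 6000
true_background = 300.0
copies = rng.integers(0, 3, size=n)        # 0, 1, or 2 copies of the allele
gain = rng.normal(2000.0, 400.0, size=n)   # probe-specific signal per copy
noise = rng.normal(0.0, 30.0, size=n)      # measurement noise
intensity = true_background + copies * gain + noise

estimate = first_mode_background(intensity)
print("first-mode estimate:", round(estimate, 1))
print("minimum-intensity estimate:", round(float(intensity.min()), 1))
```

In this toy setting the zero-copy alleles vary only through noise, so they form the narrow first peak, while the minimum-intensity estimator sits in the lower tail of that peak.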
The third approach is based on the observation that the population of homozygous SNPs is slightly slanted inward on a scatter plot (Fig. 1C); the signal intensity of absent alleles is higher at higher intensity of the present alleles. This effect could be due to crosstalk or spectral overlap between the fluorescent dyes. In order to correct for this, we chose to convert the green and red intensities for each SNP into polar coordinates and to use the angle value of the two peaks adjacent to the quadrant boundaries at 0 and
Within-array sample normalization
Between-array locus normalization
Selection criteria for best settings
Figure 2 shows the effects of the normalization settings on the amplitude and variability of the signal. Background subtraction (BG) shows a clear increase in the amplitude of the signal, especially for the "mode" method. However, all methods substantially increase variability (e.g., the mode method increases the amplitude but nearly doubles the standard deviation).
Selecting only heterozygous loci (GT) for use as an invariant set improves the normalization, since the resulting copy number for unaffected regions in tumor samples is close to 2, with low variability. Quantile normalization (Qnt) between the channels of a sample decreases variability with little or no impact on the other goals. Selection of high-quality heterozygous loci (pGCS) improves normalization when the cutoff is around the 80th percentile. At that point the amplitude of the signal for affected regions is the highest, with only a small effect on variability. Consequently, we chose the following settings for pre-processing and normalization (Table 1): (1) use Qnt between the red and green signals for each sample; (2) apply no background estimation and correction; (3) use only the top 20% of heterozygous loci in each sample on the rGCS scale.
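Two of the chosen settings can be illustrated with a short Python sketch. It assumes that Qnt denotes quantile normalization between the two channels of one sample and that the "top 20%" is taken at the 80th percentile of rGCS among heterozygous loci; the function names and toy values are hypothetical and do not reproduce the scripts distributed with the paper.

```python
import numpy as np

def quantile_normalize_channels(red, green):
    """Give both channels the same intensity distribution.

    The k-th smallest value in each channel is replaced by the mean of the
    k-th smallest red and k-th smallest green value; ranks are preserved.
    """
    red = np.asarray(red, dtype=float)
    green = np.asarray(green, dtype=float)
    order_r = np.argsort(red, kind="stable")
    order_g = np.argsort(green, kind="stable")
    reference = 0.5 * (red[order_r] + green[order_g])
    red_n = np.empty_like(red)
    green_n = np.empty_like(green)
    red_n[order_r] = reference
    green_n[order_g] = reference
    return red_n, green_n

def invariant_set(is_het, rgcs, percentile=80.0):
    """Flag heterozygous loci whose rGCS is at or above the given percentile
    of the heterozygous loci in this sample (assumed reading of 'top 20%')."""
    is_het = np.asarray(is_het, dtype=bool)
    rgcs = np.asarray(rgcs, dtype=float)
    cutoff = np.percentile(rgcs[is_het], percentile)
    return is_het & (rgcs >= cutoff)

# Toy data (invented values)
red_n, green_n = quantile_normalize_channels([100, 400, 200, 300], [10, 40, 30, 20])
selected = invariant_set([True, True, True, True, True, False],
                         [0.1, 0.2, 0.3, 0.4, 0.5, 0.99])
print(red_n, green_n, selected)
```

No background term appears in the sketch, in line with setting (2). The homozygous locus with the highest rGCS is not selected, because only heterozygous loci are eligible for the invariant set.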
Validation
We validated our processing strategy by evaluating sex chromosomes, comparing the results to other methods, and assessing the reproducibility of samples in different experiments.
Performance with known CNV
Comparison to other methods
The basic pattern of CNVs is comparable between these platforms, with a correlation of >0.9 between the same tumors on different platforms and low correlations between the experimental samples (Table 3). Moreover, the visual resemblance between the smoothed signals from these methods is remarkable (Fig. 3).
Reproducibility
The set of samples was hybridized twice to separate Illumina Sentrix arrays. Despite the far lower intensities in one of these arrays (3900 vs. 820), the data show a very good concordance (Table 3).
Combined analysis of LOH and copy number
Comparison of CNV in frozen and FFPE tumor tissue
The variability of the unsmoothed copy numbers at different levels of CNV was comparable between frozen tissue and FFPE samples (Table 2). The correlation between the variable chromosomes in FFPE and frozen samples from the same patient was lower than that between different methods with frozen samples or between replicates (Table 3). This was mainly due to sample T44. When this tumor was excluded, 50% of the values were >0.96. In order to test the origin of the differences, we also performed BAC arrays on the FFPE-extracted DNA. There was insufficient material left to process T108, but the patterns of CNVs in the other three samples, and especially the differences between frozen and FFPE material, were comparable between BAC arrays and BeadArrays (Fig. 4D). Also, for each tumor, chromosomes can be selected that show perfect concordance, while other chromosomes perform less well (Fig. 4C,E). The average absolute distances between frozen and FFPE samples from both normal and affected chromosomes show excellent concordance.
Application in FFPE tissue
We have applied this method to a series of 22 colorectal tumors that have been stored in the paraffin archive for up to 10 yr. Several characteristic patterns of LOH and CNVs can be identified in this series (Fig. 5) (Diep et al. 2006
We have developed a method to determine CNVs using an Illumina BeadArray in combination with the GoldenGate assay and validated it using established copy number methodologies. The technical properties of the GoldenGate assay allow the analysis of partly degraded DNA, and we have shown that copy number analysis of paraffin embedded tissue shows comparable results to the analysis of fresh frozen tissue.
Copy number analysis using SNP microarrays enables the combined analysis of CNV and LOH in one assay (Bignell et al. 2004 Normalization procedures usually assume that the cell population is diploid or aneuploid when no further information is available. The DNA index can be used to improve estimation of chromosomal copy numbers in cases of aneuploid or multiploid tumors. For the GoldenGate assay, this is not immediately feasible because the four linkage panels each contain a different set of chromosomes and any CNV will not usually be distributed evenly among the chromosomes. To cope with this situation, linkage panels with SNPs distributed evenly across the genome are required. The absence of a single chromosome, e.g., the X-chromosome in males, shows a signal of 1.5 rather than the theoretically expected one. This reduced linearity is also found for BAC arrays and GeneChips and is probably a consequence of the complexity of the process. However, it is also the case that the design of the BeadArrays is optimized to discriminate heterozygous from homozygous loci, rather than measure copy number. Consequently, the allele-specific PCR likely approaches saturation, leading to reduced linearity. Reducing the number of PCR cycles could potentially improve the linearity, although the effect of this remains to be tested. In general, calculated copy numbers from SNP and BAC arrays show too much variability to assign discrete copy number values to individual probes. Usually, analysis methods use information from flanking SNPs to calculate copy number under the assumption that genomic events are not restricted to single SNPs. The effect is that a smoothed signal is calculated along the physical position on the genome. The extent of smoothing has an effect on the spatial resolution of the measurement. For noisy data, stronger smoothing is required, thus increasing the minimum size of detectable CNVs.
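The trade-off between smoothing strength and spatial resolution can be made concrete with a small Python sketch. Note that it uses a plain running median as a stand-in, which is not the quantile smoother used in this study; the window sizes and the noise-free toy profile are assumptions chosen only to show the effect.

```python
import numpy as np

def running_median(values, window):
    """Running median with an odd window; the ends are padded with edge values."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    values = np.asarray(values, dtype=float)
    half = window // 2
    padded = np.pad(values, half, mode="edge")
    return np.array([np.median(padded[i:i + window]) for i in range(len(values))])

# Toy profile: diploid background with a gain spanning three adjacent probes
copy_number = np.full(30, 2.0)
copy_number[10:13] = 3.0

light = running_median(copy_number, window=3)    # weak smoothing keeps the gain
heavy = running_median(copy_number, window=11)   # strong smoothing removes it
print(light.max(), heavy.max())
```

With the wider window the three-probe gain never forms a majority inside any window, so it disappears from the smoothed signal. This is the sense in which noisier data, which need stronger smoothing, raise the minimum size of a detectable CNV.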
Of the various smoothing algorithms (Fridlyand et al. 2004
The comparison of copy number analysis from fresh frozen tissue and FFPE tissue from the same tumor showed varying degrees of similarity. Three of the tested tumors showed a median correlation at the same level as comparisons between the methods in the validation section. The limited concordance between frozen and FFPE samples from the fourth tumor could likely be explained by tumor heterogeneity (Fukunari et al. 2003
Taking into account that the PCR amplification protocol was essentially the same for both types of tissue and that the variability of the signal is comparable for both, these findings show that the Illumina GoldenGate array can reliably determine copy number changes from FFPE tissue. A previous study (Thompson et al. 2005
We have previously shown that BeadArrays can be used to reliably genotype and detect LOH on FFPE tissue (Lips et al. 2005
Subjects/material
Colorectal tumor tissue that was known to have genomic aberrations and corresponding normal tissue from four patients was used, following medical ethical guidelines. Ploidy status of the tumors was previously assessed by flow cytometry. A pathologist (H. Morreau) assessed the normal and tumor areas and the percentage of tumor cells based on H&E slides. The samples included a rectal adenoma (T514, 60% tumor, aneuploid), one right-sided Dukes B (T44, 50% tumor, aneuploid), and two Dukes C carcinomas (T106, 90% tumor, multiploid; T108, 80% tumor, multiploid). From the departmental FFPE archives, we collected 22 colorectal carcinomas, for which normal DNA from either leukocytes or histologically normal FFPE tissue was also available.
DNA isolation
Array platforms
Affymetrix GeneChip Mapping 10K Xba1 2.0 arrays (Affymetrix) contain 10,204 markers with a mean intermarker distance of 258 kb.
BAC array slides were produced at the LUMC department of Molecular Cell Biology. This platform contains 3700 probes spotted in triplicate and uses 1-Mb-spaced BACs distributed by the Wellcome Trust Sanger Institute (Knijnenburg et al. 2005
Data analysis
For BeadArrays, genotype calls were extracted using the genotype-calling program GenCall version 6.0.7 (Illumina). The software provides two quality scores per locus, an experiment-wide GenTrain score (GTS) and a sample-specific GenCall score (GCS). From these, we computed the relative GenCall score (rGCS) as GCS/GTS. In order to retrieve intensity measurements, the Settings.xml file for the BeadArray software has to be adapted: the line <SaveTextFiles>false</SaveTextFiles> has to be changed to <SaveTextFiles>true</SaveTextFiles>. Samples were excluded when the raw median intensity in either of the channels was <1250. Normalization procedures for Illumina arrays are discussed in the Results section. There are a number of methods available to calculate copy numbers from Affymetrix SNP arrays. We have evaluated CNAT, CNAG, and dChip. From these methods, we chose dChip version 1.3 because of its performance with regard to variability and amplitude of the signal for changed chromosomes.
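The two quality rules in this paragraph, the rGCS ratio (GCS/GTS) and the exclusion of samples with a raw median intensity below 1250 in either channel, can be written down as a brief Python sketch. The function names, the handling of a zero GTS, and the example values are assumptions made for illustration; they are not taken from the GenCall software or from the published scripts.

```python
import numpy as np

def relative_call_score(gcs, gts):
    """rGCS = GCS / GTS per locus; loci with GTS <= 0 are set to NaN (assumed)."""
    gcs = np.asarray(gcs, dtype=float)
    gts = np.asarray(gts, dtype=float)
    out = np.full(gcs.shape, np.nan)
    np.divide(gcs, gts, out=out, where=gts > 0)
    return out

def sample_passes_qc(red, green, min_median=1250.0):
    """Keep a sample only if the raw median intensity of both channels
    reaches the threshold."""
    return bool(min(np.median(red), np.median(green)) >= min_median)

# Invented example values
rgcs = relative_call_score([0.8, 0.45, 0.3], [0.8, 0.9, 0.0])
keep_low = sample_passes_qc([2000, 2100, 1900], [1000, 1300, 1200])
keep_ok = sample_passes_qc([2000, 2100, 1900], [1300, 1400, 1500])
print(rgcs, keep_low, keep_ok)
```

In the first sample the green channel has a median of 1200, so the sample is dropped even though the red channel is well above the threshold.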
The scanned images for the BAC arrays were processed using GenePix 4.1 software. The BioConductor package Limma (Smyth and Speed 2003 LOH was determined by comparing the genotypes from frozen tumor tissue and blood leukocytes from the same patient. LOH was called for stretches of two or more SNPs within 500,000 bp that were heterozygous in normal tissue and homozygous in tumor tissue.
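One possible reading of the LOH rule is sketched below in Python. Several details are assumptions, since the text does not spell them out: SNPs come from a single chromosome sorted by position, genotypes are coded as "AA", "AB", or "BB", "within 500,000 bp" is taken as the maximum distance between neighboring SNPs in a stretch, and a SNP that remains heterozygous in the tumor ends the stretch. The function `call_loh` is hypothetical.

```python
def call_loh(positions, normal_gt, tumor_gt, min_snps=2, max_gap=500_000):
    """Return (start, end) positions of LOH stretches on one chromosome."""
    homozygous = {"AA", "BB"}
    regions = []
    run = []  # positions of the current stretch of informative SNPs showing loss

    def flush():
        if len(run) >= min_snps:
            regions.append((run[0], run[-1]))
        run.clear()

    for pos, normal, tumor in zip(positions, normal_gt, tumor_gt):
        if normal != "AB":
            continue                  # not informative: normal is not heterozygous
        if tumor in homozygous:
            if run and pos - run[-1] > max_gap:
                flush()               # too far from the previous SNP of the stretch
            run.append(pos)
        elif tumor == "AB":
            flush()                   # heterozygosity retained: stretch ends
        # any other tumor value (e.g., a no-call) is skipped

    flush()
    return regions

# Invented example
positions = [100_000, 300_000, 450_000, 1_500_000, 1_600_000, 1_700_000, 3_000_000]
normal = ["AB", "AA", "AB", "AB", "AB", "AB", "AB"]
tumor = ["AA", "AA", "BB", "AA", "AB", "BB", "AA"]
loh_regions = call_loh(positions, normal, tumor)
print(loh_regions)
```

In this example only the first two informative SNPs form a stretch; the later homozygous calls are either isolated or separated by a retained heterozygous SNP.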
Basic properties of the methods, such as average and variability, were calculated from the normalized copy numbers. Comparisons between methods and samples were calculated from binned, smoothed copy numbers with a bin size of 2500 kb. A genomic smoother was used as described in Eilers and de Menezes (2005). In order to promote the concept of reproducible research, the R-scripts to create the figures and tables from the raw data are bundled together with the data sets.
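Because the platforms interrogate different genomic positions, binning puts their copy numbers on a common grid before comparison. The Python sketch below shows one way this could be done; taking the mean per 2500-kb bin and using the Pearson correlation are assumptions, as are the function names and toy values, and the code is not the bundled R implementation.

```python
import numpy as np

def bin_copy_numbers(positions, copy_numbers, n_bins, bin_size=2_500_000):
    """Mean copy number per fixed-width bin on one chromosome; empty bins are NaN."""
    bins = np.asarray(positions, dtype=np.int64) // bin_size
    values = np.asarray(copy_numbers, dtype=float)
    sums = np.bincount(bins, weights=values, minlength=n_bins)
    counts = np.bincount(bins, minlength=n_bins)
    means = np.full(n_bins, np.nan)
    filled = counts > 0
    means[filled] = sums[filled] / counts[filled]
    return means

def binned_correlation(a, b):
    """Pearson correlation over bins that are covered in both profiles."""
    ok = ~np.isnan(a) & ~np.isnan(b)
    return float(np.corrcoef(a[ok], b[ok])[0, 1])

# Two platforms with different probe positions on the same chromosome (invented)
binned_a = bin_copy_numbers([0, 1_000_000, 3_000_000, 6_000_000],
                            [2.0, 2.0, 3.0, 1.0], n_bins=3)
binned_b = bin_copy_numbers([500_000, 2_600_000, 4_000_000, 7_000_000],
                            [2.1, 2.9, 3.1, 1.2], n_bins=3)
r = binned_correlation(binned_a, binned_b)
print(binned_a, binned_b, r)
```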
We thank A. Middeldorp and M. van Puijenbroek for discussions, J.W.F. Dierssen for providing colorectal tumor samples, and R. van 't Slot for sample and array processing.
5 Corresponding author.
E-mail j.oosting{at}lumc.nl; fax 31-71-5248158. [Supplemental material is available online at www.genome.org. The R-package beadarraySNP used to perform the analysis is available from http://www.bioconductor.org. The data sets are available from the Gene Expression Omnibus with accession number GSE5347 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE5347).] Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.5686107.
Bignell, G.R., Huang, J., Greshock, J., Watt, S., Butler, A., West, S., Grigorova, M., Jones, K.W., Wei, W., and Stratton, M.R., et al. 2004. High-resolution analysis of DNA copy number using oligonucleotide microarrays. Genome Res. 14: 287-295.
De Jong, A.E., van Puijenbroek, M., Hendriks, Y., Tops, C., Wijnen, J., Ausems, M.G., Meijers-Heijboer, H., Wagner, A., Van Os, T.A., and Brocker-Vriends, A.H., et al. 2004. Microsatellite instability, immunohistochemistry, and additional PMS2 staining in suspected hereditary nonpolyposis colorectal cancer. Clin. Cancer Res. 10: 972-980.
Diep, C.B., Kleivi, K., Ribeiro, F.R., Teixeira, M.R., Lindgjaerde, O.C., and Lothe, R.A. 2006. The order of genetic events associated with colorectal cancer progression inferred from meta-analysis of copy number changes. Genes Chromosomes Cancer 45: 31-41.
Dumur, C.I., Dechsukhum, C., Ware, J.L., Cofield, S.S., Best, A.M., Wilkinson, D.S., Garrett, C.T., and Ferreira-Gonzalez, A. 2003. Genome-wide detection of LOH in prostate cancer using human SNP microarray technology. Genomics 81: 260-269.
Eilers, P.H. and de Menezes, R.X. 2005. Quantile smoothing of array CGH data. Bioinformatics 21: 1146-1153.
Fan, J.B., Oliphant, A., Shen, R., Kermani, B.G., Garcia, F., Gunderson, K.L., Hansen, M., Steemers, F., Butler, S.L., and Deloukas, P., et al. 2003. Highly parallel SNP genotyping. Cold Spring Harb. Symp. Quant. Biol. 68: 69-78.
Fridlyand, J., Snijders, A.M., Pinkel, D., Albertson, D.G., and Jain, A.N. 2004. Hidden Markov models approach to the analysis of array CGH data. J. Multivariate Anal. 90: 132-153.
Fukunari, H., Iwama, T., Sugihara, K., and Miyaki, M. 2003. Intratumoral heterogeneity of genetic changes in primary colorectal carcinomas with metastasis. Surg. Today 33: 408-413.
Garraway, L.A., Widlund, H.R., Rubin, M.A., Getz, G., Berger, A.J., Ramaswamy, S., Beroukhim, R., Milner, D.A., Granter, S.R., and Du, J., et al. 2005. Integrative genomic analyses identify MITF as a lineage survival oncogene amplified in malignant melanoma. Nature 436: 117-122.
Herr, A., Grutzmann, R., Matthaei, A., Artelt, J., Schrock, E., Rump, A., and Pilarsky, C. 2005. High-resolution analysis of chromosomal imbalances using the Affymetrix 10K SNP genotyping chip. Genomics 85: 392-400.
Irving, J.A., Bloodworth, L., Bown, N.P., Case, M.C., Hogarth, L.A., and Hall, A.G. 2005. Loss of heterozygosity in childhood acute lymphoblastic leukemia detected by genome-wide microarray single nucleotide polymorphism analysis. Cancer Res. 65: 3053-3058.
Ishikawa, S., Komura, D., Tsuji, S., Nishimura, K., Yamamoto, S., Panda, B., Huang, J., Fukayama, M., Jones, K.W., and Aburatani, H. 2005. Allelic dosage analysis with genotyping microarrays. Biochem. Biophys. Res. Commun. 333: 1309-1314.
Janne, P.A., Li, C., Zhao, X., Girard, L., Chen, T.H., Minna, J., Christiani, D.C., Johnson, B.E., and Meyerson, M. 2004. High-resolution single-nucleotide polymorphism array and clustering analysis of loss of heterozygosity in human lung cancer cell lines. Oncogene 23: 2716-2726.
Jong, K., Marchiori, E., Meijer, G., Vaart, A.V., and Ylstra, B. 2004. Breakpoint identification and smoothing of array comparative genomic hybridization data. Bioinformatics 20: 3636-3637.
Kallioniemi, O.P., Kallioniemi, A., Sudar, D., Rutovitz, D., Gray, J.W., Waldman, F., and Pinkel, D. 1993. Comparative genomic hybridization: A rapid new method for detecting and mapping DNA amplification in tumors. Semin. Cancer Biol. 4: 41-46.
Kennedy, G.C., Matsuzaki, H., Dong, S., Liu, W.M., Huang, J., Liu, G., Su, X., Cao, M., Chen, W., and Zhang, J., et al. 2003. Large-scale genotyping of complex DNA. Nat. Biotechnol. 21: 1233-1237.
Knijnenburg, J., Szuhai, K., Giltay, J., Molenaar, L., Sloos, W., Poot, M., Tanke, H.J., and Rosenberg, C. 2005. Insights from genomic microarrays into structural chromosome rearrangements. Am. J. Med. Genet. A 132: 36-40.
Lai, W.R., Johnson, M.D., Kucherlapati, R., and Park, P.J. 2005. Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data. Bioinformatics 21: 3763-3770.
Lieberfarb, M.E., Lin, M., Lechpammer, M., Li, C., Tanenbaum, D.M., Febbo, P.G., Wright, R.L., Shim, J., Kantoff, P.W., and Loda, M., et al. 2003. Genome-wide loss of heterozygosity analysis from laser capture microdissected prostate cancer using single nucleotide polymorphic allele (SNP) arrays and a novel bioinformatics platform dChipSNP. Cancer Res. 63: 4781-4785.
Lin, M., Wei, L.J., Sellers, W.R., Lieberfarb, M., Wong, W.H., and Li, C. 2004. dChipSNP: Significance curve and clustering of SNP-array-based loss-of-heterozygosity data. Bioinformatics 20: 1233-1240.
Lindblad-Toh, K., Tanenbaum, D.M., Daly, M.J., Winchester, E., Lui, W.O., Villapakkam, A., Stanton, S.E., Larsson, C., Hudson, T.J., and Johnson, B.E., et al. 2000. Loss-of-heterozygosity analysis of small-cell lung carcinomas using single-nucleotide polymorphism arrays. Nat. Biotechnol. 18: 1001-1005.
Lips, E.H., Dierssen, J.W.F., van Eijk, R., Oosting, J., Eilers, P.H., Tollenaar, R.A., de Graaff, E.J., Wijmenga, C., van 't Slot, R., and Morreau, H., et al. 2005. Reliable high-throughput genotyping and loss of heterozygosity detection in formalin-fixed paraffin-embedded tumors using single nucleotide polymorphism arrays. Cancer Res. 65: 10188-10191.
Mao, X., Barfoot, R., Hamoudi, R.A., Easton, D.F., Flanagan, A.M., and Stratton, M.R. 1999. Allelotype of uterine leiomyomas. Cancer Genet. Cytogenet. 114: 89-95.
Matsuzaki, H., Loi, H., Dong, S., Tsai, Y.Y., Fang, J., Law, J., Di, X., Liu, W.M., Yang, G., and Liu, G., et al. 2004. Parallel genotyping of over 10,000 SNPs using a one-primer assay on a high-density oligonucleotide array. Genome Res. 14: 414-425.
Miller, S.A., Dykes, D.D., and Polesky, H.F. 1988. A simple salting out procedure for extracting DNA from human nucleated cells. Nucleic Acids Res. 16: 1215.
Nannya, Y., Sanada, M., Nakazaki, K., Hosoya, N., Wang, L., Hangaishi, A., Kurokawa, M., Chiba, S., Bailey, D.K., and Kennedy, G.C., et al. 2005. A robust algorithm for copy number detection using high-density oligonucleotide single nucleotide polymorphism genotyping arrays. Cancer Res. 65: 6071-6079.
Oliphant, A., Barker, D.L., Stuelpnagel, J.R., and Chee, M.S. 2002. BeadArray technology: Enabling an accurate, cost-effective approach to high-throughput genotyping. Biotechniques 56 (Suppl 8): 6061.
Pinkel, D. and Albertson, D.G. 2005. Comparative genomic hybridization. Annu. Rev. Genomics Hum. Genet. 6: 331-354.
Pinkel, D., Segraves, R., Sudar, D., Clark, S., Poole, I., Kowbel, D., Collins, C., Kuo, W.L., Chen, C., and Zhai, Y., et al. 1998. High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nat. Genet. 20: 207-211.
Pollack, J.R., Perou, C.M., Alizadeh, A.A., Eisen, M.B., Pergamenschikov, A., Williams, C.F., Jeffrey, S.S., Botstein, D., and Brown, P.O. 1999. Genome-wide analysis of DNA copy-number changes using cDNA microarrays. Nat. Genet. 23: 41-46.
Primdahl, H., Wikman, F.P., von der Masse, H., Zhou, X.G., Wolf, H., and Orntoft, T.F. 2002. Allelic imbalances in human bladder cancer: Genome-wide detection with high-density single-nucleotide polymorphism arrays. J. Natl. Cancer Inst. 94: 216-223.
Rajagopalan, H. and Lengauer, C. 2004. Aneuploidy and cancer. Nature 432: 338-341.
Shen, R., Fan, J.B., Campbell, D., Chang, W., Chen, J., Doucet, D., Yeakley, J., Bibikova, M., Wickham, G.E., and McBride, C., et al. 2005. High-throughput SNP genotyping on universal bead arrays. Mutat. Res. 573: 70-82.
Smyth, G.K. and Speed, T. 2003. Normalization of cDNA microarray data. Methods 31: 265-273.
Thompson, E.R., Herbert, S.C., Forrest, S.M., and Campbell, I.G. 2005. Whole genome SNP arrays using DNA derived from formalin-fixed, paraffin-embedded ovarian tumor tissue. Hum. Mutat. 26: 384-389.
Tomlinson, I., Rahman, N., Frayling, I., Mangion, J., Barfoot, R., Hamoudi, R., Seal, S., Northover, J., Thomas, H.J., and Neale, K., et al. 1999. Inherited susceptibility to colorectal adenomas and carcinomas: Evidence for a new predisposition gene on 15q14-q22. Gastroenterology 116: 789-795.
Zhao, X., Li, C., Paez, J.G., Chin, K., Janne, P.A., Chen, T.H., Girard, L., Minna, J., Christiani, D., and Leo, C., et al. 2004. An integrated view of copy number and allelic alterations in the cancer genome using single nucleotide polymorphism arrays. Cancer Res. 64: 3060-3071.
Zhou, X., Li, C., Mok, S.C., Chen, Z., and Wong, D.T. 2004a. Whole genome loss of heterozygosity profiling on oral squamous cell carcinoma by high-density single nucleotide polymorphic allele (SNP) array. Cancer Genet. Cytogenet. 151: 82-84.
Zhou, X., Mok, S.C., Chen, Z., Li, Y., and Wong, D.T. 2004b. Concurrent analysis of loss of heterozygosity (LOH) and copy number abnormality (CNA) for oral premalignancy progression using the Affymetrix 10K SNP mapping array. Hum. Genet. 115: 327-330.
Received June 23, 2006; accepted in revised form November 29, 2006.