Genome Res. 14:1130-1136, 2004
©2004 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/04 $5.00
Methods
Automatic Identification of Subcellular Phenotypes on Human Cell Arrays
1 Intelligent Bioinformatics Systems, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
2 Department of Cell Biology/Biophysics, EMBL Heidelberg, 69117 Heidelberg, Germany
3 Gene Expression and Cell Biology/Biophysics Programmes, EMBL Heidelberg, 69117 Heidelberg, Germany
4 MetaSystems GmbH, 68804 Altlussheim, Germany
Light microscopic analysis of cell morphology provides a high-content readout of cell function and protein localization. Cell arrays and microwell transfection assays on cultured cells have made cell phenotype analysis accessible to high-throughput experiments. Both the localization of each protein in the proteome and the effect of RNAi knock-down of individual genes on cell morphology can be assayed by manual inspection of microscopic images. However, the use of morphological readouts for functional genomics requires fast and automatic identification of complex cellular phenotypes. Here, we present a fully automated platform for high-throughput cell phenotype screening combining human live cell arrays, screening microscopy, and machine-learning-based classification methods. Efficiency of this platform is demonstrated by classification of eleven subcellular patterns marked by GFP-tagged proteins. Our classification method can be adapted to virtually any microscopic assay based on cell morphology, opening a wide range of applications including large-scale RNAi screening in human cells.
Genomewide cDNA overexpression and gene knock-down by RNA interference enable gain and loss of function screens in many systems of cultured cells that were traditionally not accessible to genetic screens (e.g., Drosophila Schneider cells, human cell cultures). In addition to gain and loss of function, the subcellular localization of the whole proteome can be determined by expressing tagged proteins (Pepperkok et al. 2001). Such a manual approach is a source of bias in data analysis and causes a bottleneck for large-scale experiments. Automated systems for the interpretation of cell images from cell arrays would provide three important advantages over manual practice: (1) high-throughput performance; (2) quantitative and reproducible identification of cellular phenotypes; and therefore (3) consistent and unbiased phenotypic information in protein databases.
Automatic classification of microscopic images typically requires the extraction of quantitative parameters (features) from the digital image (Egmont-Petersen et al. 2002). Importantly, none of the existing studies on automatic classification of subcellular localization has used images that were captured automatically, as in high-throughput microscopy screens, which yield images of inherently lower quality. Manual selection, centering, focusing, and control of the illumination and detection parameters in the imaging process evidently lead to better classification results than in an automated imaging system. In this study, we present a fully automated system for the production and imaging of human cell arrays and, most importantly, an automated classification of phenotypes. As a case study, we set up an overexpression screen, in which we marked eleven different subcellular patterns by GFP-tagged proteins, each of them characterizing a distinct phenotype.
Workflow Concept
The focus of this study was to set up a framework for high-throughput cell phenotyping. To make this screen scalable to the high number of human genes, we based our screen on human cell arrays (Fig. 1). We generated a variety of phenotypes in a proof-of-principle study by an overexpression screen, in which we marked eleven different subcellular patterns by GFP-tagged proteins, each of them characterizing a distinct phenotype. To allow automation of cell array production, we developed an improved protocol for solid phase transfection by mixing all required transfection components prior to robotic spotting. Thus, transfection-ready DNA arrays on chambered coverglasses compatible with direct live cell observation can be printed, dried, and stored, and cells are simply seeded on the array one day prior to phenotype analysis.
For fully automated phenotype analysis, we designed an imaging strategy to automatically capture single-cell fluorescence images from entire live cell arrays with high resolution. We implemented our imaging strategy by adapting a commercial widefield fluorescence scanning microscope (Mehes et al. 2000). For enhanced detection of cells, an auto-focus algorithm is applied first to cell nuclei counterstained by Hoechst; then the GFP or YFP signal is automatically integrated and the auto-focus algorithm is executed again on the protein signal. Auto-focusing works reliably in cells growing as monolayers, making this method applicable to a wide variety of cultured mammalian cells. For this study we chose the MCF7 breast cancer cell line, but we have obtained similar results with HeLa cells (data not shown). A series of images is then acquired with variable integration times of the CCD camera, thereby allowing selection of cells with good signal-to-noise ratio from a field with cells at different expression levels. In the next step, objects potentially representing cell nuclei are segmented based on the counterstained signal. Finally, the system selects 'valid' cell nuclei out of the set of candidate objects based on morphological parameters (for details see Methods).
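The auto-focus step can be illustrated with a short sketch. The Methods state only that a focus criterion is computed for each captured plane and that the stage moves to the plane optimizing it; the criterion itself is not specified. The gradient-energy measure below, and the names `gradient_energy`, `best_focus_plane`, and `capture`, are therefore assumptions chosen for illustration, not the criterion used by the scanning microscope.

```python
from typing import Callable, Sequence

Image = Sequence[Sequence[float]]  # rows of gray levels


def gradient_energy(image: Image) -> float:
    """Assumed focus criterion: sum of squared differences between
    horizontally and vertically adjacent pixels (sharper = larger)."""
    total = 0.0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if x + 1 < len(row):
                total += (row[x + 1] - value) ** 2
            if y + 1 < len(image):
                total += (image[y + 1][x] - value) ** 2
    return total


def best_focus_plane(
    z_positions: Sequence[float],
    capture: Callable[[float], Image],
    criterion: Callable[[Image], float] = gradient_energy,
) -> float:
    """Capture one image per Z-position and return the Z-position
    whose image maximizes the focus criterion."""
    return max(z_positions, key=lambda z: criterion(capture(z)))


# Toy focus stack: the plane at z = 2.0 has the strongest local contrast.
stack = {
    0.0: [[5, 5, 5], [5, 5, 5], [5, 5, 5]],
    1.0: [[4, 5, 6], [5, 5, 5], [6, 5, 4]],
    2.0: [[0, 9, 0], [9, 0, 9], [0, 9, 0]],
}
best_z = best_focus_plane(sorted(stack), stack.__getitem__)
```

In the workflow described above, such a search would run twice per field: once on the Hoechst counterstain and once on the GFP or YFP signal.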
Subcellular Class Design With Additional Artifact Class
Designing Optimal Classification Schemes
Image classification is the critical next step in the analysis stream. To find the optimal method for classification of automatically captured images, we compared two well-known methods in machine learning, namely, Artificial Neural Networks (ANN; MacKay 1992
Since the optimal number of image features was typically between 20 and 25 features for the BayesANN approach (data not shown), we based the training of all classification methods on only 25 features for better comparison. The 25 best-ranked features of the STEPWISE or SAM selections contained predominantly texture-related features such as granularities and co-occurrence (Supplemental Table 2). Texture-related features thus appear to be at least as efficient as morphological object features for automatic classification of subcellular patterns. Further, they are more robust with respect to biological variation. Figure 3 summarizes the results from these classification algorithms. The ANN/GA performed worse than the other two methods (Fig. 3A) and was therefore not further considered in our study. BayesANN and SVM classification algorithms performed well in combination with the feature selection methods SAM and STEPWISE. The two best combinations of algorithms were SVM/STEPWISE and BayesANN/SAM, with accuracies of 80.5% and 82.2%, respectively (Fig. 3B; Supplemental Table 3). In contrast to ANNs, the training of SVM does not strongly depend on feature preselection (Ramaswamy et al. 2001).
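To make the notion of a co-occurrence texture feature concrete, the following sketch computes a normalized gray-level co-occurrence matrix and the standard contrast statistic derived from it (in the sense of Haralick 1979). The exact feature definitions used in this study are listed in supplemental material not reproduced here, so the displacement, the gray-level quantization, and the function names `cooccurrence` and `contrast` are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Image = Sequence[Sequence[int]]  # rows of quantized gray levels


def cooccurrence(image: Image, dx: int = 1, dy: int = 0) -> Dict[Tuple[int, int], float]:
    """Relative frequency of gray-level pairs (i, j) found at pixel
    positions separated by the displacement (dx, dy)."""
    counts: Counter = Counter()
    height = len(image)
    for y in range(height):
        for x in range(len(image[y])):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < len(image[ny]):
                counts[(image[y][x], image[ny][nx])] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def contrast(glcm: Dict[Tuple[int, int], float]) -> float:
    """Co-occurrence contrast: sum over pairs of p(i, j) * (i - j)^2.
    Zero for a uniform patch, large for fine high-contrast texture."""
    return sum(p * (i - j) ** 2 for (i, j), p in glcm.items())


uniform_contrast = contrast(cooccurrence([[3, 3], [3, 3]]))
checker_contrast = contrast(cooccurrence([[0, 1], [1, 0]]))
```

Because such statistics depend on the distribution of gray-level pairs rather than on the outline of a segmented object, they are plausible candidates for features that tolerate variation in cell shape.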
In the present study, we developed a fully automated workflow from cell array production to phenotype analysis. As a case study we chose to phenotype cells by identification of the subcellular localization of marker proteins that can be used as indicators for the cellular state. We achieved an overall prediction accuracy of more than 80% for eleven localization classes with our fully automated system. Accuracy was limited mainly by three problematic localizations that were difficult to distinguish at the resolution of our imaging system. The localization class classified least accurately in our study was endoplasmic reticulum (accuracy 31%), which was frequently misclassified as microtubules (accuracy 47%) or mitochondria (accuracy 62%). These misclassifications are in agreement with the visual similarities of the corresponding images (Fig. 2). All other automatically imaged subcellular patterns were recognized with accuracy between 74% and 95%. A key to achieving this degree of accuracy was to include all artifacts in one additional localization class. Despite its heterogeneity, the artifact class (Fig. 2) could be accurately distinguished from all other subcellular classes (accuracy 91%), but it received false positives from other classes. Importantly, the prediction within all other classes of subcellular localizations achieved a specificity per class of more than 97% (Supplemental Table 3). Thus, only a very small proportion of all images is assigned incorrectly to an individual class.
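The relation between per-class accuracy and per-class specificity can be sketched from a confusion matrix. The sketch assumes that per-class accuracy means the fraction of images of a class that are assigned to that class, and that specificity is the fraction of images of all other classes that are not assigned to it. The three-class matrix is a made-up toy example, not data from this study.

```python
from typing import Dict, List, Sequence


def per_class_metrics(
    confusion: Sequence[Sequence[int]], labels: Sequence[str]
) -> Dict[str, Dict[str, float]]:
    """confusion[i][j] = number of images of true class i predicted as class j."""
    total = sum(sum(row) for row in confusion)
    metrics: Dict[str, Dict[str, float]] = {}
    for k, label in enumerate(labels):
        true_pos = confusion[k][k]
        actual = sum(confusion[k])                  # images truly in class k
        predicted = sum(row[k] for row in confusion)  # images assigned to class k
        false_pos = predicted - true_pos
        true_neg = total - actual - false_pos
        metrics[label] = {
            "accuracy": true_pos / actual,
            "specificity": true_neg / (true_neg + false_pos),
        }
    return metrics


def overall_accuracy(confusion: Sequence[Sequence[int]]) -> float:
    """Fraction of all images assigned to their true class."""
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / sum(sum(row) for row in confusion)


# Toy example (hypothetical counts): rows are true classes, columns predictions.
toy_labels: List[str] = ["ER", "microtubules", "artifact"]
toy_confusion = [
    [6, 3, 1],
    [2, 7, 1],
    [0, 0, 10],
]
toy_metrics = per_class_metrics(toy_confusion, toy_labels)
```

In the toy matrix the artifact class is recognized perfectly yet still collects images from the other two classes, so its specificity is below 1. This mirrors the pattern reported above, in which the artifact class is accurate but receives false positives.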
Biological variation of the same phenotype in different cells remained the most challenging aspect of our fully automated workflow from cell array production to phenotype analysis. Because we chose transient overexpression of GFP-tagged cDNAs as a proof-of-principle application, this problem was probably more severe than in immunofluorescence screens based on endogenous proteins (Kiger et al. 2003).
To assess the classification ability of our system for different cDNA clones that were not used in the training step, we automatically captured 20 images of microtubule-related Ensconsin (Bulinski et al. 2001
The automatically captured images of endoplasmic reticulum and microtubules are in part very similar. The lack of sharp images with a low depth of field in this study appeared to be problematic for separating the fine tubular structure of microtubules from the membrane network of the endoplasmic reticulum. This problem can most likely be solved by using a cell line with a flatter morphology than the MCF7 cells employed in our study, in which ER and microtubules overlap significantly in the rounded-up cytoplasm. For subtle phenotypic differences we anticipate that a confocal screening microscope will improve accuracy in the future, and we have started work on such a system. The essential advance of this study is the integration of automated classification with automated microscopy and production of cell arrays, which can potentially be applied to large, genomewide cell arrays. Our system can also be easily extended to extract dynamic features from time-lapse screening studies (Gonczy et al. 2000).
Live Cell Arrays
The plasmid-gelatin-transfection solution was prepared in 384-well plates (Nunc) as follows: 500 ng of GFP or YFP tagged plasmid, 7.5 µL EC buffer and 0.75 µL Enhancer were incubated for 10 min at room temperature and then mixed with 2.5 µL Effectene (Effectene Transfection Kit, Qiagen) and again incubated for 10 min at room temperature in 7.25 µL of 0.08% Gelatin (G-9391, Sigma). The plasmid-gelatin-transfection solution was arrayed onto 1-well Labtek (Nunc) slides using the ChipWriter Compact robot (Biorad). The spot diameter was 400 µm for all experiments. After printing, 6.5 x 10^5 MCF7 cells were plated on the Labtek slide and cultured for 10 h in growth medium containing 10% inactivated fetal calf serum, 1% glutamine, and 1% penicillin-streptomycin. Thereafter, Hoechst stain (33342, Sigma, 1 µg/mL final concentration) was added for 10 min to stain cell nuclei. For live cell data acquisition the growth medium was replaced by imaging medium (DMEM without carbonate but supplemented with 30 mM Hepes, pH 7.4, obtained from Sigma). The transfection efficiency varied between 1% and 30% depending strongly on the cDNA to be transfected.
Automatic Image Acquisition
For auto-focusing, the stage is first moved down and then up to a number of focus planes to minimize the effects of mechanical focus drive backlash. At each position an image is captured. The number of planes and the distance between consecutive focus planes are defined in the parameter set of the imaging microscope. For each of the captured images a focus criterion is computed. The stage is then moved to the Z-position corresponding to the plane optimizing the focus criterion. After global or local background correction, the gray-level histogram of the counterstain channel image is computed and analyzed. Based on the maximum and minimum gray levels in the image and a threshold factor defined in the parameter set, the system calculates a global segmentation threshold. A fast contour-following algorithm is used to isolate the objects defined by this thresholding operation. Finally, the system accepts a candidate as a 'valid' cell nucleus if the object area is 50-500 µm², the concavity depth is <0.8, and the aspect ratio is <3.0.
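The thresholding and nucleus validation steps can be sketched as follows. The text states only that the global threshold is derived from the minimum and maximum gray levels and a threshold factor; the linear interpolation used in `global_threshold` is an assumed formula. The area limits assume that the range in the text reads 50 to 500 µm². The `Candidate` record and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Candidate:
    """Morphological parameters of one segmented object (hypothetical record)."""
    area_um2: float
    concavity_depth: float
    aspect_ratio: float


def global_threshold(image: Sequence[Sequence[float]], factor: float) -> float:
    """Assumed formula: place the threshold a fraction `factor` of the way
    from the minimum to the maximum gray level of the counterstain image."""
    levels = [value for row in image for value in row]
    low, high = min(levels), max(levels)
    return low + factor * (high - low)


def is_valid_nucleus(candidate: Candidate) -> bool:
    """Apply the three acceptance criteria quoted in the text."""
    return (
        50.0 <= candidate.area_um2 <= 500.0
        and candidate.concavity_depth < 0.8
        and candidate.aspect_ratio < 3.0
    )


threshold = global_threshold([[10, 20], [30, 110]], factor=0.25)
candidates = [
    Candidate(area_um2=120.0, concavity_depth=0.2, aspect_ratio=1.4),  # plausible nucleus
    Candidate(area_um2=600.0, concavity_depth=0.2, aspect_ratio=1.4),  # too large
    Candidate(area_um2=120.0, concavity_depth=0.9, aspect_ratio=1.4),  # too concave
    Candidate(area_um2=120.0, concavity_depth=0.2, aspect_ratio=3.0),  # too elongated
]
valid = [c for c in candidates if is_valid_nucleus(c)]
```

Filtering on area, concavity, and elongation is a cheap way to discard debris, touching nuclei, and strongly deformed objects before any protein-channel image is acquired.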
Only those valid cells are selected for imaging of subcellular localization. The GFP/YFP signal was acquired with three integration times (1x, 2x, and 3x the calculated integration time, respectively, with a maximum integration time of 2 sec). From the three resulting GFP/YFP images, the brightest unsaturated cell image (threshold for saturation: 4 saturated pixels; ROI of 148x148 pixels, centered on the center of the nucleus) was selected. Tiles are considered positive if total intensity >400 and relative center intensity >20. The relative center intensity is the ratio between minimum gray level within and maximum gray level outside of half of the radius of the cell. The scanning time for 150 spots imaged in this study was in the range of
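The multiple-exposure strategy can be sketched as below. Several details are assumptions made for illustration: each of the three integration times is clamped to the 2 sec maximum, brightness is measured as the summed gray level of the image, an image counts as saturated once it contains at least 4 pixels at the saturation level, and the saturation level defaults to 255 (8-bit). The function names are hypothetical.

```python
from typing import List, Optional, Sequence

Image = Sequence[Sequence[int]]


def integration_times(base_sec: float, max_sec: float = 2.0) -> List[float]:
    """1x, 2x, and 3x the calculated integration time, each clamped to max_sec."""
    return [min(factor * base_sec, max_sec) for factor in (1, 2, 3)]


def count_saturated(image: Image, saturation_level: int) -> int:
    """Number of pixels at or above the saturation level."""
    return sum(1 for row in image for value in row if value >= saturation_level)


def pick_exposure(
    images: Sequence[Image], saturation_level: int = 255, max_saturated: int = 4
) -> Optional[int]:
    """Index of the brightest image with fewer than max_saturated saturated
    pixels, or None if every image is saturated."""
    best_index: Optional[int] = None
    best_total = -1
    for index, image in enumerate(images):
        if count_saturated(image, saturation_level) >= max_saturated:
            continue  # reject saturated exposures
        total = sum(sum(row) for row in image)
        if total > best_total:
            best_index, best_total = index, total
    return best_index


# Toy 2x2 tiles standing in for the three exposures of one cell.
exposures = [
    [[10, 20], [30, 40]],        # 1x: dim
    [[20, 40], [60, 80]],        # 2x: brighter, unsaturated
    [[255, 255], [255, 255]],    # 3x: saturated
]
chosen = pick_exposure(exposures)
```

Acquiring a short exposure series and choosing afterwards avoids having to predict the expression level of each cell, which varies widely in transient transfection.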
Feature Generation and Classification
The Support Vector Machines (http://www.csie.ntu.edu.tw/
We thank Benedikt Brors and Daniel Gerlich for suggestions on the manuscript. We thank Carl Zeiss Inc. (Göttingen, Germany) for microscope support to the ALMF at EMBL. Stefan Wiemann and Annemarie Poustka have kindly provided some of the cDNA clones. This work was supported by grants from Federal Ministry of Education and Research (DHGP:01KW9937, NGFN:01GR0101, BioFuture: 0311880A) and Human Frontiers Science Program (RGP0031/2001-M). The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.2383804.
5 These two authors contributed equally to this work.
6 Corresponding author. [Supplemental material is available online at www.genome.org. The following individuals kindly provided reagents, samples, or unpublished information as indicated in the paper: S. Wiemann and A. Poustka.]
Bishop, C.M. 2000. Neural networks for pattern recognition. Oxford University Press, New York.
Bulinski, J.C., Odde, D.J., Howell, B.J., Salmon, T.D., and Waterman-Storer, C.M. 2001. Rapid dynamics of the microtubule binding of ensconsin in vivo. J. Cell. Sci. 114: 3885-3897.
Burhardt, H. and Siggelow, S. 2001. Invariant features in pattern recognition: fundamentals and application. In Nonlinear model-based image/video processing and analysis (eds. C. Kotropoulos and I. Pitas), pp. 269-307. John Wiley & Sons, New York.
Chang, T. and Kuo, C.C. 1993. Texture analysis and classification with tree-structured wavelet transform. IEEE Transactions on Image Processing 2: 429-441.
Chen, X., Velliste, M., Weinstein, S., Jarvik, J.W., and Murphy, R.F. 2003. Location proteomics: Building subcellular location tree from high resolution 3D fluorescence microscope images of randomly-tagged proteins. Manipulation and Analysis of Biomolecules, Cells, and Tissues, Proceedings of SPIE 4962: 298-306.
Egmont-Petersen, M., de Ridder, D., and Handels, H. 2002. Image processing with neural networks: A review. Pattern Recognition 35: 2279-2301.
Gonczy, P., Echeverri, C., Oegema, K., Coulson, A., Jones, S.J., Copley, R.R., Duperon, J., Oegema, J., Brehm, M., Cassin, E., et al. 2000. Functional genomic analysis of cell division in C. elegans using RNAi of genes on chromosome III. Nature 408: 331-336.
Haralick, R.M. 1979. Statistical and structural approaches to texture. Proceedings of the IEEE 67: 768-804.
Huang, K., Velliste, M., and Murphy, R.F. 2003. Feature reduction for improved recognition of subcellular location pattern in fluorescence microscope images. Manipulation and Analysis of Biomolecules, Cells, and Tissues, Proceedings of SPIE 4962: 298-306.
Huh, W.-K., Falvo, J.V., Gerke, L.C., Caroll, A.S., Howson, R.W., Weissmann, J.S., and O'Shea, E.K. 2003. Global analysis of the protein localization in budding yeast. Nature 425: 686-691.
Jain, A.K., Duin, R.P.W., and Mao, J. 2000. Statistical pattern recognition: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence 22: 4-37.
Jarvik, J.W., Adler, S.A., Telmer, C.A., Subramaniam, V., and Lopez, A.J. 1996. CD-tagging: A new approach to gene and protein discovery and analysis. Biotechniques 20: 896-904.
Kiger, A., Baum, B., Jones, S., Jones, M., Coulson, A., Echeverri, C., and Perrimon, N. 2003. A functional genomic analysis of cell morphology using RNA interference. J. Biol. 2: 27.
Leray, P. and Gallinari, P. 1999. Feature selection with neural networks. Behaviormetrika 26: 127.
Liebel, U., Starkuviene, V., Erfle, H., Simpson, J.C., Poustka, A., Wiemann, S., and Pepperkok, R. 2003. A microscope-based screening platform for large-scale functional protein analysis in intact cells. FEBS Lett. 554: 394-398.
MacKay, D.J.C. 1992. A practical Bayesian framework for backpropagation networks. Neural Comput. 4: 448-472.
Mehes, G., Lorch, T., and Ambros, P.F. 2000. Quantitative analysis of disseminated tumor cells in the bone marrow by automated fluorescence image analysis. Cytometry 42: 357-362.
Murphy, R.F., Velliste, M., and Porreca, G. 2002. Robust classification of subcellular location patterns in fluorescence microscope images. In Proceedings of the 2002 IEEE International Workshop on Neural Networks Signal Processing (NNSP 12), pp. 67-76.
Pepperkok, R., Simpson, J.C., and Wiemann, S. 2001. Being in the right location at the right time. Genome Biol. 2: REVIEWS1024.
Ragg, T. 2002. Bayesian learning and evolutionary parameter optimization. AI Communications 15: 61-74.
Ramaswamy, S., Tamayo, P., Rifkin, R., Mukherjee, S., Yeang, C.H., Angelo, M., Ladd, C., Reich, M., Latulippe, E., Mesirov, J.P., et al. 2001. Multiclass cancer diagnosis using tumor gene expression signatures. Proc. Natl. Acad. Sci. 98: 15149-15154.
Rolls, M.M., Stein, P.A., Taylor, S.S., Ha, E., McKeon, F., and Rapoport, T.A. 1999. A visual screen of a GFP-fusion library identifies a new type of nuclear envelope membrane protein. J. Cell. Biol. 146: 29-44.
Simpson, J. and Pepperkok, R. 2003. Localizing the proteome. Genome Biol. 4: 240.
Simpson, J.C., Wellenreuther, R., Poustka, A., Pepperkok, R., and Wiemann, S. 2000. Systematic subcellular localization of novel proteins identified by large-scale cDNA sequencing. EMBO Rep. 1: 287-292.
Smola, A.J. and Schölkopf, B. 1998. On a kernel-based method for pattern recognition, regression, approximation and operator inversion. Algorithmica 22: 211-231.
Tusher, V.G., Tibshirani, R., and Chu, G. 2001. Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl. Acad. Sci. 98: 5116-5121.
Vapnik, V.N. 1995. The nature of statistical learning theory. Springer-Verlag, New York.
Wiemann, S., Weil, B., Wellenreuther, R., Gassenhuber, J., Glassl, S., Ansorge, W., Böcher, M., Blöcker, H., Bauersachs, S., Blum, H., et al. 2001. Towards a catalog of human genes and proteins: Sequencing and analysis of 500 novel complete protein coding human cDNAs. Genome Res. 11: 422-435.
Zernike, F. 1934. Beugungstheorie des schneidenverfahrens und seiner verbesserten form, der phasenkontrastmethode. Physika 1: 689-704.
Ziauddin, J. and Sabatini, D.M. 2001. Microarrays of cells expressing defined cDNAs. Nature 411: 107-110.
http://www.dkfz.de/LIFEdb/; cDNA database.
http://harvester.embl.de/; Database cross linker.
http://www.csie.ntu.edu.tw/
http://www.ncrg.aston.ac.uk/netlab/; Neural Network toolbox using Matlab.
http://brain.unr.edu; Source code of ANN (NevProp3) by Philip Goodman.
Received January 26, 2004; accepted in revised form March 4, 2004.