Genome Res. 16:559-566, 2006
©2006 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/06 $5.00

Perspective

Embracing the complexity of genomic data for personalized medicine

1 Duke Institute for Genome Sciences & Policy, 2 Department of Molecular Genetics and Microbiology, 3 Department of Medicine, Duke University Medical Center, Durham, North Carolina 27710, USA; 4 Koo Foundation Sun Yat Sen Cancer Center, Taipei, 112 Taiwan; 5 Institute of Statistics and Decision Sciences, Duke University, Durham, North Carolina 27708, USA
Numerous recent studies have demonstrated the use of genomic data, particularly gene expression signatures, as clinical prognostic factors in cancer and other complex diseases. Such studies herald the future of genomic medicine and the opportunity for personalized prognosis, drawing on genome-scale molecular information, in a variety of clinical contexts. The scale, complexity, and information content of high-throughput gene expression data, as one example of complex genomic information, are often underappreciated, as many analyses continue to focus on defining individual rather than multiplex biomarkers for patient stratification. Indeed, this complexity of genomic data is often, rather paradoxically, viewed as a barrier to its utility. To the contrary, the complexity and scale of global genomic data, as representing the many dimensions of biology, must be embraced for the development of more precise clinical prognostics. The need is for integrated analyses: approaches that embrace the complexity of genomic data, including multiple forms of genomic data, and aim to explore and understand multiple, interacting, and potentially conflicting predictors of risk, rather than continuing on the current and traditional path that oversimplifies and ignores the information content in the complexity. All forms of potentially relevant data should be examined, with particular emphasis on understanding the interactions, complementarities, and possible conflicts among gene expression, genetic, and clinical markers of risk.
Clinical disease states represent exceedingly complex biological phenotypes reflecting the interaction of a myriad of genetic and environmental contributions. A case in point is cancer, which is a hugely heterogeneous disease. The characteristics of an individual tumor and its life course result from multiple mutations acquired over time (e.g., RAS, RTK) and continual evolution of the responses to environment (e.g., estrogen or tobacco exposure), overlaying inherent germline variations (e.g., BRCA1/2). Multiple oncogenes affecting critical pathways lead to gene expression data that reflect many diverse aspects of the oncologic state. While the effect of some oncogenes may be quite subtle, their combined effects, together with and in the context of environmental, lifestyle, and other factors, can make an important contribution to tumor aggressiveness. It is the aggregate of these effects that places the individual on a complex, high-dimensional risk "spectrum." The complexity of the disease process leads to immense natural heterogeneity in tumor phenotypes, disease outcomes, and response to therapies. A major challenge is to develop information that can describe this complexity so as to facilitate an understanding of the disease mechanisms as well as to guide the development and application of therapies. Unfortunately, the available array of clinical and biochemical markers falls well short of being capable of describing the disease complexity. The challenge, as well as the opportunity, of personalized medicine lies in the capacity to develop quantitative data that can match the complexity of the disease.
The advent of genomic technologies has now offered the potential to develop data that do provide this complexity, identifying discrete subsets of disease that have not been recognized prior to the use of genomic data. Clear examples can be seen for both lymphoma (Golub et al. 1999
Among the most visible applications have been studies in human cancer where gene expression patterns can be identified that provide phenotypic detail not previously obtainable by traditional methods of analysis: profiles and patterns that identify new subclasses of tumors, such as the distinction between acute myeloid leukemia and acute lymphoblastic leukemia, without prior knowledge of the classes (Golub et al. 1999 While much of our discussion focuses on the use of gene expression data given the wealth of experience now showing the value of this information, we also recognize the importance of other sources of data in the integrated view of personalized medicine. The Human Genome Project has provided genomic sequence and information on sequence variation that distinguish individuals and their susceptibility to disease or prospects for health. The transcripts from the 22,000 genes, their translated protein products and post-translational modifications and splice variants, and small molecule metabolites (the physiologic workhorses of the genomic program and its interplay with the environment) are all now available to measure disease states and potentially to correlate with clinical outcomes and drug response. Genomic complexity contained in DNA-based information, combined with RNA/protein/metabolite profiles and clinical data, offers the opportunity to define multidimensional risk stratifiers with fidelity and precision that have never been possible (Fig. 1).
Pharmacogenetic/genomic data have already provided examples where individual differences in drug response can be identified on the basis of variation in drug metabolizing enzymes or variants in the drug target itself (Tate and Goldstein 2004
The development of integrated analyses, making use of multiple forms of complex data, is an issue of critical relevance to clinical medicine. The benefit of data integration can be illustrated by a consideration of the Framingham Heart Study, the landmark longitudinal analysis of risk determinants for coronary artery disease (Dawber 1980
To fully realize the clinical potential of genome-scale information requires a paradigm shift in the way complex, large-scale data are viewed, analyzed, and utilized. For example, the tradition of identifying one or a small number of biomarkers continues in the context of cancer genomics, where the goal is often to identify a single gene expression signature from tumor-derived DNA microarray data. That signature is evaluated for its prognostic significance by association with disease outcomes, but without regard to the myriad other dimensions of cancer biology reflected in the expression data. Typically, with patient samples stratified by such a simplified signature, the study will then define a predictor of risk. However, it is frequently found that, in follow-on studies with expanded and new patient samples, the potency of the signature as a "risk predictor" is diminished, as is also often the case with traditional clinical and genetic markers. The issue here is not the failure of analytical methods or genomic technologies; rather, it is the focus. The prognostic role of any one or a small number of molecular markers must generally be much more broadly evaluated in conjunction with multiple other factors, including biologically meaningful pathway data and clinical data. Cancer biology and the disease process are hugely complex. Individual risk factors, be they genetic, clinical, genomic, or other, represent only single or low-dimensional snapshots of the disease process and state. What is needed is the integrative view that takes all forms of data into consideration and aims to place an individual patient on a complex spectrum of risk that is measured by multiple factors, while addressing issues of interaction, complementarity, redundancy, and, critically, conflict among the risk factors at the individual patient level (Nevins et al. 2003
To illustrate the complexity of the oncogenic process and the need to employ equally complex data to predict clinical outcomes, we focus on four recent studies of breast cancer. A pivotal study in the emerging field of genomic medicine, published in 2002, described the use of a DNA microarray-based 70-gene expression profile as a prognostic factor in breast cancer (van't Veer et al. 2002
A second recent study (Ein-Dor et al. 2005
A third recent study (Chang et al. 2005
Our own work, published in 2004, provided a comprehensive demonstration of the value of combining multiple gene expression signatures together with clinical data in breast cancer prognosis (Pittman et al. 2004
In the Pittman et al. (2004)
The example given for breast cancer prognosis, which makes use of multiple gene expression profiles together with clinical information to generate the most robust outcome predictor, should be viewed as just the initial step toward the final goal of a truly integrated predictive model that assesses all forms of useful data. Ultimately, a completely integrated set of data will be required for individualized prognosis, inclusive of genomic variation of germline DNA, expression signatures from the tumor, serum protein markers, and clinical data. Where this exploration of relevant data eventually stops will be determined by the complexity of the phenotypes under study: the stopping point is reached when the complexity of the data matches the complexity of the biology. At a practical level, this is a question of balancing statistical and technological considerations. Broader evaluation of the prognostic accuracy of predictive models across larger and more diverse patient samples must be stressed in order to improve understanding of their robustness and practical accuracy, while advances in genomic and other technologies will generate increasingly rich and precise determinations of molecular states that need to be contrasted and evaluated in expanded analyses.
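The idea of an integrated model can be sketched in code. The following is a minimal illustration on synthetic data and is not the method of Pittman et al. (2004): it swaps in a plain logistic regression, and the predictor names (`sig_a`, `sig_b`, `nodes`), the effect sizes, and the sample sizes are all assumptions made for illustration. It compares a model built on a single expression signature with one that also uses a second signature and a clinical covariate, scoring both on held-out samples.

```python
import numpy as np

def fit_logistic(X, y, steps=2000, lr=0.1):
    """Fit logistic regression by gradient descent; returns weights, bias first."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_risk(w, X):
    """Predicted probability of the adverse outcome for each patient."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def log_loss(y, p):
    """Mean negative log-likelihood; lower means better-calibrated predictions."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Synthetic cohort (hypothetical): two signature scores and one clinical factor.
rng = np.random.default_rng(0)
n = 2000
sig_a = rng.normal(size=n)                       # assumed expression signature score
sig_b = rng.normal(size=n)                       # assumed second signature score
nodes = rng.integers(0, 2, size=n).astype(float)  # assumed lymph node status (0/1)

# The simulated outcome depends on all three predictors (assumed effect sizes).
logit = 1.0 * sig_a - 1.5 * sig_b + 1.0 * nodes - 0.5
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

train, test = slice(0, 1000), slice(1000, n)
X_single = sig_a[:, None]
X_full = np.column_stack([sig_a, sig_b, nodes])

w_single = fit_logistic(X_single[train], y[train])
w_full = fit_logistic(X_full[train], y[train])

loss_single = log_loss(y[test], predict_risk(w_single, X_single[test]))
loss_full = log_loss(y[test], predict_risk(w_full, X_full[test]))
print(f"held-out log loss, single signature: {loss_single:.3f}")
print(f"held-out log loss, integrated model: {loss_full:.3f}")
```

Because the simulated outcome is constructed to depend on all three predictors, the integrated model is expected to score better here by design. With real data, whether added predictors help is exactly the question that evaluation across larger and more diverse patient samples has to answer.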
While the most significant advances have been made by using expression profiling from disease tissue, it is also clear that other forms of data have the potential to contribute valuable information. As an example, proteomic profiling of tumors has identified patterns of protein peaks from mass spectrometry analyses that have the capacity to accurately predict clinical outcome in lung cancer (Yanagisawa et al. 2003
A clinical phenotype, whether this is disease outcome, response to therapy, or some other measure of the disease process, reflects events in the disease tissue (such as a tumor) as well as the inherited genetic constitution of the patient. The latter defines the potential response to drugs, the effectiveness of immune interactions, and more. As such, measures of this germline variation will also contribute to the overall goal of developing the most effective predictor of outcome. The concept is straightforward: variations in key genes that encode drug metabolizing enzymes or immune system activities can influence the onset and course of disease. Nevertheless, compared with the use of gene expression profiling of tumor tissue, the development of pharmacogenomic markers that predict outcomes has been slow, in part reflecting the difficulty in identifying the relevant genes and gene variants but also the challenge of adoption in clinical practice. Other forms of data may potentially act as surrogates for the impact of the germline genetic variation. For instance, serum proteomic profiles may reflect unique patterns that predict susceptibility. Likewise, expression profiles from peripheral lymphocytes may serve as indicators of events ongoing elsewhere in the body, with the lymphocyte serving as a "sensor" of the host environment.
Three key areas represent logical and critical next steps in the use of complex genomic profiling data toward the goal of personalized medicine. First, analyses that have developed profiles that predict future events, such as an adverse event or the response to a particular therapy, must now move into actual clinical practice by forming the basis for the next generation of clinical trials that will employ these methodologies to stratify patients. No longer should drug treatment studies be performed without a component that attempts to identify those patients most likely to respond to a particular therapeutic regimen. Although the ability to make this transition clearly depends on the strength of validation of genomic/integrated predictions, this transition must be a clear goal of the ongoing work. A distinction should be made between research discovery and eventual application, where the latter might demand a more simplified approach for practical reasons. The continued emphasis on the goal of a single, "silver bullet" gene expression index, or other form of genomic data, is clearly derived from the convention of using assays of single biomarkers or limited numbers of genes assayed by other means such as polymerase chain reaction (PCR) to reflect the state of the disease. It is important to recognize that while this may be worthwhile in a practical sense for an eventual clinical assay, it should not influence the research discovery of the most valuable information to describe a complex phenotype. Even in the setting of practical application, one must be cautious about reducing the information content and thus the ability to recognize the underlying complexity of the biological state.
Multiple expression patterns can define concurrent risk stratifications of broad patient groups; such collections of markers will almost surely also generate conflicting information for individual patients as they bear on genes and pathways related to linked, although differing, aspects of the tumor. Indeed, this issue was described in discussions of several individual patients in Pittman et al. (2004) Second, the multiple sources of genomic-scale data relevant to clinical phenotypes that are now available must be integrated to develop more precise descriptions of clinical phenotype. We have already discussed the opportunities for merging multiple gene expression patterns to develop more powerful predictive models. The availability of other forms of data, including proteomic, metabolomic, and DNA structure profiles, now provides opportunities for integrating these additional forms of data into comprehensive models of disease outcome. Moreover, the analysis of disease tissue, whether tumors or other, represents only half of the story, with germline gene variation information representing yet another opportunity to add richness to the genomic-based predictors and classifiers.
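The point that several signatures can agree at the group level yet conflict for an individual patient can be made concrete with a small sketch. Everything here is hypothetical: the risk values are made-up illustrations rather than results from any study, and the `low` and `high` cutoffs are arbitrary assumptions, not thresholds taken from Pittman et al. (2004).

```python
import numpy as np

def flag_conflicts(risk, low=0.3, high=0.7):
    """Flag patients whose signatures disagree.

    risk: array of shape (patients, signatures) holding each signature's
    predicted probability of an adverse outcome. A patient is flagged when
    at least one signature calls high risk and at least one calls low risk.
    """
    risk = np.asarray(risk, dtype=float)
    any_high = (risk >= high).any(axis=1)
    any_low = (risk <= low).any(axis=1)
    return any_high & any_low

# Made-up predictions from three hypothetical signatures for four patients.
risk = np.array([
    [0.85, 0.80, 0.90],  # all signatures agree: high risk
    [0.10, 0.20, 0.15],  # all signatures agree: low risk
    [0.90, 0.15, 0.55],  # signatures conflict
    [0.50, 0.45, 0.60],  # all intermediate, no strong call either way
])

conflicts = flag_conflicts(risk)
spread = risk.max(axis=1) - risk.min(axis=1)  # simple per-patient disagreement measure
for i, (c, s) in enumerate(zip(conflicts, spread)):
    print(f"patient {i}: conflict={bool(c)}, spread={s:.2f}")
```

Flagging such patients does not resolve the conflict; it only identifies the cases where an integrated model, rather than any single signature, has to carry the prediction.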
The multidimensional nature of these assays (DNA, RNA, protein-based) along with the clinical data for an individual highlights several challenges that must be addressed for this paradigm to become a reality. One challenge is the nature of the testing platform and the development of performance standards for multicomponent biomarker assays. A second challenge is the robust delivery of information in a form that allows health care providers to incorporate it seamlessly into risk assessment. The Framingham nomogram (Fig. 3B) (Wilson et al. 1998 Lastly, while there is immediate utility in the application of genomic profiles as prognostic and predictive tools to guide therapy decisions, these profiles also hold information that distinguishes the underlying biological processes that define these differences in risk or response. Clearly, this is a major issue facing the opportunities for stratification, since it is of little value to identify patients at increased risk for recurrence, or likely to be resistant to a given therapy, if there are no options for what to then offer such patients. The challenge of extracting an understanding of the underlying biology from the gene expression profiles that define resistant or high-risk patients, which might represent new opportunities for therapeutic intervention, is daunting. While one can often identify function associated with some of the genes underlying a component signature, the challenge is to put this in the perspective of the entire genomic profile: what is the underlying unifying aspect of the profile that distinguishes the two classes of patients? The gene-by-gene approach is limited by the lack of information that can easily connect these genes in a functional sense.
An alternative approach is to look for higher-order structure in the gene expression profiles that might offer clues to the underlying biology (Fig. 5). A recently described application known as gene set enrichment analysis (GSEA) (Mootha et al. 2003
In each of these examples, the aim is to convert the profile into an enhanced biological understanding by placing the genes in a functional context. This provides an opportunity to identify new therapeutic targets that might be suggested based on an understanding of the pathway that is affected. For instance, while no single gene within a given gene expression profile may represent an attractive therapeutic target, the pathway that is identified could be rich in potential targets. Further, it is also very possible that several therapeutics are already available that target components of the pathway. This approach anticipates the opportunity to incorporate improved biological understanding into predictive models that classify or predict clinical outcomes (Fig. 5). As already emphasized, it is paramount that all available data be utilized in the development of models that can most effectively predict important clinical outcomes. Moreover, the realization that biologically defined signatures can add value to the development of outcome predictions clearly suggests that additional signatures tailored more closely to the biological context should be defined and evaluated. The relevance of a wound-healing profile in breast cancer (Chang et al. 2005
6 Corresponding author.
E-mail j.nevins{at}duke.edu; fax (919) 681-8973. Article is online at http://www.genome.org/cgi/doi/10.1101/gr.3851306
Alizadeh A.A., Eisen M.B., Davis R.E., Ma C., Lossos I.S., Rosenwald A., Boldrick J.C., Sabet H., Tran T., Yu X., et al. 2000. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403: 503-511.
Bild A., Yao G., Chang J.T., Wang Q., Potti A., Chasse D., Joshi M.-B., Harpole D., Lancaster J.M., Berchuck A., et al. 2006. Oncogenic pathway signatures in human cancers as a guide to targeted therapies. Nature 439: 353-357.
Black E.P., Huang E., Dressman H., Ishida S., West M., Nevins J.R. 2003. Distinct gene expression phenotypes of cells lacking Rb and Rb family members. Cancer Res. 63: 3716-3723.
Calvo K.R., Liotta L.A., Petricoin E.F. 2005. Clinical proteomics: From biomarker discovery and cell signaling profiles to individualized personal therapy. Biosci. Rep. 25: 107-125.
Chang H.Y., Sneddon J.B., Alizadeh A.A., Sood R., West R.B., Montgomery K., Chi J.T., van de Rijn M., Botstein D., Brown P.O. 2004. Gene expression signature of fibroblast serum response predicts human cancer progression: Similarities between tumors and wounds. PLoS Biol. 2: 206-214.
Chang H.Y., Nuyten D.S., Sneddon J.B., Hastie T., Tibshirani R., Sorlie T., Dai H., He Y.D., van't Veer L.J., Bartelink H., et al. 2005. Robustness, scalability, and integration of a wound-response gene expression signature in predicting breast cancer survival. Proc. Natl. Acad. Sci. 102: 3738-3743.
Dawber T.R. 1980. The Framingham Study. Harvard University Press, Cambridge, MA.
Ein-Dor L., Kela I., Getz G., Givol D., Domany E. 2005. Outcome signature genes in breast cancer: Is there a unique set? Bioinformatics 21: 171-178.
Golub T.R., Slonim D.K., Tamayo P., Huard C., Gaasenbeek M., Mesirov J.P., Coller H., Loh M.L., Downing J.R., Caligiuri M.A., et al. 1999. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286: 531-537.
Huang E., Ishida S., Pittman J., Dressman H., West M., Nevins J.R. 2003. Gene expression phenotypic models that predict the activity of oncogenic pathways. Nat. Genet. 34: 226-230.
Monti S., Savage K.J., Kutok J.L., Feuerhake F., Kurtin P., Mihm M., Wu B., Pasqualucci L., Neuberg D.S., Aguiar R.C., et al. 2005. Molecular profiling of diffuse large B cell lymphoma identifies robust subtypes including one characterized by host inflammatory response. Blood 105: 1851-1861.
Mootha V.K., Lindgren C.M., Eriksson K.F., Subramanian A., Sihag S., Lehar J., Puigserver P., Carlsson E., Ridderstrale M., Laurila E., et al. 2003. PGC-1
Nevins J.R., Huang E.S., Dressman H., Pittman J., Huang A.T., West M. 2003. Towards integrated clinic-genomic models for personalized medicine: Combining gene expression signatures and clinical factors in breast cancer outcomes prediction. Hum. Mol. Genet. 12: R153-R157.
Perou C.M., Sorlie T., Eisen M.B., van de Rijn M., Jeffrey S.S., Rees C.A., Pollack J.R., Ross D.T., Johnsen H., Akslen L.A., et al. 2000. Molecular portraits of human breast tumors. Nature 406: 747-752.
Pittman J., Huang E., Dressman H., Horng C.-F., Cheng S.-H., Tsou M.-H., Chen C.-M., Bild A., Iversen E.S., Huang A.T., et al. 2004. Models for individualized prediction of disease outcomes based on multiple gene expression patterns and clinical data. Proc. Natl. Acad. Sci. 101: 8431-8436.
Ramaswamy S. and Golub T.R. 2002. DNA microarrays in clinical oncology. J. Clin. Oncol. 20: 1932-1941.
Sarwal M., Chua M.S., Kambham N., Hsieh S.C., Satterwhite T., Masek M., Salvatierra O. 2003. Molecular heterogeneity in acute renal allograft rejection identified by DNA microarray profiling. N. Engl. J. Med. 349: 125-138.
Seo D.M., Wang T., Dressman H., Herderick E.E., Iversen E., Dong C., Vata K., Milano C.A., Nevins J.R., Pittman J., et al. 2004. Gene expression phenotypes of atherosclerosis. Arterioscler. Thromb. Vasc. Biol. 24: 1922-1927.
Staudt L.M. 2003. Molecular diagnosis of the hematologic cancers. N. Engl. J. Med. 348: 1777-1785.
Stoughton R.B. and Friend S.H. 2005. How molecular profiling could revolutionize drug discovery. Nat. Rev. Drug Discov. 4: 345-350.
Sweet-Cordero A., Mukherjee S., Subramanian A., You H., Roix J.J., Ladd-Acosta C., Mesirov J., Golub T.R., Jacks T. 2005. An oncogenic KRAS2 expression signature identified by cross-species gene expression analysis. Nat. Genet. 37: 48-54.
Tate S.K. and Goldstein D.B. 2004. Will tomorrow's medicines work for everyone? Nat. Genet. 36: S34-S42.
Tudor M., Akbarian S., Chen R.Z., Jaenisch R. 2002. Transcriptional profiling of a mouse model for Rett syndrome reveals subtle transcriptional changes in the brain. Proc. Natl. Acad. Sci. 99: 15536-15541.
van't Veer L.J., Dai H., van de Vijver M.J., He Y.D., Hart A.A., Mao M., Peterse H.L., van der Kooy K., Marton M.J., Witteveen A.T., et al. 2002. Gene expression profiling predicts clinical outcome of breast cancer. Nature 415: 530-536.
Wilson P.W.F., D'Agostino R.B., Levy D.B., Belanger A.M., Silbershatz H., Kannel W.B. 1998. Prediction of coronary heart disease using risk factor categories. Circulation 97: 1837-1847.
Yanagisawa K., Shyr Y., Xu B.J., Massion P.P., Larsen P.H., White B.C., Roberts J.R., Edgerton M., Gonzalez A., Nadaf S., et al. 2003. Proteomic patterns of tumour subsets in non-small-cell lung cancer. Lancet 362: 415-416.