Can a supercomputer hack cancer systems?
Prof. Satoru Miyano
Data Analysis Fusion Team, Integrated Simulation of Living Matter Group
Computational Science Research Program, RIKEN
To unravel the systems underlying cancer and related biological mechanisms, we developed a systems biology strategy based on computation-driven biological experiments, which crucially requires a supercomputer. One biological system we analyzed with this strategy is the epidermal growth factor (EGF) signaling gene network. For system modeling, we devised a state space model (SSM) method for reverse-engineering simulatable gene networks from time-course gene expression data; with it, we can survey whole networks and observe their dynamic changes over time by simulation. To analyze the EGF signaling gene network with this computational method, we designed a biological experiment and produced gene expression data for normal lung epithelial cells at 19 time points over 48 hours under four conditions: with or without EGF, and with or without gefitinib (an inhibitor of the epidermal growth factor receptor tyrosine kinase, EGF RTK). From this data set we computed SSM gene networks consisting of 1,500 genes. Using the predictive ability of the SSM, we can distinguish two situations behind genes that are differentially expressed over time: (1) genes whose differential expression arises from different regulatory systems in the case and control, and (2) genes whose differential expression arises from the same regulatory system but with regulators in different states. By simulating these networks, we found candidate genes under differential regulation between the case and control; these may be regarded as genes reflecting the influence of EGF RTK on the EGF signaling gene network. Furthermore, a classifier using these genes was designed and trained for survival prediction on the cohort data of a multi-institutional consortium project (Nature Medicine 14, 822-827 (2008)). After a simple adjustment of the mean gene expression values, the classifier was tested on completely independent expression profiles of lung cancer.
It accurately predicted the survival of patients with stage I lung cancer in both cases.
In parallel, we have been developing a technology called "data assimilation for biological systems" that rationally integrates simulation models with observational data. It is a computational and statistical strategy for estimating personalized models from general biological models using individual measurement data. As a general simulation model for data assimilation, we developed a gene regulatory and signaling pathway model of the EGF receptor pathway that comprises about 280 biological entities and 500 biological processes, built with the software Cell Illustrator. We present some intermediate data assimilation results obtained with the time-course mRNA data described above.
To uncover cancer heterogeneity, we focus on epithelial-mesenchymal transition (EMT) as a case study. EMT is a key developmental remodeling program in which cells alternate between static, epithelial-like and migratory, mesenchymal-like phenotypes. Although EMT is now thought to contribute to increasing tumor grade and drug resistance, the regulatory mechanisms responsible for it are largely unknown. We developed a computational method called "network profiler" for discovering associations between differences in molecular mechanisms and the diversity of phenotypic traits. Applied to gene expression data from lung cancer cell lines, this method revealed global differences among 20,000-gene networks of cell lines with different EMT expression levels. The analysis suggested several key genes involved in EMT.
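The SSM approach in this abstract explains observed expression through a small number of hidden states that evolve over time. As a minimal sketch, assuming a linear-Gaussian SSM whose matrices F, H, Q, R are already known (the published method also estimates them from data, which is omitted here), the code below computes one-step-ahead predictions with a Kalman filter. It also illustrates one hedged reading of how predictive ability can separate the two situations: if a model fitted to control data predicts a gene in the case data poorly, that gene is a candidate for differential regulation. The function names are hypothetical.

```python
import numpy as np


def kalman_one_step_predictions(y, F, H, Q, R, x0, P0):
    """One-step-ahead predictions of y under an assumed linear-Gaussian SSM.

    Model:  x_t = F x_{t-1} + v_t,  v_t ~ N(0, Q)   (hidden state, k dims)
            y_t = H x_t + w_t,      w_t ~ N(0, R)   (expression, p genes)
    y has shape (T, p). Returns an array of the same shape.
    """
    x, P = np.asarray(x0, dtype=float), np.asarray(P0, dtype=float)
    preds = np.empty_like(y, dtype=float)
    for t in range(y.shape[0]):
        # Predict: propagate the hidden state one time point forward.
        x = F @ x
        P = F @ P @ F.T + Q
        preds[t] = H @ x
        # Update: correct the state using the observed expression at time t.
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (y[t] - H @ x)
        P = (np.eye(len(x)) - K @ H) @ P
    return preds


def per_gene_prediction_error(y, preds):
    """Mean squared one-step prediction error for each gene (column)."""
    return ((y - preds) ** 2).mean(axis=0)
```

In use, one would fit the model to control data, run it on the case data, and rank genes by `per_gene_prediction_error`. When Q and P0 are zero the Kalman gain is zero, so the predictions reduce to the noiseless trajectory; if the case data then differ from it only by a constant shift in one gene, only that gene's error rises.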
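The abstract notes that the survival classifier was applied to independent cohorts after simply adjusting the mean of gene expression values. A minimal sketch of that idea is per-gene centering within each cohort before scoring. The classifier here is an assumption for illustration only: a linear risk score with hypothetical weights and a threshold of zero, not the classifier used in the work.

```python
import numpy as np


def center_genes(expr):
    """Subtract each gene's mean within one cohort; expr is (samples, genes)."""
    return expr - expr.mean(axis=0, keepdims=True)


def predict_high_risk(expr, weights, threshold=0.0):
    """Hypothetical linear risk score computed on mean-centered expression.

    Returns a boolean array: True where the score exceeds the threshold.
    """
    scores = center_genes(expr) @ weights
    return scores > threshold


if __name__ == "__main__":
    cohort = np.array([[1.0, 2.0], [3.0, 0.0], [2.0, 5.0]])
    weights = np.array([1.0, -0.5])  # hypothetical weights
    shifted = cohort + np.array([10.0, -3.0])  # same samples, per-gene offset
    print(predict_high_risk(cohort, weights), predict_high_risk(shifted, weights))
```

Because each gene is centered within its own cohort, a constant per-gene offset between platforms or laboratories does not change the predictions; differences in scale or nonlinear batch effects would need more than this simple adjustment.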
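Data assimilation, as described above, adapts a general model to one individual's measurements. The Cell Illustrator model is far larger and is not reproduced here; as a hedged toy illustration, the sketch below swaps in a hypothetical one-gene mRNA model dx/dt = k - d·x and uses a bootstrap particle filter, one common sequential data assimilation technique, to estimate the degradation rate d from a noisy time course. The unknown parameter is appended to each particle's state with a small random walk so that it is estimated alongside the hidden trajectory. All function names, priors, and noise levels are assumptions.

```python
import numpy as np


def simulate(d, k=1.0, dt=0.5, steps=40, x0=0.0):
    """Euler simulation of the hypothetical model dx/dt = k - d * x."""
    xs, x = [], x0
    for _ in range(steps):
        x = x + dt * (k - d * x)
        xs.append(x)
    return np.array(xs)


def particle_filter_estimate_d(y, n_particles=1000, k=1.0, dt=0.5,
                               obs_sd=0.05, state_sd=0.02, param_sd=0.01,
                               seed=0):
    """Estimate d from observations y with a bootstrap particle filter."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_particles)                  # all particles start at x0 = 0
    d = rng.uniform(0.1, 1.5, n_particles)     # assumed uniform prior on d
    for obs in y:
        # Propagate each particle's state using its own parameter value.
        x = x + dt * (k - d * x) + rng.normal(0.0, state_sd, n_particles)
        # Small random walk on d (kept positive) so the parameter can adapt.
        d = np.clip(d + rng.normal(0.0, param_sd, n_particles), 1e-3, None)
        # Gaussian log-likelihood of the observation; subtract max for stability.
        logw = -0.5 * ((obs - x) / obs_sd) ** 2
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # Multinomial resampling keeps particles consistent with the data.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        x, d = x[idx], d[idx]
    return d.mean()


if __name__ == "__main__":
    noise = np.random.default_rng(1).normal(0.0, 0.05, 40)
    y = simulate(0.5) + noise                  # synthetic "individual" data
    print(particle_filter_estimate_d(y))       # expected to lie near 0.5
```

The same pattern extends, at much greater cost, to models with hundreds of entities and parameters, which is one reason such estimation benefits from supercomputing resources.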
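The network profiler idea of relating network differences to a phenotype such as EMT level can be illustrated with kernel-weighted regression, in which edges are estimated separately around each value of a sample-level modulator. This is a minimal sketch under stated assumptions: a Gaussian kernel over a hypothetical EMT score, weighted ridge regression of each gene on all others, and a hypothetical function name. The published method may use a different model and penalty, and a 20,000-gene analysis would need sparsity and far more computation than this dense toy loop.

```python
import numpy as np


def local_network(expr, modulator, m0, bandwidth=0.5, ridge=1e-2):
    """Estimate a directed weighted network around modulator value m0.

    expr: (n_samples, n_genes) expression matrix.
    modulator: (n_samples,) sample-level score, e.g. a hypothetical EMT score.
    Returns B with B[j, g] = estimated effect of gene g on gene j (diagonal 0).
    """
    n, p = expr.shape
    # Gaussian kernel weights: samples whose modulator is near m0 count more.
    w = np.exp(-0.5 * ((modulator - m0) / bandwidth) ** 2)
    W = np.diag(w)
    B = np.zeros((p, p))
    for j in range(p):
        others = [g for g in range(p) if g != j]
        X, y = expr[:, others], expr[:, j]
        # Weighted ridge solution: (X'WX + lambda I)^{-1} X'Wy.
        coef = np.linalg.solve(X.T @ W @ X + ridge * np.eye(p - 1), X.T @ W @ y)
        B[j, others] = coef
    return B
```

Comparing `local_network` outputs at low and high modulator values highlights edges whose strength or sign changes with the phenotype; for example, if gene 0 is driven positively by gene 1 in low-score samples and negatively in high-score samples, `B[0, 1]` flips sign between the two estimates.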