The Institute of Statistical Mathematics
Tomoyuki Higuchi (Data Analysis Fusion WG)
The Great East Japan Earthquake, which struck Eastern Japan this year, made us researchers keenly aware of how difficult it is to understand complex systems, to protect ourselves against them, or to control them, and that we must keep pooling our wisdom to solve such problems. When we try to understand and control complex systems such as the Earth or living organisms, an effective approach is to assume that our information about the object is always incomplete, and to evaluate and correct the course of research by testing our ability to predict phenomena. Statistics has long embodied this approach, and it has contributed greatly to human prosperity.
Predictive ability is an integrated measure that rests on two major capabilities. One is the descriptive power of a forward computing model; the other is the cognitive power to capture the present state of the object (its current state). Put simply, forward computing is repeated assignment. A value is put into the right-hand side of an equation and the result comes out on the left-hand side; that result is then fed back into the right-hand side, and the value for the next step comes out on the left, and so on. Many simulations that explicitly solve for time evolution work this way, and long-term prediction is achieved by repeating this forward computation. The second capability, by contrast, is tied directly to innovation in measurement methods. Epoch-making measurement techniques allow new instruments to provide more, and more precise, information than ever before. Such instruments are highly attractive to researchers in every field, especially the life sciences, where new instruments have driven progress.
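As a minimal sketch of forward computing as repeated assignment, the Python snippet below iterates a hypothetical update rule: a discrete logistic growth step chosen only for illustration. The names `forward_step` and `simulate` and the parameter `rate` are assumptions, not part of any system described here. Each pass feeds the previous output back in as the next input, and a long-term prediction is simply many such passes.

```python
def forward_step(x: float, rate: float = 0.5) -> float:
    """One forward-computing step (hypothetical logistic growth rule):
    the current value goes in on the right-hand side, the next value comes out."""
    return x + rate * x * (1.0 - x)


def simulate(x0: float, n_steps: int) -> list[float]:
    """Repeat the assignment x <- forward_step(x) n_steps times,
    keeping every intermediate state as the predicted trajectory."""
    trajectory = [x0]
    x = x0
    for _ in range(n_steps):
        x = forward_step(x)  # output of this step becomes input of the next
        trajectory.append(x)
    return trajectory


traj = simulate(0.1, 20)
print(len(traj), traj[-1])
```

Nothing in the loop looks at measurements: the quality of the prediction depends entirely on how well `forward_step` describes the real system.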
However, concentrating on the research and development of measuring devices is not a good strategy for improving predictive ability. This is not only because there are fundamental limits to measuring the whole object directly, but also because strengthening the descriptive power of the forward computing model is extremely effective at raising predictive ability. In fields with a long history of simulation, such as earth and space sciences and solid-state physics, the governing equations underlying forward computation are usually well established, and success depends on carrying out approximate calculations based on those equations on a supercomputer. Improving this approximate calculation is equivalent to improving the forward computing model. One of the major goals of BioSupercomputing is to dramatically improve the descriptive power of the forward computing model by taking full advantage of the scale of the computing hardware. Unfortunately, it is no exaggeration to say that the life sciences have no principle corresponding to a governing equation, so forward computing models must themselves be built on a wide variety of ideas and are therefore less general.
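To illustrate what improving an approximate calculation of a governing equation can mean, here is a sketch under simple assumptions: a hypothetical governing equation dx/dt = -k * x, solved approximately with the explicit Euler method (itself a forward computation) and compared with its exact solution x0 * exp(-k * t). The function names `euler_decay` and `euler_error` are illustrative only.

```python
import math


def euler_decay(x0: float, k: float, t_end: float, dt: float) -> float:
    """Approximate x(t_end) for the governing equation dx/dt = -k * x
    using explicit Euler steps of size dt (a forward computation)."""
    n_steps = round(t_end / dt)
    x = x0
    for _ in range(n_steps):
        x = x + dt * (-k * x)  # next state computed from the current state
    return x


def euler_error(dt: float, x0: float = 1.0, k: float = 1.0, t_end: float = 1.0) -> float:
    """Absolute error of the Euler approximation against the exact solution."""
    exact = x0 * math.exp(-k * t_end)
    return abs(euler_decay(x0, k, t_end, dt) - exact)


for dt in (0.1, 0.01, 0.001):
    print(f"dt={dt}: error={euler_error(dt):.2e}")
```

Because the Euler method is first-order, the error should shrink roughly in proportion to dt. This only works because the governing equation is known; without one, as in much of the life sciences, there is no such systematic refinement to fall back on.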
Is systematic improvement of forward computing models then difficult in the life sciences? As mentioned above, predictive ability can be improved by enhancing both the measurement method and the forward computing model. It therefore seems more natural to adjust the forward computing model so that predictive ability improves than to modify the model according to its own internal evaluation criteria. In other words, a learning function driven by feedback from measurement data is added to the forward computing model. In fact, in meteorology and oceanography, which are at the cutting edge of simulation research, weather forecasting services routinely improve prediction performance by using Bayesian statistics to integrate large quantities of observation data, collected hourly from all over the world, with the results of the world's largest-scale simulations on supercomputers, and then improving the simulation models in real time. It has also been pointed out that SPEEDI, the simulator that estimates the effects of radiation and recently became a hot topic, could not fully demonstrate its power partly because it had no function for reflecting real observation data in the simulation in real time.
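The Bayesian integration of observations with a forward model can be sketched, in its simplest linear-Gaussian form, as a scalar Kalman filter. This is a swapped-in textbook technique, not the method of any particular forecasting service. A hypothetical forward model x_next = a * x (with model-error variance q) produces a forecast, and each incoming observation (with error variance r) then corrects it. The function names and parameters are illustrative assumptions.

```python
def forecast(mean: float, var: float, a: float, q: float) -> tuple[float, float]:
    """Forecast step: run the hypothetical linear forward model x_next = a * x
    and inflate the uncertainty by the model-error variance q."""
    return a * mean, a * a * var + q


def analysis(mean: float, var: float, obs: float, r: float) -> tuple[float, float]:
    """Analysis step: Bayesian update of the Gaussian forecast N(mean, var)
    with an observation whose error variance is r."""
    gain = var / (var + r)  # weight given to the observation
    return mean + gain * (obs - mean), (1.0 - gain) * var


def assimilate(observations, mean0, var0, a, q, r):
    """Alternate forecast and analysis for each observation as it arrives."""
    mean, var = mean0, var0
    history = []
    for obs in observations:
        mean, var = forecast(mean, var, a, q)
        mean, var = analysis(mean, var, obs, r)
        history.append((mean, var))
    return history


# Made-up numbers: a forecast of 20.0 (variance 4.0) meets an observation of 22.0 (variance 1.0).
print(analysis(20.0, 4.0, 22.0, 1.0))
print(assimilate([1.2, 0.9, 1.1, 1.0], mean0=0.0, var0=10.0, a=1.0, q=0.01, r=0.25))
```

The first call gives a mean of about 21.6 with a variance of about 0.8: the observation receives more weight because it is more certain, and the combined estimate is more certain than either source alone. Running the loop as each observation arrives is the real-time feedback that, as noted above, SPEEDI lacked.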
Integrating observation data with model results is called data assimilation, and it has recently been attracting attention in simulation science. If the idea of data assimilation is applied to the simulation of living organisms, it should at the very least lead to a steady improvement in predictive ability and, consequently, help us understand and control complex systems. With this earnest hope, we in the Data Analysis Fusion Team work every day on the research and development of data assimilation technology for simulations of living organisms. To use a clothing analogy, current simulation models of living organisms are like ready-made clothes: even where variations exist, they differ at most in size, such as S, M, and L. Each human body, however, is different. We look forward to the day when a custom-made, or at least semi-custom-made, simulator suited to an individual patient can be built automatically from the medical information of patients who suffer side effects from a drug or a treatment method.
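The idea of tailoring a model to one patient can be illustrated, under heavy simplifying assumptions, by estimating a single model parameter from that patient's measurements. The sketch below assumes a hypothetical one-compartment drug model C(t) = C0 * exp(-k * t), generates synthetic (not real) concentration data from a known elimination rate, and computes a grid-based Bayesian posterior for k with a flat prior and Gaussian measurement noise. This is a simple stand-in for the data assimilation idea, not the Data Analysis Fusion Team's method; all names and numbers are illustrative.

```python
import math
import random


def model(k: float, t: float, c0: float = 10.0) -> float:
    """Hypothetical one-compartment model: concentration at time t for elimination rate k."""
    return c0 * math.exp(-k * t)


def posterior_over_k(times, obs, k_grid, noise_sd):
    """Grid posterior for k: flat prior, Gaussian likelihood for each measurement."""
    log_post = []
    for k in k_grid:
        ll = sum(-0.5 * ((y - model(k, t)) / noise_sd) ** 2 for t, y in zip(times, obs))
        log_post.append(ll)
    m = max(log_post)  # subtract the maximum to avoid underflow
    weights = [math.exp(lp - m) for lp in log_post]
    z = sum(weights)
    return [w / z for w in weights]


# Synthetic "patient" data generated from a known rate (illustration only).
random.seed(0)
true_k = 0.3
noise_sd = 0.2
times = [0.5 * i for i in range(1, 13)]
obs = [model(true_k, t) + random.gauss(0.0, noise_sd) for t in times]

k_grid = [0.01 * i for i in range(1, 101)]
post = posterior_over_k(times, obs, k_grid, noise_sd)
k_map = k_grid[post.index(max(post))]  # most probable rate on the grid
k_mean = sum(k * p for k, p in zip(k_grid, post))  # posterior mean
print(f"MAP k = {k_map:.2f}, posterior mean k = {k_mean:.3f}")
```

With enough informative measurements the posterior should concentrate near the rate used to generate the data, and its spread indicates how far the patient-specific model can be trusted. In the clothing analogy, replacing the flat prior with population-level knowledge (the ready-made model) and updating it with one patient's data could be read as a semi-custom-made fit.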
Figure 1: Conceptual diagram showing how LiSDAS (Life Science Data Assimilation Systems), the application under development, is used. LiSDAS takes its name from the well-known Automated Meteorological Data Acquisition System, AMeDAS. Data acquired from an experiment or measurement site (upper left) and an existing set of models (upper right) are combined to perform data assimilation, which allows the models to be evaluated and rebuilt at the same time. The lower part shows the calculation results of the assimilated simulator together with the measured data. The results of data assimilation are used to build new hypotheses and to design the next experiment. LiSDAS is a computational platform that carries out this entire intelligence cycle.
BioSupercomputing Newsletter Vol.5