

Graduate School of Information Science and Engineering,
Tokyo Institute of Technology
(From left, above) Yutaka Akiyama, Yuri Matsuzaki,
Nobuyuki Uchikoga and Masahito Ohue

We are working on predicting protein-protein interactions (PPIs), one of the important problems in systems biology, using bioinformatics and parallel processing (Figure 1). Conventionally, computational methods based on physical chemistry have been expected to examine in detail the configuration and affinity of known one-to-one protein-protein interactions. We instead developed “MEGADOCK,” a novel PPI prediction system based on large-scale parallel calculation, which makes it possible to exhaustively predict candidate PPI pairs from a large set of proteins. It is expected to contribute to the discovery of new PPIs through future collaboration with experimental studies.

MEGADOCK predicts the presence or absence of an interaction from the tertiary structures of proteins, based on various scores obtained from rigid-body docking. The calculation evaluates candidates at high speed, mainly on the shape complementarity of the molecular surfaces, without considering structural changes of the proteins. We introduced the rPSC (real Pairwise Shape Complementarity) score, composed of a shape complementarity term and an electrostatic interaction term assigned to the molecular structures in voxel space. The conventional tool ZDOCK calculates its score from 3 interaction terms using 3 complex numbers, whereas with rPSC the score is calculated from 2 interaction terms using 1 complex number, by expressing the shape complementarity in the real part and the electrostatic interaction in the imaginary part. This reduces the number of three-dimensional fast Fourier transforms (FFTs) required for the convolution sum calculation. On a single CPU, the calculation was about four times faster than ZDOCK at the same precision.
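
The idea of packing two score terms into one complex grid can be sketched as follows. This is a minimal illustration under stated assumptions, not the MEGADOCK implementation: it assumes the score for a translation t is the real part of the sum over voxels v of R(v)·L(v+t) with cyclic boundaries, where the real parts hold the shape terms and the imaginary parts hold the electrostatic terms. The function name `packed_scores` and the random grids are hypothetical.

```python
import numpy as np

def packed_scores(rec_shape, rec_elec, lig_shape, lig_elec):
    """Score every cyclic translation t of the ligand grid with one complex FFT pass.

    Assumed score: S(t) = Re[ sum_v R(v) * L(v + t) ], with
    R = rec_shape + i*rec_elec and L = lig_shape + i*lig_elec.
    """
    R = rec_shape + 1j * rec_elec
    L = lig_shape + 1j * lig_elec
    # sum_v R(v) L(v+t) = IFFT( FFT(L) * conj(FFT(conj(R))) )(t);
    # the inner conj keeps the identity valid for a complex-valued R.
    spectrum = np.fft.fftn(L) * np.conj(np.fft.fftn(np.conj(R)))
    return np.fft.ifftn(spectrum).real

# Toy usage on random 8 x 8 x 8 grids (illustrative data only).
rng = np.random.default_rng(0)
grids = [rng.standard_normal((8, 8, 8)) for _ in range(4)]
scores = packed_scores(*grids)
best_t = np.unravel_index(np.argmax(scores), scores.shape)  # best translation
```

Because Re[(a+ib)(c+id)] = ac - bd, the real part of the single complex correlation carries both the shape-shape sum and, with opposite sign, the electrostatic sum, so one forward transform per molecule and one inverse transform cover both terms in this sketch.
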
MEGADOCK is parallelized using the MPI library. When a processor is assigned multiple receptor and ligand proteins, it takes one ligand at a time from the ligand set, transforms it by FFT at each step of the given angle increment, and, in the innermost loop, compares the result with all the data in the receptor set. We also implemented a procedure that builds a library of FFT-transformed data for known proteins and reads it from a hard disk when performing the convolution sum calculations, which achieved a speedup of up to about 3 times. With appropriate load balancing, the calculation runs efficiently on hundreds of processors or more.
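
The loop order described above, as it might run on one processor, can be sketched as follows. This is a hedged illustration rather than the actual code: the grids are assumed to be real-valued and cubic, the four 90-degree turns in `rotations` are a hypothetical stand-in for the real angle sampling, and the in-memory list `library` stands in for the FFT library read from disk. No MPI calls are shown.

```python
import numpy as np

def rotations(grid):
    """Hypothetical stand-in for angle sampling: four 90-degree turns about one axis."""
    for k in range(4):
        yield np.rot90(grid, k=k, axes=(0, 1))

def best_scores(receptors, ligands):
    """Return best[i, j]: top score of ligand j against receptor i over all
    sampled rotations and cyclic translations (real-valued cubic grids)."""
    # Receptor spectra are computed once; this list plays the role of the
    # precomputed FFT library (kept in memory here, not on disk).
    library = [np.conj(np.fft.fftn(r)) for r in receptors]
    best = np.full((len(receptors), len(ligands)), -np.inf)
    for j, ligand in enumerate(ligands):             # one ligand at a time
        for rotated in rotations(ligand):            # each sampled orientation
            lig_spec = np.fft.fftn(rotated)          # one ligand FFT per orientation
            for i, rec_spec in enumerate(library):   # innermost loop: all receptors
                scores = np.fft.ifftn(lig_spec * rec_spec).real
                best[i, j] = max(best[i, j], scores.max())
    return best
```

With this ordering, each rotated ligand is transformed once and reused against every receptor, and no receptor is transformed inside the loops. In a parallel run, each rank would presumably apply the same loops to its own assigned subsets of receptors and ligands.
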
As a benchmark of MEGADOCK, we first performed PPI prediction for all 44×44=1,936 combinations of proteins from 44 protein complexes. The predicted (red) and native (green) structures were consistent (upper part of Figure 2). In the prediction of PPI pairs, many correct complexes were predicted, as shown by the warm colors along the diagonal in the lower part of Figure 2, and the prediction performance (F-measure = 0.415) was similar to or higher than that of a related study. As actual applications in systems biology, PPI prediction was performed on the signal transduction pathway of bacterial chemotaxis (89×89=7,921) and on the human EGFR signal transduction pathway related to lung cancer (497×497=247,009). Our goal is to perform calculations of the 1,000×1,000 (mega) class routinely.
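
For readers unfamiliar with the F-measure quoted above, the following sketch shows how it is computed for an all-to-all prediction. It assumes that the true pairs are exactly the diagonal entries (i, i), in line with the diagonal of Figure 2; the function name `f_measure` and the toy predictions are hypothetical and are not data from the study.

```python
def f_measure(predicted, n):
    """Precision, recall and F-measure for an n x n all-to-all prediction.

    `predicted` is a set of (receptor_index, ligand_index) pairs called
    interacting. Assumption: the true pairs are exactly the diagonal (i, i).
    """
    true_pairs = {(i, i) for i in range(n)}
    tp = len(predicted & true_pairs)                 # correctly predicted pairs
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / n
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Toy 4 x 4 example: 3 of the 4 true pairs found, plus 2 false positives.
predicted = {(0, 0), (1, 1), (2, 2), (0, 3), (3, 1)}
precision, recall, f = f_measure(predicted, n=4)
# precision = 3/5, recall = 3/4, F-measure = 2/3
```

The F-measure is the harmonic mean of precision and recall, so it penalizes both missed true pairs and false positives among the many non-interacting combinations.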

References
[1] Matsuzaki Y., Matsuzaki Y., Sato T. and Akiyama Y., J Bioinform Comput Biol, 7: 991-1012 (2009).
[2] Ohue, Matsuzaki, Matsuzaki, Sato and Akiyama. IPSJ Transactions on Mathematical Modeling and its Applications (TOM), 3(3): 91-106 (2010).
BioSupercomputing Newsletter Vol.3