Modernizing an Operational Real-time Tsunami Simulator to Support Diverse Hardware Platforms
Keichi Takahashi, Takashi Abe, Akihiro Musa, Yoshihiko Sato, Yoichi Shimomura, Hiroyuki Takizawa, Shunichi Koshimura
TL;DR
This work addresses the limited hardware reach of an operational real-time tsunami forecast system by migrating the RTi codebase from vector supercomputers to modern CPUs and GPUs with a minimally invasive, directive-based strategy. The authors preserve the original loop structure and apply targeted OpenACC/OpenMP directives, CUDA-aware MPI, GPUDirect RDMA, and careful load balancing to achieve performance portability across diverse platforms. Key contributions include communication optimization, asynchronous kernel launches, and data-driven tuning of the domain decomposition, which improves load balance and reduces per-rank runtime. A six-hour simulation with over 47 million cells completes in less than 2.5 minutes on 32 Intel Sapphire Rapids CPUs and in 1.5 minutes on 32 NVIDIA H100 GPUs, enabling broader, real-time access to accurate tsunami inundation forecasts.
Abstract
To issue early warnings and rapidly initiate disaster responses after tsunami damage, various tsunami inundation forecast systems have been deployed worldwide. Japan's Cabinet Office operates a forecast system that uses supercomputers to perform tsunami propagation and inundation simulation in real time. Although this real-time approach produces significantly more accurate forecasts than the conventional database-driven approach, its wider adoption has been hindered because the simulation code was developed specifically for vector supercomputers. In this paper, we migrate the simulation code to modern CPUs and GPUs in a minimally invasive manner to reduce testing and maintenance costs. A directive-based approach is employed to retain the structure of the original code while achieving performance portability, and hardware-specific optimizations, including load balance improvement for GPUs, are applied. The migrated code runs efficiently on recent CPUs, GPUs, and vector processors: a six-hour tsunami simulation using over 47 million cells completes in less than 2.5 minutes on 32 Intel Sapphire Rapids CPUs and in 1.5 minutes on 32 NVIDIA H100 GPUs. These results demonstrate that the code enables broader access to accurate tsunami inundation forecasts.
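
To put the reported timings in perspective, the short sketch below converts them into a ratio of simulated time to wall-clock time. It uses only the figures quoted in the abstract; the helper name `realtime_ratio` is hypothetical and introduced purely for illustration. Because the CPU time is reported as "less than 2.5 minutes", the CPU ratio is a lower bound.

```python
# Figures quoted in the abstract.
SIMULATED_HOURS = 6
CPU_WALL_MINUTES = 2.5  # upper bound: "less than 2.5 minutes" on 32 Sapphire Rapids CPUs
GPU_WALL_MINUTES = 1.5  # reported time on 32 H100 GPUs


def realtime_ratio(simulated_hours, wall_minutes):
    """Hypothetical helper: simulated time divided by wall-clock time."""
    return simulated_hours * 60 / wall_minutes


cpu_ratio = realtime_ratio(SIMULATED_HOURS, CPU_WALL_MINUTES)
gpu_ratio = realtime_ratio(SIMULATED_HOURS, GPU_WALL_MINUTES)

print(f"CPU: at least {cpu_ratio:.0f}x faster than real time")
print(f"GPU: about {gpu_ratio:.0f}x faster than real time")
```

Under these figures, the simulation runs at least 144 times faster than real time on the CPU configuration and about 240 times faster on the GPU configuration.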
