Modernizing an Operational Real-time Tsunami Simulator to Support Diverse Hardware Platforms

Keichi Takahashi, Takashi Abe, Akihiro Musa, Yoshihiko Sato, Yoichi Shimomura, Hiroyuki Takizawa, Shunichi Koshimura

TL;DR

This work addresses the limited hardware reach of a production real-time tsunami forecast system by migrating the RTi codebase from vector supercomputers to modern CPUs and GPUs using a minimally invasive, directive-based strategy. By preserving the original loop structure and applying targeted OpenACC/OpenMP directives, CUDA-aware MPI, GPUDirect RDMA, and load balancing, the authors achieve performance portability across diverse platforms. Key contributions include communication optimization, asynchronous kernel launches, and data-driven tuning of the domain decomposition that improves load balance and reduces per-rank runtime. A six-hour simulation with over 47 million cells completes in less than 2.5 minutes on 32 Intel Sapphire Rapids CPUs and 1.5 minutes on 32 NVIDIA H100 GPUs, enabling broader access to accurate real-time tsunami inundation forecasts.

Abstract

To issue early warnings and rapidly initiate disaster responses after tsunami damage, various tsunami inundation forecast systems have been deployed worldwide. Japan's Cabinet Office operates a forecast system that uses supercomputers to perform tsunami propagation and inundation simulation in real time. Although this real-time approach produces significantly more accurate forecasts than the conventional database-driven approach, its wider adoption has been hindered because it was developed specifically for vector supercomputers. In this paper, we migrate the simulation code to modern CPUs and GPUs in a minimally invasive manner to reduce testing and maintenance costs. A directive-based approach is employed to retain the structure of the original code while achieving performance portability, and hardware-specific optimizations, including load balance improvement for GPUs, are applied. The migrated code runs efficiently on recent CPUs, GPUs and vector processors: a six-hour tsunami simulation using over 47 million cells completes in less than 2.5 minutes on 32 Intel Sapphire Rapids CPUs and 1.5 minutes on 32 NVIDIA H100 GPUs. These results demonstrate that the code enables broader access to accurate tsunami inundation forecasts.

Paper Structure (21 sections, 4 equations, 16 figures, 2 tables, 1 algorithm)

This paper contains 21 sections, 4 equations, 16 figures, 2 tables, 1 algorithm.

Figures (16)

  • Figure 1: An example of three nested grid levels in the coast of Kochi Prefecture.
  • Figure 2: Overview of subroutines in the time integration loop.
  • Figure 3: Breakdown of runtime before adjusting the work imbalance.
  • Figure 4: Domain decomposition before optimization.
  • Figure 5: Runtime of NLMNT2 routine with respect to the number of cells of a block.
  • ...and 11 more figures