Optimizing the Weather Research and Forecasting Model with OpenMP Offload and Codee
Chayanon Wichitrnithed, Woo-Sun Yang, Yun He, Brad Richardson, Koichi Sakaguchi, Manuel Arenaz, William I. Gustafson, Jacob Shpund, Ulises Costi Blanco, Alvaro Goldar Dieste
TL;DR
This work accelerates a hotspot in the Weather Research and Forecasting (WRF) model by porting parts of the 33-bin FSBM microphysics routine to NVIDIA GPUs using OpenMP device offloading. A workflow that combines runtime profiling with the Codee static analysis tool guides a sequence of refactorings that remove data dependencies and enable deeper loop collapses. The study reports an overall speedup of up to 2.08x on a CONUS-12km test case and discusses memory-bound constraints and occupancy considerations, offering a practical blueprint for GPU acceleration of legacy weather codes. The findings underscore the value of pairing static modernization tools with runtime profiling when accelerating and validating large HPC codes, and outline directions for extending GPU offloading to other microphysics components.
Abstract
The Weather Research and Forecasting model (WRF) currently uses shared-memory (OpenMP) and distributed-memory (MPI) parallelism. To take advantage of GPU resources on the Perlmutter supercomputer at NERSC, we port parts of the computationally expensive routines of the Fast Spectral Bin Microphysics (FSBM) scheme to NVIDIA GPUs using OpenMP device offloading directives. To facilitate this process, we explore an optimization workflow that uses both runtime profilers and Codee, a static code inspection tool, to refactor the subroutine. We observe a 2.08x overall speedup for the CONUS-12km thunderstorm test case.
