Optimizing the Weather Research and Forecasting Model with OpenMP Offload and Codee

Chayanon Wichitrnithed; Woo-Sun Yang; Yun He; Brad Richardson; Koichi Sakaguchi; Manuel Arenaz; William I. Gustafson; Jacob Shpund; Ulises Costi Blanco; Alvaro Goldar Dieste

TL;DR

This work accelerates a hotspot in the Weather Research and Forecasting model by porting parts of the 33-bin FSBM microphysics routine to NVIDIA GPUs using OpenMP device offloading. A workflow combining runtime profiling with the Codee static analysis tool guides a sequence of refactorings that remove data dependencies and enable deeper loop collapses. The study reports an overall speedup of up to 2.08x on a CONUS-12km test case and discusses memory-bound constraints and occupancy considerations, offering a practical blueprint for GPU acceleration of legacy weather codes. The findings underscore the value of combining static modernization tools with runtime profiling to accelerate and validate large HPC codes, and they outline directions for extending GPU offloading to other microphysics components.

Abstract

Currently, the Weather Research and Forecasting model (WRF) utilizes shared-memory (OpenMP) and distributed-memory (MPI) parallelism. To take advantage of GPU resources on the Perlmutter supercomputer at NERSC, we port parts of the computationally expensive routines of the Fast Spectral Bin Microphysics (FSBM) microphysical scheme to NVIDIA GPUs using OpenMP device offloading directives. To facilitate this process, we explore an optimization workflow that uses both runtime profilers and the static code inspection tool Codee to refactor the subroutine. We observe a 2.08x overall speedup for the CONUS-12km thunderstorm test case.
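To make the phrase "OpenMP device offloading" with loop collapse concrete, the following is a minimal sketch in C, not the authors' code. The actual FSBM routine is Fortran; the grid sizes `NI`, `NK`, `NJ`, the helper names, and the trivial per-bin update are all hypothetical placeholders chosen for illustration. Only the 33-bin count comes from the text above. The sketch assumes each grid cell can be updated independently, which is the property the refactorings described above aim to establish.

```c
#include <stddef.h>

/* Hypothetical grid sizes for illustration; only NBINS = 33 comes from the paper. */
#define NI 8
#define NK 4
#define NJ 6
#define NBINS 33

/* Fill every bin of every grid cell with the same value (host side). */
void fill_bins(double field[NJ][NK][NI][NBINS], double value) {
    for (int j = 0; j < NJ; ++j)
        for (int k = 0; k < NK; ++k)
            for (int i = 0; i < NI; ++i)
                for (int b = 0; b < NBINS; ++b)
                    field[j][k][i][b] = value;
}

/* Placeholder kernel: scale every bin of every cell by a factor.
 * collapse(3) merges the j, k, and i loops into one iteration space that is
 * distributed across GPU teams and threads; the bin loop stays sequential
 * inside each thread. If the compiler has no OpenMP support, the pragma is
 * ignored and the loops run serially on the host with the same result. */
void scale_bins(double field[NJ][NK][NI][NBINS], double factor) {
    #pragma omp target teams distribute parallel for collapse(3) \
        map(tofrom: field[0:NJ])
    for (int j = 0; j < NJ; ++j)
        for (int k = 0; k < NK; ++k)
            for (int i = 0; i < NI; ++i)
                for (int b = 0; b < NBINS; ++b)
                    field[j][k][i][b] *= factor;
}

/* Sum all bins over all cells (host side), useful for checking output. */
double sum_bins(double field[NJ][NK][NI][NBINS]) {
    double total = 0.0;
    for (int j = 0; j < NJ; ++j)
        for (int k = 0; k < NK; ++k)
            for (int i = 0; i < NI; ++i)
                for (int b = 0; b < NBINS; ++b)
                    total += field[j][k][i][b];
    return total;
}
```

Collapsing three loops rather than two exposes more independent iterations to the device, which is only legal when no iteration depends on another. A caller would fill the array with `fill_bins`, call `scale_bins`, and compare `sum_bins` against a CPU reference as a simple form of output verification.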

Paper Structure (15 sections, 1 equation, 4 figures, 7 tables)

Introduction
Related work
The FSBM routine
Experimental Setup
Approach
Codee
Offloading with OpenMP
Implementation
Lookup optimization and Codee
OpenMP offloading
Further optimization
Further evaluation
Using multiple MPI ranks per GPU
Output verification
Discussion and Conclusion

Figures (4)

Figure 1: WRF decomposition layer. Diagram from MichalakesUnknown-db.
Figure 2: Comparison of bulk and bin microphysics schemes. Image from Morrison2020-zn.
Figure 3: The solid lines form rooflines, with the top horizontal line for single precision and the bottom one for double precision. The green and brown circles at the bottom are the observed values in single and double precision, respectively, when collapsing the two outermost loops. The pair of points above them shows the observed values when collapsing three loops.
Figure 4: Total elapsed time for different versions of the code. For the GPU version, the number of GPUs is fixed to 16. In the rightmost group, the CPU codes run on 256 cores while the GPU code runs on 40 cores and 8 GPUs.