Accelerating the Dutch Atmospheric Large-Eddy Simulation (DALES) model with OpenACC
Lucas Esclapez, Laurent Soucasse, Caspar Jungbacker, Fredrik Jansson, Stephan R. de Roode, Pedro Costa, Gijs van den Oord, Alessio Sclocco
TL;DR
The paper demonstrates a directive-based OpenACC port of the Dutch Atmospheric Large-Eddy Simulation (DALES) model to GPUs, enabling high-resolution atmospheric simulations with minimal code disruption. It details the modelling framework, the porting strategy (data management, loop collapsing, selective refactoring), and the integration of GPU-accelerated libraries such as RRTMGP and cuFFT. Using Cloud Botany reference cases, it validates numerical consistency with CPU runs and characterizes single-node performance on NVIDIA A100 and H100 GPUs, revealing strong speedups but limited weak scaling due to FFT-based communication. The study also explores Kernel Tuner for auto-tuning stencil kernels, reporting meaningful gains on A100 for select kernels but only modest overall improvements when applied across the code base, and discusses future work on AMD GPUs, alternative Poisson solvers, and mixed-precision acceleration.
Abstract
This paper presents the GPU port, through OpenACC directives, of the Dutch Atmospheric Large-Eddy Simulation (DALES) application, a high-resolution atmospheric model. The code is written in Fortran 90 and features parallel (distributed) execution through spatial domain decomposition. We assess the performance of the GPU offloading by comparing the time-to-solution on regular and accelerated HPC nodes. A weak-scaling analysis is conducted, and portability across NVIDIA A100 and H100 hardware is discussed. Finally, we show how targeted kernels can benefit from further optimization with Kernel Tuner, a GPU kernel auto-tuning package.
