Adapting Atmospheric Chemistry Components for Efficient GPU Accelerators
Christian Guzman Ruiz, Matthew Dawson, Mario C. Acosta, Oriol Jorba, Eduardo Cesar Galobardes, Carlos Pérez García-Pando, Kim Serradell
TL;DR
The paper tackles the high computational cost of chemical kinetics in atmospheric models by integrating CAMP into MONARCH and introducing GPU-oriented strategies. It presents a Multi-cells approach that groups the data of many grid cells so that multiple instances of a chemical mechanism are solved simultaneously, achieving speedups of up to ~9x. It also presents a CUDA-based Derivative implementation that is up to ~1.2x faster than the CPU version, rising to ~1.7x once GPU memory access is optimized. The evaluation, run on a Volta GPU with a simple 3-species mechanism, identifies data movement between CPU and GPU as a major bottleneck. This motivates future work to migrate additional solver components to the GPU and to overlap CPU and GPU tasks for better scalability. The work demonstrates a practical path to accelerating chemical weather modeling, enabling larger or more detailed simulations with available HPC resources.
Abstract
Atmospheric models demand substantial computational power, and solving the chemical processes is one of their most computationally intensive components. This work shows how to improve the computational performance of the Multiscale Online Nonhydrostatic AtmospheRe CHemistry model (MONARCH), a chemical weather prediction system developed by the Barcelona Supercomputing Center. The model implements the new flexible external package Chemistry Across Multiple Phases (CAMP) to solve gas- and aerosol-phase chemical processes; CAMP allows multiple chemical processes to be solved simultaneously as a single system. We introduce a novel strategy to simultaneously solve multiple instances of a chemical mechanism, represented in the model as grid cells, obtaining a speedup of up to 9x when using thousands of cells. In addition, we present a GPU strategy for the most time-consuming function of CAMP. The GPU version achieves a speedup of up to 1.2x over the CPU version, and optimizing memory access on the GPU raises this speedup to up to 1.7x.
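The idea of grouping many grid cells into one system can be illustrated with a minimal sketch. It is not the CAMP or MONARCH code: the 3-species chain A -> B -> C, the rate constants `k1` and `k2`, and the function names are all hypothetical and chosen only to show the data layout, in which the state of every cell is packed back to back in one flat array and the derivative is evaluated for all cells in a single call.

```python
# Hypothetical 3-species chain A -> B -> C with first-order rate constants.
# This is NOT the mechanism used in the paper; it only illustrates the layout.
N_SPECIES = 3


def derivative_one_cell(state, k1, k2):
    """Return [dA/dt, dB/dt, dC/dt] for a single grid cell."""
    a, b, _c = state
    return [-k1 * a, k1 * a - k2 * b, k2 * b]


def pack_cells(cells):
    """Flatten per-cell lists into [A0, B0, C0, A1, B1, C1, ...]."""
    return [value for cell in cells for value in cell]


def derivative_multi_cells(flat_state, k1, k2):
    """Evaluate the derivative of every cell stored in one flat array.

    Each loop iteration touches only its own cell, so the iterations are
    independent of one another (an assumption that holds for this toy
    mechanism, where cells do not exchange mass).
    """
    n_cells = len(flat_state) // N_SPECIES
    deriv = [0.0] * len(flat_state)
    for cell in range(n_cells):
        base = cell * N_SPECIES  # offset of this cell in the flat array
        a = flat_state[base]
        b = flat_state[base + 1]
        deriv[base] = -k1 * a
        deriv[base + 1] = k1 * a - k2 * b
        deriv[base + 2] = k2 * b
    return deriv


# Three illustrative cells with made-up concentrations.
cells = [[1.0, 0.0, 0.0], [0.5, 0.25, 0.25], [0.0, 1.0, 0.0]]
flat = pack_cells(cells)

multi = derivative_multi_cells(flat, k1=2.0, k2=0.5)
single = pack_cells([derivative_one_cell(c, 2.0, 0.5) for c in cells])
```

Both paths produce the same values; the difference is that the multi-cell version makes one call over one contiguous array instead of one call per cell, which is the kind of layout that lends itself to parallel evaluation.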
