Adapting Atmospheric Chemistry Components for Efficient GPU Accelerators

Christian Guzman Ruiz, Matthew Dawson, Mario C. Acosta, Oriol Jorba, Eduardo Cesar Galobardes, Carlos Pérez García-Pando, Kim Serradell

TL;DR

The paper tackles the high computational cost of chemical kinetics in atmospheric models by integrating CAMP into MONARCH and introducing GPU-oriented strategies. It presents a Multi-cells approach that groups data from many grid cells so that multiple instances of a chemical mechanism are solved simultaneously, achieving speedups of up to ~9x. It also presents a CUDA-based implementation of the Derivative function that reaches up to ~1.2x speedup over the CPU, rising to ~1.7x once GPU memory access is optimized. The evaluation on a Volta GPU with a simple 3-species mechanism shows that data movement between CPU and GPU is a major bottleneck, motivating future work to migrate additional solver components to the GPU and to overlap CPU and GPU tasks for better scalability. The work demonstrates a practical path to accelerating chemical weather modeling, enabling larger or more detailed simulations with available HPC resources.

Abstract

Atmospheric models demand a lot of computational power, and solving the chemical processes is one of their most computationally intensive components. This work shows how to improve the computational performance of the Multiscale Online Nonhydrostatic AtmospheRe CHemistry model (MONARCH), a chemical weather prediction system developed by the Barcelona Supercomputing Center. The model implements the new flexible external package Chemistry Across Multiple Phases (CAMP) to solve gas- and aerosol-phase chemical processes, which allows multiple chemical processes to be solved simultaneously as a single system. We introduce a novel strategy to simultaneously solve multiple instances of a chemical mechanism, represented in the model as grid cells, obtaining a speedup of up to 9x using thousands of cells. In addition, we present a GPU strategy for the most time-consuming function of CAMP. The GPU version achieves up to 1.2x speedup compared to the CPU. We also optimize memory access on the GPU, increasing its speedup to up to 1.7x.
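
The Multi-cells idea described above can be pictured with a small sketch. It is only an illustration under stated assumptions: the 3-species chain A -> B -> C, the rate constants `K1` and `K2`, and the function names are hypothetical and are not taken from CAMP or MONARCH. The sketch shows that evaluating the derivative once over a concatenated state of all cells gives the same values as evaluating it cell by cell, which is what makes grouping cells into a single system possible.

```python
# Hypothetical 3-species chain A -> B -> C; the rate constants are made up.
N_SPECIES = 3
K1, K2 = 0.5, 0.1


def derivative_one_cell(state):
    """Return [dA/dt, dB/dt, dC/dt] for a single grid cell."""
    a, b, _c = state
    return [-K1 * a, K1 * a - K2 * b, K2 * b]


def derivative_multi_cells(flat_state, n_cells):
    """Evaluate the derivative of every cell in one call.

    flat_state holds the species of cell 0, then cell 1, and so on.
    """
    deriv = [0.0] * (n_cells * N_SPECIES)
    for cell in range(n_cells):
        off = cell * N_SPECIES  # start of this cell's species block
        a, b = flat_state[off], flat_state[off + 1]
        deriv[off] = -K1 * a
        deriv[off + 1] = K1 * a - K2 * b
        deriv[off + 2] = K2 * b
    return deriv


cells = [[1.0, 0.0, 0.0], [0.5, 0.2, 0.3], [0.0, 1.0, 0.0]]

# One-cell style: one derivative call per grid cell.
per_cell = [d for cell in cells for d in derivative_one_cell(cell)]

# Multi-cells style: a single call over the concatenated state.
flat = [x for cell in cells for x in cell]
batched = derivative_multi_cells(flat, len(cells))
```

In this toy version the loop over cells is still sequential; the point is only the data grouping, which lets a solver (or a GPU kernel) treat all cells as one larger system instead of being invoked once per cell.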

Paper Structure

This paper contains 6 sections, 4 equations, and 4 figures.

Figures (4)

  • Figure 1: MONARCH overall flow diagram with CAMP as chemistry solver.
  • Figure 2: (a) Left: comparison of the original and Multi-cells overall workflows from the MONARCH point of view. (b) Right: Derivative workflow diagram for GPU execution.
  • Figure 3: Data structure inversion for GPU Derivative. “Value” numbers represent the GPU memory arrangement and access order, “j” the number of reactions and “p” the number of parameters.
  • Figure 4: (a) Left: CAMP speedup of the Multi-cells optimization relative to the original One-cell version. (b) Right: speedup of the base and final single-GPU versions compared to the single-thread CPU versions. The final version adds the GPU memory-access optimization to the base version.
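
The data structure inversion named in the Figure 3 caption can be sketched as an index transformation. This is a hedged illustration rather than the paper's implementation: it assumes the original layout stores all parameters of one reaction contiguously, that the inverted layout stores the same parameter of all reactions contiguously, and that one GPU thread handles one reaction. The function names are hypothetical.

```python
def reaction_major_index(reaction, param, n_params):
    """Assumed original layout: parameters of one reaction are adjacent."""
    return reaction * n_params + param


def parameter_major_index(reaction, param, n_reactions):
    """Assumed inverted layout: one parameter of all reactions is adjacent."""
    return param * n_reactions + reaction


def invert_layout(data, n_reactions, n_params):
    """Copy a reaction-major array into parameter-major order."""
    out = [0.0] * len(data)
    for r in range(n_reactions):
        for p in range(n_params):
            src = reaction_major_index(r, p, n_params)
            dst = parameter_major_index(r, p, n_reactions)
            out[dst] = data[src]
    return out


n_reactions, n_params = 4, 3
original = [float(i) for i in range(n_reactions * n_params)]
inverted = invert_layout(original, n_reactions, n_params)

# Addresses read when "threads" 0..3 (one per reaction) each load parameter 0.
addresses_before = [reaction_major_index(r, 0, n_params) for r in range(n_reactions)]
addresses_after = [parameter_major_index(r, 0, n_reactions) for r in range(n_reactions)]
```

Under these assumptions, neighbouring threads read addresses 0, 3, 6, 9 before the inversion and 0, 1, 2, 3 after it. Contiguous accesses of this kind are what allow a GPU to coalesce the loads of adjacent threads into fewer memory transactions.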