A GPU-based Compressible Combustion Solver for Applications Exhibiting Disparate Space and Time Scales

Anthony Carreon, Jagmohan Singh, Shivank Sharma, Shuzhi Zhang, Venkat Raman

TL;DR

This work tackles the computational difficulty of simulating high-speed, stiffly reacting flows by developing a GPU-optimized compressible combustion solver built on AMReX. The authors introduce a bulk-sparse chemical kinetics integration strategy and GPU-focused optimizations, including memory-aware layouts and dynamic tiling, implemented within a multigrid AMR context and executed across multiple NVIDIA H100 GPUs. Key results show $2$-$5\times$ speedups over the baseline, near-ideal weak scaling from $1$ to $96$ GPUs, and substantial arithmetic-intensity gains in both convection and chemistry workloads, as confirmed by roofline analyses. The work demonstrates that combining algorithmic restructuring with GPU-centric tuning within AMReX enables scalable, high-fidelity simulations of multiscale reacting flows, with clear pathways for further enhancements such as adaptive strategies and ML-assisted performance modeling.

Abstract

High-speed chemically active flows present significant computational challenges due to their disparate space and time scales, where stiff chemistry often dominates simulation time. While modern supercomputing scientific codes achieve exascale performance by leveraging graphics processing units (GPUs), existing GPU-based compressible combustion solvers face critical limitations in memory management, load balancing, and handling the highly localized nature of chemical reactions. To this end, we present a high-performance compressible reacting flow solver built on the AMReX framework and optimized for multi-GPU settings. Our approach addresses three GPU performance bottlenecks: memory access patterns through column-major storage optimization, computational workload variability via a bulk-sparse integration strategy for chemical kinetics, and multi-GPU load distribution for adaptive mesh refinement applications. The solver adapts existing matrix-based chemical kinetics formulations to multigrid contexts. Using representative combustion applications, including hydrogen-air detonations and jet in supersonic crossflow (JISCF) configurations, we demonstrate $2$-$5\times$ performance improvements over initial GPU implementations with near-ideal weak scaling from $1$ to $96$ NVIDIA H100 GPUs. Roofline analysis reveals substantial improvements in arithmetic intensity for both convection ($\sim 10\times$) and chemistry ($\sim 4\times$) routines, confirming efficient utilization of GPU memory bandwidth and computational resources.

Paper Structure

This paper contains 29 sections, 8 equations, 12 figures, 2 tables, 3 algorithms.

Figures (12)

  • Figure 1: AMReX's GPU strategy.
  • Figure 2: Schematic of the different parallelization strategies implemented in (A) the baseline kinetics code and (B) the optimized kinetics code. Each boxed step in the pseudocodes represents a different GPU kernel launched with the specified number of threads. The different line colors and styles overlaid on the matrix of grid data represent the data elements that the GPU threads of a corresponding GPU kernel write to. One line equals one GPU thread. In pseudocode (B), a single GPU kernel is launched with a thread count equal to the cell count across all grids.
  • Figure 3: Schematic of the initial and boundary conditions for the 2D detonation cases. (A) Cold detonation in a hydrogen-air mixture using a reduced 14-species mechanism by mevel2009hydrogen. (B) Hot detonation in a hydrogen-air mixture using a 30-species mechanism by smith2020foundational. All walls are no-slip and adiabatic. The driver is a high-temperature, high-pressure gas that propagates into a stagnant hydrogen-air mixture. Perturbation zones reside slightly downstream of the driver to accelerate ignition and facilitate the formation of cellular detonation structures.
  • Figure 4: Relative runtime breakdown across different code regions at 16 GPUs for three combustion applications: 2D detonation with the 14-species mechanism (panels A/D), 2D detonation with the 30-species mechanism (panels B/E), and JISCF (panels C/F). Panels A, B, and C show the baseline version results, while panels D, E, and F show the optimized version results.
  • Figure 5: Absolute runtime breakdown across different code regions at 16 GPUs for three combustion applications: 2D detonation with the 14-species mechanism (red bars), 2D detonation with the 30-species mechanism (green bars), and JISCF (blue bars).
  • ...and 7 more figures
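
The Figure 2 caption notes that the optimized kinetics code launches a single GPU kernel whose thread count equals the cell count across all grids. One way such a flat launch could map each thread back to its grid is sketched below in plain Python. This is an illustrative assumption, not the authors' implementation: the function names `build_offsets` and `locate` and the grid sizes are hypothetical, and a real solver would do this lookup inside a GPU kernel.

```python
from bisect import bisect_right
from itertools import accumulate


def build_offsets(cells_per_grid):
    """Exclusive prefix sum: offsets[g] is the first flat index owned by grid g."""
    return [0] + list(accumulate(cells_per_grid))


def locate(flat_index, offsets):
    """Map a flat thread index to (grid_id, local_cell_index) by binary search."""
    grid_id = bisect_right(offsets, flat_index) - 1
    return grid_id, flat_index - offsets[grid_id]


# Hypothetical example: three grids holding 4, 2, and 3 cells.
cells_per_grid = [4, 2, 3]
offsets = build_offsets(cells_per_grid)

# One "thread" per cell across all grids, as in a single bulk kernel launch.
total_threads = offsets[-1]
mapping = [locate(t, offsets) for t in range(total_threads)]
```

With this indexing, the amount of work per launch depends only on the total cell count, not on how the cells are split among grids, which is the property the caption attributes to strategy (B).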