A GPU-based Compressible Combustion Solver for Applications Exhibiting Disparate Space and Time Scales
Anthony Carreon, Jagmohan Singh, Shivank Sharma, Shuzhi Zhang, Venkat Raman
TL;DR
This work tackles the computational difficulty of simulating high-speed, stiffly reacting flows by developing a GPU-optimized compressible combustion solver built on AMReX. The authors introduce a bulk-sparse chemical kinetics integration strategy and GPU-focused optimizations, including memory-aware layouts and dynamic tiling, implemented within a multigrid AMR context and executed across multiple NVIDIA H100 GPUs. Key results include $2\times$ to $5\times$ speedups over the initial GPU implementation, near-ideal weak scaling from $1$ to $96$ GPUs, and substantial arithmetic-intensity gains in both the convection and chemistry routines, as confirmed by roofline analysis. The work demonstrates that combining algorithmic restructuring with GPU-centric tuning within AMReX enables scalable, high-fidelity simulations of multiscale reacting flows, with clear pathways for further enhancements such as adaptive strategies and ML-assisted performance modeling.
Abstract
High-speed chemically active flows present significant computational challenges due to their disparate space and time scales, where stiff chemistry often dominates simulation time. While modern scientific codes achieve exascale performance on supercomputers by leveraging graphics processing units (GPUs), existing GPU-based compressible combustion solvers face critical limitations in memory management, load balancing, and handling the highly localized nature of chemical reactions. To this end, we present a high-performance compressible reacting flow solver built on the AMReX framework and optimized for multi-GPU settings. Our approach addresses three GPU performance bottlenecks: memory access patterns through column-major storage optimization, computational workload variability via a bulk-sparse integration strategy for chemical kinetics, and multi-GPU load distribution for adaptive mesh refinement applications. The solver adapts existing matrix-based chemical kinetics formulations to multigrid contexts. Using representative combustion applications, including hydrogen-air detonations and jet-in-supersonic-crossflow configurations, we demonstrate $2\times$ to $5\times$ performance improvements over initial GPU implementations, with near-ideal weak scaling from $1$ to $96$ NVIDIA H100 GPUs. Roofline analysis reveals substantial improvements in arithmetic intensity for both convection ($\sim 10\times$) and chemistry ($\sim 4\times$) routines, confirming efficient utilization of GPU memory bandwidth and computational resources.
