Speed, power and cost implications for GPU acceleration of Computational Fluid Dynamics on HPC systems
Zachary Cooper-Baldock, Brenda Vara Almirall, Kiao Inthavong
TL;DR
This study addresses the practical question of how GPU acceleration affects CFD workloads in HPC environments when using ANSYS Fluent. It benchmarks two real CFD cases (external submarine flow and internal airway flow) across multiple CPU architectures and NVIDIA GPUs (V100 and A100), evaluating total simulation time, mean iteration time, initialisation time, power consumption, and service unit (SU) cost. The results show substantial speedups from GPU acceleration, with improvements of 80% to 95% or more in many scenarios. Initialisation, power, and cost outcomes, however, depend strongly on architecture and queue pricing; in particular, Sapphire Rapids can incur a higher SU cost despite faster runtimes. The work provides actionable guidance for researchers and HPC administrators on hardware selection, correct GPU submission flags, and energy-aware budgeting for GPU-accelerated CFD workflows, enabling more efficient planning and deployment of CFD simulations on modern HPC systems.
Abstract
Computational Fluid Dynamics (CFD) is the simulation of fluid flow using computational hardware. The underlying equations are computationally challenging to solve and require high performance computing (HPC) to resolve in a practical timeframe when a reasonable level of fidelity is required. The simulations are memory intensive and were previously limited to central processing unit (CPU) solvers, as graphics processing unit (GPU) video random access memory (VRAM) was insufficient. However, with recent developments in GPU design and increases in VRAM, GPU acceleration of CPU-solved workflows is now possible. At HPC scale, however, many operational details are still unknown. This paper uses ANSYS Fluent, a leading commercial CFD code, to investigate the compute speed, power consumption and service unit (SU) cost considerations for the GPU acceleration of CFD workflows on HPC architectures. To provide a comprehensive analysis, different CPU architectures and GPUs have been assessed. GPU compute speed is found to be faster; however, the initialisation speed, power and cost performance are less clear cut. Whilst the larger A100 cards perform well with respect to power consumption, this is not observed for the V100 cards. In situations where more than one GPU is required, their adoption may not be beneficial from a power or cost perspective.
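The observation that a faster run is not necessarily a cheaper one can be illustrated with a minimal sketch. It assumes a simple linear charging model in which SU cost is the product of wall time, requested resources and a per-queue charge rate. That model, the function names and every number below are hypothetical placeholders chosen for illustration. They are not measurements or pricing from this study.

```python
def percent_time_reduction(baseline_s: float, accelerated_s: float) -> float:
    """Percentage reduction in wall time relative to the baseline run."""
    return 100.0 * (baseline_s - accelerated_s) / baseline_s


def su_cost(walltime_h: float, resources: int, charge_rate: float) -> float:
    """SU cost under an ASSUMED linear model: hours x resources x queue rate."""
    return walltime_h * resources * charge_rate


# Hypothetical placeholder values, NOT results from the paper.
# Queue A: slower hardware, lower charge rate.
cost_a = su_cost(walltime_h=10.0, resources=48, charge_rate=2.0)
# Queue B: faster hardware, higher charge rate.
cost_b = su_cost(walltime_h=8.0, resources=48, charge_rate=3.0)

reduction = percent_time_reduction(baseline_s=10.0, accelerated_s=8.0)

print(f"Wall time reduction on queue B: {reduction:.1f}%")
print(f"SU cost, queue A: {cost_a:.0f}")
print(f"SU cost, queue B: {cost_b:.0f}")
```

With these placeholder inputs, queue B finishes sooner yet costs more SU, because its charge rate rises by more than its wall time falls. Under this assumed model, a speed comparison alone is therefore not enough to budget a run; the queue charge rate has to be included.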
