Reduced and mixed precision turbulent flow simulations using explicit finite difference schemes
Bálint Siklósi, Pushpender K. Sharma, David J. Lusher, István Z. Reguly, Neil D. Sandham
TL;DR
The paper addresses the efficiency-accuracy trade-off in compressible turbulent flow simulations by extending OPS and OpenSBLI to support mixed-precision arithmetic in explicit finite-difference schemes. Through Taylor-Green vortex benchmarks, it demonstrates that mixed precision (e.g., half-single and single-double) can yield substantial speedups with minimal loss of accuracy, while pure FP16 typically fails. The work introduces a mixed-precision algorithm and code-generation workflow, enabling per-quantity precision control for Q, R, and W arrays and showing that memory and communication overheads can be reduced in multi-CPU and multi-GPU environments. These findings offer a practical pathway to scalable high-fidelity CFD on modern HPC hardware, with plans for larger applications, adaptive precision, and further OPS/OpenSBLI enhancements.
Abstract
The use of reduced and mixed precision computing has gained increasing attention in high-performance computing (HPC) as a means to improve computational efficiency, particularly on modern hardware architectures such as GPUs. In this work, we explore the application of mixed-precision arithmetic in compressible turbulent flow simulations using explicit finite difference schemes. We extend the OPS and OpenSBLI frameworks to support customizable precision levels, enabling fine-grained control over precision allocation for different computational tasks. Through a series of numerical experiments on the Taylor-Green vortex benchmark, we demonstrate that mixed-precision strategies, such as half-single and single-double combinations, can offer significant performance gains without compromising numerical accuracy. However, pure half-precision computations result in unacceptable accuracy loss, underscoring the need for careful precision selection. Our results show that mixed-precision configurations can reduce memory usage and communication overhead, leading to notable speedups, particularly on multi-CPU and multi-GPU systems.
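A toy illustration of why pure half precision can lose accuracy while a half-single combination does not. This is a minimal sketch under stated assumptions, not the algorithm implemented in OPS or OpenSBLI: it repeatedly adds a small increment to a running total, loosely mimicking an explicit update over many steps. The helper names (`to_half`, `to_single`, `accumulate`) and the values of `STEPS` and the increment are hypothetical and chosen only for the demonstration. Rounding is emulated with the Python standard library `struct` formats `"e"` (IEEE 754 binary16) and `"f"` (binary32).

```python
import struct


def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (binary16) value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_single(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision (binary32) value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(increment: float, steps: int, round_acc) -> float:
    """Add `increment` to a running total `steps` times.

    `round_acc` rounds the total after every addition, emulating an
    accumulator held in that precision.
    """
    total = 0.0
    for _ in range(steps):
        total = round_acc(total + increment)
    return total


# Hypothetical demo parameters (not taken from the paper).
STEPS = 10_000
dt_half = to_half(0.001)  # the increment is stored in half precision in both runs

# Pure half precision: the accumulator is also half precision.
pure_half = accumulate(dt_half, STEPS, to_half)

# Half-single: half-precision storage of the increment, single-precision accumulator.
half_single = accumulate(dt_half, STEPS, to_single)

print(f"reference (exact arithmetic): {STEPS * 0.001:.4f}")
print(f"pure half accumulator:        {pure_half:.4f}")
print(f"half storage, single accum.:  {half_single:.4f}")
```

In this sketch the half-precision accumulator stalls at 4.0: once the total reaches 4, the increment is smaller than half the spacing between adjacent binary16 values, so every further addition rounds back to the same number. The single-precision accumulator stays close to the reference value of 10, with the small remaining offset coming from storing the increment in half precision.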
