Reduced and mixed precision turbulent flow simulations using explicit finite difference schemes

Bálint Siklósi, Pushpender K. Sharma, David J. Lusher, István Z. Reguly, Neil D. Sandham

TL;DR

The paper addresses the efficiency-accuracy trade-off in compressible turbulent flow simulations by extending OPS and OpenSBLI to support mixed-precision arithmetic in explicit finite-difference schemes. Through Taylor-Green vortex benchmarks, it demonstrates that mixed precision (e.g., half-single and single-double) can yield substantial speedups with minimal loss of accuracy, while pure FP16 typically fails. The work introduces a mixed-precision algorithm and code-generation workflow, enabling per-quantity precision control for Q, R, and W arrays and showing that memory and communication overheads can be reduced in multi-CPU and multi-GPU environments. These findings offer a practical pathway to scalable high-fidelity CFD on modern HPC hardware, with plans for larger applications, adaptive precision, and further OPS/OpenSBLI enhancements.

Abstract

The use of reduced and mixed precision computing has gained increasing attention in high-performance computing (HPC) as a means to improve computational efficiency, particularly on modern hardware architectures like GPUs. In this work, we explore the application of mixed precision arithmetic in compressible turbulent flow simulations using explicit finite difference schemes. We extend the OPS and OpenSBLI frameworks to support customizable precision levels, enabling fine-grained control over precision allocation for different computational tasks. Through a series of numerical experiments on the Taylor-Green vortex benchmark, we demonstrate that mixed precision strategies, such as half-single and single-double combinations, can offer significant performance gains without compromising numerical accuracy. However, pure half-precision computations result in unacceptable accuracy loss, underscoring the need for careful precision selection. Our results show that mixed precision configurations can reduce memory usage and communication overhead, leading to notable speedups, particularly on multi-CPU and multi-GPU systems.
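One reason a pure half-precision run can lose accuracy while a half-single run does not is that small explicit updates may fall below the rounding threshold of an FP16 state array. The sketch below is a toy illustration only, not the OPS/OpenSBLI implementation. It assumes a constant forcing term, a plain forward-Euler update, and a hypothetical helper `integrate` whose three arguments choose the storage type of the state `Q`, the residual `R`, and the work array `W`.

```python
import numpy as np

def integrate(q_dtype, r_dtype, w_dtype, n_steps=1000, dt=1e-3):
    """Toy explicit update dq/dt = f with a separate storage type per array.

    Hypothetical helper for illustration; the forcing f = 0.1 and the
    forward-Euler update are assumptions, not the scheme used in the paper.
    """
    f = np.full(8, 0.1)                 # forcing term, held in float64
    q = np.ones(8, dtype=q_dtype)       # Q: state array
    for _ in range(n_steps):
        w = f.astype(w_dtype)           # W: work array, stored in w_dtype
        r = w.astype(r_dtype)           # R: residual, stored in r_dtype
        # Update Q in its own precision; each increment is about 1e-4.
        q = q + q_dtype(dt) * r.astype(q_dtype)
    return q

# Pure half precision: Q, R and W all in FP16.
q_hp = integrate(np.float16, np.float16, np.float16)
# Half-single: W in FP16, with R and Q in FP32.
q_hpsp = integrate(np.float32, np.float32, np.float16)

print("HP   :", q_hp[0])
print("HPSP :", q_hpsp[0])
print("exact:", 1.0 + 1000 * 1e-3 * 0.1)
```

In the pure FP16 case each increment of about 1e-4 is smaller than half the spacing between representable values near 1.0 (about 4.9e-4), so the state never moves. Keeping `Q` in FP32 while storing only the work array in FP16 recovers the expected value to within the rounding error of the forcing term.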

Paper Structure

This paper contains 17 sections, 9 equations, 10 figures, 6 tables.

Figures (10)

  • Figure 1: Schematic of the use of $n_W$ variable precision work arrays $W$ to form residuals $R$ for update of conservative variables $Q$ during a typical Runge-Kutta substep (DP=double precision, SP=single precision, HP=half precision).
  • Figure 2: Contours of $\rho E$ in three mutually perpendicular slices at the mid locations in the $x$-, $y$- and $z$-directions, demonstrating the evolution of the TGV state at different times: (a) t=0, (b) t=5, (c) t=10 and (d) t=15.
  • Figure 3: Kinetic energy (K) and dissipation ($\epsilon^S$) as functions of time.
  • Figure 4: Numerical accuracy of the TGsym app using different precision levels. Mesh size $=256^3$, $M=0.5$, $Re=800$. The simulations were run for 8000 iterations using the default method.
  • Figure 5: Contours of $\rho E$ showing the TGV state at $t=10$, close to the peak of dissipation: (a) SPDP, (b) SP, (c) HPSP and (d) HP.
  • ...and 5 more figures