Effects of lower floating-point precision on scale-resolving numerical simulations of turbulence

Martin Karp, Ronith Stanly, Timofey Mukha, Luca Galimberti, Siavash Toosi, Hang Song, Lissandro Dalcin, Saleh Rezaeiravesh, Niclas Jansson, Stefano Markidis, Matteo Parsani, Sanjeeb Bose, Sanjiva Lele, Philipp Schlatter

Abstract

Modern computing clusters offer specialized hardware for reduced-precision arithmetic that can significantly shorten the time to solution. The speedup comes from reduced data movement and from a higher rate of arithmetic operations. However, for high-fidelity simulations of turbulence, such as direct and large-eddy simulation, the impact of reduced precision on the computed solution, and the resulting uncertainty across flow solvers and flow cases, has not been explored in detail, which limits the optimal utilization of new high-performance computing systems. In this work, the effect of reduced precision is studied with four diverse computational fluid dynamics (CFD) solvers (two incompressible, Neko and Simson, and two compressible, PadeLibs and SSDC) on four test cases: turbulent channel flow at $Re_\tau = 550$ and higher, forced transition in a channel, flow over a cylinder at $Re_D = 3900$, and compressible flow over a wing section at $Re_c = 50000$. We observe that the flow physics are remarkably robust to reductions in floating-point precision, and that other forms of uncertainty, for example due to time averaging, often have a much larger impact on the computed result. Our results indicate that different terms in the Navier-Stokes equations can be computed to a lower floating-point accuracy without affecting the results. In particular, standard IEEE single precision can be used effectively for the entirety of the simulation, showing no significant discrepancies from double-precision results across the solvers and cases considered. Potential pitfalls are also discussed.

Paper Structure

This paper contains 20 sections, 4 equations, 15 figures, 7 tables.

Figures (15)

  • Figure 1: Roofline for the Nvidia A100 and Nvidia GeForce RTX4080 in double (FP64) and single (FP32) precision. The solid lines represent the roofline (the maximum attainable performance) for the two architectures as a function of operational intensity $I=\pi/\beta$, defined as the ratio of the peak performance $\pi$ to the memory bandwidth $\beta$ of the computing unit. The dashed lines represent the peak performance $\pi$ for the two architectures, and the dotted line represents the performance limit set by the time needed to load data from memory, $\beta I$. Most CFD codes today operate in the regime limited by $\beta I$.
  • Figure 2: Mean streamwise velocity profiles of the turbulent channel flow simulated using different precisions at $Re_\tau=550$ using Neko, SSDC, and Simson. All curves agree reasonably well with one another and with the data from Lee & Moser (2015) (not shown here), except State FP16 from Neko, and State E5M2, Conv. E4M3, and State E4M3 from Simson. Different codes are shown using different line styles (as indicated in the legend on the left) and different roundings are represented by different colors (as shown in the legend on the right). All these cases are also compared against each other in Table \ref{tab:channel}.
  • Figure 3: Root-mean-square of velocity fluctuations from the turbulent channel flow simulated using different precisions at $Re_\tau=550$ using Neko and SSDC. All curves agree reasonably well with one another and with the DNS data from Lee & Moser (2015) (not shown here), except State FP16 from Neko, and State E5M2, Conv. E4M3, and State E4M3 from Simson. Different codes are shown using different line styles (as indicated in the legend on the left) and different roundings are represented by different colors (as shown in the legend on the right). All these cases are also compared against each other in Table \ref{tab:channel}.
  • Figure 4: Budget terms in the transport equation of $\langle u'u' \rangle$ (a) and $\langle u'v' \rangle$ (b) for the turbulent channel flow at $Re_\tau\approx1000$ using full FP32 (beige) and recalculated using one time step in FP64 (dark red), compared to the reference data from Lee & Moser (2015) (dotted blue). Simulations are performed using Simson. Triangles and squares denote the pressure-strain and pressure transport terms, respectively.
  • Figure 5: Skewness (a) and flatness (b) of velocity components in the streamwise ($u_1$), wall-normal ($u_2$), and spanwise ($u_3$) directions in the turbulent channel flow at $Re_\tau\approx550$. Colors from light to dark show cases that were run and post-processed in FP32, run in FP32 but restarted and post-processed in FP64, and run and post-processed in FP64. Simulations were performed using Simson with the velocity-vorticity formulation. Similar behavior was observed at $Re_\tau\approx1000$, with larger fluctuations.
  • ...and 10 more figures
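
The roofline bound described in the caption of Figure 1 can be sketched in a few lines of Python. This is an illustrative sketch only, not code from the paper: the function name, the device numbers, and the kernel are all hypothetical, and the operational intensity is taken here as the floating-point operations performed per byte moved (the usual roofline convention, an assumption on our part). It shows why a kernel limited by $\beta I$ can be expected to gain roughly a factor of two from FP32, since each value occupies half as many bytes.

```python
def attainable_flops(peak_flops, bandwidth_bytes, flops_per_byte):
    """Roofline bound: the smaller of pi and beta * I."""
    return min(peak_flops, bandwidth_bytes * flops_per_byte)


# Hypothetical device; NOT the A100 or RTX4080 numbers from the paper.
PEAK = {"fp64": 10e12, "fp32": 20e12}  # pi, in FLOP/s
BANDWIDTH = 2e12                       # beta, in bytes/s

# Hypothetical kernel performing one FLOP per value loaded from memory.
FLOPS_PER_VALUE = 1.0
BYTES_PER_VALUE = {"fp64": 8, "fp32": 4}


def kernel_bound(precision):
    """Attainable performance of the hypothetical kernel at a given precision."""
    intensity = FLOPS_PER_VALUE / BYTES_PER_VALUE[precision]  # FLOP/byte
    return attainable_flops(PEAK[precision], BANDWIDTH, intensity)


# Both bounds sit far below the peak, so the kernel is memory-bound and
# the ratio reflects only the halved data volume per value.
speedup = kernel_bound("fp32") / kernel_bound("fp64")
print(f"FP64 bound: {kernel_bound('fp64'):.3g} FLOP/s")
print(f"FP32 bound: {kernel_bound('fp32'):.3g} FLOP/s")
print(f"Memory-bound speedup from FP32: {speedup:.1f}x")
```

For a kernel with a high enough intensity the bound becomes the peak $\pi$ instead, and the gain then depends on the ratio of FP32 to FP64 peak rates of the specific hardware rather than on data volume.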