Performance evaluation of mixed-precision Runge-Kutta methods for the solution of partial differential equations

Ivo Dravins, Marcel Koch, Victoria Griehl, Katharina Kormann

TL;DR

The paper investigates mixed-precision Runge-Kutta time stepping for solving PDEs on GPUs and CPUs using the Ginkgo library. It considers three perturbed four-stage, third-order RK schemes (4s3pA/B/C) and analyzes their stability and error behavior when low-precision solves are combined with high-precision corrections. In 3D heat and advection tests, it demonstrates substantial speedups for memory-bound kernels, shows that the achieved performance depends on solver tolerances and kernel implementations, and gives practical guidance for deploying mixed-precision time stepping in large-scale PDE solvers. The work also outlines future directions: nonlinear problems, distributed computing, and even lower precision to improve performance further while preserving accuracy.

Abstract

This work focuses on the numerical study of a recently published class of Runge-Kutta methods designed for mixed-precision arithmetic. We employ the methods in solving partial differential equations on modern hardware. In particular, we investigate what speedups are achievable by the use of mixed precision and how they depend on the methods' algorithmic compatibility with the computational hardware. We use state-of-the-art software, utilizing the Ginkgo library, which is designed to incorporate mixed-precision arithmetic, and perform numerical tests of 3D problems on both GPU and CPU architectures. We show that significant speedups can be achieved but that performance depends on solver parameters and on the performance of software kernels.

Paper Structure

This paper contains 25 sections, 36 equations, 11 figures, 6 tables.

Figures (11)

  • Figure 1: Examples of Butcher tableaux: implicit midpoint rule (left), mixed-precision midpoint rule with one correction step (middle) and with two correction steps (right). The $(\epsilon)$ superscript denotes steps to be made in low precision.
  • Figure 2: Stability regions for the 4s3pA method: 64-bit $A^{(\epsilon)}$ in gray, 16-bit $A^{(\epsilon)}$ in black.
  • Figure 3: Stability regions for the B and C methods in gray.
  • Figure 4: Error at the final time for Methods A, B, C and midpoint rule with correction.
  • Figure 5: GPU: Normalized (over iterations) speedup and averages for tensor operations $T_{R,M,L}$.
  • ...and 6 more figures
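
The idea behind the mixed-precision midpoint rule with correction steps (Figure 1) can be illustrated with a minimal sketch. It is not the paper's implementation and does not use Ginkgo: it assumes a linear problem $u' = Au$ with a small 1D finite-difference Laplacian, uses NumPy's dense solver in `float32` for the implicit stage, and applies one common form of explicit correction in `float64`. The helper names `laplacian_1d` and `midpoint_step`, the problem size, and the time step are all hypothetical choices made for illustration.

```python
import numpy as np

def laplacian_1d(n):
    """Second-order finite-difference Laplacian on n interior points of (0, 1)."""
    h = 1.0 / (n + 1)
    main = -2.0 * np.ones(n)
    off = np.ones(n - 1)
    return (np.diag(main) + np.diag(off, 1) + np.diag(off, -1)) / h**2

def midpoint_step(A, u, dt, corrections=0, low=np.float32):
    """One implicit midpoint step for u' = A u with a low-precision stage solve."""
    n = A.shape[0]
    # Implicit stage: solve (I - dt/2 A) y = u in the low-precision type.
    M_low = (np.eye(n) - 0.5 * dt * A).astype(low)
    y = np.linalg.solve(M_low, u.astype(low)).astype(np.float64)
    # Explicit correction sweeps in float64: y <- u + dt/2 A y.
    for _ in range(corrections):
        y = u + 0.5 * dt * (A @ y)
    # Final update in float64.
    return u + dt * (A @ y)

n, dt = 20, 1e-4  # assumed values; chosen so that dt/2 * ||A|| < 1
A = laplacian_1d(n)
x = np.linspace(0, 1, n + 2)[1:-1]
u0 = np.sin(np.pi * x)

# Reference: the same step with the stage solved in float64.
u_ref = midpoint_step(A, u0, dt, low=np.float64)
err_low = np.linalg.norm(midpoint_step(A, u0, dt, corrections=0) - u_ref)
err_corr = np.linalg.norm(midpoint_step(A, u0, dt, corrections=1) - u_ref)
print(f"no correction: {err_low:.2e}, one correction: {err_corr:.2e}")
```

In this sketch each correction sweep multiplies the low-precision error by $\tfrac{\Delta t}{2}A$, so it reduces the error only while $\tfrac{\Delta t}{2}\|A\| < 1$. For larger step sizes an explicit sweep of this kind can amplify the error instead, which is one reason the stability of the perturbed schemes needs separate study.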