Exploiting repeated matrix block structures for more efficient CFD on modern supercomputers

Josep Plana-Riu, F. Xavier Trias, Àdel Alsalti-Baldellou, Xavier Álvarez-Farré, Guillem Colomer, Assensi Oliva

TL;DR

This work tackles the memory-bound bottleneck of sparse matrix-vector operations in CFD by replacing SpMV with sparse matrix-matrix products (SpMM) to raise arithmetic intensity, enabled by repeated block matrix structures. It extends SpMM from Poisson solves to all CFD operators and introduces an inline mesh-refinement strategy to accelerate the transition to statistically steady states in parallel-in-time ensembles. Theoretical bounds on SpMM speed-ups are derived and validated across a turbulent channel, Rayleigh‑Bénard convection, and an industrial airfoil, with reported gains from modest improvements to over 50% speed-up in mesh-refinement scenarios. The results indicate significant practical impact for fast, memory-efficient CFD on modern HPC systems and point to future GPU deployment and broader unstructured-geometry applications.

Abstract

Computational Fluid Dynamics (CFD) simulations are often constrained by the memory-bound nature of sparse matrix-vector operations, which eventually limits performance on modern high-performance computing (HPC) systems. This work introduces a novel approach to increase arithmetic intensity in CFD by leveraging repeated matrix block structures. The method transforms the conventional sparse matrix-vector product (SpMV) into a sparse matrix-matrix product (SpMM), enabling simultaneous processing of multiple right-hand sides. This shifts the computation towards a more compute-bound regime by reusing matrix coefficients. Additionally, an inline mesh-refinement strategy is proposed: simulations initially run on a coarse mesh to establish a statistically steady flow, then refine to the target mesh. This reduces the wall-clock time to reach transition, leading to faster convergence with equivalent computational cost. The methodology is evaluated using theoretical performance bounds and validated through three test cases: a turbulent channel flow, Rayleigh-Bénard convection, and an industrial airfoil simulation. Results demonstrate substantial speed-ups - from modest improvements in basic configurations to over 50% in the mesh-refinement setup - highlighting the benefits of integrating SpMM across all CFD operators, including divergence, gradient, and Laplacian.
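The coefficient reuse described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the 1D Laplacian test matrix, the CSR layout, the `loads` counter, and all function names are assumptions chosen for illustration. The sketch compares applying the same matrix to `m` right-hand sides one at a time (SpMV) against applying it to all of them in a single pass (SpMM), and counts how many matrix coefficients are read in each case.

```python
def laplacian_1d_csr(n):
    """CSR arrays (row_ptr, col_idx, values) for the 1D stencil [-1, 2, -1]."""
    row_ptr, col_idx, values = [0], [], []
    for i in range(n):
        for j, a in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                col_idx.append(j)
                values.append(a)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, values


def spmv(row_ptr, col_idx, values, x):
    """y = A x. Returns y and the number of matrix coefficients read."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    loads = 0
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            a = values[k]
            loads += 1
            y[i] += a * x[col_idx[k]]
    return y, loads


def spmm(row_ptr, col_idx, values, X):
    """Y = A X with X[row][rhs]. Each coefficient is read once for all rhs."""
    n = len(row_ptr) - 1
    m = len(X[0])
    Y = [[0.0] * m for _ in range(n)]
    loads = 0
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            a = values[k]
            loads += 1
            x_row = X[col_idx[k]]
            y_row = Y[i]
            for r in range(m):  # reuse the coefficient a across all m rhs
                y_row[r] += a * x_row[r]
    return Y, loads


n, m = 6, 4
row_ptr, col_idx, values = laplacian_1d_csr(n)
X = [[float((i + 1) * (r + 1)) for r in range(m)] for i in range(n)]

# m separate SpMV calls: the matrix is traversed m times.
columns, loads_spmv = [], 0
for r in range(m):
    y, loads = spmv(row_ptr, col_idx, values, [X[i][r] for i in range(n)])
    columns.append(y)
    loads_spmv += loads

# One SpMM call: the matrix is traversed once.
Y, loads_spmm = spmm(row_ptr, col_idx, values, X)
same_result = all(Y[i][r] == columns[r][i] for i in range(n) for r in range(m))
```

Both paths perform the same floating-point operations, but the SpMM path reads each coefficient once instead of `m` times, which is the source of the higher arithmetic intensity.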

Paper Structure

This paper contains 15 sections, 31 equations, 17 figures, 9 tables, 1 algorithm.

Figures (17)

  • Figure 1: Simplified version of a roofline model in which the memory-bound (blue) and compute-bound (red) regions are depicted. The goal of the present paper is to push the arithmetic intensity $I$ towards the compute-bound zone.
  • Figure 2: Speed-up bounds for an SpMM with a sparse square matrix ($n_c=n_r=10^6$) with 7 and 13 non-zeros per row, for $m\in[1,128]$ (left), and a zoom up to 32 rhs (right).
  • Figure 3: Proposed ensemble averaging strategy in which the case is run until $T_D$ in a coarser setup, and then in the intended setup for $t\in\left(T_D,T_T+T_A/m\right)$. I: mapping, II: developing, III: averaging.
  • Figure 4: $\tilde{\beta}/{\beta}$ as a function of $\Pi$ and $\gamma$.
  • Figure 5: Average velocity in wall units (left) and rms streamwise velocity (right) profiles for 1, 2, 4, and 8 rhs in a turbulent planar channel flow at $\text{Re}_\tau=180$.
  • ...and 12 more figures
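
The roofline picture in Figure 1 and the kind of bound plotted in Figure 2 can be approximated with a simple memory-traffic model. The sketch below is not the derivation used in the paper. It assumes a CSR matrix with 8-byte values and 4-byte indices, that every input and output vector entry is moved exactly once (ideal caching), and that the kernel stays memory-bound. All function names and default sizes are hypothetical.

```python
def spmm_bytes(n_rows, nnz_per_row, m=1, val_bytes=8, idx_bytes=4):
    """Bytes moved by one SpMM with m right-hand sides (m=1 is an SpMV)."""
    nnz = n_rows * nnz_per_row
    # Matrix traffic is independent of m: values, column indices, row pointers.
    matrix = nnz * (val_bytes + idx_bytes) + (n_rows + 1) * idx_bytes
    # Vector traffic grows with m: read X once, write Y once (assumed).
    vectors = 2 * n_rows * m * val_bytes
    return matrix + vectors


def arithmetic_intensity(n_rows, nnz_per_row, m=1):
    """FLOP per byte: one multiply and one add per non-zero and per rhs."""
    flops = 2 * n_rows * nnz_per_row * m
    return flops / spmm_bytes(n_rows, nnz_per_row, m)


def speedup_bound(n_rows, nnz_per_row, m):
    """Memory-bound speed-up of one SpMM over m SpMVs under this model."""
    return m * spmm_bytes(n_rows, nnz_per_row, 1) / spmm_bytes(n_rows, nnz_per_row, m)


def roofline(intensity, peak_flops, bandwidth):
    """Attainable FLOP/s: the lower of the compute and memory ceilings."""
    return min(peak_flops, intensity * bandwidth)


n = 10**6
bounds_7 = [speedup_bound(n, 7, m) for m in (1, 2, 4, 8, 32, 128)]
bounds_13 = [speedup_bound(n, 13, m) for m in (1, 2, 4, 8, 32, 128)]
```

Under these assumptions the bound grows with `m` but saturates, because vector traffic eventually dominates the matrix traffic that is being amortised. Denser rows (13 rather than 7 non-zeros) leave more matrix traffic to amortise, so the model's bound is higher for them.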