Exploiting repeated matrix block structures for more efficient CFD on modern supercomputers

Josep Plana-Riu; F. Xavier Trias; Àdel Alsalti-Baldellou; Xavier Álvarez-Farré; Guillem Colomer; Assensi Oliva

TL;DR

This work tackles the memory-bound bottleneck of sparse matrix-vector operations in CFD by replacing SpMV with sparse matrix-matrix products (SpMM) to raise arithmetic intensity, enabled by repeated block matrix structures. It extends SpMM from Poisson solves to all CFD operators and introduces an inline mesh-refinement strategy to accelerate the transition to statistically steady states in parallel-in-time ensembles. Theoretical bounds on SpMM speed-ups are derived and validated on three cases: a turbulent channel flow, Rayleigh-Bénard convection, and an industrial airfoil. Reported speed-ups range from modest in basic configurations to over 50% in mesh-refinement scenarios. The results indicate significant practical impact for fast, memory-efficient CFD on modern HPC systems and point to future GPU deployment and broader unstructured-geometry applications.
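To give a feel for what a bound on SpMM speed-up can look like, the following is a minimal sketch of a memory-traffic estimate. It is not the derivation from the paper: it assumes a purely memory-bound CSR kernel with ideal caching, double-precision values, and 4-byte indices. The function name `spmm_speedup_bound` and its parameters are hypothetical.

```python
def spmm_speedup_bound(nnz_per_row, n_rhs, value_bytes=8, index_bytes=4):
    """Estimated speed-up of one SpMM over n_rhs separate SpMV calls.

    Assumptions (illustrative only): the kernel is purely memory-bound,
    the matrix is stored in CSR, every matrix entry is read once per sweep,
    and every vector entry is read once and written once.
    All traffic is counted per matrix row.
    """
    # Values + column indices for the row, plus one row-pointer entry.
    matrix_bytes = nnz_per_row * (value_bytes + index_bytes) + index_bytes
    # One read of the input entry and one write of the output entry.
    vector_bytes = 2 * value_bytes

    # n_rhs independent SpMVs reload the matrix every time.
    spmv_traffic = n_rhs * (matrix_bytes + vector_bytes)
    # A single SpMM loads the matrix once and streams n_rhs vectors.
    spmm_traffic = matrix_bytes + n_rhs * vector_bytes

    return spmv_traffic / spmm_traffic


if __name__ == "__main__":
    # 7 nonzeros per row mimics a 7-point stencil on a structured 3D mesh.
    for m in (1, 2, 4, 8, 16):
        print(m, round(spmm_speedup_bound(7, m), 3))
```

Under these assumptions the estimate equals 1 for a single right-hand side and saturates as the number of right-hand sides grows, because vector traffic eventually dominates the matrix traffic that SpMM amortizes.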

Abstract

Computational Fluid Dynamics (CFD) simulations are often constrained by the memory-bound nature of sparse matrix-vector operations, which limits performance on modern high-performance computing (HPC) systems. This work introduces a novel approach to increase arithmetic intensity in CFD by leveraging repeated matrix block structures. The method transforms the conventional sparse matrix-vector product (SpMV) into a sparse matrix-matrix product (SpMM), enabling simultaneous processing of multiple right-hand sides. Because the matrix coefficients are reused across all right-hand sides, the computation shifts towards a more compute-bound regime. Additionally, an inline mesh-refinement strategy is proposed: simulations initially run on a coarse mesh to establish a statistically steady flow and are then refined to the target mesh. This reduces the wall-clock time needed to reach the statistically steady state, leading to faster convergence at equivalent computational cost. The methodology is evaluated using theoretical performance bounds and validated through three test cases: a turbulent channel flow, Rayleigh-Bénard convection, and an industrial airfoil simulation. Results demonstrate substantial speed-ups, ranging from modest improvements in basic configurations to over 50% in the mesh-refinement setup, and highlight the benefits of integrating SpMM across all CFD operators, including divergence, gradient, and Laplacian.
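The SpMV-to-SpMM transformation described above can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation: the CSR layout, the function names `csr_spmv` and `csr_spmm`, and the small 1D Laplacian test matrix are all assumptions chosen for clarity. The point is that in `csr_spmm` each coefficient is loaded once and applied to every right-hand side.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """y = A x for a CSR matrix A and a single vector x."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


def csr_spmm(row_ptr, col_idx, values, X):
    """Y = A X for a CSR matrix A and several right-hand sides.

    X is stored row-major: X[j] holds the values of unknown j for
    every right-hand side, so they sit next to each other in memory.
    """
    n_rows = len(row_ptr) - 1
    n_rhs = len(X[0])
    Y = [[0.0] * n_rhs for _ in range(n_rows)]
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            a = values[k]        # coefficient loaded once ...
            xj = X[col_idx[k]]
            for r in range(n_rhs):
                Y[i][r] += a * xj[r]  # ... and reused for every right-hand side
    return Y


# Illustrative 4x4 tridiagonal matrix (1D Laplacian stencil [-1, 2, -1]).
row_ptr = [0, 2, 5, 8, 10]
col_idx = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]
values = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]

# Two right-hand sides, e.g. the same field from two ensemble members.
X = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]]
Y = csr_spmm(row_ptr, col_idx, values, X)
```

Each column of `Y` matches what `csr_spmv` returns for the corresponding column of `X`; only the number of times the matrix is traversed changes.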
