Investigating Matrix Repartitioning to Address the Over- and Undersubscription Challenge for a GPU-based CFD Solver
Gregor Olenik, Marcel Koch, Hartwig Anzt
TL;DR
The paper tackles the oversubscription problem that arises when GPU solvers are integrated into OpenFOAM through plugin-based approaches. It introduces a repartitioning strategy that maps the CPU-based matrix assembly onto the GPU-based solve at a controlled ratio $n_{GPU}=n_{CPU}/\alpha$. To make matrix updates efficient, it constructs three key data structures (a sparsity pattern, an update pattern $U$, and a permutation matrix $P$) together with a dedicated MPI communicator for the GPUs. The method is implemented in the OpenFOAM-Ginkgo Layer and evaluated on a lid-driven cavity benchmark. Repartitioning yields substantial performance gains (speedups of up to around $10\times$) and mitigates GPU oversubscription, with additional improvements when GPU-aware MPI is used. The approach offers a practical, less invasive path to efficient heterogeneous HPC in OpenFOAM workflows and motivates future work on industrial-scale cases and on extensions to distributed multigrid and preconditioning strategies.
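The ratio $n_{GPU}=n_{CPU}/\alpha$ can be illustrated with a minimal sketch. It assumes that $n_{CPU}$ counts the CPU ranks doing assembly, that $n_{GPU}$ counts the ranks doing the solve, that ranks are grouped contiguously, and that the first rank of each group acts as the GPU owner. None of these details is stated in the summary above, and the function names `gpu_owner` and `repartition_plan` are hypothetical.

```python
def gpu_owner(cpu_rank: int, alpha: int) -> int:
    """Return the rank that owns the GPU for this CPU rank's group.

    Assumption (not from the paper): groups are contiguous blocks of
    `alpha` ranks and the first rank in each block is the owner.
    """
    return (cpu_rank // alpha) * alpha


def repartition_plan(n_cpu: int, alpha: int) -> dict[int, list[int]]:
    """Map each owner rank to the CPU ranks whose matrix parts it gathers."""
    if alpha < 1 or n_cpu % alpha != 0:
        raise ValueError("alpha must be a positive divisor of n_cpu")
    plan: dict[int, list[int]] = {}
    for rank in range(n_cpu):
        plan.setdefault(gpu_owner(rank, alpha), []).append(rank)
    return plan


if __name__ == "__main__":
    plan = repartition_plan(n_cpu=8, alpha=4)
    # The number of owners equals n_GPU = n_CPU / alpha.
    print(len(plan), plan)
```

In an MPI setting, the owner ranks computed this way would be the natural members of a separate GPU communicator, while the remaining ranks only contribute their assembled coefficients.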
Abstract
Modern high-performance computing (HPC) increasingly relies on GPUs, but integrating GPU acceleration into complex scientific frameworks like OpenFOAM remains a challenge. Existing approaches either fully refactor the codebase or use plugin-based GPU solvers, each facing trade-offs between performance and development effort. In this work, we address the limitations of plugin-based GPU acceleration in OpenFOAM by proposing a repartitioning strategy that better balances CPU matrix assembly and GPU-based linear solves. We present a detailed computational model, describe a novel matrix repartitioning and update procedure, and evaluate its performance on large-scale CFD simulations. Our results show that the proposed method significantly mitigates oversubscription issues, improving solver performance and resource utilization in heterogeneous CPU-GPU environments.
