Solving advection equations with reduction multigrids on GPUs
S. Dargaville, R. P. Smedley-Stevenson, P. N. Smith, C. C. Pain
TL;DR
This paper addresses the scalable parallel solution of time-independent advection problems on GPUs using a reduction multigrid built from AIRG and the PMISR DDC CF splitting. It applies GMRES polynomials matrix-free, as F-point smoothers at low order and as approximate coarse-grid solvers at high order, and automatically truncates the multigrid hierarchy to reduce communication at scale. The approach is demonstrated on Lumi-G across multiple velocity directions and geometries. The results show good weak scaling (66-101% efficiency in the solve and 47-63% in the setup) and substantial throughput improvements, with competitive strong scaling and robust performance even for challenging velocity directions and rectangular grids. These findings suggest that reduction multigrid methods can deliver scalable, sweep-free solvers for asymmetric advection-type problems on modern GPU hardware. The code is available in the open-source PFLARE library.
Abstract
Methods for solving hyperbolic systems typically depend on the ordering of the unknowns (e.g., Gauss-Seidel, or sweep/wavefront/marching methods) to achieve good convergence. For many discretisations, mesh types or decompositions, these methods do not scale well in parallel. In this work we demonstrate that the combination of AIRG (a reduction multigrid which uses GMRES polynomials) and PMISR DDC (a CF splitting algorithm which gives diagonally dominant submatrices) can be used to solve linear advection equations in parallel on GPUs with good weak scaling. We find that GMRES polynomials are well suited to GPUs when applied matrix-free, either as smoothers (at low order) or as an approximate coarse grid solver (at high order). To improve the parallel performance we automatically truncate the multigrid hierarchy given the quality of the polynomials as coarse grid solvers. Solving time-independent advection equations in 2D on structured grids, we find 66-101% weak scaling efficiency in the solve and 47-63% in the setup with AIRG, across the majority of Lumi-G, a pre-exascale GPU machine.
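To illustrate why a matrix-free GMRES polynomial needs no unknown ordering, the following is a minimal sketch, not the AIRG or PFLARE implementation. It assumes a small lower-bidiagonal, diagonally dominant test matrix as a stand-in for an upwind advection block, and it fits the polynomial in the power basis, which is simple but becomes ill-conditioned at high order. The function names `gmres_polynomial_coeffs` and `apply_polynomial` are hypothetical.

```python
import numpy as np


def gmres_polynomial_coeffs(matvec, b, order):
    """Coefficients c_0..c_{order-1} of the polynomial p that minimises
    ||b - A p(A) b||_2, expressed in the power basis (illustration only)."""
    # Columns are A b, A^2 b, ..., A^order b, built using only matvecs.
    cols = []
    v = b
    for _ in range(order):
        v = matvec(v)
        cols.append(v)
    K = np.column_stack(cols)
    # Least-squares fit of K c ~= b gives the residual-minimising coefficients.
    coeffs, *_ = np.linalg.lstsq(K, b, rcond=None)
    return coeffs


def apply_polynomial(matvec, coeffs, v):
    """Evaluate p(A) v with Horner's rule: only matvecs and vector updates,
    so no triangular solves and no dependence on the unknown ordering."""
    y = coeffs[-1] * v
    for c in coeffs[-2::-1]:
        y = matvec(y) + c * v
    return y


# Assumed test problem: unit diagonal, sub-diagonal -0.4 (diagonally dominant).
n = 50
A = np.eye(n) - 0.4 * np.eye(n, k=-1)
matvec = lambda x: A @ x

rng = np.random.default_rng(0)
b = rng.standard_normal(n)

coeffs = gmres_polynomial_coeffs(matvec, b, order=6)
x = apply_polynomial(matvec, coeffs, b)  # x ~= A^{-1} b
rel_res = np.linalg.norm(b - A @ x) / np.linalg.norm(b)
print(rel_res)
```

Once the coefficients are fixed, applying the polynomial is a short sequence of matrix-vector products, which is the kind of operation that maps well onto GPUs. In this sketch, diagonal dominance is what lets a low-order polynomial approximate the inverse well.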
