Coarsening and parallelism with reduction multigrids for hyperbolic Boltzmann transport

S. Dargaville; R. P. Smedley-Stevenson; P. N. Smith; C. C. Pain

TL;DR

This work develops a scalable solver for the streaming (hyperbolic) limit of the Boltzmann Transport Equation on unstructured grids by marrying a parallel reduction multigrid (AIRG) with a novel two-pass CF splitting (PMISR DDC) that yields a diagonally dominant $A_{\mathrm{ff}}$. AIRG uses fixed-sparsity, low-order GMRES polynomials to approximate $\hat{A}_{\mathrm{ff}}^{-1}$, enabling efficient coarse-grid corrections without requiring lower-triangular blocks. Through rigorous serial and parallel experiments, the authors demonstrate near-linear work growth with problem size, strong/weak scaling up to hundreds of thousands of DOFs per core, and substantial performance gains over hypre’s $\ell$AIR, especially when combined with coarse-grid repartitioning and hierarchy truncation. The results indicate a practical, highly parallel path for streaming-dominated transport on unstructured meshes, with implications for large-scale deterministic neutron/photon transport and related hyperbolic PDEs.
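To illustrate how a low-order GMRES polynomial can act as an approximate inverse, the following is a minimal dense NumPy sketch. It is not the AIRG implementation: it assumes a power basis and a caller-supplied right-hand side `b`, the function names `gmres_poly_coeffs` and `apply_poly` are hypothetical, and the fixed-sparsity aspect is not shown.

```python
import numpy as np

def gmres_poly_coeffs(A, b, order):
    """Coefficients c of q(A) = sum_j c[j] A^j minimising ||b - A q(A) b||_2.

    Assumption: power basis with an explicit least-squares solve, which is
    only sensible for low polynomial orders (the basis becomes ill-conditioned).
    """
    cols = []
    v = b
    for _ in range(order + 1):
        v = A @ v              # builds A b, A^2 b, ..., A^(order+1) b
        cols.append(v)
    K = np.column_stack(cols)
    coeffs, *_ = np.linalg.lstsq(K, b, rcond=None)
    return coeffs

def apply_poly(A, coeffs, x):
    """Evaluate q(A) x with Horner's rule, using only matrix-vector products."""
    y = coeffs[-1] * x
    for c in coeffs[-2::-1]:
        y = A @ y + c * x
    return y
```

Once the coefficients are known, applying the approximate inverse needs only matrix-vector products, which is one reason this kind of approximation suits parallel execution.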

Abstract

Reduction multigrids have recently shown good performance in hyperbolic problems without the need for Gauss-Seidel smoothers. When applied to the hyperbolic limit of the Boltzmann Transport Equation (BTE), these methods result in very close to $\mathcal{O}(n)$ growth in work with problem size on unstructured grids. This scalability relies on the CF splitting producing an $A_\textrm{ff}$ block that is easy to invert. We introduce a parallel two-pass CF splitting designed to give diagonally dominant $A_\textrm{ff}$. The first pass computes a maximal independent set in the symmetrized strong connections. The second pass converts F-points to C-points based on the row-wise diagonal dominance of $A_\textrm{ff}$. We find this two-pass CF splitting outperforms common CF splittings available in hypre. Furthermore, parallelisation of reduction multigrids in hyperbolic problems is difficult as we require both long-range grid-transfer operators and slow coarsenings (with rates of $\sim$1/2 in both 2D and 3D). We find that good parallel performance in the setup and solve is dependent on several factors: repartitioning the coarse grids, reducing the number of active MPI ranks as we coarsen, truncating the multigrid hierarchy and applying a GMRES polynomial as a coarse-grid solver. We compare the performance of two different reduction multigrids, AIRG (that we developed previously) and the hypre implementation of $\ell$AIR. In the streaming limit with AIRG, we demonstrate 81% weak scaling efficiency in the solve from 2 to 64 nodes (256 to 8192 cores) with only 8.8k unknowns per core, with solve times up to 5.9$\times$ smaller than the $\ell$AIR implementation in hypre.
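The two passes described above can be sketched in serial with dense NumPy arrays. This is an illustrative sketch under stated assumptions, not the parallel algorithm from the paper: it assumes a classical strength-of-connection measure, a greedy independent set taken as the F-points (so that no two F-points are strongly connected), and a second pass that converts a fixed fraction of the least diagonally dominant F-points. The name `two_pass_cf_split` and its parameters are hypothetical.

```python
import numpy as np

def two_pass_cf_split(A, strong_tol=0.5, ddc_fraction=0.1):
    """Return a boolean mask that is True for F-points (serial sketch).

    Assumes a square matrix with a nonzero diagonal.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    off = np.abs(A - np.diag(np.diag(A)))      # off-diagonal magnitudes

    # Strong connections: |a_ij| >= strong_tol * max_k |a_ik|, then symmetrize.
    row_max = off.max(axis=1, keepdims=True)
    strong = (off >= strong_tol * row_max) & (off > 0)
    strong = strong | strong.T

    # Pass 1: greedy maximal independent set in the symmetrized strong graph.
    # Assumption: members of the set become F-points.
    is_f = np.zeros(n, dtype=bool)
    blocked = np.zeros(n, dtype=bool)
    for i in range(n):
        if not blocked[i]:
            is_f[i] = True
            blocked[i] = True
            blocked[strong[i]] = True          # strong neighbours become C-points

    # Pass 2: row-wise diagonal dominance of A_ff, measured as
    # sum_{j in F, j != i} |a_ij| / |a_ii|; convert the worst rows to C-points.
    f_idx = np.flatnonzero(is_f)
    ratio = off[np.ix_(f_idx, f_idx)].sum(axis=1) / np.abs(np.diag(A))[f_idx]
    n_convert = int(ddc_fraction * f_idx.size)
    worst = f_idx[np.argsort(-ratio)[:n_convert]]
    is_f[worst] = False
    return is_f

# Usage on a small upwind-like tridiagonal matrix.
A_demo = 2.0 * np.eye(6) - np.eye(6, k=-1) - 0.1 * np.eye(6, k=1)
f_mask = two_pass_cf_split(A_demo)
```

With a strong tolerance of 0.5, only the dominant sub-diagonal entries count as strong connections in this example, so the first pass selects alternating points as F-points.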

Paper Structure

This paper contains 19 sections, 8 equations, 7 figures, 9 tables and 2 algorithms.

Introduction
Discretisation
Reduction multigrid
AIRG
CF splitting
Results
Serial comparison
Strong tolerances
Serial comparison of CF splitting algorithms
Parallel results
GMRES polynomial order
Optimisations for parallel multigrid
Timing results for parallel optimisations
Strong scaling
Weak scaling
...and 4 more sections

Figures (7)

Figure 1: Row-wise diagonal dominance of $A_\textrm{ff}$ on an unstructured mesh with 2313 nodes. Red shows the result after the first pass with PMISR with strong tolerance 0.5; blue shows the result after the second pass with DDC with a fraction of 10%.
Figure 2: CF splitting produced by PMISR DDC on the top grid with strong tolerance 0.5 on an unstructured mesh with 2313 nodes. White squares are C-points, blue squares are F-points. The red squares are C-points that were converted from F-points by the second pass with DDC with a fraction of 10%.
Figure 3: CF splitting produced for angle 1 (direction $(1,1)$) by PMISR DDC on the top grid of an unstructured mesh with 2313 nodes. White squares are C-points, blue squares are F-points. The red squares are C-points that were converted from F-points by the second pass with DDC with a fraction of 10%.
Figure 4: Average ratio of local to non-local nnzs across active MPI ranks on each level on 32 nodes (4096 cores) of ARCHER2. Black is with no repartitioning, $\triangle$ is with simple repartitioning, $\square$ is with ParMETIS repartitioning onto fewer ranks.
Figure 5: Time taken for each component of the multigrid setup on each level on 32 nodes (4096 cores) of ARCHER2. The $\times$ is the CF splitting, the $\triangle$ is the prolongator, the $\otimes$ is the GMRES polynomial, the $+$ is the SpGEMM for the restrictor, the $\circ$ is the SpGEMM for the coarse grid, the $\oplus$ is the matrix extract and the $\square$ is the repartitioning.
...and 2 more figures
