Coarsening and parallelism with reduction multigrids for hyperbolic Boltzmann transport
S. Dargaville, R. P. Smedley-Stevenson, P. N. Smith, C. C. Pain
TL;DR
This work develops a scalable solver for the streaming (hyperbolic) limit of the Boltzmann Transport Equation on unstructured grids by combining a parallel reduction multigrid (AIRG) with a two-pass CF splitting (PMISR DDC) designed to yield a diagonally dominant $A_{\mathrm{ff}}$. AIRG uses fixed-sparsity, low-order GMRES polynomials to approximate $A_{\mathrm{ff}}^{-1}$, enabling efficient coarse-grid corrections without requiring lower-triangular blocks. Serial and parallel experiments show near-linear growth in work with problem size, 81\% weak scaling efficiency in the solve from 256 to 8192 cores with only 8.8k unknowns per core, and solve times up to 5.9$\times$ smaller than hypre’s $\ell$AIR. Good parallel performance depends on repartitioning the coarse grids, reducing the number of active MPI ranks during coarsening, truncating the multigrid hierarchy and using a GMRES polynomial as the coarse-grid solver. The results indicate a practical, highly parallel approach to streaming-dominated transport on unstructured meshes, with implications for large-scale deterministic neutron/photon transport and related hyperbolic PDEs.
Abstract
Reduction multigrids have recently shown good performance in hyperbolic problems without the need for Gauss-Seidel smoothers. When applied to the hyperbolic limit of the Boltzmann Transport Equation (BTE), these methods result in very close to $\mathcal{O}(n)$ growth in work with problem size on unstructured grids. This scalability relies on the CF splitting producing an $A_\textrm{ff}$ block that is easy to invert. We introduce a parallel two-pass CF splitting designed to give diagonally dominant $A_\textrm{ff}$. The first pass computes a maximal independent set in the symmetrized strong connections. The second pass converts F-points to C-points based on the row-wise diagonal dominance of $A_\textrm{ff}$. We find this two-pass CF splitting outperforms common CF splittings available in hypre. Furthermore, parallelisation of reduction multigrids in hyperbolic problems is difficult as we require both long-range grid-transfer operators and slow coarsenings (with rates of $\sim$1/2 in both 2D and 3D). We find that good parallel performance in the setup and solve depends on several factors: repartitioning the coarse grids, reducing the number of active MPI ranks as we coarsen, truncating the multigrid hierarchy and applying a GMRES polynomial as a coarse-grid solver. We compare the performance of two different reduction multigrids: AIRG (which we developed previously) and the hypre implementation of $\ell$AIR. In the streaming limit with AIRG, we demonstrate 81\% weak scaling efficiency in the solve from 2 to 64 nodes (256 to 8192 cores) with only 8.8k unknowns per core, with solve times up to 5.9$\times$ smaller than the $\ell$AIR implementation in hypre.
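The two-pass CF splitting described above can be illustrated with a minimal serial sketch in Python. This is not the authors' parallel implementation. It assumes a small dense matrix with a nonzero diagonal, a classical strength-of-connection threshold `theta`, a greedy maximal independent set whose members are labelled F-points (an assumption about the labelling), and a hypothetical tolerance `max_ratio` on the row-wise ratio of off-diagonal to diagonal magnitude in $A_\textrm{ff}$. All function and parameter names are illustrative.

```python
import numpy as np


def strong_connections(A, theta):
    """Symmetrized strength matrix: S[i, j] is True when |a_ij| is at least
    theta times the largest off-diagonal magnitude in row i or in row j."""
    absA = np.abs(A).astype(float)
    np.fill_diagonal(absA, 0.0)
    row_max = absA.max(axis=1, keepdims=True)
    S = (absA >= theta * row_max) & (absA > 0.0)
    return S | S.T


def first_pass_mis(S):
    """Greedy (serial) maximal independent set in the graph of S.
    Assumption: members of the set become F-points, their neighbours C-points."""
    UNDECIDED, F, C = 0, 1, 2
    state = np.full(S.shape[0], UNDECIDED)
    for i in range(S.shape[0]):
        if state[i] == UNDECIDED:
            state[i] = F
            nbrs = np.flatnonzero(S[i])
            state[nbrs[state[nbrs] == UNDECIDED]] = C
    return state == F


def second_pass_ddc(A, is_f, max_ratio):
    """Convert F-points to C-points where the row of A_ff is not diagonally
    dominant enough, i.e. sum_{j != i} |a_ij| / |a_ii| > max_ratio.
    Assumes every diagonal entry of A is nonzero."""
    absA = np.abs(A).astype(float)
    f_idx = np.flatnonzero(is_f)
    Aff = absA[np.ix_(f_idx, f_idx)]
    diag = np.diag(Aff)
    ratio = (Aff.sum(axis=1) - diag) / diag
    new_f = is_f.copy()
    new_f[f_idx[ratio > max_ratio]] = False
    return new_f


# Toy upwind-like matrix: unit diagonal, a strong coupling to the previous
# unknown and a weaker coupling two unknowns back.
n = 6
A = np.eye(n) - np.eye(n, k=-1) - 0.6 * np.eye(n, k=-2)
S = strong_connections(A, theta=0.8)
is_f = first_pass_mis(S)
is_f = second_pass_ddc(A, is_f, max_ratio=0.5)
print("F-points:", np.flatnonzero(is_f), "C-points:", np.flatnonzero(~is_f))
```

In this sketch, removing F-points can only shrink the off-diagonal row sums of the remaining rows of $A_\textrm{ff}$, so a single sweep leaves every remaining row within `max_ratio`. A parallel version would replace the greedy loop with a randomised independent-set algorithm and would apply the second pass to locally owned rows.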
