Multigrid p-Robustness at Jacobi Speeds: Efficient Matrix-Free Implementation of Local p-Multigrid Solvers
Michał Wichrowski
TL;DR
This work tackles robust convergence for high-order finite element multigrid by developing a fully matrix-free, local p-multigrid solver for vertex-patch smoothers. It combines sum-factorization and explicit SIMD within deal.II to realize a practical, cache-friendly implementation that maintains $O(p^d)$ memory scaling. The paper introduces a half V-cycle optimization to reduce cost and provides an extensive performance study across Cartesian and distorted meshes, showing compute-bound residuals and favorable scaling on multi-core CPUs. The results demonstrate that the proposed patch smoother can rival simple pointwise smoothers in speed while preserving the convergence benefits of patch-based methods, making Schwarz-type robustness accessible in large-scale, high-order simulations.
Abstract
Vertex-patch smoothers are essential for the robust convergence of geometric multigrid methods in high-order finite element applications, yet their adoption is traditionally hindered by the prohibitive cost of solving local patch problems. This paper presents a high-performance, matrix-free implementation of a p-multigrid local solver that dismantles the trade-off between smoothing effectiveness and computational efficiency. We focus on the practical realization of this iterative approach, leveraging sum-factorization and explicit SIMD vectorization to minimize memory footprint and maximize arithmetic throughput. The performance analysis demonstrates that the solver effectively hides data-fetching latencies and maintains optimal $\mathcal{O}(p^d)$ memory scaling, even when dominated by geometric data on distorted meshes. The result is a robust smoother that rivals the execution speed of simple pointwise smoothers while preserving the convergence benefits of patch-based methods.
