Table of Contents
Fetching ...

Multigrid p-Robustness at Jacobi Speeds: Efficient Matrix-Free Implementation of Local p-Multigrid Solvers

Michał Wichrowski

TL;DR

This work tackles robust convergence for high-order finite element multigrid by developing a fully matrix-free, local p-multigrid solver for vertex-patch smoothers. It combines sum-factorization and explicit SIMD within deal.II to realize a practical, cache-friendly implementation that maintains $O(p^d)$ memory scaling. The paper introduces a half V-cycle optimization to reduce cost and provides an extensive performance study across Cartesian and distorted meshes, showing compute-bound residuals and favorable scaling on multi-core CPUs. The results demonstrate that the proposed patch smoother can rival simple pointwise smoothers in speed while preserving the convergence benefits of patch-based methods, making Schwarz-type robustness accessible in large-scale, high-order simulations.

Abstract

Vertex-patch smoothers are essential for the robust convergence of geometric multigrid methods in high-order finite element applications, yet their adoption is traditionally hindered by the prohibitive cost of solving local patch problems. This paper presents a high-performance, matrix-free implementation of a p-multigrid local solver that dismantles the trade-off between smoothing effectiveness and computational efficiency. We focus on the practical realization of this iterative approach, leveraging sum-factorization and explicit SIMD vectorization to minimize memory footprint and maximize arithmetic throughput. The performance analysis demonstrates that the solver effectively hides data-fetching latencies and maintains optimal $\mathcal{O}(p^d)$ memory scaling, even when dominated by geometric data on distorted meshes. The result is a robust smoother that rivals the execution speed of simple pointwise smoothers while preserving the convergence benefits of patch-based methods.

Multigrid p-Robustness at Jacobi Speeds: Efficient Matrix-Free Implementation of Local p-Multigrid Solvers

TL;DR

This work tackles robust convergence for high-order finite element multigrid by developing a fully matrix-free, local p-multigrid solver for vertex-patch smoothers. It combines sum-factorization and explicit SIMD within deal.II to realize a practical, cache-friendly implementation that maintains memory scaling. The paper introduces a half V-cycle optimization to reduce cost and provides an extensive performance study across Cartesian and distorted meshes, showing compute-bound residuals and favorable scaling on multi-core CPUs. The results demonstrate that the proposed patch smoother can rival simple pointwise smoothers in speed while preserving the convergence benefits of patch-based methods, making Schwarz-type robustness accessible in large-scale, high-order simulations.

Abstract

Vertex-patch smoothers are essential for the robust convergence of geometric multigrid methods in high-order finite element applications, yet their adoption is traditionally hindered by the prohibitive cost of solving local patch problems. This paper presents a high-performance, matrix-free implementation of a p-multigrid local solver that dismantles the trade-off between smoothing effectiveness and computational efficiency. We focus on the practical realization of this iterative approach, leveraging sum-factorization and explicit SIMD vectorization to minimize memory footprint and maximize arithmetic throughput. The performance analysis demonstrates that the solver effectively hides data-fetching latencies and maintains optimal memory scaling, even when dominated by geometric data on distorted meshes. The result is a robust smoother that rivals the execution speed of simple pointwise smoothers while preserving the convergence benefits of patch-based methods.

Paper Structure

This paper contains 20 sections, 4 equations, 10 figures, 4 tables, 3 algorithms.

Figures (10)

  • Figure 1: Workflow in a generic smoother application of a patch smoother for a single patch $j$. The three panels represent the main steps: gathering data from the global solution and right-hand side, computing the local residual, and applying the local correction and scattering it back to the global solution. Blue dots indicate interior DoFs of the patch, while red dots indicate all DoFs (including boundary) associated with each cell. The dashed rectangle highlights the patch interior.
  • Figure 2: Patch-smoother data flow for a local p-multigrid solve (illustration for $p=3$ on the fine local level). The coarse level shown is $p=1$ and contains a single interior DoF (hence the coarse solve is exact). The diagram shows gather, evaluate (local residual), pre-smoothing, restriction to the coarse level, an exact coarse correction, prolongation-and-add, and scatter-add. A dashed arrow indicates proceeding to the next local multigrid iteration. For brevity we illustrate only pre-smoothing; post-smoothing is omitted (half V-cycle).
  • Figure 3: Application of vmult in the patch-local solver
  • Figure 4: Breakdown of the wall-clock time for a single patch smoother application across different polynomial degrees $p$. Left: Cartesian mesh. Right: distorted mesh. Components: initial data fetching and setup, local residual computation, the p-multigrid local solve, and scattering the correction back to the global vector. Relative times are normalized by the time required to evaluate the patch-local operator $A$ application.
  • Figure 5: Throughput of individual components of the patch smoother application. The y-axis shows performance in GFLOP/s.
  • ...and 5 more figures