Table of Contents
Fetching ...

Implementation techniques for multigrid solvers for high-order Discontinuous Galerkin methods

Sean Baccas, Alexander A. Belozerov, Eike H. Müller, Tobias Weinzierl

TL;DR

The paper tackles efficient, scalable solvers for elliptic PDEs discretised with high-order DG methods by advocating a matrix-free, $hp$-multigrid approach that leverages single-touch data access and auxiliary facet variables. It introduces a DG IP formulation augmented with left/right facet projections and flux variables, enabling a matrix-free Schur-complement reduction and a highly data-local smoother. A hybrid execution model combines loop fusion, facet-based data structures, and selective tasking to balance concurrency with overhead, achieving strong performance on modern manycore architectures. Numerical results demonstrate rapid convergence of the $hp$-multigrid method and provide detailed performance analyses, including memory traffic reductions, scaling behavior, and basis-choice implications. The work offers practical, implementable techniques for HPC practitioners to realize efficient, scalable high-order DG solvers and suggests avenues for future enhancements in multigrid smoothers and asynchronous solvers.

Abstract

Matrix-free geometric multigrid solvers for elliptic PDEs that have been discretised with Higher-order Discontinuous Galerkin (DG) methods are ideally suited to exploit state-of-the-art computer architectures. Higher polynomial degrees offer exponential convergence, while the workload fits to vector units, is straightforward to parallelise, and exhibits high arithmetic intensity. Yet, DG methods such as the interior penalty DG discreisation do not magically guarantee high performance: they require non-local memory access due to coupling between neighbouring cells and break down into compute steps of widely varying costs and compute character. We address these limitations by developing efficient execution strategies for $hp$-multigrid. Separating cell- and facet-operations by introducing auxiliary facet variables localizes data access, reduces the need for frequent synchronization, and enables overlap of computation and communication. Loop fusion results in a single-touch scheme which reads (cell) data only once per smoothing step. We interpret the resulting execution strategies in the context of a task formalism, which exposes additional concurreny. The target audience of this paper are practitioners in Scientific Computing who are not necessarily experts on multigrid or familiar with sophisticated discretisation techniques. By discussing implementation techniques for a powerful solver algorithm we aim to make it accessible to the wider community.

Implementation techniques for multigrid solvers for high-order Discontinuous Galerkin methods

TL;DR

The paper tackles efficient, scalable solvers for elliptic PDEs discretised with high-order DG methods by advocating a matrix-free, -multigrid approach that leverages single-touch data access and auxiliary facet variables. It introduces a DG IP formulation augmented with left/right facet projections and flux variables, enabling a matrix-free Schur-complement reduction and a highly data-local smoother. A hybrid execution model combines loop fusion, facet-based data structures, and selective tasking to balance concurrency with overhead, achieving strong performance on modern manycore architectures. Numerical results demonstrate rapid convergence of the -multigrid method and provide detailed performance analyses, including memory traffic reductions, scaling behavior, and basis-choice implications. The work offers practical, implementable techniques for HPC practitioners to realize efficient, scalable high-order DG solvers and suggests avenues for future enhancements in multigrid smoothers and asynchronous solvers.

Abstract

Matrix-free geometric multigrid solvers for elliptic PDEs that have been discretised with Higher-order Discontinuous Galerkin (DG) methods are ideally suited to exploit state-of-the-art computer architectures. Higher polynomial degrees offer exponential convergence, while the workload fits to vector units, is straightforward to parallelise, and exhibits high arithmetic intensity. Yet, DG methods such as the interior penalty DG discreisation do not magically guarantee high performance: they require non-local memory access due to coupling between neighbouring cells and break down into compute steps of widely varying costs and compute character. We address these limitations by developing efficient execution strategies for -multigrid. Separating cell- and facet-operations by introducing auxiliary facet variables localizes data access, reduces the need for frequent synchronization, and enables overlap of computation and communication. Loop fusion results in a single-touch scheme which reads (cell) data only once per smoothing step. We interpret the resulting execution strategies in the context of a task formalism, which exposes additional concurreny. The target audience of this paper are practitioners in Scientific Computing who are not necessarily experts on multigrid or familiar with sophisticated discretisation techniques. By discussing implementation techniques for a powerful solver algorithm we aim to make it accessible to the wider community.

Paper Structure

This paper contains 70 sections, 34 equations, 12 figures, 6 tables, 7 algorithms.

Figures (12)

  • Figure 1: Left: We construct the spatial discretisation through a sequence of hypercubes (squares as we work in a $d=2$-dimensional setting in the illustration) embedded into each other. They form a spacetree. Middle: The union of the unrefined leaf nodes of the tree forms an adaptive Cartesian mesh. Right: Any two neighbouring cells $K^-$, $K^+$ share one facet $F$ with associated facet normal $n_F$.
  • Figure 2: Illustration of the fundamental function spaces used in this work. Discontinuous Galerkin (DG) space $\mathbb{V}_{h,p=2}^{(\text{DG})}$ with Gauss-Legendre nodes (a), DG facet space $\mathbb{F}_{h,p=2}^{(\text{DG})}$ with Gauss-Legendre nodes (b), lowest order piecewise linear Continuous Galerkin (CG) space $\mathbb{V}_{h,p=1}^{(\text{CG})}$ (c), DG space $\mathbb{V}_{h,p=2}^{(\text{DG})}$ with Gauss-Lobatto nodes (d) and composite DG facet space $\mathbb{F}_{h,p=2}^{(\text{DG})}\times \mathbb{F}_{h,p=2}^{(\text{DG})}$ with Gauss-Legendre nodes (e).
  • Figure 3: Schematic task graph (bottom) for a single block-Jacobi iteration applied to a 1d problem with four mesh cells (top). Fractional indices are used to number the facets sitting in-between two cells. \ref{['tab:tasks']} breaks down the algorithm into more detailed tasks, which are omitted here for brevity.
  • Figure 4: Visualisation of "sin-product" reference solution $u_1^{\textrm{ref.}}$ (left) as defined in \ref{['eq:results:sin_product_exact']} and "two-peak" reference solution $u_2^{\textrm{ref.}}$ (right) as defined in \ref{['eq:results:two_peak_exact']}.
  • Figure 5: Error $E_{h,p}$ in $\ell_2$ (circles $\bullet$) and $\ell_\infty$ (squares $\blacksquare$) norms for various choices of $p$ and $h$. Results are shown for the "sin-product" reference solution $u_1^{\textrm{ref.}}$ in \ref{['eq:results:sin_product_exact']} (left) and the "two-peak" reference solution $u_2^{\textrm{ref.}}$ in \ref{['eq:results:two_peak_exact']} (right).
  • ...and 7 more figures

Theorems & Definitions (7)

  • Definition 3.1
  • Definition 3.2
  • Definition 3.3
  • Definition 3.4
  • Definition 3.5
  • Definition 3.6
  • Definition 4.1