Implementation techniques for multigrid solvers for high-order Discontinuous Galerkin methods

Sean Baccas; Alexander A. Belozerov; Eike H. Müller; Tobias Weinzierl

TL;DR

The paper tackles efficient, scalable solvers for elliptic PDEs discretised with high-order DG methods by advocating a matrix-free, $hp$-multigrid approach that leverages single-touch data access and auxiliary facet variables. It introduces a DG IP formulation augmented with left/right facet projections and flux variables, enabling a matrix-free Schur-complement reduction and a highly data-local smoother. A hybrid execution model combines loop fusion, facet-based data structures, and selective tasking to balance concurrency with overhead, achieving strong performance on modern manycore architectures. Numerical results demonstrate rapid convergence of the $hp$-multigrid method and provide detailed performance analyses, including memory traffic reductions, scaling behavior, and basis-choice implications. The work offers practical, implementable techniques for HPC practitioners to realize efficient, scalable high-order DG solvers and suggests avenues for future enhancements in multigrid smoothers and asynchronous solvers.

Abstract

Matrix-free geometric multigrid solvers for elliptic PDEs that have been discretised with higher-order Discontinuous Galerkin (DG) methods are ideally suited to exploit state-of-the-art computer architectures. Higher polynomial degrees offer exponential convergence, while the workload maps well onto vector units, is straightforward to parallelise, and exhibits high arithmetic intensity. Yet DG methods such as the interior penalty DG discretisation do not magically guarantee high performance: they require non-local memory access due to coupling between neighbouring cells, and they break down into compute steps of widely varying cost and compute character. We address these limitations by developing efficient execution strategies for $hp$-multigrid. Separating cell and facet operations by introducing auxiliary facet variables localizes data access, reduces the need for frequent synchronization, and enables overlap of computation and communication. Loop fusion results in a single-touch scheme which reads (cell) data only once per smoothing step. We interpret the resulting execution strategies in the context of a task formalism, which exposes additional concurrency. The target audience of this paper is practitioners in Scientific Computing who are not necessarily experts on multigrid or familiar with sophisticated discretisation techniques. By discussing implementation techniques for a powerful solver algorithm, we aim to make it accessible to the wider community.
