Towards Code Generation for Octree-Based Multigrid Solvers

Richard Angersbach, Sebastian Kuckuck, Harald Köstler

TL;DR

The paper addresses the challenge of solving PDEs with multigrid methods on locally refined octree meshes by introducing a code-generation workflow that produces optimized multigrid solvers and refinement-aware communication kernels. It couples ExaStencils with waLBerla to generate specialized kernels and communication routines that support coarse-to-fine (C2F) and fine-to-coarse (F2C) transfers while maintaining the convergence of the discretization. Key contributions include new C2F and F2C schemes that minimize ghost data, dynamic data structures that handle variable neighbor counts, a modular communication design, and weak-scaling benchmarks on the SuperMUC-NG cluster that show performance gains over manual implementations. The work shows that automatically generating refinement-aware multigrid components can yield accurate, scalable solvers for octree-based domains, and it paves the way for portability to other coefficients and accelerator platforms.

Abstract

This paper presents a novel method designed to generate multigrid solvers optimized for octree-based software frameworks. Our approach focuses on accurately capturing local features within a domain while leveraging the efficiency inherent in multigrid techniques. We outline the essential steps involved in generating specialized kernels for local refinement and communication routines, integrating on-the-fly interpolations to seamlessly transfer information between refinement levels. For this purpose, we established a software coupling via an automatic fusion of generated multigrid solvers and communication kernels with manual implementations of complex octree data structures and algorithms often found in established software frameworks. We demonstrate the effectiveness of our method through numerical experiments with different interpolation orders. Large-scale benchmarks conducted on the SuperMUC-NG CPU cluster underscore the advantages of our approach, offering a comparison against a reference implementation to highlight the benefits of our method and code generation in general.
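To make the idea of transferring information between refinement levels concrete, the following is a minimal 1D sketch and not the scheme proposed in the paper, whose 2D and 3D C2F and F2C schemes are shown in its figures. It assumes cell-centered values, a refinement ratio of 2, and uniformly spaced coarse cells; the function names `coarse_to_fine` and `fine_to_coarse` are hypothetical.

```python
def coarse_to_fine(u_prev, u_mid, u_next):
    """Quadratic (Lagrange) interpolation from three neighboring coarse
    cell centers to the two fine children of the middle coarse cell.

    Assumption: coarse centers sit at offsets -H, 0, +H and the fine
    children at offsets -H/4 and +H/4 (refinement ratio 2).
    """
    # Lagrange weights evaluated at s = -1/4 and s = +1/4 (s = offset / H).
    left = (5.0 * u_prev + 30.0 * u_mid - 3.0 * u_next) / 32.0
    right = (-3.0 * u_prev + 30.0 * u_mid + 5.0 * u_next) / 32.0
    return left, right


def fine_to_coarse(u_left, u_right):
    """Average the two fine children to obtain one coarse cell value."""
    return 0.5 * (u_left + u_right)


if __name__ == "__main__":
    # Coarse cells of width H = 1 with centers at 0.5, 1.5, 2.5.
    f = lambda x: 2.0 * x * x - 3.0 * x + 1.0
    left, right = coarse_to_fine(f(0.5), f(1.5), f(2.5))
    print(left, f(1.25))   # interpolated vs. exact value at the left child
    print(right, f(1.75))  # interpolated vs. exact value at the right child
```

In this sketch the coarse-to-fine step reproduces quadratic functions exactly at the fine cell centers, while the averaging step reproduces linear functions exactly at the coarse cell center.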

Paper Structure (17 sections, 4 equations, 9 figures)

Figures (9)

  • Figure 1: Exemplary 2D mesh refinement in waLBerla. From left to right the grid is refined in the lower left corner.
  • Figure 2: Data exchange for communication for our exemplary 2D domain setup.
  • Figure 3: Proposed extra-/interpolation schemes for different refinement cases.
  • Figure 4: Weight computation for three different remapping cases.
  • Figure 5: 3D quadratic C2F extra-/interpolation scheme.
  • ...and 4 more figures