A Task Parallel Orthonormalization Multigrid Method For Multiphase Elliptic Problems

Teoman Toprak, Florian Kummer

TL;DR

The paper addresses the scalable solution of large multiphase elliptic PDEs by extending K-cycle orthonormalization multigrid (OrthoMG) with semi-asynchronous, task-based parallelism. It introduces a residual-minimization framework that overlaps smoothing and coarse-grid corrections across levels, reducing global synchronization while preserving robustness against anisotropies. Using XDG-based discretizations and a detailed experimental study, it demonstrates that the additive Schwarz smoother, combined with task-parallel OrthoMG, delivers superior runtime and strong scaling on 2D/3D Poisson and Stokes benchmarks, albeit with higher memory usage per core. The work paves the way for petascale and exascale solvers for multiphase PDEs on modern HPC architectures, including potential GPU offloading and adaptive workload management.

Abstract

Multigrid methods have been a popular approach for solving linear systems arising from the discretization of partial differential equations (PDEs) for several decades. They are particularly effective for accelerating convergence rates with optimal complexity in terms of both time and space. K-cycle orthonormalization multigrid is a robust variant of the multigrid method that combines the efficiency of multigrid with the robustness of Krylov-type residual minimization for problems with strong anisotropies. However, traditional implementations of K-cycle orthonormalization multigrid often rely on bulk-synchronous parallelism, which can limit scalability on modern high-performance computing (HPC) systems. This paper presents a task-parallel variant of the K-cycle orthonormalization multigrid method that leverages asynchronous execution to improve scalability and performance on large-scale parallel systems.
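To make the phrase "Krylov-type residual minimization" concrete, the following is a minimal sketch of one common way to realize it through orthonormalization. It is not taken from the paper: the names `ResidualMinimizer`, `jacobi_candidate`, and `demo` are hypothetical, and a plain Jacobi step on a 1D Laplacian stands in for the smoother and coarse-grid corrections that a multigrid method would supply. Each candidate correction $z$ is stored together with its image $w = Az$; the images are kept orthonormal, so every accepted correction can only reduce the residual 2-norm.

```python
import numpy as np


class ResidualMinimizer:
    """Accumulate corrections z_i with orthonormal images w_i = A z_i (illustrative)."""

    def __init__(self, A, b, x0):
        self.A = A
        self.b = b
        self.x = x0.astype(float).copy()
        self.r = b - A @ self.x
        self.Z = []  # stored corrections
        self.W = []  # stored images, orthonormal in the Euclidean inner product

    def add_correction(self, z):
        """Use candidate z to update x; return the new residual norm."""
        z = z.astype(float).copy()
        w = self.A @ z
        w0 = np.linalg.norm(w)
        # Modified Gram-Schmidt on w; the same operations are applied to z
        # so that the relation w = A z is preserved.
        for zj, wj in zip(self.Z, self.W):
            beta = wj @ w
            w -= beta * wj
            z -= beta * zj
        norm = np.linalg.norm(w)
        if norm <= 1e-12 * w0:
            # Candidate adds no new direction; skip it.
            return np.linalg.norm(self.r)
        w /= norm
        z /= norm
        alpha = w @ self.r  # optimal step length because ||w|| = 1
        self.x += alpha * z
        self.r -= alpha * w
        self.Z.append(z)
        self.W.append(w)
        return np.linalg.norm(self.r)


def jacobi_candidate(A, r):
    """Stand-in for a smoother or coarse-grid correction (assumption)."""
    return r / np.diag(A)


def demo(n=50, sweeps=10):
    # 1D Laplacian test matrix, chosen only for illustration.
    A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    b = np.ones(n)
    solver = ResidualMinimizer(A, b, np.zeros(n))
    norms = [np.linalg.norm(solver.r)]
    for _ in range(sweeps):
        norms.append(solver.add_correction(jacobi_candidate(A, solver.r)))
    return solver, norms
```

Because the update only requires that each candidate comes with its image under $A$, corrections computed independently on different levels could in principle be fed to such a minimizer in whatever order they arrive; how the paper organizes this across tasks is not shown here.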

Paper Structure

This paper contains 18 sections, 12 equations, 7 figures, 4 tables, 5 algorithms.

Figures (7)

  • Figure 1: Schematic illustration of the synchronous (a) and task-parallel (b) orthonormalization multigrid methods with four mesh levels distributed across processing units. Each column corresponds to a mesh level, while each row indicates the associated groups. Symbols indicate the operations on each level, and the filled portions exhibit how work is distributed among the groups.
  • Figure 2: Detailed illustration of the adaptive loop. In contrast to a standard synchronous multigrid method, the task-parallel configuration has multiple execution units, which are asynchronous but exchange information at specific points.
  • Figure 3: Runtime comparison of OrthoMG with Block-Jacobi (blue) and Additive Schwarz (red) smoothers for varying numbers of cores for the Poisson problem with $128\times128$ cells and $p=5$. The left (multiplicative synchronous) and right (additive task-parallel) panels correspond to the parallelization strategies given in Table \ref{tab:orthomg-configs}.
  • Figure 4: Convergence comparison of OrthoMG configurations for the 2D Poisson problem with the additive Schwarz smoother, polynomial degree $p=3$, and a resolution of $128^2$ cells, simulated with 48 cores.
  • Figure 5: Convergence comparison of OrthoMG configurations for the 2D Stokes problem with the additive Schwarz smoother, polynomial degree $p=2$, and a resolution of $128^2$ cells, simulated with 384 cores.
  • ...and 2 more figures