Inexact Gauss Seidel and Coarse Solvers for AMG and s-step CG

Stephen Thomas, Pasqua D'Ambra

TL;DR

The paper tackles exascale synchronization bottlenecks in Krylov solvers by replacing expensive Gram factorizations in s-step CG with an inexact, low-synchronization Forward Gauss-Seidel approach that leverages Chebyshev-based Gram matrices. It establishes a fundamental algebraic link: one FGS sweep is equivalent to one Modified Gram–Schmidt step in the $A$-norm, enabling stable, accurate projection coefficients while reducing inner-solve cost from $O(s^{3})$ to $O(\nu s^{2})$ per outer iteration. The authors analyze conditioning and stability, showing Chebyshev bases yield off-diagonal decay and polynomial Gram conditioning $\kappa(G)=O(s^{2})$, with $\|L\|_{F}=O(\sqrt{s})$, supporting practical inner iterations ($\nu\approx 20$–$30$). They extend the approach to AMG coarse-grid solves, including a streaming matrix-free variant that avoids forming dense coarse operators, reducing memory and transfer overhead while preserving convergence, and provide convergence conditions for inexact s-step CG. Numerical experiments on large-scale 3D Poisson problems demonstrate restored weak scalability with 20–30 FGS sweeps up to 64 GPUs and show the method's potential for substantial performance and memory benefits in exascale settings.

Abstract

Communication-avoiding Krylov methods require solving small dense Gram systems at each outer iteration. We present a low-synchronization approach based on Forward Gauss–Seidel (FGS), which exploits the structure of Gram matrices arising from Chebyshev polynomial bases. We show that a single FGS sweep is mathematically equivalent to Modified Gram–Schmidt (MGS) orthogonalization in the $A$-norm and provide corresponding backward error bounds. For weak scaling on AMD MI-series GPUs, we demonstrate that 20–30 FGS iterations preserve scalability up to 64 GPUs with problem sizes exceeding 700 million unknowns. We further extend this approach to Algebraic MultiGrid (AMG) coarse-grid solves, removing the need to assemble or factor dense coarse operators.

Paper Structure

This paper contains 9 sections, 9 theorems, 14 equations, 1 table.

Key Result

proposition 1

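Let $G = L + D + L^T$ be the splitting of the Gram matrix into its strictly lower-triangular, diagonal, and strictly upper-triangular parts, and let $\alpha^{(\nu+1)}$ satisfy $(D+L)\alpha^{(\nu+1)} = P^T r - L^T\alpha^{(\nu)}$. Then $r^{(\nu+1)} = P^T r - G\alpha^{(\nu+1)} = -L^T\alpha^{(\nu+1)} + L^T\alpha^{(\nu)}$, and for $\alpha^{(0)}=0$, $r^{(1)}=-L^T\alpha^{(1)}$.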
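
The residual identity in Proposition 1 can be checked numerically. The sketch below is illustrative and not taken from the paper: it assumes the splitting $G = L + D + L^T$ that the identity implies, uses a small random symmetric positive definite matrix as a stand-in for a Chebyshev Gram matrix, and lets the vector `b` play the role of $P^T r$. The helper names (`fgs_sweep`, `residual`, `strict_upper_times`) are hypothetical.

```python
import random


def fgs_sweep(G, b, alpha):
    """One Forward Gauss-Seidel sweep for G x = b with G = L + D + L^T.

    Solves (D + L) new = b - L^T alpha by forward substitution.
    """
    s = len(b)
    new = [0.0] * s
    for i in range(s):
        lower = sum(G[i][j] * new[j] for j in range(i))            # (L new)_i
        upper = sum(G[i][j] * alpha[j] for j in range(i + 1, s))   # (L^T alpha)_i
        new[i] = (b[i] - lower - upper) / G[i][i]
    return new


def residual(G, b, alpha):
    """Return b - G alpha."""
    s = len(b)
    return [b[i] - sum(G[i][j] * alpha[j] for j in range(s)) for i in range(s)]


def strict_upper_times(G, v):
    """Return L^T v, where L^T is the strictly upper triangle of G."""
    s = len(v)
    return [sum(G[i][j] * v[j] for j in range(i + 1, s)) for i in range(s)]


# Assumed test data: a random SPD matrix, not an actual Chebyshev Gram matrix.
random.seed(0)
s = 6
M = [[random.uniform(-1.0, 1.0) for _ in range(s)] for _ in range(s)]
G = [[sum(M[k][i] * M[k][j] for k in range(s)) + (s if i == j else 0.0)
      for j in range(s)] for i in range(s)]
b = [random.uniform(-1.0, 1.0) for _ in range(s)]  # stands in for P^T r

alpha = [0.0] * s  # alpha^(0) = 0
max_gap = 0.0
for sweep in range(5):
    new = fgs_sweep(G, b, alpha)
    lhs = residual(G, b, new)                       # r^(nu+1) = b - G alpha^(nu+1)
    up_new = strict_upper_times(G, new)             # L^T alpha^(nu+1)
    up_old = strict_upper_times(G, alpha)           # L^T alpha^(nu)
    rhs = [-un + uo for un, uo in zip(up_new, up_old)]
    max_gap = max(max_gap, max(abs(x - y) for x, y in zip(lhs, rhs)))
    alpha = new

print("largest deviation from the identity:", max_gap)
```

Each sweep costs one triangular solve, that is $O(s^{2})$ operations, so $\nu$ sweeps cost $O(\nu s^{2})$, which is consistent with the inner-solve cost quoted in the TL;DR. On the first sweep `up_old` is zero, so the check reduces to the special case $r^{(1)} = -L^T\alpha^{(1)}$.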

Theorems & Definitions (16)

  • definition 1: Chebyshev Polynomial Basis
  • proposition 1: FGS Residual Structure
  • proof
  • theorem 1: Backward Stability
  • proof
  • corollary 1: Error Bound
  • lemma 1: Multi-Sweep Analysis
  • proof
  • theorem 2: Polynomial Growth
  • proof
  • ...and 6 more