
Mixed precision multigrid with smoothing based on incomplete Cholesky factorization

Petr Vacek, Hartwig Anzt, Erin Carson, Nils Kohl, Ulrich Rüde, Yu-Hsiang Tsai

TL;DR

This work develops a framework for mixed-precision multigrid methods, focusing on V-cycles with smoothing based on incomplete Cholesky (IC) factorization. It presents a general finite-precision error model, derives contraction-based bounds for two-grid and multilevel cycles, and analyzes IC smoothing under mixed precision, including substitution-based triangular solves. The authors show that IC smoothing can often be performed in markedly lower precision than the residual, restriction, prolongation, and correction steps, enabling speedups and energy savings. Numerical experiments on 1D and 3D Poisson problems, run both in simulated floating point arithmetic and on GPUs, illustrate the theory and demonstrate practical gains, with speedups of up to $1.43\times$ and energy savings reported as down to $71\%$ relative to the uniform double precision variant in some configurations, as well as robust performance within PCG preconditioning. The results offer guidance for configuring per-level precisions and scaling strategies to exploit hardware efficiency without sacrificing convergence.

Abstract

Multigrid methods are popular iterative methods for solving large-scale sparse systems of linear equations. We present a mixed precision formulation of the multigrid V-cycle with general assumptions on the finite precision errors coming from the application of the coarsest-level solver and smoothing. Inspired by existing analysis, we derive a bound on the relative finite precision error of the V-cycle which gives insight into how the finite precision errors from the individual components of the method may affect the overall finite precision error. We use the result to study V-cycle methods with smoothing based on incomplete Cholesky (IC) factorization. The results imply that in certain settings the precisions used for applying the IC smoothing can be significantly lower than the precision used for computing the residual, restriction, prolongation and correction on a given level. We perform numerical experiments using simulated floating point arithmetic with the MATLAB Advanpix toolbox as well as experiments computed on GPUs using the Ginkgo library. The experiments illustrate the theoretical findings and show that in the considered settings the IC smoothing can be applied in relatively low precisions, resulting in significant speedups (up to 1.43x) and energy savings (down to 71%) in comparison with the uniform double precision variant.
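To make the precision split described above concrete, the following is a minimal sketch and not the paper's V-cycle: a single-level iteration $\mathbf{x} \leftarrow \mathbf{x} + (\mathbf{L}\mathbf{L}^\top)^{-1}(\mathbf{f} - \mathbf{A}\mathbf{x})$ in which the residual and the correction are computed in double precision while the substitution-based triangular solves are rounded to single precision after every operation. Everything here is an assumption made for illustration: the model problem is a 1D finite-difference Poisson matrix (not the FEM-P5 discretization of the paper), the helper names `fl32`, `cholesky_tridiag`, `ic_solve_low` and `residual` are hypothetical, and single precision is simulated by rounding through `struct`. For a tridiagonal matrix the Cholesky factor has no fill-in, so IC(0) coincides with the exact factor and the iteration behaves like iterative refinement.

```python
import math
import struct


def fl32(x):
    """Round a double to the nearest IEEE single-precision value (simulated low precision)."""
    return struct.unpack("f", struct.pack("f", x))[0]


def cholesky_tridiag(diag, off):
    """Lower bidiagonal Cholesky factor of an SPD tridiagonal matrix, computed in double.

    Returns (ld, ls): the diagonal and subdiagonal of L. No fill-in occurs,
    so this equals the IC(0) factor for this sparsity pattern.
    """
    n = len(diag)
    ld, ls = [0.0] * n, [0.0] * (n - 1)
    ld[0] = math.sqrt(diag[0])
    for i in range(1, n):
        ls[i - 1] = off[i - 1] / ld[i - 1]
        ld[i] = math.sqrt(diag[i] - ls[i - 1] ** 2)
    return ld, ls


def ic_solve_low(ld, ls, r):
    """Solve L L^T z = r by substitution, rounding every operation to single precision."""
    n = len(r)
    ld32 = [fl32(v) for v in ld]
    ls32 = [fl32(v) for v in ls]
    y = [0.0] * n
    y[0] = fl32(fl32(r[0]) / ld32[0])
    for i in range(1, n):  # forward substitution: L y = r
        y[i] = fl32(fl32(fl32(r[i]) - fl32(ls32[i - 1] * y[i - 1])) / ld32[i])
    z = [0.0] * n
    z[-1] = fl32(y[-1] / ld32[-1])
    for i in range(n - 2, -1, -1):  # backward substitution: L^T z = y
        z[i] = fl32(fl32(y[i] - fl32(ls32[i] * z[i + 1])) / ld32[i])
    return z


def residual(diag, off, x, f):
    """r = f - A x for tridiagonal A, computed in double precision."""
    n = len(x)
    r = []
    for i in range(n):
        ax = diag[i] * x[i]
        if i > 0:
            ax += off[i - 1] * x[i - 1]
        if i < n - 1:
            ax += off[i] * x[i + 1]
        r.append(f[i] - ax)
    return r


def rel_error(x, x_exact):
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_exact)))
    return num / math.sqrt(sum(b * b for b in x_exact))


n = 64
diag, off, f = [2.0] * n, [-1.0] * (n - 1), [1.0] * n
x_exact = [0.5 * (i + 1) * (n - i) for i in range(n)]  # closed-form solution of A x = f

ld, ls = cholesky_tridiag(diag, off)
x = [0.0] * n
errors = []
for _ in range(8):
    r = residual(diag, off, x, f)            # double precision
    z = ic_solve_low(ld, ls, r)              # simulated single precision
    x = [xi + zi for xi, zi in zip(x, z)]    # double precision correction
    errors.append(rel_error(x, x_exact))

err_first, err_final = errors[0], errors[-1]
print("relative error after 1 iteration :", err_first)
print("relative error after 8 iterations:", err_final)
```

In this toy setting the low-precision solve alone leaves an error at roughly single-precision level, while repeating it with a double-precision residual drives the error toward the limit set by double precision. A multilevel method adds restriction, prolongation and a coarsest-level solve, which this sketch deliberately omits.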

Paper Structure

This paper contains 18 sections, 6 theorems, 66 equations, 2 figures, 4 tables, 3 algorithms.

Key Result

Theorem 3.1

Let $\mathbf{y}_{\mathrm{TG}}$ and $\hat{\mathbf{y}}_{\mathrm{TG}}$ be the approximate solution of $\mathbf{A}\mathbf{y}=\mathbf{f}$ computed using one TG cycle (the two-grid algorithm) applied in exact and finite precision, respectively. The $\mathbf{A}$-norm of the finite precision error where $C_1$ and $C_2$ are positive constants depending on $\| \mathbf{A}\|$, $\| | \mathbf{A} | \|$,

Figures (2)

  • Figure 1: Left: 1D Poisson eq., FEM-P5 disc. Properties of $\mathbf{A}_j$ and $\mathbf{L}_j$ for IC(0) and ICT(dpt=$5 \cdot 10^{-3}$). Right: 3D Poisson equation, FEM-P5 disc. Properties of $\mathbf{A}_j$ and $\mathbf{L}_j$ (IC(0)).
  • Figure 2: 1D Poisson eq., FEM-P5 disc., solved by IR-V-cycle-IC. The plot on the left contains the values of $\dot{d}_{J,\mathrm{min}}$ and $d^{\mathrm{S}}_{J,\mathrm{min}}$, i.e., the minimal values of $\dot{d}_J$ and $d^{\mathrm{S}}_J$ such that the variant with $\dot{\varepsilon}_J=10^{-\dot{d}_J}$-precision and $\varepsilon^{\mathrm{S}}_J=\varepsilon^{\mathrm{R}}_J=10^{-d^{\mathrm{S}}_J}$-precision converges in the same number of IR iterations as the corresponding variant in double precision. Separate lines show $\dot{d}_{J,\mathrm{min}}$ and $d^{\mathrm{S}}_{J,\mathrm{min}}$ for the variant with IC(0) and for the variant with ICT(dpt=$5\cdot10^{-3}$). For reference, the number of digits for double precision is also plotted. The plot on the right contains the number of IR iterations required for convergence for the variants in double precision with IC(0) and ICT(dpt=$5\cdot10^{-3}$).

Theorems & Definitions (10)

  • Theorem 3.1
  • Proof 1: Proof of Theorem 3.1
  • Theorem 4.1
  • Proof 2: Proof of Theorem 4.1
  • Lemma 5.1
  • Lemma 5.2
  • Proof 3
  • Theorem 5.3
  • Proof 4
  • Lemma C.1