Mixed precision multigrid with smoothing based on incomplete Cholesky factorization
Petr Vacek, Hartwig Anzt, Erin Carson, Nils Kohl, Ulrich Rüde, Yu-Hsiang Tsai
TL;DR
This work develops a framework for mixed precision multigrid methods, focusing on V-cycles with smoothing based on incomplete Cholesky (IC) factorization. It presents a general finite precision error model, derives contraction-based bounds for two-grid and multilevel cycles, and analyzes IC smoothing under mixed precision, including substitution-based triangular solves. The authors show that, in certain settings, IC smoothing can be applied in significantly lower precision than the residual, restriction, prolongation, and correction steps on the same level. Numerical experiments on 1D and 3D Poisson problems, run in simulated floating point arithmetic and on GPUs, illustrate the theory and demonstrate practical gains: speedups of up to $1.43\times$ and energy consumption down to $71\%$ of that of the uniform double precision variant in some configurations, as well as robust performance when the V-cycle is used as a preconditioner within PCG. The results offer guidance for choosing per-level precisions and scaling strategies that exploit hardware efficiency without sacrificing convergence.
Abstract
Multigrid methods are popular iterative methods for solving large-scale sparse systems of linear equations. We present a mixed precision formulation of the multigrid V-cycle with general assumptions on the finite precision errors coming from the application of the coarsest-level solver and smoothing. Inspired by existing analysis, we derive a bound on the relative finite precision error of the V-cycle, which gives insight into how the finite precision errors from the individual components of the method may affect the overall finite precision error. We use the result to study V-cycle methods with smoothing based on incomplete Cholesky (IC) factorization. The results imply that in certain settings the precisions used for applying the IC smoothing can be significantly lower than the precision used for computing the residual, restriction, prolongation and correction on the given level. We perform numerical experiments using simulated floating point arithmetic with the MATLAB Advanpix toolbox as well as experiments computed on GPUs using the Ginkgo library. The experiments illustrate the theoretical findings and show that in the considered settings the IC smoothing can be applied in relatively low precisions, resulting in significant speedups (up to 1.43x) and energy savings (energy consumption down to 71% of that of the uniform double precision variant).
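To make the central idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a small 2D Poisson model problem, a two-grid cycle rather than a full V-cycle, dense NumPy matrices, and IC(0) as the incomplete factorization. The residual, restriction, prolongation and correction are computed in double precision, while the IC smoothing (two triangular solves by substitution) is applied in a lower precision. All function names are hypothetical, and the scaling of the residual by its infinity norm before casting is one simple choice made for illustration, not necessarily the scaling strategy studied in the paper.

```python
import numpy as np

def poisson_2d(n):
    """Dense 5-point Laplacian on an n x n interior grid (unscaled)."""
    T = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    return np.kron(np.eye(n), T) + np.kron(T, np.eye(n))

def prolongation_2d(m):
    """Bilinear interpolation from an m x m grid to a (2m+1) x (2m+1) grid."""
    P1 = np.zeros((2 * m + 1, m))
    for j in range(m):
        P1[2 * j : 2 * j + 3, j] = [0.5, 1.0, 0.5]
    return np.kron(P1, P1)

def ic0(A):
    """IC(0): Cholesky restricted to the sparsity pattern of tril(A)."""
    n = A.shape[0]
    L = np.tril(A).astype(np.float64)
    pattern = L != 0
    for k in range(n):
        L[k, k] = np.sqrt(L[k, k])
        rows = np.nonzero(pattern[k + 1 :, k])[0] + k + 1
        L[rows, k] /= L[k, k]
        for j in rows:
            i = rows[rows >= j]
            i = i[pattern[i, j]]          # drop fill-in outside the pattern
            L[i, j] -= L[i, k] * L[j, k]
    return L

def forward_sub(L, b):
    """Solve L x = b by forward substitution in the dtype of the inputs."""
    x = np.zeros_like(b)
    for i in range(len(b)):
        x[i] = (b[i] - L[i, :i] @ x[:i]) / L[i, i]
    return x

def backward_sub(U, b):
    """Solve U x = b by backward substitution in the dtype of the inputs."""
    x = np.zeros_like(b)
    for i in range(len(b) - 1, -1, -1):
        x[i] = (b[i] - U[i, i + 1 :] @ x[i + 1 :]) / U[i, i]
    return x

def ic_smooth(A, L_low, x, b):
    """One step x + (L L^T)^{-1} (b - A x); triangular solves in low precision."""
    r = b - A @ x                          # residual in double precision
    s = np.max(np.abs(r))
    if s == 0.0:
        return x
    r_low = (r / s).astype(L_low.dtype)    # scale, then round to low precision
    z = backward_sub(L_low.T, forward_sub(L_low, r_low))
    return x + s * z.astype(np.float64)    # correction applied in double

def two_grid(A, Ac, P, L_low, x, b):
    """Pre-smoothing, coarse-grid correction (exact solve), post-smoothing."""
    x = ic_smooth(A, L_low, x, b)
    rc = P.T @ (b - A @ x)                 # restriction in double precision
    x = x + P @ np.linalg.solve(Ac, rc)    # prolongation and correction
    return ic_smooth(A, L_low, x, b)

def run_demo(smoother_dtype, m=7, cycles=12):
    """Return the relative residual after a fixed number of two-grid cycles."""
    A = poisson_2d(2 * m + 1)
    P = prolongation_2d(m)
    Ac = P.T @ A @ P                       # Galerkin coarse-level operator
    L_low = ic0(A).astype(smoother_dtype)  # factor stored in smoother precision
    b = np.random.default_rng(0).standard_normal(A.shape[0])
    x = np.zeros_like(b)
    for _ in range(cycles):
        x = two_grid(A, Ac, P, L_low, x, b)
    return np.linalg.norm(b - A @ x) / np.linalg.norm(b)

if __name__ == "__main__":
    print("float64 smoother:", run_demo(np.float64))
    print("float32 smoother:", run_demo(np.float32))
```

Calling `run_demo(np.float32)` and `run_demo(np.float64)` allows a side-by-side comparison of the relative residuals. Because the residual is recomputed in double precision in every cycle, the rounding errors of the low precision smoother enter only through the correction, which is the mechanism the abstract refers to.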
