A mixed precision LOBPCG algorithm
Daniel Kressner, Yuxin Ma, Meiyue Shao
TL;DR
This work introduces a mixed-precision variant of LOBPCG for computing a few of the smallest eigenpairs of a large Hermitian positive definite matrix $A$. It combines a (sparse) Cholesky factorization of $A$ computed in reduced precision, used as the preconditioner, with mixed-precision orthogonalization and a two-stage workflow, so that high-accuracy solutions are obtained at reduced cost. A rounding error and convergence analysis of PINVIT, a simplified variant of LOBPCG, predicts that reducing the precision of the preconditioner has only a marginal effect on convergence, and numerical experiments confirm this. In practice, the method reduces computation time by a factor of roughly 1.4 to 2.0 on both CPUs and GPUs, in sparse and dense settings, including complex kernels, while preserving accuracy.
Abstract
The locally optimal block preconditioned conjugate gradient (LOBPCG) algorithm is a popular approach for computing a few smallest eigenvalues and the corresponding eigenvectors of a large Hermitian positive definite matrix $A$. In this work, we propose a mixed precision variant of LOBPCG that uses a (sparse) Cholesky factorization of $A$ computed in reduced precision as the preconditioner. To further enhance performance, a mixed precision orthogonalization strategy is proposed. To analyze the impact of reducing precision in the preconditioner on performance, we carry out a rounding error and convergence analysis of PINVIT, a simplified variant of LOBPCG. Our theoretical results predict and our numerical experiments confirm that the impact on convergence remains marginal. In practice, our mixed precision LOBPCG algorithm typically reduces the computation time by a factor of 1.4--2.0 on both CPUs and GPUs.
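
To make the idea of a reduced-precision preconditioner concrete, the following is a minimal sketch of single-vector PINVIT, not the authors' implementation. It rests on several assumptions: a dense real symmetric positive definite $A$, `float32` as the reduced precision, `float64` as the working precision, and SciPy's dense Cholesky routines standing in for a sparse factorization. The names `pinvit_mixed` and `laplacian_1d` are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve


def laplacian_1d(n):
    """Dense 1D Dirichlet Laplacian, used here only as a small SPD test matrix."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)


def pinvit_mixed(A, x0, tol=1e-10, max_iter=200):
    """Single-vector PINVIT with a float32 Cholesky preconditioner (sketch).

    Assumes A is a dense real SPD matrix stored in float64.
    Returns the Rayleigh quotient, the approximate eigenvector,
    and the index of the last iteration performed.
    """
    # Factor A once in single precision; this factor acts as the
    # reduced-precision preconditioner T ~ A.
    factor32 = cho_factor(A.astype(np.float32))

    x = x0 / np.linalg.norm(x0)
    rho, it = 0.0, 0
    for it in range(max_iter):
        Ax = A @ x                 # matrix-vector product in double precision
        rho = x @ Ax               # Rayleigh quotient (x has unit norm)
        r = Ax - rho * x           # eigenvalue residual in double precision
        if np.linalg.norm(r) <= tol * abs(rho):
            break
        # Apply T^{-1} in single precision, then promote the result.
        w = cho_solve(factor32, r.astype(np.float32)).astype(np.float64)
        x = x - w                  # PINVIT update: x - T^{-1} (A x - rho x)
        x /= np.linalg.norm(x)
    return rho, x, it
```

For example, `pinvit_mixed(laplacian_1d(50), np.ones(50))` approximates the smallest eigenpair of a small Laplacian. Because the residual and the Rayleigh quotient are evaluated in double precision, the single-precision factor only perturbs the search direction, which is the setting in which the abstract reports a marginal effect on convergence.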
