Parameter optimization for restarted mixed precision iterative sparse solver

Alexander V. Prolubnikov

TL;DR

This work addresses precision switching in the conjugate gradient (CG) method for sparse SPD systems with a two-stage mixed-precision approach: an initial single-precision CG stage run to tolerance $\varepsilon_1$, followed by double-precision refinement to $\varepsilon_2$. The optimal $\varepsilon_1$ is predicted by kNN classification from a small set of rapidly computable matrix features $\chi(A)=(n,m,\tilde{\ell},v)$, where $n$ is the matrix size, $m$ the number of nonzeros, $\tilde{\ell}$ a pseudo-diameter of the sparsity graph estimated with two breadth-first searches (BFS), and $v$ the average rate of residual norm decay over the early single-precision iterations. The pseudo-diameter captures how graph structure influences rounding error growth. The author shows that the matrix graph diameter, together with $n$, $m$, and early residual decay, governs the efficiency of mixed-precision CG, achieving average speedups around 22–32% over full double-precision CG across several matrix classes, with a modest gap (≤1.5%) to the oracle choice of $\varepsilon_1$. The author also argues that neural networks are ill-suited to this prediction task because the mapping from features to the optimal tolerance is irregular, and demonstrates the algorithm's effectiveness through extensive experiments on extended-star, random sparse, and banded sparse matrices. Overall, the method provides a practical, low-overhead mechanism to reduce double-precision iterations while maintaining accuracy, with clear guidance on when graph diameter dominates convergence behavior.
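The pseudo-diameter mentioned above can be illustrated with the standard double-sweep heuristic: run BFS from an arbitrary vertex, then run BFS again from the farthest vertex found, and take the largest distance from the second sweep. The sketch below is an assumption about how the two BFS passes are combined, not the paper's own procedure. The names `bfs_farthest` and `pseudo_diameter` and the adjacency-dictionary representation are hypothetical, and the graph is assumed to be connected.

```python
from collections import deque

def bfs_farthest(adj, start):
    """Return (farthest_vertex, distance) from `start` via breadth-first search."""
    dist = {start: 0}
    queue = deque([start])
    far, far_d = start, 0
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                if dist[w] > far_d:
                    far, far_d = w, dist[w]
                queue.append(w)
    return far, far_d

def pseudo_diameter(adj, start=0):
    """Double-sweep estimate: a lower bound on the diameter, exact on trees."""
    u, _ = bfs_farthest(adj, start)   # first sweep: find a peripheral vertex
    _, d = bfs_farthest(adj, u)       # second sweep: eccentricity of that vertex
    return d

# Sparsity graphs of two small matrices: a path and a star on 6 vertices.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
star = {0: [1, 2, 3, 4, 5], **{i: [0] for i in range(1, 6)}}

print(pseudo_diameter(path, start=2), pseudo_diameter(star))
```

Each sweep costs $O(n+m)$, which fits the requirement that features be cheap relative to the solve. The path and star graphs are the two extremes of diameter for trees on the same number of vertices.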

Abstract

The problem of optimal precision switching for the conjugate gradient (CG) method applied to sparse linear systems is considered. A sparse matrix is defined as an $n\!\times\!n$ matrix with $m\!=\!O(n)$ nonzero entries. The algorithm first computes an approximate solution in single precision with tolerance $\varepsilon_1$, then switches to double precision to refine the solution to the required stopping tolerance $\varepsilon_2$. Based on estimates of system matrix parameters -- computed in at most $1\%$ of the time needed to solve the system in double precision -- we determine the optimal value of $\varepsilon_1$ that minimizes total computation time. This value is obtained by classifying the matrix using the $k$-nearest neighbors method on a small precomputed sample. Classification relies on a feature vector comprising: the matrix size $n$, the number of nonzeros $m$, the pseudo-diameter of the matrix sparsity graph, and the average rate of residual norm decay during the early CG iterations in single precision. We show that, in addition to the matrix condition number, the diameter of the sparsity graph influences the growth of rounding errors during iterative computations. The proposed algorithm reduces the computational complexity of CG -- expressed in equivalent double-precision iterations -- by more than $17\%$ on average across the considered matrix types in a sequential setting. The resulting speedup is at most $1.5\%$ worse than that achieved with the optimal (oracle) choice of $\varepsilon_1$. While the impact of matrix structure on Krylov subspace method convergence is well understood, the use of the sparsity graph diameter as a predictive feature for rounding error growth in mixed-precision CG appears to be novel. To the best of our knowledge, no prior work employs graph diameter to guide precision switching in iterative linear solvers.
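The two-stage scheme described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses dense NumPy arrays for brevity, an unpreconditioned CG, and a stopping test on the recursively updated residual relative to $\|b\|$. The function names `cg` and `mixed_precision_cg` are hypothetical. The double-precision stage restarts from the single-precision iterate and recomputes the residual $b - Ax$ from scratch.

```python
import numpy as np

def cg(A, b, x0, tol, max_iter):
    """Plain CG in the dtype of A and b; stops when ||r|| <= tol * ||b||."""
    x = x0.copy()
    r = b - A @ x                 # residual recomputed at (re)start
    p = r.copy()
    rs = r @ r
    b_norm = np.sqrt(b @ b)
    iters = 0
    while np.sqrt(rs) > tol * b_norm and iters < max_iter:
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
        iters += 1
    return x, iters

def mixed_precision_cg(A, b, eps1, eps2, max_iter=10_000):
    """Stage 1: float32 CG to eps1. Stage 2: float64 CG restarted from it, to eps2."""
    A32, b32 = A.astype(np.float32), b.astype(np.float32)
    x32, it1 = cg(A32, b32, np.zeros_like(b32), eps1, max_iter)
    x, it2 = cg(A, b, x32.astype(np.float64), eps2, max_iter)
    return x, it1, it2

# Small SPD test problem (assumption for illustration): tridiag(-1, 3, -1).
n = 50
A = 3.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x, it1, it2 = mixed_precision_cg(A, b, eps1=1e-3, eps2=1e-10)
print(it1, it2, np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```

The choice of `eps1` controls the trade-off: a looser tolerance spends fewer cheap single-precision iterations but leaves more work for the double-precision stage, which is why the paper treats it as the parameter to optimize.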

Paper Structure

This paper contains 49 sections, 65 equations, 5 figures, 44 tables.

Figures (5)

  • Figure 1: Star and path graphs.
  • Figure 2: Residual and error norm convergence for Algorithm I and CG in two precisions applied to a star graph matrix.
  • Figure 3: Residual and error norm convergence for Algorithm I and CG in two precisions applied to a path graph matrix.
  • Figure 4: Extended star graph.
  • Figure 5: Good and bad localization of feature vectors from the sample.