Learning Sparse Approximate Inverse Preconditioners for Conjugate Gradient Solvers on GPUs

Zherui Yang, Zhehao Li, Kangbo Lyu, Yixuan Li, Tao Du, Ligang Liu

TL;DR

This work accelerates Conjugate Gradient (CG) solvers for sparse symmetric positive definite (SPD) systems on GPUs by using Graph Neural Networks (GNNs) to learn GPU-friendly Sparse Approximate Inverse preconditioners. The preconditioner is constructed as M^{-1} ≈ G G^T + ε I, so each preconditioning step needs only two sparse matrix-vector products (SpMV) per CG iteration and inherits the locality of SpMV. Central to the approach is the Scale invariant Aligned Identity (SAI) loss, which normalizes by ||A|| so that training reflects CG's convergence behavior (which depends on the condition number rather than the absolute scale of A) and stays robust across matrices of different scales. On three PDE-derived datasets and one synthetic dataset, the method reduces GPU solution time by 40%-53% (68%-113% faster) relative to the baselines, improves condition numbers, and generalizes well, while keeping both the construction and the application of the preconditioner GPU-friendly. Limitations include fixed sparsity patterns and a single-GPU scope, which suggests future work on dynamic sparsity, multi-GPU scalability, and extensions to other Krylov methods and multilevel solvers.
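As a rough illustration of the preconditioning step described above, the following sketch runs preconditioned CG in plain Python and applies M^{-1} r as G (G^T r) + ε r, i.e. two sparse matrix-vector products plus a scaled vector addition. It is a minimal sketch, not the authors' implementation: the test matrix, the helper names (`spmv`, `transpose`, `make_apply`, `pcg`), the value of `eps`, and in particular the hand-made diagonal `G` (standing in for what a trained GNN would output) are all assumptions made for this example.

```python
import math

def spmv(rows, x):
    """y = M x for a sparse matrix stored as a list of {column: value} rows."""
    return [sum(v * x[j] for j, v in row.items()) for row in rows]

def transpose(rows):
    """Transpose of a square sparse matrix in the same row-dict layout."""
    out = [dict() for _ in rows]
    for i, row in enumerate(rows):
        for j, v in row.items():
            out[j][i] = v
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def make_apply(G, eps):
    """Return r -> (G G^T + eps I) r, computed with two SpMVs."""
    Gt = transpose(G)
    def apply(r):
        t = spmv(Gt, r)   # first SpMV:  t = G^T r
        z = spmv(G, t)    # second SpMV: z = G t
        return [zi + eps * ri for zi, ri in zip(z, r)]
    return apply

def pcg(A, b, apply_prec, rtol=1e-8, max_iter=1000):
    """Standard preconditioned CG starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)
    z = apply_prec(r)
    p = list(z)
    rz = dot(r, z)
    stop = rtol * math.sqrt(dot(b, b))
    for k in range(max_iter):
        if math.sqrt(dot(r, r)) <= stop:
            return x, k
        Ap = spmv(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        z = apply_prec(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

# Assumed test problem: a strictly diagonally dominant (hence SPD) tridiagonal matrix.
n = 50
A = [dict() for _ in range(n)]
for i in range(n):
    A[i][i] = 2.0 + i
    if i > 0:
        A[i][i - 1] = -1.0
    if i < n - 1:
        A[i][i + 1] = -1.0

# Stand-in for the learned factor: G = diag(1 / sqrt(a_ii)). A trained model
# would instead predict the nonzero entries of G.
G = [{i: 1.0 / math.sqrt(A[i][i])} for i in range(n)]
b = [1.0] * n

x, iters = pcg(A, b, make_apply(G, eps=1e-3))
residual = math.sqrt(sum((bi - yi) ** 2 for bi, yi in zip(b, spmv(A, x))))
```

Because the preconditioner is applied with products only, there is no triangular solve in the loop; on a GPU both products would map to ordinary SpMV kernels.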

Abstract

The conjugate gradient solver (CG) is a prevalent method for solving symmetric positive definite linear systems Ax=b, where effective preconditioners are crucial for fast convergence. Traditional preconditioners rely on prescribed algorithms that offer rigorous theoretical guarantees but limit their ability to exploit optimization from data. Existing learning-based methods often utilize Graph Neural Networks (GNNs) to improve performance and speed up construction. However, their reliance on incomplete factorization leads to significant challenges: the associated triangular solve hinders GPU parallelization in practice and introduces long-range dependencies that are difficult for GNNs to model. To address these issues, we propose a learning-based method to generate GPU-friendly preconditioners, specifically using GNNs to construct Sparse Approximate Inverse (SPAI) preconditioners, which avoid triangular solves and require only two matrix-vector products at each CG step. The locality of the matrix-vector product is compatible with the local propagation mechanism of GNNs. The flexibility of GNNs also allows our approach to be applied in a wide range of scenarios. Furthermore, we introduce a statistics-based scale-invariant loss function. Its design matches CG's property that the convergence rate depends on the condition number rather than the absolute scale of A, leading to improved performance of the learned preconditioner. Evaluations on three PDE-derived datasets and one synthetic dataset demonstrate that our method outperforms standard preconditioners (Diagonal, IC, and traditional SPAI) and previous learning-based preconditioners on GPUs. We reduce solution time on GPUs by 40%-53% (68%-113% faster), along with better condition numbers and superior generalization performance. Source code is available at https://github.com/Adversarr/LearningSparsePreconditioner4GPU
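The scale-invariance property mentioned in the abstract can be checked numerically. The sketch below runs plain CG on Ax = b and on (cA)x = cb and compares iteration counts, then shows that dividing each matrix by a norm maps both to the same normalized matrix. The test matrix, the scaling factor `c`, and the choice of the largest absolute entry as the norm are assumptions for this example only; the abstract does not specify which statistic the actual loss uses.

```python
import math

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cg(A, b, rtol=1e-10, max_iter=500):
    """Plain (unpreconditioned) CG from x = 0; returns (x, iterations used)."""
    x = [0.0] * len(b)
    r = list(b)
    p = list(b)
    rr = dot(r, r)
    stop = rtol * math.sqrt(dot(b, b))
    for k in range(max_iter):
        if math.sqrt(rr) <= stop:
            return x, k
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, max_iter

def normalize(A):
    """Divide by the largest absolute entry (one possible norm, assumed here)."""
    m = max(abs(a) for row in A for a in row)
    return [[a / m for a in row] for row in A]

# Assumed SPD test matrix: strictly diagonally dominant tridiagonal.
n = 30
A = [[(2.0 + i) if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
      for j in range(n)] for i in range(n)]
b = [1.0] * n

c = 1024.0  # a power of two, so the scaling introduces no rounding error
A_scaled = [[c * a for a in row] for row in A]
b_scaled = [c * bi for bi in b]

x1, k1 = cg(A, b)
x2, k2 = cg(A_scaled, b_scaled)

# Same solution and same iteration count, although the entries differ by 1024x.
max_diff = max(abs(u - v) for u, v in zip(x1, x2))

# After normalization the two matrices coincide, so a loss evaluated on the
# normalized matrix cannot depend on the scale c.
norm_gap = max(abs(u - v)
               for ru, rv in zip(normalize(A), normalize(A_scaled))
               for u, v in zip(ru, rv))
```

A loss that penalized the raw entries of A would treat `A` and `A_scaled` as different problems even though CG behaves the same on both; normalizing first removes that mismatch.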

Paper Structure

This paper contains 42 sections, 23 equations, 4 figures, 13 tables, 1 algorithm.

Figures (4)

  • Figure 1: Overview of our approach: By inputting the matrix's nonzero entries $\mathbf{A}_{ij}$ and node features $a(x_i)$, the GNN processes these features through message passing, and outputs the entries of $\mathbf{G}_{ij}$. The sparse matrix $\mathbf{G}$ is assembled and then applied in the preconditioned CG solver.
  • Figure 2: Examples in our PDE-derived test cases.
  • Figure 3: Performance of CG with different preconditioners for the heat problem. Figure (a) compares the average total solve time $T_\mathrm{total}$ and preconditioner's construction time $T_\mathrm{construct}$ of CG with different preconditioners and devices. Figure (b) illustrates the relationship between matrix size and $T_\mathrm{total}$, including its 95% confidence interval, demonstrating the superior scalability of our approach on GPUs. Figure (c) compares the total solve time required to achieve different $\mathrm{rtol}$.
  • Figure 4: Condition number distributions. Median, IQR, and outliers are shown. A smaller condition number indicates better performance of the preconditioner, and the lower bound is 1.
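
The pipeline in the Figure 1 caption (nonzeros of A in, entries of G out, then assembly into a sparse matrix) can be sketched as follows. This is only an illustration of the data flow: `stand_in_model` is a hypothetical placeholder for the GNN, the use of the diagonal of A as a node feature is an assumption, and so is the choice to give G the same sparsity pattern as A.

```python
import math

def edges_from_matrix(A):
    """One graph edge (i, j, A_ij) per stored nonzero of A."""
    return [(i, j, v) for i, row in enumerate(A) for j, v in sorted(row.items())]

def stand_in_model(edges, node_feature):
    """Hypothetical placeholder for the GNN: returns one value per edge.

    A trained network would compute these values by message passing; here a
    fixed formula is used so that the example runs without any training.
    """
    values = []
    for i, j, a_ij in edges:
        if i == j:
            values.append(1.0 / math.sqrt(node_feature[i]))
        else:
            values.append(-0.1 * a_ij / math.sqrt(node_feature[i] * node_feature[j]))
    return values

def assemble(edges, values, n):
    """Scatter per-edge outputs into a sparse matrix (list of {col: value} rows)."""
    G = [dict() for _ in range(n)]
    for (i, j, _), g in zip(edges, values):
        G[i][j] = g
    return G

# Small assumed example: a 4x4 tridiagonal SPD matrix.
n = 4
A = [dict() for _ in range(n)]
for i in range(n):
    A[i][i] = 4.0
    if i > 0:
        A[i][i - 1] = -1.0
    if i < n - 1:
        A[i][i + 1] = -1.0

edges = edges_from_matrix(A)
diag = [A[i][i] for i in range(n)]   # stand-in node feature (assumption)
values = stand_in_model(edges, diag)
G = assemble(edges, values, n)
```

Since every output is attached to an existing edge, the assembled G has a fixed sparsity pattern determined by A, which is consistent with the fixed-pattern limitation noted in the TL;DR.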