PEARL: Preconditioner Enhancement through Actor-critic Reinforcement Learning

David Millard; Arielle Carr; Stéphane Gaudreault; Ali Baheri

TL;DR

PEARL is a reinforcement-learning framework for learning preconditioners that accelerate iterative solvers on symmetric positive definite (SPD) systems. It formulates preconditioner discovery as a contextual bandit with an actor-critic architecture, so training directly optimizes solver performance and conditioning, and it introduces a cosine scheduler to balance exploration and stability. The actor learns incomplete Cholesky factors from which SPD preconditioners are built, while the critic provides immediate reward signals; a dual-objective loss combines iteration counts with condition numbers. Empirical results indicate competitive solver speedups and more structured preconditioners than traditional approaches, with theoretical guarantees linking conditioning improvements to reduced iteration counts.

Abstract

We present PEARL (Preconditioner Enhancement through Actor-critic Reinforcement Learning), a novel approach to learning matrix preconditioners. Existing preconditioners such as Jacobi, Incomplete LU, and Algebraic Multigrid methods offer problem-specific advantages but rely heavily on hyperparameter tuning. Recent work has explored using deep neural networks to learn preconditioners, though challenges such as ill-behaved objective functions and costly training procedures remain. PEARL introduces a reinforcement learning approach for learning preconditioners, specifically a contextual bandit formulation. The framework uses an actor-critic model in which the actor generates the incomplete Cholesky decomposition of the preconditioner and the critic evaluates it based on reward-specific feedback. To further guide training, we design a dual-objective function that combines updates from the critic with the condition number. PEARL contributes a generalizable preconditioner learning method, dynamic sparsity exploration, and cosine schedulers for improved stability and exploratory power. We compare our approach to traditional and neural preconditioners, demonstrating improved flexibility and iterative solving speed.
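To make the two quantities in the dual objective concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the actor emits an unconstrained square matrix that is mapped to a lower-triangular factor $\mathbf{L}$ with positive diagonal, so that $\mathbf{M}=\mathbf{L}\mathbf{L}^{T}$ is SPD by construction. The helper names (`make_spd`, `factor_from_raw`, `pcg`, `precond_condition_number`), the exponential on the diagonal, and the test matrix are all hypothetical choices made for illustration.

```python
import numpy as np

def make_spd(n, seed=0):
    """Random SPD test matrix with condition number ~1e4 (illustrative only)."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    eigs = np.logspace(0, 4, n)
    A = (Q * eigs) @ Q.T              # Q diag(eigs) Q^T
    return 0.5 * (A + A.T)            # remove round-off asymmetry

def factor_from_raw(raw):
    """Assumed actor head: keep the strict lower triangle, force a positive diagonal."""
    return np.tril(raw, k=-1) + np.diag(np.exp(np.diag(raw)))

def pcg(A, b, L=None, tol=1e-8, max_iter=1000):
    """Preconditioned CG with M = L L^T. Returns (x, iterations used)."""
    def apply_Minv(r):
        if L is None:                 # no preconditioner
            return r
        y = np.linalg.solve(L, r)     # generic solves, for brevity
        return np.linalg.solve(L.T, y)

    x = np.zeros_like(b)
    r = b - A @ x
    z = apply_Minv(r)
    p = z.copy()
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for k in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        if np.linalg.norm(r) <= tol * b_norm:
            return x, k
        z = apply_Minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

def precond_condition_number(A, L):
    """kappa(M^{-1} A), computed from the similar symmetric matrix L^{-1} A L^{-T}."""
    S = np.linalg.solve(L, np.linalg.solve(L, A).T)
    S = 0.5 * (S + S.T)
    eigs = np.linalg.eigvalsh(S)
    return eigs[-1] / eigs[0]

A = make_spd(50)
b = np.ones(50)
_, it_plain = pcg(A, b)                       # baseline: no preconditioner
L_exact = np.linalg.cholesky(A)               # ideal limit of a learned factor
_, it_exact = pcg(A, b, L_exact)
print("iterations without / with exact factor:", it_plain, it_exact)
print("kappa(M^-1 A) with exact factor:", precond_condition_number(A, L_exact))
```

The exact Cholesky factor is used only as the ideal limit; a learned incomplete factor would be sparser and would be expected to fall somewhere between the two cases. The iteration count and the condition number returned here are the kinds of signals the abstract describes combining in the dual objective.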

Paper Structure (24 sections, 2 theorems, 45 equations, 9 figures, 1 table, 1 algorithm)

Introduction
Literature Review
Preliminaries
Conjugate Gradient Method
Preconditioning the Conjugate Gradient Method
Contextual Bandit
Actor-Critic
Methodology
Training Data Generation
Objectives
Models
Actor (Incomplete Cholesky Decomposition)
Critic (Single-Reward)
Critic (Multi-Reward)
Cosine Scheduler
...and 9 more sections

Key Result

Theorem 1

(Condition Number and CG Complexity) Let $\mathbf{A}$ be an $n \times n$ symmetric positive-definite (SPD) matrix, and let $\mathbf{M}$ be an $n \times n$ SPD preconditioner. Consider the preconditioned system with coefficient matrix $\mathbf{M}^{-1}\mathbf{A}$, and define $\kappa=\kappa\left(\mathbf{M}^{-1} \mathbf{A}\right)=\frac{\lambda_{\max }\left(\mathbf{M}^{-1} \mathbf{A}\right)}{\lambda_{\min }\left(\mathbf{M}^{-1} \mathbf{A}\right)}$, where ...
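
The statement above is cut off, so its exact conclusion is not reproduced here. For orientation only, the classical textbook bound that links $\kappa$ to the convergence of preconditioned CG is sketched below; it may differ in form or constants from the theorem in the paper. The symbols $\mathbf{x}_k$ (the $k$-th iterate), $\mathbf{x}_*$ (the exact solution), and $\varepsilon$ (the target relative error) are introduced here and do not come from the source.

```latex
\|\mathbf{x}_k-\mathbf{x}_*\|_{\mathbf{A}}
  \;\le\; 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{k}
  \|\mathbf{x}_0-\mathbf{x}_*\|_{\mathbf{A}},
\qquad\text{so}\qquad
k \;\ge\; \tfrac{1}{2}\sqrt{\kappa}\,\ln\!\left(\tfrac{2}{\varepsilon}\right)
\;\text{ suffices for a relative error of at most } \varepsilon .
```

As a worked example, with $\kappa = 10^{4}$ and $\varepsilon = 10^{-8}$ the bound gives $k \ge 50 \ln(2\times 10^{8}) \approx 956$ iterations, while a preconditioner that reduces $\kappa$ to $10^{2}$ lowers it to $k \ge 5 \ln(2\times 10^{8}) \approx 96$. These are worst-case bounds; observed iteration counts are often much smaller.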

Figures (9)

Figure 1: Cosine-scheduler weights plotted over time, with the waves slightly shifted to introduce additional stochasticity to the crests and troughs of the amplitude.
Figure 2: Shown (left-to-right) are the training losses for the actor and critic using the standard procedure, cosine-scheduled procedure, and critic-only procedure. The theoretical minimum value of the critic loss is 0. For the actor loss, the theoretical minimum is -1.96, derived by summing the minimum condition effect (1.0) and the negated maximum actor reward -2.96.
Figure 3: Residual analysis of the conjugate gradient (CG) solver on a randomly sampled system with an approximate condition number of 10,000. The dual-objective cosine-scheduled training procedure shows a slight advantage.
Figure 4: Shown are timesteps 2500, 1000, and 100 for the four models compared in Table 1. From top to bottom, the models are condition-only, condition/critic fixed schedule, condition/critic cosine-scheduled, and critic-only. Qualitatively, the condition/critic cosine-scheduled model produces more organized preconditioners but takes longer to converge. The critic-only model exhibits slow structural changes, adjusting only one or two elements of the preconditioner at a time.
Figure 5: Output from a condition-only logit model, estimating the incomplete LU factorization. The model collapses to a completely collinear matrix. The condition number reaches nearly 1.0 but the solver runtime is worsened.
...and 4 more figures
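
The captions for Figures 1 and 2 describe cosine-scheduled weights and an actor loss formed from a condition term and a negated critic reward. A rough sketch of how such a schedule could look is given below. The functional form, the `period` and `phase_shift` values, and the names `cosine_weights` and `dual_objective` are assumptions made for illustration; the paper's actual scheduler is not specified in this summary.

```python
import numpy as np

def cosine_weights(step, period=500, phase_shift=0.1 * np.pi):
    """Assumed schedule: two cosine waves in [0, 1], slightly out of phase.

    Returns (w_condition, w_critic). The small phase shift means the crest of
    one wave does not line up exactly with the trough of the other.
    """
    theta = 2.0 * np.pi * step / period
    w_condition = 0.5 * (1.0 + np.cos(theta))
    w_critic = 0.5 * (1.0 - np.cos(theta + phase_shift))
    return w_condition, w_critic

def dual_objective(condition_term, critic_reward, w_condition=1.0, w_critic=1.0):
    """Actor loss as a weighted condition term minus a weighted critic reward."""
    return w_condition * condition_term - w_critic * critic_reward

# With unit weights, the values quoted in the Figure 2 caption combine as
# 1.0 + (-2.96) = -1.96, the stated theoretical minimum of the actor loss.
print(dual_objective(1.0, 2.96))

# Scheduled loss at a few training steps (input values are placeholders).
for step in (0, 125, 250, 375):
    w_c, w_r = cosine_weights(step)
    print(step, dual_objective(1.5, 2.0, w_c, w_r))
```

Under this assumed form, training alternates between phases dominated by the condition-number term and phases dominated by the critic, which is one plausible reading of the balance between exploration and stability that the scheduler is meant to provide.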

Theorems & Definitions (4)

Theorem 1
Theorem 2
Proof of Theorem 1
Proof of Theorem 2
