PEARL: Preconditioner Enhancement through Actor-critic Reinforcement Learning
David Millard, Arielle Carr, Stéphane Gaudreault, Ali Baheri
TL;DR
PEARL presents a novel reinforcement-learning framework for learning preconditioners that accelerate iterative solvers on symmetric positive definite (SPD) systems. It formulates preconditioner discovery as a contextual bandit solved with an actor-critic architecture, so that solver performance and conditioning are optimized directly, and it introduces a cosine scheduler to balance exploration and stability. The actor learns incomplete Cholesky factors from which SPD preconditioners are built, while the critic provides immediate reward signals, including a dual objective that combines iteration counts and condition numbers. Empirical results indicate competitive solver speedups and more structured preconditioners than traditional approaches, and theoretical guarantees link improved conditioning to reduced iteration counts.
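A lower-triangular factor L with a positive diagonal always yields an SPD preconditioner M = L Lᵀ, and M⁻¹ can be applied inside preconditioned conjugate gradient (PCG) with two triangular solves. The link between conditioning and iterations comes from the classical CG bound: the A-norm error after k steps is at most 2((√κ − 1)/(√κ + 1))^k times the initial error, and preconditioning replaces κ(A) with κ(M⁻¹A). The following is a minimal pure-Python sketch under stated assumptions, not PEARL's implementation: a zero-fill incomplete Cholesky factorization (IC(0)) stands in for the factor the actor would output, the test matrix is a 2D 5-point Laplacian, and the reward `-iterations` is hypothetical. All helper names (`laplacian_2d`, `ichol0`, `apply_inverse`, `pcg`) are illustrative.

```python
import math


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    """Dense matrix-vector product A @ x for a list-of-lists matrix."""
    return [dot(row, x) for row in A]


def laplacian_2d(m):
    """Dense 5-point Laplacian on an m x m grid (SPD, n = m * m)."""
    n = m * m
    A = [[0.0] * n for _ in range(n)]
    for r in range(m):
        for c in range(m):
            i = r * m + c
            A[i][i] = 4.0
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < m and 0 <= cc < m:
                    A[i][rr * m + cc] = -1.0
    return A


def ichol0(A):
    """Zero-fill incomplete Cholesky: L keeps only the pattern of tril(A)."""
    n = len(A)
    pattern = {(i, j) for i in range(n) for j in range(i + 1) if A[i][j] != 0.0}
    L = [[A[i][j] if (i, j) in pattern else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n):
        L[k][k] = math.sqrt(L[k][k])
        for i in range(k + 1, n):
            if (i, k) in pattern:
                L[i][k] /= L[k][k]
        # Right-looking update of the trailing block; fill-in is dropped.
        for j in range(k + 1, n):
            if (j, k) not in pattern:
                continue  # L[j][k] == 0, so column j receives no update
            for i in range(j, n):
                if (i, j) in pattern:
                    L[i][j] -= L[i][k] * L[j][k]
    return L


def apply_inverse(L, r):
    """Return z = (L L^T)^{-1} r via a forward and a backward triangular solve."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):  # solve L y = r
        y[i] = (r[i] - sum(L[i][j] * y[j] for j in range(i))) / L[i][i]
    z = [0.0] * n
    for i in reversed(range(n)):  # solve L^T z = y
        z[i] = (y[i] - sum(L[j][i] * z[j] for j in range(i + 1, n))) / L[i][i]
    return z


def pcg(A, b, precond=None, tol=1e-8, max_iter=1000):
    """Preconditioned CG from x0 = 0; returns (x, iterations)."""
    apply_m = precond or (lambda v: list(v))  # identity gives plain CG
    x = [0.0] * len(b)
    r = list(b)
    z = apply_m(r)
    p = list(z)
    rz = dot(r, z)
    stop = tol * math.sqrt(dot(b, b))
    for k in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xk + alpha * pk for xk, pk in zip(x, p)]
        r = [rk + -alpha * ak for rk, ak in zip(r, Ap)]
        if math.sqrt(dot(r, r)) <= stop:
            return x, k
        z = apply_m(r)
        rz_new = dot(r, z)
        p = [zk + (rz_new / rz) * pk for zk, pk in zip(z, p)]
        rz = rz_new
    return x, max_iter


A = laplacian_2d(8)
b = [float(i % 7 + 1) for i in range(len(A))]  # non-symmetric right-hand side
L = ichol0(A)  # stand-in for a learned, sparsity-constrained factor
x_cg, cg_iters = pcg(A, b)
x_pcg, pcg_iters = pcg(A, b, precond=lambda r: apply_inverse(L, r))
reward = -pcg_iters  # hypothetical bandit reward: fewer iterations is better
print(cg_iters, pcg_iters, reward)
```

On this matrix IC(0) should need noticeably fewer iterations than plain CG. A learned factor would replace `ichol0(A)` while keeping the same lower-triangular, sparsity-constrained shape, so M stays SPD by construction.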
Abstract
We present PEARL (Preconditioner Enhancement through Actor-critic Reinforcement Learning), a novel approach to learning matrix preconditioners. Existing preconditioners such as Jacobi, Incomplete LU, and Algebraic Multigrid offer problem-specific advantages but rely heavily on hyperparameter tuning. Recent work has used deep neural networks to learn preconditioners, but challenges such as ill-behaved objective functions and costly training procedures remain. PEARL instead casts preconditioner learning as a reinforcement learning problem, specifically a contextual bandit. The framework uses an actor-critic model in which the actor generates the incomplete Cholesky factors of a preconditioner and the critic evaluates them using reward feedback. To further guide training, we design a dual-objective function that combines updates from the critic with a condition-number term. PEARL contributes a generalizable preconditioner learning method, dynamic sparsity exploration, and cosine schedulers for improved stability and exploratory power. We compare our approach with traditional and neural preconditioners and demonstrate improved flexibility and faster iterative solves.
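The abstract does not spell out how the cosine schedulers and the dual objective interact, so the following is a hypothetical sketch rather than PEARL's formulation. It uses standard cosine annealing (the schedule popularized by SGDR) to move a weight `w` from `w_start` to `w_end` over `total_steps`, and blends a critic-derived loss with the log condition number. The names `cosine_schedule` and `dual_objective`, the use of `log(cond)`, and the annealing direction are all assumptions; the same schedule could equally drive an exploration parameter such as sparsity or noise.

```python
import math


def cosine_schedule(step, total_steps, w_start, w_end):
    """Cosine annealing from w_start (step 0) to w_end (step >= total_steps)."""
    t = min(max(step, 0), total_steps) / total_steps
    return w_end + 0.5 * (w_start - w_end) * (1.0 + math.cos(math.pi * t))


def dual_objective(critic_loss, cond_number, weight):
    """Hypothetical blend of a critic-driven loss and a conditioning penalty.

    log(cond) keeps the conditioning term on a moderate scale, since
    condition numbers can span many orders of magnitude.
    """
    if cond_number < 1.0:
        raise ValueError("condition numbers are always >= 1")
    return weight * critic_loss + (1.0 - weight) * math.log(cond_number)


# Assumed usage: shift emphasis from the critic toward conditioning.
total = 100
for step in (0, 50, 100):
    w = cosine_schedule(step, total, w_start=0.9, w_end=0.1)
    print(step, round(w, 3), round(dual_objective(0.5, 1e3, w), 3))
```

The cosine shape changes the weight slowly at both ends and fastest in the middle, which is one common way to trade early exploration for late-stage stability.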
