Optimal Krylov On Average

Qi Luo, Florian Schäfer

TL;DR

This paper tackles the cost of solving linear systems inside Krylov-based inner loops by introducing an adaptive randomized truncation estimator (the AS estimator) that preserves unbiasedness while optimizing the trade-off between variance and computation. By formulating a constrained optimization over truncation probabilities $\mathbb{P}(j)$, it obtains closed-form solutions under a diminishing returns property for methods such as CG and CR, with a generalized, still unbiased alternative when the property fails. The AS-CG variant is shown to outperform the existing RR-CG approach in the speed-variance trade-off and performs robustly in Gaussian process (GP) hyperparameter optimization and in competitive physics-informed neural networks (CPINNs). These results indicate that adaptive truncation can substantially improve the reliability and efficiency of Krylov solvers in large-scale, data-driven scientific computing tasks.
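The unbiasedness that the estimator preserves comes from reweighting. If CG is stopped at a random step $J$, dividing the $j$-th update $\alpha_j p_j$ by $\mathbb{P}(J \ge j)$ makes the expected estimate equal to the full $N$-step CG iterate. The Python/NumPy sketch below illustrates this generic randomized-truncation idea. It is not the AS estimator: the truncation law here (always take `j_min` steps, then continue with probability `c` at each further step) is a hypothetical fixed choice, whereas the paper computes $\mathbb{P}(j)$ adaptively. All function names are illustrative.

```python
import numpy as np


def cg_increments(A, b, n_iter, tol=1e-12):
    """Plain CG from x0 = 0; returns the list of updates Delta_j = alpha_j * p_j."""
    r = b.astype(float).copy()
    p = r.copy()
    rs = r @ r
    incs = []
    for _ in range(n_iter):
        if np.sqrt(rs) < tol:          # already converged; stop early
            break
        Ap = A @ p
        alpha = rs / (p @ Ap)          # CG step length
        incs.append(alpha * p)         # j-th update of the iterate
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p      # next A-conjugate search direction
        rs = rs_new
    return incs


def survival_probs(n_steps, j_min, c):
    """Hypothetical law: P(J >= j) = 1 for j <= j_min, c**(j - j_min) afterwards."""
    j = np.arange(1, n_steps + 1)
    return np.where(j <= j_min, 1.0, c ** np.maximum(j - j_min, 0).astype(float))


def sample_truncation(rng, n_steps, j_min, c):
    """Draw J: take j_min steps, then keep going with probability c (at most n_steps)."""
    J = j_min
    while J < n_steps and rng.random() < c:
        J += 1
    return J


def truncated_estimate(incs, S, J):
    """Randomized-truncation estimate x_hat = sum_{j<=J} Delta_j / P(J >= j)."""
    return sum(incs[j] / S[j] for j in range(J))


def expected_estimate(incs, S):
    """Exact E[x_hat], enumerating J = k with P(J = k) = S_k - S_{k+1}."""
    pmf = S - np.append(S[1:], 0.0)
    return sum(pmf[k - 1] * truncated_estimate(incs, S, k)
               for k in range(1, len(incs) + 1))


rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6 * np.eye(6)            # small SPD test matrix
b = rng.standard_normal(6)
incs = cg_increments(A, b, n_iter=6)
S = survival_probs(len(incs), j_min=2, c=0.7)
J = sample_truncation(rng, len(incs), j_min=2, c=0.7)
x_hat = truncated_estimate(incs, S, J)  # one cheap, random sample
print(J, np.allclose(expected_estimate(incs, S), np.linalg.solve(A, b)))
```

`expected_estimate` enumerates every outcome of $J$, so it checks unbiasedness exactly rather than by Monte Carlo. In exact arithmetic the full iterate $x_N$ equals $A^{-1}b$ once $N$ reaches the system size.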

Abstract

We propose an adaptive randomized truncation estimator for Krylov subspace methods that optimizes the trade-off between the solution variance and the computational cost, while remaining unbiased. The estimator solves a constrained optimization problem to compute the truncation probabilities on the fly, with minimal computational overhead. The problem has a closed-form solution when the improvement of the deterministic algorithm satisfies a diminishing returns property. We prove that obtaining the optimal adaptive truncation distribution is impossible in the general case. Without the diminishing returns condition, our estimator provides a suboptimal but still unbiased solution. We present experimental results on Gaussian process (GP) hyperparameter training and competitive physics-informed neural network problems to demonstrate the effectiveness of our approach.
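Lemma 2.1 below states the diminishing returns property as $\alpha_{j-1}^2 \|q_{j-1}\|^2 > \alpha_{j}^2 \|q_{j}\|^2$ for every $j$. Reading $\alpha_j$ as the CG step length and $q_j$ as the search direction (our assumption about the notation), the condition says that the squared norm of each update shrinks from one iteration to the next. Because it need not hold for every system, it can be useful to check it numerically. A minimal NumPy sketch, with hypothetical helper names:

```python
import numpy as np


def cg_step_energies(A, b, n_iter, tol=1e-12):
    """Return e_j = alpha_j**2 * ||q_j||**2 for each CG step (q_j = search direction)."""
    r = b.astype(float).copy()
    q = r.copy()
    rs = r @ r
    energies = []
    for _ in range(n_iter):
        if np.sqrt(rs) < tol:              # residual is negligible; stop
            break
        Aq = A @ q
        alpha = rs / (q @ Aq)              # CG step length
        energies.append(alpha**2 * (q @ q))
        r = r - alpha * Aq
        rs_new = r @ r
        q = r + (rs_new / rs) * q          # next search direction
        rs = rs_new
    return energies


def has_diminishing_returns(energies):
    """Check the Lemma 2.1 condition e_{j-1} > e_j for every consecutive pair."""
    return all(prev > cur for prev, cur in zip(energies, energies[1:]))


A = np.diag([1.0, 2.0, 3.0])
b = np.ones(3)
e = cg_step_energies(A, b, n_iter=3)
print(e, has_diminishing_returns(e))
```

When the check fails, the abstract indicates that the estimator still applies, but its truncation distribution is no longer guaranteed to be optimal.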

Paper Structure

This paper contains 17 sections, 3 theorems, 49 equations, 6 figures, 1 table, 5 algorithms.

Key Result

Lemma 2.1

Assume that the Krylov subspace iterative method has a diminishing returns property, i.e., $\alpha_{j-1}^2 \|q_{j-1}\|^2 > \alpha_{j}^2 \|q_{j}\|^2$ for every $j$. Then,

Figures (6)

  • Figure 1: Single Linear System: the variance with respect to the average number of CG iterations. The temperature parameter $\lambda$ is fixed for RR-CG in each line. For both methods, we vary the minimum truncation number and the initial truncation probability to change the variance and the average number of CG iterations.
  • Figure 2: Multiple Linear Systems: the variance with respect to the average number of CG iterations.
  • Figure 3: The GP optimization objective for models trained with AS-CG and RR-CG ($\lambda = 0.05$)
  • Figure 4: The GP optimization objective for models trained with AS-CG, the deterministic CG, and Cholesky. We show one trial each for AS-CG and the deterministic CG, since the trials of each method produce very similar training results.
  • Figure 5: Comparison of CPINNs with deterministic GMRES and two randomized estimators for the Poisson equations
  • ...and 1 more figure
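Figure 1 trades variance against the average number of CG iterations by varying the minimal truncation number. For a fixed truncation law, both quantities can be computed exactly: $\mathbb{E}[J] = \sum_j \mathbb{P}(J \ge j)$, and the variance $\mathbb{E}\|\hat{x} - x_N\|^2$ (the trace of the covariance, since $\mathbb{E}[\hat{x}] = x_N$) follows by enumerating every stopping step. The NumPy sketch below does this for a hypothetical law (a forced minimum of `j_min` steps, then continuation with probability `c`). It does not reproduce the RR-CG or AS-CG distributions used in the figures, and the function names are illustrative.

```python
import numpy as np


def cg_increments(A, b, n_iter, tol=1e-12):
    """Plain CG from x0 = 0; returns the updates Delta_j = alpha_j * p_j as rows."""
    r = b.astype(float).copy()
    p = r.copy()
    rs = r @ r
    incs = []
    for _ in range(n_iter):
        if np.sqrt(rs) < tol:
            break
        Ap = A @ p
        alpha = rs / (p @ Ap)
        incs.append(alpha * p)
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return np.array(incs)  # shape (N, n)


def survival_probs(n_steps, j_min, c):
    """Hypothetical law: P(J >= j) = 1 for j <= j_min, c**(j - j_min) afterwards."""
    j = np.arange(1, n_steps + 1)
    return np.where(j <= j_min, 1.0, c ** np.maximum(j - j_min, 0).astype(float))


def cost_and_variance(D, S):
    """Exact E[J] and E||x_hat - x_N||^2 for x_hat = sum_{j<=J} D[j] / S[j]."""
    x_full = D.sum(axis=0)                        # un-truncated CG iterate x_N
    pmf = S - np.append(S[1:], 0.0)               # P(J = k) for k = 1..N
    partial = np.cumsum(D / S[:, None], axis=0)   # row k-1 is x_hat when J = k
    sq_err = np.sum((partial - x_full) ** 2, axis=1)
    expected_cost = S.sum()                       # E[J] = sum_j P(J >= j)
    return float(expected_cost), float(pmf @ sq_err)


rng = np.random.default_rng(0)
M = rng.standard_normal((20, 20))
A = M @ M.T + np.eye(20)                          # SPD test matrix
b = rng.standard_normal(20)
D = cg_increments(A, b, n_iter=20)
for j_min in (2, 5):
    for c in (0.5, 0.8, 1.0):
        cost, var = cost_and_variance(D, survival_probs(len(D), j_min, c))
        print(f"j_min={j_min} c={c:.1f}  E[J]={cost:6.2f}  variance={var:.3e}")
```

Setting `c = 1.0` recovers deterministic CG (cost $N$, zero variance); smaller `c` or `j_min` lowers the expected cost at the price of variance, which is the trade-off the AS estimator optimizes.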

Theorems & Definitions (8)

  • Lemma 2.1
  • Proof 1
  • Definition 2.2
  • Theorem 2.3
  • Proof 2
  • Theorem 2.4
  • Proof 3
  • Definition 2.5