Preconditioning without a preconditioner: faster ridge-regression and Gaussian sampling with randomized block Krylov subspace methods

Tyler Chen, Caroline Huber, Ethan Lin, Hajar Zaid

TL;DR

This work introduces a randomized variant of the block conjugate gradient method that achieves faster convergence for solving regularized linear systems $\mathbf{A}_\mu \mathbf{x} = \mathbf{b}$ without explicitly constructing a preconditioner. By augmenting the starting block with random sketches $\mathbf{\Omega}$, the method induces implicit preconditioning within the block-Krylov subspace, enabling guarantees that compare favorably to Nyström-based preconditioners while reducing matrix-load costs. The authors derive explicit probabilistic bounds and improved matrix-vector product complexities, enabling efficient computation of the entire ridge-regression path and fast Gaussian sampling from $\mathcal{N}(\mu, A)$. They support their theory with extensive numerical experiments showing gains in convergence speed and sampling accuracy, along with discussions on practical considerations such as reorthogonalization and block-size effects. The approach provides a new lens on block-Krylov methods, suggesting broad applicability to regression and matrix-function tasks beyond standard preconditioning frameworks.
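To make the mechanism concrete, here is a minimal NumPy sketch under stated assumptions; it is not the authors' implementation. It assumes $\mathbf{A}_\mu = \mathbf{A} + \mu\mathbf{I}$, builds an orthonormal basis for the block Krylov subspace generated by the augmented starting block $[\mathbf{b}, \mathbf{\Omega}]$, and then computes the Galerkin solution over that subspace directly instead of running a short-recurrence block CG iteration. In exact arithmetic the Galerkin solution is the $\mathbf{A}_\mu$-norm-optimal approximation from the subspace. Because the subspace does not depend on $\mu$, one basis serves every regularization parameter. The function names, the test matrix, and all sizes are hypothetical choices made for illustration.

```python
import numpy as np


def block_krylov_basis(A, B, depth):
    """Orthonormal basis of span{B, AB, ..., A^(depth-1) B}.

    Uses full reorthogonalization and assumes no breakdown
    (depth * B.shape[1] comfortably below A.shape[0]).
    """
    Q = np.zeros((A.shape[0], 0))
    V = B
    for j in range(depth):
        if j > 0:
            V = A @ V  # one block product with A per step
        for _ in range(2):  # two Gram-Schmidt passes against earlier blocks
            V = V - Q @ (Q.T @ V)
        V, _ = np.linalg.qr(V)
        Q = np.hstack([Q, V])
    return Q


def galerkin_ridge_path(A, b, Q, mus):
    """Solve (A + mu I) x = b over span(Q) for every mu, reusing one basis."""
    # A block Lanczos recurrence would produce T without this extra product.
    T = Q.T @ (A @ Q)
    T = (T + T.T) / 2
    rhs = Q.T @ b
    eye = np.eye(T.shape[0])
    return [Q @ np.linalg.solve(T + mu * eye, rhs) for mu in mus]


def rel_err(A, b, x, mu):
    """Relative error in the (A + mu I)-norm against a dense direct solve."""
    A_mu = A + mu * np.eye(A.shape[0])
    x_true = np.linalg.solve(A_mu, b)
    e = x - x_true
    return np.sqrt((e @ A_mu @ e) / (x_true @ A_mu @ x_true))


rng = np.random.default_rng(0)
n, k, depth = 300, 10, 8
# Illustrative spectrum (an assumption): a few large eigenvalues, then a flat tail.
eigs = np.concatenate([np.logspace(4, 2, k), np.linspace(1.0, 2.0, n - k)])
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (U * eigs) @ U.T
b = rng.standard_normal(n)
mus = [1e-2, 1.0, 1e2]

Omega = rng.standard_normal((n, k))  # random sketch appended to the starting block
Q_block = block_krylov_basis(A, np.column_stack([b, Omega]), depth)
Q_single = block_krylov_basis(A, b[:, None], depth)  # plain Krylov subspace of b

errs_block = [rel_err(A, b, x, mu)
              for x, mu in zip(galerkin_ridge_path(A, b, Q_block, mus), mus)]
errs_single = [rel_err(A, b, x, mu)
               for x, mu in zip(galerkin_ridge_path(A, b, Q_single, mus), mus)]

for mu, eb, es in zip(mus, errs_block, errs_single):
    print(f"mu={mu:g}: augmented block error {eb:.2e}, single-vector error {es:.2e}")
```

Since the augmented subspace contains the single-vector Krylov subspace of the same depth, its Galerkin error can never be larger. The interesting question in practice is how much smaller it is per pass over the matrix, which is what the paper's analysis addresses.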

Abstract

We describe a randomized variant of the block conjugate gradient method for solving a single positive-definite linear system of equations. Our method provably outperforms preconditioned conjugate gradient with a broad class of Nyström-based preconditioners, without ever explicitly constructing a preconditioner. In analyzing our algorithm, we derive theoretical guarantees for new variants of Nyström preconditioned conjugate gradient, which may be of independent interest. We also describe how our approach yields state-of-the-art algorithms for key data-science tasks such as computing the entire ridge regression regularization path and generating multiple independent samples from a high-dimensional Gaussian distribution.
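The Gaussian sampling task can be illustrated with a small sketch, again under assumptions rather than as the authors' algorithm. If the columns $\mathbf{b}_i$ of a block $\mathbf{B}$ are standard Gaussian vectors, then $\mathbf{A}^{1/2}\mathbf{b}_i$ is a zero-mean sample with covariance $\mathbf{A}$ (a nonzero mean can be added afterwards). The code below approximates $\mathbf{A}^{1/2}\mathbf{B}$ by projecting onto the block Krylov subspace generated by $\mathbf{B}$ itself and applying the square root to the small projected matrix. The function names, the test covariance, and the choice of starting block are hypothetical.

```python
import numpy as np


def block_krylov_basis(A, B, depth):
    """Orthonormal basis of span{B, AB, ..., A^(depth-1) B} (full reorthogonalization)."""
    Q = np.zeros((A.shape[0], 0))
    V = B
    for j in range(depth):
        if j > 0:
            V = A @ V
        for _ in range(2):  # two Gram-Schmidt passes against earlier blocks
            V = V - Q @ (Q.T @ V)
        V, _ = np.linalg.qr(V)
        Q = np.hstack([Q, V])
    return Q


def sqrt_psd(T):
    """Square root of a small symmetric positive semidefinite matrix."""
    w, V = np.linalg.eigh((T + T.T) / 2)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T


def block_sqrt_apply(A, B, depth):
    """Approximate A^{1/2} B from the block Krylov subspace generated by B."""
    Q = block_krylov_basis(A, B, depth)
    T = Q.T @ (A @ Q)  # projected matrix; a block Lanczos recurrence yields it directly
    return Q @ (sqrt_psd(T) @ (Q.T @ B))


rng = np.random.default_rng(1)
n, m, depth = 300, 4, 20
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
eigs = np.linspace(1.0, 10.0, n)        # illustrative covariance spectrum (assumption)
A = (U * eigs) @ U.T
B = rng.standard_normal((n, m))         # columns b_i ~ N(0, I)

samples = block_sqrt_apply(A, B, depth)  # columns approximately ~ N(0, A)
exact = (U * np.sqrt(eigs)) @ (U.T @ B)  # dense reference A^{1/2} B

max_rel_err = max(
    np.linalg.norm(samples[:, i] - exact[:, i]) / np.linalg.norm(exact[:, i])
    for i in range(m)
)
print(f"maximum relative sample error: {max_rel_err:.2e}")
```

All $m$ samples share the same block products with $\mathbf{A}$, so the number of passes over the matrix is set by the depth rather than by the number of samples.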

Paper Structure

This paper contains 40 sections, 17 theorems, 91 equations, 11 figures.

Key Result

Corollary 2.3

Let $\mathbf{P}_\mu$ be any preconditioner. Then the $t$-th PCG iterate corresponding to the preconditioner $\mathbf{P}_\mu$ satisfies

Figures (11)

  • Figure 1.1: Relative error $\|\mathbf{A}^{-1}\mathbf{b} - \mathsf{alg}\|_{\mathbf{A}} / \|\mathbf{A}^{-1}\mathbf{b}\|_{\mathbf{A}}$ in terms of matrix-loads (left) and wall-clock time (right) for our proposed randomized variant of the block conjugate gradient method, standard conjugate gradient, Nyström preconditioned conjugate gradient from Frangella, Tropp, and Udell (2023), and generalizations of Nyström preconditioned conjugate gradient using higher-depth Nyström approximations. Our method outperforms all of these methods without the need to select hyperparameters (see \ref{thm:main}), which may be difficult to do effectively in practice. In this experiment, we store $\mathbf{A}$ in 8 separate $1000\times 8000$ chunks and perform (block) matrix-vector products with $\mathbf{A}$ by sequentially loading a single chunk from the disk into random access memory and performing the appropriate part of the products. The runtime is dominated by the cost of loading chunks of the matrix into memory, so the wall-clock time is nearly proportional to the number of matrix-loads. Full experiment description in \ref{sec:numerical:convergence}.
  • Figure 7.1: Relative error $\|\mathbf{A}^{-1}\mathbf{b} - \mathsf{alg}\|_{\mathbf{A}} / \|\mathbf{A}^{-1}\mathbf{b}\|_{\mathbf{A}}$ versus matrix-loads for block-CG, CG, and Nyström PCG with $s=1$ and $s=3$ on several test problems.
  • Figure 7.2: Relative error $\|\mathbf{A}_\mu^{-1}\mathbf{b} - \mathsf{alg}\|_{\mathbf{A}_\mu} / \|\mathbf{A}_\mu^{-1}\mathbf{b}\|_{\mathbf{A}_\mu}$ after a fixed number of matrix-loads as a function of the regularization parameter $\mu$ for block-CG, CG, and Nyström PCG with $s=1$ and $s=3$.
  • Figure 7.3: Maximum relative sample error $\max_i \|\mathbf{A}^{1/2}\mathbf{b}_i - \mathsf{alg}\| / \|\mathbf{A}^{1/2}\mathbf{b}_i\|$ versus matrix-loads for Lanczos square root and block-Lanczos square root.
  • Figure A.1: Relative error $\|\mathbf{A}^{-1}\mathbf{b} - \mathsf{alg}\|_{\mathbf{A}} / \|\mathbf{A}^{-1}\mathbf{b}\|_{\mathbf{A}}$ versus matrix-loads for block-CG, CG, and Nyström PCG with $s=1$ and $s=3$ on several test problems; see also \ref{fig:iter}.
  • ...and 6 more figures
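The matrix-load cost model described in the Figure 1.1 caption can be sketched as follows. This is an illustrative toy, not the authors' experiment code: the `ChunkedMatrix` class and its `matmul` method are hypothetical names, and the sizes are scaled down from the 8 chunks of $1000\times 8000$ in the caption to 8 chunks of $10\times 80$. Each product reads every chunk from disk exactly once, so a block product with many columns costs the same single matrix-load as a product with one vector.

```python
import os
import tempfile

import numpy as np


class ChunkedMatrix:
    """Row-chunked matrix stored on disk; counts full passes over the matrix."""

    def __init__(self, A, n_chunks, directory):
        self.paths = []
        for i, chunk in enumerate(np.array_split(A, n_chunks, axis=0)):
            path = os.path.join(directory, f"chunk_{i}.npy")
            np.save(path, chunk)
            self.paths.append(path)
        self.shape = A.shape
        self.matrix_loads = 0

    def matmul(self, X):
        """Compute A @ X, holding only one chunk in memory at a time."""
        parts = [np.load(path) @ X for path in self.paths]
        self.matrix_loads += 1  # one sequential pass over all chunks
        return np.concatenate(parts, axis=0)


rng = np.random.default_rng(0)
n = 80
G = rng.standard_normal((n, n))
A = G @ G.T  # small symmetric positive semidefinite test matrix (assumption)
x = rng.standard_normal(n)
X = rng.standard_normal((n, 11))  # e.g. a starting block [b, Omega]

with tempfile.TemporaryDirectory() as tmp:
    A_disk = ChunkedMatrix(A, n_chunks=8, directory=tmp)
    y_vec = A_disk.matmul(x)    # one matrix-load for a single vector
    Y_block = A_disk.matmul(X)  # one matrix-load for an 11-column block
    loads = A_disk.matrix_loads

print(f"matrix-loads used: {loads}")
```

Under this cost model, widening the block adds arithmetic but no additional disk traffic, which is why the figures report error against matrix-loads rather than against individual matrix-vector products.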

Theorems & Definitions (35)

  • Definition 2.1
  • Definition 2.2
  • Corollary 2.3
  • Corollary 2.4
  • Definition 3.1
  • Theorem 3.2
  • proof
  • Theorem 3.3
  • proof
  • remark 1
  • ...and 25 more