A Randomized Algorithm for Preconditioner Selection

Conner DiPaolo, Weiqing Gu

TL;DR

The paper addresses the problem of selecting an effective preconditioner for iterative solvers by focusing on the preconditioner stability $\|\boldsymbol{I}-\boldsymbol{M}^{-1}\boldsymbol{A}\|_\mathsf{F}$. It introduces a randomized, sketching-based estimator that approximates this quantity in the time of a constant number of conjugate gradient (CG) iterations, and it proves that no deterministic algorithm can approximate the quantity in a useful amount of time, so randomness is essential. A practical algorithm is then shown to provably identify near-minimal stability among $n$ candidates at a cost on the order of $n\log n$ CG steps, with trivial parallelization, extensions for the case where a clear winner exists, and the option to recommend no preconditioner when none of the candidates appears useful. The authors validate the approach on sparse systems and kernel-regression problems, reporting that the method often matches or outperforms the best candidate with manageable overhead. In kernel regression it yields, to the authors' knowledge, the first preconditioned method reported to never use more iterations than the non-preconditioned analog in standard tests. Overall, the work provides a theoretically grounded, scalable framework for preconditioner selection in large-scale linear systems and data-driven applications.
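
As a worked identity behind estimators of this kind (a standard randomized trace-estimation fact, not necessarily the exact sketch used in the paper), write $\boldsymbol{B}=\boldsymbol{I}-\boldsymbol{M}^{-1}\boldsymbol{A}$ and let $\boldsymbol{z}$ be a random vector with $\mathbb{E}[\boldsymbol{z}\boldsymbol{z}^*]=\boldsymbol{I}$, for example a standard Gaussian vector. The symbols $\boldsymbol{B}$, $\boldsymbol{z}$, and $k$ are introduced here for illustration only.

$$
\mathbb{E}\,\|\boldsymbol{B}\boldsymbol{z}\|_2^2
= \mathbb{E}\left[\boldsymbol{z}^*\boldsymbol{B}^*\boldsymbol{B}\boldsymbol{z}\right]
= \operatorname{tr}\!\left(\boldsymbol{B}^*\boldsymbol{B}\,\mathbb{E}[\boldsymbol{z}\boldsymbol{z}^*]\right)
= \operatorname{tr}\!\left(\boldsymbol{B}^*\boldsymbol{B}\right)
= \|\boldsymbol{B}\|_\mathsf{F}^2 .
$$

Averaging $\|\boldsymbol{B}\boldsymbol{z}_j\|_2^2$ over $k$ independent probes $\boldsymbol{z}_1,\ldots,\boldsymbol{z}_k$ therefore gives an unbiased estimate of the squared stability. Each probe needs only one multiplication by $\boldsymbol{A}$ and one solve with $\boldsymbol{M}$, which is roughly the work of one preconditioned CG iteration.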

Abstract

The task of choosing a preconditioner $\boldsymbol{M}$ to use when solving a linear system $\boldsymbol{Ax}=\boldsymbol{b}$ with iterative methods is difficult. For instance, even if one has access to a collection $\boldsymbol{M}_1,\boldsymbol{M}_2,\ldots,\boldsymbol{M}_n$ of candidate preconditioners, it is currently unclear how to practically choose the $\boldsymbol{M}_i$ which minimizes the number of iterations of an iterative algorithm to achieve a suitable approximation to $\boldsymbol{x}$. This paper makes progress on this sub-problem by showing that the preconditioner stability $\|\boldsymbol{I}-\boldsymbol{M}^{-1}\boldsymbol{A}\|_\mathsf{F}$, known to forecast preconditioner quality, can be computed in the time it takes to run a constant number of iterations of conjugate gradients through use of sketching methods. This is in spite of folklore which suggests the quantity is impractical to compute, and a proof we give that ensures the quantity could not possibly be approximated in a useful amount of time by a deterministic algorithm. Using our estimator, we provide a method which can provably select the minimal-stability preconditioner among $n$ candidates using floating point operations commensurate with running on the order of $n\log n$ steps of the conjugate gradients algorithm. Our method can also advise the practitioner to use no preconditioner at all if none of the candidates appears useful. The algorithm is extremely easy to implement and trivially parallelizable. In one of our experiments, we use our preconditioner selection algorithm to create, to the best of our knowledge, the first preconditioned method for kernel regression reported to never use more iterations than the non-preconditioned analog in standard tests.
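
The selection idea described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's algorithm: it uses plain Gaussian probes with a fixed budget of `k` probes per candidate, whereas the paper's method comes with guarantees this sketch does not claim. The names `estimate_stability`, `pick_preconditioner`, `apply_Minv`, and the toy candidates are all hypothetical. Each candidate is represented by a function that applies $\boldsymbol{M}^{-1}$ to a block of vectors, and the identity map stands in for using no preconditioner.

```python
import numpy as np


def estimate_stability(A, apply_Minv, k=20, rng=None):
    """Estimate ||I - M^{-1} A||_F from k Gaussian probe vectors.

    Assumption: apply_Minv(V) returns M^{-1} V for a (d, k) block V.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = A.shape[0]
    Z = rng.standard_normal((d, k))   # probe vectors stored as columns
    R = Z - apply_Minv(A @ Z)         # (I - M^{-1} A) Z, one multiply and one solve per probe
    return np.sqrt(np.sum(R ** 2) / k)  # sqrt of the mean squared residual norm


def pick_preconditioner(A, candidates, k=20, seed=0):
    """Return the candidate name with the smallest estimated stability, plus all scores."""
    rng = np.random.default_rng(seed)
    scores = {name: estimate_stability(A, f, k, rng) for name, f in candidates.items()}
    return min(scores, key=scores.get), scores


# Toy problem (illustrative data only): a diagonally dominant symmetric matrix.
d = 50
G = np.random.default_rng(0).standard_normal((d, d))
A = np.diag(np.linspace(1.0, 100.0, d)) + 0.01 * (G + G.T)
diag_A = np.diag(A).copy()

candidates = {
    "none": lambda V: V,                      # M = I, i.e. no preconditioner
    "jacobi": lambda V: V / diag_A[:, None],  # M = diag(A)
    "bad_scaling": lambda V: 100.0 * V,       # M = 0.01 * I, deliberately poor
}

best, scores = pick_preconditioner(A, candidates)
print(best, scores)
```

Including the `"none"` entry among the candidates is one simple way to mirror the abstract's point that the method can advise against preconditioning: if the identity has the smallest estimated stability, no candidate looks useful. Because the candidates are scored independently, the loop over them could be run in parallel.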

Paper Structure

This paper contains 23 sections, 5 theorems, 30 equations, 1 figure, 2 tables.

Key Result

Theorem 1

Fix some $0 \leq \epsilon < 1$. Suppose we have a deterministic algorithm which takes as input an arbitrary positive semi-definite matrix $\boldsymbol{A}\in\mathbb{F}^{d\times d}$ and positive definite matrix $\boldsymbol{M}\in\mathbb{F}^{d\times d}$, and returns an estimate after sequentially querying and observing matrix-vector multiplies of the form $(\boldsymbol{M}^{-1}

Figures (1)

  • Figure 1: This figure presents the relative improvement of using our proposed preconditioners, or the one automatically chosen by Algorithm \ref{alg:pick}, with respect to using no preconditioner at all. Each individual matrix corresponds to a specific preconditioner and dataset pair. Each row gives the value of $\log\sigma_n^2$ used in the experiment, whereas each column corresponds to $\log\ell$. The absence of red cells in the result matrices corresponding to 'Our Method' indicates significant improvement over the results in \cite{cutajar2016preconditioning}.

Theorems & Definitions (10)

  • Theorem 1
  • Proof 1
  • Theorem 2
  • Proof 2
  • Theorem 3
  • Proof 3
  • Theorem 4
  • Proof 4
  • Theorem 5
  • Proof 5