Solving Dense Linear Systems Faster Than via Preconditioning

Michał Dereziński; Jiaming Yang

TL;DR

This work introduces a stochastic optimization framework for solving dense linear systems $Ax=b$ that relies on randomized block-wise updates rather than traditional preconditioning. The core method combines a deterministic leaping strategy with determinantal point process sampling, a randomized Hadamard transform to enable cheap sampling, and an accelerated sketch-and-project paradigm with efficient inner solvers. The main contribution is a time bound of $\tilde O((n^2 + nk^{\omega-1})\log(1/\epsilon))$ for dense systems, where $k$ is the number of singular values of $A$ larger than $O(1)$ times its smallest positive singular value. This becomes $\tilde O(n^2)$, which is near-linear in the input size, when $k=O(n^{1/(\omega-1)})$. The result also extends to least squares and PSD problems. The analysis uses majorization for elementary symmetric polynomials, a Markov chain coupling argument that replaces costly $k$-DPP sampling with uniform sampling, and matrix sketching to accelerate the inner iterations.
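
The threshold on $k$ comes from balancing the two terms of the time bound. As a short worked check, taking $\omega\approx 2.372$ (the upper bound on the matrix multiplication exponent quoted in the abstract):

$$n\,k^{\omega-1}\le n^2 \iff k^{\omega-1}\le n \iff k\le n^{1/(\omega-1)}, \qquad \frac{1}{\omega-1}\approx\frac{1}{1.372}\approx 0.729.$$

So when $k=O(n^{1/(\omega-1)})$, the $nk^{\omega-1}$ term is at most a constant multiple of $n^2$, and the bound reduces to $\tilde O(n^2\log(1/\epsilon))$.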

Abstract

We give a stochastic optimization algorithm that solves a dense $n\times n$ real-valued linear system $Ax=b$, returning $\tilde x$ such that $\|A\tilde x-b\|\leq \epsilon\|b\|$ in time: $$\tilde O((n^2+nk^{\omega-1})\log(1/\epsilon)),$$ where $k$ is the number of singular values of $A$ larger than $O(1)$ times its smallest positive singular value, $\omega< 2.372$ is the matrix multiplication exponent, and $\tilde O$ hides a factor poly-logarithmic in $n$. When $k=O(n^{1-\theta})$ (namely, $A$ has a flat-tailed spectrum, e.g., due to noisy data or regularization), this improves on both the cost of solving the system directly and the cost of preconditioning an iterative method such as conjugate gradient. In particular, our algorithm has an $\tilde O(n^2)$ runtime when $k=O(n^{0.729})$. We further adapt this result to sparse positive semidefinite matrices and least squares regression. Our main algorithm can be viewed as a randomized block coordinate descent method, where the key challenge is simultaneously ensuring good convergence and fast per-iteration time. In our analysis, we use the theory of majorization for elementary symmetric polynomials to establish a sharp convergence guarantee when coordinate blocks are sampled using a determinantal point process. We then use a Markov chain coupling argument to show that similar convergence can be attained with a cheaper sampling scheme, and accelerate the block coordinate descent update via matrix sketching.
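
To make the "randomized block" idea concrete, here is a minimal sketch of one plain instance of sketch-and-project: block Kaczmarz with uniformly sampled row blocks. It is not the paper's algorithm. It omits the acceleration, the randomized Hadamard preprocessing, the DPP-based analysis, and the sketched inner solver, and it solves each block exactly with a dense least-squares call. The names `block_kaczmarz`, `block_size`, and `n_iters`, and the test matrix with random orthogonal singular vectors, are assumptions made for illustration.

```python
import numpy as np

def block_kaczmarz(A, b, block_size, n_iters, seed=0):
    """Illustrative sketch-and-project with uniform row blocks (block Kaczmarz)."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = A.shape
    x = np.zeros(n_cols)
    for _ in range(n_iters):
        # Sample a block of row indices uniformly without replacement.
        S = rng.choice(n_rows, size=block_size, replace=False)
        A_S = A[S]
        r_S = A_S @ x - b[S]
        # Minimum-norm solution of A_S @ step = r_S, i.e. step = pinv(A_S) @ r_S.
        step, *_ = np.linalg.lstsq(A_S, r_S, rcond=None)
        # Subtracting it projects x onto the solution set {z : A_S z = b_S}.
        x -= step
    return x

# Hypothetical test problem: k large singular values, flat tail equal to 1.
rng = np.random.default_rng(1)
n, k = 60, 5
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = np.ones(n)
s[:k] = 100.0
A = (U * s) @ V.T          # A = U diag(s) V^T
b = A @ rng.standard_normal(n)

x = block_kaczmarz(A, b, block_size=20, n_iters=400)
rel_res = np.linalg.norm(A @ x - b) / np.linalg.norm(b)
print(f"relative residual: {rel_res:.2e}")
```

The block size is chosen larger than $k$ so that each sampled block can capture the few dominant spectral directions. This mirrors, in a loose way, the role $k$ plays in the stated time bound.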

Paper Structure (42 sections, 27 theorems, 130 equations, 1 table, 8 algorithms)

Introduction
Main Results
Least squares.
Positive semidefinite systems.
Numerical stability of our methods.
Our Techniques
Background and Related Work
Preliminaries
Problem Setup and Notation.
Elementary symmetric polynomials.
Determinantal point processes.
Markov chain $k$-DPP sampling.
Total variation distance.
Randomized Hadamard transform.
Conjugate gradient.
...and 27 more sections

Key Result

Theorem 1.1

Given $\mathbf{A}\in\mathbb{R}^{n\times n}$, $\mathbf{b}\in\mathbb{R}^n$, $\epsilon>0$ and a constant $C=O(1)$, we can compute $\tilde{\mathbf{x}}$ such that $\|\mathbf{A}\tilde{\mathbf{x}}-\mathbf{b}\|\leq\epsilon\|\mathbf{b}\|$ in time: $$\tilde O\big((n^2+nk^{\omega-1})\log(1/\epsilon)\big),$$ where $k$ is the number of singular values of $\mathbf{A}$ larger than $C$ times its smallest positive singular value.
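
The parameter $k$ in the theorem can be read off directly from the singular values. The helper below is a diagnostic sketch of that definition only. The function name `count_large_singular_values`, the default `C=2.0`, and the rank tolerance are assumptions, and since it computes a full SVD at cubic cost it is useful for inspecting small examples, not as part of a fast solver.

```python
import numpy as np

def count_large_singular_values(A, C=2.0, tol=None):
    """Count singular values of A larger than C times the smallest positive one."""
    s = np.linalg.svd(A, compute_uv=False)  # singular values, descending
    if tol is None:
        # Assumed numerical-rank cutoff: values at or below this count as zero.
        tol = s[0] * max(A.shape) * np.finfo(s.dtype).eps
    positive = s[s > tol]
    if positive.size == 0:
        return 0
    sigma_min_pos = positive[-1]
    return int(np.sum(positive > C * sigma_min_pos))

# Example: three large singular values on top of a flat tail near 1.
A = np.diag([100.0, 50.0, 10.0, 1.5, 1.0, 1.0, 1.0])
k = count_large_singular_values(A, C=2.0)
print(k)
```

A matrix with a flat-tailed spectrum, in the sense used here, is one for which this count is much smaller than $n$.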

Theorems & Definitions (51)

Theorem 1.1: Dense linear system (simplified version of Theorem \ref{thm:main_2})
Remark 1.1
Theorem 1.2: Least squares (simplified version of Theorem \ref{thm:main_ls})
Theorem 1.3: PSD linear system (simplified version of Theorem \ref{thm:main_psd})
Definition 2.1: Elementary symmetric polynomial
Lemma 2.1: Sum of principal minors
Definition 2.2: $k$-DPP
Definition 2.3: Projection DPP
Lemma 2.2: Algorithm 8 and Theorem 5.2 in [kt12]
Lemma 2.3: Corollary 7 in [alv22]
...and 41 more
