Faster Linear Systems and Matrix Norm Approximation via Multi-level Sketched Preconditioning

Michał Dereziński; Christopher Musco; Jiaming Yang

TL;DR

This work introduces Multi-level Sketched Preconditioning (MSP), a randomized, sketch-based framework for solving linear systems. MSP builds a low-rank Nyström preconditioner from sparse random sketches and inverts that preconditioner using additional levels of sketching and preconditioning. The key idea is that convergence depends on an average condition number of the matrix, which improves as the rank of the Nyström approximation grows, so runtimes are faster when the matrix has only a few large singular values. For an $n\times n$ system that is well-conditioned except for $k$ large singular values, MSP gives an $\tilde{O}(n^{2.065} + k^{\omega})$ time solver, and the approach extends to regularized problems and matrix-norm estimation. In particular, MSP achieves $\tilde{O}(n^2 + d_{\lambda}^{\omega})$ time for regularized PSD systems with effective dimension $d_{\lambda}$, and $\tilde{O}(n^{2.11})$-time Schatten 1-norm estimation, improving on the previous $\tilde{O}(n^{2.18})$ method. The framework includes a stability analysis of inexact preconditioned Lanczos iterations and a three-level extension to general linear systems, supported by cost analyses and spectral guarantees for the inner solves. Whereas previous state-of-the-art algorithms for most of these problems relied on stochastic iterative methods, MSP is built on matrix sketching, with applications including kernel ridge regression and matrix-norm estimation. All results are proven in the real RAM model.

Abstract

We present a new class of preconditioned iterative methods for solving linear systems of the form $Ax = b$. Our methods are based on constructing a low-rank Nyström approximation to $A$ using sparse random matrix sketching. This approximation is used to construct a preconditioner, which itself is inverted quickly using additional levels of random sketching and preconditioning. We prove that the convergence of our methods depends on a natural average condition number of $A$, which improves as the rank of the Nyström approximation increases. Concretely, this allows us to obtain faster runtimes for a number of fundamental linear algebraic problems:

1. We show how to solve any $n\times n$ linear system that is well-conditioned except for $k$ outlying large singular values in $\tilde{O}(n^{2.065} + k^{\omega})$ time, improving on a recent result of [Dereziński, Yang, STOC 2024] for all $k \gtrsim n^{0.78}$.
2. We give the first $\tilde{O}(n^2 + d_{\lambda}^{\omega})$ time algorithm for solving a regularized linear system $(A + \lambda I)x = b$, where $A$ is positive semidefinite with effective dimension $d_{\lambda}=\mathrm{tr}(A(A+\lambda I)^{-1})$. This problem arises in applications like Gaussian process regression.
3. We give faster algorithms for approximating Schatten $p$-norms and other matrix norms. For example, for the Schatten 1-norm (nuclear norm), we give an algorithm that runs in $\tilde{O}(n^{2.11})$ time, improving on an $\tilde{O}(n^{2.18})$ method of [Musco et al., ITCS 2018].

All results are proven in the real RAM model of computation. Interestingly, previous state-of-the-art algorithms for most of the problems above relied on stochastic iterative methods, like stochastic coordinate and gradient descent. Our work takes a completely different approach, instead leveraging tools from matrix sketching.
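To make the preconditioning idea concrete, here is a minimal single-level sketch in NumPy for the regularized PSD system $(A + \lambda I)x = b$. It is not the paper's algorithm: it assumes a dense Gaussian sketch instead of a sparse one, inverts the Nyström preconditioner exactly through its eigendecomposition instead of with further sketching levels, and uses plain preconditioned conjugate gradient as the outer iteration. The function names, the test matrix, and the rank of 80 are illustrative choices only.

```python
import numpy as np

def effective_dimension(A, lam):
    """d_lam = tr(A (A + lam I)^{-1}) for symmetric PSD A, computed from its eigenvalues."""
    eigs = np.linalg.eigvalsh(A)
    return float(np.sum(eigs / (eigs + lam)))

def nystrom_eig(A, rank, rng):
    """Factors (U, d) of a rank-`rank` Nystrom approximation U diag(d) U^T of PSD A."""
    n = A.shape[0]
    # Assumption: dense Gaussian test matrix, orthonormalized for numerical stability.
    S, _ = np.linalg.qr(rng.standard_normal((n, rank)))
    Y = A @ S
    shift = np.sqrt(n) * np.spacing(np.linalg.norm(Y, 2))  # tiny shift so Cholesky succeeds
    Y_shift = Y + shift * S
    core = S.T @ Y_shift                                   # S^T (A + shift I) S
    L = np.linalg.cholesky((core + core.T) / 2)
    B = np.linalg.solve(L, Y_shift.T).T                    # B B^T = Y_shift core^{-1} Y_shift^T
    U, s, _ = np.linalg.svd(B, full_matrices=False)
    return U, np.maximum(s**2 - shift, 0.0)                # remove the shift again

def make_preconditioner(U, d, lam):
    """Return v -> (U diag(d) U^T + lam I)^{-1} v, applied in O(n * rank) time."""
    def apply(v):
        Utv = U.T @ v
        return U @ (Utv / (d + lam)) + (v - U @ Utv) / lam
    return apply

def pcg(matvec, b, apply_Pinv, tol=1e-8, max_iter=1000):
    """Preconditioned conjugate gradient; returns (solution, iterations used)."""
    x = np.zeros_like(b)
    r = b.copy()
    z = apply_Pinv(r)
    p = z.copy()
    rz = r @ z
    for it in range(1, max_iter + 1):
        Ap = matvec(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, it
        z = apply_Pinv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

# Illustrative test problem: 30 large eigenvalues, then a flat, tiny tail.
rng = np.random.default_rng(0)
n, lam = 200, 1e-2
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
eigs = np.concatenate([np.logspace(3, 0, 30), rng.uniform(0.5e-5, 1e-5, n - 30)])
A = (Q * eigs) @ Q.T
A = (A + A.T) / 2
b = rng.standard_normal(n)

matvec = lambda v: A @ v + lam * v
U, d = nystrom_eig(A, rank=80, rng=rng)
x, it_pcg = pcg(matvec, b, make_preconditioner(U, d, lam))
_, it_cg = pcg(matvec, b, lambda r: r)   # identity preconditioner = plain CG
print("effective dimension:", effective_dimension(A, lam))
print("PCG iterations:", it_pcg, "| plain CG iterations:", it_cg)
```

Once the sketch rank comfortably exceeds the effective dimension, the preconditioned system is close to the identity and the outer iteration needs only a handful of steps. The cost then shifts to building and applying the preconditioner, which is the part the paper accelerates with further sketching levels.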

Paper Structure (34 sections, 21 theorems, 118 equations, 1 figure, 7 algorithms)

This paper contains 34 sections, 21 theorems, 118 equations, 1 figure, 7 algorithms.

Introduction
Main Results
Regularized linear systems.
Matrix norm estimation.
Our Techniques
Additional Related Work
Preliminaries
Notation.
Sparse Subspace Embedding.
Computational Model.
Main Technical Results
Applications to Regularized Linear Systems and Least Squares
Kernel ridge regression.
Least Squares.
Applications to Matrix Norm Estimation
...and 19 more sections

Key Result

Theorem 1.1

Given an invertible $n\times n$ matrix $\mathbf{A}$ with at most $k$ singular values larger than $O(1)$ times its smallest singular value, and a length $n$ vector $\mathbf{b}$, there is an algorithm that, with high probability, computes $\tilde{\mathbf{x}}$ such that $\|\mathbf{A}\tilde{\mathbf{x}}-

Figures (1)

Figure 1: Time complexity for solving an $n\times n$ linear system with $k = n^\theta$ large singular values under the current matrix multiplication exponent $\omega \approx 2.372$. The $x$-axis denotes the exponent $\theta$, while the $y$-axis denotes the exponent $\beta$ in the time complexity $\tilde{O}(n^\beta)$. The yellow line is our work (Theorem \ref{thm:main}), with the yellow area showing the complexity improvement compared to prior work. The red line denotes a lower bound for the problem, which we prove in Theorem \ref{t:lower}. The red area is unachievable under the assumption that solving general dense linear systems requires $\Omega(n^\omega)$ time.
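As a rough reading aid for the figure, the short sketch below evaluates the exponent $\beta$ implied by the $\tilde{O}(n^{2.065} + k^{\omega})$ bound from the abstract when $k = n^\theta$. It assumes that polylogarithmic factors are ignored and that the plotted curve is exactly the larger of the two exponents. The function name `solver_exponent` is hypothetical, and the lower-bound curve is not reproduced because its formula is not stated here.

```python
OMEGA = 2.372  # matrix multiplication exponent quoted in the caption

def solver_exponent(theta, omega=OMEGA):
    """Exponent beta such that n^2.065 + (n^theta)^omega = O(n^beta)."""
    return max(2.065, omega * theta)

# Value of theta at which the k^omega term starts to dominate the n^2.065 term.
crossover = 2.065 / OMEGA

for theta in (0.5, 0.78, crossover, 1.0):
    print(f"theta = {theta:.4f} -> beta = {solver_exponent(theta):.4f}")
```

Under these assumptions the exponent stays flat at 2.065 until $\theta$ reaches roughly 0.87, after which it grows linearly and reaches $\omega$ at $\theta = 1$.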

Theorems & Definitions (34)

Theorem 1.1: Main result, informal version of Theorem \ref{thm:main_rec}
Theorem 1.2: Regularized linear systems, informal version of Theorem \ref{thm:krr_formal}
Theorem 1.3: Schatten 1-norm estimation, corollary of Theorem \ref{t:schatten}
Lemma 2.1: Lemma 5.4 in frangella2023randomized
Definition 2.1: Sparse embedding matrix
Lemma 2.2: Adapted from Theorem 1.4 in chenakkod2023optimal
Theorem 3.1: Main technical result
Remark 3.1
Proof: Proof of Theorem \ref{thm:main}
Theorem 3.2: Regularized linear systems, formal version of Theorem \ref{thm:krr}
...and 24 more
