Optimal Embedding Dimension for Sparse Subspace Embeddings

Shabarish Chenakkod; Michał Dereziński; Xiaoyu Dong; Mark Rudelson

TL;DR

This work addresses the optimal embedding dimension question for sparse oblivious subspace embeddings (OSEs). It proves that, for any constant θ>0, an embedding dimension of m=(1+θ)d is achievable with s=O(log^4(d)) non-zeros per column and distortion ε=O(1). The analysis introduces an independent-diagonals construction and uses universality to relate sparse embeddings to Gaussian sketches. This yields the first OSE with O(d) embedding dimension that can be applied faster than current matrix multiplication time, and an optimal single-pass algorithm for least squares regression. Because the oblivious embeddings are linear sketches, they are suitable for streaming and turnstile settings. The results also extend to Leverage Score Sparsification (LESS), a non-oblivious technique that uses leverage-score information, giving the first subspace embedding with low distortion ε=o(1) and optimal embedding dimension m=O(d/ε^2) that can be applied in current matrix multiplication time.

Abstract

A random $m\times n$ matrix $S$ is an oblivious subspace embedding (OSE) with parameters $\epsilon>0$, $\delta\in(0,1/3)$ and $d\leq m\leq n$, if for any $d$-dimensional subspace $W\subseteq \mathbb{R}^n$, $P\big(\,\forall_{x\in W}\ (1+\epsilon)^{-1}\|x\|\leq\|Sx\|\leq (1+\epsilon)\|x\|\,\big)\geq 1-\delta.$ It is known that the embedding dimension of an OSE must satisfy $m\geq d$, and for any $\theta> 0$, a Gaussian embedding matrix with $m\geq (1+\theta) d$ is an OSE with $\epsilon= O_\theta(1)$. However, such an optimal embedding dimension is not known for other embeddings. Of particular interest are sparse OSEs, having $s\ll m$ non-zeros per column, with applications to problems such as least squares regression and low-rank approximation. We show that, given any $\theta> 0$, an $m\times n$ random matrix $S$ with $m\geq (1+\theta)d$ consisting of randomly sparsified $\pm1/\sqrt s$ entries and having $s= O(\log^4(d))$ non-zeros per column, is an oblivious subspace embedding with $\epsilon= O_\theta(1)$. Our result addresses the main open question posed by Nelson and Nguyen (FOCS 2013), who conjectured that sparse OSEs can achieve $m=O(d)$ embedding dimension, and it improves on $m=O(d\log(d))$ shown by Cohen (SODA 2016). We use this to construct the first oblivious subspace embedding with $O(d)$ embedding dimension that can be applied faster than current matrix multiplication time, and to obtain an optimal single-pass algorithm for least squares regression. We further extend our results to Leverage Score Sparsification (LESS), which is a recently introduced non-oblivious embedding technique. We use LESS to construct the first subspace embedding with low distortion $\epsilon=o(1)$ and optimal embedding dimension $m=O(d/\epsilon^2)$ that can be applied in current matrix multiplication time.
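To make the definition concrete, the following is a minimal NumPy sketch, not the paper's construction. It assumes a simple variant in which every column gets exactly $s$ non-zeros, each equal to $\pm1/\sqrt s$, placed in uniformly random distinct rows. The function names `sparse_embedding` and `distortion`, and the parameter values, are hypothetical choices for illustration. The distortion on a fixed subspace is read off from the extreme singular values of $SU$, where $U$ is an orthonormal basis of $W$.

```python
import numpy as np

def sparse_embedding(m, n, s, rng):
    """m x n matrix with exactly s non-zeros per column, each +-1/sqrt(s).

    Assumption: rows are chosen uniformly without replacement per column;
    the paper's own sparsification scheme may differ.
    """
    S = np.zeros((m, n))
    for j in range(n):
        rows = rng.choice(m, size=s, replace=False)  # s distinct row indices
        signs = rng.choice([-1.0, 1.0], size=s)      # independent random signs
        S[rows, j] = signs / np.sqrt(s)
    return S

def distortion(S, U):
    """Smallest eps with (1+eps)^-1 ||x|| <= ||Sx|| <= (1+eps) ||x|| on range(U).

    U must have orthonormal columns, so ||Uy|| = ||y|| and the extreme
    singular values of S @ U give the extreme stretch factors on the subspace.
    """
    sv = np.linalg.svd(S @ U, compute_uv=False)
    return max(sv.max(), 1.0 / sv.min()) - 1.0

rng = np.random.default_rng(0)
n, d, theta, s = 2000, 20, 1.0, 8          # illustrative values only
m = int(np.ceil((1 + theta) * d))          # embedding dimension m = (1 + theta) d

# Orthonormal basis of a random d-dimensional subspace W of R^n.
U, _ = np.linalg.qr(rng.standard_normal((n, d)))

S = sparse_embedding(m, n, s, rng)
eps = distortion(S, U)
print(f"m = {m}, s = {s}, measured distortion eps = {eps:.3f}")
```

This measures the distortion for one subspace and one draw of $S$ only; the OSE property is a probabilistic statement that must hold for every fixed subspace with probability at least $1-\delta$.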

Paper Structure (27 sections, 21 theorems, 145 equations, 2 figures, 1 table, 2 algorithms)

Introduction
Main results
Fast subspace embeddings
Applications to linear regression
Overview of techniques
Related work
Preliminaries
Notation
Oblivious Subspace Embeddings
Non-oblivious subspace embeddings
Uniformizing leverage scores by preconditioning
Universality
Spectrum of Gaussian Matrices
Analysis of Oblivious Sparse Embeddings
Independent Diagonals Construction
...and 12 more sections

Key Result

Theorem 1.2

Given any constant $\theta>0$, an $m\times n$ sparse embedding matrix $S$ with $n\geq m\geq (1+\theta)d$ and $s= O(\log^4(d))$ non-zeros per column is an oblivious subspace embedding with distortion $\epsilon = O(1)$. Moreover, given any $\epsilon,\delta$, it suffices to use $s=O(\log^4(d/\delta)/\e

Figures (2)

Figure 1: Left: Structure of the random matrix $F_{\gamma_l}(W_l)$. Right: Illustration of an embedding matrix $S$ with independent diagonals with parameters $d=8$, $m=10$, and $n=30$, showing how the nonzero entries of $S$ occur along diagonals. The number of diagonals is controlled by the parameter $p$.
Figure 2: LESS-IND-ENT with decreasing leverage scores. Since the probability of an entry being non-zero is proportional to the corresponding leverage score, we see that the matrix becomes sparser as we move in the direction of decreasing leverage scores. Since the scaling of entries is inversely proportional to the square root of the corresponding leverage score, the magnitude of the non-zero entries becomes larger as we move to the right.
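The behaviour described in the Figure 2 caption can be imitated with a short NumPy sketch. This is an illustrative approximation under stated assumptions, not the paper's LESS-IND-ENT definition: entry $(i,j)$ is non-zero independently with probability $p_j=\min(1, s\,\ell_j/d)$, where $\ell_j$ is the $j$-th leverage score, and each non-zero is $\pm 1/\sqrt{m\,p_j}$ so that $\mathbb{E}[S^\top S]=I$. The names `leverage_scores` and `less_like_embedding`, and the sparsity parameter `s`, are hypothetical.

```python
import numpy as np

def leverage_scores(A):
    """Row leverage scores of a full-column-rank A: squared row norms of Q."""
    Q, _ = np.linalg.qr(A)           # reduced QR: Q is n x d with orthonormal columns
    return np.sum(Q**2, axis=1)      # sums to d

def less_like_embedding(lev, m, s, rng):
    """Hypothetical leverage-score-sparsified matrix (not the paper's definition).

    Column j is non-zero in each row with probability p_j proportional to lev[j],
    and non-zeros are scaled by 1/sqrt(m * p_j), i.e. inversely proportional
    to the square root of the leverage score.
    """
    p = np.minimum(1.0, s * lev / lev.sum())
    mask = rng.random((m, lev.size)) < p                 # which entries survive
    signs = rng.choice([-1.0, 1.0], size=mask.shape)     # random signs
    scale = 1.0 / np.sqrt(m * np.maximum(p, 1e-300))     # guard against p_j = 0
    return mask * signs * scale

rng = np.random.default_rng(2)
n, d, m, s = 400, 5, 30, 40                              # illustrative values only

# Rows with shrinking scale, so leverage scores vary widely across rows.
A = rng.standard_normal((n, d)) * np.linspace(3.0, 0.2, n)[:, None]
lev = np.sort(leverage_scores(A))[::-1]                  # decreasing, as in Figure 2

S = less_like_embedding(lev, m, s, rng)
nnz_left = np.count_nonzero(S[:, : n // 2])              # high-leverage columns
nnz_right = np.count_nonzero(S[:, n // 2 :])             # low-leverage columns
print(nnz_left, nnz_right)
```

With the columns ordered by decreasing leverage score, the left half of `S` is denser and the surviving entries on the right have larger magnitude, which is the pattern the caption describes.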

Theorems & Definitions (43)

Definition 1.1
Theorem 1.2: Sparse OSEs; informal Theorem \ref{osngeneral}
Theorem 1.3: Sparser Non-oblivious SE; informal Theorem \ref{nonose}
Theorem 1.4: Fast oblivious subspace embedding
Remark 1.5
Theorem 1.6: Fast Low-distortion Subspace Embedding
Theorem 1.7: Fast Least Squares
Theorem 1.8: Fast reduction for constrained/regularized least squares
Definition 2.1: OSE-IID-ENT
...and 33 more
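Several of the listed results concern least squares. As background, the generic sketch-and-solve idea can be illustrated as follows. This is a plain textbook-style sketch under stated assumptions, not the algorithm of Theorem 1.7: it applies a sparse sign sketch (exactly `s` non-zeros of $\pm1/\sqrt s$ per column, a hypothetical choice) to $[A \mid b]$ in a single pass over the rows, then solves the small $m\times d$ problem. The function name `sketch_rows` and all parameter values are illustrative.

```python
import numpy as np

def sketch_rows(A, b, m, s, rng):
    """Compute S @ A and S @ b in one pass over the rows of [A | b].

    Row i of the input corresponds to column i of S, which has s non-zeros
    equal to +-1/sqrt(s) in distinct random rows; S is never stored.
    """
    n, d = A.shape
    SA = np.zeros((m, d))
    Sb = np.zeros(m)
    for i in range(n):
        rows = rng.choice(m, size=s, replace=False)          # distinct target rows
        vals = rng.choice([-1.0, 1.0], size=s) / np.sqrt(s)  # signed weights
        SA[rows] += vals[:, None] * A[i]
        Sb[rows] += vals * b[i]
    return SA, Sb

rng = np.random.default_rng(1)
n, d = 5000, 10                                   # illustrative values only
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# The relevant subspace is span([A | b]), of dimension d + 1; take theta = 1.
m = 2 * (d + 1)
SA, Sb = sketch_rows(A, b, m, s=4, rng=rng)

x_sk = np.linalg.lstsq(SA, Sb, rcond=None)[0]     # solve the sketched problem
x_opt = np.linalg.lstsq(A, b, rcond=None)[0]      # exact solution, for comparison
ratio = np.linalg.norm(A @ x_sk - b) / np.linalg.norm(A @ x_opt - b)
print(f"residual ratio (sketched / optimal) = {ratio:.3f}")
```

If the sketch has distortion $\epsilon$ on the span of $[A \mid b]$, the residual ratio is at most $(1+\epsilon)^2$, so a constant-distortion embedding with $m=O(d)$ gives a constant-factor approximation.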
