
Optimal Embedding Dimension for Sparse Subspace Embeddings

Shabarish Chenakkod, Michał Dereziński, Xiaoyu Dong, Mark Rudelson

TL;DR

This work resolves the optimal embedding dimension question for sparse oblivious subspace embeddings (OSEs). It proves that m=(1+θ)d suffices with column sparsity s=O(log^4(d)) and distortion ε=O_θ(1), and that, when leverage-score information is available, the non-oblivious Leverage Score Sparsification (LESS) technique attains m=O(d/ε^2). The analysis introduces constructions with independent diagonals and uses universality to relate sparse embeddings to Gaussian sketches. Consequences include the first OSE with O(d) embedding dimension that can be applied faster than current matrix multiplication time, an optimal single-pass least squares algorithm suited to streaming and turnstile settings, and, via LESS, the first subspace embedding with low distortion ε=o(1) and optimal embedding dimension that can be applied in current matrix multiplication time. Together, these give near-optimal, fast sketches for linear regression and related problems in randomized linear algebra.

Abstract

A random $m\times n$ matrix $S$ is an oblivious subspace embedding (OSE) with parameters $ε>0$, $δ\in(0,1/3)$ and $d\leq m\leq n$, if for any $d$-dimensional subspace $W\subseteq \mathbb{R}^n$, $P\big(\,\forall_{x\in W}\ (1+ε)^{-1}\|x\|\leq\|Sx\|\leq (1+ε)\|x\|\,\big)\geq 1-δ.$ It is known that the embedding dimension of an OSE must satisfy $m\geq d$, and for any $θ> 0$, a Gaussian embedding matrix with $m\geq (1+θ) d$ is an OSE with $ε= O_θ(1)$. However, such optimal embedding dimension is not known for other embeddings. Of particular interest are sparse OSEs, having $s\ll m$ non-zeros per column, with applications to problems such as least squares regression and low-rank approximation. We show that, given any $θ> 0$, an $m\times n$ random matrix $S$ with $m\geq (1+θ)d$ consisting of randomly sparsified $\pm1/\sqrt s$ entries and having $s= O(\log^4(d))$ non-zeros per column, is an oblivious subspace embedding with $ε= O_θ(1)$. Our result addresses the main open question posed by Nelson and Nguyen (FOCS 2013), who conjectured that sparse OSEs can achieve $m=O(d)$ embedding dimension, and it improves on $m=O(d\log(d))$ shown by Cohen (SODA 2016). We use this to construct the first oblivious subspace embedding with $O(d)$ embedding dimension that can be applied faster than current matrix multiplication time, and to obtain an optimal single-pass algorithm for least squares regression. We further extend our results to Leverage Score Sparsification (LESS), which is a recently introduced non-oblivious embedding technique. We use LESS to construct the first subspace embedding with low distortion $ε=o(1)$ and optimal embedding dimension $m=O(d/ε^2)$ that can be applied in current matrix multiplication time.
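The OSE condition above can be restated through singular values, which is how distortion is usually checked numerically; the Gaussian benchmark and the simplest least-squares consequence then follow in a few lines. This is a standard worked derivation added for illustration (the Gaussian bound is the Davidson–Szarek estimate), not an excerpt from the paper, and plain sketch-and-solve is not necessarily the single-pass algorithm the paper constructs.

```latex
% W = range(U), U \in \mathbb{R}^{n\times d}, U^\top U = I; every x \in W is x = Uy with \|x\| = \|y\|.
\forall_{x\in W}\ (1+\varepsilon)^{-1}\|x\|\le\|Sx\|\le(1+\varepsilon)\|x\|
\iff (1+\varepsilon)^{-1}\le\sigma_{\min}(SU)\le\sigma_{\max}(SU)\le 1+\varepsilon .
% SU \in \mathbb{R}^{m\times d} must then have rank d, which forces m \ge d.

% Gaussian S with i.i.d. N(0,1/m) entries: SU again has i.i.d. N(0,1/m) entries, and for t > 0
\Pr\Big(1-\sqrt{d/m}-t/\sqrt{m}\le\sigma_{\min}(SU)\le\sigma_{\max}(SU)\le 1+\sqrt{d/m}+t/\sqrt{m}\Big)\ge 1-2e^{-t^2/2}.
% With m = (1+\theta)d and r = (1+\theta)^{-1/2} < 1, ignoring the t/\sqrt{m} terms:
\varepsilon=\max\Big\{r,\ \tfrac{1}{1-r}-1\Big\}=\tfrac{r}{1-r}=O_\theta(1).

% Sketch-and-solve: if S satisfies the event for W = span(A, b) (dimension \le d+1),
% \tilde x = \arg\min_x \|S(Ax-b)\| and x^* = \arg\min_x \|Ax-b\|, then
\|A\tilde x-b\|\le(1+\varepsilon)\|S(A\tilde x-b)\|\le(1+\varepsilon)\|S(Ax^*-b)\|\le(1+\varepsilon)^2\|Ax^*-b\|.
```

For example, θ=1 gives r≈0.707 and ε≈2.41: a constant, but one that grows without bound as θ→0, which is why the dependence is written $O_θ(1)$.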

Paper Structure (27 sections, 21 theorems, 145 equations, 2 figures, 1 table, 2 algorithms)

Key Result

Theorem 1.2

Given any constant $\theta>0$, an $m\times n$ sparse embedding matrix $S$ with $n\geq m\geq (1+\theta)d$ and $s= O(\log^4(d))$ non-zeros per column is an oblivious subspace embedding with distortion $\epsilon = O(1)$. Moreover, given any $\epsilon,\delta$, it suffices to use $s=O(\log^4(d/\delta)/\e
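A quick way to see Theorem 1.2 in action is to sample a sparse sign matrix with m=(1+θ)d rows and measure its distortion on a random d-dimensional subspace through the singular values of SU, alongside a Gaussian matrix of the same size. The sketch below assumes an i.i.d.-entries sparsification (each entry nonzero with probability s/m, value ±1/√s), in the spirit of the OSE-IID-ENT model named in Definition 2.1; the paper's exact model and constants may differ, and the helper names `sparse_sign_embedding` and `distortion` are hypothetical. The value s=8 is only illustrative, since O(log^4(d)) hides constants.

```python
import numpy as np

def sparse_sign_embedding(m, n, s, rng):
    """Hypothetical helper (i.i.d.-entries model): each entry is independently
    nonzero with probability s/m and then equals +-1/sqrt(s), so every column
    has s nonzeros on average and E[S^T S] = I."""
    mask = rng.random((m, n)) < s / m
    signs = rng.choice([-1.0, 1.0], size=(m, n))
    return np.where(mask, signs / np.sqrt(s), 0.0)

def distortion(S, U):
    """Smallest eps with (1+eps)^-1 <= sigma_min(SU) and sigma_max(SU) <= 1+eps,
    where U has orthonormal columns spanning the subspace W."""
    sv = np.linalg.svd(S @ U, compute_uv=False)
    return max(sv.max() - 1.0, 1.0 / sv.min() - 1.0)

rng = np.random.default_rng(0)
d, theta, n = 50, 1.0, 2000
m = int(np.ceil((1 + theta) * d))   # embedding dimension m = (1 + theta) d
s = 8                               # illustrative only; the theorem has s = O(log^4 d)
U, _ = np.linalg.qr(rng.standard_normal((n, d)))   # orthonormal basis of a random W

eps_sparse = distortion(sparse_sign_embedding(m, n, s, rng), U)
eps_gauss = distortion(rng.standard_normal((m, n)) / np.sqrt(m), U)  # Gaussian benchmark
print(f"m = {m}: sparse eps = {eps_sparse:.2f}, Gaussian eps = {eps_gauss:.2f}")
```

Rerunning with a larger θ should shrink both distortions, in line with ε=O_θ(1) improving as m/d grows; numbers from a single seed are illustrative and do not reproduce any experiment from the paper.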

Figures (2)

  • Figure 1: Left: Structure of the random matrix $F_{\gamma_l}(W_l)$. Right: Illustration of an embedding matrix $S$ with independent diagonals with parameters $d=8$, $m=10$, and $n=30$, showing how the nonzero entries of $S$ occur along diagonals. The number of diagonals is controlled by the parameter $p$.
  • Figure 2: LESS-IND-ENT with decreasing leverage scores. Since the probability of an entry being non-zero is proportional to the corresponding leverage score, we see that the matrix becomes sparser as we move in the direction of decreasing leverage scores. Since the scaling of entries is inversely proportional to the square root of the corresponding leverage score, the magnitude of the non-zero entries becomes larger as we move to the right.
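The independent-diagonals structure in Figure 1 can be mimicked in a few lines. This is a hypothetical reading of the caption, not the paper's definition: entry (i, j) is assigned to the wrapped diagonal (j - i) mod m, each of the m diagonals is switched on independently with probability p, and entries on active diagonals are independent random signs scaled by 1/√(pm) so that E[S^T S]=I. The function name `independent_diagonals` is ours.

```python
import numpy as np

def independent_diagonals(m, n, p, rng):
    """Hypothetical sketch of Figure 1: entry (i, j) lies on wrapped diagonal
    (j - i) mod m; each of the m diagonals is active independently with
    probability p, and entries on active diagonals are independent random
    signs scaled by 1/sqrt(p * m), so that E[S^T S] = I."""
    active = rng.random(m) < p                 # one Bernoulli(p) coin per diagonal
    i = np.arange(m)[:, None]
    j = np.arange(n)[None, :]
    on = active[(j - i) % m]                   # (m, n) mask: is (i, j) on an active diagonal?
    signs = rng.choice([-1.0, 1.0], size=(m, n))
    return np.where(on, signs, 0.0) / np.sqrt(p * m)

rng = np.random.default_rng(1)
S = independent_diagonals(m=10, n=30, p=0.3, rng=rng)   # sizes from Figure 1
print((S != 0).astype(int))                            # nonzeros follow wrapped diagonals
```

With this wrapping every column meets each diagonal exactly once, so all columns carry the same number of nonzeros (the number of active diagonals), and p controls how many diagonals appear, as the caption describes.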

Theorems & Definitions (43)

  • Definition 1.1
  • Theorem 1.2: Sparse OSEs (informal version of Theorem \ref{osngeneral})
  • Theorem 1.3: Sparser Non-oblivious SE (informal version of Theorem \ref{nonose})
  • Theorem 1.4: Fast oblivious subspace embedding
  • Remark 1.5
  • Theorem 1.6: Fast Low-distortion Subspace Embedding
  • Theorem 1.7: Fast Least Squares
  • Theorem 1.7: Fast Least Squares
  • Theorem 1.8: Fast reduction for constrained/regularized least squares
  • Definition 2.1: OSE-IID-ENT
  • ...and 33 more
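Theorems 1.3 and 1.6 rely on Leverage Score Sparsification (LESS), and Figure 2 shows the LESS-IND-ENT pattern: an entry is nonzero with probability proportional to its column's leverage score and is scaled inversely to the square root of that score. The sketch below is a hypothetical parameterization of that description (p_j = min(1, s·ℓ_j/d), nonzeros ±1/√(m·p_j)), chosen so that E[S^T S]=I and rows carry at most about s nonzeros; the paper's exact probabilities and normalization, and the helper names `leverage_scores` and `less_ind_ent`, are assumptions. Leverage scores are computed exactly by QR for simplicity, whereas a fast algorithm would approximate them.

```python
import numpy as np

def leverage_scores(A):
    """Row leverage scores of a full-column-rank A: squared row norms of an
    orthonormal basis Q of range(A); they lie in [0, 1] and sum to d."""
    Q, _ = np.linalg.qr(A)            # reduced QR, Q is n x d
    return np.sum(Q**2, axis=1)

def less_ind_ent(A, m, s, rng):
    """Hypothetical LESS-style sketch with i.i.d. entries: entry (i, j) is
    nonzero with probability p_j = min(1, s * l_j / d), and a nonzero equals
    +-1/sqrt(m * p_j). Then E[S_ij^2] = 1/m, so E[S^T S] = I."""
    n, d = A.shape
    p = np.clip(s * leverage_scores(A) / d, 1e-12, 1.0)   # lower clip avoids 1/0
    mask = rng.random((m, n)) < p[None, :]
    signs = rng.choice([-1.0, 1.0], size=(m, n))
    return np.where(mask, signs / np.sqrt(m * p)[None, :], 0.0)

rng = np.random.default_rng(2)
n, d, m, s = 1000, 20, 80, 8
# Rows scaled by a decaying factor, so leverage scores decrease with j (as in Figure 2).
A = rng.standard_normal((n, d)) * np.linspace(3.0, 0.1, n)[:, None]
S = less_ind_ent(A, m, s, rng)

Q, _ = np.linalg.qr(A)
sv = np.linalg.svd(S @ Q, compute_uv=False)
eps = max(sv.max() - 1.0, 1.0 / sv.min() - 1.0)   # distortion on range(A)
nnz_per_col = (S != 0).sum(axis=0)
print(f"nonzeros: first half {nnz_per_col[: n // 2].sum()}, "
      f"second half {nnz_per_col[n // 2 :].sum()}; eps = {eps:.2f}")
```

The printed counts should show the matrix thinning out toward low-leverage columns while the surviving entries grow in magnitude, which is the qualitative picture in Figure 2.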