Distributed Least Squares in Small Space via Sketching and Bias Reduction

Sachin Garg; Kevin Tan; Michał Dereziński

TL;DR

This work addresses least squares regression in space-limited distributed settings by reducing the bias of the sketched estimator rather than only its error. It uses a leverage score sparsified (LESS) embedding and, through a leave-one-out analysis with higher-moment bounds, shows that a compact sketch yields a nearly unbiased least squares estimator. A two-pass distributed scheme with a preconditioner achieves near-optimal time and space, running in current matrix multiplication time with only O(d^2 log(nd)) bits of space. The analysis extends beyond least squares to bias-variance analyses of other sketching-based estimators, and experiments on distributed averaging show that sketching to a smaller size preserves near-unbiasedness at no additional computational cost, which matters for RandNLA in streaming and distributed settings.

Abstract

Matrix sketching is a powerful tool for reducing the size of large data matrices. Yet there are fundamental limitations to this size reduction when we want to recover an accurate estimator for a task such as least squares regression. We show that these limitations can be circumvented in the distributed setting by designing sketching methods that minimize the bias of the estimator, rather than its error. In particular, we give a sparse sketching method running in optimal space and current matrix multiplication time, which recovers a nearly-unbiased least squares estimator using two passes over the data. This leads to new communication-efficient distributed averaging algorithms for least squares and related tasks, which directly improve on several prior approaches. Our key novelty is a new bias analysis for sketched least squares, giving a sharp characterization of its dependence on the sketch sparsity. The techniques include new higher-moment restricted Bai-Silverstein inequalities, which are of independent interest to the non-asymptotic analysis of deterministic equivalents for random matrices that arise from sketching.
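The distributed averaging idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's algorithm: it samples rows uniformly rather than by leverage scores, uses no preconditioner, and holds the data in memory instead of streaming it. The names `sketch_and_solve` and `averaged_estimate`, and the parameters `q` (workers), `m` (sketch rows), and `s` (nonzeros per sketch row), are hypothetical.

```python
import numpy as np

def sketch_and_solve(A, b, m, s, rng):
    """One worker: form an m-row sparse sketch of (A, b) and solve the small problem.

    Assumption: each sketch row mixes s uniformly sampled rows with random signs.
    """
    n, d = A.shape
    SA = np.zeros((m, d))
    Sb = np.zeros(m)
    scale = np.sqrt(n / (m * s))  # chosen so that E[S^T S] = I under uniform sampling
    for i in range(m):
        idx = rng.integers(0, n, size=s)         # s sampled row indices
        signs = rng.choice([-1.0, 1.0], size=s)  # independent random signs
        SA[i] = scale * (signs @ A[idx])
        Sb[i] = scale * (signs @ b[idx])
    x, *_ = np.linalg.lstsq(SA, Sb, rcond=None)
    return x

def averaged_estimate(A, b, q, m, s, seed=0):
    """Average q independent sketch-and-solve estimates (one per simulated worker)."""
    rng = np.random.default_rng(seed)
    estimates = np.array([sketch_and_solve(A, b, m, s, rng) for _ in range(q)])
    return estimates.mean(axis=0), estimates

# Small synthetic demo (all sizes are illustrative).
rng = np.random.default_rng(1)
n, d = 2000, 10
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.5 * rng.standard_normal(n)

x_star, *_ = np.linalg.lstsq(A, b, rcond=None)  # full least squares solution
x_avg, estimates = averaged_estimate(A, b, q=20, m=40, s=4)

single_errors = np.linalg.norm(estimates - x_star, axis=1)
avg_error = np.linalg.norm(x_avg - x_star)
print("mean single-worker error:", single_errors.mean())
print("averaged-estimate error: ", avg_error)
```

By the triangle inequality the averaged error never exceeds the mean single-worker error. How much smaller it gets depends on the bias of each worker's estimate, which is the quantity the paper's analysis controls.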

Paper Structure (21 sections, 24 theorems, 118 equations, 3 figures, 1 table)

Introduction
Our Techniques.
Related Work
Randomized numerical linear algebra.
Unbiased estimators for least squares.
Statistical and RMT analysis of sketching.
Preliminaries
Notations.
Computational model.
Definitions and useful lemmas.
Least squares bias analysis
Completing the proof of Theorem \ref{t:main}.
Completing the proof of Theorem \ref{t:two-passes}.
Conclusions and further applications
Theoretical applications: Bias-variance analysis for other estimators.
...and 6 more sections

Key Result

Theorem 1

Given streaming access to $\mathbf A\in\mathbb R^{n\times d}$ and $\mathbf b\in\mathbb R^n$, and direct access to a preconditioner matrix $\mathbf P\in\mathbb R^{d\times d}$ such that $\kappa(\mathbf A\mathbf P)\le\alpha$, within a single pass over $(\mathbf A,\mathbf b)$, in $O(\gamma^{-1}{\mathrm{

Figures (3)

Figure 1: Illustration of the leverage score sparsification algorithm used in Theorem \ref{t:main}. Each row of the sketch mixes $\tilde{O}(1/\epsilon)$ leverage score samples from $\mathbf A$. Remarkably, the $\epsilon$-error guarantee of the subsampled estimator is retained as $\epsilon$-bias of the sketched estimator.
Figure 2: Distributed averaging experiment on the YearPredictionMSD dataset from libsvm, which shows that sparse sketching can be used to preserve near-unbiasedness without increasing the estimation cost (see Appendix \ref{s:experiments} for similar results on two other datasets).
Figure 3: Relative error of the distributed averaging estimator built from sketch-and-solve least squares estimates, where the sketches are constructed with sparse sketching matrices with uniform probabilities (LESSUniform), on the libsvm datasets Abalone and Boston (see Figure \ref{fig:msd} for results on YearPredictionMSD). For each dataset, the computational cost of sketching is the same in all four parameter settings. Remarkably, sketching to a smaller size appears to preserve near-unbiasedness without incurring any additional computational cost.
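The row-mixing construction described in the Figure 1 caption can be sketched in a few lines of NumPy. This is an illustration under assumptions rather than the paper's definition of a LESS embedding: it computes exact leverage scores with a QR factorization instead of approximate ones, and the function names `leverage_scores` and `less_style_sketch` are hypothetical.

```python
import numpy as np

def leverage_scores(A):
    """Exact leverage scores: squared row norms of an orthonormal basis of range(A)."""
    Q, _ = np.linalg.qr(A)  # reduced QR, Q has shape (n, d)
    return np.sum(Q**2, axis=1)

def less_style_sketch(A, m, s, rng):
    """Build an m x n sparse sketching matrix with at most s nonzeros per row.

    Each row mixes s indices sampled proportionally to leverage scores, with
    random signs and importance weights chosen so that E[S^T S] = I.
    """
    n, _ = A.shape
    lev = leverage_scores(A)
    p = lev / lev.sum()  # sampling distribution over the rows of A
    S = np.zeros((m, n))
    for i in range(m):
        idx = rng.choice(n, size=s, p=p)
        signs = rng.choice([-1.0, 1.0], size=s)
        # np.add.at accumulates correctly when an index is sampled more than once
        np.add.at(S[i], idx, signs / np.sqrt(m * s * p[idx]))
    return S

# Small demo (sizes are illustrative).
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 5))
lev = leverage_scores(A)
S = less_style_sketch(A, m=50, s=4, rng=rng)
SA = S @ A  # the 50 x 5 sketch that a worker would store
print("sum of leverage scores:", lev.sum())
print("max nonzeros per sketch row:", np.count_nonzero(S, axis=1).max())
```

The dense `S` is materialized here only for readability. A space-conscious implementation would accumulate `SA` directly from the sampled rows without forming `S`.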

Theorems & Definitions (30)

Theorem 1
Remark 1
Theorem 2
Definition 1: $(\beta_1,\beta_2)$-approximate leverage scores
Lemma 1: Based on Lemma 7.2 from [chepurko2022near]
Definition 2: $(s,\beta_1,\beta_2)$-LESS embedding
Remark 2
Definition 3: $(\epsilon,\delta)$-unbiased estimator
Lemma 2: Subspace embedding for LESS, Theorem 1.3, [chenakkod2023optimal]
Lemma 3: Bai-Silverstein's Inequality, Lemma B.26, [bai2010spectral]
...and 20 more
