
Distributed Least Squares in Small Space via Sketching and Bias Reduction

Sachin Garg, Kevin Tan, Michał Dereziński

TL;DR

This work addresses least squares regression in space-limited distributed settings by reducing the bias of the estimator rather than minimizing its error alone. It builds on leverage score sparsified (LESS) embeddings which, combined with a leave-one-out analysis and higher-moment bounds, yield a nearly unbiased least squares estimator from a compact sketch. A two-pass distributed scheme with a preconditioner achieves near-optimal time and space, running in current matrix multiplication time with only $O(d^2\log(nd))$ bits of space. The results extend beyond least squares to the bias analysis of other sketching-based estimators, and the experiments show a "free lunch" effect in distributed averaging: sketching to a smaller size preserves near-unbiasedness without extra computational cost, which is relevant to RandNLA in streaming and distributed settings.

Abstract

Matrix sketching is a powerful tool for reducing the size of large data matrices. Yet there are fundamental limitations to this size reduction when we want to recover an accurate estimator for a task such as least squares regression. We show that these limitations can be circumvented in the distributed setting by designing sketching methods that minimize the bias of the estimator, rather than its error. In particular, we give a sparse sketching method running in optimal space and current matrix multiplication time, which recovers a nearly unbiased least squares estimator using two passes over the data. This leads to new communication-efficient distributed averaging algorithms for least squares and related tasks, which directly improve on several prior approaches. Our key novelty is a new bias analysis for sketched least squares, giving a sharp characterization of its dependence on the sketch sparsity. The techniques include new higher-moment restricted Bai-Silverstein inequalities, which are of independent interest to the non-asymptotic analysis of deterministic equivalents for random matrices that arise from sketching.
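
The distributed averaging idea in the abstract can be illustrated with a small NumPy experiment. This is a minimal sketch under stated assumptions, not the paper's algorithm: it uses uniform row sampling with random signs (closer in spirit to the LESSUniform variant named in Figure 3 than to leverage score sampling), it uses no preconditioner, and the function names `sparse_uniform_sketch`, `sketch_and_solve`, and `relative_error`, as well as the sizes `n`, `d`, `m`, `s`, `q`, are hypothetical choices made for illustration.

```python
import numpy as np

def sparse_uniform_sketch(A, b, m, s, rng):
    """Return (S A, S b) for an m x n sketch S with s nonzeros per row.

    Assumption: rows of A are sampled uniformly (not by leverage scores),
    and each sketch row mixes s sampled rows with random signs.
    """
    n = A.shape[0]
    idx = rng.integers(0, n, size=(m, s))          # sampled row indices
    signs = rng.choice([-1.0, 1.0], size=(m, s))   # random +/-1 signs
    scale = np.sqrt(n / (m * s))                   # gives E[S^T S] = I
    SA = scale * np.einsum("ms,msd->md", signs, A[idx])
    Sb = scale * np.einsum("ms,ms->m", signs, b[idx])
    return SA, Sb

def sketch_and_solve(A, b, m, s, rng):
    """Solve the small sketched least squares problem min ||S A x - S b||."""
    SA, Sb = sparse_uniform_sketch(A, b, m, s, rng)
    return np.linalg.lstsq(SA, Sb, rcond=None)[0]

def relative_error(A, b, x, x_star):
    """(L(x) - L(x*)) / L(x*) for the loss L(x) = ||A x - b||^2."""
    opt = np.sum((A @ x_star - b) ** 2)
    return (np.sum((A @ x - b) ** 2) - opt) / opt

rng = np.random.default_rng(0)
n, d, m, s, q = 5000, 10, 100, 4, 20               # illustrative sizes only
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + rng.standard_normal(n)
x_star = np.linalg.lstsq(A, b, rcond=None)[0]      # exact solution

# Each of q simulated "machines" returns one sketched estimate.
estimates = np.array([sketch_and_solve(A, b, m, s, rng) for _ in range(q)])
errs_single = np.array([relative_error(A, b, x, x_star) for x in estimates])
err_avg = relative_error(A, b, estimates.mean(axis=0), x_star)

print("mean single-machine relative error:", errs_single.mean())
print("relative error of averaged estimate:", err_avg)
```

Because the loss is convex, the averaged estimate can never have a larger error than the mean single-machine error. How much smaller it gets as `q` grows depends on how small the bias of each sketched estimate is, which is the quantity the paper analyzes.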

Paper Structure

This paper contains 21 sections, 24 theorems, 118 equations, 3 figures, and 1 table.

Key Result

Theorem 1

Given streaming access to $\mathbf A\in\mathbb R^{n\times d}$ and $\mathbf b\in\mathbb R^n$, and direct access to a preconditioner matrix $\mathbf P\in\mathbb R^{d\times d}$ such that $\kappa(\mathbf A\mathbf P)\le\alpha$, within a single pass over $(\mathbf A,\mathbf b)$, in $O(\gamma^{-1}\cdots)$ …

Figures (3)

  • Figure 1: Illustration of the leverage score sparsification algorithm used in Theorem \ref{t:main}. Each row of the sketch mixes $\tilde{O}(1/\epsilon)$ leverage score samples from $\mathbf A$. Remarkably, the $\epsilon$-error guarantee of the subsampled estimator is retained as $\epsilon$-bias of the sketched estimator.
  • Figure 2: Distributed averaging experiment on the YearPredictionMSD dataset [libsvm], which shows that sparse sketching can be used to preserve near-unbiasedness without increasing the estimation cost (see Appendix \ref{s:experiments} for similar results on two other datasets).
  • Figure 3: Comparison of the relative error of the distributed averaging estimator built from sketch-and-solve least squares estimates, where the sketches are constructed with sparse sketching matrices with uniform probabilities (LESSUniform), on the libsvm datasets Abalone and Boston (see Figure \ref{fig:msd} for results on YearPredictionMSD). For each dataset, the computational cost of sketching is the same in all four parameter settings. Remarkably, sketching to a smaller size appears to preserve near-unbiasedness without incurring any additional computational cost.
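
The role of near-unbiasedness in these averaging experiments can be seen from the standard bias-variance decomposition. As a sketch under assumptions that are introduced here for illustration (they are not notation from the paper): suppose $q$ machines return i.i.d. estimates $\hat{\mathbf x}_1,\dots,\hat{\mathbf x}_q$ with finite second moments, let $\bar{\mathbf x}=\frac1q\sum_{i=1}^q\hat{\mathbf x}_i$ be their average, and let $\mathbf x^*$ be the exact least squares solution. Then

$$
\mathbb E\,\|\bar{\mathbf x}-\mathbf x^*\|^2
\;=\;
\big\|\mathbb E[\hat{\mathbf x}_1]-\mathbf x^*\big\|^2
\;+\;
\frac{1}{q}\,\mathrm{tr}\,\mathrm{Cov}(\hat{\mathbf x}_1).
$$

Averaging shrinks only the second (variance) term, at rate $1/q$. The first (squared bias) term is unaffected by $q$, so it sets the floor on the error that averaging can reach. A sketch with small bias can therefore be useful even when each individual estimate is inaccurate.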

Theorems & Definitions (30)

  • Theorem 1
  • Remark 1
  • Theorem 2
  • Definition 1: $(\beta_1,\beta_2)$-approximate leverage scores
  • Lemma 1: Based on Lemma 7.2 from [chepurko2022near]
  • Definition 2: $(s,\beta_1,\beta_2)$-LESS embedding
  • Remark 2
  • Definition 3: $(\epsilon,\delta)$-unbiased estimator
  • Lemma 2: Subspace embedding for LESS (Theorem 1.3 of [chenakkod2023optimal])
  • Lemma 3: Bai-Silverstein's Inequality (Lemma B.26 of [bai2010spectral])
  • ...and 20 more
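
Definitions 1 and 2 above are built on leverage scores. Since the exact $(\beta_1,\beta_2)$ conditions are not reproduced in this overview, the following is only a minimal NumPy sketch of the underlying quantity: the exact leverage scores of $\mathbf A$, computed as squared row norms of an orthonormal basis for its column space, together with plain leverage score row sampling. The function names `leverage_scores` and `leverage_score_sample` and the problem sizes are hypothetical. Computing exact scores through a full QR factorization is assumed here for clarity only, since it would not fit the small-space streaming setting the paper targets.

```python
import numpy as np

def leverage_scores(A):
    """Exact leverage scores: squared row norms of Q, where A = Q R."""
    Q, _ = np.linalg.qr(A, mode="reduced")
    return np.sum(Q ** 2, axis=1)

def leverage_score_sample(A, b, k, rng):
    """Sample k rows with probability proportional to leverage scores.

    Rows are rescaled by 1 / sqrt(k * p_i) so that the sampled Gram
    matrix is an unbiased estimate of A^T A.
    """
    scores = leverage_scores(A)
    p = scores / scores.sum()                 # sampling distribution
    idx = rng.choice(A.shape[0], size=k, p=p)
    w = 1.0 / np.sqrt(k * p[idx])             # importance weights
    return w[:, None] * A[idx], w * b[idx]

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 5))             # illustrative data only
b = rng.standard_normal(500)

scores = leverage_scores(A)
SA, Sb = leverage_score_sample(A, b, 50, rng)

# For a full-column-rank A, the scores lie in [0, 1] and sum to d.
print("sum of leverage scores:", scores.sum())
```

An approximate variant would replace `scores` with cheaper over-estimates, which is the kind of relaxation that Definition 1 parameterizes.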