Fundamental Bias in Inverting Random Sampling Matrices with Application to Sub-sampled Newton

Chengmei Niu, Zhenyu Liao, Zenan Ling, Michael W. Mahoney

TL;DR

This paper shows how the inversion bias can be corrected for random sampling methods, both uniform and non-uniform leverage-based, as well as for structured random projections, including those based on the Hadamard transform.

Abstract

A substantial body of work in machine learning (ML) and randomized numerical linear algebra (RandNLA) has exploited various sorts of random sketching methodologies, including random sampling and random projection, with much of the analysis using Johnson–Lindenstrauss and subspace embedding techniques. Recent studies have identified the issue of inversion bias: the phenomenon that inverses of random sketches are biased, even though the sketches themselves are unbiased. This bias presents challenges for the use of random sketches in various ML pipelines, such as fast stochastic optimization, scalable statistical estimators, and distributed optimization. In the context of random projection, the inversion bias can be easily corrected for dense Gaussian projections (which are, however, too expensive for many applications). Recent work has shown how the inversion bias can be corrected for sparse sub-Gaussian projections. In this paper, we show how the inversion bias can be corrected for random sampling methods, both uniform and non-uniform leverage-based, as well as for structured random projections, including those based on the Hadamard transform. Using these results, we establish problem-independent local convergence rates for sub-sampled Newton methods.
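The Gaussian case mentioned in the abstract can be checked numerically. The following is a minimal Monte Carlo sketch, assuming a dense $m\times n$ Gaussian sketch $\mathbf{S}$ with i.i.d. $\mathcal{N}(0,1/m)$ entries; it illustrates the easy Gaussian correction only, not the paper's random sampling analysis, and the function names and problem sizes are hypothetical. For this sketch $\mathbb{E}[\mathbf{S}^\top\mathbf{S}]=\mathbf{I}_n$, yet $\mathbb{E}[(\mathbf{A}^\top\mathbf{S}^\top\mathbf{S}\mathbf{A})^{-1}] = \frac{m}{m-d-1}(\mathbf{A}^\top\mathbf{A})^{-1}$ (the inverse-Wishart mean), so normalizing by $m-d-1$ instead of $m$ removes the bias.

```python
import numpy as np

def sketched_gram_inverse(A, m, rng, debias=False):
    """Inverse of (S A)^T (S A) for an m x n Gaussian sketch S.

    With i.i.d. N(0, 1/m) entries, E[S^T S] = I_n, yet the inverse is biased:
    E[inverse] = m / (m - d - 1) * (A^T A)^{-1} (inverse-Wishart mean).
    With debias=True the entries have variance 1/(m - d - 1), which removes
    that scalar factor exactly (requires m > d + 1).
    """
    n, d = A.shape
    scale = (m - d - 1) if debias else m
    S = rng.standard_normal((m, n)) / np.sqrt(scale)
    SA = S @ A
    return np.linalg.inv(SA.T @ SA)

def relative_inversion_bias(A, m, debias=False, trials=2000, seed=0):
    """Monte Carlo estimate of ||E[inverse] - (A^T A)^{-1}||_2 / ||(A^T A)^{-1}||_2."""
    rng = np.random.default_rng(seed)
    H_inv = np.linalg.inv(A.T @ A)
    mean_inv = sum(sketched_gram_inverse(A, m, rng, debias) for _ in range(trials)) / trials
    return np.linalg.norm(mean_inv - H_inv, 2) / np.linalg.norm(H_inv, 2)

A = np.random.default_rng(1).standard_normal((500, 5))
naive = relative_inversion_bias(A, m=20)                    # expected near 6/14, about 0.43
corrected = relative_inversion_bias(A, m=20, debias=True)   # only Monte Carlo noise remains
print(f"naive: {naive:.3f}, corrected: {corrected:.3f}")
```

The correction here is a single scalar because the Gaussian sketch is rotationally invariant; the abstract's point is that random sampling and Hadamard-based sketches lack this property and need a separate analysis.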

Paper Structure

This paper contains 37 sections, 26 theorems, 209 equations, 4 figures.

Key Result

Lemma 2.7

Given $\mathbf{A}\in\mathbb{R}^{n\times d}$ of rank $d$ with $n\geq d$ and a p.s.d. $\mathbf{C}\in\mathbb{R}^{d\times d}$, let $\mathbf{S}$ be a random sampling matrix with $m$ trials and importance sampling distribution $\{\pi_i\}_{i=1}^n$ as in Definition 2.1, and let $d_{\mathrm{eff}} = \ldots$
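Because the statement of Lemma 2.7 is cut off above, the following only illustrates its general shape. It is a minimal sketch, assuming the common random sampling convention (each of the $m$ trials picks row $i$ with probability $\pi_i$ and rescales it by $1/\sqrt{m\pi_i}$, so that $\mathbb{E}[\mathbf{S}^\top\mathbf{S}]=\mathbf{I}_n$), and it measures the regularized embedding distortion $\max|\lambda-1|$ of $(\mathbf{A}^\top\mathbf{A}+\mathbf{C})^{-1/2}(\mathbf{A}^\top\mathbf{S}^\top\mathbf{S}\mathbf{A}+\mathbf{C})(\mathbf{A}^\top\mathbf{A}+\mathbf{C})^{-1/2}$ as $m$ grows. The helper names and sizes are hypothetical.

```python
import numpy as np

def sample_rows(A, m, probs, rng):
    """Return S @ A for a random sampling matrix S with m i.i.d. trials.

    Assumed convention: each trial picks row i with probability probs[i] and
    rescales it by 1 / sqrt(m * probs[i]), so that E[S^T S] = I_n.
    """
    idx = rng.choice(A.shape[0], size=m, p=probs)
    return A[idx] / np.sqrt(m * probs[idx])[:, None]

def embedding_distortion(A, SA, C):
    """max |lambda - 1| over eigenvalues of M^{-1/2} (SA^T SA + C) M^{-1/2}, M = A^T A + C."""
    evals, evecs = np.linalg.eigh(A.T @ A + C)
    M_inv_half = evecs @ np.diag(evals ** -0.5) @ evecs.T
    K = M_inv_half @ (SA.T @ SA + C) @ M_inv_half  # symmetric by construction
    return np.max(np.abs(np.linalg.eigvalsh(K) - 1.0))

rng = np.random.default_rng(0)
n, d = 2000, 10
A = rng.standard_normal((n, d))
C = 0.1 * np.eye(d)                # a p.s.d. regularizer, as in Lemma 2.7
uniform = np.full(n, 1.0 / n)      # uniform importance sampling distribution
eps = [embedding_distortion(A, sample_rows(A, m, uniform, rng), C) for m in (50, 200, 800)]
print(eps)  # distortion shrinks as the number of trials m grows
```

A small distortion $\epsilon$ means the sketched regularized Gram matrix is a $(1\pm\epsilon)$ spectral approximation of the exact one, which is the kind of guarantee a subspace embedding lemma provides.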

Figures (4)

  • Figure 1: The phase transition behavior of inversion bias $\epsilon$ as a function of $\rho_{\max}$, as discussed in the remark on exact versus approximate inversion bias, with scalar debiasing.
  • Figure 2: Relative errors (in solid lines) and wall-clock time (in dashed lines) as a function of the sketch size $m$, for Newton-LESS and the proposed de-biased SSN-ARLev methods, applied to logistic regression on both MNIST and CIFAR-10 data. Relative errors are obtained after a fixed number of iterations ($T=5$ for MNIST data and $T=7$ for CIFAR-10 data). Results are obtained by averaging over $30$ independent runs.
  • Figure 3: Convergence–complexity trade-off between various optimization methods on logistic regression for MNIST and CIFAR-10 data, with sketch size $m=300$ for MNIST and $m=400$ for CIFAR-10 data. Results are obtained by averaging over $10$ independent runs (except for GD, which is deterministic).
  • Figure 4: Inversion bias as a function of the sketch size $m$, for various sampling methods on both MNIST and CIFAR-10 data, with ridge parameter $\lambda=10^{-1}$ for MNIST data and $\lambda=10^{-6}$ for CIFAR-10 data. The results are averaged over $500$ independent runs.
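Figures 2 and 3 compare sub-sampled Newton (SSN) variants on logistic regression. As a point of reference, here is a minimal sketch of a plain SSN iteration for $\ell_2$-regularized logistic regression: the gradient is exact, and the Hessian $\frac1n\mathbf{A}^\top\mathbf{W}\mathbf{A}+\lambda\mathbf{I}$ is replaced by an unbiased estimate built from $m$ uniformly sampled rows. It does not implement the paper's de-biased SSN-ARLev or Newton-LESS; the synthetic data, function names, and the Armijo backtracking safeguard are assumptions for illustration.

```python
import numpy as np

def loss_and_grad(x, A, b, lam):
    """Mean logistic loss (labels b in {-1, +1}) plus (lam/2)||x||^2, with its gradient."""
    z = b * (A @ x)
    loss = np.mean(np.logaddexp(0.0, -z)) + 0.5 * lam * (x @ x)
    s = np.exp(-np.logaddexp(0.0, z))  # sigmoid(-z), computed stably
    grad = -(A.T @ (b * s)) / len(b) + lam * x
    return loss, grad

def ssn_step(x, A, b, lam, m, rng):
    """One sub-sampled Newton step with uniform row sampling and backtracking."""
    n, d = A.shape
    loss, grad = loss_and_grad(x, A, b, lam)
    p = np.exp(-np.logaddexp(0.0, -(A @ x)))  # sigmoid(a_i^T x)
    w = p * (1.0 - p)                         # per-sample Hessian weights
    idx = rng.choice(n, size=m)               # uniform trials, pi_i = 1/n
    # Sampled rows of sqrt(w/n) * A, rescaled by 1/sqrt(m * pi_i) = sqrt(n/m).
    R = np.sqrt(w[idx] / m)[:, None] * A[idx]
    H_s = R.T @ R + lam * np.eye(d)           # unbiased estimate of the full Hessian
    dx = -np.linalg.solve(H_s, grad)
    step = 1.0
    # Armijo backtracking: a safeguard assumed here, not taken from the paper.
    while step > 1e-10 and loss_and_grad(x + step * dx, A, b, lam)[0] > loss + 1e-4 * step * (grad @ dx):
        step *= 0.5
    return x + step * dx

rng = np.random.default_rng(0)
n, d, lam = 5000, 20, 1e-3
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d) / np.sqrt(d)
b = np.where(A @ x_true + rng.standard_normal(n) > 0, 1.0, -1.0)

x = np.zeros(d)
g0 = np.linalg.norm(loss_and_grad(x, A, b, lam)[1])
for _ in range(15):
    x = ssn_step(x, A, b, lam, m=200, rng=rng)
final_loss, final_grad = loss_and_grad(x, A, b, lam)
print(final_loss, np.linalg.norm(final_grad) / g0)
```

Each step costs $O(nd)$ for the gradient plus $O(md^2+d^3)$ for the sketched Hessian, instead of $O(nd^2)$ for the exact one; the bias of the inverse $\mathbf{H}_S^{-1}$ is what the paper's debiasing targets.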

Theorems & Definitions (48)

  • Definition 2.1: Random sampling
  • Definition 2.2: Leverage score sampling (Mahoney, 2011)
  • Definition 2.3: Importance sampling approximation factor
  • Definition 2.4: Relative error approximation
  • Remark 2.5: Subspace embedding
  • Definition 2.6: Unbiased estimator
  • Lemma 2.7: Subspace embedding for random sampling
  • Proposition 2.8: Coarse-grained debiasing of random sampling
  • Remark 2.9: On Proposition 2.8
  • Theorem 3.1: Inversion bias for random sampling: fine-grained analysis
  • ...and 38 more
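Definition 2.2 refers to leverage score sampling. Under the standard definition, assumed here, the leverage score of row $i$ is $\ell_i=\|\mathbf{U}_{i,:}\|_2^2$ for any orthonormal basis $\mathbf{U}$ of the column space of $\mathbf{A}$; the scores sum to $d$, and exact leverage score sampling uses $\pi_i=\ell_i/d$. The minimal sketch below computes exact scores with a thin QR factorization ($O(nd^2)$ time) on a hypothetical matrix with one dominant row; fast approximate scores are not shown.

```python
import numpy as np

def leverage_scores(A):
    """Leverage scores l_i = ||Q[i, :]||^2, Q an orthonormal basis of range(A)."""
    Q, _ = np.linalg.qr(A)  # thin ("reduced") QR by default: Q is n x d
    return np.einsum("ij,ij->i", Q, Q)

rng = np.random.default_rng(0)
n, d = 1000, 5
A = rng.standard_normal((n, d))
A[0] *= 1000.0                   # one highly influential row
lev = leverage_scores(A)
probs_lev = lev / d              # leverage score sampling, pi_i = l_i / d
probs_unif = np.full(n, 1.0 / n) # uniform sampling, pi_i = 1 / n
print(lev.sum(), probs_lev[0], probs_unif[0])
```

Here row 0 has leverage close to $1$, so leverage score sampling selects it with probability close to $1/d$ per trial, whereas uniform sampling selects it with probability $1/n$ and will usually miss it for small $m$.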