Unbiased least squares regression via averaged stochastic gradient descent

Nabil Kahalé

TL;DR

This work tackles online least-squares regression by making the time-average SGD estimator unbiased via randomized multilevel Monte Carlo. It constructs unbiased estimators for the bias-corrected mean and for the minimizer, achieving an expected time of order $k$ per target level and an $O(1/k)$ excess risk, with poly-logarithmic dependence on the smallest eigenvalue of the Hessian. It also develops unbiased estimators for squared bias and variances, enabling unbiased risk assessments for multiple copies and both standard and average-start variants without knowledge of $H$ or $\theta^*$. Empirical results on synthetic Gaussian setups corroborate the theory, illustrating efficiency and parallelizability of the proposed estimators. The approach offers a principled way to quantify and reduce bias and variance in online least-squares with scalable unbiased estimation techniques.

Abstract

We consider an on-line least squares regression problem with optimal solution $\theta^*$ and Hessian matrix $H$, and study a time-average stochastic gradient descent estimator of $\theta^*$. For $k\ge2$, we provide an unbiased estimator of $\theta^*$ that is a modification of the time-average estimator, runs with an expected number of time-steps of order $k$, and has $O(1/k)$ expected excess risk. The constant behind the $O$ notation depends on parameters of the regression and is a poly-logarithmic function of the smallest eigenvalue of $H$. We provide both a biased and an unbiased estimator of the expected excess risk of the time-average estimator and of its unbiased counterpart, without requiring knowledge of either $H$ or $\theta^*$. We describe an "average-start" version of our estimators with similar properties. Our approach is based on randomized multilevel Monte Carlo. Our numerical experiments confirm our theoretical findings.
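
To make the time-average estimator concrete, the following is a minimal sketch of averaged SGD for least squares. It is not taken from the paper: it assumes the usual update $\theta_{t+1}=\theta_t-\gamma(\langle x_t,\theta_t\rangle-y_t)x_t$ with a constant step size, a start at the origin, and an average over the iterates after a burn-in period. The names `averaged_sgd`, `step` and `burn_in` are hypothetical, and the paper's exact averaging window may differ.

```python
def averaged_sgd(samples, dim, step, burn_in=0):
    """Averaged SGD for least squares (illustrative sketch, not the paper's code).

    samples : iterable of (x, y) pairs, x a list of `dim` floats, y a float
    step    : constant step size (assumed; the paper's schedule may differ)
    burn_in : number of initial iterates excluded from the average (assumed)
    """
    theta = [0.0] * dim   # current iterate theta_t, started at the origin
    total = [0.0] * dim   # running sum of the iterates kept in the average
    kept = 0
    for t, (x, y) in enumerate(samples):
        if t >= burn_in:
            # Accumulate theta_t before it is updated with sample t.
            for i in range(dim):
                total[i] += theta[i]
            kept += 1
        # Stochastic gradient of 0.5 * (<x, theta> - y)**2 is residual * x.
        residual = sum(xi * ti for xi, ti in zip(x, theta)) - y
        theta = [ti - step * residual * xi for ti, xi in zip(theta, x)]
    if kept == 0:
        raise ValueError("burn_in must be smaller than the number of samples")
    return [s / kept for s in total]
```

For a finite number of steps the expectation of this average generally differs from $\theta^*$; that bias is what the unbiased estimator described in the abstract is designed to remove.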

Paper Structure (32 sections, 28 theorems, 147 equations, 4 figures, 4 tables)

Key Result

Lemma 2.1

Assume that $(x,y)$ is square-integrable, that $((x_{t},y_{t}), t\geq0)$ is a sequence of independent copies of $(x,y)$, that Assumption 1 holds, and that Then and, for $k\geq1$, we have and Moreover, If $H$ is positive-definite then $E(\theta_{k})\rightarrow \theta^{*}$ as $k$ goes to infinity.
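
The convergence $E(\theta_{k})\rightarrow\theta^{*}$ is the kind of property that randomized multilevel Monte Carlo exploits: a limit of expectations can be written as a telescoping sum, and a single randomly chosen term, reweighted by its selection probability, has that sum as its expectation. The sketch below shows a generic single-term estimator of this type on a toy deterministic sequence. It is an illustration under stated assumptions, not the paper's construction: the geometric level distribution, the names `sample_level`, `single_term_estimate` and `toy_increment`, and the toy sequence $Y_n = 1 - 4^{-(n+1)}$ are all hypothetical.

```python
import random

def sample_level(ratio, rng):
    """Draw N with P(N = n) = (1 - ratio) * ratio**n for n = 0, 1, 2, ..."""
    n = 0
    while rng.random() < ratio:
        n += 1
    return n

def single_term_estimate(increment, ratio, rng):
    """One draw of a single-term randomized multilevel estimator.

    increment(n) returns Y_n - Y_{n-1} (with Y_{-1} = 0). Dividing the
    selected increment by its selection probability makes the expectation
    equal to the sum of all increments, i.e. the limit of Y_n.
    """
    n = sample_level(ratio, rng)
    prob = (1.0 - ratio) * ratio ** n
    return increment(n) / prob

def toy_increment(n):
    # Toy sequence Y_n = 1 - 0.25**(n + 1) with limit 1 (assumed example):
    # Y_n - Y_{n-1} = 0.75 * 0.25**n for every n >= 0.
    return 0.75 * 0.25 ** n

if __name__ == "__main__":
    rng = random.Random(1)
    draws = [single_term_estimate(toy_increment, 0.5, rng) for _ in range(20000)]
    print(sum(draws) / len(draws))  # sample mean; the exact expectation is 1
```

In this toy case the increments shrink faster than the level probabilities, so the estimator has finite variance while the expected level stays bounded. The balance between these two rates is what determines the cost and variance of such estimators in general.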

Figures (4)

  • Figure 1: Product of Average Running Time and of Variance with $H=\operatorname{diag}(i^{-3})$, $1\le i\le 25$, $\sigma^{2}=1$, $s(k)=k/2$ and $n=10^{8}/k$.
  • Figure 2: Expected Excess Risk with $H=\operatorname{diag}(i^{-3})$, $1\le i\le 25$, $\sigma^{2}=1$, $s(k)=k/2$ and $n=10^{8}/k$.
  • Figure 3: Product of Average Running Time and of Variance with $H=\operatorname{diag}(i^{-1})$, $1\le i\le 50$, $\sigma^{2}=0.01$, $s(k)=k/2$ and $n=10^{8}/k$.
  • Figure 4: Expected Excess Risk with $H=\operatorname{diag}(i^{-1})$, $1\le i\le 50$, $\sigma^{2}=0.01$, $s(k)=k/2$ and $n=10^{8}/k$.
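
The captions above specify diagonal Hessians and a noise level $\sigma^{2}$. A synthetic setup of that shape could be sketched as follows. The data model is an assumption rather than something stated on this page: the coordinates of $x$ are taken to be independent centered Gaussians with variances $i^{-p}$, so that $E(xx^{T})=\operatorname{diag}(i^{-p})$, and $y=\langle x,\theta^{*}\rangle+\varepsilon$ with independent Gaussian noise of variance $\sigma^{2}$. The excess risk is computed as $\frac12(\theta-\theta^{*})^{T}H(\theta-\theta^{*})$, which assumes a loss with a factor $1/2$. All function names are hypothetical.

```python
import random

def hessian_diagonal(dim, power):
    """Diagonal of H = diag(i**(-power)), i = 1, ..., dim."""
    return [float(i) ** (-power) for i in range(1, dim + 1)]

def sample_pair(h_diag, theta_star, noise_var, rng):
    """Draw (x, y) with E[x x^T] = diag(h_diag) and y = <x, theta_star> + noise.

    Assumed model: independent Gaussian coordinates and Gaussian noise.
    """
    x = [rng.gauss(0.0, h ** 0.5) for h in h_diag]
    noise = rng.gauss(0.0, noise_var ** 0.5)
    y = sum(xi * ti for xi, ti in zip(x, theta_star)) + noise
    return x, y

def excess_risk(theta, theta_star, h_diag):
    """0.5 * (theta - theta_star)^T H (theta - theta_star) for diagonal H."""
    return 0.5 * sum(h * (a - b) ** 2
                     for h, a, b in zip(h_diag, theta, theta_star))
```

With `hessian_diagonal(25, 3)` and `noise_var=1.0` this matches the parameters quoted for Figures 1 and 2; `hessian_diagonal(50, 1)` with `noise_var=0.01` matches Figures 3 and 4.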

Theorems & Definitions (44)

  • Lemma 2.1
  • Proposition 2.1
  • Theorem 3.1 (Rhee and Glynn, 2015)
  • Corollary 3.1
  • Lemma 3.1
  • Lemma 3.2
  • Lemma 3.3
  • Theorem 3.2
  • Proposition 3.1
  • Theorem 3.3
  • ...and 34 more