Unbiased least squares regression via averaged stochastic gradient descent
Nabil Kahalé
TL;DR
This work tackles online least-squares regression by making the time-average SGD estimator unbiased via randomized multilevel Monte Carlo. It constructs an unbiased estimator of the minimizer $\theta^*$ that, for $k \ge 2$, runs with an expected number of time steps of order $k$ and has $O(1/k)$ expected excess risk, with a constant that depends poly-logarithmically on the smallest eigenvalue of the Hessian $H$. It also develops unbiased estimators of the squared-bias and variance terms, enabling unbiased assessment of the excess risk, including for multiple copies, for both the standard and the average-start variants, without knowledge of $H$ or $\theta^*$. Empirical results on synthetic Gaussian setups corroborate the theory and illustrate the efficiency and parallelizability of the proposed estimators. The approach offers a principled way to quantify and reduce bias and variance in online least-squares regression using scalable unbiased estimation techniques.
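The squared-bias and variance terms come from a standard decomposition of the expected excess risk. The display below sketches it under assumptions not stated in this summary: the population objective is $f(\theta) = \tfrac{1}{2}\mathbb{E}[(y - x^\top\theta)^2]$ with $H = \mathbb{E}[xx^\top]$, and $\hat\theta$ is any estimator with finite second moment and mean $\mu = \mathbb{E}[\hat\theta]$. The paper's own normalization may differ. The first identity holds because $f$ is quadratic and $\nabla f(\theta^*) = 0$.

$$
f(\theta) - f(\theta^*) = \tfrac{1}{2}(\theta - \theta^*)^\top H (\theta - \theta^*),
$$
$$
\mathbb{E}[f(\hat\theta)] - f(\theta^*) = \underbrace{\tfrac{1}{2}(\mu - \theta^*)^\top H (\mu - \theta^*)}_{\text{squared bias}} + \underbrace{\tfrac{1}{2}\,\mathbb{E}\big[(\hat\theta - \mu)^\top H (\hat\theta - \mu)\big]}_{\text{variance}}.
$$

The cross term vanishes because $\mathbb{E}[\hat\theta - \mu] = 0$. For an unbiased estimator $\mu = \theta^*$, so its expected excess risk reduces to the variance term alone.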
Abstract
We consider an online least-squares regression problem with optimal solution $\theta^*$ and Hessian matrix $H$, and study a time-average stochastic gradient descent estimator of $\theta^*$. For $k \ge 2$, we provide an unbiased estimator of $\theta^*$ that is a modification of the time-average estimator, runs with an expected number of time steps of order $k$, and has $O(1/k)$ expected excess risk. The constant hidden in the $O$ notation depends on parameters of the regression and is a poly-logarithmic function of the smallest eigenvalue of $H$. We provide both a biased and an unbiased estimator of the expected excess risk of the time-average estimator and of its unbiased counterpart, without requiring knowledge of either $H$ or $\theta^*$. We describe an "average-start" version of our estimators with similar properties. Our approach is based on randomized multilevel Monte Carlo. Our numerical experiments confirm our theoretical findings.
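The two ingredients named in the abstract, time-averaged SGD and randomized multilevel Monte Carlo debiasing, can be illustrated with the Python sketch below. It is a generic single-term randomized MLMC construction, not the estimator built in the paper, and it makes no claim to the paper's cost or risk guarantees. Assumptions (all hypothetical): features $x \sim N(0, I_d)$, so $H = I$; responses $y = x^\top\theta^* + \varepsilon$ with Gaussian noise; a constant step size $\gamma$; level $n$ is the average of the last $k$ iterates after a burn-in of $n \cdot b$ steps from $\theta_0 = 0$; and the level index $N$ is geometric. The names `window_average` and `rmlmc_sgd` and all parameter defaults are illustrative only.

```python
import numpy as np


def window_average(X, y, theta0, gamma, start, k):
    """Constant-step SGD for least squares over samples start..len(y)-1,
    started from theta0; returns the average of the last k iterates."""
    theta = theta0.copy()
    total = np.zeros_like(theta0)
    n = len(y)
    for t in range(start, n):
        # gradient step on the single-sample loss 0.5 * (y_t - x_t @ theta)**2
        theta = theta + gamma * (y[t] - X[t] @ theta) * X[t]
        if t >= n - k:  # accumulate only the final k iterates
            total += theta
    return total / k


def rmlmc_sgd(rng, theta_star, noise_sd=0.5, gamma=0.1, k=50, burn=10, q=0.5):
    """One draw of a hypothetical single-term randomized MLMC estimator.

    Level n averages the last k SGD iterates after a burn-in of n * burn
    steps from theta0 = 0. All levels reuse the same samples, aligned so
    that their averaging windows coincide. Assumes x ~ N(0, I_d), hence
    H = I, and gamma < 2 / (d + 2) so coupled differences contract.
    Returns (debiased draw z, plain window average y0, number of samples S).
    """
    d = len(theta_star)
    # N in {1, 2, ...} with P(N = n) = (1 - q) * q**(n - 1)
    N = int(rng.geometric(1 - q))
    p_N = (1 - q) * q ** (N - 1)
    S = N * burn + k  # samples needed by the deepest level N
    X = rng.standard_normal((S, d))
    y = X @ theta_star + noise_sd * rng.standard_normal(S)
    theta0 = np.zeros(d)

    def level(n):
        # level n starts n * burn steps before the shared final window
        return window_average(X, y, theta0, gamma, S - k - n * burn, k)

    y0 = level(0)  # plain time average from theta0: biased for finite k
    # E[z] telescopes: E[Y_0] + sum_n (E[Y_n] - E[Y_{n-1}]) = lim E[Y_n] = theta*
    z = y0 + (level(N) - level(N - 1)) / p_N
    return z, y0, S


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta_star = np.array([1.0, -2.0])
    draws = [rmlmc_sgd(rng, theta_star) for _ in range(1000)]
    print("mean of z :", np.mean([z for z, _, _ in draws], axis=0))
    print("mean of y0:", np.mean([y0 for _, y0, _ in draws], axis=0))
    print("mean cost :", np.mean([s for _, _, s in draws]))
```

Because every level is driven by the same samples, the observation noise cancels in the level difference, whose second moment then contracts geometrically (by the factor $1 - 2\gamma + \gamma^2(d+2)$ per step under the Gaussian assumption). This is what lets $Z = Y_0 + (Y_N - Y_{N-1})/P(N = N)$ have both finite variance and finite expected cost in this toy setting. Averaging many draws of `z` should approach $\theta^*$, while averages of `y0` stay biased toward $\theta_0$.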
