Statistical Guarantees for High-Dimensional Stochastic Gradient Descent
Jiaqi Li, Zhipeng Lou, Johannes Schmidt-Hieber, Wei Biao Wu
TL;DR
This work develops a rigorous theory for constant-step stochastic gradient descent in high dimensions by recasting SGD as a high-dimensional nonlinear time series. It establishes geometric-moment contraction to a stationary distribution, yielding non-asymptotic $q$-th moment bounds in general $\ell^s$ norms and sharp high-probability concentration via a Fuk–Nagaev-type inequality. The authors also analyze Ruppert–Polyak averaged SGD (ASGD), deriving a detailed $\ell^{\infty}$-norm bound, an explicit complexity result, and a Gaussian approximation for stationary ASGD, together with dimension-dependent learning-rate guidance. Collectively, these results close a key gap in SGD theory for constant learning rates in large-scale, overparameterized models and provide a versatile toolkit for analyzing a broad class of high-dimensional online learning algorithms.
Abstract
Stochastic Gradient Descent (SGD) and its Ruppert–Polyak averaged variant (ASGD) lie at the heart of modern large-scale learning, yet their theoretical properties in high-dimensional settings are poorly understood. In this paper, we provide rigorous statistical guarantees for constant learning-rate SGD and ASGD in high-dimensional regimes. Our key innovation is to transfer powerful tools from high-dimensional time series to online learning. Specifically, by viewing SGD as a nonlinear autoregressive process and adapting existing coupling techniques, we prove the geometric-moment contraction of high-dimensional SGD for constant learning rates, thereby establishing asymptotic stationarity of the iterates. Building on this, we derive the $q$-th moment convergence of SGD and ASGD for any $q\ge2$ in general $\ell^s$-norms and, in particular, in the $\ell^{\infty}$-norm that is frequently adopted in high-dimensional sparse or structured models. Furthermore, we provide a sharp high-probability concentration analysis, which yields a probabilistic bound for high-dimensional ASGD. Beyond closing a critical gap in SGD theory, our proposed framework offers a novel toolkit for analyzing a broad class of high-dimensional learning algorithms.
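The coupling idea behind geometric-moment contraction can be illustrated numerically. The following is a minimal sketch on a toy least-squares model that is not taken from the paper: the function name `coupled_sgd`, the Gaussian design, the sparse target, the burn-in, and all numerical values (dimension, step size, noise level) are assumptions chosen for illustration. Two constant-step SGD chains start from different points but are driven by the same random samples, and the $\ell^{\infty}$-distance between them is recorded at each step. A Ruppert–Polyak average of one chain is also computed.

```python
import numpy as np


def coupled_sgd(d=50, eta=0.01, n_steps=2000, burn_in=1000, seed=0):
    """Run two coupled constant-step SGD chains on a toy least-squares model.

    All modelling choices here are illustrative assumptions, not the
    setting analyzed in the paper.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical sparse target: only the first five coordinates are nonzero.
    theta_star = np.zeros(d)
    theta_star[:5] = 1.0

    # Two different initializations for the two chains.
    theta_a = rng.normal(size=d)
    theta_b = rng.normal(size=d)

    theta_avg = np.zeros(d)  # running Ruppert-Polyak average of chain A
    n_avg = 0
    gaps = np.empty(n_steps)

    for k in range(n_steps):
        # One fresh sample, shared by both chains (this is the coupling).
        x = rng.normal(size=d)
        y = x @ theta_star + 0.1 * rng.normal()

        # Constant-step SGD update for the squared loss 0.5 * (x @ theta - y)**2.
        theta_a = theta_a - eta * (x @ theta_a - y) * x
        theta_b = theta_b - eta * (x @ theta_b - y) * x

        # Sup-norm distance between the coupled chains after this step.
        gaps[k] = np.max(np.abs(theta_a - theta_b))

        # Average chain A after a burn-in period (the burn-in is an assumption).
        if k >= burn_in:
            n_avg += 1
            theta_avg += (theta_a - theta_avg) / n_avg

    return gaps, theta_a, theta_avg, theta_star


if __name__ == "__main__":
    gaps, theta_last, theta_avg, theta_star = coupled_sgd()
    print("sup-norm gap after first step:", gaps[0])
    print("sup-norm gap after last step: ", gaps[-1])
    print("sup-norm error of averaged iterate:",
          np.max(np.abs(theta_avg - theta_star)))
```

In this toy model the gap between the two chains shrinks roughly geometrically, so the iterates forget their initialization, which is the qualitative behavior that geometric-moment contraction formalizes. The step size is deliberately small relative to the dimension: for this particular Gaussian design, a larger constant step can make the chains diverge in mean square. That is a property of the sketch and should not be read as the paper's learning-rate condition.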
