ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm

Chris Junchi Li, Wenlong Mou, Martin J. Wainwright, Michael I. Jordan

TL;DR

Under a mild, one-point Hessian continuity condition, the rescaled last iterate of (multi-epoch) ROOT-SGD converges asymptotically to a Gaussian limit with the Cramér-Rao optimal asymptotic covariance, for a broad range of step-size choices.

Abstract

We study the problem of solving strongly convex and smooth unconstrained optimization problems using stochastic first-order algorithms. We devise a novel algorithm, referred to as Recursive One-Over-T SGD (ROOT-SGD), based on an easily implementable, recursive averaging of past stochastic gradients. We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an asymptotic sense. On the non-asymptotic side, we prove risk bounds on the last iterate of ROOT-SGD with leading-order terms that match the optimal statistical risk with a unity pre-factor, along with a higher-order term that scales at the sharp rate of $O(n^{-3/2})$ under the Lipschitz condition on the Hessian matrix. On the asymptotic side, we show that when a mild, one-point Hessian continuity condition is imposed, the rescaled last iterate of (multi-epoch) ROOT-SGD converges asymptotically to a Gaussian limit with the Cramér-Rao optimal asymptotic covariance, for a broad range of step-size choices.
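
The abstract describes ROOT-SGD only as a recursive averaging of past stochastic gradients. The sketch below illustrates one way such a recursion can look for a scalar parameter. It assumes the update $v_t = \nabla f(\theta_{t-1};\xi_t) + (1 - 1/t)\,(v_{t-1} - \nabla f(\theta_{t-2};\xi_t))$ followed by $\theta_t = \theta_{t-1} - \eta v_t$, with the step-size set to zero during the burn-in period; this form is not stated in the text above and should be checked against the paper's algorithm listing. The function name `root_sgd`, the toy objective, and all parameter values are hypothetical and chosen for illustration only.

```python
import random


def root_sgd(grad, sample, theta0, eta, burn_in, num_steps):
    """Single-epoch ROOT-SGD sketch for a scalar parameter (assumed form).

    grad(theta, xi) returns a stochastic gradient at theta for sample xi;
    sample() draws a fresh sample xi.
    """
    theta_prev = theta0  # plays the role of theta_{t-2}
    theta = theta0       # plays the role of theta_{t-1}
    v = 0.0              # v_0 is irrelevant: its weight (1 - 1/t) is 0 at t = 1
    for t in range(1, num_steps + 1):
        xi = sample()
        # The same sample xi is evaluated at two consecutive iterates.
        v = grad(theta, xi) + (1.0 - 1.0 / t) * (v - grad(theta_prev, xi))
        # Assumed burn-in: no movement before t reaches burn_in, so v is
        # simply the running average of gradients at theta0 until then.
        step = eta if t >= burn_in else 0.0
        theta_prev, theta = theta, theta - step * v
    return theta


# Toy problem (an assumption, not from the paper):
# f(theta; (a, b)) = 0.5 * a * (theta - b)^2 with a ~ Uniform(0.5, 1.5)
# and b ~ Normal(2, 1) independent, so the population minimizer is 2.
rng = random.Random(0)
THETA_STAR = 2.0


def draw():
    return rng.uniform(0.5, 1.5), rng.gauss(THETA_STAR, 1.0)


def toy_grad(theta, xi):
    a, b = xi
    return a * (theta - b)


estimate = root_sgd(toy_grad, draw, theta0=0.0, eta=0.1,
                    burn_in=20, num_steps=20000)
print(estimate)
```

Evaluating each fresh sample at both the current and the previous iterate is what lets the correction term cancel most of the noise carried by $v_{t-1}$. On this toy problem the last iterate should land close to the minimizer without any separate iterate averaging.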

Paper Structure

This paper contains 64 sections, 22 theorems, 292 equations, 1 table, 2 algorithms.

Key Result

Theorem 1

Under Assumptions assu_StrcvxSmooth, assu_noisethetastar, and assu_smoothnoise, suppose that we run Algorithm algo_singleepoch with burn-in period $T_0$ and step-size $\eta$ such that […]. Then, for any iteration $T \ge 1$, the iterate $\theta_{T}$ satisfies the bound […]

Theorems & Definitions (23)

  • Theorem 1: Preliminary nonasymptotic results, single-epoch ROOT-SGD
  • Theorem 2: Improved nonasymptotic upper bound, multi-epoch ROOT-SGD
  • Corollary 3: Nonasymptotic bounds in alternative metrics, multi-epoch ROOT-SGD
  • Theorem 4: Asymptotic efficiency, multi-epoch ROOT-SGD
  • Theorem 5: Unified nonasymptotic results, single-epoch ROOT-SGD
  • Lemma 1: Recursion involving $z_t$
  • Lemma 2: Evolution of $v_t$
  • Lemma 3: Second moment of pointwise stochastic noise
  • Proof: Proof of Theorem \ref{theo_finitebdd_single_complete}
  • Proposition 1: Improved nonasymptotic upper bound, single-epoch ROOT-SGD
  • ...and 13 more