Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation

Aaron Mishkin; Mert Pilanci; Mark Schmidt

TL;DR

This work develops a generalized stochastic accelerated gradient descent (AGD) framework, built on estimating sequences, for accelerating stochastic gradient methods in the interpolation setting. The key contribution is a proof that any primal update that makes sufficient progress in expectation can be accelerated. For accelerated SGD under the strong growth condition, this reduces the dependence on the strong growth constant from $\rho$ to $\sqrt{\rho}$ and, in the strongly convex case, gives an iteration complexity of $O\left(\frac{\sqrt{L L_{\max}}}{\mu}\log\frac{1}{\epsilon}\right)$. The results also extend to preconditioned variants, which converge faster than SGD under favorable conditioning, and they address the criticism that guarantees for stochastic acceleration can be worse than those for SGD. Comparisons with existing rates show improved robustness to noise and applicability beyond quadratics. The work also identifies future directions, including relaxing the growth conditions and exploring fully stochastic estimating-sequence formulations.

Abstract

We prove new convergence rates for a generalized version of stochastic Nesterov acceleration under interpolation conditions. Unlike previous analyses, our approach accelerates any stochastic gradient method that makes sufficient progress in expectation. The proof, which proceeds using the estimating sequences framework, applies to both convex and strongly convex functions and is easily specialized to accelerated SGD under the strong growth condition. In this special case, our analysis reduces the dependence on the strong growth constant from $\rho$ to $\sqrt{\rho}$ as compared to prior work. This improvement is comparable to a square root of the condition number in the worst case and addresses the criticism that guarantees for stochastic acceleration could be worse than those for SGD.

Paper Structure (13 sections, 18 theorems, 82 equations, 1 table)

Introduction
Additional Related Work
Assumptions
Convergence of Stochastic AGD
Specializations
Comparison to Existing Rates
Conclusion
Assumptions: Proofs
Convergence of Stochastic AGD: Proofs
Specializations: Proofs
Comparison to Existing Rates: Proofs
Theoretical Issues in the Preprint
Failed Equivalence Argument

Key Result

Lemma 0

Suppose $f$ is $L$-smooth, the strong growth condition holds, and $\eta_k$ is independent of $z_{k}$. Then the stochastic gradient step in eq:stochastic-agd-simple makes progress as,
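The inequality that completes this statement is not reproduced in this summary. As a hedged illustration only, the following is the standard one-step bound obtained from $L$-smoothness together with a strong growth condition of the assumed form $\mathbb{E}_z\|\nabla f(y, z)\|^2 \le \rho \|\nabla f(y)\|^2$; it may differ in notation or constants from the lemma in the paper:

$$\mathbb{E}\left[f(y - \eta \nabla f(y, z))\right] \le f(y) - \eta\left(1 - \frac{\eta \rho L}{2}\right)\|\nabla f(y)\|^2.$$

A minimal Python sketch checks this bound numerically on a tiny interpolating least-squares problem. The data, the function names, and the use of the local ratio `rho_y` at a single point (rather than a global constant $\rho$) are assumptions made for illustration, not part of the source.

```python
# Sketch: check a one-step expected-progress bound on a tiny interpolating
# least-squares problem. All names and data are illustrative, not from the paper.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Rows a_i with targets b_i = <a_i, w_star>, so every f_i is minimized at
# w_star and interpolation holds by construction.
A = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0], [1.0, 1.0, 0.5], [2.0, -1.0, 0.0]]
w_star = [0.5, -1.0, 2.0]
b = [dot(a, w_star) for a in A]
n = len(A)

def f(w):
    # f(w) = (1 / 2n) * sum_i (<a_i, w> - b_i)^2
    return sum((dot(a, w) - bi) ** 2 for a, bi in zip(A, b)) / (2 * n)

def stoch_grad(w, i):
    # Gradient of f_i(w) = 0.5 * (<a_i, w> - b_i)^2
    r = dot(A[i], w) - b[i]
    return [r * aij for aij in A[i]]

def full_grad(w):
    grads = [stoch_grad(w, i) for i in range(n)]
    return [sum(g[j] for g in grads) / n for j in range(len(w))]

def check_progress(y, eta):
    g_full = full_grad(y)
    g_sq = dot(g_full, g_full)
    grads = [stoch_grad(y, i) for i in range(n)]
    # Local strong-growth ratio at y: E||grad f_i(y)||^2 / ||grad f(y)||^2
    rho_y = (sum(dot(g, g) for g in grads) / n) / g_sq
    # trace(A^T A) / n upper-bounds the largest Hessian eigenvalue,
    # so it is a valid (possibly loose) smoothness constant.
    L = sum(dot(a, a) for a in A) / n
    # Exact expectation of f after one step, over a uniformly sampled index i
    lhs = sum(
        f([yj - eta * gj for yj, gj in zip(y, g)]) for g in grads
    ) / n
    rhs = f(y) - eta * (1 - eta * rho_y * L / 2) * g_sq
    return lhs, rhs, rho_y, L
```

Calling `check_progress([1.0, 1.0, 1.0], 0.05)` returns the expected post-step value `lhs` and the bound `rhs`. Because the expectation is computed exactly over all four indices, the comparison `lhs <= rhs` involves no sampling noise. The bound guarantees a decrease in expectation whenever `eta < 2 / (rho_y * L)`.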

Theorems & Definitions (37)

Lemma 0
Definition 1: Estimating Sequences
Definition 2
Lemma 2
Proposition 2
Theorem 3
Theorem 4
Corollary 5
Corollary 6
Example 6: ( ≫ L )
...and 27 more
