Towards Weaker Variance Assumptions for Stochastic Optimization

Ahmet Alacaoglu; Yura Malitsky; Stephen J. Wright

TL;DR

The paper studies the Blum-Gladyshev (BG0) assumption, the weakest known variance-growth condition for stochastic optimization, and shows that a Halpern-anchored framework yields horizon-free, anytime convergence with last-iterate guarantees for convex minimization and min-max problems, without requiring bounded domains or bounded variance. By permitting dynamic parameter choices $\beta_k$ and $\tau_k$, the authors connect the Halpern iteration to stochastic optimization under BG0 and obtain $\mathbb{E}[f(\mathbf{x}_k^{\mathsf{out}}) - f(\mathbf{x}^*)] = \widetilde{O}(1/\sqrt{k})$ for weighted averages and $\mathbb{E}[f(\mathbf{x}_k) - f(\mathbf{x}^*)] = \widetilde{O}(1/\sqrt{k})$ for the last iterate, in both cases without fixing the horizon in advance. They extend the approach to min-max problems with two optimality measures: objective suboptimality with feasibility, and residuals. For these measures they obtain rates for Halpern-anchored gradient methods and variance-reduced extragradient schemes under BG0. The results show that BG0 subsumes many earlier variance relaxations and give horizon-free convergence guarantees for constrained, functionally constrained, and min-max settings, extending stochastic methods beyond bounded domains and classical variance assumptions.
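To make the anchoring idea concrete, the following is a minimal sketch of one plausible Halpern-anchored stochastic subgradient loop. It is not taken from the paper: the update form, the schedules $\beta_k = 1/(k+2)$ and $\tau_k = c/\sqrt{k+1}$, the test objective $f(\mathbf{x}) = \|\mathbf{x} - \mathbf{x}^*\|_1$, and the names `halpern_anchored_sgd` and `make_oracle` are all assumptions chosen for illustration. The noise in the oracle scales with $\|\mathbf{x}\|$, so the second moment of the stochastic subgradient grows with $\|\mathbf{x}\|^2$ rather than being uniformly bounded.

```python
import numpy as np

def halpern_anchored_sgd(subgrad_oracle, x0, num_iters, step_scale=0.5):
    """Illustrative anchored subgradient loop (assumed form, not the paper's exact method).

    Each step takes a stochastic subgradient step and then averages the
    result with the anchor x0 using weight beta_k. No horizon is needed:
    both schedules depend only on the current iteration counter k.
    """
    x0 = np.asarray(x0, dtype=float)
    x = x0.copy()
    for k in range(num_iters):
        beta_k = 1.0 / (k + 2)                # assumed anchoring weight
        tau_k = step_scale / np.sqrt(k + 1)   # assumed step size
        g = subgrad_oracle(x)
        x = beta_k * x0 + (1.0 - beta_k) * (x - tau_k * g)
    return x

def make_oracle(x_star, sigma, rng):
    """Stochastic subgradient of f(x) = ||x - x_star||_1 with noise growing like ||x||."""
    def oracle(x):
        exact = np.sign(x - x_star)           # a valid subgradient of the l1 distance
        noise = sigma * np.linalg.norm(x) * rng.standard_normal(x.shape)
        return exact + noise
    return oracle

rng = np.random.default_rng(0)
x_star = np.ones(5)
f = lambda x: np.abs(x - x_star).sum()

x0 = np.zeros(5)
x_last = halpern_anchored_sgd(make_oracle(x_star, 0.1, rng), x0, num_iters=5000)
print("f(x0) =", f(x0), " f(x_last) =", f(x_last))
```

The loop returns the last iterate rather than an average, which mirrors the kind of last-iterate quantity the summary refers to; the rates themselves are established in the paper, not by this sketch.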

Abstract

We revisit a classical assumption for analyzing stochastic gradient algorithms where the squared norm of the stochastic subgradient (or the variance for smooth problems) is allowed to grow as fast as the squared norm of the optimization variable. We contextualize this assumption in view of its inception in the 1960s, its seemingly independent appearance in the recent literature, its relationship to weakest-known variance assumptions for analyzing stochastic gradient algorithms, and its relevance in deterministic problems for non-Lipschitz nonsmooth convex optimization. We build on and extend a connection recently made between this assumption and the Halpern iteration. For convex nonsmooth, and potentially stochastic, optimization, we analyze horizon-free, anytime algorithms with last-iterate rates. For problems beyond simple constrained optimization, such as convex problems with functional constraints or regularized convex-concave min-max problems, we obtain rates for optimality measures that do not require boundedness of the feasible set.
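As a reading aid, the growth condition described in words above can be written as follows. The notation is an assumption made here for illustration: $\widetilde{g}(\mathbf{x})$ denotes a stochastic subgradient at $\mathbf{x}$, and $c_0, c_1 \ge 0$ are hypothetical constant names; the paper's own statement and constants may differ.

$$
\mathbb{E}\,\|\widetilde{g}(\mathbf{x})\|^2 \;\le\; c_0 + c_1 \|\mathbf{x}\|^2 \quad \text{for all feasible } \mathbf{x}.
$$

Setting $c_1 = 0$ recovers the classical uniformly bounded second-moment assumption, so a condition of this form is weaker. It also holds on unbounded domains for objectives whose subgradients grow linearly, such as a quadratic.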
