
Towards Weaker Variance Assumptions for Stochastic Optimization

Ahmet Alacaoglu, Yura Malitsky, Stephen J. Wright

TL;DR

The paper investigates the weakest known variance-growth condition for stochastic optimization, the Blum-Gladyshev BG0 assumption, and shows how a Halpern-anchored framework yields horizon-free, anytime convergence with strong last-iterate guarantees for convex minimization and min-max problems, without requiring bounded domains or bounded variance. By permitting dynamic parameter choices $\beta_k$ and $\tau_k$, the authors connect the Halpern iteration to stochastic optimization under BG0. They obtain $\mathbb{E}[f(\mathbf{x}_k^{\mathsf{out}}) - f(\mathbf{x}^*)] = \widetilde{O}(1/\sqrt{k})$ for weighted averages and $\mathbb{E}[f(\mathbf{x}_k) - f(\mathbf{x}^*)] = \widetilde{O}(1/\sqrt{k})$ for the last iterate, without fixing a horizon in advance. They extend the approach to min-max problems using two robust optimality measures (objective suboptimality together with feasibility, and residuals) and obtain rates via Halpern-anchored gradient methods and variance-reduced extragradient schemes under BG0. The results show that BG0 subsumes many earlier variance relaxations and yield horizon-free convergence guarantees for constrained, functionally constrained, and min-max settings, broadening the applicability of stochastic methods beyond bounded domains and classical variance assumptions.
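To make the anchoring concrete, here is a minimal Python sketch, assuming the classical Halpern form $\mathbf{x}_{k+1} = \beta_k \mathbf{x}_0 + (1-\beta_k) P_X(\mathbf{x}_k - \tau_k \tilde g_k)$, where $P_X$ is Euclidean projection onto $X$ and $\tilde g_k$ is a stochastic gradient. The schedule $\beta_k = 1/(k+2)$, the step $\tau_k = \sqrt{\beta_k(1-\beta_k)}/(\sqrt{6}B)$ (chosen at the upper limit of the bound in Lemma 3.2), the ball constraint, the noisy quadratic test problem, and the helper names `halpern_sgd` and `project_ball` are all illustrative assumptions. The paper's algorithm may place the anchor, projection, and output (for example, a weighted average) differently.

```python
import numpy as np


def project_ball(x, radius):
    """Euclidean projection onto the ball {x : ||x|| <= radius}, our stand-in for X."""
    nrm = np.linalg.norm(x)
    return x if nrm <= radius else x * (radius / nrm)


def halpern_sgd(oracle, project, x0, B, num_iters, rng):
    """Hypothetical Halpern-anchored projected stochastic (sub)gradient method.

    Assumed update: x_{k+1} = beta_k * x0 + (1 - beta_k) * P_X(x_k - tau_k * g_k),
    with beta_k = 1/(k+2) in (0, 1/2] and tau_k = sqrt(beta_k (1 - beta_k)) / (sqrt(6) B).
    beta_k and tau_k depend only on k, so stopping at any k gives a valid last iterate.
    """
    x = x0.copy()
    for k in range(num_iters):
        beta = 1.0 / (k + 2)                                      # anchoring weight
        tau = np.sqrt(beta * (1.0 - beta)) / (np.sqrt(6.0) * B)   # step size
        g = oracle(x, rng)                                        # stochastic gradient at x_k
        x = beta * x0 + (1.0 - beta) * project(x - tau * g)       # pull back toward x0
    return x


# Illustrative problem: f(x) = 0.5 * ||x - c||^2 observed through Gaussian gradient noise.
c = np.array([1.0, -1.0])
sigma = 0.5


def noisy_grad(x, rng):
    return x - c + sigma * rng.standard_normal(x.shape)


# E||g||^2 = ||x - c||^2 + sigma^2 d <= B^2 (1 + ||x||^2) with B^2 = max(2, 2||c||^2 + sigma^2 d).
B = np.sqrt(max(2.0, 2.0 * (c @ c) + sigma**2 * c.size))

rng = np.random.default_rng(0)
x0 = np.zeros(2)
x_last = halpern_sgd(noisy_grad, lambda z: project_ball(z, 10.0), x0, B, 10_000, rng)
print("distance to minimizer:", np.linalg.norm(x_last - c))
```

Because $\mathbf{x}_0 \in X$ and the projected point lies in $X$, each convex combination stays feasible without a second projection. The anchor weight $\beta_k$ decays faster than $\tau_k$, so the pull toward $\mathbf{x}_0$ fades relative to the gradient step as $k$ grows.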

Abstract

We revisit a classical assumption for analyzing stochastic gradient algorithms where the squared norm of the stochastic subgradient (or the variance for smooth problems) is allowed to grow as fast as the squared norm of the optimization variable. We contextualize this assumption in view of its inception in the 1960s, its seemingly independent appearance in the recent literature, its relationship to weakest-known variance assumptions for analyzing stochastic gradient algorithms, and its relevance in deterministic problems for non-Lipschitz nonsmooth convex optimization. We build on and extend a connection recently made between this assumption and the Halpern iteration. For convex nonsmooth, and potentially stochastic, optimization, we analyze horizon-free, anytime algorithms with last-iterate rates. For problems beyond simple constrained optimization, such as convex problems with functional constraints or regularized convex-concave min-max problems, we obtain rates for optimality measures that do not require boundedness of the feasible set.
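As an illustration of the growth condition described in the abstract, the sketch below assumes BG0 takes the form $\mathbb{E}\|\tilde g(\mathbf{x})\|^2 \le B^2(1+\|\mathbf{x}\|^2)$; the paper's exact centering and constants may differ. The hypothetical test function $f(\mathbf{x}) = \|\mathbf{x}\|^2 + \|\mathbf{x}\|_1$ is convex, nonsmooth, and not globally Lipschitz. With subgradient $s(\mathbf{x}) = 2\mathbf{x} + \mathrm{sign}(\mathbf{x})$ and additive noise $\sigma\xi$, $\xi \sim \mathcal{N}(0, I_d)$, we get $\mathbb{E}\|\tilde g\|^2 = \|s(\mathbf{x})\|^2 + \sigma^2 d \le 8\|\mathbf{x}\|^2 + (2+\sigma^2)d$, so the assumed form holds with $B^2 = \max(8, (2+\sigma^2)d)$, while $\sup_{\mathbf{x}} \mathbb{E}\|\tilde g(\mathbf{x})\|^2 = \infty$. The Monte Carlo check estimates both quantities.

```python
import numpy as np

rng = np.random.default_rng(1)
d, sigma = 5, 1.0


def stoch_subgrad(x, n_samples):
    """Noisy subgradients of the hypothetical f(x) = ||x||^2 + ||x||_1."""
    s = 2.0 * x + np.sign(x)             # a subgradient; np.sign(0) = 0 lies in the subdifferential of |t| at 0
    noise = sigma * rng.standard_normal((n_samples, x.size))
    return s + noise                     # one sampled subgradient per row


B2 = max(8.0, (2.0 + sigma**2) * d)      # growth constant derived in the lead-in

ratios, second_moments = [], []
for radius in [0.0, 1.0, 10.0, 100.0, 1000.0]:
    x = np.full(d, radius / np.sqrt(d))  # a point with ||x|| = radius
    g = stoch_subgrad(x, 20_000)
    m2 = np.mean(np.sum(g**2, axis=1))   # Monte Carlo estimate of E||g(x)||^2
    second_moments.append(m2)
    ratios.append(m2 / (1.0 + x @ x))
    print(f"||x|| = {radius:7.1f}   E||g||^2 ~ {m2:12.1f}   ratio ~ {ratios[-1]:6.2f}")
```

The second moment grows without bound as $\|\mathbf{x}\|$ increases, so a uniform variance or Lipschitz bound fails, yet the normalized ratio stays below $B^2$.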


Paper Structure

This paper contains 16 sections, 10 theorems, and 102 equations.

Key Result

Lemma 3.2

Let Assumption 1 hold and let $\{ \mathbf{x}_k \}$ be generated by Algorithm halpern1 with $\beta_k \in (0, 1/2]$ and $\tau_k \leq \frac{\sqrt{\beta_k(1-\beta_k)}}{\sqrt{6}B}$. Then, for any $\mathbf{x}\in X$ that is deterministic under the conditional expectation $\mathbb{E}_k$, we have

Theorems & Definitions (25)

  • Example 1.1
  • Lemma 3.2
  • Remark 3.3
  • Proof of Lemma 3.2
  • Corollary 3.4
  • Remark 3.5
  • Proof of Corollary 3.4
  • Theorem 3.6
  • Proof
  • Proposition 4.2
  • ...and 15 more
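The step-size condition in Lemma 3.2 can be instantiated with a concrete anchoring schedule. Taking $\beta_k = 1/(k+2)$ (an illustrative choice, not necessarily the one used in the paper), the largest admissible step works out as follows:

```latex
\beta_k = \frac{1}{k+2} \in \Big(0, \tfrac{1}{2}\Big] \quad (k \ge 0),
\qquad
\beta_k(1-\beta_k) = \frac{k+1}{(k+2)^2},
\qquad\text{so}\qquad
\tau_k \le \frac{\sqrt{k+1}}{\sqrt{6}\,B\,(k+2)} \le \frac{1}{\sqrt{6}\,B\,\sqrt{k+1}}.
```

The admissible step therefore decays like $1/\sqrt{k}$ and is defined for every $k$ without a horizon fixed in advance.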