Parameter-free Stochastic Optimization of Variationally Coherent Functions

Francesco Orabona, Dávid Pál

TL;DR

This work introduces a parameter-free stochastic optimization algorithm for differentiable functions on $\mathbb{R}^d$ that simultaneously achieves almost-sure convergence to the global minimizer for variationally coherent objectives and near-optimal convex-rate guarantees without averaging. Central to the method is Follow The Regularized Leader with gradient rescaling and a novel linearithmic regularizer, yielding robust last-iterate convergence in non-convex settings and favorable $O\left(T^{-(1-\alpha)}\right)$ rates in convex scenarios. The authors develop explicit regularizers $\phi_t$ via the functions $\psi$ and $\psi^*$, prove key regret bounds, and establish convergence through Bregman-divergence analysis. The work also discusses limitations, adaptivity, and potential extensions, highlighting the framework’s applicability to broader online-to-batch contexts and parameter-free stochastic optimization problems.

Abstract

We design and analyze an algorithm for first-order stochastic optimization of a large class of functions on $\mathbb{R}^d$. In particular, we consider the variationally coherent functions, which can be convex or non-convex. The iterates of our algorithm on variationally coherent functions converge almost surely to the global minimizer $\boldsymbol{x}^*$. Additionally, the very same algorithm with the same hyperparameters, after $T$ iterations, guarantees on convex functions that the expected suboptimality gap is bounded by $\widetilde{O}(\|\boldsymbol{x}^* - \boldsymbol{x}_0\| T^{-1/2+\epsilon})$ for any $\epsilon>0$. It is the first algorithm to achieve both these properties at the same time. Also, the rate for convex functions essentially matches the performance of parameter-free algorithms. Our algorithm is an instance of the Follow The Regularized Leader algorithm with the added twist of using rescaled gradients and time-varying linearithmic regularizers.
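As a rough illustration of the function class named in the abstract, the sketch below numerically probes a common formulation of variational coherence, $\langle \nabla F(\boldsymbol{x}), \boldsymbol{x} - \boldsymbol{x}^* \rangle > 0$ for all $\boldsymbol{x} \neq \boldsymbol{x}^*$. This formulation is an assumption here; the paper's Definition 1 may differ in details. The test objective $\log(1 + \|\boldsymbol{x}\|^2)$ and all function names are hypothetical choices made for illustration, and the code is not the paper's algorithm. Random sampling can only fail to find a violation; it does not prove the property.

```python
import math
import random


def F(x):
    """Illustrative non-convex objective log(1 + ||x||^2), minimized at the origin."""
    return math.log(1.0 + sum(xi * xi for xi in x))


def grad_F(x):
    """Gradient of F: 2x / (1 + ||x||^2)."""
    scale = 2.0 / (1.0 + sum(xi * xi for xi in x))
    return [scale * xi for xi in x]


def coherence_gap(grad, x, x_star):
    """Inner product <grad(x), x - x_star>; assumed positive away from x_star."""
    g = grad(x)
    return sum(gi * (xi - si) for gi, xi, si in zip(g, x, x_star))


def check_variational_coherence(grad, x_star, dim, n_samples=1000,
                                radius=10.0, seed=0):
    """Return False if any sampled point violates the assumed condition."""
    rng = random.Random(seed)
    for _ in range(n_samples):
        x = [rng.uniform(-radius, radius) for _ in range(dim)]
        if x == x_star:
            continue  # the condition is only required away from the minimizer
        if coherence_gap(grad, x, x_star) <= 0.0:
            return False
    return True


def violates_midpoint_convexity(f, a, b):
    """True if f at the midpoint of a and b exceeds the average of f(a) and f(b)."""
    mid = [(ai + bi) / 2.0 for ai, bi in zip(a, b)]
    return f(mid) > 0.5 * (f(a) + f(b))


if __name__ == "__main__":
    origin = [0.0, 0.0, 0.0]
    print("no violation found:", check_variational_coherence(grad_F, origin, dim=3))
    print("non-convex:", violates_midpoint_convexity(F, [1.0], [3.0]))
```

For this objective the inner product equals $2\|\boldsymbol{x}\|^2 / (1 + \|\boldsymbol{x}\|^2)$, which is positive away from the origin, while the midpoint check shows that the function is not convex. It is therefore a small example of a non-convex function that satisfies the assumed condition.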

Paper Structure

This paper contains 26 sections, 26 theorems, 169 equations, 2 algorithms.

Key Result

Theorem 2

Let $F:\mathbb{R}^d \to \mathbb{R}$ be a variationally coherent function with minimizer $\boldsymbol{x}^*$. Assume the stochastic gradients satisfy equation:stochastic-gradient and equation:gradient-bound. Assume that the learning rate $\eta_t$ is a non-negative $\mathcal{F}_t$-measurable random variable [...] Then, the sequence $\boldsymbol{x}_1, \boldsymbol{x}_2, \dots$ generated by Algorithm algorithm:ftr

Theorems & Definitions (53)

  • Definition 1: Variationally coherent function
  • Theorem 2: $\boldsymbol{x}_t$ converges to $\boldsymbol{x}^*$
  • Theorem 3: Convergence rate of running average for convex functions
  • Theorem 4: Convergence rate of last iterate for convex functions
  • Lemma 5: FTRL regret equality
  • Lemma 6: FTRL for stochastic optimization
  • proof
  • Lemma 7: Convergence of Bregman divergences
  • Lemma 8: $\left\|{\boldsymbol{x}_t-\boldsymbol{x}^*}\right\|^2$ is squeezed
  • ...and 43 more