Parameter-free Stochastic Optimization of Variationally Coherent Functions
Francesco Orabona, Dávid Pál
TL;DR
This work introduces a parameter-free stochastic optimization algorithm for differentiable functions on $\mathbb{R}^d$ that simultaneously achieves almost-sure convergence to the global minimizer for variationally coherent objectives and near-optimal rates on convex objectives without averaging. Central to the method is Follow The Regularized Leader with gradient rescaling and a novel linearithmic regularizer, yielding last-iterate convergence in non-convex settings and $O\left(T^{-(1-\alpha)}\right)$ rates in convex ones. The authors develop explicit regularizers $\phi_t$ via the functions $\psi$ and $\psi^*$, prove key regret bounds, and establish convergence through a Bregman-divergence analysis. The work also discusses limitations, adaptivity, and potential extensions, including the framework’s applicability to broader online-to-batch settings and other parameter-free stochastic optimization problems.
Abstract
We design and analyze an algorithm for first-order stochastic optimization of a large class of functions on $\mathbb{R}^d$. In particular, we consider the \emph{variationally coherent} functions, which can be convex or non-convex. The iterates of our algorithm on variationally coherent functions converge almost surely to the global minimizer $\boldsymbol{x}^*$. Additionally, on convex functions the very same algorithm, with the same hyperparameters, guarantees after $T$ iterations that the expected suboptimality gap is bounded by $\widetilde{O}(\|\boldsymbol{x}^* - \boldsymbol{x}_0\| T^{-1/2+\epsilon})$ for any $\epsilon>0$. It is the first algorithm to achieve both of these properties at the same time. Also, the rate for convex functions essentially matches the performance of parameter-free algorithms. Our algorithm is an instance of the Follow The Regularized Leader algorithm with the added twist of using \emph{rescaled gradients} and time-varying linearithmic regularizers.
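
To make the claim that variationally coherent functions can be non-convex concrete, here is a minimal sketch. It assumes the standard definition of variational coherence, which the abstract does not state: $\langle \nabla F(\boldsymbol{x}), \boldsymbol{x} - \boldsymbol{x}^* \rangle \ge 0$ for all $\boldsymbol{x}$, with equality only at a minimizer. The test function $F(x) = \log(1 + x^2)$, the grid, and all helper names are illustrative choices, not taken from the paper, and the sketch does not implement the paper's algorithm.

```python
import math

# Illustrative one-dimensional example (an assumption, not from the paper):
# F(x) = log(1 + x^2) has its unique global minimizer at x* = 0.
X_STAR = 0.0


def F(x):
    return math.log1p(x * x)


def grad_F(x):
    # F'(x) = 2x / (1 + x^2)
    return 2.0 * x / (1.0 + x * x)


def hess_F(x):
    # F''(x) = 2(1 - x^2) / (1 + x^2)^2, which is negative for |x| > 1
    return 2.0 * (1.0 - x * x) / (1.0 + x * x) ** 2


def is_variationally_coherent_on(grid, tol=1e-12):
    """Check <F'(x), x - x*> >= 0 on the grid, strictly away from x*."""
    for x in grid:
        inner = grad_F(x) * (x - X_STAR)
        if inner < 0.0:
            return False
        if abs(x - X_STAR) > tol and inner <= 0.0:
            return False
    return True


# Grid on [-10, 10] with spacing 0.01 (a numerical check, not a proof).
GRID = [i / 100.0 for i in range(-1000, 1001)]

if __name__ == "__main__":
    print("coherent on grid:", is_variationally_coherent_on(GRID))
    print("F''(2) =", hess_F(2.0))  # negative, so F is not convex
```

Here the inner product equals $2x^2/(1+x^2)$, which is positive for every $x \neq 0$, while the negative second derivative for $|x| > 1$ shows that $F$ is not convex.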
