Gradient-Variation Online Adaptivity for Accelerated Optimization with Hölder Smoothness

Yuheng Zhao, Yu-Hu Yan, Kfir Yehuda Levy, Peng Zhao

TL;DR

The paper builds a bridge between gradient-variation online learning under Hölder smoothness and universal offline optimization, showing that adaptivity to unknown smoothness enables accelerated convergence in offline settings. By developing AdaGrad-style step sizes, optimistic OGD, and stabilized online-to-batch conversions, it obtains universal guarantees for both convex and strongly convex objectives across Hölder scales. Specifically, convex Hölder optimization attains rates of the form $O\Big(\frac{L_\nu D^{1+\nu}}{T^{(1+3\nu)/2}} + \frac{\sigma D}{\sqrt{T}}\Big)$ in stochastic settings, while strongly convex cases achieve gradient-variation regret bounds that interpolate between smooth and non-smooth regimes, plus a detection-based grid-search scheme to handle unknown curvature. The combined online-to-batch framework yields universal offline methods with accelerated convergence in the smooth regime and near-optimal performance in the non-smooth one, and the grid-search construction further removes the need for prior knowledge of strong convexity, making the approach broadly applicable. The work highlights open questions on universality in unconstrained settings and suggests future directions for integrating online adaptivity into broader offline optimization frameworks.
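To make the ingredients named above concrete, here is a minimal sketch of optimistic online gradient descent with an AdaGrad-style step size driven by the accumulated gradient variation. It is an illustration under stated assumptions, not the paper's algorithm: the feasible set is assumed to be a Euclidean ball, the hint is assumed to be the previous gradient, and the names `project_ball`, `optimistic_ogd`, and the regulariser `eps` are hypothetical. The step sizes and conversions analysed in the paper may differ.

```python
import numpy as np


def project_ball(x, radius):
    """Euclidean projection onto the ball of the given radius centred at 0."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)


def optimistic_ogd(grad_fns, dim, radius=1.0, eps=1.0):
    """Optimistic OGD with an AdaGrad-style step size (illustrative sketch).

    grad_fns: sequence of callables; grad_fns[t](x) returns the gradient of f_t at x.
    eps:      hypothetical regulariser that keeps the first step size finite.
    """
    diameter = 2.0 * radius
    x_hat = np.zeros(dim)        # auxiliary iterate updated with true gradients
    hint = np.zeros(dim)         # optimistic guess of the upcoming gradient
    sq_variation = 0.0           # running sum of ||g_s - hint_s||^2
    iterates = []
    for grad in grad_fns:
        # Step size shrinks only as fast as the hints turn out to be wrong.
        eta = diameter / np.sqrt(eps + sq_variation)
        x = project_ball(x_hat - eta * hint, radius)    # play using the hint
        g = grad(x)                                     # observe the true gradient
        x_hat = project_ball(x_hat - eta * g, radius)   # correct with the true gradient
        sq_variation += float(np.sum((g - hint) ** 2))
        hint = g                                        # assumption: reuse last gradient
        iterates.append(x)
    return iterates
```

For example, `optimistic_ogd([lambda x: x - c] * 50, dim=2)` runs 50 rounds on the fixed quadratic with minimiser `c`. Because the step size depends only on observed gradient differences, no smoothness parameter is passed in, which is the sense of adaptivity the summary refers to.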

Abstract

Smoothness is known to be crucial for acceleration in offline optimization, and for gradient-variation regret minimization in online learning. Interestingly, these two problems are actually closely connected -- accelerated optimization can be understood through the lens of gradient-variation online learning. In this paper, we investigate online learning with Hölder smooth functions, a general class encompassing both smooth and non-smooth (Lipschitz) functions, and explore its implications for offline optimization. For (strongly) convex online functions, we design the corresponding gradient-variation online learning algorithm whose regret smoothly interpolates between the optimal guarantees in smooth and non-smooth regimes. Notably, our algorithms do not require prior knowledge of the Hölder smoothness parameter, exhibiting strong adaptivity over existing methods. Through online-to-batch conversion, this gradient-variation online adaptivity yields an optimal universal method for stochastic convex optimization under Hölder smoothness. However, achieving universality in offline strongly convex optimization is more challenging. We address this by integrating online adaptivity with a detection-based guess-and-check procedure, which, for the first time, yields a universal offline method that achieves accelerated convergence in the smooth regime while maintaining near-optimal convergence in the non-smooth one.

Paper Structure

This paper contains 31 sections, 21 theorems, 74 equations, 1 table, 3 algorithms.

Key Result

Lemma 1

Suppose the function $f$ is $(L_\nu, \nu)$-Hölder smooth. Then, for any $\delta>0$, denoting by $L = \delta^{\frac{\nu-1}{1+\nu}} L_\nu^{\frac{2}{1+\nu}}$, it holds that for all $\mathbf{x},\mathbf{y}\in\mathbb{R}^d$:
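As orientation for the constant $L$ defined in Lemma 1, the block below evaluates it at the two endpoints of the Hölder scale. It assumes the standard definition of $(L_\nu, \nu)$-Hölder smoothness with $\nu \in [0,1]$, which this excerpt does not restate. It is a worked substitution into the formula for $L$ only, not the lemma's conclusion.

```latex
% Assumed (standard) definition of (L_\nu, \nu)-Hölder smoothness, \nu \in [0,1]:
\|\nabla f(\mathbf{x}) - \nabla f(\mathbf{y})\| \le L_\nu \|\mathbf{x} - \mathbf{y}\|^{\nu}
\quad \text{for all } \mathbf{x}, \mathbf{y} \in \mathbb{R}^d .

% Substituting into L = \delta^{\frac{\nu-1}{1+\nu}} L_\nu^{\frac{2}{1+\nu}}:
\nu = 1: \quad L = \delta^{0} L_1^{1} = L_1
\qquad \text{(smooth case, independent of } \delta\text{)},
\\
\nu = 0: \quad L = \delta^{-1} L_0^{2} = \frac{L_0^2}{\delta}
\qquad \text{(non-smooth case, grows as } \delta \to 0\text{)} .
```

In the smooth case the effective constant is simply $L_1$. In the non-smooth case a smaller slack $\delta$ costs a larger effective constant, and intermediate values of $\nu$ sit between these two behaviours.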

Theorems & Definitions (42)

  • Definition 1: Weak/Strong Universality
  • Lemma 1
  • Theorem 1
  • Remark 1
  • Theorem 2
  • Remark 2
  • Theorem 3
  • Theorem 4
  • Remark 3
  • Remark 4
  • ...and 32 more