Adaptivity and Universality: Problem-dependent Universal Regret for Online Convex Optimization

Peng Zhao, Yu-Hu Yan, Hang Yu, Zhi-Hua Zhou

TL;DR

This work addresses universal online learning in Online Convex Optimization when curvature is unknown, introducing UniGrad to achieve both universality and problem-dependent adaptivity to gradient variation. It presents two core realizations, UniGrad.Correct and UniGrad.Bregman, each delivering gradient-variation regret bounds that scale with $V_T$ across strongly convex, exp-concave, and convex losses; UniGrad.Correct emphasizes stability cancellation via a cascaded three-layer ensemble, while UniGrad.Bregman leverages negative Bregman divergence to attain optimal convex rates. A shared meta-ensemble (MoM) and a surrogate-optimization extension (UniGrad++) further yield a one-gradient-per-round variant with comparable guarantees, and an anytime extension removes dependence on the horizon $T$. The results unify universal rates with gradient-variation adaptivity, enabling small-loss and gradient-variance guarantees, SEA-model compatibility, and faster convergence in online games, with practical implications for adversarial-stochastic hybrids and dynamic environments. Overall, UniGrad advances parameter-free, horizon-free, and gradient-variation-aware online learning, offering robust theoretical guarantees and broad applicability across OCO and game-theoretic settings.

Abstract

Universal online learning aims to achieve optimal regret guarantees without requiring prior knowledge of the curvature of online functions. Existing methods have established minimax-optimal regret bounds for universal online learning, where a single algorithm can simultaneously attain $\mathcal{O}(\sqrt{T})$ regret for convex functions, $\mathcal{O}(d \log T)$ for exp-concave functions, and $\mathcal{O}(\log T)$ for strongly convex functions, where $T$ is the number of rounds and $d$ is the dimension of the feasible domain. However, these methods still lack problem-dependent adaptivity. In particular, no universal method provides regret bounds that scale with the gradient variation $V_T$, a key quantity that plays a crucial role in applications such as stochastic optimization and fast-rate convergence in games. In this work, we introduce UniGrad, a novel approach that achieves both universality and adaptivity, with two distinct realizations: UniGrad.Correct and UniGrad.Bregman. Both methods achieve universal regret guarantees that adapt to gradient variation, simultaneously attaining $\mathcal{O}(\log V_T)$ regret for strongly convex functions and $\mathcal{O}(d \log V_T)$ regret for exp-concave functions. For convex functions, the regret bounds differ: UniGrad.Correct achieves an $\mathcal{O}(\sqrt{V_T \log V_T})$ bound while preserving the RVU property that is crucial for fast convergence in online games, whereas UniGrad.Bregman achieves the optimal $\mathcal{O}(\sqrt{V_T})$ regret bound through a novel design. Both methods employ a meta algorithm with $\mathcal{O}(\log T)$ base learners, which naturally requires $\mathcal{O}(\log T)$ gradient queries per round. To enhance computational efficiency, we introduce UniGrad++, which retains the regret while reducing the gradient query to just $1$ per round via surrogate optimization. We further provide various implications.
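
The abstract does not restate the definition of $V_T$. As a reading aid, the display below gives the definition of gradient variation that is standard in the OCO literature, together with its worst-case bound; the feasible domain $\mathcal{X}$ and the gradient-norm bound $G$ are symbols assumed here for illustration and may differ from the paper's notation.

$$
V_T = \sum_{t=2}^{T} \sup_{\mathbf{x} \in \mathcal{X}} \|\nabla f_t(\mathbf{x}) - \nabla f_{t-1}(\mathbf{x})\|^2 \le 4G^2 (T-1), \quad \text{assuming } \|\nabla f_t(\mathbf{x})\| \le G \text{ for all } \mathbf{x} \in \mathcal{X}.
$$

Under this assumption $V_T = \mathcal{O}(T)$, so an $\mathcal{O}(\sqrt{V_T})$ bound is never worse than $\mathcal{O}(\sqrt{T})$ and an $\mathcal{O}(\log V_T)$ bound is never worse than $\mathcal{O}(\log T)$, while both can be much smaller when consecutive loss functions change slowly.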

Paper Structure

This paper contains 128 sections, 33 theorems, 325 equations, 6 figures, 4 tables, 7 algorithms.

Key Result

Lemma 1

Under the smoothness assumption (assum:smoothness-X), the empirical gradient variation can be upper bounded as follows:
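
The bound itself is not reproduced in this summary. As a hedged illustration only, the sketch below numerically checks a standard conversion of this kind, namely that $\sum_t \|\nabla f_t(\mathbf{x}_t) - \nabla f_{t-1}(\mathbf{x}_{t-1})\|^2 \le 2 V_T + 2L^2 \sum_t \|\mathbf{x}_t - \mathbf{x}_{t-1}\|^2$ for $L$-smooth losses. This inequality, the quadratic losses, the projected gradient descent learner, and all names in the code are assumptions chosen for the example; the paper's exact statement and constants may differ.

```python
import numpy as np

def project_to_ball(x, radius=1.0):
    """Euclidean projection onto the ball of the given radius."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def run_check(T=200, d=5, eta=0.1, seed=0):
    """Run projected online gradient descent and return the quantities
    appearing in the assumed conversion inequality."""
    rng = np.random.default_rng(seed)
    # Hypothetical losses f_t(x) = 0.5 * ||x - c_t||^2 with slowly drifting
    # centers c_t. Each loss is 1-smooth, and the gradient difference
    # grad f_t(x) - grad f_{t-1}(x) = c_{t-1} - c_t does not depend on x,
    # so the supremum in V_T can be computed exactly.
    centers = np.cumsum(0.01 * rng.standard_normal((T, d)), axis=0)
    L = 1.0

    x = np.zeros(d)
    xs, grads = [], []
    for t in range(T):
        g = x - centers[t]               # gradient of f_t at the played point x_t
        xs.append(x.copy())
        grads.append(g)
        x = project_to_ball(x - eta * g)  # projected gradient step
    xs, grads = np.array(xs), np.array(grads)

    # Empirical gradient variation: gradients evaluated at the played points.
    empirical = np.sum(np.linalg.norm(grads[1:] - grads[:-1], axis=1) ** 2)
    # Gradient variation V_T (exact for these losses, see comment above).
    V_T = np.sum(np.linalg.norm(centers[1:] - centers[:-1], axis=1) ** 2)
    # Cumulative squared movement of the decisions (the "positive term").
    movement = np.sum(np.linalg.norm(xs[1:] - xs[:-1], axis=1) ** 2)
    return empirical, V_T, movement, L

if __name__ == "__main__":
    empirical, V_T, movement, L = run_check()
    bound = 2 * V_T + 2 * L**2 * movement
    print(f"empirical variation = {empirical:.6f}, assumed bound = {bound:.6f}")
```

The inequality checked here follows from $\|a + b\|^2 \le 2\|a\|^2 + 2\|b\|^2$ applied to the split $\nabla f_t(\mathbf{x}_t) - \nabla f_{t-1}(\mathbf{x}_t)$ plus $\nabla f_{t-1}(\mathbf{x}_t) - \nabla f_{t-1}(\mathbf{x}_{t-1})$, which is why the decision movement $\|\mathbf{x}_t - \mathbf{x}_{t-1}\|^2$ shows up as a positive term that the algorithm must control.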

Figures (6)

  • Figure 1: Decomposition of the positive term $\|\mathbf{x}_t - \mathbf{x}_{t-1}\|^2$ and how it is handled by our online ensemble method via intrinsic negative stability terms and injected corrections.
  • Figure 2: Summary of the theoretical results in our work. We propose two methods, UniGrad.Correct and UniGrad.Bregman (thm:unigrad-correct and thm:unigrad-bregman), that achieve gradient-variation universal regret. Both methods can be strengthened to the one-gradient feedback scenario (thm:unigrad-correct-1grad and thm:unigrad-bregman-1grad). Our results also have implications for small-loss and gradient-variance problem-dependent regret (cor:FT-WT-Correct and cor:FT-bregman), the stochastically extended adversarial (SEA) model (thm:SEA-correct and thm:SEA-bregman), and game theory (thm:game). Furthermore, they extend to the anytime setup (without knowing the time horizon $T$) in thm:anytime.
  • Figure 3: Comparison of the three-layer online ensemble structures of the conference version (NeurIPS'23:universal) and UniGrad.Correct. The key difference lies in how base learners are managed: NeurIPS'23:universal maintains a separate group of base learners for each MoM-Mid, whereas UniGrad.Correct shares base learners across all MoM-Mids, thereby reducing the total number of base learners from $\mathcal{O}((\log T)^2)$ to $\mathcal{O}(\log T)$.
  • Figure 4: Universality: comparisons on three problem classes (convex, exp-concave, and strongly convex) across three datasets (ijcnn1, svmguide1, skin_nonskin). Rows correspond to datasets, columns correspond to problem classes. Our methods are evaluated against the optimal algorithm specifically designed for each class, showing comparable regret performance.
  • Figure 5: Adaptivity: comparisons on the adaptivity of our methods against USC of ICML'22:universal. Our methods outperform USC when the gradient variation $V_T$ is small, e.g., $V_T = \mathcal{O}(1)$ in fig:adaptivity-vt-o1, and show comparable performance when $V_T = \mathcal{O}(T)$ in fig:adaptivity-vt-ot.
  • ...and 1 more figure

Theorems & Definitions (41)

  • Definition 1: Strong Convexity
  • Definition 2: Exp-Concavity
  • Lemma 1: Empirical Gradient Variation Conversion
  • Lemma 2: MsMwC Regret
  • Lemma 3: Universality of Optimism
  • Lemma 4: Two-layer MoM
  • Lemma 5
  • Theorem 1
  • Remark 1: Technique
  • Remark 2: Comparison to Conference Version
  • ...and 31 more