Universal Online Learning with Gradient Variations: A Multi-layer Online Ensemble Approach

Yu-Hu Yan; Peng Zhao; Zhi-Hua Zhou

TL;DR

The paper tackles online convex optimization in which the type and curvature of the loss functions are unknown and the niceness of the environment varies over time. It proposes a three-layer online ensemble with a novel universal optimism design and cascaded negative corrections to achieve gradient-variation guarantees across strongly convex, exp-concave, and convex losses, using only one gradient query per round.

Abstract

In this paper, we propose an online convex optimization approach with two different levels of adaptivity. On a higher level, our approach is agnostic to the unknown types and curvatures of the online functions, while at a lower level, it can exploit the unknown niceness of the environments and attain problem-dependent guarantees. Specifically, we obtain $\mathcal{O}(\log V_T)$, $\mathcal{O}(d \log V_T)$ and $\hat{\mathcal{O}}(\sqrt{V_T})$ regret bounds for strongly convex, exp-concave and convex loss functions, respectively, where $d$ is the dimension, $V_T$ denotes problem-dependent gradient variations and the $\hat{\mathcal{O}}(\cdot)$-notation omits $\log V_T$ factors. Our result not only safeguards the worst-case guarantees but also directly implies the small-loss bounds in analysis. Moreover, when applied to adversarial/stochastic convex optimization and game theory problems, our result enhances the existing universal guarantees. Our approach is based on a multi-layer online ensemble framework incorporating novel ingredients, including a carefully designed optimism for unifying diverse function types and cascaded corrections for algorithmic stability. Notably, despite its multi-layer structure, our algorithm necessitates only one gradient query per round, making it favorable when the gradient evaluation is time-consuming. This is facilitated by a novel regret decomposition equipped with carefully designed surrogate losses.
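
The abstract uses $V_T$ and "regret" without defining them. As a reading aid, the definitions below are the ones standard in the gradient-variation literature; they are assumed here to match the paper's, with $\mathcal{X}$ the feasible domain, $f_t$ the loss at round $t$, and $\mathbf{x}_t$ the learner's decision.

$$
\mathrm{Reg}_T = \sum_{t=1}^{T} f_t(\mathbf{x}_t) - \min_{\mathbf{x} \in \mathcal{X}} \sum_{t=1}^{T} f_t(\mathbf{x}),
\qquad
V_T = \sum_{t=2}^{T} \sup_{\mathbf{x} \in \mathcal{X}} \left\| \nabla f_t(\mathbf{x}) - \nabla f_{t-1}(\mathbf{x}) \right\|^2 .
$$

Under the additional assumption that gradients are bounded, $\|\nabla f_t(\mathbf{x})\| \le G$ (the constant $G$ is introduced here for illustration), each summand is at most $4G^2$, so $V_T \le 4G^2 (T-1) = \mathcal{O}(T)$. This is the sense in which a bound stated in terms of $V_T$ safeguards the worst-case rate, while being much smaller when consecutive loss functions change slowly.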

Paper Structure (33 sections, 19 theorems, 132 equations, 1 figure, 4 tables, 1 algorithm)

Introduction
High Level: Adaptive to Unknown Curvature of Online Functions
Low Level: Adaptive to Unknown Niceness of Online Environments
Our Contributions and Techniques
Preliminaries
Our Approach
Universal Optimism Design
Negative Terms for Cancellation
Endogenous Negativity: Stability Analysis of Meta Algorithms
Exogenous Negativity: Cascaded Correction Terms
Overall Algorithm: A Multi-layer Online Ensemble Structure
Universal Regret Guarantees
Improved Gradient Query Complexity
Conclusion
Applications
...and 18 more sections

Key Result

Lemma 1

Under the boundedness and smoothness assumptions, if the optimism is chosen as $m_{t,i} = r_{t-1,i} = \langle \nabla f_{t-1}(\mathbf{x}_{t-1}), \mathbf{x}_{t-1} - \mathbf{x}_{t-1,i} \rangle$, it holds that
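
As a side illustration of the optimism $m_{t,i}$ named in Lemma 1 (not of the lemma's conclusion), the sketch below computes it for every base learner from quantities of the previous round. It assumes NumPy; the function name `optimism`, the toy vectors, and the choice to form $\mathbf{x}_{t-1}$ as a weighted average of base decisions are illustrative assumptions, not the paper's full three-layer algorithm.

```python
import numpy as np


def optimism(grad_prev, x_prev, base_prev):
    """Return m_{t,i} = <grad f_{t-1}(x_{t-1}), x_{t-1} - x_{t-1,i}> for all i.

    grad_prev: shape (d,), the single gradient queried at x_{t-1}
    x_prev:    shape (d,), the combined decision x_{t-1}
    base_prev: shape (N, d), row i is base learner i's decision x_{t-1,i}
    """
    # Broadcasting subtracts each base decision from x_prev, then takes
    # the inner product of every row with the one shared gradient.
    return (x_prev - base_prev) @ grad_prev


# Toy data (hypothetical): three base learners in two dimensions.
grad_prev = np.array([1.0, -2.0])
base_prev = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, -0.5]])
weights = np.array([0.2, 0.3, 0.5])  # assumed meta weights on the simplex

# Assumption of this sketch: the combined decision is the weighted average.
x_prev = weights @ base_prev

m = optimism(grad_prev, x_prev, base_prev)
print(m)
```

Only one gradient, evaluated at the combined decision, is needed to form all $N$ values. In this toy setup, where $\mathbf{x}_{t-1}$ is exactly the weighted average of the base decisions, the weighted sum of the optimism values is zero by linearity of the inner product.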

Figures (1)

Figure 1: Decomposition of the positive term $\|\mathbf{x}_t - \mathbf{x}_{t-1}\|^2$ and how it is handled by the multi-layer online ensemble via endogenous negativity from meta algorithm and exogenous negativity from cascaded corrections.

Theorems & Definitions (36)

Lemma 1: Key Lemma
Lemma 2
Theorem 1
Corollary 1
Theorem 2
Proposition 1
Theorem 3
Theorem 4
proof
proof : Proof of lem:MsMwC-refine
...and 26 more
