A Unified Framework for Analyzing Meta-algorithms in Online Convex Optimization

Mohammad Pedramfar; Vaneet Aggarwal

TL;DR

This work studies online convex optimization across feedback types (full-information, semi-bandit, bandit), adversary models, and notions of regret. It proposes a unified framework of meta-algorithms that convert an algorithm designed for one setting into an algorithm for another, including from first-order to zeroth-order feedback. The authors show that any online linear optimization algorithm with deterministic gradient feedback against fully adaptive adversaries is also an online convex optimization algorithm, and that full-information algorithms can be converted into semi-bandit or bandit ones with comparable regret bounds. The zeroth-order conversions rely on smoothing-based gradient estimators, including a two-point variant (the meta-algorithms $\mathtt{FOTZO}$ and $\mathtt{FOTZO\textrm{-}2P}$). The resulting regret bounds recover many known results with simplified proofs and give new guarantees for deterministic zeroth-order feedback (adaptive regret $O(\sqrt{T})$, and static regret $O(\log T)$ for strongly convex functions). The framework thus unifies prior results, eases the transfer of algorithms across settings, and provides a pathway to new meta-algorithms, including for non-convex problems.
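To make the smoothing-based gradient estimators mentioned above concrete, here is a minimal sketch of the standard one-point and two-point spherical estimators from the bandit optimization literature. It assumes an unconstrained domain and a fixed smoothing radius `delta`; the function names are hypothetical, and the exact estimators used inside $\mathtt{FOTZO}$ and $\mathtt{FOTZO\textrm{-}2P}$ may differ in detail (for example, in how the domain is shrunk so that queries stay feasible).

```python
import numpy as np


def sample_unit_sphere(dim, rng):
    """Draw a vector uniformly at random from the unit sphere in R^dim."""
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)


def one_point_estimate(f, x, delta, rng):
    """One function query: (d / delta) * f(x + delta * v) * v.

    In expectation over v this equals the gradient of the ball-smoothed
    version of f at x (hypothetical helper, not the paper's code).
    """
    dim = x.shape[0]
    v = sample_unit_sphere(dim, rng)
    return (dim / delta) * f(x + delta * v) * v


def two_point_estimate(f, x, delta, rng):
    """Two function queries: (d / (2 * delta)) * (f(x + delta*v) - f(x - delta*v)) * v.

    Same expectation as the one-point estimator, but the difference of
    queries typically gives much lower variance.
    """
    dim = x.shape[0]
    v = sample_unit_sphere(dim, rng)
    diff = f(x + delta * v) - f(x - delta * v)
    return (dim / (2.0 * delta)) * diff * v
```

Feeding either estimate to a first-order online algorithm in place of the true gradient is the basic idea behind a first-order to zeroth-order conversion; the one-point version needs only bandit feedback, while the two-point version assumes the learner may evaluate each loss twice.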

Abstract

In this paper, we analyze the problem of online convex optimization in different settings, including different feedback types (full-information/semi-bandit/bandit/etc.) in either stochastic or non-stochastic settings, and different notions of regret (static adversarial regret/dynamic regret/adaptive regret). This is done through a framework which allows us to systematically propose and analyze meta-algorithms for the various settings described above. We show that any algorithm for online linear optimization with deterministic gradient feedback against fully adaptive adversaries is an algorithm for online convex optimization. We also show that any such algorithm that requires full-information feedback may be transformed into an algorithm with semi-bandit feedback with a comparable regret bound. We further show that algorithms that are designed for fully adaptive adversaries using deterministic semi-bandit feedback can obtain similar bounds using only stochastic semi-bandit feedback when facing oblivious adversaries. We use this to describe general meta-algorithms to convert first-order algorithms to zeroth-order algorithms with comparable regret bounds. Our framework allows us to analyze online optimization in various settings, recovers several results in the literature with a simplified proof technique, and provides new results.
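The linear-to-convex and full-information-to-semi-bandit reductions described above both rest on a standard convexity inequality: for convex $f_t$, $f_t(x_t) - f_t(u) \le \langle \nabla f_t(x_t), x_t - u \rangle$, so regret against $f_t$ is bounded by regret against the linear surrogate $\langle \nabla f_t(x_t), \cdot \rangle$. The sketch below illustrates this linearization as a wrapper around a full-information learner. The class names, the `act`/`update` interface, and the choice of projected gradient descent on the unit ball as the base algorithm are all assumptions made for illustration; this is not the paper's $\mathtt{FTS}$ pseudocode.

```python
import numpy as np


class OnlineGradientDescent:
    """Hypothetical full-information base learner: projected OGD on the unit ball."""

    def __init__(self, dim, step_size):
        self.x = np.zeros(dim)
        self.step_size = step_size

    def act(self):
        return self.x.copy()

    def update(self, grad_oracle):
        # A full-information learner may query the gradient oracle at any
        # point; OGD happens to query it only at its current iterate.
        g = grad_oracle(self.x)
        y = self.x - self.step_size * g
        norm = np.linalg.norm(y)
        self.x = y if norm <= 1.0 else y / norm


class FullInfoToSemiBandit:
    """Wrapper that needs only the gradient at the played point (semi-bandit).

    It hands the base learner the linear surrogate <g_t, .>, whose gradient
    is g_t everywhere, so any oracle query can be answered.
    """

    def __init__(self, base):
        self.base = base

    def act(self):
        return self.base.act()

    def update(self, grad_at_played):
        g = np.asarray(grad_at_played, dtype=float)
        self.base.update(lambda _query_point: g)
```

As a usage sketch, running the wrapper for a few hundred rounds on the fixed loss $f(x) = \tfrac{1}{2}\lVert x - c \rVert^2$ with $c$ inside the unit ball, passing `x - c` to `update` each round, drives the played point toward $c$.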

Paper Structure (20 sections, 11 theorems, 22 equations, 5 figures)

Introduction
Background and Notations
Problem Setup
Re-statement of Previous Result: Oblivious to fully adaptive adversary
Linear to convex with fully adaptive adversary
Full information feedback to semi-bandit feedback
Lipschitz to non-Lipschitz
Deterministic feedback to stochastic feedback
First order feedback to zeroth order feedback
First order feedback to deterministic zeroth order feedback
Applications
Conclusions
Acknowledgments
Proof of Theorem \ref{thm:det-is-adaptive}
Proof of Theorem \ref{thm:main}
...and 5 more sections

Key Result

Theorem 1

Let $i \in \{0, 1\}$ and assume ${\mathcal{A}}$ is a deterministic online algorithm designed for $i$-th order feedback and ${\mathbf{F}}$ is a function class. Then we have

Figures (5)

Figure 1: Summary of the main results
Figure 2: Full information to semi-bandit - $\mathtt{FTS}({\mathcal{A}})$
Figure 3: First order to zeroth order - $\mathtt{FOTZO}({\mathcal{A}})$
Figure 4: Semi-bandit to bandit - $\mathtt{STB}({\mathcal{A}})$
Figure 5: First order to zeroth order with two-point gradient estimator - $\mathtt{FOTZO\textrm{-}2P}({\mathcal{A}})$

Theorems & Definitions (16)

Remark 1
Theorem 1
Corollary 1
Definition 1
Theorem 2
Corollary 2: Theorem 14 in Abernethy et al. (2008)
Theorem 3
Theorem 4
Theorem 5
Theorem 6
...and 6 more
