A Unified Framework for Analyzing Meta-algorithms in Online Convex Optimization
Mohammad Pedramfar, Vaneet Aggarwal
TL;DR
This work tackles online convex optimization across diverse feedback types and adversaries by proposing a unified framework of meta-algorithms that convert algorithms between settings (e.g., full-information, semi-bandit, bandit) and between first-order and zeroth-order feedback. The authors establish core connections. For example, they show that any algorithm for online linear optimization with deterministic semi-bandit (gradient) feedback against fully adaptive adversaries is also an algorithm for online convex optimization. They also show how to turn full-information algorithms into semi-bandit or bandit ones with comparable regret bounds. They introduce smoothing-based gradient estimators and two-point gradient techniques (e.g., FOTZO and FOTZO-2P) and derive regret bounds that recover many known results with simplified proofs. They also obtain new deterministic zeroth-order guarantees (adaptive regret $O(\sqrt{T})$ and static regret $O(\log T)$ for strongly convex cases). The framework thus unifies prior results, facilitates transfers between settings, and provides a pathway to novel meta-algorithms applicable to non-convex problems as well. Overall, the work offers a principled approach to analyzing and designing meta-algorithms in online optimization.
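The smoothing-based, two-point idea mentioned above can be illustrated with a minimal sketch. It assumes the standard two-point spherical estimator and plain online gradient descent as the first-order base algorithm, and it is not claimed to be the paper's exact FOTZO-2P construction. The feasible set is assumed to be a Euclidean ball of radius `radius`, and iterates are kept in the ball of radius `radius - delta` so that both query points stay feasible. The function names `two_point_grad`, `project_ball` and `zeroth_order_ogd` are hypothetical.

```python
import numpy as np


def two_point_grad(f, x, delta, rng):
    """Two-point spherical estimate of the gradient of the delta-smoothed f at x.

    The smoothed function is E_v[f(x + delta * v)] with v uniform in the unit
    ball; this estimator is unbiased for its gradient.
    """
    d = x.shape[0]
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)  # a normalized Gaussian is uniform on the unit sphere
    return (d / (2.0 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u


def project_ball(x, radius):
    """Euclidean projection onto the origin-centred ball of the given radius."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)


def zeroth_order_ogd(losses, d, radius, eta, delta, seed=0):
    """Online gradient descent that only queries loss values (two per round).

    Iterates live in the shrunk ball of radius (radius - delta), so both query
    points x +/- delta * u stay inside the feasible ball of radius `radius`.
    Returns the centre points x_1, ..., x_T around which the queries are made.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(d)
    played = []
    for f in losses:
        played.append(x.copy())
        g = two_point_grad(f, x, delta, rng)  # stands in for the true gradient
        x = project_ball(x - eta * g, radius - delta)
    return np.array(played)


# Demo: a fixed quadratic loss whose minimizer `target` lies in the shrunk ball.
target = np.array([0.3, -0.2, 0.1])
losses = [lambda x: float(np.sum((x - target) ** 2))] * 300
played = zeroth_order_ogd(losses, d=3, radius=1.0, eta=0.1, delta=0.01)
print(np.linalg.norm(played[-1] - target))
```

For this quadratic loss with `d = 3` and `eta = 0.1`, each update only shrinks the error component along the sampled direction, so the centre points contract toward `target`. Shrinking the domain by `delta` is one common way to keep queries feasible. The smoothing radius then trades estimator bias against how much of the domain is given up.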
Abstract
In this paper, we analyze the problem of online convex optimization in different settings, including different feedback types (full-information/semi-bandit/bandit/etc.) in either stochastic or non-stochastic settings and different notions of regret (static adversarial regret/dynamic regret/adaptive regret). This is done through a framework that allows us to systematically propose and analyze meta-algorithms for the various settings described above. We show that any algorithm for online linear optimization with deterministic gradient feedback against fully adaptive adversaries is an algorithm for online convex optimization. We also show that any such algorithm that requires full-information feedback may be transformed into an algorithm with semi-bandit feedback with a comparable regret bound. We further show that algorithms that are designed for fully adaptive adversaries using deterministic semi-bandit feedback can obtain similar bounds using only stochastic semi-bandit feedback when facing oblivious adversaries. We use this to describe general meta-algorithms that convert first-order algorithms to zeroth-order algorithms with comparable regret bounds. Our framework allows us to analyze online optimization in various settings, recovers several results in the literature with a simplified proof technique, and provides new results.
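The first reduction above rests on the convexity inequality $f_t(x_t) - f_t(u) \le \langle \nabla f_t(x_t), x_t - u \rangle$. An online linear learner fed the linear losses $\langle \nabla f_t(x_t), \cdot \rangle$ therefore has convex regret no larger than its linear regret. Because each linear loss depends on the learner's own choice $x_t$, the linear adversary is fully adaptive, which is why that assumption appears. The sketch below checks this inequality numerically. It assumes online gradient descent on a Euclidean ball as the linear learner and random quadratic losses as test data; both are illustrative choices, not taken from the paper, and the helper names are hypothetical. It also shows that only the gradient at the played point, i.e. semi-bandit feedback, is used.

```python
import numpy as np


def project_ball(x, radius):
    """Euclidean projection onto the origin-centred ball of the given radius."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)


def run_linearized_ogd(grads, d, radius, eta):
    """Online gradient descent used purely as an online *linear* learner.

    Round t: commit to x_t, then receive only g_t = grad f_t(x_t) (deterministic
    semi-bandit feedback) and pay the linear loss <g_t, x_t>. Since g_t depends
    on x_t, the linear losses are chosen by a fully adaptive adversary.
    """
    x = np.zeros(d)
    xs, gs = [], []
    for grad in grads:
        g = grad(x)
        xs.append(x.copy())
        gs.append(g)
        x = project_ball(x - eta * g, radius)
    return np.array(xs), np.array(gs)


def regrets(losses, xs, gs, u):
    """Return (convex regret, linearized regret) against a fixed comparator u."""
    convex = sum(f(x) - f(u) for f, x in zip(losses, xs))
    linear = float(np.sum(gs * (xs - u)))
    return convex, linear


# Demo: random quadratics f_t(x) = ||x - c_t||^2 on the unit ball in R^2.
rng = np.random.default_rng(0)
d, T, radius = 2, 200, 1.0
centres = [project_ball(rng.standard_normal(d), 0.8) for _ in range(T)]
losses = [lambda x, c=c: float(np.sum((x - c) ** 2)) for c in centres]
grads = [lambda x, c=c: 2.0 * (x - c) for c in centres]

xs, gs = run_linearized_ogd(grads, d, radius, eta=1.0 / np.sqrt(T))
u = np.mean(centres, axis=0)  # minimizer of sum_t f_t, feasible since |c_t| <= 0.8
convex_regret, linear_regret = regrets(losses, xs, gs, u)
# Convexity gives f_t(x_t) - f_t(u) <= <g_t, x_t - u> each round, so:
assert convex_regret <= linear_regret + 1e-9
print(convex_regret, linear_regret)
```

Any other online linear learner could replace `run_linearized_ogd`. The check only uses convexity of each $f_t$, so a linear-regret bound for the learner carries over to the convex losses.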
