A Best-of-Both-Worlds Proof for Tsallis-INF without Fenchel Conjugates

Wei-Cheng Lee; Francesco Orabona

TL;DR

This note gives a concise, self-contained proof that Tsallis-INF achieves a best-of-both-worlds guarantee, covering both adversarial and stochastic bandits, without using Fenchel conjugates. It relies on standard online convex optimization tools, notably the FTRL regret lemma and a local-norm analysis, to derive the regret bound $\mathbb{E}[\mathrm{Regret}_T] \leq 32 G \sqrt{(d-1) T}$ and, in the stochastic case with a unique minimizer, the pseudo-regret bound $\text{P-Regret}_T \leq 256 G^2 \sum_{i: \mu_i \neq \mu^*} \frac{1+\ln T}{\Delta_i}$. The approach distills the core ideas of Zimmert and Seldin by reframing the algorithm as FTRL, and it slightly improves the $d$-dependence; the constants are kept simple rather than tight. Overall, the note lays out a direct, conjugate-free path to best-of-both-worlds guarantees for Tsallis-INF and highlights the role of local-norm regret bounds in online learning for bandits.

Abstract

In this short note, we present a simple derivation of the best-of-both-worlds guarantee for the Tsallis-INF multi-armed bandit algorithm of Zimmert and Seldin (J. Zimmert and Y. Seldin. Tsallis-INF: An optimal algorithm for stochastic and adversarial bandits. Journal of Machine Learning Research, 22(28):1-49, 2021. URL https://jmlr.csail.mit.edu/papers/volume22/19-753/19-753.pdf). In particular, the proof uses modern tools from online convex optimization and avoids the use of conjugate functions. We do not optimize the constants in the bounds, in favor of a slimmer proof.
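To make the "Tsallis-INF as FTRL" view concrete, the following Python sketch computes one FTRL step over the probability simplex with a 1/2-Tsallis regularizer and runs it in a simple bandit loop. It is only an illustration under stated assumptions: the regularizer scaling $-2\sum_i \sqrt{x_i}$, the learning rate $1/\sqrt{t}$, the importance-weighted loss estimator, and the function names `tsallis_ftrl_probs` and `run_bandit` are choices made here, not taken from the note, whose exact constants may differ.

```python
import math
import random


def tsallis_ftrl_probs(cum_losses, eta, iters=60):
    """One FTRL step: argmin over the simplex of
    eta * <cum_losses, x> - 2 * sum_i sqrt(x_i)   (assumed scaling).

    The KKT conditions give x_i = 1 / (c_i + mu)^2 with
    c_i = eta * (L_i - min_j L_j) >= 0, and mu > 0 chosen so that sum_i x_i = 1.
    """
    m = min(cum_losses)
    c = [eta * (loss - m) for loss in cum_losses]
    # sum_i 1/(c_i + mu)^2 is decreasing in mu; it is >= 1 at mu = 1
    # (the best arm has c_i = 0) and <= 1 at mu = sqrt(d), so bisect there.
    lo, hi = 1.0, math.sqrt(len(c))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(1.0 / (ci + mid) ** 2 for ci in c) > 1.0:
            lo = mid
        else:
            hi = mid
    mu = 0.5 * (lo + hi)
    x = [1.0 / (ci + mu) ** 2 for ci in c]
    s = sum(x)  # renormalize to remove the residual bisection error
    return [xi / s for xi in x]


def run_bandit(loss_fn, d, T, seed=0):
    """Play T rounds on d arms; loss_fn(t, arm, rng) returns a loss in [0, 1]."""
    rng = random.Random(seed)
    est_cum_losses = [0.0] * d
    total_loss = 0.0
    probs = [1.0 / d] * d
    for t in range(1, T + 1):
        eta = 1.0 / math.sqrt(t)  # assumed learning-rate schedule
        probs = tsallis_ftrl_probs(est_cum_losses, eta)
        arm = rng.choices(range(d), weights=probs)[0]
        loss = loss_fn(t, arm, rng)
        # Importance-weighted estimate: unbiased for the full loss vector.
        est_cum_losses[arm] += loss / probs[arm]
        total_loss += loss
    return total_loss, probs
```

For instance, `run_bandit(lambda t, arm, rng: 0.0 if arm == 0 else 1.0, d=3, T=200)` returns the total loss incurred together with the final sampling distribution, which keeps its largest mass on arm 0.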
