A Best-of-Both-Worlds Proof for Tsallis-INF without Fenchel Conjugates
Wei-Cheng Lee, Francesco Orabona
TL;DR
This note provides a concise, self-contained proof that Tsallis-INF achieves a best-of-both-worlds guarantee for both adversarial and stochastic bandits without using Fenchel conjugates. It leverages standard online convex optimization tools, notably the FTRL regret lemma and a local-norm analysis, to derive a regret bound of $\mathbb{E}[\mathrm{Regret}_T] \leq 32 G \sqrt{(d-1) T}$ and, in the stochastic case with a unique minimizer, a pseudo-regret bound $\mathrm{P-Regret}_T \leq 256 G^2 \sum_{i: \mu_i \neq \mu^*} \frac{1+\ln T}{\Delta_i}$. The approach distills the core ideas from Zimmert and Seldin, reframing the algorithm as FTRL, and achieves a slight improvement in the $d$-dependence while keeping constants simple at the cost of tightness. Overall, the work clarifies a direct, conjugate-free path to best-of-both-worlds guarantees for Tsallis-INF, highlighting the role of local-norm regret bounds in online learning for bandits.
Abstract
In this short note, we present a simple derivation of the best-of-both-worlds guarantee for the Tsallis-INF multi-armed bandit algorithm of Zimmert and Seldin (J. Zimmert and Y. Seldin. Tsallis-INF: An optimal algorithm for stochastic and adversarial bandits. Journal of Machine Learning Research, 22(28):1-49, 2021. URL https://jmlr.csail.mit.edu/papers/volume22/19-753/19-753.pdf). In particular, the proof uses modern tools from online convex optimization and avoids the use of conjugate functions. We do not optimize the constants in the bounds, in favor of a slimmer proof.
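For readers who want a concrete picture of the algorithm under discussion, the following is a minimal sketch of Tsallis-INF viewed as FTRL over the probability simplex. It is an illustration, not the note's own pseudocode. Several details are assumptions not stated above: the regularizer scaling $-\frac{4}{\eta_t}\sum_i \sqrt{w_i}$, the learning rate $\eta_t = 2/\sqrt{t}$, the importance-weighted loss estimator, and the names `tsallis_weights`, `run_tsallis_inf`, and `loss_fn`, which are hypothetical. Under that regularizer, the first-order conditions give $w_i = 4 / (\eta_t (\hat{L}_i - x))^2$, where the scalar $x$ is chosen so that the weights sum to one; the sketch finds $x$ by bisection.

```python
import math
import random


def tsallis_weights(cum_loss, eta, iters=60):
    """FTRL step with the (assumed) regularizer -(4/eta) * sum_i sqrt(w_i).

    Returns w_i = 4 / (eta * (L_i - x))^2, with x < min_i L_i found by
    bisection so that the weights sum to one.
    """
    d = len(cum_loss)
    m = min(cum_loss)
    lo = m - 2.0 * math.sqrt(d) / eta  # every term <= 1/d, so the sum is <= 1
    hi = m - 2.0 / eta                 # the best arm's term is 1, so the sum is >= 1
    for _ in range(iters):
        x = 0.5 * (lo + hi)
        total = sum(4.0 / (eta * (L - x)) ** 2 for L in cum_loss)
        if total > 1.0:
            hi = x
        else:
            lo = x
    x = 0.5 * (lo + hi)
    w = [4.0 / (eta * (L - x)) ** 2 for L in cum_loss]
    s = sum(w)
    return [wi / s for wi in w]  # renormalize to remove bisection error


def run_tsallis_inf(loss_fn, d, T, seed=0):
    """Play T rounds on d arms; loss_fn(t, arm) returns the observed loss."""
    rng = random.Random(seed)
    cum_loss = [0.0] * d  # cumulative importance-weighted loss estimates
    pulls = [0] * d
    for t in range(1, T + 1):
        eta = 2.0 / math.sqrt(t)  # assumed learning-rate schedule
        w = tsallis_weights(cum_loss, eta)
        arm = rng.choices(range(d), weights=w)[0]
        loss = loss_fn(t, arm)
        cum_loss[arm] += loss / w[arm]  # unbiased importance-weighted estimate
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    # Hypothetical stochastic instance: Bernoulli losses with fixed means.
    means = [0.2, 0.5, 0.5, 0.8]
    env = random.Random(1)
    pulls = run_tsallis_inf(
        lambda t, a: 1.0 if env.random() < means[a] else 0.0,
        d=len(means),
        T=5000,
    )
    print(pulls)
```

The same update is used regardless of whether the losses are stochastic or adversarial, which is the point of a best-of-both-worlds guarantee: no parameter of the sketch depends on the gaps $\Delta_i$ or on the nature of the environment.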
