A Best-of-Both-Worlds Proof for Tsallis-INF without Fenchel Conjugates

Wei-Cheng Lee, Francesco Orabona

TL;DR

This note provides a concise, self-contained proof that Tsallis-INF achieves a best-of-both-worlds guarantee for both adversarial and stochastic bandits without using Fenchel conjugates. It leverages standard online convex optimization tools, notably the FTRL regret lemma and a local-norm analysis, to derive a regret bound of $\mathbb{E}[\mathrm{Regret}_T] \leq 32 G \sqrt{(d-1) T}$ and, in the stochastic case with a unique minimizer, a pseudo-regret bound $\mathrm{P-Regret}_T \leq 256 G^2 \sum_{i: \mu_i \neq \mu^*} \frac{1+\ln T}{\Delta_i}$. The approach distills the core ideas from Zimmert and Seldin, reframing the algorithm as FTRL, and achieves a slight improvement in the $d$-dependence while keeping constants simple at the cost of tightness. Overall, the work clarifies a direct, conjugate-free path to best-of-both-worlds guarantees for Tsallis-INF, highlighting the role of local-norm regret bounds in online learning for bandits.
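To get a feel for the scale of the two bounds quoted above, they can be evaluated numerically. The following is only an arithmetic sketch: the function names `adversarial_bound` and `stochastic_bound` are hypothetical, and the formulas are copied as stated in this summary, with $\Delta_i = \mu_i - \mu^*$ assumed to be the gap to the best mean.

```python
import math

def adversarial_bound(G, d, T):
    # 32 * G * sqrt((d - 1) * T), as quoted in the summary above.
    return 32.0 * G * math.sqrt((d - 1) * T)

def stochastic_bound(G, means, T):
    # 256 * G^2 * sum over suboptimal arms of (1 + ln T) / Delta_i,
    # assuming Delta_i = mu_i - mu_star (gap to the smallest mean loss).
    mu_star = min(means)
    return 256.0 * G ** 2 * sum(
        (1.0 + math.log(T)) / (mu - mu_star) for mu in means if mu != mu_star
    )

print(adversarial_bound(G=1.0, d=2, T=100))          # 320.0
print(stochastic_bound(G=1.0, means=[0.0, 0.5], T=1))  # 512.0
```

Because the constants are deliberately not optimized, both bounds can exceed the trivial bound $GT$ for small $T$; they become informative only for long horizons, where the logarithmic stochastic bound is eventually the smaller of the two.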

Abstract

In this short note, we present a simple derivation of the best-of-both-worlds guarantee for the Tsallis-INF multi-armed bandit algorithm of Zimmert and Seldin (J. Zimmert and Y. Seldin. Tsallis-INF: An optimal algorithm for stochastic and adversarial bandits. Journal of Machine Learning Research, 22(28):1-49, 2021. URL https://jmlr.csail.mit.edu/papers/volume22/19-753/19-753.pdf). In particular, the proof uses modern tools from online convex optimization and avoids the use of conjugate functions. We do not optimize the constants in the bounds, in favor of a slimmer proof.

Paper Structure

This paper contains 10 sections, 3 theorems, 21 equations, 1 algorithm.

Key Result

Theorem 1

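Assume that $0\leq g_{t,i}\leq G$, for all $t=1, \dots, T$, $i=1, \dots, d$, where $d\geq 1$. Then, Algorithm 1 (Tsallis-INF) satisfies

$$\mathbb{E}[\mathrm{Regret}_T] \leq 32 G \sqrt{(d-1) T}.$$

Moreover, if in addition the $g_{t,i}$ are i.i.d. from a distribution $\rho_i$ with mean $\mu_i$ for $i=1,\dots, d$, and $\mathop{\mathrm{argmin}}_i \mu_i$ is unique, then we also have

$$\mathrm{P\text{-}Regret}_T \leq 256 G^2 \sum_{i: \mu_i \neq \mu^*} \frac{1+\ln T}{\Delta_i}.$$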
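The algorithm the theorem refers to is not reproduced in this summary. As a rough illustration of the FTRL view, here is a minimal sketch of a Tsallis-INF-style loop. Several choices are assumptions rather than the note's exact specification: the regularizer $-2\sum_i \sqrt{x_i}$, the learning rate $\eta_t = 1/(G\sqrt{t})$, the importance-weighted loss estimator, and the bisection solver. The names `tsallis_weights`, `tsallis_inf`, and `loss_fn` are hypothetical.

```python
import numpy as np

def tsallis_weights(cum_loss_est, eta, iters=50):
    """Solve argmin over the simplex of eta*<L, x> - 2*sum(sqrt(x_i)).

    The optimality conditions give x_i = 1 / (z_i + lam)^2, where
    z_i = eta * (L_i - min L) and lam is chosen so that the x_i sum to 1.
    """
    z = eta * (cum_loss_est - cum_loss_est.min())  # shifted so that min(z) = 0
    d = len(z)
    # sum_i 1/(z_i + lam)^2 is >= 1 at lam = 1 and <= 1 at lam = sqrt(d).
    lo, hi = 1.0, np.sqrt(d)
    for _ in range(iters):  # bisection on the normalization constant
        lam = 0.5 * (lo + hi)
        if np.sum((z + lam) ** -2.0) > 1.0:
            lo = lam
        else:
            hi = lam
    x = (z + hi) ** -2.0
    return x / x.sum()  # remove the residual numerical error

def tsallis_inf(loss_fn, d, T, G=1.0, seed=0):
    """Run the sketch for T rounds; loss_fn(t, arm, rng) returns a loss in [0, G]."""
    rng = np.random.default_rng(seed)
    cum_loss_est = np.zeros(d)       # importance-weighted cumulative loss estimates
    pulls = np.zeros(d, dtype=int)
    total_loss = 0.0
    for t in range(1, T + 1):
        eta = 1.0 / (G * np.sqrt(t))  # assumed learning-rate schedule
        x = tsallis_weights(cum_loss_est, eta)
        arm = rng.choice(d, p=x)
        loss = loss_fn(t, arm, rng)
        cum_loss_est[arm] += loss / x[arm]  # unbiased estimate of the loss vector
        pulls[arm] += 1
        total_loss += loss
    return pulls, total_loss
```

For instance, with Bernoulli losses of means 0.2, 0.5, and 0.8, calling `tsallis_inf(lambda t, a, rng: float(rng.random() < [0.2, 0.5, 0.8][a]), d=3, T=2000)` should concentrate most pulls on the first arm. Bisection is used here only for simplicity; any root-finding method for the normalization constant would serve.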

Theorems & Definitions (6)

  • Theorem 1
  • Lemma 2
  • proof
  • Lemma 3: Orabona19
  • proof
  • proof: Proof of Theorem 1