LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits

Masahiro Kato; Shinji Ito

TL;DR

This work proposes a practical Best-of-Both-Worlds (BoBW) algorithm for linear contextual bandits, the α-LC-Tsallis-INF, built on FTRL with Tsallis entropy and combining a regression estimator with a context-aware exploration policy. Its regret is O(log T) in the stochastic regime when the suboptimality gap is uniformly bounded from below and O(√T) in the adversarial regime, with improved dependence on T compared with Shannon-entropy variants. The analysis covers arm-dependent feature settings and includes a regret-transformation framework that translates results between arm-dependent and arm-independent formulations. On the computational side, the authors discuss computation via MGR, or with the exact Σ^{-1} when feasible. The result is an implementable alternative to black-box BoBW methods, and the analysis extends to milder margin conditions on the suboptimality gap.
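To make "FTRL with Tsallis entropy" concrete, here is a minimal sketch of the standard (non-contextual) 1/2-Tsallis-INF probability update. It is not the paper's algorithm: the function name `tsallis_inf_probs`, the learning rate `eta`, and the bisection solver are illustrative assumptions. In the contextual setting, `cum_loss` would presumably hold per-arm cumulative loss estimates for the observed context.

```python
import numpy as np

def tsallis_inf_probs(cum_loss, eta, iters=100):
    """Hypothetical helper: FTRL step with the 1/2-Tsallis regularizer.

    Minimizes <w, cum_loss> + (1/eta) * Psi(w) over the simplex, where
    Psi(w) = -sum_i 4 * (sqrt(w_i) - w_i / 2). The minimizer has the form
    w_i = 4 / (eta * (L_i - x))**2 for a normalizer x < min_i L_i.
    """
    L = np.asarray(cum_loss, dtype=float)
    K = L.size
    # Bracket the normalizer x: the weights sum to at most 1 at `lo`
    # and to at least 1 at `hi`, and the sum is increasing in x.
    lo = L.min() - 2.0 * np.sqrt(K) / eta
    hi = L.min() - 2.0 / eta
    for _ in range(iters):  # plain bisection on x
        mid = 0.5 * (lo + hi)
        total = np.sum(4.0 / (eta * (L - mid)) ** 2)
        if total > 1.0:
            hi = mid
        else:
            lo = mid
    w = 4.0 / (eta * (L - 0.5 * (lo + hi))) ** 2
    return w / w.sum()  # remove residual rounding error

probs = tsallis_inf_probs([1.0, 2.0, 4.0], eta=0.5)
```

Arms with smaller cumulative loss receive larger probability, but every arm keeps positive mass, which is what lets the same update serve both stochastic and adversarial regimes.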

Abstract

We investigate the \emph{linear contextual bandit problem} with independent and identically distributed (i.i.d.) contexts. In this problem, we aim to develop a \emph{Best-of-Both-Worlds} (BoBW) algorithm with regret upper bounds in both stochastic and adversarial regimes. We develop an algorithm based on \emph{Follow-The-Regularized-Leader} (FTRL) with Tsallis entropy, referred to as the $α$-\emph{Linear-Contextual (LC)-Tsallis-INF}. We show that its regret is at most $O(\log(T))$ in the stochastic regime under the assumption that the suboptimality gap is uniformly bounded from below, and at most $O(\sqrt{T})$ in the adversarial regime. Furthermore, our regret analysis is extended to more general regimes characterized by the \emph{margin condition} with a parameter $β\in (1, \infty]$, which imposes a milder assumption on the suboptimality gap. We show that the proposed algorithm achieves $O\left(\log(T)^{\frac{1+β}{2+β}}T^{\frac{1}{2+β}}\right)$ regret under the margin condition.
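As a quick sanity check on the exponents in the margin-condition bound, the following evaluates the stated rate at two values of $β$. Reading $β = \infty$ as the case where the suboptimality gap is uniformly bounded from below is our interpretation of the abstract, not a quoted statement.

```latex
\[
\lim_{\beta \to \infty} \frac{1+\beta}{2+\beta} = 1,
\qquad
\lim_{\beta \to \infty} \frac{1}{2+\beta} = 0
\quad\Longrightarrow\quad
\log(T)^{\frac{1+\beta}{2+\beta}}\, T^{\frac{1}{2+\beta}} \;\to\; \log(T),
\]
\[
\beta = 2:\qquad
\log(T)^{\frac{1+2}{2+2}}\, T^{\frac{1}{2+2}} = \log(T)^{3/4}\, T^{1/4}.
\]
```

The limit recovers the $O(\log(T))$ rate of the stochastic regime, and for finite $β$ in the stated range the polynomial factor $T^{1/(2+β)}$ grows more slowly than $\sqrt{T}$.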

Paper Structure (54 sections, 25 theorems, 134 equations, 1 table, 2 algorithms)

Introduction
Problem setting
Contributions
Related work
Preliminaries
Boundedness of variables
Assumptions on contexts
Contexts and feature map
Contexts in a stochastic regime
Exploration policy
DGP: stochastic and adversarial regimes
(1) Adversarial regime
(2) Stochastic regime with a margin condition
(3) Adversarial regime with a self-bounding constraint
Algorithm: $α$-LC-Tsallis-INF
...and 39 more sections

Key Result

Theorem 4.1

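Consider the $1/2$-LC-Tsallis-INF. Suppose that Assumptions asm:contextual_dist and asm:contextual_dist2--asm:bounded_random hold. Then the regret satisfies [bound missing from the extracted text], where $\omega_t \in [0, 1]$ is given as [definition missing from the extracted text].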

Theorems & Definitions (48)

Definition 2.5: Stochastic regime with a margin condition
Definition 2.6: Adversarial regime with a self-bounding constraint
Theorem 4.1: General regret bounds
Theorem 4.2: Regret upper bound in an adversarial regime
Theorem 4.3: Regret upper bound in a stochastic regime with a margin condition
Theorem 4.4: Regret upper bound in an adversarial regime with a self-bounding constraint
Theorem 4.5
Proposition A.1
Proof: Proof of Proposition \ref{lem:basic}
Lemma A.2
...and 38 more
