Efficient Best-of-Both-Worlds Algorithms for Contextual Combinatorial Semi-Bandits

Mengmeng Li, Philipp J. Schneider, Jelisaveta Aleksić, Daniel Kuhn

Abstract

We introduce the first best-of-both-worlds algorithm for contextual combinatorial semi-bandits that simultaneously guarantees $\widetilde{\mathcal{O}}(\sqrt{T})$ regret in the adversarial regime and $\widetilde{\mathcal{O}}(\ln T)$ regret in the corrupted stochastic regime. Our approach builds on the Follow-the-Regularized-Leader (FTRL) framework equipped with a Shannon entropy regularizer, yielding a flexible method that admits efficient implementations. Beyond regret bounds, we tackle the practical bottleneck in FTRL (or, equivalently, Online Stochastic Mirror Descent) arising from the high-dimensional projection step encountered in each round of interaction. By leveraging the Karush-Kuhn-Tucker conditions, we transform the $K$-dimensional convex projection problem into a single-variable root-finding problem, dramatically accelerating each round. Empirical evaluations demonstrate that this combined strategy not only attains the attractive regret bounds of best-of-both-worlds algorithms but also delivers substantial per-round speed-ups, making it well-suited for large-scale, real-time applications.
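As a rough illustration of the KKT reduction mentioned in the abstract, the sketch below works under simplifying assumptions that are not taken from the paper: a non-contextual setting in which the action set is all subsets of exactly $m$ out of $K$ base arms, so the entropy-regularized FTRL step becomes a projection onto the capped simplex $\{x \in [0,1]^K : \sum_i x_i = m\}$. Stationarity of the Lagrangian then gives $x_i = \min\{1, \exp(-s_i - \lambda)\}$, and the only unknown left is the scalar multiplier $\lambda$, which plain bisection can locate. The function name, its arguments, and the bracketing choice are hypothetical; the paper's own bisection procedure and its handling of contexts may differ.

```python
import math


def entropy_ftrl_step(scores, m, tol=1e-10, max_iter=200):
    """Hypothetical sketch: minimize  sum_i s_i*x_i + sum_i x_i*(ln x_i - 1)
    subject to sum_i x_i = m and 0 <= x_i <= 1.

    `scores` stands for the scaled cumulative loss estimates (eta * L_hat),
    which is an assumption made for this illustration.
    """
    K = len(scores)
    if not 0 < m <= K:
        raise ValueError("m must satisfy 0 < m <= K")

    def x_of(lam):
        # KKT stationarity with the upper-bound multiplier eliminated:
        # x_i = min(1, exp(-s_i - lam)). Clamping the exponent at 0 gives
        # the same value and avoids overflow.
        return [math.exp(min(0.0, -s - lam)) for s in scores]

    def g(lam):
        # Residual of the cardinality constraint; nonincreasing in lam.
        return sum(x_of(lam)) - m

    # Bracket the root: at lo every coordinate is capped at 1 (g >= 0),
    # at hi every coordinate is at most m/K (g <= 0).
    lo = -max(scores)
    hi = -min(scores) + math.log(K / m)

    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break

    return x_of(0.5 * (lo + hi))


if __name__ == "__main__":
    x = entropy_ftrl_step([0.3, 1.2, 0.1, 2.0, 0.7], m=2)
    print([round(v, 4) for v in x], round(sum(x), 6))
```

Each bisection iteration costs one pass over the $K$ coordinates, so the per-round work in this sketch is $K$ times the number of iterations needed to reach the tolerance, instead of a call to a generic $K$-dimensional convex solver.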

Paper Structure

This paper contains 12 sections, 13 theorems, 51 equations, 5 figures, 3 tables, 3 algorithms.

Key Result

Theorem 2.1

The regret of Algorithm \ref{alg:entropy-ftrl-contextual} satisfies the following.

Figures (5)

  • Figure 1: Per-iteration runtime for different regularizers.
  • Figure 2: Stochastic setting for $\Delta=0.0625$ (Tsallis regularizer).
  • Figure 3: Stochastic setting for $\Delta=0.0625$ (Negative Shannon entropy regularizer).
  • Figure 4: Adversarial setting for $\Delta=0.0625$ (Tsallis regularizer).
  • Figure 5: Adversarial setting for $\Delta=0.0625$ (Negative Shannon entropy regularizer).

Theorems & Definitions (26)

  • Theorem 2.1: Best-of-both-worlds regret guarantee for contextual combinatorial bandits
  • Lemma 2.2: Original game vs. auxiliary game \cite{neu2020efficient}
  • Lemma 2.3: Bias control
  • Lemma 2.4: Regret decomposition for the auxiliary game
  • Lemma 2.5: Refined entropy bound in the stochastic regime
  • Definition 1: Bregman divergence
  • Theorem 3.1: Convergence of Algorithm \ref{alg:bisection}
  • Corollary 3.2: Convergence with an approximate inverse oracle
  • Lemma A.1: Stability-penalty decomposition
  • Proof of Lemma \ref{lemma:stab-pen-decomp}
  • ...and 16 more