Optimal Arm Elimination Algorithms for Combinatorial Bandits

Yuxiao Wen, Yanjun Han, Zhengyuan Zhou

TL;DR

This work introduces a novel elimination scheme that partitions arms into three categories (confirmed, active, and eliminated) and uses explicit exploration to update these sets. It demonstrates the efficacy of the algorithm in two settings: the combinatorial multi-armed bandit with general graph feedback, and the combinatorial linear contextual bandit.

Abstract

Combinatorial bandits extend the classical bandit framework to settings where the learner selects multiple arms in each round, motivated by applications such as online recommendation and assortment optimization. While extensions of upper confidence bound (UCB) algorithms arise naturally in this context, adapting arm elimination methods has proved more challenging. We introduce a novel elimination scheme that partitions arms into three categories (confirmed, active, and eliminated), and incorporates explicit exploration to update these sets. We demonstrate the efficacy of our algorithm in two settings: the combinatorial multi-armed bandit with general graph feedback, and the combinatorial linear contextual bandit. In both cases, our approach achieves near-optimal regret, whereas UCB-based methods can provably fail due to insufficient explicit exploration. Matching lower bounds are also provided.
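
The three-set idea can be illustrated with a toy sketch. This is not the paper's algorithm: it assumes a simpler top-$m$ selection problem with semi-bandit feedback (the reward of every chosen arm is observed), rewards in $[0,1]$, and a Hoeffding-style confidence radius. The function name `three_set_elimination`, the `pull` callback, and the round-robin exploration rule are hypothetical choices made only for illustration.

```python
import math


def three_set_elimination(pull, num_arms, m, horizon, delta=0.05):
    """Toy top-m elimination with confirmed / active / eliminated sets.

    Illustrative sketch only (assumed setting: semi-bandit feedback,
    rewards in [0, 1]); `pull(a)` is a hypothetical reward callback.
    """
    counts = [0] * num_arms
    sums = [0.0] * num_arms
    confirmed, eliminated = set(), set()
    active = list(range(num_arms))
    cursor = 0  # round-robin pointer used for explicit exploration

    def bounds(a):
        # Lower and upper confidence bounds; unobserved arms are unbounded.
        if counts[a] == 0:
            return -math.inf, math.inf
        mean = sums[a] / counts[a]
        # Hoeffding-style radius (an assumed choice, not taken from the paper).
        rad = math.sqrt(
            math.log(2 * num_arms * horizon / delta) / (2 * counts[a])
        )
        return mean - rad, mean + rad

    def update_sets():
        nonlocal active
        changed = True
        while changed and active:
            slots = m - len(confirmed)  # positions not yet filled
            lcb, ucb = {}, {}
            for a in active:
                lcb[a], ucb[a] = bounds(a)
            new_conf, new_elim = [], []
            for a in active:
                others = [b for b in active if b != a]
                maybe_better = sum(1 for b in others if ucb[b] >= lcb[a])
                surely_better = sum(1 for b in others if lcb[b] > ucb[a])
                if maybe_better <= slots - 1:
                    new_conf.append(a)  # a is surely among the best `slots`
                elif surely_better >= slots:
                    new_elim.append(a)  # a is surely outside the best `slots`
            confirmed.update(new_conf)
            eliminated.update(new_elim)
            active = [
                a for a in active
                if a not in confirmed and a not in eliminated
            ]
            changed = bool(new_conf or new_elim)

    for _ in range(horizon):
        if not active:
            break
        slots = m - len(confirmed)
        k = min(slots, len(active))
        # Explicit exploration: fill the free slots with active arms in turn.
        explore = [active[(cursor + i) % len(active)] for i in range(k)]
        cursor += k
        for a in sorted(confirmed) + explore:
            sums[a] += pull(a)
            counts[a] += 1
        update_sets()

    return confirmed, set(active), eliminated
```

In this sketch, confirmed arms are played every round, the remaining slots rotate through the active arms so that each keeps receiving observations, and eliminated arms are never played again. Calling it with a callback such as `lambda a: means[a]` for a hypothetical list `means` returns the three sets at the end of the horizon.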

Paper Structure

This paper contains 34 sections, 29 theorems, 111 equations, 5 algorithms.

Key Result

Lemma 1

Fix any $\delta\in(0,1)$. With probability at least $1-\delta$, we have for every arm $a$ at every time $t$, where $\bar{r}_{t,a}$ is the empirical mean and $n_{t,a}$ the number of observations at time $t$.

Theorems & Definitions (43)

  • Lemma 1
  • Lemma 2
  • Theorem 2.1: Instance-dependent regret
  • Corollary 1
  • Theorem 2.2
  • Theorem 2.3: Minimax regret
  • Theorem 2.4
  • Lemma 3
  • Remark 1
  • Lemma 4
  • ...and 33 more