A Fast Algorithm for the Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit

Shintaro Nakamura, Masashi Sugiyama

TL;DR

This work addresses real-valued combinatorial pure exploration of multi-armed bandits (R-CPE-MAB) when the size of the action set is polynomial in the number of arms. It casts the problem as a transductive linear bandit and introduces CombGapE, a gap-based exploration algorithm. In each round, CombGapE selects two candidate actions and pulls the arm that is most informative for shrinking the confidence bound on the gap between them. It identifies the best action with probability at least $1 - \delta$, with a sample complexity that matches the information-theoretic lower bound up to a problem-dependent constant factor. The authors also derive a tight confidence bound used for arm selection, compare against RAGE, Peace, and GenTS-Explore, and demonstrate superior performance on knapsack-like synthetic tasks and a real-world optimal-transport dataset. The results advance practical pure exploration in linear and combinatorial bandits, enabling efficient real-valued decision-making in structured combinatorial problems under uncertainty.
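The two-action, one-arm step described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the helper names `gap_width` and `gap_based_step`, the constant `c`, and the greedy arm-selection rule (pull the arm whose next sample most shrinks $\sum_s (\pi^k_s - \pi^l_s)^2 / T_s$) are hypothetical illustrative choices. CombGapE's exact selection rule and confidence radius are the ones given in the paper.

```python
import numpy as np

def gap_width(pi_k, pi_l, counts, c):
    """Hypothetical confidence width on (pi_k - pi_l)^T mu: sqrt(c * sum_s diff_s^2 / T_s)."""
    diff = pi_k - pi_l
    return np.sqrt(c * np.sum(diff ** 2 / counts))

def gap_based_step(actions, mu_hat, counts, c):
    """One round of a gap-based exploration step (a sketch, not the paper's exact rule).

    actions: (n, d) array, one real-valued action vector per row (n >= 2 assumed)
    mu_hat:  (d,) empirical arm means
    counts:  (d,) pull counts, all assumed >= 1
    c:       assumed confidence constant
    Returns (best index k, challenger index l, gap upper bound B, arm s to pull).
    """
    values = actions @ mu_hat
    k = int(np.argmax(values))                      # empirical best action
    ucb_gaps = np.array([
        values[l] - values[k] + gap_width(actions[l], actions[k], counts, c)
        for l in range(len(actions))
    ])
    ucb_gaps[k] = -np.inf                           # the challenger must differ from k
    l = int(np.argmax(ucb_gaps))                    # most ambiguous challenger
    diff_sq = (actions[l] - actions[k]) ** 2
    # Greedy (hypothetical) arm choice: the drop in diff_s^2 / T_s from one extra pull
    # of arm s is diff_s^2 / (T_s * (T_s + 1)); pull the arm with the largest drop.
    reduction = diff_sq / (counts * (counts + 1))
    s = int(np.argmax(reduction))
    return k, l, float(ucb_gaps[l]), s

# Small hypothetical example: 3 actions over 3 arms.
actions = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [0.5, 0.5, 0.0]])
mu_hat = np.array([0.9, 0.5, 0.2])
counts = np.array([4.0, 1.0, 10.0])
k, l, B, s = gap_based_step(actions, mu_hat, counts, c=1.0)
```

A full run in this sketch would repeat the step: pull arm `s`, update `mu_hat` and `counts`, and stop once `B` drops to zero or below (or below a tolerance), returning action `k`.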

Abstract

We study the real-valued combinatorial pure exploration problem in the stochastic multi-armed bandit (R-CPE-MAB). We study the case where the size of the action set is polynomial with respect to the number of arms. In such a case, the R-CPE-MAB can be seen as a special case of the so-called transductive linear bandits. We introduce an algorithm named the combinatorial gap-based exploration (CombGapE) algorithm, whose sample complexity upper bound matches the lower bound up to a problem-dependent constant factor. We numerically show that the CombGapE algorithm outperforms existing methods significantly in both synthetic and real-world datasets.
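The reduction to transductive linear bandits can be illustrated on a toy instance. In this sketch, which assumes unit-variance Gaussian noise and uses a made-up mean vector `mu` and action matrix `A`, the measurement set is the standard basis $\{e_1, \dots, e_d\}$ (pulling arm $s$ observes $\mu_s$ plus noise), while the item set to be ranked is the action set $\mathcal{A}$ itself, so each action's value is $\boldsymbol{\pi}^\top \boldsymbol{\mu}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical instance: d = 4 arms with unknown means mu.
mu = np.array([0.8, 0.3, 0.5, 0.1])

# Action set given as an explicit (n, d) matrix; its size n is assumed polynomial in d,
# so it can be enumerated. Entries are real-valued, not restricted to {0, 1}.
A = np.array([
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 3.0, 0.0, 1.0],
    [0.5, 0.5, 0.5, 0.5],
])

def pull(s):
    """Measurement e_s: observe arm s with (assumed) unit-variance Gaussian noise."""
    return mu[s] + rng.normal()

values = A @ mu                   # expected reward pi^T mu of each action (item)
best = int(np.argmax(values))     # the action the learner must identify
gaps = values[best] - values      # suboptimality gaps, which drive sample complexity

# Uniform sampling baseline: 200 pulls per arm, then plug-in estimates.
mu_hat = np.array([np.mean([pull(s) for _ in range(200)]) for s in range(mu.size)])
```

Uniform sampling is only a baseline here; the point of gap-based methods is to spend pulls on the arms that separate the closest competing actions instead.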

Paper Structure (19 sections, 7 theorems, 31 equations, 2 figures, 3 tables, 2 algorithms)

Key Result

Proposition 3.1

Let $T_s(t)$ be the number of times arm $s$ has been pulled before round $t$. Then, for any $t\in\mathbb{N}$ and $\boldsymbol{\pi}^k,\boldsymbol{\pi}^l \in \mathcal{A}$, with probability at least $1 - \delta$, we have

Figures (2)

  • Figure 1: A simple sketch of the shortest path problem. One candidate for $\boldsymbol{\pi}$ is $\boldsymbol{\pi} = \left(1, 0, 1, 0, 0, 1, 0 \right)^{\top}$.
  • Figure 2: A simple sketch of the optimal transport problem. One candidate for $\boldsymbol{\pi}$ is $\boldsymbol{\pi} = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 1 & 2 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 2 & 2 \end{pmatrix}$.
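A transport plan is a matrix, while an R-CPE-MAB action is a vector. One way to connect the two, assumed here for illustration, is to treat each cell $(i, j)$ of the plan as an arm and flatten the plan into a real-valued action vector, so that its value is the Frobenius inner product with a matrix of unknown per-cell means. The supplies, demands, plan, and costs below are hypothetical and are not the instance drawn in Figure 2. Because transport is usually a cost-minimization problem, it can be cast as reward maximization by negating the costs.

```python
import numpy as np

# Hypothetical 2 x 3 transport problem: supplies (row sums) and demands (column sums).
supply = np.array([3.0, 2.0])
demand = np.array([1.0, 2.0, 2.0])

# One feasible plan; each cell (i, j) is treated as an arm.
plan = np.array([
    [1.0, 2.0, 0.0],
    [0.0, 0.0, 2.0],
])
assert np.allclose(plan.sum(axis=1), supply) and np.allclose(plan.sum(axis=0), demand)

pi = plan.ravel()                                  # action vector of length 6 (row-major)
mu = np.array([0.2, 0.5, 0.9, 0.7, 0.4, 0.1])      # hypothetical unknown per-cell costs
total_cost = pi @ mu                               # equals sum_ij plan_ij * M_ij
reward = -total_cost                               # maximize reward = minimize cost
```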

Theorems & Definitions (12)

  • Proposition 3.1
  • Proposition 3.1
  • Theorem 4.1
  • Theorem 4.2
  • Lemma C.1: Hoeffding's inequality (Chen et al., 2014)
  • proof: Proof of Proposition 3.1
  • proof
  • proof
  • Lemma F.1
  • proof
  • ...and 2 more
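Proposition 3.1 and Lemma C.1 (Hoeffding's inequality) concern confidence bounds on the estimated gap between two actions $\boldsymbol{\pi}^k$ and $\boldsymbol{\pi}^l$. The sketch below does not reproduce the paper's bound. It shows the standard sub-Gaussian argument such a bound typically rests on, assuming independent $R$-sub-Gaussian noise and fixed pull counts $T_s$. Writing $\hat{\boldsymbol{\mu}}$ for the vector of empirical arm means (notation assumed here), the gap error $\sum_s (\pi^k_s - \pi^l_s)(\hat\mu_s - \mu_s)$ is sub-Gaussian with variance proxy $R^2 \sum_s (\pi^k_s - \pi^l_s)^2 / T_s$. The function name and the parameter `delta_prime` are hypothetical. Making the bound hold for all rounds and all pairs simultaneously would require splitting $\delta$ across them (e.g. by a union bound), which is not shown.

```python
import numpy as np

def hoeffding_gap_width(pi_k, pi_l, counts, R, delta_prime):
    """Width w with P(|(pi_k - pi_l)^T (mu_hat - mu)| > w) <= delta_prime for fixed counts,
    assuming independent R-sub-Gaussian noise on every pull (not Proposition 3.1 itself)."""
    diff = np.asarray(pi_k, dtype=float) - np.asarray(pi_l, dtype=float)
    var_proxy = R ** 2 * np.sum(diff ** 2 / counts)    # variance proxy of the gap error
    return np.sqrt(2.0 * var_proxy * np.log(2.0 / delta_prime))

# Monte Carlo sanity check with Gaussian noise (R = 1) and fixed, hypothetical pull counts.
rng = np.random.default_rng(1)
mu = np.array([0.4, 0.7, 0.2])
pi_k, pi_l = np.array([1.0, 0.0, 2.0]), np.array([0.0, 1.5, 1.0])
counts = np.array([5, 20, 10])
w = hoeffding_gap_width(pi_k, pi_l, counts, R=1.0, delta_prime=0.05)

trials = 20000
# The mean of n unit-variance Gaussian pulls of arm s is N(mu_s, 1 / n).
mu_hat = rng.normal(mu, 1.0 / np.sqrt(counts), size=(trials, mu.size))
errors = np.abs((mu_hat - mu) @ (pi_k - pi_l))
coverage = np.mean(errors <= w)    # empirical coverage; the bound promises >= 0.95
```

Here the true gap error is exactly Gaussian, so the empirical coverage comes out well above $0.95$; the Hoeffding-style width is conservative for Gaussian noise.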