A Fast Algorithm for the Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit

Shintaro Nakamura; Masashi Sugiyama

TL;DR

This work addresses real-valued combinatorial pure exploration of multi-armed bandits (R-CPE-MAB) in the setting where the size of the action set is polynomial in the number of arms. It casts the problem as a special case of transductive linear bandits and introduces CombGapE, a gap-based exploration algorithm. In each round, CombGapE selects two candidate actions and pulls the arm that most shrinks the confidence bound on their gap. It identifies the best action with probability at least $1 - \delta$, and its sample complexity upper bound matches the information-theoretic lower bound up to a problem-dependent constant factor. The authors also derive a tight confidence bound for the arm selection strategy, compare CombGapE against RAGE, Peace, and GenTS-Explore, and report better performance on synthetic knapsack-like tasks and a real-world optimal-transport dataset. These results make pure exploration more practical for structured combinatorial problems with real-valued actions.

Abstract

We study the real-valued combinatorial pure exploration problem in the stochastic multi-armed bandit (R-CPE-MAB), focusing on the case where the size of the action set is polynomial in the number of arms. In this case, R-CPE-MAB can be seen as a special case of transductive linear bandits. We introduce the combinatorial gap-based exploration (CombGapE) algorithm, whose sample complexity upper bound matches the lower bound up to a problem-dependent constant factor. We show numerically that CombGapE significantly outperforms existing methods on both synthetic and real-world datasets.
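
To make the gap-based idea concrete, here is a minimal Python sketch of a generic gap-based exploration loop over a small, explicitly enumerated action set. It is not the authors' CombGapE implementation. The names `gap_width`, `gap_based_exploration`, and `pull` are hypothetical, the confidence width is an assumed sub-Gaussian placeholder rather than the bound of Proposition 3.1, and the greedy arm choice is one plausible way to "shrink the confidence bound on the gap".

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gap_width(pi_k, pi_l, counts, t, delta, sigma=1.0):
    """Assumed sub-Gaussian width for the estimated gap (pi_k - pi_l)^T mu.

    Placeholder form only; the paper's Proposition 3.1 may differ.
    """
    var_proxy = sum((a - b) ** 2 / n for a, b, n in zip(pi_k, pi_l, counts))
    log_term = math.log(4.0 * len(counts) * t * t / delta)
    return sigma * math.sqrt(2.0 * var_proxy * log_term)


def gap_based_exploration(actions, pull, delta=0.05, eps=0.0, max_rounds=100_000):
    """Return (index of the empirically best action, number of pulls used).

    actions: list of distinct real-valued vectors, one entry per arm.
    pull:    hypothetical callback, pull(s) returns one noisy reward of arm s.
    """
    d = len(actions[0])
    counts = [1] * d
    sums = [pull(s) for s in range(d)]  # initialise: pull every arm once
    t = d
    while True:
        mu_hat = [x / n for x, n in zip(sums, counts)]
        values = [dot(pi, mu_hat) for pi in actions]
        best = max(range(len(actions)), key=values.__getitem__)

        def upper_gap(k):
            # Optimistic estimate of how much action k could beat `best`.
            width = gap_width(actions[k], actions[best], counts, t, delta)
            return values[k] - values[best] + width

        challenger = max((k for k in range(len(actions)) if k != best), key=upper_gap)
        if upper_gap(challenger) <= eps or t >= max_rounds:
            return best, t

        # Greedy choice (an assumption): pull the arm whose extra sample
        # most reduces sum_s diff_s^2 / T_s for the current pair.
        diff = [a - b for a, b in zip(actions[best], actions[challenger])]
        arm = max(range(d), key=lambda s: diff[s] ** 2 / (counts[s] * (counts[s] + 1)))
        sums[arm] += pull(arm)
        counts[arm] += 1
        t += 1


# Toy usage with made-up means and real-valued actions.
rng = random.Random(0)
true_means = [0.1, 0.9, 0.2]
toy_actions = [[1.0, 0.0, 2.0], [0.0, 1.5, 0.5], [2.0, 0.5, 0.0]]
chosen, rounds = gap_based_exploration(
    toy_actions, lambda s: rng.gauss(true_means[s], 1.0)
)
print(chosen, rounds)
```

The sketch enumerates every action in each round, which is only reasonable under the polynomial-size assumption stated in the abstract. The `max_rounds` cap is a safety stop added for illustration and has no counterpart in the source.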

Paper Structure (19 sections, 7 theorems, 31 equations, 2 figures, 3 tables, 2 algorithms)

Introduction
Problem Formulation
The Arm Selection Strategy
Limitation of Existing Works in CPE-MAB
Confidence Bounds and the Arm Selection Strategy
CombGapE Algorithm and Theoretical Analysis
CombGapE Algorithm
Accuracy and the Sample Complexity
Comparison with Existing Works
Experiment
Experiment on Synthetic Data
Experiment on Real-World Data
Conclusion
Situations where we can assume the size of $\mathcal{A}$ is polynomial in $d$
Limitation of existing works
...and 4 more sections

Key Result

Proposition 3.1

Let $T_s(t)$ be the number of times arm $s$ has been pulled before round $t$. Then, for any $t\in\mathbb{N}$ and $\boldsymbol{\pi}^k,\boldsymbol{\pi}^l \in \mathcal{A}$, with probability at least $1 - \delta$, we have

Figures (2)

Figure 1: A simple sketch of the shortest path problem. One candidate of $\boldsymbol{\pi}$ can be $\boldsymbol{\pi} = \left(1, 0, 1, 0, 0, 1, 0 \right)^{\top}$.
Figure 2: A simple sketch of the optimal transport problem. One candidate of $\boldsymbol{\pi}$ can be $\boldsymbol{\pi} = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 1 & 2 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 2 & 2 \end{pmatrix}$.

Theorems & Definitions (12)

Proposition 3.1
Proposition 3.1
Theorem 4.1
Theorem 4.2
Lemma C.1: Hoeffding's inequality [SChen2014]
proof: Proof of Proposition 3.1
proof
proof
Lemma F.1
proof
...and 2 more
