The Batch Complexity of Bandit Pure Exploration

Adrienne Tuynman, Rémy Degenne

TL;DR

This work addresses batched, fixed-confidence pure exploration in stochastic multi-armed bandits. It establishes an instance-dependent lower bound on batch complexity and proposes a general batched algorithm, PET, inspired by Track-and-Stop. PET runs phased uniform exploration followed by a tracking phase, with a GLR-style stopping rule, and achieves near-optimal sample complexity and batch complexity under mild assumptions. The framework is specialized to best-arm identification and thresholding bandits, with theoretical guarantees and experiments showing favorable performance against state-of-the-art baselines. Overall, the paper clarifies the trade-off between adaptivity (the number of batches) and optimality in pure exploration, and offers practical batch-efficient algorithms with provable guarantees.
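
The GLR-style stopping rule can be sketched for one concrete case. The snippet below assumes Gaussian arms with a known common variance $\sigma^2$ and the best-arm identification task; it computes the standard Gaussian GLR statistic from per-arm counts and sums and compares it to a threshold $\beta(t,\delta)=\ln((1+\ln t)/\delta)$. That threshold is a heuristic of the kind used in BAI experiments and is an assumption here, not necessarily the one analysed for PET; all function names are hypothetical.

```python
import math
from typing import Sequence


def glr_statistic_bai(counts: Sequence[int], sums: Sequence[float], sigma: float) -> float:
    """Gaussian GLR statistic for best-arm identification (known variance sigma**2).

    Let b be the empirical best arm. The statistic is the minimum over challengers k of
        N_b * N_k / (N_b + N_k) * (mean_b - mean_k)**2 / (2 * sigma**2).
    Every arm must have been pulled at least once.
    """
    means = [s / n for s, n in zip(sums, counts)]
    best = max(range(len(means)), key=lambda a: means[a])
    stat = math.inf
    for k, mean_k in enumerate(means):
        if k == best:
            continue
        n_b, n_k = counts[best], counts[k]
        gap = means[best] - mean_k
        stat = min(stat, n_b * n_k / (n_b + n_k) * gap ** 2 / (2 * sigma ** 2))
    return stat


def heuristic_threshold(t: int, delta: float) -> float:
    """Hypothetical stopping threshold beta(t, delta) = ln((1 + ln t) / delta)."""
    return math.log((1 + math.log(t)) / delta)


def should_stop(counts: Sequence[int], sums: Sequence[float], sigma: float, delta: float) -> bool:
    """Stop once the GLR statistic exceeds the threshold at the current total sample count."""
    return glr_statistic_bai(counts, sums, sigma) > heuristic_threshold(sum(counts), delta)


# Two arms, 10 pulls each, empirical means 1.0 and 0.0: statistic = 5 * 1 / 2 = 2.5.
print(glr_statistic_bai([10, 10], [10.0, 0.0], sigma=1.0))  # 2.5
print(should_stop([10, 10], [10.0, 0.0], sigma=1.0, delta=0.05))  # False
print(should_stop([100, 100], [100.0, 0.0], sigma=1.0, delta=0.05))  # True
```

When two arms tie for the empirical best, the statistic is 0 and the rule cannot stop, which matches the intuition that the data do not yet separate them.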

Abstract

In a fixed-confidence pure exploration problem in stochastic multi-armed bandits, an algorithm iteratively samples arms and should stop as early as possible and return the correct answer to a query about the arms' distributions. We are interested in batched methods, which change their sampling behaviour only a few times, between batches of observations. We give an instance-dependent lower bound on the number of batches used by any sample-efficient algorithm for any pure exploration task. We then give a general batched algorithm and prove upper bounds on its expected sample complexity and batch complexity. We illustrate both lower and upper bounds on best-arm identification and thresholding bandits.
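
To make the notion of batch complexity concrete, here is a simplified batched loop, not PET itself: within each batch the sampling plan is fixed in advance (every arm is pulled the same number of times), observations are examined only at the end of the batch, and the per-arm batch size doubles from one batch to the next. The doubling schedule, the `pull`/`stop` callbacks and all names are illustrative assumptions. The returned counts and number of batches correspond to the two quantities bounded in the abstract: sample complexity and batch complexity.

```python
from typing import Callable, List, Tuple

# pull(arm, n) returns n observations of the given arm (hypothetical interface).
Pull = Callable[[int, int], List[float]]
# stop(counts, sums) decides, from per-arm counts and sums, whether to stop.
Stop = Callable[[List[int], List[float]], bool]


def run_batched_uniform(
    pull: Pull, stop: Stop, n_arms: int, first_batch: int = 1, max_batches: int = 64
) -> Tuple[List[int], List[float], int]:
    """Uniform exploration in batches whose per-arm size doubles each time.

    Returns the per-arm counts, the per-arm sums and the number of batches used.
    """
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    per_arm = first_batch
    for batch in range(1, max_batches + 1):
        # The whole batch is decided before any of its observations is seen.
        for arm in range(n_arms):
            obs = pull(arm, per_arm)
            counts[arm] += len(obs)
            sums[arm] += sum(obs)
        # Adaptation (here: the stopping check) only happens between batches.
        if stop(counts, sums):
            return counts, sums, batch
        per_arm *= 2  # geometric growth: about log2(samples per arm) batches in total
    return counts, sums, max_batches


# Deterministic toy arms (arm a always returns a) and a stop rule needing 100 pulls per arm.
# After b batches each arm has 1 + 2 + ... + 2**(b-1) = 2**b - 1 pulls, so b = 7 (127 pulls).
counts, sums, batches = run_batched_uniform(
    pull=lambda arm, n: [float(arm)] * n,
    stop=lambda c, s: min(c) >= 100,
    n_arms=2,
)
print(counts, batches)  # [127, 127] 7
```

In practice `stop` would be a statistical test such as a GLR rule; PET additionally switches from uniform exploration to a tracking phase, which this sketch omits.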

Paper Structure

This paper contains 31 sections, 35 theorems, 121 equations, 3 figures, 1 algorithm.

Key Result

Lemma 2.0

Suppose that a $\delta$-correct algorithm satisfies $\mathbb{P}_{\bm\mu}\left(\tau_\delta >\gamma T^\star(\bm\mu)\ln(1/\delta)\right)\leq c$ for some $\gamma,c > 0$ on any Gaussian instance $\bm\mu$ with variance $\sigma^2$ and with $T^\star(\bm\mu)\in (T_{\min},T_{\max})$. Let $(\bm\mu^n)_{0\leq n\leq \ldots}$ …
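
Lemma 2.0 is phrased in terms of the characteristic time $T^\star(\bm\mu)$ and the sample scale $\gamma T^\star(\bm\mu)\ln(1/\delta)$. For one task these have a closed form. Assuming Gaussian arms with known variance $\sigma^2$ and a thresholding task that asks, for each arm, whether its mean lies above or below a known threshold $\theta$ (notation introduced here), the closest alternative moves a single arm onto the threshold, so $T^\star(\bm\mu)^{-1}=\max_{w\in\Delta_K}\min_k w_k(\mu_k-\theta)^2/(2\sigma^2)$. Equalizing the terms gives $w^\star_k\propto(\mu_k-\theta)^{-2}$ and $T^\star(\bm\mu)=2\sigma^2\sum_{k=1}^K(\mu_k-\theta)^{-2}$. The sketch below evaluates these quantities; function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def thresholding_characteristic_time(
    mu: Sequence[float], theta: float, sigma: float
) -> Tuple[float, Optional[List[float]]]:
    """T*(mu) and optimal proportions for Gaussian thresholding bandits (known sigma**2).

    T*(mu) = 2 sigma**2 * sum_k (mu_k - theta)**-2 and w*_k is proportional to (mu_k - theta)**-2.
    An arm lying exactly on the threshold makes T* infinite.
    """
    gaps = [m - theta for m in mu]
    if any(g == 0 for g in gaps):
        return math.inf, None
    inv_sq = [1.0 / g ** 2 for g in gaps]
    total = sum(inv_sq)
    return 2 * sigma ** 2 * total, [x / total for x in inv_sq]


def sample_scale(t_star: float, gamma: float, delta: float) -> float:
    """The quantity gamma * T*(mu) * ln(1/delta) appearing in Lemma 2.0."""
    return gamma * t_star * math.log(1 / delta)


t_star, w = thresholding_characteristic_time([0.0, 0.75], theta=0.5, sigma=1.0)
print(t_star, w)  # 40.0 [0.2, 0.8]
print(sample_scale(t_star, gamma=1.0, delta=0.05))
```

The arm closest to the threshold dominates both $T^\star$ and the optimal allocation, which is why instances with similar $T^\star$ can still require very different sampling plans.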

Figures (3)

  • Figure 1: Illustration of a sequence of instances
  • Figure 2: $\bm b$ satisfying $\overline{T}^\star(\mathcal{B}_\infty(\bm\mu,\varepsilon))=T^\star(\bm b)$
  • Figure 3: Experimental results, $\delta=0.05$, $N=1000$ runs
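
Figure 2 concerns the hardest instance in an $\ell_\infty$ ball around $\bm\mu$. Reading $\overline{T}^\star(\mathcal{B}_\infty(\bm\mu,\varepsilon))$ as $\sup_{\bm\lambda\in\mathcal{B}_\infty(\bm\mu,\varepsilon)}T^\star(\bm\lambda)$ (an assumption about the notation), and again assuming Gaussian thresholding bandits with threshold $\theta$ (the figure itself may illustrate a different task), each term of $T^\star=2\sigma^2\sum_k(\mu_k-\theta)^{-2}$ is maximized separately by moving every mean $\varepsilon$ closer to $\theta$. That maximizer $\bm b$ satisfies $\overline{T}^\star(\mathcal{B}_\infty(\bm\mu,\varepsilon))=T^\star(\bm b)$, and the supremum is infinite as soon as some mean is within $\varepsilon$ of $\theta$. Function names are hypothetical.

```python
import math
from typing import List, Sequence


def hardest_instance_in_ball(mu: Sequence[float], theta: float, eps: float) -> List[float]:
    """Move each mean eps towards theta (or onto theta if it is within eps of it)."""
    b = []
    for m in mu:
        if abs(m - theta) <= eps:
            b.append(theta)  # the ball reaches the threshold
        elif m > theta:
            b.append(m - eps)
        else:
            b.append(m + eps)
    return b


def thresholding_t_star(mu: Sequence[float], theta: float, sigma: float) -> float:
    """T*(mu) = 2 sigma**2 * sum_k (mu_k - theta)**-2, infinite if an arm sits on theta."""
    if any(m == theta for m in mu):
        return math.inf
    return 2 * sigma ** 2 * sum(1.0 / (m - theta) ** 2 for m in mu)


mu, theta, sigma, eps = [0.0, 1.0], 0.5, 1.0, 0.25
b = hardest_instance_in_ball(mu, theta, eps)
print(b, thresholding_t_star(b, theta, sigma))  # [0.25, 0.75] 64.0
print(thresholding_t_star(hardest_instance_in_ball(mu, theta, 0.5), theta, sigma))  # inf
```

Here $T^\star(\bm\mu)=16$ while the worst case over the ball of radius $0.25$ is $64$: halving every gap multiplies $T^\star$ by four.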

Theorems & Definitions (61)

  • Lemma 2.0
  • Theorem 2.2
  • Lemma 3.1 (Garivier and Kaufmann, 2016)
  • Lemma 3.2
  • Definition 3.3
  • Lemma 3.4
  • Proof
  • Theorem 3.5
  • Definition 3.6
  • Lemma 3.7
  • ...and 51 more