The Batch Complexity of Bandit Pure Exploration
Adrienne Tuynman, Rémy Degenne
TL;DR
This work addresses batched, fixed-confidence pure exploration in stochastic multi-armed bandits. It establishes an instance-dependent lower bound on batch complexity and proposes a general batched algorithm, PET, inspired by Track-and-Stop. PET runs phased uniform exploration followed by a tracking phase with a generalized likelihood ratio (GLR) style stopping rule, and achieves near-optimal sample complexity and batch complexity under mild assumptions. The framework is specialized to Best Arm Identification and Thresholding Bandits, with theoretical guarantees and experiments demonstrating favorable performance against state-of-the-art baselines. Overall, the paper clarifies the trade-off between adaptivity (the number of batches) and sample optimality in pure exploration, and offers practical batch-efficient algorithms with provable guarantees.
Abstract
In a fixed-confidence pure exploration problem in stochastic multi-armed bandits, an algorithm iteratively samples arms and should stop as early as possible and return the correct answer to a query about the arms' distributions. We are interested in batched methods, which change their sampling behaviour only a few times, between batches of observations. We give an instance-dependent lower bound on the number of batches used by any sample-efficient algorithm for any pure exploration task. We then give a general batched algorithm and prove upper bounds on its expected sample complexity and batch complexity. We illustrate both lower and upper bounds on best-arm identification and thresholding bandits.
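To make the batched setting concrete, the following is a minimal sketch of a batched best-arm identification loop. It is not the algorithm analysed in the paper: it samples all arms uniformly in batches of doubling size and checks a GLR-style stopping rule only between batches. The function names (`glr_best_arm`, `batched_uniform_bai`), the unit-variance Gaussian model, the doubling schedule, and the stopping threshold are all illustrative assumptions.

```python
import math
import random


def glr_best_arm(means, counts):
    """Return (empirical best arm, GLR statistic), assuming unit-variance Gaussian arms."""
    best = max(range(len(means)), key=lambda a: means[a])
    # Smallest pairwise statistic between the empirical best arm and any other arm.
    stat = min(
        (means[best] - means[b]) ** 2 / (2.0 * (1.0 / counts[best] + 1.0 / counts[b]))
        for b in range(len(means))
        if b != best
    )
    return best, stat


def batched_uniform_bai(pull, n_arms, delta, first_batch=1, max_batches=30):
    """Uniform sampling in doubling batches; requires n_arms >= 2.

    Returns (recommended arm, total samples, number of batches used).
    """
    sums = [0.0] * n_arms
    counts = [0] * n_arms
    per_arm = first_batch
    for batch in range(1, max_batches + 1):
        # The allocation of the whole batch is fixed before any of its rewards are seen.
        for arm in range(n_arms):
            for _ in range(per_arm):
                sums[arm] += pull(arm)
            counts[arm] += per_arm
        means = [s / c for s, c in zip(sums, counts)]
        best, stat = glr_best_arm(means, counts)
        total = sum(counts)
        # Heuristic threshold (assumption), not a calibrated one with a proven guarantee.
        threshold = math.log((1.0 + math.log(total)) / delta)
        if stat >= threshold:
            return best, total, batch
        per_arm *= 2  # doubling schedule (assumption)
    return best, total, max_batches


if __name__ == "__main__":
    rng = random.Random(0)
    true_means = [0.0, 0.3, 1.0]  # hypothetical instance
    arm, samples, batches = batched_uniform_bai(
        lambda a: rng.gauss(true_means[a], 1.0), n_arms=3, delta=0.05
    )
    print(arm, samples, batches)
```

With doubling batches the number of batches grows only logarithmically in the number of samples, but uniform allocation ignores the instance. The trade-off between the number of batches and sample efficiency is the subject of the bounds stated above.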
