A Batch Sequential Halving Algorithm without Performance Degradation
Sotetsu Koyamada, Soichiro Nishimori, Shin Ishii
TL;DR
This work addresses pure exploration in stochastic multi-armed bandits when arms are pulled in fixed-size batches, and presents two batched variants of Sequential Halving (SH): BSH and ASH. The main theoretical result is an equivalence: for $n$ arms, batch size $b$, and batch budget $B$ (total budget $T = bB$), ASH is algorithmically equivalent to SH whenever $B \ge \max\{4, \tfrac{n}{b}\} \lceil \log_2 n \rceil$, so it selects the same arm that SH would select with budget $T$. Empirically, ASH attains simple regret comparable to SH with large batches and remains competitive with smaller batches. The results suggest that batched SH can scale to large budgets and expensive evaluations (e.g., neural-network reward computations) without losing the performance of sequential SH.
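The batch-budget condition above can be checked numerically. The following is a minimal sketch, reading $n$ as the number of arms, $b$ as the batch size, and $B$ as the number of batches, as the relation $T = bB$ suggests. The function names `ash_condition_holds` and `min_batch_budget` are hypothetical helpers introduced here for illustration; they are not taken from the paper.

```python
def ceil_log2(n: int) -> int:
    """Return ceil(log2(n)) for n >= 1 using integer arithmetic only."""
    return (n - 1).bit_length()


def ash_condition_holds(n: int, b: int, B: int) -> bool:
    """Check B >= max(4, n / b) * ceil(log2 n).

    Both sides are multiplied by b (> 0) so the comparison stays in integers.
    """
    return B * b >= max(4 * b, n) * ceil_log2(n)


def min_batch_budget(n: int, b: int) -> int:
    """Smallest integer B that satisfies the condition for given n and b."""
    scaled = max(4 * b, n) * ceil_log2(n)
    return -(-scaled // b)  # ceiling division


# Example: 32 arms and batch size 8 give max(4, 32/8) * ceil(log2 32) = 4 * 5.
print(min_batch_budget(32, 8))  # 20
```

For instance, with 100 arms and batch size 10 the threshold is $\max\{4, 10\} \cdot 7 = 70$ batches, i.e. a total budget of $T = 700$ pulls.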
Abstract
In this paper, we investigate pure exploration in multi-armed bandits, focusing on scenarios where arms are pulled in fixed-size batches. Batching has been shown to enhance computational efficiency, but it can degrade performance relative to the original sequential algorithm because feedback is delayed and adaptability is reduced. We introduce a simple batch version of the Sequential Halving (SH) algorithm (Karnin et al., 2013) and provide theoretical evidence that batching does not degrade the performance of the original algorithm under practical conditions. Furthermore, we empirically validate this claim through experiments, demonstrating the robustness of the SH algorithm in fixed-size batch settings.
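For reference, the sequential baseline can be sketched as follows. This is a minimal illustration of standard Sequential Halving in the style of Karnin et al. (2013), not the batched variants introduced in this paper: the budget is split evenly over $\lceil \log_2 n \rceil$ rounds, each surviving arm is pulled equally often within a round, and the better half of the arms survives. The `pull` callback and the Bernoulli example are assumptions made for illustration.

```python
import math
import random
from typing import Callable


def sequential_halving(pull: Callable[[int], float], n: int, T: int) -> int:
    """Return the index of the arm selected by Sequential Halving.

    pull(arm) returns one stochastic reward; n is the number of arms;
    T is the total pull budget.
    """
    rounds = max(1, (n - 1).bit_length())  # ceil(log2 n), at least one round
    survivors = list(range(n))
    for _ in range(rounds):
        # Equal share of this round's budget for every surviving arm.
        pulls_per_arm = T // (len(survivors) * rounds)
        means = {}
        for arm in survivors:
            total = sum(pull(arm) for _ in range(pulls_per_arm))
            means[arm] = total / pulls_per_arm if pulls_per_arm else 0.0
        # Keep the better half (rounded up), ranked by this round's mean.
        survivors.sort(key=lambda a: means[a], reverse=True)
        survivors = survivors[: math.ceil(len(survivors) / 2)]
    return survivors[0]


def make_bernoulli_bandit(probs, seed=0):
    """Hypothetical test bandit: arm i pays 1 with probability probs[i]."""
    rng = random.Random(seed)
    return lambda arm: 1.0 if rng.random() < probs[arm] else 0.0


if __name__ == "__main__":
    probs = [0.2, 0.5, 0.8, 0.4]
    print(sequential_halving(make_bernoulli_bandit(probs), n=len(probs), T=400))
```

In this sketch every pull is issued one at a time; a batched variant would instead have to group pulls into batches of size $b$ before observing any rewards, which is the setting the paper studies.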
