A Batch Sequential Halving Algorithm without Performance Degradation

Sotetsu Koyamada, Soichiro Nishimori, Shin Ishii

TL;DR

This work tackles pure exploration in stochastic multi-armed bandits under fixed-size batch pulls by presenting two batched variants of Sequential Halving (SH): breadth-first SH (BSH) and advance-first SH (ASH). The main theoretical contribution is a rigorous equivalence: ASH is algorithmically identical to SH when the batch budget satisfies $B \ge \max\{4, \tfrac{n}{b}\} \lceil \log_2 n \rceil$, so ASH selects the same arm as SH run with total budget $T = b B$. Empirically, ASH achieves simple regret comparable to SH with large batches and remains competitive in smaller-batch regimes, indicating that batching gains efficiency without sacrificing performance. The results imply that batched SH approaches can scale to large budgets and costly evaluations (e.g., neural-network reward computations) while preserving the robustness of sequential SH.

Abstract

In this paper, we investigate pure exploration in multi-armed bandits, focusing on scenarios where arms are pulled in fixed-size batches. Batching has been shown to enhance computational efficiency, but it can degrade performance relative to the original sequential algorithm because of delayed feedback and reduced adaptability. We introduce a simple batch version of the Sequential Halving (SH) algorithm (Karnin et al., 2013) and provide theoretical evidence that batching does not degrade the performance of the original algorithm under practical conditions. Furthermore, we empirically validate this claim through experiments, demonstrating the robustness of the SH algorithm in fixed-size batch settings.
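
For readers unfamiliar with SH, the following is a minimal sketch of the sequential (non-batched) algorithm, not the authors' implementation. It assumes the standard round structure: $\lceil \log_2 n \rceil$ rounds, an equal per-arm budget within each round, and elimination of the worse half after each round. The `pull` callback and the choice to rank arms by the current round's samples only are assumptions of this sketch; variants that accumulate samples across rounds also exist.

```python
import random


def sequential_halving(pull, n, total_budget):
    """Return the arm index that this SH sketch recommends.

    pull(a) is a hypothetical callback that draws one reward from arm a.
    """
    if n < 2:
        raise ValueError("need at least two arms")
    num_rounds = (n - 1).bit_length()  # equals ceil(log2 n) for n >= 2
    active = list(range(n))
    for _ in range(num_rounds):
        # Split the round's share of the budget equally among active arms.
        per_arm = total_budget // (len(active) * num_rounds)
        means = {}
        for a in active:
            rewards = [pull(a) for _ in range(per_arm)]
            means[a] = sum(rewards) / max(1, len(rewards))
        # Keep the better half (rounded up) by empirical mean.
        active.sort(key=lambda a: means[a], reverse=True)
        active = active[: (len(active) + 1) // 2]
    return active[0]


# Usage on a toy Bernoulli bandit (the probabilities are made up).
random.seed(0)
probs = [0.2, 0.5, 0.8, 0.4]
best = sequential_halving(
    lambda a: float(random.random() < probs[a]), n=4, total_budget=400
)
```

Every decision to eliminate an arm waits for the whole round to finish, which is why a batched variant can reorder pulls within a round without changing the result.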

Paper Structure

This paper contains 27 sections, 2 theorems, 12 equations, 9 figures, and 7 algorithms.

Key Result

Theorem 1

Given a stochastic bandit problem with $n\geq2$ arms, let $b \geq 2$ be the batch size and $B$ be the batch budget satisfying $B \geq \max \{ 4, \frac{n}{b} \} \lceil \log_2 n \rceil$. Then, the ASH algorithm (algo:ash) is algorithmically equivalent to the SH algorithm (algo:sh) with the same total
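
The budget condition in Theorem 1 can be checked with a few lines of integer arithmetic. The sketch below is illustrative only; the function names are hypothetical. It rewrites $B \geq \max\{4, \frac{n}{b}\} \lceil \log_2 n \rceil$ as the two integer inequalities $B \geq 4\lceil \log_2 n \rceil$ and $bB \geq n\lceil \log_2 n \rceil$ to avoid floating-point division.

```python
def satisfies_ash_condition(n, b, B):
    """True if B >= max(4, n/b) * ceil(log2 n), using integers only."""
    if n < 2 or b < 2:
        raise ValueError("Theorem 1 assumes n >= 2 and b >= 2")
    log_term = (n - 1).bit_length()  # equals ceil(log2 n) for n >= 2
    return B >= 4 * log_term and B * b >= n * log_term


def min_batch_budget(n, b):
    """Smallest integer B that meets the condition for given n and b."""
    if n < 2 or b < 2:
        raise ValueError("Theorem 1 assumes n >= 2 and b >= 2")
    log_term = (n - 1).bit_length()
    ceil_ratio = -(-(n * log_term) // b)  # ceil(n * log_term / b)
    return max(4 * log_term, ceil_ratio)


print(min_batch_budget(8, 24))     # 12
print(min_batch_budget(1024, 4))   # 2560
```

For example, with $n = 8$ and $b = 24$ the condition requires $B \geq 12$, and with $n = 1024$ and $b = 4$ the $\frac{n}{b}$ term dominates and requires $B \geq 2560$.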

Figures (9)

  • Figure 1: Pictorial representation of breadth-first SH (BSH; sec:bsh) and advance-first SH (ASH; sec:ash) for an 8-armed bandit problem. Batch size $b$ is $24$ and batch budget $B$ is $8$. The same color indicates the same batch pull. For example, in the first batch pull (blue), BSH pulls each of the 8 arms 3 times, while ASH pulls 3 arms 8 times each. BSH selects arms so that the pull counts of the active arms stay as equal as possible, while ASH keeps pulling an arm, once selected, until that arm's budget for the round is exhausted. These pull sequences are characterized by the target pulls $L^{\text{B}}$ and $L^{\text{A}}$: $L^{\text{B}} = (0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,5,5,\ldots)$ and $L^{\text{A}} = (0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7,0,1,2,3,4,5,\ldots)$.
  • Figure 2: Inequality (eq:ash-to-prove).
  • Figure 3: Lemma 1 (lem).
  • Figure 4: Visualization of conditions (eq:ash-cond-2) and (eq:ash-cond) for $n \leq 1024$, $B \leq 1024$, and $b \in \{4, 64, 1024\}$.
  • Figure 5: Polynomial$(\alpha)$
  • ...and 4 more figures
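
The target-pull sequences quoted in the Figure 1 caption can be reproduced with a short sketch. It assumes the standard SH allocation (each active arm receives $\lfloor T / (|S_r| \lceil \log_2 n \rceil) \rfloor$ pulls in round $r$, and half of the arms survive), which is consistent with the caption's numbers but is not taken from the paper's algorithm listings. The function name `target_pulls` is hypothetical.

```python
def target_pulls(n, total_budget, advance_first):
    """List the target pull count for every single pull, in order.

    An entry c means: pull an active arm that has been pulled c times so far.
    """
    num_rounds = (n - 1).bit_length()  # equals ceil(log2 n) for n >= 2
    seq = []
    active = n  # number of arms still in play
    done = 0    # pulls already received by each surviving arm
    for _ in range(num_rounds):
        per_arm = total_budget // (active * num_rounds)
        counts = range(done, done + per_arm)
        if advance_first:
            # ASH order: exhaust one arm's round budget, then the next arm.
            seq.extend(c for _ in range(active) for c in counts)
        else:
            # BSH order: bring every active arm to count c before c + 1.
            seq.extend(c for c in counts for _ in range(active))
        done += per_arm
        active = (active + 1) // 2
    return seq


# Figure 1 setting: n = 8 arms, batch size b = 24, batch budget B = 8.
n, b, B = 8, 24, 8
bsh = target_pulls(n, b * B, advance_first=False)
ash = target_pulls(n, b * B, advance_first=True)
print(bsh[:12])  # [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
print(ash[:12])  # [0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3]

# The first batch holds b = 24 entries; each 0 marks a newly started arm.
print(bsh[:b].count(0), ash[:b].count(0))  # 8 3
```

The last line matches the caption: in the first batch, BSH touches all 8 arms (3 pulls each), while ASH touches only 3 arms (8 pulls each).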

Theorems & Definitions (2)

  • Theorem 1
  • Lemma 1