Optimal Batched Best Arm Identification

Tianyuan Jin; Yu Yang; Jing Tang; Xiaokui Xiao; Pan Xu

TL;DR

This work addresses batched best arm identification under fixed confidence, aiming to identify the unique best arm with probability at least $1-\delta$ while minimizing both sample and batch complexity. It introduces Tri-BBAI, a four-stage batched algorithm that attains asymptotically optimal sample complexity with only three batches in expectation by leveraging batch-wise allocations $\bm{w}^*(\bm{b}^q)$ and budgets $T^*(\bm{b}^q)$ and a Chernoff-based stopping rule. Building on Tri-BBAI, the paper presents Opt-BBAI, which achieves near-optimal non-asymptotic sample and batch complexity and matches Tri-BBAI asymptotically as $\delta\to0$, without conditioning on the event of returning the best arm. A key novelty is the introduction of a “Checking for Best Arm Elimination” procedure in Stage IV, which decouples complexity from low-probability events and may benefit other elimination-based batched BAI methods. Overall, the results yield practical batched strategies with strong theoretical guarantees and favorable empirical performance compared to Track-and-Stop and other batched baselines.
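The Chernoff-based stopping rule mentioned above can be illustrated with a minimal sketch. It assumes unit-variance Gaussian rewards, which is a simplification chosen here for a closed-form statistic and not necessarily the paper's reward model. The function names `gaussian_glr` and `chernoff_should_stop` are hypothetical, and the threshold is passed in by the caller because the exact form of $\beta(t,\delta)$ is not reproduced on this page.

```python
from typing import Sequence, Tuple


def gaussian_glr(mean_a: float, n_a: int, mean_b: float, n_b: int) -> float:
    """Generalized likelihood ratio statistic Z_{a,b} for two arms.

    Assumes unit-variance Gaussian rewards (an illustrative assumption),
    for which Z_{a,b} = n_a * n_b / (n_a + n_b) * (mean_a - mean_b)^2 / 2
    when mean_a > mean_b, and 0 otherwise.
    """
    if mean_a <= mean_b:
        return 0.0
    return (n_a * n_b) / (n_a + n_b) * (mean_a - mean_b) ** 2 / 2.0


def chernoff_should_stop(
    means: Sequence[float], counts: Sequence[int], threshold: float
) -> Tuple[bool, int]:
    """Return (stop, empirical_best_arm).

    Stops when the smallest pairwise statistic between the empirical best
    arm and every other arm reaches the caller-supplied threshold.
    """
    best = max(range(len(means)), key=lambda i: means[i])
    z_min = min(
        gaussian_glr(means[best], counts[best], means[b], counts[b])
        for b in range(len(means))
        if b != best
    )
    return z_min >= threshold, best
```

With three arms, empirical means `[1.0, 0.5, 0.0]`, and 100 pulls each, the binding comparison is between the two closest arms, so the rule keeps sampling until that pair is separated.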

Abstract

We study the batched best arm identification (BBAI) problem, where the learner's goal is to identify the best arm while switching the policy as little as possible. In particular, we aim to find the best arm with probability $1-δ$ for some small constant $δ>0$ while minimizing both the sample complexity (total number of arm pulls) and the batch complexity (total number of batches). We propose the three-batch best arm identification (Tri-BBAI) algorithm, which is the first batched algorithm that achieves the optimal sample complexity in the asymptotic setting (i.e., $δ\rightarrow 0$) and runs in $3$ batches in expectation. Based on Tri-BBAI, we further propose the almost optimal batched best arm identification (Opt-BBAI) algorithm, which is the first algorithm that achieves near-optimal sample and batch complexity in the non-asymptotic setting (i.e., $δ$ is finite), while enjoying the same batch and sample complexity as Tri-BBAI when $δ$ tends to zero. Moreover, in the non-asymptotic setting, the complexity of previous batched algorithms is usually conditioned on the event that the best arm is returned (which holds with probability at least $1-δ$), and it is potentially unbounded in cases where a sub-optimal arm is returned. In contrast, the complexity of Opt-BBAI does not rely on such an event. This is achieved through a novel procedure that we design for checking whether the best arm has been eliminated, which is of independent interest.

Paper Structure (33 sections, 20 theorems, 116 equations, 4 tables, 2 algorithms)

Introduction
Related Work
Achieving Asymptotic Optimality with at Most Three Batches
Reward Distribution
The Proposed Three-Batch Algorithm
Stage I: Initial exploration.
Stage II: Exploration using $\bm{w}^*(\bm{b}^{q})$ and $T^*(\bm{b}^{q})$.
Stage III: Statistical test with Chernoff’s stopping rule.
Stage IV: Re-exploration.
Theoretical Guarantees of Tri-BBAI
Best of Both Worlds: Achieving Asymptotic and Non-asymptotic Optimalities
Successive Elimination.
Checking for Best Arm Elimination.
Conclusion, Limitations, and Future Work
Computing $w^*({\mu})$ and $T^*({\mu})$
...and 18 more sections

Key Result

Theorem 3.1

Given any $\delta>0$, let $\epsilon=\frac{1}{\log\log(\delta^{-1})}$, $L_1=\sqrt{\log \delta^{-1}}$, $L_2=\frac{\log \delta^{-1}\,\log\log \delta^{-1}}{n}$, and $L_3=(\log \delta^{-1})^2$. Meanwhile, for any given $\alpha\in (1,e/2]$, define the function $\beta(t,\delta)$ as $\beta(t,\delta)=\log(\ldots)$ …
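To get a feel for the scale of the quantities in the theorem statement, the following sketch evaluates $\epsilon$, $L_1$, $L_2$, and $L_3$ for a given $\delta$. The function name `tri_bbai_parameters` is hypothetical, $n$ is assumed to be the number of arms (the excerpt above does not define it), and the guard $\delta < 1/e$ is added here only so that $\log\log(\delta^{-1})$ is positive.

```python
import math


def tri_bbai_parameters(delta: float, n: int) -> dict:
    """Evaluate epsilon, L1, L2, L3 as defined in the theorem statement.

    Assumptions (not from the source): n is the number of arms, and
    delta must lie in (0, 1/e) so that log log(1/delta) > 0.
    """
    if not 0.0 < delta < 1.0 / math.e:
        raise ValueError("delta must lie in (0, 1/e)")
    if n <= 0:
        raise ValueError("n must be a positive integer")

    log_inv = -math.log(delta)       # log(1/delta), computed without forming 1/delta
    loglog_inv = math.log(log_inv)   # log log(1/delta)

    return {
        "epsilon": 1.0 / loglog_inv,
        "L1": math.sqrt(log_inv),
        "L2": log_inv * loglog_inv / n,
        "L3": log_inv ** 2,
    }


if __name__ == "__main__":
    for name, value in tri_bbai_parameters(delta=1e-10, n=5).items():
        print(f"{name} = {value:.4f}")
```

Note how slowly $\epsilon$ shrinks: it decays only as $1/\log\log(\delta^{-1})$, so very small $\delta$ is needed before it becomes small, which is consistent with the theorem being an asymptotic statement.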

Theorems & Definitions (23)

Theorem 3.1: Asymptotic Sample Complexity
Theorem 3.2: Correctness
Theorem 3.3: Asymptotic Batch Complexity
Remark 3.4
Example 4.1
Theorem 4.2
Theorem 4.3
Remark 4.4
Lemma A.1: Garivier and Kaufmann (2016)
Lemma A.2: Garivier and Kaufmann (2016)
...and 13 more
