Box Thirding: Anytime Best Arm Identification under Insufficient Sampling

Seohwa Hwang; Junyong Park

TL;DR

The paper tackles Best Arm Identification (BAI) when the budget is unknown or fixed, focusing on the data-poor regime in which not all arms can be evaluated. It introduces Box Thirding (B3), a fully anytime algorithm that builds a hierarchical box structure and uses iterative ternary comparisons to promote strong candidates, defer uncertain ones, and discard weak arms, while reusing past samples to refine decisions. The authors decompose the misidentification probability into a non-inclusion term and a within-set misidentification term, derive sharp bounds that hold under the data-poor condition, and show that B3 matches or improves upon existing anytime BAI methods by maximizing screening capacity and achieving fast error decay. Empirical results on the New Yorker Cartoon Caption Contest (NYCCC) dataset show that B3 performs robustly across high, moderate, and deterministic noise regimes, which supports its practical value for large-scale, budget-constrained BAI problems. The work gives theoretical and empirical support for an algorithm that balances screening and discrimination without knowledge of the budget. Potential extensions to sample reuse and to non-data-poor scenarios could further narrow the gap to fixed-budget methods.

Abstract

We introduce Box Thirding (B3), a flexible and efficient algorithm for Best Arm Identification (BAI) under fixed-budget constraints. It is designed for both anytime BAI and scenarios with large $N$, where the number of arms is too large for exhaustive evaluation within a limited budget $T$. The algorithm employs an iterative ternary comparison: in each iteration, three arms are compared; the best-performing arm is explored further, the median is deferred for future comparisons, and the weakest is discarded. Even without prior knowledge of $T$, B3 achieves an $\epsilon$-best arm misidentification probability comparable to Successive Halving (SH), which requires $T$ as a predefined parameter, applied to a randomly selected subset of $c_0$ arms that fit within the budget. Empirical results show that B3 outperforms existing methods under limited-budget constraints in terms of simple regret, as demonstrated on the New Yorker Cartoon Caption Contest dataset.
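The single ternary comparison described in the abstract can be sketched as follows. This is only an illustration of one comparison step, not the B3 algorithm itself: the function name `ternary_compare`, the `pull` callback, and the fixed `n_pulls` per arm are assumptions introduced here, and the box hierarchy and sample reuse from the paper are left out.

```python
def ternary_compare(arm_ids, pull, n_pulls):
    """Rank three arms by empirical mean reward.

    Returns (promoted, deferred, discarded): the arm with the highest
    empirical mean, the median arm, and the lowest arm.
    `pull(arm_id)` is a hypothetical callback returning one reward sample.
    """
    if len(arm_ids) != 3:
        raise ValueError("ternary_compare expects exactly three arms")
    if n_pulls < 1:
        raise ValueError("n_pulls must be at least 1")

    # Estimate each arm's mean from n_pulls fresh samples.
    estimates = {}
    for arm in arm_ids:
        samples = [pull(arm) for _ in range(n_pulls)]
        estimates[arm] = sum(samples) / len(samples)

    # Sort from best to worst empirical mean.
    ranked = sorted(arm_ids, key=lambda a: estimates[a], reverse=True)
    return ranked[0], ranked[1], ranked[2]


# Usage with noiseless, made-up rewards (arm id -> fixed reward).
true_means = {"a": 0.1, "b": 0.9, "c": 0.5}
promoted, deferred, discarded = ternary_compare(
    ["a", "b", "c"], pull=lambda arm: true_means[arm], n_pulls=5
)
```

With noisy rewards the ranking can be wrong, which is why the abstract describes the median arm as deferred for later comparison rather than discarded.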

Paper Structure (45 sections, 21 theorems, 127 equations, 5 figures, 2 tables, 5 algorithms)

Introduction
Preliminaries
Setup and Notations
Related Work
Theoretical Results of BAI
Comparison of key BAI Algorithms in Fixed Budget Setting
Strategies for algorithms under anytime setting/data-poor regime
The Box Thirding Algorithm
Box Operations and Main Algorithm
Justification of the Box Thirding Algorithm
Theoretical Analysis
Candidate Set and Data-poor Condition
Main Result
Non-Inclusion Probability of the $\epsilon/2$-Best Arm
Misidentification Probability of the $\epsilon/2$-Best Arm Within Set $C$
...and 30 more sections

Key Result

Theorem 4.3

Under the data-poor condition for $\epsilon$, the B3 algorithm satisfies an upper bound on the misidentification probability that depends on $N_{\epsilon/2}$, the number of $\epsilon/2$-best arms.
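The two-term decomposition mentioned in the TL;DR can be written as a generic union-type bound. The notation here is an assumption made for illustration: $\hat{a}$ denotes the arm returned by the algorithm and $E$ the event that the candidate set $C$ contains at least one $\epsilon/2$-best arm. The explicit rates that the paper derives for each term are not reproduced.

$$
\Pr\big(\hat{a} \text{ is not } \epsilon\text{-best}\big)
= \Pr\big(\hat{a} \text{ is not } \epsilon\text{-best},\, E^{c}\big) + \Pr\big(\hat{a} \text{ is not } \epsilon\text{-best},\, E\big)
\le \underbrace{\Pr\big(E^{c}\big)}_{\text{non-inclusion}} + \underbrace{\Pr\big(\hat{a} \text{ is not } \epsilon\text{-best},\, E\big)}_{\text{misidentification within } C}.
$$

The first term is small when many $\epsilon/2$-best arms exist (large $N_{\epsilon/2}$), since it is then easier for at least one of them to enter $C$.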

Figures (5)

Figure 1: Toy example of remedian estimation: partition the data into three blocks and take their within-block medians $(2.8,\,5.3,\,4.8)$; taking the median of these medians yields $4.8$. Repeating this hierarchical “median-of-medians” construction produces an estimator that converges in probability to the population median.
Figure 2: Illustration of ARRANGE_BOX($l,j;D$) when $\hat{\mu}_{i_1} > \hat{\mu}_{i_2} > \hat{\mu}_{i_3}$. The DISCARD operation is omitted for clarity.
Figure 3: Fraction of arms that are lifted, shifted, and discarded at a fixed level $l$.
Figure 4: Simulation results on the NYCCC 893 dataset under three reward noise regimes. Curves indicate mean performance and shaded regions denote the 25%--75% quantile range.
Figure 5: Simulation results on the NYCCC 893 dataset under different reward distributions. Curves indicate the mean performance, and shaded regions correspond to the 25%--75% quantile range.
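The median-of-medians construction in the Figure 1 caption can be sketched in a few lines. This is a simplified batch version that assumes the input length is an exact power of the block size; the function name `remedian` and the nine data values are made up for illustration, chosen only so that the block medians match the caption's $(2.8,\,5.3,\,4.8)$.

```python
from statistics import median


def remedian(values, base=3):
    """Hierarchical median-of-medians for inputs of length base**k.

    Each pass splits the current list into consecutive blocks of `base`
    items and replaces every block with its median, until one value remains.
    """
    level = list(values)
    if not level:
        raise ValueError("values must be non-empty")
    while len(level) > 1:
        if len(level) % base != 0:
            raise ValueError("length must be a power of the block size")
        level = [median(level[i:i + base]) for i in range(0, len(level), base)]
    return level[0]


# Made-up data: three blocks whose medians are 2.8, 5.3, and 4.8.
data = [1.2, 2.8, 3.9,
        5.3, 7.1, 4.0,
        4.8, 6.5, 0.7]
block_medians = [median(data[i:i + 3]) for i in range(0, 9, 3)]
result = remedian(data)
```

Here `block_medians` is `[2.8, 5.3, 4.8]` and `result` is `4.8`, matching the toy example in the caption.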

Theorems & Definitions (40)

Definition 4.1: Candidate Set $C$
Definition 4.2: Data-Poor Condition
Theorem 4.3
Corollary 4.4: Simple Regret
Corollary 4.5: ($\epsilon, \delta$)-Sample Complexity
Remark 4.6
Theorem 4.7
Proposition 4.8
Corollary 4.9
Theorem 4.10
...and 30 more
