Minimax Optimal Simple Regret in Two-Armed Best-Arm Identification
Masahiro Kato
TL;DR
This work addresses two-armed fixed-budget best-arm identification with unknown variances. It studies the Neyman allocation, combined with adaptive variance estimation and an augmented inverse probability weighting (AIPW) estimator of the arm means, and proves that this procedure is asymptotically minimax optimal for the simple regret $\mathrm{Regret}_P(\pi)$ over the class $\mathcal{P}_{\bm{\sigma}^2}$ of distributions with arm variances $\sigma(1)^2$ and $\sigma(2)^2$. The lower and upper bounds match, including the constant: $\inf_{\pi}\liminf_{T\to\infty} \sup_{P\in\mathcal{P}_{\bm{\sigma}^2}} \sqrt{T}\,\mathrm{Regret}_P(\pi) \ge \frac{1}{\sqrt{e}}(\sigma(1)+\sigma(2))$ and $\limsup_{T\to\infty} \sup_{P\in\mathcal{P}_{\bm{\sigma}^2}} \sqrt{T}\,\mathrm{Regret}_P(\pi^{\mathrm{NA}}) \le \frac{1}{\sqrt{e}}(\sigma(1)+\sigma(2))$, where $\pi^{\mathrm{NA}}$ denotes the Neyman allocation procedure and $T$ is the sample size. Notably, the results hold without locality assumptions and extend beyond Gaussian settings; for Bernoulli outcomes, the Neyman allocation coincides with the uniform allocation. These findings advance the theory of minimax optimality in adaptive experimental design with unknown variances and non-Gaussian outcomes.
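The procedure summarized above (Neyman allocation with adaptively estimated standard deviations, followed by an augmented inverse probability weighting (AIPW) estimate of each arm mean) can be illustrated with a small simulation. This is a minimal sketch under our own assumptions, not the paper's implementation: the burn-in of uniformly randomized rounds, the clipping of the allocation probability to [clip, 1 - clip], the sample-mean and sample-variance plug-ins, and the helper names `neyman_aipw_experiment`, `neyman_weights`, `gaussian_arms`, and `bernoulli_arms` are all hypothetical. Arms are indexed 0 and 1 rather than 1 and 2.

```python
import math
import random


def neyman_weights(sd0, sd1):
    """Neyman allocation: sample each arm in proportion to its standard deviation."""
    total = sd0 + sd1
    if total <= 0.0:
        return 0.5, 0.5  # degenerate case: fall back to uniform
    return sd0 / total, sd1 / total


def neyman_aipw_experiment(sample, T, burn_in=20, clip=0.1, seed=0):
    """Sketch of a two-armed fixed-budget experiment (arms indexed 0 and 1).

    sample(arm, rng) returns one outcome of `arm` (hypothetical interface).
    burn_in and clip are illustrative tuning choices, not taken from the paper.
    Returns (recommended_arm, [aipw_mean_0, aipw_mean_1]).
    """
    rng = random.Random(seed)
    n = [0, 0]          # pulls per arm
    s = [0.0, 0.0]      # running sum of outcomes per arm
    ss = [0.0, 0.0]     # running sum of squared outcomes per arm
    aipw = [0.0, 0.0]   # running sum of AIPW scores per arm

    for t in range(T):
        # Plug-in mean and standard deviation use only rounds before t.
        mu = [s[a] / n[a] if n[a] > 0 else 0.0 for a in (0, 1)]
        sd = []
        for a in (0, 1):
            if n[a] >= 2:
                var = (ss[a] - n[a] * mu[a] ** 2) / (n[a] - 1)
                sd.append(math.sqrt(max(var, 0.0)))
            else:
                sd.append(1.0)  # placeholder until the arm has two draws

        if t < burn_in:
            w = [0.5, 0.5]  # uniform randomization while estimates are unstable
        else:
            w1 = min(max(neyman_weights(sd[0], sd[1])[1], clip), 1.0 - clip)
            w = [1.0 - w1, w1]

        # Randomize the arm with probability w[arm] so the IPW correction is valid.
        arm = 1 if rng.random() < w[1] else 0
        y = sample(arm, rng)

        # AIPW score: plug-in mean plus an inverse-probability-weighted residual.
        for a in (0, 1):
            correction = (y - mu[a]) / w[a] if a == arm else 0.0
            aipw[a] += mu[a] + correction

        n[arm] += 1
        s[arm] += y
        ss[arm] += y * y

    estimates = [aipw[a] / T for a in (0, 1)]
    best = 0 if estimates[0] >= estimates[1] else 1
    return best, estimates


def gaussian_arms(means, sds):
    """Hypothetical helper: Gaussian outcomes for arms 0 and 1."""
    return lambda arm, rng: rng.gauss(means[arm], sds[arm])


def bernoulli_arms(probs):
    """Hypothetical helper: Bernoulli outcomes for arms 0 and 1."""
    return lambda arm, rng: 1.0 if rng.random() < probs[arm] else 0.0


if __name__ == "__main__":
    best, est = neyman_aipw_experiment(gaussian_arms([0.0, 1.0], [1.0, 2.0]), T=2000)
    print("recommended arm:", best, "AIPW estimates:", est)
```

Drawing the arm at random with probability `w[arm]`, rather than deterministically, keeps the inverse-probability correction well defined, and because the plug-in means use only past rounds, each AIPW score is conditionally unbiased for the arm mean given the history.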
Abstract
This study investigates an asymptotically minimax optimal algorithm for the two-armed fixed-budget best-arm identification (BAI) problem. Given two treatment arms, the objective is to identify the arm with the highest expected outcome through an adaptive experiment. We focus on the Neyman allocation, which samples each arm in proportion to the standard deviation of its outcomes. Our primary contribution is to prove the minimax optimality of the Neyman allocation for the simple regret, defined as the difference between the expected outcomes of the true best arm and the estimated best arm. Specifically, we first derive a minimax lower bound on the expected simple regret, which characterizes the best achievable worst-case performance over location-shift families of distributions, including Gaussian distributions. We then show that, under the worst-case distribution, the simple regret of the Neyman allocation asymptotically matches this lower bound, including the constant and not merely the rate in the sample size. Notably, our optimality result holds without imposing locality restrictions on the distribution, such as local asymptotic normality. Furthermore, we demonstrate that the Neyman allocation reduces to the uniform allocation, i.e., the standard randomized controlled trial, under Bernoulli distributions.
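A heuristic calculation, offered under stated assumptions rather than as the paper's proof, shows where the allocation ratio and the constant $1/\sqrt{e}$ can come from. It assumes that the estimated mean difference is approximately Gaussian with variance $V(w) = \sigma(1)^2/(wT) + \sigma(2)^2/((1-w)T)$ when a fraction $w$ of the budget goes to arm 1, and that the probability of recommending the wrong arm behaves like $\exp(-\Delta^2/(2V(w)))$, where $\Delta = |\mu(1)-\mu(2)|$ is the gap between the arm means; the symbols $w$, $V$, $\mu$, and $\Delta$ are introduced here for illustration.

$$
w^{*} = \arg\min_{w\in(0,1)} \left(\frac{\sigma(1)^2}{w} + \frac{\sigma(2)^2}{1-w}\right) = \frac{\sigma(1)}{\sigma(1)+\sigma(2)}, \qquad T\,V(w^{*}) = \big(\sigma(1)+\sigma(2)\big)^2,
$$

$$
\sup_{\Delta>0}\; \Delta \exp\!\left(-\frac{T\Delta^2}{2\big(\sigma(1)+\sigma(2)\big)^2}\right) = \frac{\sigma(1)+\sigma(2)}{\sqrt{eT}}, \qquad \text{attained at } \Delta^{*} = \frac{\sigma(1)+\sigma(2)}{\sqrt{T}}.
$$

Multiplying by $\sqrt{T}$ gives $(\sigma(1)+\sigma(2))/\sqrt{e}$, the constant appearing in the minimax bounds. For Bernoulli outcomes, $\sigma(a) = \sqrt{\mu(a)(1-\mu(a))}$; if, as $\Delta^{*}$ suggests, the relevant worst-case instances have a gap of order $1/\sqrt{T}$, the two standard deviations coincide as $T\to\infty$ and $w^{*}\to 1/2$, which is consistent with the reduction of the Neyman allocation to uniform allocation.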
