Selecting the Best Arm in One-Shot Multi-Arm RCTs: The Asymptotic Minimax-Regret Decision Framework for the Best-Population Selection Problem
Joonhwi Joo
TL;DR
This work develops a frequentist minimax-regret (MMR) framework for selecting the best arm in one-shot, multi-arm RCTs. It derives the optimal MMR rule for general multivariate location-family rewards, then specializes to multivariate normal (MVN) rewards to obtain a practical plug-in asymptotic MMR (AMMR) rule based on estimated means and covariances. The results show a sharp distinction between two-arm and multi-arm designs: with two arms, the empirical zero-threshold rule remains MMR-optimal regardless of the variances, whereas with three or more arms and heteroskedasticity the MMR boundaries become nonlinear and penalize high-variance arms, requiring stronger evidence to choose them. The analysis uses a dual maximin formulation, proves the existence and structure of nature’s least-favorable prior (supported on exactly one point in each Θ^i region), and establishes that the corresponding Bayes rule is unique and minimax. It then gives a local asymptotic justification for plug-in MVN rules when the mean estimator is locally uniformly normal and covariances are estimated. Numerically, the paper computes the least-favorable priors and visualizes decision boundaries for two- and three-arm MVN experiments, showing how the variance structure shapes the optimal boundaries and prior masses, and offers a practical framework for comparing multiple policies in a unified decision-theoretic setting.
Abstract
We develop a frequentist decision-theoretic framework for selecting the best arm in one-shot, multi-arm randomized controlled trials (RCTs). Our approach characterizes the minimax-regret (MMR) optimal decision rule for any multivariate location-family reward distribution with full support. We show that the MMR rule is deterministic, unique, and computationally tractable. We then specialize to the case of multivariate normal (MVN) rewards with an arbitrary covariance matrix, and establish the local asymptotic minimaxity of a plug-in version of the rule when only estimated means and covariances are available. This asymptotic MMR (AMMR) procedure maps a covariance-matrix estimate directly into decision boundaries, allowing straightforward implementation in practice. Our analysis highlights a sharp contrast between two-arm and multi-arm designs. With two arms, the "pick-the-winner" empirical success rule remains MMR-optimal, regardless of the arm-specific variances. By contrast, with three or more arms and heterogeneous variances, the empirical success rule is no longer optimal: the MMR decision boundaries become nonlinear and systematically penalize high-variance arms, requiring stronger evidence to select them. Our multi-arm AMMR framework offers a rigorous foundation that leads to practical criteria for comparing multiple policies simultaneously.
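To make the regret criterion concrete, the following is a minimal Python sketch of the "pick-the-winner" empirical success rule together with a Monte Carlo estimate of its expected regret at a fixed vector of true arm means. It does not implement the paper's MMR or AMMR rule. The function names (`empirical_success_rule`, `expected_regret`), the example means, and the diagonal covariance are illustrative assumptions, and the sketch assumes the vector of estimated arm means is drawn from an MVN distribution centered at the true means.

```python
import numpy as np


def empirical_success_rule(xbar):
    """Pick the arm with the largest estimated mean (pick-the-winner)."""
    return int(np.argmax(xbar))


def expected_regret(rule, theta, cov, n_draws=20_000, seed=0):
    """Monte Carlo estimate of the expected regret of `rule` at true means `theta`.

    Assumption (illustrative): the estimated arm means are drawn as N(theta, cov).
    The regret of a choice is max(theta) minus the true mean of the chosen arm.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    draws = rng.multivariate_normal(theta, cov, size=n_draws)
    chosen = np.array([rule(x) for x in draws])
    return float(np.mean(theta.max() - theta[chosen]))


if __name__ == "__main__":
    # Hypothetical three-arm example: the best arm (index 2) is also the noisiest.
    theta = [0.0, 0.0, 0.5]
    cov = np.diag([1.0, 1.0, 9.0])
    print(expected_regret(empirical_success_rule, theta, cov))
```

A minimax-regret analysis would take the supremum of this quantity over all true mean vectors and then minimize it over decision rules. The sketch evaluates only a single mean vector for a single rule, so it illustrates the objective rather than the optimal boundaries described above.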
