Selecting the Best Arm in One-Shot Multi-Arm RCTs: The Asymptotic Minimax-Regret Decision Framework for the Best-Population Selection Problem

Joonhwi Joo

TL;DR

This work develops a frequentist minimax-regret (MMR) framework for selecting the best arm in one-shot, multi-arm RCTs. It derives the optimal MMR rule for general multivariate location-family rewards, then specializes to multivariate normal (MVN) rewards to obtain a practical plug-in asymptotic MMR (AMMR) rule based on estimated means and covariances. The paper shows a sharp distinction between two-arm and multi-arm designs: with two arms, the empirical zero-threshold rule remains MMR-optimal regardless of variances, whereas with three or more arms and heteroskedasticity the MMR boundaries become nonlinear and penalize high-variance arms, requiring stronger evidence to choose them. The analysis uses a dual maximin formulation, proves the existence and structure of nature’s least-favorable prior (supported on exactly one point in each Θ^i region), and establishes that the corresponding Bayes rule is unique and minimax. It then gives a local asymptotic justification for plug-in MVN rules when the mean estimator is locally uniformly normal and covariances are estimated. Numerically, the paper computes the least-favorable priors and visualizes decision boundaries for two- and three-arm MVN experiments, showing how the variance structure shapes the optimal boundaries and prior masses. Together, these results offer a practical framework for comparing multiple policies in a unified decision-theoretic setting.
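For readers unfamiliar with the terminology, the objects named in the summary can be written out in generic decision-theoretic notation. The display below is a sketch under assumptions: the symbols $R$, $\bm{X}$, and $\delta_j$ (the probability that the rule selects arm $j$) are illustrative choices rather than the paper's own definitions, and the regret shown is the standard "best mean minus expected chosen mean" form.

```latex
% Generic notation; symbol choices are assumptions, not copied from the paper.
% Requires amsmath and bm.
\begin{align*}
R(\bm{\delta}, \bm{\theta})
  &= \max_{j} \theta_j
     - \sum_{j=1}^{J} \theta_j \, \mathbb{E}_{\bm{\theta}}\!\left[\delta_j(\bm{X})\right]
  && \text{(regret of rule } \bm{\delta} \text{ at mean vector } \bm{\theta}\text{)} \\
\inf_{\bm{\delta}} \, \sup_{\bm{\theta} \in \Theta} \, & R(\bm{\delta}, \bm{\theta})
  && \text{(minimax-regret problem)} \\
\sup_{\tilde{\pi}} \, \inf_{\bm{\delta}} \, & \int_{\Theta} R(\bm{\delta}, \bm{\theta}) \, d\tilde{\pi}(\bm{\theta})
  && \text{(dual maximin problem over priors } \tilde{\pi}\text{)}
\end{align*}
```

A minimax theorem is a statement that the last two values coincide, in which case a prior attaining the supremum is called least favorable and a Bayes rule against it is a candidate MMR rule. The exact conditions and statement used in the paper are those of its Theorem 2.1.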

Abstract

We develop a frequentist decision-theoretic framework for selecting the best arm in one-shot, multi-arm randomized controlled trials (RCTs). Our approach characterizes the minimax-regret (MMR) optimal decision rule for any multivariate location family reward distribution with full support. We show that the MMR rule is deterministic, unique, and computationally tractable. We then specialize to the case of multivariate normal (MVN) rewards with an arbitrary covariance matrix, and establish the local asymptotic minimaxity of a plug-in version of the rule when only estimated means and covariances are available. This asymptotic MMR (AMMR) procedure maps a covariance-matrix estimate directly into decision boundaries, allowing straightforward implementation in practice. Our analysis highlights a sharp contrast between two-arm and multi-arm designs. With two arms, the "pick-the-winner" empirical success rule remains MMR-optimal, regardless of the arm-specific variances. By contrast, with three or more arms and heterogeneous variances, the empirical success rule is no longer optimal: the MMR decision boundaries become nonlinear and systematically penalize high-variance arms, requiring stronger evidence to select them. Our multi-arm AMMR framework offers a rigorous foundation that leads to practical criteria for comparing multiple policies simultaneously.
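The contrast drawn above can be explored with a small simulation. The following is a minimal sketch, not the paper's AMMR procedure: it implements only the empirical success ("pick-the-winner") rule and estimates its expected regret by Monte Carlo when the estimated means are MVN. The function names, the mean vector, and the covariance matrices are all hypothetical choices made for illustration.

```python
import numpy as np


def empirical_success_rule(x):
    """Select, for each row of x, the arm with the largest estimated mean."""
    return np.argmax(x, axis=-1)


def simulated_regret(theta, sigma, n_draws=100_000, seed=0):
    """Monte Carlo estimate of the expected regret of the empirical success rule.

    theta : true mean vector of the arms (hypothetical values)
    sigma : covariance matrix of the estimated means (hypothetical values)
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    # Each row is one simulated experiment: a noisy estimate of every arm's mean.
    x = rng.multivariate_normal(theta, sigma, size=n_draws)
    chosen = empirical_success_rule(x)
    # Regret = best true mean minus the true mean of the selected arm.
    return float(np.mean(theta.max() - theta[chosen]))


# Illustrative three-arm setup: arm 0 is truly best by 0.5.
theta = [0.5, 0.0, 0.0]
regret_homo = simulated_regret(theta, np.diag([1.0, 1.0, 1.0]))
regret_hetero = simulated_regret(theta, np.diag([1.0, 1.0, 9.0]))
print(f"equal variances:     {regret_homo:.3f}")
print(f"one high-variance arm: {regret_hetero:.3f}")
```

In this toy configuration the noisy inferior arm wins by chance more often, so the simulated regret of the empirical success rule rises when one arm's variance is inflated. This is only an intuition for why a rule might demand stronger evidence before selecting a high-variance arm; it does not compute the MMR boundaries or the least-favorable prior.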

Paper Structure

This paper contains 67 sections, 17 theorems, 114 equations, 3 figures, 2 tables.

Key Result

Theorem 2.1

The following hold for the selection problem:
(i) ... where $\sup_{\tilde{\pi}}$ is over all Borel probability measures on $\Theta$.
(ii) There exists a least-favorable prior $\pi$ supported on at most $J$ distinct support points of $\Theta$.
(iii) Suppose $\bm{\delta}$ is a Bayes rule with respect to a least-favorable prior $\pi$, i.e., $\bm{\delta}=$ ...
(iv) Consider the least-favorable prior $\pi$ su...

Figures (3)

  • Figure 1:
  • Figure 2: Three-arm decision boundaries for $\bm{\Sigma}_{\text{sym}}$
  • Figure 3: Three-arm decision boundaries for $\bm{\Sigma}_{\text{asym}}$

Theorems & Definitions (31)

  • Theorem 2.1: Minimax Theorem for the Best-Population Selection Problem
  • proof
  • Lemma 2.1: Pointwise Bayes Action for Any Finite Prior
  • proof
  • Theorem 2.2: Support Point Characterization of the Least-Favorable Prior
  • proof
  • Theorem 2.3: DM's MMR-Risk Strategy
  • proof
  • Corollary
  • Proposition 3.1: Scaling of the MVN Selection Problem
  • ...and 21 more