Minimizing Type 2 Errors in an Experiment-Rich Regime via Optimal Resource Allocation

Fenghua Yang; Dae Woong Ham; Stefanus Jasin

Abstract

Randomized experiments (often known as "A/B tests") are widely used to evaluate product and service innovations. We study how to allocate limited experimentation resources across M concurrent experiments in an experiment-rich regime. Existing work on allocation has predominantly focused on minimizing the worst-case mean squared error (MSE) of estimated treatment effects, which favors experiments with larger (and typically unknown) outcome variance. While appropriate for controlling estimation accuracy, this objective does not directly capture a common managerial priority in screening stages: detecting practically meaningful treatment effects with high probability. Motivated by this, we consider the objective of minimizing the worst-case Type II error across all experiments. When the standard deviations are known, we characterize the power-optimal allocation and show that MSE-based allocations can be highly inefficient for detection, even though the two objectives align asymptotically. When the standard deviations are unknown and must be learned from pilot data, we show that a naive plug-in approach, treating pilot standard deviations as truth, can suffer substantial power loss. We propose inflating pilot estimates via correction factors and develop three optimization-based frameworks for selecting them, each reflecting a different risk criterion with distinct managerial implications. Although the resulting stochastic programs are computationally challenging at scale, we derive tractable surrogate reformulations inspired by robust optimization and establish favorable theoretical properties. We further propose Surrogate-S, a fully data-dependent and implementable procedure that computes correction factors using only pilot variance estimates and achieves near-oracle performance in numerical experiments.

Paper Structure (36 sections, 13 theorems, 169 equations, 7 figures)

Introduction
Literature Review
Large-Scale Experimentation and A/B Testing
Resource Allocation in Multi-Armed Bandits
Testing with Unknown Variance and Pilot Studies
Model
The case with known $\vec{\sigma}$
The case with unknown $\vec{\sigma}$
Known $\vec{\sigma}$
Optimal Allocation under POWER-OPT
Comparison with the MSE-Minimization Approach
Unknown $\vec{\sigma}$: Exact Analysis of a Two-Experiment Setting
Analysis of TOL and CONF
Analysis of EXP
Unknown $\vec{\sigma}$: Approximations for the General Case
...and 21 more sections

Key Result

Proposition 1

The optimal allocation $n^*(\vec{\sigma}) = (n_1^*(\vec{\sigma}), \ldots, n_M^*(\vec{\sigma}))$ under POWER-OPT equalizes the Type 2 error across all experiments, i.e., $\beta(\sigma_1, n_1^*(\vec{\sigma})) = \cdots = \beta(\sigma_M, n_M^*(\vec{\sigma})) = \beta^*(\vec{\sigma}),$ where $\beta^*(\vec{\sigma})$ denotes the common (and optimal) Type 2 error.
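
The equalization property can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from the text above: a one-sided z-test with Type 2 error $\beta(\sigma, n) = \Phi(z_{1-\alpha} - \Delta\sqrt{n}/\sigma)$, continuous (non-integer) sample sizes, and an MSE proxy of $\sigma_i^2/n_i$ for the comparison allocation. The function names and the example values are hypothetical; the paper's exact model and closed-form expressions may differ.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for Phi and its inverse


def type2_error(sigma, delta, n, alpha=0.05):
    """Type 2 error of an ASSUMED one-sided z-test:
    beta = Phi(z_{1-alpha} - delta * sqrt(n) / sigma)."""
    z = _STD_NORMAL.inv_cdf(1 - alpha)
    return _STD_NORMAL.cdf(z - delta * n ** 0.5 / sigma)


def power_equalizing_allocation(sigmas, deltas, total):
    """Allocate n_i proportional to (sigma_i / delta_i)^2, so that
    delta_i * sqrt(n_i) / sigma_i is the same for every experiment."""
    weights = [(s / d) ** 2 for s, d in zip(sigmas, deltas)]
    scale = total / sum(weights)
    return [w * scale for w in weights]


def variance_proportional_allocation(sigmas, total):
    """Allocate n_i proportional to sigma_i^2, which equalizes
    sigma_i^2 / n_i (an ASSUMED stand-in for the worst-case MSE objective)."""
    weights = [s ** 2 for s in sigmas]
    scale = total / sum(weights)
    return [w * scale for w in weights]


# Hypothetical example: three experiments sharing a budget of N = 60 samples.
sigmas = [0.5, 1.0, 2.0]
deltas = [0.2, 0.5, 1.0]
N = 60.0

n_power = power_equalizing_allocation(sigmas, deltas, N)
n_mse = variance_proportional_allocation(sigmas, N)

betas_power = [type2_error(s, d, n) for s, d, n in zip(sigmas, deltas, n_power)]
betas_mse = [type2_error(s, d, n) for s, d, n in zip(sigmas, deltas, n_mse)]

print("power-equalizing allocation:", n_power)
print("  per-experiment Type 2 errors:", betas_power)
print("variance-proportional allocation:", n_mse)
print("  per-experiment Type 2 errors:", betas_mse)
```

Under these assumptions the first allocation yields the same Type 2 error for every experiment, while the variance-proportional allocation starves the low-variance, small-effect experiment and so has a larger worst-case Type 2 error.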

Figures (7)

Figure 1: Comparison of worst-case Type 2 error under power-optimal and MSE-optimal allocations as a function of $N$. Parameters: $M = 50$, $\alpha = 0.05$; $\sigma_i$ and $\Delta_i$ are randomly generated from $[0.5, 2]$ and $[0.01, 1]$, respectively, with the experiment repeated $R = 1{,}000$ times.
Figure 2: Optimal inflation ratio $r^*$ for the TOL and CONF objectives under varying difficulty ratios ($a_1/a_2$), confidence levels, and tolerances.
Figure 3: Optimal inflation ratio $r^*$ under EXP$(N=200)$ as a function of difficulty ratio $(a_1/a_2)$ for varying pilot sizes $\epsilon$. The deviation from $r^*=1$ is most pronounced for small $\epsilon$.
Figure 4: End-to-End Process Flow of the Surrogate-$S$ Method. The procedure transforms raw pilot data into final sample size allocations through a sequence of convex optimization and deterministic mappings.
Figure 5: Distribution of Maximum Type 2 Error Relative to $\beta^*(\vec{\sigma})$ for the R-TOL objective ($\gamma = 0.7$). The vertical dotted lines mark the 70th percentile cutoffs. The curves compare the Naive Plug-in (Blue) with no correction factor, the Oracle Surrogate-$\sigma$ (Orange), and our proposed Surrogate-$S$ (Green).
...and 2 more figures

Theorems & Definitions (15)

Proposition 1
Remark 1
Lemma 1
Proposition 2
Corollary 1
Corollary 2
Proposition 3
Lemma 2
Lemma 3
Lemma 4
...and 5 more
