Demonstration Experiments

Guido Imbens; Lorenzo Masoero; Alexander Rakhlin; Thomas S. Richardson; Suhas Vijaykumar

Abstract

Adaptive experiments are used extensively in online platforms, healthcare and biotechnology, and a variety of other settings. In many of these applications, the main goal is not to precisely estimate a treatment effect, but to demonstrate that at least one candidate intervention yields a positive effect, for some subpopulation, on some measured outcome. We formalize this objective in a multi-armed bandit framework and develop two inference procedures for testing whether any arm's mean exceeds a given threshold under fully adaptive sampling: one that pools information across promising arms, and one that corresponds to time-uniform multiple inference on the means of individual arms. To support the latter, we establish a moderate deviations principle for the sequential t-statistic, justifying anytime-valid testing of a large number of hypotheses concurrently. To illustrate how adaptive design can target the proposed statistics, we recast experimental design as bandit optimization in which an arm's reward corresponds to its signal-to-noise ratio, and we analyze an adaptive allocation rule for which we establish a logarithmic regret bound.

Paper Structure

This paper contains 19 sections, 7 theorems, 23 equations, 2 figures, 1 table, 1 algorithm.

Introduction
Contributions and related work
Anytime-Valid Inference and Game-Theoretic Statistics
Statistical Inference for Multi-Armed Bandits
Outline of the paper
Setup and Notation
Statistics that are robust to strategic sampling
Pooled Testing
Feasible statistics and regularized variance estimates
Max Statistic
Strategic sampling algorithms and power
The SN-UCB Algorithm
Simulations
Simulation Design
Type I Error
...and 4 more sections

Key Result

Lemma 1

Under the null hypothesis $\mathcal{H}_0$, for any $v \le t$ it holds that $\mathbb{E}[H_t \mid \mathcal{F}_v] \le H_v$, so that $H_t$ is a supermartingale adapted to $\mathcal{F}_t$.

Figures (2)

Figure 1: Power curves under the multi-scale alternative ($\mu_g = \delta g$, $\sigma_g^2 = g^3$) with $k=10$ arms, horizon $T=200$, and $\delta \in [0,1]$. Each panel compares three adaptive sampling strategies: SN-UCB (red), standard UCB (blue), and Thompson sampling (green). These are contrasted with two baselines: an oracle that chooses the arm with the highest SNR and performs a standard $t$-test (black), and a uniform allocation strategy that performs a $t$-test for each arm and adjusts for multiplicity (gray).
Figure 2: Power curves under the single-spike alternative ($\mu_1 = \delta$, $\mu_g = 0$ for $g > 1$, $\sigma_g = 1$) with $k=10$ arms and horizon $T=200$. Each panel compares three adaptive sampling strategies: SN-UCB (red), standard UCB (blue), and Thompson sampling (green). These are contrasted with two baselines: an oracle that chooses the arm with the highest SNR and performs a standard $t$-test (black), and a uniform allocation strategy that performs a $t$-test for each arm and adjusts for multiplicity (gray). In this setting, UCB and Thompson sampling outperform SN-UCB, particularly for the pooled statistic, as they more aggressively concentrate samples on the single active arm; all methods outperform uniform allocation.
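
The two alternatives in the captions above can be written out as a short sketch. This is only an illustration of the stated parameterizations ($\mu_g = \delta g$, $\sigma_g^2 = g^3$ for the multi-scale case; $\mu_1 = \delta$, $\mu_g = 0$ otherwise, $\sigma_g = 1$ for the single-spike case), not the authors' simulation code. The function names (`multi_scale`, `single_spike`, `snr`, `oracle_arm`) are hypothetical, and SNR is assumed here to mean $\mu_g / \sigma_g$.

```python
import math


def multi_scale(k, delta):
    """Multi-scale alternative: mu_g = delta * g, sigma_g^2 = g^3 for g = 1..k."""
    mus = [delta * g for g in range(1, k + 1)]
    sigmas = [math.sqrt(g ** 3) for g in range(1, k + 1)]
    return mus, sigmas


def single_spike(k, delta):
    """Single-spike alternative: mu_1 = delta, mu_g = 0 for g > 1, sigma_g = 1."""
    mus = [delta] + [0.0] * (k - 1)
    sigmas = [1.0] * k
    return mus, sigmas


def snr(mus, sigmas):
    """Per-arm signal-to-noise ratio, assumed to be mean / standard deviation."""
    return [m / s for m, s in zip(mus, sigmas)]


def oracle_arm(mus, sigmas):
    """1-based index of the arm with the highest SNR (first arm on ties)."""
    ratios = snr(mus, sigmas)
    return 1 + max(range(len(ratios)), key=ratios.__getitem__)


if __name__ == "__main__":
    k, delta = 10, 0.5  # k matches the captions; delta is an arbitrary value in [0, 1]
    for name, design in [("multi-scale", multi_scale), ("single-spike", single_spike)]:
        mus, sigmas = design(k, delta)
        best_mean = 1 + max(range(k), key=mus.__getitem__)
        print(name, "| highest-mean arm:", best_mean,
              "| highest-SNR arm:", oracle_arm(mus, sigmas))
```

Under the multi-scale alternative the SNR works out to $\delta g / g^{3/2} = \delta / \sqrt{g}$, which decreases in $g$: the arm with the largest mean ($g = k$) has the smallest SNR, while arm 1 has the largest. Under the single-spike alternative the highest-mean and highest-SNR arms coincide, which is consistent with the caption's remark that mean-targeting rules do well there.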

Theorems & Definitions (12)

Definition 1
Lemma 1
Theorem 1: CLT for padding-regularized pooled statistic
Theorem 2: CLT for threshold-regularized pooled statistic
Corollary 1: Asymptotic validity of pooled testing
Remark 1: Comparison of regularization strategies
Lemma 2: robbins1970boundary
Theorem 3
Remark 2
Remark 3: Conservativeness
...and 2 more
