Demonstration Experiments

Guido Imbens, Lorenzo Masoero, Alexander Rakhlin, Thomas S. Richardson, Suhas Vijaykumar

Abstract

Adaptive experiments are used extensively in online platforms, healthcare and biotechnology, and a variety of other settings. In many of these applications, the main goal is not to precisely estimate a treatment effect, but to demonstrate that at least one candidate intervention yields a positive effect, for some subpopulation, on some measured outcome. We formalize this objective in a multi-armed bandit framework and develop inference procedures for testing whether any arm's mean exceeds a given threshold under fully adaptive sampling: one which pools information across promising arms, and one which corresponds to time-uniform multiple inference on the means of individual arms. To support the latter, we establish a moderate deviations principle for the sequential t-statistic, justifying anytime-valid testing of a large number of hypotheses concurrently. To illustrate how adaptive design can target the proposed statistics, we recast experimental design as bandit optimization where an arm's reward corresponds to its signal-to-noise ratio, and analyze an adaptive allocation rule for which we establish a logarithmic regret bound.

Paper Structure

This paper contains 19 sections, 7 theorems, 23 equations, 2 figures, 1 table, and 1 algorithm.

Key Result

Lemma 1

Under the null hypothesis $\mathcal{H}_0$, for any $v \le t$ it holds that $\mathbb{E}[H_t \mid \mathcal{F}_v] \le H_v$, so that $H_t$ is a supermartingale adapted to $\mathcal{F}_t$.
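
A standard consequence shows why a supermartingale property of this kind supports anytime-valid testing. The sketch below assumes, beyond what is stated here, that $H_t$ is nonnegative and that $H_0 \le 1$. Under those assumptions, Ville's inequality bounds the probability that $H_t$ ever becomes large under $\mathcal{H}_0$:

$$\mathbb{P}_{\mathcal{H}_0}\left(\exists\, t \ge 0 : H_t \ge 1/\alpha\right) \;\le\; \alpha\, \mathbb{E}[H_0] \;\le\; \alpha, \qquad \alpha \in (0,1).$$

A test that rejects the first time $H_t \ge 1/\alpha$ would then have type I error at most $\alpha$ no matter when sampling stops. Whether the paper's $H_t$ satisfies the two extra assumptions is not confirmed by this summary.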

Figures (2)

  • Figure 1: Power curves under the multi-scale alternative ($\mu_g = \delta g$, $\sigma_g^2 = g^3$) with $k=10$ arms, horizon $T=200$, and $\delta \in [0,1]$. Each panel compares three adaptive sampling strategies: SN-UCB (red), standard UCB (blue), and Thompson sampling (green). These are contrasted with two baselines: an oracle that chooses the arm with the highest SNR and performs a standard $t$-test (black), and a uniform allocation strategy that performs a $t$-test for each arm and adjusts for multiplicity (gray).
  • Figure 2: Power curves under the single-spike alternative ($\mu_1 = \delta$, $\mu_g = 0$ for $g > 1$, $\sigma_g = 1$) with $k=10$ arms and horizon $T=200$. Each panel compares three adaptive sampling strategies: SN-UCB (red), standard UCB (blue), and Thompson sampling (green). These are contrasted with two baselines: an oracle that chooses the arm with the highest SNR and performs a standard $t$-test (black), and a uniform allocation strategy that performs a $t$-test for each arm and adjusts for multiplicity (gray). In this setting, UCB and Thompson sampling outperform SN-UCB, particularly for the pooled statistic, because they concentrate samples more aggressively on the single active arm; all methods outperform uniform allocation.
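
The two alternatives in the captions above differ in how the arm with the largest mean relates to the arm with the largest signal-to-noise ratio. The following is a minimal sketch, not the paper's simulation code: it takes the SNR of arm $g$ to be $\mu_g / \sigma_g$ (an assumption about the exact definition), and the helper names `multi_scale`, `single_spike`, `snr_profile`, and `argmax` are hypothetical.

```python
import math


def snr_profile(means, sds):
    """Per-arm signal-to-noise ratios, assumed here to be mu_g / sigma_g."""
    return [m / s for m, s in zip(means, sds)]


def multi_scale(delta, k=10):
    # Figure 1 alternative: mu_g = delta * g and sigma_g^2 = g^3 for g = 1..k.
    means = [delta * g for g in range(1, k + 1)]
    sds = [math.sqrt(g ** 3) for g in range(1, k + 1)]
    return means, sds


def single_spike(delta, k=10):
    # Figure 2 alternative: mu_1 = delta, all other means 0, sigma_g = 1.
    means = [delta] + [0.0] * (k - 1)
    sds = [1.0] * k
    return means, sds


def argmax(values):
    """Index of the first maximal entry (0-based)."""
    return max(range(len(values)), key=lambda i: values[i])


ms_means, ms_sds = multi_scale(delta=0.5)
ms_snr = snr_profile(ms_means, ms_sds)

ss_means, ss_sds = single_spike(delta=0.5)
ss_snr = snr_profile(ss_means, ss_sds)

print("multi-scale: best mean arm", argmax(ms_means) + 1,
      "| best SNR arm", argmax(ms_snr) + 1)
print("single-spike: best mean arm", argmax(ss_means) + 1,
      "| best SNR arm", argmax(ss_snr) + 1)
```

Under the multi-scale alternative the SNR simplifies to $\delta g / g^{3/2} = \delta / \sqrt{g}$, so for $\delta > 0$ the arm with the largest mean ($g = 10$) has the smallest SNR, and a mean-targeting rule and an SNR-targeting rule favor opposite arms. Under the single-spike alternative the two criteria select the same arm.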

Theorems & Definitions (12)

  • Definition 1
  • Lemma 1
  • Theorem 1: CLT for padding-regularized pooled statistic
  • Theorem 2: CLT for threshold-regularized pooled statistic
  • Corollary 1: Asymptotic validity of pooled testing
  • Remark 1: Comparison of regularization strategies
  • Lemma 2: robbins1970boundary
  • Theorem 3
  • Remark 2
  • Remark 3: Conservativeness
  • ...and 2 more