An Experimental Design for Anytime-Valid Causal Inference on Multi-Armed Bandits

Biyonka Liang, Iavor Bojinov

TL;DR

This paper introduces the Mixture Adaptive Design (MAD), a new experimental design for multi-armed bandit (MAB) algorithms that enables anytime-valid inference on the Average Treatment Effect (ATE) for any MAB algorithm.

Abstract

Experimentation is crucial for managers to rigorously quantify the value of a change and determine if it leads to a statistically significant improvement over the status quo. As companies increasingly mandate that all changes undergo experimentation before widespread release, two challenges arise: (1) minimizing the proportion of customers assigned to the inferior treatment and (2) increasing experimentation velocity by enabling data-dependent stopping. This paper addresses both challenges by introducing the Mixture Adaptive Design (MAD), a new experimental design for multi-armed bandit (MAB) algorithms that enables anytime-valid inference on the Average Treatment Effect (ATE) for any MAB algorithm. Intuitively, MAD "mixes" any bandit algorithm with a Bernoulli design, where at each time step, the probability of assigning a unit via the Bernoulli design is determined by a user-specified deterministic sequence that can converge to zero. This sequence lets managers directly control the trade-off between regret minimization and inferential precision. Under mild conditions on the rate at which the sequence converges to zero, we provide a confidence sequence that is asymptotically anytime-valid and guaranteed to shrink around the true ATE. Hence, when the true ATE converges to a non-zero value, the MAD confidence sequence is guaranteed to exclude zero in finite time. Therefore, the MAD enables managers to stop experiments early while ensuring valid inference, enhancing both the efficiency and reliability of adaptive experiments. Empirically, we demonstrate that the MAD achieves finite-sample anytime-validity while accurately and precisely estimating the ATE, all without incurring significant losses in reward compared to standard bandit designs.
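
The mixing step described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the Bernoulli design assigns each of the K arms with equal probability 1/K, and it uses a hypothetical polynomial sequence delta_t = t^(-0.24). The function names (`delta`, `mad_probabilities`, `mad_assign`) and the `exponent` parameter are invented for this example.

```python
import random


def delta(t, exponent=0.24):
    """Hypothetical mixing sequence delta_t = t**(-exponent), for t >= 1.

    Assumption: a polynomial decay; the design only requires a
    user-specified deterministic sequence.
    """
    return float(t) ** (-exponent)


def mad_probabilities(bandit_probs, t, exponent=0.24):
    """Mix the bandit's assignment probabilities with a uniform Bernoulli design.

    With probability delta_t the unit is assigned uniformly at random
    (assumed 1/K per arm); otherwise the bandit's probabilities are used.
    """
    k = len(bandit_probs)
    d = delta(t, exponent)
    return [d / k + (1.0 - d) * p for p in bandit_probs]


def mad_assign(bandit_probs, t, rng, exponent=0.24):
    """Draw one arm from the mixed probabilities; return (arm, probabilities)."""
    probs = mad_probabilities(bandit_probs, t, exponent)
    arm = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return arm, probs


if __name__ == "__main__":
    rng = random.Random(0)
    # A fully greedy bandit that would always pick arm 1.
    greedy = [0.0, 1.0]
    for t in (1, 100, 10_000):
        arm, probs = mad_assign(greedy, t, rng)
        print(t, arm, [round(p, 3) for p in probs])
```

Even when the bandit is fully greedy, every arm keeps an assignment probability of at least delta_t / K under this sketch. That floor shrinks as delta_t decays, which is the trade-off between regret minimization and inferential precision that the sequence controls.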

Paper Structure (30 sections, 14 theorems, 90 equations, 13 figures)


Key Result

Theorem 1

Let $(\hat{\tau}_t)_{t=1}^\infty$ be the sequence of random variables where $W_t=w$ with probability $p_{t}^{\text{MAD}}(w)$, as in Definition 4 (Mixture Adaptive Design), with respect to some treatment assignment algorithm $\mathcal{A}$. Let [...] Then, under Assumptions a_bound and a_var2 and setting $\delta_t = \omega\left(\frac{1}{t^{1/4}}\right)$, $(\hat{\bar{\tau}}_{t} \pm \hat{V}_t)$ is a valid $(1-\alpha)$ asymp...
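
As a worked check of the rate condition $\delta_t = \omega\left(t^{-1/4}\right)$ in Theorem 1, consider a polynomially decaying sequence $\delta_t = t^{-a}$ with $a > 0$. This family and the symbol $a$ are illustrative assumptions, not part of the theorem statement:

$$
\frac{\delta_t}{t^{-1/4}} = t^{1/4 - a} \to \infty \quad \text{as } t \to \infty \iff a < \tfrac{1}{4}.
$$

Within this family, any exponent $a < 1/4$ (for example, $a = 0.24$) decays slowly enough to satisfy the condition, whereas $a = 1/4$ does not, because the ratio then stays at 1 instead of diverging.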

Figures (13)

  • Figure 1: Empirical coverage, proportion stopped, time-averaged reward, and width of the CS of Theorem 1 across $N=100$ random seeds for different experimental designs under a two-armed bandit setting with a Bernoulli outcome model using TS as the bandit algorithm; see the simulations section for a full description of the experimental setting and each metric. The dashed grey line represents $1-\alpha$. Error bands depict $\pm 2$ SEs.
  • Figure 2: All CSs generated from the experiment in the appendix on the nonstationary setting across $100$ random seeds (with transparency to show the overlaying of the CSs), with $(p_{0,t}, p_{1,t}) = (0.2, 0.8)$ for $t \leq 500$ and, for $t > 500$, (a): $(p_{0,t}, p_{1,t}) =(0.2, 0.4)$ (ATE goes from $0.6$ to $0.1$) and (b): $(p_{0,t}, p_{1,t}) = (0.2, 0.1)$ (ATE goes from $0.6$ to $-0.1$).
  • Figure 3: Time-averaged reward of the experiment in the appendix on the nonstationary setting across $100$ random seeds (with transparency to show the overlaying of the CSs), with $(p_{0,t}, p_{1,t}) = (0.2, 0.8)$ for $t \leq 500$ and, for $t > 500$, (a): $(p_{0,t}, p_{1,t}) =(0.2, 0.4)$ (ATE goes from $0.6$ to $0.1$) and (b): $(p_{0,t}, p_{1,t}) = (0.2, 0.1)$ (ATE goes from $0.6$ to $-0.1$). Error bars depict $\pm 2$ standard errors.
  • Figure 4: Cumulative average reward generated from the experiment in the appendix on stopping times across $100$ random seeds with (a): $(p_{0}, p_{1}) =(0.2, 0.8)$ and (b): $(p_{0}, p_{1}) = (0.2, 0.3)$. Error bars depict $\pm 2$ standard errors. Because each experiment ran for a different length of time, depending on when the MAD stopped for that random seed, the standard errors at larger $t$ can be larger: fewer runs are still active at that $t$.
  • Figure 5: Histogram of the differences between the MAD and Bernoulli stopping times for the experiment in the appendix on stopping times across $100$ random seeds with (a): $(p_{0}, p_{1}) =(0.2, 0.8)$ and (b): $(p_{0}, p_{1}) = (0.2, 0.3)$.
  • ...and 8 more figures

Theorems & Definitions (27)

  • Definition 1: Confidence Sequence
  • Definition 2: Asymptotic Confidence Sequence
  • Definition 3: Average Treatment Effect (ATE) at time $t$
  • Definition 4: Mixture Adaptive Design (MAD)
  • Definition 5: Mixture Adaptive Design for Batched Assignment
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Lemma 3.1
  • Lemma A.1
  • ...and 17 more