Anytime-Valid Inference in Adaptive Experiments: Covariate Adjustment and Balanced Power

Daniel Molitor, Samantha Gold

TL;DR

This work proposes MADCovar, a covariate-adjusted ATE estimator that is unbiased and preserves anytime-valid inference guarantees while substantially improving ATE precision. It also introduces MADMod, which dynamically reallocates samples to underpowered arms, enabling more balanced statistical power across treatments without sacrificing valid inference.

Abstract

Adaptive experiments such as multi-armed bandits offer efficiency gains over traditional randomized experiments but pose two major challenges: invalid inference on the Average Treatment Effect (ATE) due to adaptive sampling and low statistical power for sub-optimal treatments. We address both issues by extending the Mixture Adaptive Design framework (arXiv:2311.05794). First, we propose MADCovar, a covariate-adjusted ATE estimator that is unbiased and preserves anytime-valid inference guarantees while substantially improving ATE precision. Second, we introduce MADMod, which dynamically reallocates samples to underpowered arms, enabling more balanced statistical power across treatments without sacrificing valid inference. Both methods retain MAD's core advantage of constructing asymptotic confidence sequences (CSs) that allow researchers to continuously monitor ATE estimates and stop data collection once a desired precision or significance criterion is met. Empirically, we validate both methods using simulations and real-world data. In simulations, MADCovar reduces CS width by up to 60% relative to MAD. In a large-scale political RCT with 32,000 participants, MADCovar achieves similar precision gains. MADMod improves statistical power and inferential precision across all treatment arms, particularly for suboptimal treatments. Simulations show that MADMod sharply reduces Type II error while preserving the efficiency benefits of adaptive allocation. Together, MADCovar and MADMod make adaptive experiments more practical, reliable, and efficient for applied researchers across many domains. Our proposed methods are implemented through an open-source software package.

Paper Structure

This paper contains 29 sections, 1 theorem, 39 equations, 5 figures, 1 table.

Key Result

Theorem 1

Let $(\hat{\tau}_t)_{t=1}^\infty$ be the sequence of random variables where $W_t=w$ with probability $p_t^{\text{MAD}}(w)$, as in Definition 1, with respect to some treatment assignment policy $\mathcal{A}$. Under Assumptions 1 and 2 of Liang and Bojinov, and Assumption ass:bounded_models, $(\hat{\bar{\tau}}_t \pm \hat{V}_t)$ is a valid $(1-\alpha)$ asymptotic CS for $\bar{\ta
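
The theorem refers to the assignment probability $p_t^{\text{MAD}}(w)$ without reproducing Definition 1 here. As a rough illustration only, the sketch below assumes the mixture form used in the cited Mixture Adaptive Design framework: with weight $\delta_t$ the unit is assigned uniformly at random across arms, and with weight $1-\delta_t$ it follows the adaptive policy $\mathcal{A}$. The function names, the decay exponent 0.24, and the single-period inverse-probability-weighted contrast are assumptions made for this example, not the paper's exact definitions or the API of its software package.

```python
import numpy as np


def delta_schedule(t, exponent=0.24):
    """Hypothetical decaying mixing weight delta_t = t**(-exponent), in (0, 1]."""
    return float(t) ** (-exponent)


def mad_probability(p_adaptive, delta_t):
    """Assumed mixture: uniform design with weight delta_t, adaptive policy otherwise."""
    p_adaptive = np.asarray(p_adaptive, dtype=float)
    n_arms = p_adaptive.size
    return delta_t / n_arms + (1.0 - delta_t) * p_adaptive


def ipw_increment(y, w, arm, control, p_mad):
    """Single-period inverse-probability-weighted contrast of `arm` vs `control`."""
    return y * (w == arm) / p_mad[arm] - y * (w == control) / p_mad[control]


def expected_increment(potential_outcomes, p_mad, arm, control):
    """Exact expectation of the increment over the random assignment W ~ p_mad."""
    return sum(
        p_mad[w] * ipw_increment(potential_outcomes[w], w, arm, control, p_mad)
        for w in range(len(p_mad))
    )


# A bandit that strongly favors arm 0, softened by the mixture at t = 100.
p = mad_probability([0.9, 0.1], delta_schedule(100))
# The expectation equals the unit-level effect Y(1) - Y(0) for any valid p.
effect = expected_increment([1.0, 3.0], p, arm=1, control=0)
```

Because every arm keeps probability at least $\delta_t / K$ under this assumed form, the inverse weights stay bounded at each step, which is the intuition for why the weighted contrast remains unbiased even when the adaptive policy concentrates on one arm.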

Figures (5)

  • Figure 1: MAD vs. MADCovar: comparison of 95% confidence sequence widths across 10,000 units in a setting with moderate covariate signal.
  • Figure 2: Comparing the mean CS width across 100 simulations for MAD vs. MADCovar over varying covariate signal strength values and number of irrelevant covariates.
  • Figure 3: Comparing ATE estimates and 95% CS width for MADCovar vs. MAD relative to the ground truth (RCT ATE estimates) after running a simulated experiment with 32,000 participants.
  • Figure 4: Comparing the mean Type II error and 95% CS width for the MAD vs. MADMod algorithms. Treatments 1 and 2 represent the suboptimal treatment arms with harder-to-detect effect sizes. Treatments 3 and 4 represent the "best" treatment arms (i.e., the treatment arms that a MAB would choose to prioritize). The left panel displays mean Type II error and the right panel displays mean 95% CS width.
  • Figure 5: Comparing the mean sample size and sample size percentage ($\pm 95\%$ confidence intervals) assigned to each treatment arm under the MAD vs. MADMod algorithms. Sample size is equal to the total number of participants within each treatment arm. Sample size % is equal to the total number of participants assigned to a given treatment arm divided by the total number of participants in the entire experiment.

Theorems & Definitions (4)

  • Definition 1
  • Theorem 1
  • proof
  • proof