Adaptive Neyman Allocation

Jinglong Zhao

TL;DR

This work develops a competitive-analysis framework for adaptive Neyman allocation in multi-stage experiments where the variances are unknown. It introduces simple batched strategies (two-stage and multi-stage) that estimate the standard deviations from earlier stages and allocate the remaining units accordingly, approaching optimal second-order efficiency as the number of stages grows. The theoretical results include high-probability and in-expectation competitive guarantees, an information-theoretic lower bound, and valid estimation and inference for adaptively collected data under stability conditions. Experiments on online A/B testing data and synthetic simulations show meaningful variance reductions and reliable inference, and offer practical guidance for designing efficient multi-stage experiments. The approach gives a principled way to split experimental units between treated and control groups when the two groups differ in variance and only observed data can inform future allocations.
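The two-stage idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's algorithm: the function name `two_stage_allocation`, the `pilot_per_arm` parameter, and the equal-split first stage are all hypothetical choices, and the summary does not say how large the first stage should be.

```python
import random
import statistics


def two_stage_allocation(draw_treated, draw_control, total_units, pilot_per_arm):
    """Illustrative two-stage design (hypothetical helper, not the paper's code).

    Stage 1 assigns `pilot_per_arm` units to each arm and estimates the two
    standard deviations. Stage 2 spends the remaining units so that the final
    split is as close as possible to proportional to the estimated deviations.
    """
    if pilot_per_arm < 2 or 2 * pilot_per_arm >= total_units:
        raise ValueError("need at least 2 pilot units per arm and units left for stage 2")

    # Stage 1: equal-sized pilot in both arms.
    treated = [draw_treated() for _ in range(pilot_per_arm)]
    control = [draw_control() for _ in range(pilot_per_arm)]
    sd_treated = statistics.stdev(treated)
    sd_control = statistics.stdev(control)

    # Target number of treated units over the whole experiment.
    if sd_treated + sd_control == 0:
        target_treated = total_units / 2
    else:
        target_treated = total_units * sd_treated / (sd_treated + sd_control)

    # Stage 2: units already used in the pilot cannot be taken back,
    # so the extra treated units are clipped to the feasible range.
    remaining = total_units - 2 * pilot_per_arm
    extra_treated = min(max(round(target_treated) - pilot_per_arm, 0), remaining)
    extra_control = remaining - extra_treated
    treated += [draw_treated() for _ in range(extra_treated)]
    control += [draw_control() for _ in range(extra_control)]

    # Difference-in-means estimate of the treatment effect.
    estimate = statistics.mean(treated) - statistics.mean(control)
    return estimate, len(treated), len(control)


# Synthetic example: the treated arm is five times noisier than the control arm.
rng = random.Random(0)
estimate, n_treated, n_control = two_stage_allocation(
    lambda: rng.gauss(1.0, 5.0),
    lambda: rng.gauss(0.0, 1.0),
    total_units=1000,
    pilot_per_arm=50,
)
print(n_treated, n_control)
```

In this synthetic run the noisier treated arm receives most of the units. The conditions under which the resulting estimate supports valid inference are part of the paper's theory and are not reproduced in this sketch.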

Abstract

In the experimental design literature, Neyman allocation refers to the practice of allocating units into treated and control groups, potentially in unequal numbers proportional to their respective standard deviations, with the objective of minimizing the variance of the treatment effect estimator. This widely recognized approach increases statistical power in scenarios where the treated and control groups have different standard deviations, as is often the case in social experiments, clinical trials, marketing research, and online A/B testing. However, Neyman allocation cannot be implemented unless the standard deviations are known in advance. Fortunately, the multi-stage nature of the aforementioned applications allows the use of earlier stage observations to estimate the standard deviations, which further guide allocation decisions in later stages. In this paper, we introduce a competitive analysis framework to study this multi-stage experimental design problem. We propose a simple adaptive Neyman allocation algorithm, which almost matches the information-theoretic limit of conducting experiments. We provide theory for estimation and inference using data collected from our adaptive Neyman allocation algorithm. We demonstrate the effectiveness of our adaptive Neyman allocation algorithm using both online A/B testing data from a social media site and synthetic data.
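As a small illustration of the allocation rule described in the abstract, the following Python sketch splits a fixed budget in proportion to known standard deviations and compares the resulting variance with an equal split. The function names and the example numbers are assumptions made for illustration; the variance formula is the standard one for a difference in means of two independent samples.

```python
def neyman_allocation(total_units, sd_treated, sd_control):
    """Split total_units in proportion to the two standard deviations."""
    if sd_treated < 0 or sd_control < 0:
        raise ValueError("standard deviations must be non-negative")
    if sd_treated + sd_control == 0:
        n_treated = total_units // 2
    else:
        n_treated = round(total_units * sd_treated / (sd_treated + sd_control))
    # Keep at least one unit in each arm so both means can be estimated.
    n_treated = min(max(n_treated, 1), total_units - 1)
    return n_treated, total_units - n_treated


def difference_in_means_variance(n_treated, n_control, sd_treated, sd_control):
    """Variance of the difference in means of two independent samples."""
    return sd_treated ** 2 / n_treated + sd_control ** 2 / n_control


n_treated, n_control = neyman_allocation(1200, sd_treated=5.0, sd_control=1.0)
neyman_var = difference_in_means_variance(n_treated, n_control, 5.0, 1.0)
equal_var = difference_in_means_variance(600, 600, 5.0, 1.0)
print(n_treated, n_control)   # 1000 200
print(neyman_var, equal_var)  # about 0.030 versus about 0.043
```

With a five-to-one ratio of standard deviations, the proportional split lowers the variance by roughly 30 percent relative to the equal split, which is the gain that is out of reach when the standard deviations are not known in advance.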

Paper Structure

This paper contains 61 sections, 36 theorems, 604 equations, 13 figures, 5 tables, 3 algorithms.

Key Result

Theorem 1

The optimal solution to the outer optimization problem is given by $T(1) = T(0) = T / 2$. The supremum of the inner optimization problem is achieved when either the treated group or the control group has zero variance, that is, $\sigma(1) = 0$ or $\sigma(0) = 0$.
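A short worked derivation may help show why the equal split is the minimax choice. Since the objective is not reproduced in this summary, the derivation assumes that a design is scored by the ratio of the difference-in-means variance to the variance under Neyman allocation with known standard deviations; the symbol $V$ and this ratio are assumptions made for illustration. For an allocation with $T(1) + T(0) = T$,

$$
V\bigl(T(1), T(0)\bigr) = \frac{\sigma^2(1)}{T(1)} + \frac{\sigma^2(0)}{T(0)},
\qquad
\min_{T(1) + T(0) = T} V\bigl(T(1), T(0)\bigr) = \frac{\bigl(\sigma(1) + \sigma(0)\bigr)^2}{T},
$$

where the minimum is attained at $T(1) = T \sigma(1) / (\sigma(1) + \sigma(0))$. Under the equal split the ratio is

$$
\frac{V(T/2,\, T/2)}{\bigl(\sigma(1) + \sigma(0)\bigr)^2 / T}
= \frac{2\bigl(\sigma^2(1) + \sigma^2(0)\bigr)}{\bigl(\sigma(1) + \sigma(0)\bigr)^2}
\le 2,
$$

with equality exactly when $\sigma(1)\,\sigma(0) = 0$, that is, when one group has zero variance. Conversely, for any split, setting $\sigma(0) = 0$ gives the ratio $T / T(1)$ and setting $\sigma(1) = 0$ gives $T / T(0)$, so the worst case is at least $\max\{T / T(1),\, T / T(0)\} \ge 2$, with equality only at $T(1) = T(0) = T / 2$.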

Figures (13)

  • Figure 1: Distributions of the number of clicks per million impressions at a social media site [AB_testing_kaggle]
  • Figure 2: Competitive ratios with respect to different numbers of stages
  • Figure 3: Simulated variances of experiments under different numbers of stages
  • Figure 4: Simulated distributions of experiments under different numbers of stages
  • Figure 5: Normalized mean squared error with respect to sample size when $\sigma(1) / \sigma(0)=5$
  • ...and 8 more figures

Theorems & Definitions (37)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Example 1: Symmetric Distribution Implies No Conditioning Bias
  • Theorem 5: Finite Sample Unbiasedness
  • Theorem 6: Asymptotic Normality
  • Proposition 1: Sample Variance Estimator Consistency
  • Corollary 1
  • Corollary 2
  • ...and 27 more