Adaptive Neyman Allocation
Jinglong Zhao
TL;DR
This work develops a competitive-analysis framework for adaptive Neyman allocation in multi-stage experiments where the outcome variances are unknown. It introduces simple batched strategies (two-stage and multi-stage) that estimate the standard deviations from earlier stages and allocate the remaining units accordingly, achieving near-optimal second-order efficiency as the number of stages grows. The theoretical results include high-probability and in-expectation competitive guarantees, an information-theoretic lower bound, and valid estimation and inference for adaptively collected data under stability conditions. Experiments on online A/B testing data and synthetic simulations show meaningful variance reductions and reliable inference, and offer practical guidance for designing efficient multi-stage experiments. The approach gives a principled way to allocate experimental units between treated and control groups when the two groups have different variances and only data observed in earlier stages can inform later allocations.
Abstract
In the experimental design literature, Neyman allocation refers to the practice of allocating units to treated and control groups, potentially in unequal numbers proportional to their respective standard deviations, with the objective of minimizing the variance of the treatment effect estimator. This widely recognized approach increases statistical power in scenarios where the treated and control groups have different standard deviations, as is often the case in social experiments, clinical trials, marketing research, and online A/B testing. However, Neyman allocation cannot be implemented unless the standard deviations are known in advance. Fortunately, the multi-stage nature of the aforementioned applications allows the use of earlier-stage observations to estimate the standard deviations, which in turn guide allocation decisions in later stages. In this paper, we introduce a competitive analysis framework to study this multi-stage experimental design problem. We propose a simple adaptive Neyman allocation algorithm, which almost matches the information-theoretic limit of conducting experiments. We provide theory for estimation and inference using data collected from our adaptive Neyman allocation algorithm. We demonstrate the effectiveness of our adaptive Neyman allocation algorithm using both online A/B testing data from a social media site and synthetic data.
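To make the allocation rule concrete, the following is a minimal Python sketch of a two-stage plug-in version of the idea. It is an illustration under stated assumptions, not the algorithm analyzed in the paper: the function names, the balanced pilot stage, and the `pilot_per_arm` parameter are all hypothetical choices made for this example. It relies only on the standard fact that the difference-in-means estimator has variance sd_t^2 / n_t + sd_c^2 / n_c, which is minimized when the group sizes are proportional to the standard deviations.

```python
import random
import statistics


def neyman_share(sd_treated, sd_control):
    """Fraction of units assigned to treatment under Neyman allocation."""
    total = sd_treated + sd_control
    if total == 0:
        return 0.5  # no variation observed: fall back to a balanced split
    return sd_treated / total


def estimator_variance(sd_treated, sd_control, n_treated, n_control):
    """Variance of the difference-in-means estimator for given group sizes."""
    return sd_treated ** 2 / n_treated + sd_control ** 2 / n_control


def two_stage_allocation(n_total, pilot_per_arm, draw_treated, draw_control):
    """Hypothetical two-stage design (an assumption of this sketch):
    stage 1 runs a balanced pilot, stage 2 assigns the remaining units so
    that the overall split follows the plug-in Neyman share."""
    if pilot_per_arm < 2 or 2 * pilot_per_arm > n_total:
        raise ValueError("pilot_per_arm must be >= 2 and fit within n_total")
    pilot_t = [draw_treated() for _ in range(pilot_per_arm)]
    pilot_c = [draw_control() for _ in range(pilot_per_arm)]
    # Plug-in estimates of the unknown standard deviations from the pilot.
    share = neyman_share(statistics.stdev(pilot_t), statistics.stdev(pilot_c))
    # Overall treated count, never below what the pilot already used
    # and never so large that the control pilot could not have been run.
    target = round(share * n_total)
    n_treated = min(max(target, pilot_per_arm), n_total - pilot_per_arm)
    return n_treated, n_total - n_treated


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic outcomes: treated SD of 3, control SD of 1 (made-up values).
    n_t, n_c = two_stage_allocation(
        n_total=1000,
        pilot_per_arm=50,
        draw_treated=lambda: rng.gauss(0.0, 3.0),
        draw_control=lambda: rng.gauss(0.0, 1.0),
    )
    print("adaptive split:", n_t, n_c)
    print("variance, adaptive:", estimator_variance(3.0, 1.0, n_t, n_c))
    print("variance, balanced:", estimator_variance(3.0, 1.0, 500, 500))
```

With true standard deviations of 3 and 1 and 1000 units, the oracle Neyman split is 750 treated and 250 control, which gives a variance of about 0.016 compared with 0.020 for a balanced design. The adaptive split should land near the oracle one, with the gap driven by estimation error in the pilot.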
