Evaluating A/B Testing Methodologies via Sample Splitting: Theory and Practice

Ryan Kessler, James McQueen, Miikka Rokkanen

Abstract

We develop a theoretical framework for sample splitting in A/B testing environments, where data for each test are partitioned into two splits to measure methodological performance when the true impacts of tests are unobserved. We show that sample-split estimators are generally biased for full-sample performance but consistently estimate sample-split analogues of it. We derive their asymptotic distributions, construct valid confidence intervals, and characterize the bias-variance trade-offs underlying sample-split design choices. We validate our theoretical results through simulations and provide implementation guidance for A/B testing products seeking to evaluate new estimators and decision rules.
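
To make the setup concrete, the following is a minimal sketch of one way a sample-split evaluation could look; it is not the authors' estimator. Everything in it is an assumption made for illustration: the difference-in-means effect estimate, the "launch if the estimate is positive" decision rule, the average held-out gain per test as the performance metric, and all function names. The symbols follow the figure captions: `alpha` is the split fraction, `num_partitions` plays the role of $S$, the number of tests is $I$, and `tau` and `sigma` are the standard deviations of true effects and outcome noise.

```python
import random
import statistics


def diff_in_means(treated, control):
    """Simple effect estimate: mean(treated) - mean(control)."""
    return statistics.fmean(treated) - statistics.fmean(control)


def split_units(units, alpha, rng):
    """Randomly partition units; a fraction alpha goes to the first split."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    cut = int(round(alpha * len(shuffled)))
    if cut == 0 or cut == len(shuffled):
        raise ValueError("alpha leaves one split empty")
    return shuffled[:cut], shuffled[cut:]


def sample_split_value(tests, alpha, num_partitions, rng):
    """Average held-out gain per test under a hypothetical launch rule.

    For each test, the first split drives the launch decision and the
    second split supplies an independent estimate of the effect, standing
    in for the unobserved true impact. Results are averaged over
    num_partitions random partitions.
    """
    partition_values = []
    for _ in range(num_partitions):
        total = 0.0
        for treated, control in tests:
            t_decide, t_eval = split_units(treated, alpha, rng)
            c_decide, c_eval = split_units(control, alpha, rng)
            if diff_in_means(t_decide, c_decide) > 0:  # assumed decision rule
                total += diff_in_means(t_eval, c_eval)
        partition_values.append(total / len(tests))
    return statistics.fmean(partition_values)


def simulate_tests(num_tests, n_per_arm, tau, sigma, rng):
    """Draw tests with true effects ~ N(0, tau^2) and noise ~ N(0, sigma^2)."""
    tests = []
    for _ in range(num_tests):
        effect = rng.gauss(0.0, tau)
        control = [rng.gauss(0.0, sigma) for _ in range(n_per_arm)]
        treated = [rng.gauss(effect, sigma) for _ in range(n_per_arm)]
        tests.append((treated, control))
    return tests


if __name__ == "__main__":
    rng = random.Random(0)
    tests = simulate_tests(num_tests=200, n_per_arm=100, tau=0.1, sigma=1.0, rng=rng)
    for alpha in (0.2, 0.5, 0.8):
        value = sample_split_value(tests, alpha, num_partitions=10, rng=rng)
        print(f"alpha={alpha:.1f}  sample-split value={value:.4f}")
```

In this sketch, a larger `alpha` gives the decision rule more data, so it behaves more like the full-sample rule, while leaving fewer units for evaluation and so a noisier held-out estimate. This is one informal way to read the bias-variance trade-off mentioned in the abstract.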

Paper Structure

This paper contains 16 sections, 37 equations, 5 figures, 1 table, 1 algorithm.

Figures (5)

  • Figure 1: Bias and coverage of 95 percent confidence intervals at different $(\tau^2, \sigma^2)$
  • Figure 2: Bias-variance tradeoff underlying choice of split fraction $\alpha$
  • Figure 3: Variance as a function of the number of tests $I$ and number of partitions $S$
  • Figure 4: Bias-variance tradeoff underlying choice of split fraction $\alpha$ (robustness)
  • Figure 5: Variance as a function of the number of tests $I$ and number of partitions $S$ (robustness)