Prediction-Guided Active Experiments

Ruicheng Ao, Hongyu Chen, David Simchi-Levi

TL;DR

This work introduces the Prediction-Guided Active Experiment (PGAE), a framework for active experimentation that uses predictions from an existing machine learning model to guide sampling and experimentation. Simulations and a semi-synthetic experiment on US Census Bureau data indicate that PGAE outperforms existing methods.

Abstract

In this work, we introduce a new framework for active experimentation, the Prediction-Guided Active Experiment (PGAE), which leverages predictions from an existing machine learning model to guide sampling and experimentation. Specifically, at each time step, an experimental unit is sampled according to a designated sampling distribution, and the actual outcome is observed based on an experimental probability. Otherwise, only a prediction for the outcome is available. We begin by analyzing the non-adaptive case, where full information on the joint distribution of the predictor and the actual outcome is assumed. For this scenario, we derive an optimal experimentation strategy by minimizing the semi-parametric efficiency bound for the class of regular estimators. We then introduce an estimator that meets this efficiency bound, achieving asymptotic optimality. Next, we move to the adaptive case, where the predictor is continuously updated with newly sampled data. We show that the adaptive version of the estimator remains efficient and attains the same semi-parametric bound under certain regularity assumptions. Finally, we validate PGAE's performance through simulations and a semi-synthetic experiment using data from the US Census Bureau. The results underscore the PGAE framework's effectiveness and superiority compared to other existing methods.
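
The data collection described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's estimator or experimental design: the toy distributions, the function names `simulate_pgae_data` and `corrected_mean`, and the choice of experimental probability are all assumptions made for illustration. It also assumes units are sampled directly from the target population, so no reweighting by a sampling density is needed. The estimator shown is a standard prediction-plus-inverse-probability-weighted-residual correction, used here only to show how observed outcomes can debias the predictions.

```python
import numpy as np

def simulate_pgae_data(n, pi, rng):
    """Draw n units; the true outcome Y is revealed only when Delta = 1.

    All distributions here are toy assumptions, not taken from the paper.
    """
    x = rng.uniform(0.0, 1.0, size=n)            # covariate X
    f = np.sin(2 * np.pi * x)                    # prediction F from a hypothetical model
    y = f + 0.3 + rng.normal(0.0, 0.5, size=n)   # true outcome; the prediction is biased by 0.3
    delta = rng.random(n) < pi(x)                # Delta ~ Bernoulli(pi(X)): run the experiment?
    y_obs = np.where(delta, y, 0.0)              # Delta * Y: Y stays hidden when Delta = 0
    return x, f, delta, y_obs

def corrected_mean(x, f, delta, y_obs, pi):
    """Mean of the predictions plus an inverse-probability-weighted residual term."""
    weights = delta.astype(float) / pi(x)        # zero for units without an experiment
    return np.mean(f + weights * (y_obs - f))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pi = lambda x: 0.2 + 0.6 * x                 # hypothetical experimental probability
    x, f, delta, y_obs = simulate_pgae_data(200_000, pi, rng)
    print("prediction-only mean:", np.mean(f))
    print("corrected mean:      ", corrected_mean(x, f, delta, y_obs, pi))
```

In this toy setup the true mean of $Y$ is 0.3, while the prediction-only average is close to 0. The corrected estimate recovers the true mean using outcomes from only a fraction of the units. Which experimental probability $\pi(x)$ and sampling density $p(x)$ minimize the variance is the question the paper addresses.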

Paper Structure

This paper contains 23 sections, 9 theorems, 67 equations, 4 figures, 1 algorithm.

Key Result

Lemma 1

The semi-parametric efficient influence function for $\theta$ given $\pi(x)$ and $p(x)$ and the data generation process $Z=(X, F, \Delta, \Delta Y)$ is given by

Figures (4)

  • Figure 1: Simulation setup. Panel (a) shows the distributional properties of $(X, F, Y)$, including the conditional variance $\operatorname{Var}(Y\mid X)$ and its decomposition according to $F$. Panel (b) shows the optimal sampling density $p^*$. Panel (c) shows the optimal experimental probability $\pi^*$ under this setup.
  • Figure 2: The average mean squared error (MSE) of different estimators. Panel (a) plots the average MSE across different experimental proportions $\gamma$. Panel (b) plots the distribution of the results when $\gamma=0.4$.
  • Figure 3: Results for the census data using a pre-trained model. Panel (a) shows the average mean squared error of the predictor over 200 independent trials. Panel (b) shows the width of the 95% confidence interval for three estimators. Panel (c) shows the coverage for three estimators.
  • Figure 4: Results for the census data using an adaptively estimated model. Panel (a) shows the average mean squared error of the predictor over 1000 independent trials. Panel (b) shows the width of the 95% confidence interval for three estimators. Panel (c) shows the coverage for three estimators.

Theorems & Definitions (9)

  • Lemma 1
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Lemma 2
  • Lemma 3: hamilton2020time Prop 7.9
  • Lemma 4: loeve1977elementary p.165
  • Lemma 5: hamilton2020time Prop 7.7