Neyman Meets Causal Machine Learning: Experimental Evaluation of Individualized Treatment Rules

Michael Lingzhi Li, Kosuke Imai

TL;DR

This work extends Neyman's repeated sampling framework to the evaluation of individualized treatment rules (ITRs) derived from causal machine learning. It introduces two population-level metrics, the Population Average Value (PAV) and the Population Average Prescriptive Effect (PAPE), and shows that both can be estimated without bias under minimal assumptions, even when the ITR depends on the training data or is trained by cross-fitting. The authors compare ex-post and ex-ante evaluation designs, proving that ex-post evaluation can be more efficient for certain metrics, and show that the evaluation metrics are not invariant to shifts in the outcome. A numerical study using ACIC-2016 data illustrates the practical implications and supports the theoretical findings. The results underscore the continuing relevance of Neyman's framework for modern causal inference and for policy evaluation with complex ML-derived decision rules.

Abstract

A century ago, Neyman showed how to evaluate the efficacy of treatment using a randomized experiment under a minimal set of assumptions. This classical repeated sampling framework serves as a basis of routine experimental analyses conducted by today's scientists across disciplines. In this paper, we demonstrate that Neyman's methodology can also be used to experimentally evaluate the efficacy of individualized treatment rules (ITRs), which are derived by modern causal machine learning algorithms. In particular, we show how to account for additional uncertainty resulting from a training process based on cross-fitting. The primary advantage of Neyman's approach is that it can be applied to any ITR regardless of the properties of machine learning algorithms that are used to derive the ITR. We also show, somewhat surprisingly, that for certain metrics, it is more efficient to conduct this ex-post experimental evaluation of an ITR than to conduct an ex-ante experimental evaluation that randomly assigns some units to the ITR. Our analysis demonstrates that Neyman's repeated sampling framework is as relevant for causal inference today as it has been since its inception.

Paper Structure

This paper contains 21 sections, 6 theorems, 70 equations, 2 figures.

Key Result

Theorem 1

Under Assumptions asm:SUTVA, asm:comrand, and asm:randomsample, the expectation and variance of the PAV estimator defined in equation eq:PAVest are expressed in terms of $S_{ft}^2 = \sum_{i=1}^n (Y_{fi}(t) - \overline{Y_f(t)})^2/(n-1)$, where $Y_{fi}(t) = \mathbf{1}\{f(\bm{X}_i)=t\}Y_i(t)$ and $\overline{Y_f(t)} = \sum_{i=1}^n Y_{fi}(t)/n$ for $t \in \{0,1\}$.
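
As a rough illustration of the kind of estimator Theorem 1 concerns, the sketch below computes a PAV point estimate and a Neyman-style variance estimate from a completely randomized experiment. The estimator form (mean of $\mathbf{1}\{f=1\}Y$ among treated units plus mean of $\mathbf{1}\{f=0\}Y$ among control units) and the variance form $S_{f1}^2/n_1 + S_{f0}^2/n_0$, with the sample variances computed within each arm, are assumptions inferred from the definitions of $Y_{fi}(t)$ and $S_{ft}^2$ above rather than formulas quoted from the paper. The function name `pav_estimate` and its arguments are hypothetical.

```python
import numpy as np


def pav_estimate(y, t, f):
    """Sketch of a PAV estimate and a Neyman-style variance estimate.

    y : observed outcomes
    t : randomized treatment indicators (0/1)
    f : ITR decisions f(X_i) (0/1), fixed before seeing the outcomes
    The estimator and variance forms are assumptions, not the paper's text.
    """
    y, t, f = (np.asarray(a, dtype=float) for a in (y, t, f))
    treated, control = t == 1, t == 0
    n1, n0 = treated.sum(), control.sum()

    # Y_f(1) = 1{f = 1} * Y(1) is observed only in the treated arm;
    # Y_f(0) = 1{f = 0} * Y(0) is observed only in the control arm.
    yf1 = (f * y)[treated]
    yf0 = ((1 - f) * y)[control]

    estimate = yf1.mean() + yf0.mean()
    # Within-arm sample variances (divisor n_t - 1), scaled by arm sizes.
    variance = yf1.var(ddof=1) / n1 + yf0.var(ddof=1) / n0
    return estimate, variance


# Tiny deterministic example with 4 treated and 4 control units.
y = [1, 2, 3, 4, 5, 6, 7, 8]
t = [1, 1, 1, 1, 0, 0, 0, 0]
f = [1, 0, 1, 0, 1, 0, 1, 0]
est, var = pav_estimate(y, t, f)
print(est, var)
```

Because the calculation only needs the ITR's decisions `f`, the same sketch applies to any rule, regardless of the learning algorithm that produced it.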

Figures (2)

  • Figure 1: Illustration of PAPE for two different ITRs $f$ and $g$. Here the $x$ axis is the proportion of individuals treated, and $y$ axis is PAV. The PAV of $f$ is higher than PAV of $g$, but ITR $g$ has a positive PAPE $\tau_g$ and the ITR $f$ has a negative PAPE $\tau_f$.
  • Figure 2: Numerical Experiments
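
The comparison illustrated in Figure 1 can be sketched numerically: the PAPE contrasts the value of an ITR with the value of a random rule that treats the same proportion of individuals, so an ITR with a high PAV can still have a negative PAPE. The code below is a plain plug-in contrast under complete randomization, without any finite-sample correction the paper may apply; the function name `pape_plugin` and its arguments are hypothetical.

```python
import numpy as np


def pape_plugin(y, t, f):
    """Plug-in contrast between an ITR and a random rule with the same
    treated proportion. A sketch under assumptions, not the paper's
    exact estimator.
    """
    y, t, f = (np.asarray(a, dtype=float) for a in (y, t, f))
    n1, n0 = (t == 1).sum(), (t == 0).sum()
    p_f = f.mean()  # proportion of units the ITR would treat

    # Estimated value of following the ITR.
    itr_value = (y * t * f).sum() / n1 + (y * (1 - t) * (1 - f)).sum() / n0
    # Estimated value of treating a random fraction p_f of units.
    random_value = (p_f * (y * t).sum() / n1
                    + (1 - p_f) * (y * (1 - t)).sum() / n0)
    return itr_value - random_value


# Deterministic toy data: this rule does worse than random assignment
# at the same treated proportion, so the contrast is negative.
y = [1, 2, 3, 4, 5, 6, 7, 8]
t = [1, 1, 1, 1, 0, 0, 0, 0]
f = [1, 1, 0, 0, 0, 0, 1, 1]
print(pape_plugin(y, t, f))
```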

Theorems & Definitions (7)

  • Theorem 1: Unbiasedness and Variance of the PAV Estimator imai2021experimental
  • Theorem 2: Unbiasedness and Variance of the PAPE Estimator imai2021experimental
  • Proposition 1: Minimum Variance Estimators
  • Theorem 3: Unbiasedness and Variance of the Ex-ante PAPE Estimator
  • Theorem 4
  • Lemma 1: Expectation and Variance of the Intermediate Estimator
  • proof