Prediction-powered Generalization of Causal Inferences

Ilker Demirel; Ahmed Alaa; Anthony Philippakis; David Sontag

TL;DR

This work tackles the external validity problem: generalizing causal effects from a randomized trial to a target population with a different covariate distribution. It introduces prediction-powered estimators that combine trial data with a predictive model trained on an observational study (OS), without imposing strong assumptions on the OS, and derives mean squared error (MSE) results that explain when these methods help. Two main approaches are proposed. Additive bias correction (ABC) uses the trial data to learn a bias function that corrects the OS predictions. Augmented outcome modeling (AOM) feeds the OS-based predictions into the trial outcome model as additional covariates or representations. Both can be implemented with regression-based or doubly-robust estimators. Synthetic experiments across thousands of data-generating processes (DGPs) show that the OS-augmented methods improve generalization when the OS is high-quality and remain robust when the OS is biased or confounded, offering a practical path to more reliable generalization in medicine and related fields.
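To make the ABC idea concrete, here is a minimal sketch for a single treatment arm, assuming a one-dimensional covariate and a polynomial ridge regression for the bias function. The names `abc_estimate`, `poly_features`, and `ridge_fit`, along with the default `degree` and `lam` values, are illustrative assumptions and not the authors' implementation.

```python
import numpy as np

def poly_features(x, degree):
    # Columns 1, x, x^2, ..., x^degree (hypothetical helper for this sketch).
    return np.vander(x, N=degree + 1, increasing=True)

def ridge_fit(Phi, y, lam):
    # Closed-form ridge regression: (Phi^T Phi + lam * I)^{-1} Phi^T y.
    d = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ y)

def abc_estimate(x_trial, y_trial, x_target, f_a, degree=1, lam=1e-3):
    """Additive bias correction for one arm a (assumed setup).

    x_trial, y_trial: covariates and outcomes of trial participants in arm a.
    x_target: covariates sampled from the target population (no outcomes).
    f_a: predictor learned elsewhere on the OS, treated as a fixed function.
    """
    # Step 1: regress the trial residuals Y - f_a(X) on X to learn the bias b_a.
    residual = y_trial - f_a(x_trial)
    w = ridge_fit(poly_features(x_trial, degree), residual, lam)
    # Step 2: average the corrected prediction f_a(X) + b_a(X) over the target sample.
    b_hat = poly_features(x_target, degree) @ w
    return float(np.mean(f_a(x_target) + b_hat))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g = lambda x: x**3 + 0.5 * x + 1.0   # assumed true outcome function
    f = lambda x: x**3                   # biased OS predictor; its bias is linear in x
    x_tr = rng.normal(0.0, 1.0, 200)     # small trial
    y_tr = g(x_tr) + rng.normal(0.0, 0.5, 200)
    x_tg = rng.normal(1.0, 1.0, 5000)    # shifted target covariates
    print("ABC estimate:", abc_estimate(x_tr, y_tr, x_tg, f))
    print("target mean of g:", float(np.mean(g(x_tg))))
```

In this toy setup the OS predictor already captures the cubic term, so a linear bias model fitted on the small trial is enough, which is the intuition behind learning the bias rather than the full outcome function.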

Abstract

Causal inferences from a randomized controlled trial (RCT) may not pertain to a target population where some effect modifiers have a different distribution. Prior work studies generalizing the results of a trial to a target population for which covariate data, but no outcome data, are available. We show how the limited size of trials makes generalization a statistically infeasible task, as it requires estimating complex nuisance functions. We develop generalization algorithms that supplement the trial data with a prediction model learned from an additional observational study (OS), without making any assumptions on the OS. We theoretically and empirically show that our methods facilitate better generalization when the OS is high-quality, and remain robust when it is not, e.g., when it has unmeasured confounding.
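One way to supplement the trial data with an OS prediction model is to pass the model's prediction to the trial outcome regression as an extra covariate, in the spirit of augmented outcome modeling. The sketch below assumes a single arm, a one-dimensional covariate, and a ridge regression. The function name `aom_estimate` and its defaults are hypothetical, and the paper's own estimators may differ.

```python
import numpy as np

def aom_estimate(x_trial, y_trial, x_target, f_a, degree=1, lam=1e-3):
    """Augmented outcome modeling for one arm a (assumed setup).

    The OS prediction f_a(X) is appended to polynomial features of X, and the
    outcome model is fitted on trial data only.
    """
    def design(x):
        # Columns: 1, x, ..., x^degree, followed by the OS prediction f_a(x).
        return np.column_stack(
            [np.vander(x, N=degree + 1, increasing=True), f_a(x)]
        )

    Phi = design(x_trial)
    # Closed-form ridge fit on the augmented design matrix.
    w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y_trial)
    # Average the fitted outcome model over the target covariates.
    return float(np.mean(design(x_target) @ w))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    g = lambda x: 2.0 * x**3 + 0.5 * x + 1.0  # assumed true outcome function
    f = lambda x: x**3                        # OS predictor, off by a scale factor
    x_tr = rng.normal(0.0, 1.0, 200)
    y_tr = g(x_tr) + rng.normal(0.0, 0.5, 200)
    x_tg = rng.normal(1.0, 1.0, 5000)
    print("AOM estimate:", aom_estimate(x_tr, y_tr, x_tg, f))
    print("target mean of g:", float(np.mean(g(x_tg))))
```

Because the regression learns its own coefficient for `f_a(x)`, this variant can rescale a predictor that is off by a multiplicative factor, whereas a purely additive correction with the same low-degree features could not.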

Paper Structure (39 sections, 11 theorems, 66 equations, 14 figures, 1 table, 2 algorithms)

Introduction
Our Contributions
Related Work
Background
Notation and Objective
Assumptions for Causal Inference
Generalization Using Experimental Data
Prediction-powered Generalization Using Experimental and Observational Data
Additive Bias Correction to Predictive Model
Identification
Regression Function-based Estimation
Case Study: Polynomial Ridge Regression
Augmented Outcome Modeling
Identification
Regression Function-based Estimation
...and 24 more sections

Key Result

Proposition 3.0

Let $X$ be a categorical covariate stratifying the population into $K$ groups and denote by $n_{s=1,a,k}$ the number of trial participants from group $X=k$ assigned to treatment $A=a$, and by $\sigma^2_{a,k}$ the variance of the outcome among such patients. Let us estimate the outcome function $g_a(X=k)$ ... where $p_{s=0}(k) \coloneqq P(X=k \mid S=0)$ is the proportion of patients from group $X=k$ in the ...

Figures (14)

Figure 1: Age influences both selection into the trial and the outcome, inducing confounding bias between the population-level mean potential outcomes in the trial and target populations.
Figure 2: A biased predictor $f_a (X)$ can still capture higher order polynomials, making its bias $b_a (X)$ "easier" to learn than $g_a (X)$.
Figure 3: Data-generating process used in simulated experiments. (Left.) $X$ (observed) induces confounding by trial participation. (Right.) In the observational study, there is hidden confounding for treatment assignment due to $U$ (unobserved).
Figure 4: 100 different sets of data-generating functions are sampled for each $(l_x^{\textnormal{FOM}_1}, \alpha_u^{\textnormal{PA}}, n_1)$. We plot the RMSE averaged over 100 scenarios. Results are reported for four different numbers of polynomial features used to fit the underlying regression functions (if any).
Figure 5: Conventions are the same as in Figure 4. The observational predictor is not trained on any data but generates i.i.d. noise for all $X$.
...and 9 more figures

Theorems & Definitions (16)

Proposition 3.0
Theorem 3.1
Lemma 4.0
Theorem 4.1
Lemma 4.1: Adopted from Wainwright (2019)
Lemma 4.1
Proposition 1.0
proof
Theorem 1.1
proof
...and 6 more
