Prediction-powered Generalization of Causal Inferences
Ilker Demirel, Ahmed Alaa, Anthony Philippakis, David Sontag
TL;DR
This work tackles the external-validity problem of generalizing causal effects from randomized trials to target populations whose covariate distributions differ. It introduces prediction-powered estimators that fuse trial data with predictive models trained on an observational study (OS), without imposing strong assumptions on the OS, and derives theoretical insights into the mean squared error (MSE) that explain when these methods help. Two main approaches are proposed: additive bias correction (ABC), which learns a bias function from the trial to correct the OS predictions, and augmented outcome modeling (AOM), which incorporates OS-based predictors as covariates or representations. Both can be implemented with regression-based or doubly-robust estimators. Synthetic experiments across thousands of data-generating processes (DGPs) show that the OS-augmented methods improve generalization when the OS is high-quality and remain robust when it is biased or confounded, offering a practical path to more reliable generalization in medicine and related fields.
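The two regression-based variants described above can be illustrated with a minimal sketch. It assumes a binary treatment, linear least-squares nuisance models, and a hypothetical OS predictor `os_pred(x, a)` standing in for a model trained on observational data. The names `fit_ols`, `outcome_model_ate`, `abc_ate`, and `aom_ate`, as well as the simulated outcome surface, are illustrative and are not the authors' implementation. The doubly-robust variants are not shown.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns a predict function."""
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(len(X)), X]), y, rcond=None)

    def predict(Xn):
        Xn = np.asarray(Xn, dtype=float).reshape(len(Xn), -1)
        return np.column_stack([np.ones(len(Xn)), Xn]) @ coef

    return predict


def outcome_model_ate(X, A, Y, X_target):
    """Trial-only baseline: fit mu_a per arm, average mu_1 - mu_0 over target covariates."""
    mu_hat = {a: fit_ols(X[A == a], Y[A == a]) for a in (0, 1)}
    return float(np.mean(mu_hat[1](X_target) - mu_hat[0](X_target)))


def abc_ate(X, A, Y, X_target, os_pred):
    """Additive bias correction: mu_a(x) = f_a(x) + b_a(x), with b_a fit on trial residuals."""
    est = {}
    for a in (0, 1):
        Xa, Ya = X[A == a], Y[A == a]
        bias = fit_ols(Xa, Ya - os_pred(Xa, a))  # b_a(x) ~ E[Y - f_a(X) | X = x, A = a]
        est[a] = os_pred(X_target, a) + bias(X_target)
    return float(np.mean(est[1] - est[0]))


def aom_ate(X, A, Y, X_target, os_pred):
    """Augmented outcome modeling: regress Y on [x, f_a(x)] within each trial arm."""
    est = {}
    for a in (0, 1):
        Xa, Ya = X[A == a], Y[A == a]
        model = fit_ols(np.column_stack([Xa, os_pred(Xa, a)]), Ya)
        est[a] = model(np.column_stack([X_target, os_pred(X_target, a)]))
    return float(np.mean(est[1] - est[0]))


# --- Toy, noise-free simulation (hypothetical DGP, for illustration only) ---
rng = np.random.default_rng(0)
n, m = 200, 1000
X = rng.normal(0.0, 1.0, n)          # trial covariates
A = rng.integers(0, 2, n)            # randomized binary treatment
X_target = rng.normal(1.0, 1.0, m)   # shifted target covariates; no outcomes observed


def mu(x, a):
    """Hypothetical true outcome surface; the effect 0.5 + x is modified by x."""
    return np.sin(x) + a * (0.5 + x)


def os_biased(x, a):
    """Hypothetical OS predictor with an additive bias that is linear in x."""
    return mu(x, a) + 2.0 + 0.5 * x


Y = mu(X, A)
truth = float(np.mean(0.5 + X_target))  # target-sample average treatment effect

print("trial-only:", outcome_model_ate(X, A, Y, X_target), "truth:", truth)
for name, f in [("perfect OS", mu), ("biased OS", os_biased)]:
    print(name, "ABC:", abc_ate(X, A, Y, X_target, f), "AOM:", aom_ate(X, A, Y, X_target, f))
```

Because the simulated OS bias is linear in x and the outcomes are noise-free, both ABC and AOM recover the target effect up to floating-point error in this toy setting, whereas the trial-only linear model cannot represent the sin(x) term. With noisy outcomes, nonlinear bias, or a confounded OS, the comparison would change; those trade-offs are what the MSE analysis is meant to characterize.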
Abstract
Causal inferences from a randomized controlled trial (RCT) may not pertain to a target population where some effect modifiers have a different distribution. Prior work studies generalizing the results of a trial to a target population for which covariate data, but no outcome data, are available. We show how the limited size of trials makes generalization a statistically infeasible task, as it requires estimating complex nuisance functions. We develop generalization algorithms that supplement the trial data with a prediction model learned from an additional observational study (OS), without making any assumptions on the OS. We theoretically and empirically show that our methods facilitate better generalization when the OS is high-quality, and remain robust when it is not, e.g., when it has unmeasured confounding.
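For concreteness, one standard identification of the target average treatment effect is sketched below. It holds under the usual assumptions of this literature (randomization within the trial, conditional exchangeability of the trial and target populations given X, and overlap). The notation is ours rather than the source's: S = 1 marks trial participants, S = 0 marks the target sample with covariates x_1, ..., x_m, and μ_a denotes the trial outcome regressions, which are examples of the nuisance functions that must be estimated from the small trial sample.

```latex
\begin{align*}
\mu_a(x) &= \mathbb{E}[\,Y \mid X = x,\ A = a,\ S = 1\,], \qquad a \in \{0, 1\}, \\
\tau_{\mathrm{target}} &= \mathbb{E}[\,\mu_1(X) - \mu_0(X) \mid S = 0\,], \\
\hat{\tau}_{\mathrm{target}} &= \frac{1}{m} \sum_{j=1}^{m} \big[\hat{\mu}_1(x_j) - \hat{\mu}_0(x_j)\big].
\end{align*}
```

The plug-in estimate in the last line is only as accurate as the fitted regressions for μ_0 and μ_1, which is where a small trial struggles when these functions are complex.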
