Prediction-Powered Causal Inferences
Riccardo Cadei, Ilker Demirel, Piersilvio De Bartolomeis, Lukas Lindorfer, Sylvia Cremer, Cordelia Schmid, Francesco Locatello
TL;DR
The paper addresses the challenge of obtaining valid causal inferences when the outcome is latent and measured via complex signals, by reusing predictions trained on a similar, annotated experiment. It formalizes Prediction-Powered Causal Inference (PPCI), showing that conditional calibration of the outcome predictor with respect to the treatment and a valid adjustment set suffices for identifiability and for the validity of AIPW-based estimation on prediction-powered samples. It then introduces a Causal Lifting constraint on representations and a practical implementation, Deconfounded Empirical Risk Minimization (DERM), which uses distribution reweighting to disentangle the outcome from experimental settings and enable transfer of causal validity across similar experiments. Empirically, the approach yields first-ever valid zero-shot causal inference on ISTAnt by fine-tuning on a similar annotated dataset, and demonstrates robustness on CausalMNIST under soft shifts and controlled causal effects, while identifying limitations under hard shifts. Overall, the work offers a principled pathway to scale causal analysis in scientific domains by leveraging predictive models, with meaningful implications for automatic discovery and reduced labeling burdens.
Abstract
In many scientific experiments, the cost of data annotation constrains the pace of testing novel hypotheses. Yet, modern machine learning pipelines offer a promising solution, provided their predictions yield correct conclusions. We focus on Prediction-Powered Causal Inferences (PPCI), i.e., estimating the treatment effect in an unlabeled target experiment, relying on training data with the same outcome annotated but potentially different treatment or effect modifiers. We first show that conditional calibration guarantees valid PPCI at the population level. Then, we introduce a sufficient representation constraint transferring validity across experiments, which we propose to enforce in practice in Deconfounded Empirical Risk Minimization, our new model-agnostic training objective. We validate our method on synthetic and real-world scientific data, solving problem instances that are impossible for Empirical Risk Minimization even with standard invariance constraints. In particular, for the first time, we achieve valid causal inference on a scientific experiment with complex recording and no human annotations, by fine-tuning a foundation model on our similar annotated experiment.
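To make the AIPW-based estimation on prediction-powered samples mentioned in the TL;DR more concrete, here is a minimal sketch, not the authors' implementation. It assumes a randomized experiment with a known propensity of 0.5, a single scalar covariate, and per-arm linear outcome regressions. The function name `aipw_ate`, the synthetic data, and the simulated predictor `y_hat` are all hypothetical. The predictor's error is drawn independently of treatment and covariate, which is one simple way to satisfy conditional calibration in simulation.

```python
import numpy as np

def aipw_ate(y, t, w, propensity):
    """AIPW estimate of the average treatment effect and its standard error.

    y may hold ground-truth outcomes or model predictions (the
    prediction-powered case). Outcome models are per-arm linear
    regressions, an illustrative choice.
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    w = np.asarray(w, dtype=float).reshape(len(y), -1)
    design = np.column_stack([np.ones(len(y)), w])

    # Fit one linear outcome regression per treatment arm.
    beta1, *_ = np.linalg.lstsq(design[t == 1], y[t == 1], rcond=None)
    beta0, *_ = np.linalg.lstsq(design[t == 0], y[t == 0], rcond=None)
    mu1 = design @ beta1
    mu0 = design @ beta0

    # Doubly robust score: regression difference plus
    # inverse-propensity-weighted residual corrections.
    e = np.broadcast_to(np.asarray(propensity, dtype=float), y.shape)
    psi = mu1 - mu0 + t * (y - mu1) / e - (1 - t) * (y - mu0) / (1 - e)
    return psi.mean(), psi.std(ddof=1) / np.sqrt(len(y))

# Synthetic randomized experiment (hypothetical data, true effect = 1.0).
rng = np.random.default_rng(0)
n = 4000
w = rng.normal(size=n)
t = rng.binomial(1, 0.5, size=n)
y_true = 1.0 * t + 0.5 * w + rng.normal(scale=0.5, size=n)

# Simulated predictor: its error has zero mean given (t, w),
# i.e. it is conditionally calibrated by construction.
y_hat = y_true + rng.normal(scale=0.3, size=n)

ate_true, se_true = aipw_ate(y_true, t, w, propensity=0.5)
ate_pred, se_pred = aipw_ate(y_hat, t, w, propensity=0.5)
print(f"ATE from annotations: {ate_true:.3f} (SE {se_true:.3f})")
print(f"ATE from predictions: {ate_pred:.3f} (SE {se_pred:.3f})")
```

In this simulation both estimates should land close to the true effect. If the prediction error were instead correlated with the treatment, the prediction-powered estimate would generally be biased, which is the failure mode that conditional calibration rules out.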
