Do-PFN: In-Context Learning for Causal Effect Estimation
Jake Robertson, Arik Reuter, Siyuan Guo, Noah Hollmann, Frank Hutter, Bernhard Schölkopf
TL;DR
We address causal effect estimation from observational data without requiring a known causal graph or assuming unconfoundedness. Do-PFN pre-trains a transformer-based prior-data fitted network (PFN) on synthetic structural causal models (SCMs). Through in-context learning, it then predicts conditional interventional distributions (CIDs) and conditional average treatment effects (CATEs) from observational data, effectively learning to adjust for causal structure at inference time. The paper provides theoretical results on optimal approximation of the CID under the SCM prior and analyzes sources of uncertainty. It also demonstrates strong empirical performance across six synthetic case studies, the RealCause benchmarks, and datasets with known causal graphs, with robust uncertainty calibration and favorable inference speed. By leveraging synthetic priors and amortized inference, the approach broadens access to causal effect estimation. It offers a practical tool that remains competitive when traditional assumptions fail and that scales to moderately complex causal graphs. Overall, Do-PFN shows promise as a general-purpose, efficient backbone for causal inference in tabular settings.
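The prediction targets can be made concrete in notation introduced here for illustration, not taken from the paper. Here ψ is an SCM drawn from the synthetic prior p(ψ), D_obs denotes its observational samples, X is a covariate, T the treatment, and q_θ the network's predictive density. The conditional interventional distribution (CID) and the conditional average treatment effect (CATE) are written out below. The last two lines give a hedged summary of the generic PFN argument, not a restatement of the paper's theorem. Assume the queried treatment value t is drawn independently of ψ. Then minimizing the expected negative log-likelihood over the prior is achieved by the posterior predictive CID.

```latex
\begin{align*}
\text{CID:}\quad & p\big(y \mid do(T=t),\, X=x,\, \mathcal{D}_{\mathrm{obs}}\big) \\
\text{CATE:}\quad & \tau(x) = \mathbb{E}\big[Y \mid do(T=1),\, X=x\big] - \mathbb{E}\big[Y \mid do(T=0),\, X=x\big] \\
\text{PFN loss:}\quad & \mathcal{L}(\theta) = \mathbb{E}_{\psi \sim p(\psi)}\,
  \mathbb{E}_{\mathcal{D}_{\mathrm{obs}},\,x,\,t,\,y \mid \psi}
  \Big[-\log q_\theta\big(y \mid do(T=t),\, X=x,\, \mathcal{D}_{\mathrm{obs}}\big)\Big] \\
\text{minimizer:}\quad & q^{*}\big(y \mid do(T=t),\, X=x,\, \mathcal{D}_{\mathrm{obs}}\big)
  = \int p\big(y \mid do(T=t),\, X=x,\, \psi\big)\,
         p\big(\psi \mid \mathcal{D}_{\mathrm{obs}},\, x\big)\, d\psi
\end{align*}
```

Under this reading, a CATE estimate can be read off the predicted CIDs as the difference of their means at t = 1 and t = 0.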
Abstract
Estimation of causal effects is critical to a range of scientific disciplines. Existing methods for this task either require interventional data or knowledge of the ground-truth causal graph, or rely on assumptions such as unconfoundedness, restricting their applicability in real-world settings. In the domain of tabular machine learning, prior-data fitted networks (PFNs) have achieved state-of-the-art predictive performance, having been pre-trained on synthetic data to solve tabular prediction problems via in-context learning. To assess whether this success can be transferred to the harder problem of causal effect estimation, we pre-train PFNs on synthetic data drawn from a wide variety of causal structures, including interventions, to predict interventional outcomes given observational data. Through extensive experiments on synthetic case studies, we show that our approach, Do-PFN, allows for accurate estimation of causal effects without knowledge of the underlying causal graph. We also perform ablation studies that elucidate Do-PFN's scalability and robustness across datasets with a variety of causal characteristics.
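A single pre-training example might look like the following toy illustration. This is a minimal sketch that assumes a linear-Gaussian SCM with a hidden confounder U, an observed covariate X, a binary treatment T and an outcome Y. The names `sample_scm_task` and `SCMTask` are hypothetical. Do-PFN's actual prior draws from a much wider variety of causal structures. Only the overall shape of the task mirrors the description above: observational context as input, interventional outcomes as targets. In this additive model the CATE equals the treatment coefficient for every x.

```python
import random
from dataclasses import dataclass


@dataclass
class SCMTask:
    """One synthetic pre-training example (hypothetical structure)."""
    context: list[tuple[float, int, float]]  # observed (x, t, y) rows; U is hidden
    queries: list[float]                     # covariate values x to query
    y_do0: list[float]                       # outcome under do(T=0) per query
    y_do1: list[float]                       # outcome under do(T=1) per query
    true_effect: float                       # ground-truth T -> Y coefficient


def sample_scm_task(n_obs: int, n_query: int, seed: int = 0) -> SCMTask:
    """Sample a toy linear-Gaussian SCM with edges U->T, U->Y, X->T, X->Y, T->Y."""
    rng = random.Random(seed)
    a_xt, a_ut = rng.gauss(0, 1), rng.gauss(0, 1)  # weights into T
    b_xy, b_uy, b_ty = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)  # weights into Y

    def draw_exogenous():
        # Hidden confounder U, covariate X, treatment noise, outcome noise.
        return rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 0.5), rng.gauss(0, 0.1)

    def outcome(x, u, t, eps_y):
        return b_xy * x + b_uy * u + b_ty * t + eps_y

    context = []
    for _ in range(n_obs):
        u, x, eps_t, eps_y = draw_exogenous()
        t = 1 if a_xt * x + a_ut * u + eps_t > 0 else 0  # T is confounded by X and U
        context.append((x, t, outcome(x, u, t, eps_y)))

    queries, y_do0, y_do1 = [], [], []
    for _ in range(n_query):
        u, x, _eps_t, eps_y = draw_exogenous()
        # Under do(T=t), T ignores X and U; sharing (u, eps_y) gives paired outcomes.
        queries.append(x)
        y_do0.append(outcome(x, u, 0, eps_y))
        y_do1.append(outcome(x, u, 1, eps_y))

    return SCMTask(context, queries, y_do0, y_do1, b_ty)


if __name__ == "__main__":
    task = sample_scm_task(n_obs=2000, n_query=5, seed=1)
    treated = [y for _, t, y in task.context if t == 1]
    control = [y for _, t, y in task.context if t == 0]
    naive = sum(treated) / len(treated) - sum(control) / len(control)
    print(f"true effect {task.true_effect:.3f}, naive difference in means {naive:.3f}")
```

Because U is dropped from `context`, the naive difference in means printed above is generally biased. In this sketch, pre-training would repeat the sampling over many SCMs and train the network to map `context` and `queries` to `y_do0` and `y_do1`.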
