Use What You Know: Causal Foundation Models with Partial Graphs
Arik Reuter, Anish Dhir, Cristiana Diaconu, Jake Robertson, Ole Ossen, Frank Hutter, Adrian Weller, Mark van der Wilk, Bernhard Schölkopf
TL;DR
This work tackles the challenge of incorporating domain knowledge into Causal Foundation Models (CFMs) to improve causal effect estimation. It proposes conditioning CFMs on partial ancestral information via partially known ancestor matrices (PAMs) and presents architectures that combine attention biasing with graph convolution to leverage this structure. Empirical results show that partial graph conditioning significantly improves predictions, with soft attention biasing often outperforming other strategies and enabling a single CFM to approach the performance of specialised, graph-specific models. The findings advance the goal of all-in-one CFMs that answer causal queries in a data-driven, knowledge-informed manner, with practical benefits on complex and semi-synthetic benchmarks.
Abstract
Estimating causal quantities traditionally relies on bespoke estimators tailored to specific assumptions. Recently proposed Causal Foundation Models (CFMs) promise a more unified approach by amortising causal discovery and inference in a single step. However, in their current state, they do not allow for the incorporation of any domain knowledge, which can lead to suboptimal predictions. We bridge this gap by introducing methods to condition CFMs on causal information, such as the causal graph or more readily available ancestral information. When access to complete causal graph information is too strict a requirement, our approach also effectively leverages partial causal information. We systematically evaluate conditioning strategies and find that injecting learnable biases into the attention mechanism is the most effective method to utilise full and partial causal information. Our experiments show that this conditioning allows a general-purpose CFM to match the performance of specialised models trained on specific causal structures. Overall, our approach addresses a central hurdle on the path towards all-in-one causal foundation models: the capability to answer causal queries in a data-driven manner while effectively leveraging any amount of domain expertise.
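To make the idea of injecting biases into attention concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a hypothetical encoding of a partially known ancestor matrix (1 = known ancestor, 0 = known non-ancestor, -1 = unknown) and a hypothetical `bias_table` holding one scalar per state; in a trained model these scalars would be learnable parameters, whereas here they are fixed by hand for illustration.

```python
import numpy as np

# Hypothetical encoding of a partially known ancestor matrix (PAM), assumed for
# this sketch only: entry [i, j] describes what is known about "j is an ancestor of i".
ANCESTOR, NON_ANCESTOR, UNKNOWN = 1, 0, -1


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def biased_attention(q, k, v, pam, bias_table):
    """Single-head attention over variables with an additive ("soft") bias.

    q, k, v    : arrays of shape (n_vars, d)
    pam        : integer array of shape (n_vars, n_vars) using the encoding above
    bias_table : dict mapping each PAM state to a scalar bias
                 (stand-in for learnable parameters)
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)
    # Look up one bias per (i, j) pair from the PAM state of that pair.
    bias = np.zeros_like(logits)
    for state, value in bias_table.items():
        bias[pam == state] = value
    # Finite biases shift attention without hard-masking any pair.
    weights = softmax(logits + bias, axis=-1)
    return weights @ v, weights


rng = np.random.default_rng(0)
n_vars, d = 3, 4
q = rng.normal(size=(n_vars, d))
k = rng.normal(size=(n_vars, d))
v = rng.normal(size=(n_vars, d))

# Toy PAM: variable 0 is a known ancestor of variable 1, variable 1 of variable 2;
# some pairs are known non-ancestors, the rest (including the diagonal) are unknown.
pam = np.array([
    [UNKNOWN, NON_ANCESTOR, NON_ANCESTOR],
    [ANCESTOR, UNKNOWN, NON_ANCESTOR],
    [UNKNOWN, ANCESTOR, UNKNOWN],
])

# Hand-picked values; an unknown pair leaves the logit untouched.
bias_table = {ANCESTOR: 2.0, NON_ANCESTOR: -2.0, UNKNOWN: 0.0}

out, weights = biased_attention(q, k, v, pam, bias_table)
print(weights.round(3))
```

With every entry marked unknown and a zero bias for that state, the sketch reduces to ordinary attention, which is one way a single model could handle anything from no graph knowledge to a fully specified ancestor matrix.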
