Deriving Causal Order from Single-Variable Interventions: Guarantees & Algorithm
Mathieu Chevalley, Patrick Schwab, Arash Mehrjou
TL;DR
This work tackles the problem of extracting causal order from datasets with many single-variable interventions. It introduces the $\epsilon$-interventional faithfulness assumption and a score-based framework that leads to Intersort, an algorithm consisting of an initialization step and a local search to maximize a causal-order score. The authors prove theoretical guarantees on the optimal score and provide finite-sample bounds for the top-order error, while empirically demonstrating that Intersort outperforms established baselines across diverse data-generating processes and remains robust to normalization. The results suggest that rich causal information is recoverable from interventional data under realistic assumptions, with potential practical impact on experimental design and causal inference in biology and related fields.
Abstract
Targeted and uniform interventions to a system are crucial for unveiling causal relationships. While several methods have been developed to leverage interventional data for causal structure learning, their practical application in real-world scenarios often remains challenging. Recent benchmark studies have highlighted these difficulties, even when large numbers of single-variable intervention samples are available. In this work, we demonstrate, both theoretically and empirically, that such datasets contain a wealth of causal information that can be effectively extracted under realistic assumptions about the data distribution. More specifically, we introduce a novel variant of interventional faithfulness, which relies on comparisons between the marginal distributions of each variable across observational and interventional settings, and we introduce a score on causal orders. Under this assumption, we are able to prove strong theoretical guarantees on the optimum of our score that also hold for large-scale settings. To empirically verify our theory, we introduce Intersort, an algorithm designed to infer the causal order from datasets containing large numbers of single-variable interventions by approximately optimizing our score. Intersort outperforms baselines (GIES, DCDI, PC and EASE) on almost all simulated data settings replicating common benchmarks in the field. Our proposed novel approach to modeling interventional datasets thus offers a promising avenue for advancing causal inference, highlighting significant potential for further enhancements under realistic assumptions.
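To make the idea of scoring causal orders from marginal comparisons concrete, the following is a minimal sketch, not the authors' implementation. It assumes the Wasserstein distance as the statistical distance between a variable's observational and interventional marginals, a simplified score that adds `D[i, j] - eps` for every intervened variable `i` placed before `j`, and a plain insertion-move local search started from an arbitrary order. The function names (`simulate_chain`, `distance_matrix`, `order_score`, `local_search`), the toy chain X0 -> X1 -> X2, and the threshold `eps` are all hypothetical choices made for illustration.

```python
import numpy as np
from scipy.stats import wasserstein_distance


def simulate_chain(rng, n, do=None, value=5.0):
    """Toy linear chain X0 -> X1 -> X2 (assumed for illustration only).

    If `do` is an index, that variable is clamped to `value`.
    """
    x = np.zeros((n, 3))
    for j in range(3):
        if j == do:
            x[:, j] = value
        else:
            parent = 2.0 * x[:, j - 1] if j > 0 else 0.0
            x[:, j] = parent + rng.normal(size=n)
    return x


def distance_matrix(obs, interventions):
    """D[i, j]: distance between the marginal of X_j in the observational
    data and its marginal under an intervention on X_i."""
    d = obs.shape[1]
    D = np.zeros((d, d))
    for i, samples in interventions.items():
        for j in range(d):
            if j != i:
                D[i, j] = wasserstein_distance(obs[:, j], samples[:, j])
    return D


def order_score(order, D, eps, intervened):
    """Simplified score: reward placing i before j when intervening on i
    shifts the marginal of j by more than eps, penalize otherwise."""
    score = 0.0
    for pos, i in enumerate(order):
        if i not in intervened:
            continue
        for j in order[pos + 1:]:
            score += D[i, j] - eps
    return score


def local_search(order, D, eps, intervened):
    """Greedy hill climbing: move one variable to another position and
    keep the move whenever it strictly improves the score."""
    order = list(order)
    best = order_score(order, D, eps, intervened)
    improved = True
    while improved:
        improved = False
        for src in range(len(order)):
            for dst in range(len(order)):
                if src == dst:
                    continue
                cand = order.copy()
                cand.insert(dst, cand.pop(src))
                s = order_score(cand, D, eps, intervened)
                if s > best + 1e-12:
                    order, best, improved = cand, s, True
    return order, best
```

On the toy chain, one would build `D` from one observational sample and one interventional sample per variable, then call `local_search` from any starting permutation. The threshold `eps` plays the role of separating genuine downstream shifts from sampling noise; its value here is an assumption and would need to be chosen with the data scale in mind.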
