Federated Causal Discovery From Interventions
Amin Abyaneh, Nino Scherrer, Patrick Schwab, Stefan Bauer, Bernhard Schölkopf, Arash Mehrjou
TL;DR
The paper tackles causal discovery under privacy constraints by enabling learning of a global DAG $G$ from distributed data that include interventional samples. It introduces FedCDI, a two-phase framework where each client learns a local belief over edges using a neural LCDM and the server aggregates these beliefs with a novel proximity-based method that accounts for which covariates were intervened. Empirical results on synthetic ER graphs and real-world bnlearn graphs show FedCDI achieving performance on par with centralized approaches and outperforming prior federated methods, especially under interventional data heterogeneity. The work demonstrates scalability to multiple clients, supports both horizontal and vertical data splits, and provides code for reproducibility, highlighting its practical impact for privacy-preserving causal structure learning in distributed environments.
Abstract
Causal discovery serves a pivotal role in mitigating model uncertainty through recovering the underlying causal mechanisms among variables. In many practical domains, such as healthcare, access to the data gathered by individual entities is limited, primarily because of privacy and regulatory constraints. However, the majority of existing causal discovery methods require the data to be available in a centralized location. In response, researchers have introduced federated causal discovery. While previous federated methods consider distributed observational data, the integration of interventional data remains largely unexplored. We propose FedCDI, a federated framework for inferring causal structures from distributed data containing interventional samples. In line with the federated learning framework, FedCDI improves privacy by exchanging belief updates rather than raw samples. Additionally, it introduces a novel intervention-aware method for aggregating individual updates. We analyze scenarios with shared or disjoint intervened covariates, and mitigate the adverse effects of interventional data heterogeneity. The performance and scalability of FedCDI are rigorously tested across a variety of synthetic and real-world graphs.
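To make the idea of exchanging edge beliefs and aggregating them in an intervention-aware way more concrete, the following is a minimal sketch in Python. It is not FedCDI's proximity-based aggregation rule, which is not specified in this summary. It is a simplified stand-in that assumes each client sends a matrix of edge beliefs and that a client's belief about an edge deserves more weight when the client intervened on one of that edge's endpoints. The function name `aggregate_beliefs`, the `boost` parameter, and the weighting scheme are all hypothetical.

```python
import numpy as np


def aggregate_beliefs(beliefs, intervened, boost=2.0):
    """Intervention-weighted average of per-client edge beliefs.

    Hypothetical stand-in for an intervention-aware aggregation step;
    the weighting scheme is an assumption, not the paper's method.

    beliefs:    list of (d, d) arrays; beliefs[k][i, j] is client k's
                belief that the edge i -> j exists.
    intervened: list of sets; intervened[k] holds the indices of the
                variables client k has interventional samples for.
    boost:      assumed weight for edges that touch an intervened
                variable (all other edges get weight 1).
    """
    d = beliefs[0].shape[0]
    weighted_sum = np.zeros((d, d))
    weight_total = np.zeros((d, d))

    for belief, targets in zip(beliefs, intervened):
        weights = np.ones((d, d))
        for v in targets:
            weights[v, :] = boost  # edges leaving the intervened variable
            weights[:, v] = boost  # edges entering the intervened variable
        weighted_sum += weights * belief
        weight_total += weights

    global_belief = weighted_sum / weight_total
    np.fill_diagonal(global_belief, 0.0)  # a DAG has no self-loops
    return global_belief


# Two clients over three variables; only client 0 intervened on variable 0.
client_0 = np.full((3, 3), 0.9)
client_1 = np.full((3, 3), 0.1)
G = aggregate_beliefs([client_0, client_1], [{0}, set()], boost=3.0)
```

In this toy run, client 0's belief dominates for edges touching variable 0, while edges that neither client intervened on are averaged with equal weight. Only the belief matrices and the sets of intervened variables leave the clients, never raw samples.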
