Learning Neural Causal Models from Unknown Interventions

Nan Rosemary Ke, Olexa Bilaniuk, Anirudh Goyal, Stefan Bauer, Hugo Larochelle, Bernhard Schölkopf, Michael C. Mozer, Chris Pal, Yoshua Bengio

TL;DR

This work addresses the identifiability gap in causal structure learning with a neural, continuous-optimization framework (SDI) that combines observational and interventional data, even when the intervention targets are unknown. SDI operates in three phases: fitting conditional mechanisms on observational data, scoring candidate graphs against interventional data with a target-prediction heuristic, and updating the structure beliefs via a REINFORCE-like gradient with an acyclicity regularizer. Across synthetic and real-world (BnLearn) datasets, SDI recovers the true graphs, generalizes to unseen interventions, and handles partially known graphs, outperforming several baselines. The approach supports practical causal discovery in settings where interventions are sparse, uncertain, or partially observed, as is common in biology and related sciences.

Abstract

Promising results have driven a recent surge of interest in continuous optimization methods for Bayesian network structure learning from observational data. However, there are theoretical limitations on the identifiability of underlying structures obtained from observational data alone. Interventional data provides much richer information about the underlying data-generating process. However, the extension and application of methods designed for observational data to include interventions is not straightforward and remains an open problem. In this paper we provide a general framework based on continuous optimization and neural networks to create models for the combination of observational and interventional data. The proposed method is applicable even in the challenging and realistic case where the identity of the intervened-upon variable is unknown. We examine the proposed method in the setting of graph recovery both de novo and from a partially known edge set. We establish strong benchmark results on several structure learning tasks, including structure recovery of both synthetic graphs and standard graphs from the Bayesian Network Repository.

Paper Structure

This paper contains 57 sections, 3 equations, 17 figures, 8 tables.

Figures (17)

  • Figure 1: In many areas of science, such as biology, we try to infer the underlying mechanisms and structure through experiments. We can obtain observational data plus interventional data through known (e.g. by targeting a certain variable) or unknown interventions (e.g. when it is unclear where the effect of the intervention will be). Knowledge of existing edges e.g. through previous experiments can likewise be included and be considered a special case of causal induction.
  • Figure 2: Workflow for our proposed method SDI. Phase 1 samples graphs under the model's current belief about the edge structure and fits parameters to observational data. Phase 2 scores a small set of graphs against interventional data and assigns rewards according to graphs' ability to predict interventions. Phase 3 uses the rewards from Phase 2 to update the beliefs about the edge structure. If the believed edge probabilities have all saturated near 0 or 1, the method has converged.
  • Figure 3: MLP Model Architecture for $M=3$, $N=2$ (fork3) SCM. The model computes the conditional probabilities of $A, B, C$ given their parents using a stack of three independent MLPs. The MLP input layer uses an adjacency matrix sampled from $\mathrm{Ber}(\sigma(\gamma))$ as an input mask to force the model to make use only of parent nodes to predict their child node.
  • Figure 4: Cross entropy (CE) and Area-Under-Curve (AUC/AUROC) for edge probabilities of learned graph against ground-truth for synthetic SCMs. Error bars represent $\pm 1\sigma$ over PRNG seeds 1-5. Left to right: chainM, jungleM, fullM, for $M=3\ldots13$. All graphs (3-13 variables) are learned perfectly, with AUROC reaching 1.0. However, denser graphs (fullM) take longer to converge.
  • Figure 5: Every possible 3-variable connected DAG.
  • ...and 12 more figures
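
The captions of Figures 2 and 3 mention three mechanical pieces: sampling an adjacency matrix from $\mathrm{Ber}(\sigma(\gamma))$, using it as an input mask so each MLP sees only parent nodes, and stopping once edge probabilities saturate near 0 or 1. The sketch below illustrates those pieces in plain Python under several assumptions that are not taken from the paper: the function names, the convention that `gamma[i][j]` is the logit for edge j -> i, the one-hot input encoding, the tolerance `tol`, and the example logits are all hypothetical. It is not the authors' implementation.

```python
import math
import random


def sigmoid(x):
    """Logistic function mapping a logit to an edge probability."""
    return 1.0 / (1.0 + math.exp(-x))


def sample_adjacency(gamma, rng):
    """Sample a 0/1 adjacency matrix with entries ~ Ber(sigmoid(gamma)).

    Assumed convention: gamma[i][j] is the logit for edge j -> i.
    Self-loops are never sampled.
    """
    m = len(gamma)
    return [
        [1 if i != j and rng.random() < sigmoid(gamma[i][j]) else 0
         for j in range(m)]
        for i in range(m)
    ]


def masked_one_hot(values, parent_row, n_categories):
    """One-hot encode every variable, then zero out the non-parents.

    parent_row is one row of the sampled adjacency matrix, so the
    result is what the MLP for that child node would receive as input.
    """
    out = []
    for value, is_parent in zip(values, parent_row):
        one_hot = [1 if k == value else 0 for k in range(n_categories)]
        out.extend(bit * is_parent for bit in one_hot)
    return out


def has_converged(gamma, tol=0.01):
    """True when every off-diagonal edge probability is within tol of 0 or 1."""
    probs = [
        sigmoid(g)
        for i, row in enumerate(gamma)
        for j, g in enumerate(row)
        if i != j
    ]
    return all(p < tol or p > 1.0 - tol for p in probs)


# Hypothetical saturated beliefs for M=3 variables (A, B, C) in which
# A is believed to be the only parent of both B and C.
gamma = [
    [0.0, -50.0, -50.0],
    [50.0, 0.0, -50.0],
    [50.0, -50.0, 0.0],
]
adjacency = sample_adjacency(gamma, random.Random(0))
# Input for the MLP predicting B, with N=2 categories and A=1, B=0, C=1.
mlp_input_for_b = masked_one_hot([1, 0, 1], adjacency[1], 2)
```

With these saturated logits the sampled mask is effectively deterministic, so only the one-hot slot for A survives in `mlp_input_for_b`; with logits near zero the mask would vary from sample to sample and `has_converged` would return `False`.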