Sample Efficient Bayesian Learning of Causal Graphs from Interventions

Zihan Zhou, Muhammad Qasim Elahi, Murat Kocaoglu

TL;DR

This paper takes a Bayesian approach to learning causal graphs from a limited number of interventional samples, mirroring real-world scenarios where such samples are costly to obtain. The authors show theoretically that, when the number of interventional samples is large enough, the proposed algorithm returns the true causal graph with high probability.

Abstract

Causal discovery is a fundamental problem with applications spanning various areas in science and engineering. It is well understood that solely using observational data, one can only orient the causal graph up to its Markov equivalence class, necessitating interventional data to learn the complete causal graph. Most works in the literature design causal discovery policies with perfect interventions, i.e., they have access to infinite interventional samples. This study considers a Bayesian approach for learning causal graphs with limited interventional samples, mirroring real-world scenarios where such samples are usually costly to obtain. By leveraging the recent result of Wienöbst et al. (2023) on uniform DAG sampling in polynomial time, we can efficiently enumerate all the cut configurations and their corresponding interventional distributions of a target set, and further track their posteriors. Given any number of interventional samples, our proposed algorithm randomly intervenes on a set of target vertices that cut all the edges in the graph and returns a causal graph according to the posterior of each target set. When the number of interventional samples is large enough, we show theoretically that our proposed algorithm will return the true causal graph with high probability. We compare our algorithm against various baseline methods on simulated datasets, demonstrating its superior accuracy measured by the structural Hamming distance between the learned DAG and the ground truth. Additionally, we present a case study showing how this algorithm could be modified to answer more general causal questions without learning the whole graph. As an example, we illustrate that our method can be used to estimate the causal effect of a variable that cannot be intervened on.
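To make the idea of tracking posteriors over cut configurations concrete, here is a minimal sketch for the smallest possible case: two binary variables X and Y joined by one edge of unknown orientation. It is not the paper's algorithm. The function name, the uniform prior, and the assumption that the observational quantities P(Y=1 | X=x) and P(Y=1) are known exactly are all illustrative choices made here. The only fact used is that under do(X=x), Y follows P(Y | X=x) if X -> Y, and keeps its marginal P(Y) if Y -> X.

```python
import math


def posterior_over_orientations(samples, p_y1_given_x, p_y1, x_value,
                                prior_forward=0.5):
    """Posterior over {X->Y, Y->X} from samples of Y drawn under do(X=x_value).

    Hypothetical helper for illustration only.
    samples       : list of 0/1 values of Y observed under the intervention
    p_y1_given_x  : dict x -> P(Y=1 | X=x), assumed known and strictly in (0, 1)
    p_y1          : marginal P(Y=1), assumed known and strictly in (0, 1)
    """
    log_fwd = math.log(prior_forward)
    log_bwd = math.log(1.0 - prior_forward)
    for y in samples:
        # If X -> Y, the intervention leaves the mechanism P(Y | X) intact.
        p_fwd = p_y1_given_x[x_value] if y == 1 else 1.0 - p_y1_given_x[x_value]
        # If Y -> X, the intervention cuts the edge, so Y keeps its marginal.
        p_bwd = p_y1 if y == 1 else 1.0 - p_y1
        log_fwd += math.log(p_fwd)
        log_bwd += math.log(p_bwd)
    # Normalize in log space to avoid underflow with many samples.
    shift = max(log_fwd, log_bwd)
    w_fwd = math.exp(log_fwd - shift)
    w_bwd = math.exp(log_bwd - shift)
    total = w_fwd + w_bwd
    return {"X->Y": w_fwd / total, "Y->X": w_bwd / total}


# Toy observational model (assumed): P(Y=1|X=0)=0.2, P(Y=1|X=1)=0.8, P(X=1)=0.5.
p_y1_given_x = {0: 0.2, 1: 0.8}
p_y1 = 0.5
# Ten hypothetical samples of Y collected under do(X=1).
samples = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
print(posterior_over_orientations(samples, p_y1_given_x, p_y1, x_value=1))
```

With no samples the function returns the prior, and as samples accumulate the posterior concentrates on whichever orientation better explains the interventional data, which is the behavior the high-probability guarantee in the abstract formalizes for general graphs.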

Paper Structure

This paper contains 23 sections, 6 theorems, 46 equations, 6 figures, 3 algorithms.

Key Result

Lemma 5.1

(elahi2024adaptive) Assume that the faithfulness assumption holds and $\mathcal{D}^*$ is the true DAG. For any DAG $\mathcal{D}_1 \neq \mathcal{D}^*$, if $P_{\mathbf{s}}^{\mathcal{D}_1} = P_{\mathbf{s}}^{\mathcal{D}^*}$ for some $\mathbf{S} \subseteq \mathbf{V}$, they must share the same cutting edge

Figures (6)

  • Figure 1: Average KL divergence and TVD between estimated causal effect and ground truth vs number of interventional samples for random causal graphs.
  • Figure 2: SHD vs number of interventional samples for random complete graphs
  • Figure 3: SHD vs number of interventional samples for random sparse chordal graphs
  • Figure 4: SHD vs number of interventional samples for large random Erdős-Rényi chordal graphs
  • Figure 5: SHD vs number of interventional samples for scale-free graphs generated from the Barabási-Albert (BA) model. We generate 50 random DAGs under two settings and plot the average SHD and standard deviation.
  • ...and 1 more figure
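
Most of the figures listed above report the structural Hamming distance (SHD) between the learned DAG and the ground truth. The sketch below shows one common way to compute it, assuming each DAG is given as a set of (parent, child) pairs and that a reversed edge counts as a single error. SHD conventions differ on that last point, and the listing does not say which one the paper uses, so treat this choice and the function name as assumptions.

```python
def structural_hamming_distance(edges_a, edges_b):
    """Number of vertex pairs whose edge status differs between two DAGs.

    edges_a, edges_b : iterables of (parent, child) tuples, no self-loops.
    Assumed convention: a missing, extra, or reversed edge each count as 1.
    """
    a = set(edges_a)
    b = set(edges_b)
    # Every unordered pair that carries an edge in at least one graph.
    pairs = {frozenset(edge) for edge in a | b}
    distance = 0
    for pair in pairs:
        u, v = tuple(pair)
        status_a = ((u, v) in a, (v, u) in a)
        status_b = ((u, v) in b, (v, u) in b)
        if status_a != status_b:
            distance += 1
    return distance


# Hypothetical three-node example: one reversed edge and one missing edge.
true_dag = {(0, 1), (1, 2), (0, 2)}
learned_dag = {(1, 0), (1, 2)}
print(structural_hamming_distance(true_dag, learned_dag))
```

An SHD of zero means the learned graph matches the ground truth exactly, so lower curves in the figures indicate better recovery.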

Theorems & Definitions (10)

  • Definition 3.1: Faithfulness zhang2012strong
  • Definition 4.1: $(n, k)$-Separating System katona1966separating, wegener1979separating
  • Lemma 5.1
  • Definition 5.2
  • Lemma 5.4
  • Theorem 5.5
  • Corollary 5.6
  • Lemma D.1: shanmugam2015learning
  • Lemma D.2: shanmugam2015learning
  • Definition D.3: $G$-Separating System, Definition 3 of kocaoglu2017cost
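
The $(n, k)$-separating system named in Definition 4.1 is not spelled out in this listing. As it is commonly stated in the separating-system literature, it is a family of subsets of $\{0, \dots, n-1\}$, each of size at most $k$, such that every pair of distinct elements is split by some subset (one element inside, the other outside). The sketch below checks that property and builds the standard binary-label family; both function names are hypothetical, and the paper's exact wording and construction may differ.

```python
from itertools import combinations


def is_separating_system(n, k, subsets):
    """Check the (n, k)-separating property for subsets of range(n)."""
    sets = [set(s) for s in subsets]
    # Size constraint: no subset may contain more than k elements.
    if any(len(s) > k for s in sets):
        return False
    # Separation: every pair must be split by at least one subset.
    for i, j in combinations(range(n), 2):
        if not any((i in s) != (j in s) for s in sets):
            return False
    return True


def binary_separating_system(n):
    """Family whose b-th subset holds the elements with bit b set.

    Distinct elements differ in at least one bit, so that bit's subset
    separates them. Subset sizes are not bounded by a chosen k here.
    """
    num_bits = (n - 1).bit_length()
    return [{v for v in range(n) if (v >> b) & 1} for b in range(num_bits)]


family = binary_separating_system(4)
print(family)
print(is_separating_system(4, 2, family))
```

Intervening on each subset of such a family in turn ensures that every pair of vertices, and hence every edge, has exactly one endpoint intervened on in at least one experiment, which matches the abstract's description of target sets that cut all the edges in the graph.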