Differentiable Cyclic Causal Discovery Under Unmeasured Confounders

Muralikrishnna G. Sethuraman, Faramarz Fekri

TL;DR

We address learning causal structure in systems with cycles and unobserved confounders by introducing DCCD-CONF, a differentiable framework that models nonlinear cyclic SEMs with correlated exogenous noise via contractive implicit flows. The method learns graph structure and confounder distribution by maximizing penalized data likelihood across interventional settings, with a two-stage optimization and unbiased log-determinant estimation to enable scalable training. Theoretical guarantees show that the estimated graph is I-Markov equivalent to the ground truth under appropriate assumptions, and extensive experiments demonstrate improved causal-edge recovery and confounder identification on synthetic data, plus superior predictive performance on real gene-perturbation datasets (Perturb-CITE-seq) and protein signaling benchmarks. This work advances practical causal discovery in realistic settings by jointly handling cycles, nonlinearity, and hidden confounders, with demonstrated gains in both interpretability and predictive accuracy for complex biological networks.
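The model class can be made concrete with a minimal simulation sketch. The summary does not give a parameterization, so this one is assumed: $X = F(X) + E$ with $F(x) = \tanh(W^\top x)$, where W is rescaled to spectral norm 0.6 so that F is a contraction, and $E \sim \mathcal{N}(0, \Sigma)$, where the off-diagonal entries of $\Sigma$ stand in for the confounder strengths $\sigma_{ij}$. The names W, Sigma and sample_cyclic_sem are hypothetical and do not come from the paper's code. Because F is contractive, fixed-point iteration converges to the unique solution even when W contains cycles. A hard intervention clamps the targeted variables to fixed values, which also drops their noise terms and therefore their confounding.

```python
import numpy as np

def sample_cyclic_sem(W, Sigma, n, rng, targets=(), values=None,
                      tol=1e-10, max_iter=1000):
    """Sample n points from the (assumed) cyclic SEM X = tanh(W^T X) + E.

    W[j, i] is the weight of edge j -> i; ||W||_2 < 1 makes the map a contraction.
    Sigma is the noise covariance; nonzero off-diagonals model hidden confounders.
    targets/values describe a hard intervention do(X_targets = values).
    """
    d = W.shape[0]
    E = rng.multivariate_normal(np.zeros(d), Sigma, size=n)  # correlated exogenous noise
    mask = np.ones(d)
    mask[list(targets)] = 0.0          # intervened nodes ignore parents and their noise
    C = np.zeros(d)
    if values is not None:
        C[list(targets)] = values      # clamped values for intervened nodes
    X = np.zeros((n, d))
    for _ in range(max_iter):
        X_new = mask * (np.tanh(X @ W) + E) + C   # one fixed-point step
        if np.max(np.abs(X_new - X)) < tol:
            return X_new, E
        X = X_new
    return X, E

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
np.fill_diagonal(A, 0.0)                   # no self-loops; cycles are still allowed
W = 0.6 * A / np.linalg.norm(A, 2)         # contraction factor 0.6
Sigma = np.eye(4)
Sigma[0, 1] = Sigma[1, 0] = 0.5            # a hidden confounder between X1 and X2

X_obs, E_obs = sample_cyclic_sem(W, Sigma, 200, rng)
X_int, E_int = sample_cyclic_sem(W, Sigma, 200, rng, targets=[2], values=[1.5])
```

Each interventional setting $I_k$ then contributes its own dataset, simulated with the same W and Sigma but with a different target set.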

Abstract

Understanding causal relationships between variables is fundamental across scientific disciplines. Most causal discovery algorithms rely on two key assumptions: (i) all variables are observed, and (ii) the underlying causal graph is acyclic. While these assumptions simplify theoretical analysis, they are often violated in real-world systems, such as biological networks. Existing methods that account for confounders either assume linearity or struggle with scalability. To address these limitations, we propose DCCD-CONF, a novel framework for differentiable learning of nonlinear cyclic causal graphs in the presence of unmeasured confounders using interventional data. Our approach alternates between optimizing the graph structure and estimating the confounder distribution by maximizing the log-likelihood of the data. Through experiments on synthetic data and real-world gene perturbation datasets, we show that DCCD-CONF outperforms state-of-the-art methods in both causal graph recovery and confounder identification. Additionally, we provide consistency guarantees for our framework, reinforcing its theoretical soundness.
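For a contractive model $X = F(X) + E$, the log-likelihood being maximized has the change-of-variables form $\log p(x) = \log p_E(x - F(x)) + \log\lvert\det(I - J_F(x))\rvert$. The log-determinant is the expensive term. The sketch below shows one standard unbiased estimator for it. This estimator is an assumption here; the summary only says the log-determinant estimate is unbiased. The estimator works in three steps:

1. Expand $\log\det(I - J) = -\sum_{k\ge 1} \mathrm{tr}(J^k)/k$, which is valid when $\lVert J\rVert_2 < 1$.
2. Estimate each trace with a Rademacher (Hutchinson) probe.
3. Truncate the series at a random geometric depth n, reweighting term k by $1/P(n \ge k)$ (Russian roulette) so the estimate stays unbiased.

An explicit matrix J stands in for the Jacobian. With a neural F, the products J @ w would be Jacobian-vector products. logdet_rr_estimate is a hypothetical name.

```python
import numpy as np

def logdet_rr_estimate(J, rng, p_stop=0.5):
    """One unbiased sample of log det(I - J), assuming ||J||_2 < 1.

    Uses log det(I - J) = -sum_{k>=1} tr(J^k) / k with a Hutchinson trace
    probe and a Russian-roulette truncation of the series.
    """
    d = J.shape[0]
    v = rng.choice([-1.0, 1.0], size=d)      # Rademacher probe, E[v v^T] = I
    n = rng.geometric(p_stop)                # random series length, n >= 1
    w = v
    estimate = 0.0
    for k in range(1, n + 1):
        w = J @ w                            # w = J^k v
        prob_at_least_k = (1.0 - p_stop) ** (k - 1)   # P(n >= k)
        estimate -= (v @ w) / (k * prob_at_least_k)
    return estimate

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
J = 0.5 * A / np.linalg.norm(A, 2)           # contraction with spectral norm 0.5
exact = np.linalg.slogdet(np.eye(5) - J)[1]  # det(I - J) > 0 here, so this is log det
approx = np.mean([logdet_rr_estimate(J, rng) for _ in range(20000)])
```

A single sample is noisy but cheap: its expected cost is $1/p$ matrix-vector products. In training, one sample per data point per step is typically enough, because the gradient noise averages out over the minibatch.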

Paper Structure

This paper contains 60 sections, 9 theorems, 59 equations, 13 figures, 4 tables, 1 algorithm.

Key Result

Theorem 3.1

Let $\mathcal{I} = \{I_k\}_{k=1}^K$ be a family of interventional targets, let $\mathcal{G}^\ast$ denote the ground truth directed mixed graph, let $p^{(k)}$ denote the data generating distribution for $I_k$, and $\hat{\mathcal{G}} := \arg\max_\mathcal{G} \mathcal{S}(\mathcal{G})$. Then, under the A

Figures (13)

  • Figure 1: (a) Example of a directed mixed graph $\mathcal{G}$, where the bidirectional edges represent hidden confounders, with $\sigma_{ij}$ indicating their corresponding strengths; (b) Mutilated graph, $\mathrm{do}(I_k)(\mathcal{G})$, resulting from the interventional experiment $I_k = \{X_3\}$, where all incoming edges (including bidirectional edges) to $X_3$ are removed.
  • Figure 2: Performance of causal graph and confounder recovery under varying problem dimensions. In all cases, the number of observed variables is fixed at $d=10$. (Top row, left) The number of latent confounders ranges from 2 to 8; (top row, right) the number of cycles ranges from 0 to 8. (Bottom row, left) The degree of nonlinearity $\beta$ is varied between 0 and 1; (bottom row, right) the number of training interventions is varied between 0 and 10.
  • Figure 3: Performance comparison between DCCD-CONF and the baselines as $d$ is varied between 10 and 80.
  • Figure 4: (Left) Illustration of a directed mixed graph $\mathcal{G}$ that violates the directed global Markov property. (Right) The graph $\mathcal{G}$ after the acyclification process.
  • Figure 5: Illustration of the augmented graph $\mathcal{G}^\mathcal{I}$ corresponding to the set of interventional targets $\mathcal{I} = \{\emptyset, \{X_3\}, \{X_4\}\}$. $\mathrm{do}(\{X_3\})$ and $\mathrm{do}(\{X_4\})$ correspond to the graphs obtained after hard interventions on $X_3$ and $X_4$, respectively. The augmented graph here is the union of the graphs $\mathcal{G}$, $\mathrm{do}(\{X_3\})$ and $\mathrm{do}(\{X_4\})$, together with the context variables.
  • ...and 8 more figures
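The mutilation described in Figures 1 and 5 removes all incoming directed edges to an intervened node and all bidirected (confounder) edges incident to it. It can be sketched on plain edge sets. The edge-set representation, the function name mutilate and the example graph below are all illustrative assumptions; the graph is not the one drawn in the figures.

```python
def mutilate(directed, bidirected, targets):
    """Return the edge sets of do(I)(G) for a hard intervention on `targets`.

    directed:   set of (parent, child) pairs
    bidirected: set of frozensets {i, j}, one per hidden confounder
    """
    targets = set(targets)
    # Drop every directed edge pointing into an intervened node.
    new_directed = {(j, i) for (j, i) in directed if i not in targets}
    # Drop every bidirected edge with an endpoint in the intervened set.
    new_bidirected = {e for e in bidirected if not (e & targets)}
    return new_directed, new_bidirected

# Hypothetical 4-node cyclic DMG (not the graph shown in Figure 1).
directed = {("X1", "X2"), ("X2", "X3"), ("X3", "X1"), ("X3", "X4")}
bidirected = {frozenset(("X1", "X3")), frozenset(("X2", "X4"))}

# One mutilated graph per interventional setting in I = {∅, {X3}, {X4}}.
family = [set(), {"X3"}, {"X4"}]
mutilated = [mutilate(directed, bidirected, I_k) for I_k in family]
```

Outgoing edges of an intervened node survive. After do({X3}), for example, X3 → X1 and X3 → X4 remain, which is what lets interventions orient edges.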

Theorems & Definitions (22)

  • Theorem 3.1
  • Proof (Sketch)
  • Definition A.1: Collider
  • Definition A.2: $d$-separation
  • Example A.3
  • Definition A.4: Acyclification of a directed mixed graph
  • Definition A.5: $\sigma$-separation
  • Proposition A.6: forre2017markov
  • Definition A.7: General directed global Markov property forre2017markov
  • Definition A.8
  • ...and 12 more
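Definition A.4 (acyclification, illustrated in Figure 4) is not reproduced in this summary. The sketch below therefore assumes the standard construction attributed to forre2017markov. Let $\mathrm{sc}(i)$ denote the strongly connected component of $i$. The construction has two rules:

- Directed edges: $j \to i$ is added iff $j \notin \mathrm{sc}(i)$ and $j$ points into some node of $\mathrm{sc}(i)$.
- Bidirected edges: $i \leftrightarrow j$ is added iff $i$ and $j$ lie in the same component, or some node of $\mathrm{sc}(i)$ is confounded with some node of $\mathrm{sc}(j)$.

The function names and the example graph are hypothetical.

```python
from itertools import product

def strongly_connected_components(nodes, directed):
    """Map each node to its strongly connected component (as a frozenset)."""
    reach = {v: {v} for v in nodes}         # reach[v]: nodes reachable from v
    changed = True
    while changed:                           # naive transitive closure; fine for small graphs
        changed = False
        for (a, b) in directed:
            for v in nodes:
                if a in reach[v] and not reach[b] <= reach[v]:
                    reach[v] |= reach[b]
                    changed = True
    return {v: frozenset(u for u in nodes if u in reach[v] and v in reach[u])
            for v in nodes}

def acyclify(nodes, directed, bidirected):
    """Assumed acyclification: returns (directed, bidirected) edge sets."""
    sc = strongly_connected_components(nodes, directed)
    new_directed = set()
    for (j, k) in directed:                  # j points into k, hence into sc(k)
        for i in sc[k]:
            if j not in sc[i]:
                new_directed.add((j, i))
    new_bidirected = set()
    for i, j in product(nodes, nodes):       # nodes sharing a cycle become confounded
        if i != j and j in sc[i]:
            new_bidirected.add(frozenset((i, j)))
    for e in bidirected:                     # confounding spreads to whole components
        a, b = tuple(e)
        for i, j in product(sc[a], sc[b]):
            if i != j:
                new_bidirected.add(frozenset((i, j)))
    return new_directed, new_bidirected

# Hypothetical graph: cycle 1 -> 2 -> 3 -> 1, with 4 -> 1 and 4 <-> 1.
nodes = [1, 2, 3, 4]
acy_dir, acy_bi = acyclify(nodes, {(1, 2), (2, 3), (3, 1), (4, 1)},
                           {frozenset((4, 1))})
```

The result is acyclic: the cycle collapses into mutual confounding among X1, X2 and X3, and the edge from X4 is propagated to every node in that component.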