Learning Cyclic Causal Models from Incomplete Data

Muralikrishnna G. Sethuraman; Faramarz Fekri

TL;DR

The paper tackles learning cyclic causal graphs from incomplete data, a setting that combines feedback loops with values missing completely at random (MCAR). It introduces MissNODAGS, an EM-based framework that alternates between imputing missing values and maximizing a sparsity-penalized log-likelihood, using contractive residual flows to model the data distribution under both linear and nonlinear structural equation models (SEMs). The authors provide a consistency guarantee for the linear Gaussian case, up to interventional quasi-equivalence, and report improved performance over baselines that apply state-of-the-art imputation followed by causal learning, on both synthetic data and real Perturb-CITE-seq data. The framework offers a scalable, likelihood-based approach to causal discovery in cyclic systems with missing data; it relies on interventional data to handle cycles.
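To make the setting concrete, the following is a minimal sketch of how data from a linear cyclic SEM with MCAR missingness could be simulated. It is an illustration, not the authors' code: the convention $x = B^\top x + e$, the unit-variance Gaussian noise, the helper names `sample_cyclic_linear_sem` and `mcar_mask`, and the example matrix `B` are all assumptions introduced here. Contractivity is checked through the spectral norm of `B`, which guarantees that $I - B$ is invertible.

```python
import numpy as np


def sample_cyclic_linear_sem(B, n, rng):
    """Draw n samples from x = B^T x + e with standard Gaussian noise e.

    B[i, j] != 0 is read as an edge i -> j (an assumed convention).
    Returns the samples X (n x d) and the noise E that generated them.
    """
    d = B.shape[0]
    # Spectral norm < 1 makes the linear map contractive, so I - B is invertible.
    if np.linalg.norm(B, 2) >= 1.0:
        raise ValueError("B must have spectral norm below 1 (contractive).")
    E = rng.normal(size=(n, d))
    # Row form of the fixed point: x^T = x^T B + e^T  =>  X = E (I - B)^{-1}.
    X = E @ np.linalg.inv(np.eye(d) - B)
    return X, E


def mcar_mask(X, p_missing, rng):
    """Hide each entry independently with probability p_missing (MCAR).

    Returns a copy of X with NaN at hidden entries and a boolean mask
    that is True where the entry is observed.
    """
    observed = rng.random(X.shape) >= p_missing
    return np.where(observed, X, np.nan), observed


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 3-node graph with a feedback loop 0 <-> 1 and an edge 1 -> 2.
    B = np.array([[0.0, 0.5, 0.0],
                  [-0.4, 0.0, 0.3],
                  [0.0, 0.0, 0.0]])
    X, _ = sample_cyclic_linear_sem(B, n=1000, rng=rng)
    X_obs, observed = mcar_mask(X, p_missing=0.2, rng=rng)
    print("fraction missing:", np.isnan(X_obs).mean())
```

Because the mask is drawn independently of `X`, the missingness carries no information about the values themselves, which is what distinguishes MCAR from other missingness mechanisms.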

Abstract

Causal learning is a fundamental problem in statistics and science, offering insights into predicting the effects of unseen treatments on a system. Despite recent advances in this topic, most existing causal discovery algorithms operate under two key assumptions: (i) the underlying graph is acyclic, and (ii) the available data is complete. These assumptions can be problematic as many real-world systems contain feedback loops (e.g., biological systems), and practical scenarios frequently involve missing data. In this work, we propose a novel framework, named MissNODAGS, for learning cyclic causal graphs from partially missing data. Under the additive noise model, MissNODAGS learns the causal graph by alternating between imputing the missing data and maximizing the expected log-likelihood of the visible part of the data in each training step, following the principles of the expectation-maximization (EM) framework. Through synthetic experiments and real-world single-cell perturbation data, we demonstrate improved performance when compared to using state-of-the-art imputation techniques followed by causal learning on partially missing interventional data.
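The imputation half of the alternation can be illustrated for the linear Gaussian case. The sketch below is an assumption-laden illustration rather than the paper's procedure: it assumes zero-mean data generated by $x = B^\top x + e$ with isotropic Gaussian noise, and it fills each missing entry with its conditional mean given the observed entries under the current estimate of `B`. The function names `model_covariance` and `impute_conditional_mean` are hypothetical, and the M-step (updating `B` by maximizing the penalized log-likelihood) is deliberately not shown.

```python
import numpy as np


def model_covariance(B, noise_var=1.0):
    """Covariance of x implied by x = B^T x + e with Cov(e) = noise_var * I."""
    d = B.shape[0]
    A_inv = np.linalg.inv(np.eye(d) - B)  # row form: X = E @ A_inv
    return noise_var * A_inv.T @ A_inv


def impute_conditional_mean(X_obs, B, noise_var=1.0):
    """Replace NaN entries by E[x_missing | x_observed] under the current B.

    Assumes zero-mean Gaussian data; observed entries are left untouched.
    """
    Sigma = model_covariance(B, noise_var)
    X_imp = X_obs.copy()
    for i in range(X_obs.shape[0]):
        miss = np.isnan(X_obs[i])
        obs = ~miss
        if not miss.any():
            continue
        if not obs.any():
            # Nothing observed: fall back to the (zero) marginal mean.
            X_imp[i, miss] = 0.0
            continue
        S_oo = Sigma[np.ix_(obs, obs)]
        S_mo = Sigma[np.ix_(miss, obs)]
        # Gaussian conditioning: mu_m|o = S_mo S_oo^{-1} x_o.
        X_imp[i, miss] = S_mo @ np.linalg.solve(S_oo, X_obs[i, obs])
    return X_imp


if __name__ == "__main__":
    # Hypothetical current estimate: a single edge 0 -> 1 with weight 0.5.
    B = np.array([[0.0, 0.5],
                  [0.0, 0.0]])
    X_obs = np.array([[2.0, np.nan],
                      [np.nan, np.nan]])
    print(impute_conditional_mean(X_obs, B))
```

In a full EM-style loop, an imputation step of this kind would be followed by a parameter update on the completed data, and the two steps would repeat until convergence. Filling in only the conditional mean is a simplification; an exact E-step for Gaussian models would also account for the conditional covariance of the missing entries.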

Paper Structure (31 sections, 1 theorem, 37 equations, 5 figures, 2 tables, 1 algorithm)

INTRODUCTION
RELATED WORK
Causal Discovery from Complete Data.
Causal Discovery from Incomplete Data.
PROBLEM SETUP
Modeling Causal Graphs via Structural Equations
Modeling Missing Data
MISSNODAGS
Objective
The Overall Framework
E-step:
M-step:
Computing the Log-Likelihood in E-step
Modeling the causal function.
Computing the log-det of Jacobian.
...and 16 more sections

Key Result

Theorem 1

Under the assumptions stated in Appendix A, the global minimizer of the objective (eq:obj), with a suitable choice of $\lambda$, outputs $\tilde{G} \cong_\mathcal{I} G$ for the linear Gaussian SEM.

Figures (5)

Figure 1: Evolution of the parameters of the estimated adjacency matrix. $B^\ast$ denotes the true adjacency matrix and $B^{(n)}$ denotes the parameters of the estimated adjacency matrix after the $n$-th training iteration.
Figure 2: Results on recovery of linear Gaussian SEM with $d=20$ nodes.
Figure 3: Results on recovery of nonlinear Gaussian SEM with $d=20$ nodes.
Figure 4: Results on non-contractive data.
Figure 5: Results on the Perturb-CITE-seq data set (frangieh2021multimodal).

Theorems & Definitions (4)

Theorem 1
Definition 2: Interventional quasi-equivalence (pmlr-v206-sethuraman23a)
Definition 3: Generalized Faithfulness (ghassami2020characterizing)
proof
