Learning Large Causal Structures from Inverse Covariance Matrix via Sparse Matrix Decomposition

Shuyu Dong; Kento Uemura; Akito Fujii; Shuang Chang; Yusuke Koyanagi; Koji Maruhashi; Michèle Sebag

TL;DR

This work addresses learning causal DAGs from observational data in linear SEMs by exploiting the inverse covariance matrix Θ. It proposes ICID, a constrained, sparsity-promoting matrix decomposition that preserves Θ’s nonzero pattern. Formulated as a continuous optimization problem, ICID recovers B from Θ via Θ = (I−B) D (I−B)^{T}, where the diagonal matrix D is determined by the noise variances Ω; B is identifiable provided Ω is known, and the method has favorable computational complexity. A regularized variant that uses skewness information improves performance on skewed data, and an augmented Lagrangian/FISTA-based algorithm delivers scalable inference with empirical speedups over state-of-the-art methods on synthetic and simulated fMRI data. The approach advances scalable causal discovery in high dimensions, shows robustness to noise-variance misspecification, and opens avenues for ensemble and CEP-informed extensions.
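The decomposition can be checked numerically on a small example. The sketch below is not the ICID algorithm; it only verifies the identity Θ = (I−B) D (I−B)^{T} for a known B. It assumes the convention X = B^{T}X + ε (so B[i, j] is the weight of edge i → j) and takes D = Ω^{-1}; the random graph, weight ranges and variable names are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6

# Hypothetical weighted DAG: strictly upper-triangular B, so B[i, j] != 0 means i -> j.
mask = np.triu(rng.random((d, d)) < 0.5, k=1)
signs = rng.choice([-1.0, 1.0], size=(d, d))
B = mask * rng.uniform(0.5, 1.0, size=(d, d)) * signs

# Noise variances (diagonal of Omega), assumed known.
omega2 = rng.uniform(0.5, 2.0, size=d)

I = np.eye(d)
# Assumed convention: X = B^T X + eps, hence X = (I - B)^{-T} eps.
A = np.linalg.inv(I - B).T
Sigma = A @ np.diag(omega2) @ A.T      # covariance of X
Theta = np.linalg.inv(Sigma)           # inverse covariance (precision) matrix

# Decomposition with D = Omega^{-1}.
Theta_dec = (I - B) @ np.diag(1.0 / omega2) @ (I - B).T
max_err = np.max(np.abs(Theta - Theta_dec))
print(max_err)
```

The printed value is the largest entrywise gap between the inverted covariance and the decomposition; it should be at the level of floating-point round-off.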

Abstract

Learning causal structures from observational data is a fundamental problem that poses significant computational challenges when the number of variables is large. In the context of linear structural equation models (SEMs), this paper focuses on learning causal structures from the inverse covariance matrix. The proposed method, called ICID for Independence-preserving Decomposition from Inverse Covariance matrix, is based on continuous optimization of a matrix decomposition model that preserves the nonzero patterns of the inverse covariance matrix. Through theoretical and empirical evidence, we show that ICID efficiently identifies the sought directed acyclic graph (DAG), assuming the noise variances are known. Moreover, ICID is shown empirically to be robust under bounded misspecification of noise variances in the case where the noise variances are unequal. The proposed method enjoys a low complexity, as reflected by its time efficiency in the experiments, and also enables a novel regularization scheme that yields highly accurate solutions on the Simulated fMRI data (Smith et al., 2011) in comparison with state-of-the-art algorithms.
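To illustrate what the nonzero pattern of the inverse covariance matrix carries, the sketch below builds a hypothetical 4-node DAG and compares the off-diagonal support of Θ with the moral graph (DAG edges plus edges between parents of a common child). The graph, the weights, the equal noise variances, and the convention that B[i, j] is the weight of edge i → j are all assumptions made for illustration. In general, exact cancellations can make the support of Θ smaller than the moral graph; in this example the two coincide.

```python
import numpy as np

# Hypothetical 4-node DAG: 0 -> 2, 1 -> 2, 2 -> 3 (B[i, j] is the weight of i -> j).
B = np.zeros((4, 4))
B[0, 2], B[1, 2], B[2, 3] = 0.8, -1.2, 0.5
omega2 = np.ones(4)  # equal noise variances, assumed known

I = np.eye(4)
# Inverse covariance under the assumed convention X = B^T X + eps.
Theta = (I - B) @ np.diag(1.0 / omega2) @ (I - B).T

# Off-diagonal nonzero pattern of Theta.
support = np.abs(Theta) > 1e-12
np.fill_diagonal(support, False)

# Moral graph: undirected DAG edges plus edges between co-parents.
adj = B != 0
co_parents = (adj.astype(int) @ adj.T.astype(int)) > 0
moral = adj | adj.T | co_parents
np.fill_diagonal(moral, False)

print(np.array_equal(support, moral))
```

Nodes 0 and 1 are not adjacent in the DAG, yet Θ[0, 1] is nonzero because both are parents of node 2. Every edge of B does lie inside the support of Θ, which is the kind of sparse search space a support-preserving decomposition can restrict itself to.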

Paper Structure (46 sections, 9 theorems, 45 equations, 10 figures, 7 tables, 4 algorithms)

Introduction
Related work.
Contributions.
Background
Definitions and notation
Structural equation models
Independence-preserving decomposition of the precision matrix
Matrix decomposition within a sparse support
ICID model learning via constrained optimization
Additional regularization.
Algorithm
Computational cost.
Discussion.
Experiments
Experimental setting
...and 31 more sections

Key Result

Lemma 1

Let $\mathbf{X}$ be a random variable following the SEM (\ref{eq:sem}) of $(B, {\Omega})$. Then the coefficients of the inverse covariance matrix $\Theta$ of $\mathbf{X}$ are as follows, for all $i$ and $j \neq i$ in $[d]$: $\Theta_{ij} = -\frac{B_{ij}}{\omega_{j}^{2}} - \frac{B

Figures (10)

Figure 1: Causal discovery results vs number of nodes. The number of nodes $d$ ranges from 100 to 2000. The SHD scores in the middle subplot are normalized by $\text{nnz}(B^\star)$.
Figure 2: Causal structure learning by $\mathcal{O}$-ICID compared to GH18 and GES. The true DAGs are drawn from the ER1 and ER3 sets with $d = 50$ nodes. In (a)--(b): the x-axis indicates the scaling parameter $\lambda$ in \ref{eq:def-xlam}. In (c)--(d): the SHD plots use the log scale for the x-axis of ${\sigma}$ while the TPR plots use the linear scale for the same values of ${\sigma}$.
Figure 3: Results on the Simulated fMRI datasets (Sim3 and Sim4): (a)--(b) ROC curves. (c) Running time on Sim3 (upper) and Sim4 (lower).
Figure 4: Grid search of $\lambda_1$ with \ref{alg:ice-emp} based on criterion $C(\lambda_1)$ (\ref{eq:crit-c1}). Data $X$ is from linear SEM with Gaussian noise, on ER2 graph with $d=200$ nodes.
Figure 5: ROC curves of ICID (skew) with different values of $\lambda_2$ on the fMRI Sim4 dataset. The x-axis shows FDR scores instead of FPRs.
...and 5 more figures

Theorems & Definitions (17)

Lemma 1: loh2014high
Theorem 2: loh2014high
Definition 4
Proposition 5
Theorem 6
Corollary 7
Proposition 8
proof
Lemma 9: rose1970triangulated, paulsen1989schur
proof
...and 7 more
