Dagma-DCE: Interpretable, Non-Parametric Differentiable Causal Discovery

Daniel Waxman, Kurt Butler, Petar M. Djuric

TL;DR

Dagma-DCE addresses the interpretability gap in differentiable causal discovery by redefining the adjacency between variables via the L2 derivative norm of the child functions, measured with respect to the input distribution. It presents a model-agnostic, differentiable optimization framework that enforces acyclicity through a central-path constraint and promotes sparsity via an L1 penalty on derivatives. The approach yields an interpretable, non-parametric measure of causal strength based on the differential causal effect (DCE), leading to adjacency matrices whose nonzero entries reflect true local causal influence and whose magnitudes correspond to interaction energy. Empirically, Dagma-DCE achieves competitive or state-of-the-art performance on synthetic benchmarks while enabling principled thresholding and expert-driven sparsity choices, with open-source code available for broad adoption.

Abstract

We introduce Dagma-DCE, an interpretable and model-agnostic scheme for differentiable causal discovery. Current non- or over-parametric methods in differentiable causal discovery use opaque proxies of "independence" to justify the inclusion or exclusion of a causal relationship. We show theoretically and empirically that these proxies may be arbitrarily different from the actual causal strength. In contrast to existing differentiable causal discovery algorithms, Dagma-DCE uses an interpretable measure of causal strength to define weighted adjacency matrices. On a number of simulated datasets, we show that our method achieves state-of-the-art performance. We additionally show that Dagma-DCE allows for principled thresholding and sparsity penalties by domain experts. The code for our method is available open-source at https://github.com/DanWaxman/DAGMA-DCE and can easily be adapted to arbitrary differentiable models.
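
The derivative-based adjacency described in the TL;DR can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes the entry $W_{ij}$ is a Monte Carlo estimate of $\lVert \partial_i f_j \rVert_{L^2}$ over samples from the data distribution, uses central finite differences in place of automatic differentiation, and evaluates the log-determinant acyclicity function from Dagma, $h(W) = -\log\det(sI - W \circ W) + d \log s$. The three structural functions `f0`, `f1`, `f2` and the helper names are hypothetical and chosen only for this example.

```python
import math
import random


def partial_derivative(f, x, i, h=1e-5):
    """Central finite-difference estimate of d f / d x_i at the point x."""
    x_plus = list(x)
    x_minus = list(x)
    x_plus[i] += h
    x_minus[i] -= h
    return (f(x_plus) - f(x_minus)) / (2.0 * h)


def dce_adjacency(functions, samples):
    """W[i][j] is the root-mean-square of d f_j / d x_i over the samples,
    i.e. an empirical L2 norm with respect to the sampling distribution."""
    d = len(functions)
    W = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            squares = [partial_derivative(functions[j], x, i) ** 2 for x in samples]
            W[i][j] = math.sqrt(sum(squares) / len(squares))
    return W


def dagma_h(W, s=1.0):
    """Log-det acyclicity function h(W) = -log det(sI - W*W) + d log s.
    Uses log|det|, so it is only meaningful where det(sI - W*W) > 0."""
    d = len(W)
    M = [[(s if i == j else 0.0) - W[i][j] ** 2 for j in range(d)] for i in range(d)]
    log_det = 0.0
    for k in range(d):  # Gaussian elimination with partial pivoting
        p = max(range(k, d), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        if M[k][k] == 0.0:
            raise ValueError("sI - W*W is singular")
        log_det += math.log(abs(M[k][k]))
        for r in range(k + 1, d):
            factor = M[r][k] / M[k][k]
            for c in range(k, d):
                M[r][c] -= factor * M[k][c]
    return -log_det + d * math.log(s)


# Hypothetical structural functions for the chain-like DAG 0 -> 1, 0 -> 2, 1 -> 2.
def f0(x):
    return 0.0


def f1(x):
    return math.sin(x[0])


def f2(x):
    return 2.0 * x[0] - 0.5 * x[1]


functions = [f0, f1, f2]

# Draw samples from the additive-noise model in topological order.
random.seed(0)
samples = []
for _ in range(2000):
    x = [0.0, 0.0, 0.0]
    for j, f in enumerate(functions):
        x[j] = f(x) + random.gauss(0.0, 1.0)
    samples.append(x)

W = dce_adjacency(functions, samples)
for row in W:
    print(["%.3f" % w for w in row])
print("h(W) =", dagma_h(W))
```

In this sketch the linear edges recover the magnitudes of their coefficients, the nonlinear edge gets a weight that depends on where the input distribution puts its mass, and non-edges are exactly zero, so a threshold on `W` is a threshold on derivative magnitude.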

Paper Structure

This paper contains 21 sections, 2 theorems, 25 equations, 2 figures, and 1 table.

Key Result

Lemma 1

Let $\sigma(\cdot)$ denote the sigmoid activation function. Then for any $\delta, \epsilon> 0$, there exists an MLP $f_j$ with weight matrices $\mathbf{A}^{(1)}, \dots, \mathbf{A}^{(M)}$ such that $\lVert \mathbf{A}^{(1)}_{i\cdot} \rVert_{L^2} < \epsilon$ but $\lVert \partial_i f_j \rVert_{L^2}> \de

Figures (2)

  • Figure 1: The difference between the magnitude of the true derivatives in a linear causal model and the magnitude of the weighted graph in Dagma for a random $10 \times 10$ Erdős-Rényi directed graph with $20$ expected edges. Gray boxes surrounding each cell denote the magnitude of the ground-truth linear coefficient.
  • Figure 2: Resulting SID (top left), SHD (top right), $F_1$ score (bottom left), and time elapsed (bottom right) for random data generated from (\ref{fig:GP_Add_Results}) the ER-4 GP-additive model and (\ref{fig:MLP_Results}) the ER-4 MLP model, as detailed in \ref{sec:synthetic_data}. Boxes show the median and quartiles across $T=10$ trials for Dagma and Dagma-DCE, and $T=5$ trials for Notears, with whiskers showing the minimum and maximum values.

Theorems & Definitions (2)

  • Lemma 1
  • Lemma 2
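
The idea behind Lemma 1 can be checked numerically with a toy network. This is a hedged sketch rather than the paper's proof: the lemma statement above is cut off, so reading its conclusion as a lower bound of $\delta$ on the derivative norm is an assumption, and the construction (one input, one sigmoid hidden unit, a large output weight that compensates for a small first-layer weight) is a hypothetical example chosen for simplicity. The names `make_mlp` and `l2_norm` are illustrative.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_mlp(eps, delta):
    """Scalar MLP f(x) = a2 * sigmoid(a1 * x) with |a1| < eps.
    Near z = 0 the sigmoid slope is about 1/4, so the derivative is
    roughly a1 * a2 / 4 = 2 * delta."""
    a1 = eps / 2.0            # first-layer weight, smaller than eps
    a2 = 16.0 * delta / eps   # second-layer weight, compensates for a1

    def f(x):
        return a2 * sigmoid(a1 * x)

    def df(x):
        s = sigmoid(a1 * x)
        return a2 * a1 * s * (1.0 - s)  # chain rule through the sigmoid

    return a1, a2, f, df


def l2_norm(g, samples):
    """Empirical L2 norm of g with respect to the sampling distribution."""
    return math.sqrt(sum(g(x) ** 2 for x in samples) / len(samples))


random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(5000)]

eps, delta = 1e-3, 10.0
a1, a2, f, df = make_mlp(eps, delta)

first_layer_norm = abs(a1)
derivative_norm = l2_norm(df, samples)

print("first-layer weight norm:", first_layer_norm, "(eps =", eps, ")")
print("L2 norm of derivative:  ", derivative_norm, "(delta =", delta, ")")
```

Under these assumptions, a score read off the first-layer weights would call the edge negligible, while the derivative-based score reports a strong dependence.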