Differentiable Causal Discovery For Latent Hierarchical Causal Models

Parjanya Prashant, Ignavier Ng, Kun Zhang, Biwei Huang

TL;DR

This work tackles causal discovery when latent confounders form nonlinear, hierarchical structures. It introduces a differentiable framework that learns both the latent graph and the data-generating process by matching the observed distribution with a variational autoencoder while enforcing a structured, block upper-triangular latent graph via Gumbel-softmax. A key theoretical contribution is identifiability of nonlinear latent hierarchical models under relaxed conditions, formalized through a rank Jacobian criterion that links the observed distribution to latent d-separation, plus supporting lemmas and a measurement theorem. Empirically, the method shows improved accuracy and scalability over existing approaches on synthetic graphs and real-world image data (MNIST, CMNIST, CelebA), yielding interpretable hierarchical latent representations that transfer well across domains. The results suggest practical impact for learning interpretable, high-dimensional causal structures in vision and related domains, with avenues for further relaxation of assumptions and broader applicability.
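
The structural constraint mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the layer sizes, the temperature, and the choice to allow edges from each layer to every strictly later layer are all assumptions made for illustration. It builds a block upper-triangular mask over layered latent variables and draws a relaxed (soft) adjacency matrix using the two-class Gumbel-softmax, which per edge reduces to a sigmoid applied to the logit plus logistic noise.

```python
import numpy as np

def block_upper_triangular_mask(layer_sizes):
    """0/1 mask that permits edges only from a layer to strictly later layers.

    Assumption: nodes are ordered layer by layer, top of the hierarchy first.
    """
    n = sum(layer_sizes)
    mask = np.zeros((n, n))
    starts = np.cumsum([0] + list(layer_sizes))
    for i in range(len(layer_sizes)):
        # Rows: nodes in layer i. Columns: nodes in all later layers.
        mask[starts[i]:starts[i + 1], starts[i + 1]:] = 1.0
    return mask

def sample_relaxed_adjacency(logits, mask, temperature=0.5, rng=None):
    """Draw a soft adjacency matrix with entries in [0, 1].

    For two classes (edge present vs. absent), the Gumbel-softmax sample is
    sigmoid((logit + logistic_noise) / temperature), because the difference
    of two independent Gumbel variables follows a logistic distribution.
    """
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(1e-6, 1.0 - 1e-6, size=logits.shape)
    logistic_noise = np.log(u) - np.log1p(-u)
    soft_edges = 1.0 / (1.0 + np.exp(-(logits + logistic_noise) / temperature))
    # Masking keeps the sampled graph acyclic by construction.
    return soft_edges * mask

# Hypothetical hierarchy: 1 root latent, 2 mid-level latents, 3 leaf variables.
mask = block_upper_triangular_mask([1, 2, 3])
edge_logits = np.zeros((6, 6))  # would be learnable parameters in practice
adjacency = sample_relaxed_adjacency(edge_logits, mask, rng=np.random.default_rng(0))
```

In a trainable version, `edge_logits` would be optimized jointly with the generative model in an automatic-differentiation framework, and lowering `temperature` pushes the sampled entries toward 0 or 1.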

Abstract

Discovering causal structures with latent variables from observational data is a fundamental challenge in causal discovery. Existing methods often rely on constraint-based, iterative discrete searches, limiting their scalability to large numbers of variables. Moreover, these methods frequently assume linearity or invertibility, restricting their applicability to real-world scenarios. We present new theoretical results on the identifiability of nonlinear latent hierarchical causal models, relaxing previous assumptions in the literature about the deterministic nature of latent variables and exogenous noise. Building on these insights, we develop a novel differentiable causal discovery algorithm that efficiently estimates the structure of such models. To the best of our knowledge, this is the first work to propose a differentiable causal discovery method for nonlinear latent hierarchical models. Our approach outperforms existing methods in both accuracy and scalability. We demonstrate its practical utility by learning interpretable hierarchical latent structures from high-dimensional image data and show its effectiveness on downstream tasks.

Paper Structure

This paper contains 41 sections, 17 theorems, 28 equations, 7 figures, 5 tables.

Key Result

Proposition 1

A distribution $P$ generated by a structural model with respect to $\mathcal{G}$ violates Generalized Faithfulness with probability zero.

Figures (7)

  • Figure 1: Example of a graph we consider. Note that we allow multiple paths between two nodes and hence generalize trees. The latent variables are shaded.
  • Figure 2: Performance vs. Time for different causal discovery methods. Time is plotted on a logarithmic scale.
  • Figure 3: Figures for the Image experiments. (a) Latent causal graph for digit images (b) Visualization of subgraph of the learnt latent causal graph on MNIST (c) Samples from the CMNIST dataset illustrating digit-label associations under different conditions. Top row: training set samples with default color-label mapping. Middle row: test set samples with reversed color-label mapping. Bottom row: test set samples with a consistent blue color irrespective of labels.
  • Figure 4: Ground truth causal graphs for Synthetic Experiments. (a) and (b) are trees (only one path between any two nodes). (c) and (d) allow v-structures (multiple paths between two nodes)
  • Figure 5: Evolution of different loss components during training
  • ...and 2 more figures

Theorems & Definitions (27)

  • Definition 1: Pure Children
  • Proposition 1
  • Theorem 1
  • Theorem 2
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Theorem 3
  • Lemma 4
  • Proposition (unresolved reference, label: lebesgue_measure)
  • ...and 17 more