Score matching through the roof: linear, nonlinear, and latent variables causal discovery

Francesco Montagna, Philipp M. Faller, Patrick Bloebaum, Elke Kirschbaum, Francesco Locatello

TL;DR

This work develops a score-based framework for causal discovery that leverages the gradient and Hessian of the log-density to identify causal structure, including in the presence of latent variables. It extends identifiability results beyond nonlinear additive-noise models, introducing AdaScore, a flexible algorithm capable of yielding a Markov equivalence class, a DAG, or a mixed graph depending on assumptions. Theoretically, the score’s Jacobian encodes m-separation information for visible variables, and under additive-noise assumptions it can identify direct edges in latent settings (nonlinear case) or ancestral relations (linear case). Empirically, AdaScore performs competitively with state-of-the-art baselines on synthetic and real datasets, scales to moderate graph sizes, and provides robust guarantees across linear, nonlinear, and latent-variable regimes, marking a step toward broadly applicable, score-based causal discovery. The work thus offers a unifying, theory-backed approach to causal discovery that can adapt to varying degrees of latent confounding and mechanism complexity, with practical implications for scalable structure learning in complex domains.

Abstract

Causal discovery from observational data holds great promise, but existing methods rely on strong assumptions about the underlying causal structure, often requiring full observability of all relevant variables. We tackle these challenges by leveraging the score function $\nabla \log p(X)$ of observed variables for causal discovery and propose the following contributions. First, we fine-tune the existing identifiability results with the score on additive noise models, showing that their assumption of nonlinearity of the causal mechanisms is not necessary. Second, we establish conditions for inferring causal relations from the score even in the presence of hidden variables; this result is two-faced: we demonstrate the score's potential to infer the equivalence class of causal graphs with hidden variables (while previous results are restricted to the fully observable setting), and we provide sufficient conditions for identifying direct causes in latent variable models. Building on these insights, we propose a flexible algorithm suited for causal discovery on linear, nonlinear, and latent variable models, which we empirically validate.

Paper Structure

This paper contains 59 sections, 12 theorems, 70 equations, 20 figures, 9 tables, 2 algorithms.

Key Result

Proposition 2

In their Lemma 4.1, spantini2018inference provide the connection between vanishing cross-partial derivatives of the log-likelihood and conditional independence of random variables. Note that this result does not depend on the assumption of a generative model, thus holding beyond the set of structura
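
For intuition, the simplest instance of this connection is a zero-mean Gaussian, where the Hessian of the log-density is the constant matrix $-\Sigma^{-1}$, so a vanishing cross-partial derivative is just a zero entry of the precision matrix. The sketch below is an illustration under that Gaussian assumption, not the paper's AdaScore algorithm. The chain graph, edge weights, noise variances, and function names are all hypothetical choices made for the example.

```python
import numpy as np


def linear_gaussian_covariance(weights, noise_var):
    """Covariance of X = weights @ X + N with N ~ N(0, diag(noise_var)).

    weights[i, j] is the (hypothetical) coefficient of the edge X_j -> X_i.
    """
    d = weights.shape[0]
    mixing = np.linalg.inv(np.eye(d) - weights)  # X = mixing @ N
    return mixing @ np.diag(noise_var) @ mixing.T


def log_density_hessian(cov):
    """Hessian of log p(x) for a zero-mean Gaussian: -inv(cov), constant in x."""
    return -np.linalg.inv(cov)


# Hypothetical chain X1 -> X2 -> X3 (indices 0, 1, 2).
weights = np.array([
    [0.0, 0.0, 0.0],
    [0.8, 0.0, 0.0],
    [0.0, -1.5, 0.0],
])
noise_var = np.array([1.0, 0.5, 2.0])

cov = linear_gaussian_covariance(weights, noise_var)
hessian = log_density_hessian(cov)

# X1 and X3 are conditionally independent given X2, so the (X1, X3)
# cross-partial derivative of log p vanishes (up to floating-point error).
print("d^2 log p / dx1 dx3 (all observed):", hessian[0, 2])

# Adjacent pairs keep a nonzero cross-partial derivative.
print("d^2 log p / dx1 dx2 (all observed):", hessian[0, 1])

# If X2 is hidden, only (X1, X3) are observed. No observed set separates
# them any more, and the cross-partial of the marginal log-density is nonzero.
observed = [0, 2]
hessian_marginal = log_density_hessian(cov[np.ix_(observed, observed)])
print("d^2 log p / dx1 dx3 (X2 hidden):", hessian_marginal[0, 1])
```

In non-Gaussian settings the Hessian depends on the evaluation point and has to be estimated from samples, so this closed form serves only as a sanity check of the statement.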

Figures (20)

  • Figure 1: Empirical results on sparse graphs with different numbers of nodes, on fully observable (no hidden variables) and latent variable models. We report the structural Hamming distance (SHD; lower is better). We note that AdaScore is comparable to the other methods in all settings (except for DirectLiNGAM on linear data), and always significantly better than random.
  • Figure 2: Empirical results on real and pseudo-real datasets from sachs2005causal, bache2013uci, and smith2011network. We report the structural Hamming distance (SHD; lower is better). AdaScore has the lowest SHD among all tested methods on the gene dataset and appears to be competitive with the other methods on the fuel consumption and fMRI data.
  • Figure 3: Examples where the orientations of Proposition \ref{prop:causal_dir_1} are more informative than FCI.
  • Figure 4: Examples where the output of FCI is more informative than Proposition \ref{prop:causal_dir_1} due to hidden variables.
  • Figure 5: Visible edges are different from the edges that can be oriented via Proposition \ref{prop:causal_dir_1}.
  • ...and 15 more figures

Theorems & Definitions (32)

  • Definition 1: Marginal graph, zhang2008causal
  • Proposition 2: Corollary of spantini2018inference
  • Proposition 3: Generalization of Lemma 1 in montagna23_nogam
  • Proposition 4
  • Definition 5: Ancestor
  • Definition 6: Inducing path
  • Example 1: Examples of inducing paths
  • Definition 7: MAG
  • Definition 8: Active paths and m-separation
  • Definition 9: Markov equivalence class of a DAG
  • ...and 22 more