Since Faithfulness Fails: The Performance Limits of Neural Causal Discovery

Mateusz Olko, Mateusz Gajewski, Joanna Wojciechowska, Mikołaj Morzy, Piotr Sankowski, Piotr Miłoś

TL;DR

The paper investigates the limits of neural causal discovery under finite samples, arguing that neural networks cannot reliably distinguish ground-truth causal links from non-links due to $\lambda$-strong faithfulness being a brittle bottleneck. It introduces a unified benchmarking protocol and an empirical framework using $\hat{\lambda}$ to quantify task difficulty, supported by synthetic nonlinear SCMs. Results show that convergence and accuracy improve with larger $\hat{\lambda}$ but degrade as graph size and density increase, aligning with theory that the fraction of $\lambda$-strong faithful distributions shrinks in larger graphs. The findings suggest fundamental constraints in the current neural-discovery paradigm and advocate a paradigm shift toward new data regimes or modeling assumptions beyond standard neural-function approximators.

Abstract

Neural causal discovery methods have recently improved in terms of scalability and computational efficiency. However, our systematic evaluation highlights significant room for improvement in their accuracy when uncovering causal structures. We identify a fundamental limitation: neural networks cannot reliably distinguish between existing and non-existing causal relationships in the finite sample regime. Our experiments reveal that neural networks, as used in contemporary causal discovery approaches, lack the precision needed to recover ground-truth graphs, even for small graphs and relatively large sample sizes. Furthermore, we identify the faithfulness property as a critical bottleneck: (i) it is likely to be violated across any reasonable dataset size range, and (ii) its violation directly undermines the performance of neural discovery methods. These findings lead us to conclude that progress within the current paradigm is fundamentally constrained, necessitating a paradigm shift in this domain.
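To make the faithfulness bottleneck concrete, the sketch below estimates how often a randomly drawn model comes close to violating faithfulness. It is not the paper's procedure: it swaps the nonlinear SCMs for linear Gaussian SCMs, where dependence can be measured exactly by partial correlation, and it only checks adjacent pairs (an edge $i \to j$ keeps $X_i$ and $X_j$ d-connected given any conditioning set). The edge probability `p`, the weight range $[-1, 1]$, and all function names are illustrative assumptions.

```python
import itertools
import numpy as np

def linear_gaussian_cov(W, noise_var=1.0):
    """Covariance of X = W^T X + eps, where W[i, j] is the weight of edge i -> j."""
    d = W.shape[0]
    A = np.linalg.inv(np.eye(d) - W)          # X = A^T eps
    return noise_var * A.T @ A

def partial_corr(cov, i, j, S):
    """Partial correlation of X_i and X_j given X_S, read off the precision matrix."""
    idx = [i, j, *S]
    prec = np.linalg.inv(cov[np.ix_(idx, idx)])
    return -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])

def adjacency_lambda(W):
    """Smallest |partial correlation| over all edges and all conditioning sets.

    Assumption: only adjacent pairs are checked, so this is an upper bound on
    the lambda for which the model is lambda-strong faithful.
    """
    cov = linear_gaussian_cov(W)
    d = W.shape[0]
    best = np.inf
    for i, j in zip(*np.nonzero(W)):
        rest = [k for k in range(d) if k not in (i, j)]
        for r in range(len(rest) + 1):
            for S in itertools.combinations(rest, r):
                best = min(best, abs(partial_corr(cov, i, j, S)))
    return best

def random_er_weights(d, p, rng):
    """Erdős-Rényi DAG: keep each upper-triangular edge with probability p."""
    mask = np.triu(rng.random((d, d)) < p, k=1)   # upper-triangular => acyclic
    return mask * rng.uniform(-1.0, 1.0, size=(d, d))

def fraction_adjacency_unfaithful(d, p, lam, n_graphs=200, seed=0):
    """Fraction of sampled models whose adjacency lambda falls below lam."""
    rng = np.random.default_rng(seed)
    vals = [adjacency_lambda(random_er_weights(d, p, rng)) for _ in range(n_graphs)]
    return float(np.mean([v < lam for v in vals]))

# Classic cancellation: the direct effect 0 -> 2 exactly offsets the path 0 -> 1 -> 2,
# so X_0 and X_2 are uncorrelated despite the edge between them.
W_cancel = np.array([[0.0, 1.0, -1.0],
                     [0.0, 0.0, 1.0],
                     [0.0, 0.0, 0.0]])
lam_cancel = adjacency_lambda(W_cancel)
```

Calling `fraction_adjacency_unfaithful(d, p, lam)` for growing `d` or `p` gives a rough linear-Gaussian analogue of the trend the TL;DR describes, where fewer distributions remain $\lambda$-strong faithful as graphs get larger and denser. The exhaustive loop over conditioning sets is exponential in `d`, so this is only practical for small graphs.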

Paper Structure

This paper contains 51 sections, 20 equations, 12 figures, 4 tables.

Figures (12)

  • Figure 1: (a) Estimated fraction of $\lambda$-unfaithful distributions for Erdős-Rényi graphs with varying numbers of nodes. (b) Estimated fraction of $\lambda$-unfaithful distributions for Erdős-Rényi graphs with 6 nodes and varying density. Colored lines correspond to specific values of $\lambda$.
  • Figure 2: (a) Relation between the sample size needed for a distribution to converge and $\hat{\lambda}$. (b) Comparison of the performance of the NN-opt method depending on data size, averaged over 90 samples. For the definition of $\text{ESHD}_{\text{CPDAG}}$, please refer to Section \ref{sec:benchmark}.
  • Figure 3: Exemplary results of score evaluation using our robust neural-network-based approximation approach. In red: score of the target structure; in green: scores of structures with statistically significantly better scores; in blue: scores of structures with comparable scores. The error bars signify 95% confidence intervals. Note that a considerable number of structures have a significantly better (lower) score than the target structure.
  • Figure 4: (a) Linear regression fit between the average performance of neural causal discovery methods and the $\ln(\hat{\lambda})$ measure. The p-value for the Spearman rank correlation between $\hat{\lambda}$ and $\text{ESHD}_{\text{CPDAG}}$ is 4e-6, signifying an anti-monotonic correlation. (b) Performance of benchmarked methods in terms of $\text{ESHD}_{\text{CPDAG}}$ with respect to dataset size for $\text{ER(10, 2)}$ graphs, averaged over 30 samples.
  • Figure 5: A simple 3-node graph $G$.
  • ...and 7 more figures

Theorems & Definitions (3)

  • Remark 3.1
  • Remark 3.2
  • Remark 3.3