Self-Compatibility: Evaluating Causal Discovery without Ground Truth

Philipp M. Faller, Leena Chennuru Vankadara, Atalanti A. Mastakouri, Francesco Locatello, Dominik Janzing

TL;DR

The paper proposes falsifying the output of causal discovery algorithms by testing consistency across subsets of variables (self-compatibility) rather than comparing against ground truth. It defines interventional and graphical notions of compatibility, proves population-level guarantees under standard causal assumptions, and introduces an incompatibility score that quantifies cross-subset disagreements. In experiments on synthetic and real data, the score correlates with structural Hamming distance (SHD) and can aid model selection, which illustrates its practical value when ground truth is unavailable. The approach takes a falsification-centric view of causal discovery; its limitations include that passing the compatibility test is only a necessary criterion for good performance and that results depend on how the subsets are chosen.

Abstract

As causal ground truth is incredibly rare, causal discovery algorithms are commonly only evaluated on simulated data. This is concerning, given that simulations reflect preconceptions about generating processes regarding noise distributions, model classes, and more. In this work, we propose a novel method for falsifying the output of a causal discovery algorithm in the absence of ground truth. Our key insight is that while statistical learning seeks stability across subsets of data points, causal learning should seek stability across subsets of variables. Motivated by this insight, our method relies on a notion of compatibility between causal graphs learned on different subsets of variables. We prove that detecting incompatibilities can falsify wrongly inferred causal relations due to violation of assumptions or errors from finite sample effects. Although passing such compatibility tests is only a necessary criterion for good performance, we argue that it provides strong evidence for the causal models whenever compatibility entails strong implications for the joint distribution. We also demonstrate experimentally that detection of incompatibilities can aid in causal model selection.
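
To make the idea of comparing graphs learned on different variable subsets concrete, here is a minimal Python sketch. It is a simplified stand-in, not the paper's definition of graphical or interventional compatibility: it only flags pairs of marginal graphs that orient the same edge in opposite directions, and it reports the fraction of such pairs. The function names, the edge-set representation, and the toy graphs are assumptions made for illustration.

```python
from itertools import combinations


def orientation_conflicts(edges_a, edges_b):
    """Return the directed edges (x, y) in edges_a whose reverse (y, x) is in edges_b."""
    return {(x, y) for (x, y) in edges_a if (y, x) in edges_b}


def incompatibility_score(marginal_graphs):
    """Fraction of subset pairs whose learned graphs orient some edge oppositely.

    marginal_graphs maps a frozenset of variable names to the set of directed
    edges (cause, effect) that a discovery algorithm returned on that subset.
    This is a hypothetical, simplified score, not the paper's kappa.
    """
    pairs = list(combinations(marginal_graphs.values(), 2))
    if not pairs:
        return 0.0
    conflicting = sum(1 for a, b in pairs if orientation_conflicts(a, b))
    return conflicting / len(pairs)


# Toy outputs of an (unspecified) discovery algorithm on three subsets.
graphs = {
    frozenset({"X", "Y", "Z"}): {("X", "Y"), ("Z", "Y")},
    frozenset({"X", "Y", "W"}): {("Y", "X"), ("W", "X")},
    frozenset({"X", "Y"}): {("X", "Y")},
}
score = incompatibility_score(graphs)
print(f"fraction of conflicting subset pairs: {score:.2f}")
```

In this toy input, the graph on {X, Y, W} orients the edge between X and Y against the other two graphs, so two of the three subset pairs conflict. A full compatibility check would also have to account for how marginalizing over latent variables changes adjacencies, which this sketch deliberately ignores.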

Paper Structure (56 sections, 12 theorems, 45 equations, 33 figures, 1 table)

This paper contains 56 sections, 12 theorems, 45 equations, 33 figures, 1 table.

Key Result

lemma 1

Let $S_1, \dots, S_k$ be $k\in {\mathbb N}$ sets of variables and $P_V$ be a probability distribution over $V\supseteq \bigcup_{i\in [k]} S_i$ such that all $P_{S_i}$ with $i\in [k]$ fulfil the assumptions of $\mathcal{A}$. Then for every $\epsilon> 0$ there is an $m\in {\mathbb N}$ such that $\mathcal{A}$ …

Figures (33)

  • Figure 1: The marginal causal models over $S$ and $T$ each graphically imply a constraint on the edge $X-Y$, as it can only be directed in one way.
  • Figure 2: RCD on 100 datasets that fulfill its assumptions. The plot shows structural Hamming distance of estimated graphs $\hat{G}$ to the respective true graph $G$ versus the interventional incompatibility score $\kappa^I$. As both are influenced by the degree of the true graph, we also calculated the partial correlation given the average node degree of the true graph, which is $0.52$ with $p$-value $3\cdot 10^{-8}$.
  • Figure 3: We chose between the hyperparameter values $\alpha=0.1$ and $\alpha=0.001$ of RCD according to the incompatibility $\kappa^I$ for 100 datasets. For 72% of datasets we picked the better model or an equally good model. In most cases where we picked the worse model in terms of SHD, the difference in $\kappa^I$ is small.
  • Figure 4: This modification of Figure \ref{fig:motivation_SandT_false} renders the edge $X\to Y$ visible if FCI is applied to $S'$ and thus shows that FCI is falsifiable.
  • Figure 5: Two causal models that fulfil the LiNGAM assumption and have the same marginal over $X_1$ and $X_2$ but different coefficients from $X_1$ to $X_2$.
  • ...and 28 more figures
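
The caption of Figure 2 reports a partial correlation between SHD and $\kappa^I$ given the average node degree. As a rough illustration of what such a quantity measures, the sketch below uses the standard first-order partial correlation formula in plain Python. The numbers are made up for the example and are not the paper's data; the variable names `shd`, `kappa`, and `degree` are assumptions, and the paper may have computed its value differently.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def partial_correlation(xs, ys, zs):
    """First-order partial correlation of xs and ys after controlling for zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Made-up toy values: both quantities grow with the degree of the true graph.
degree = [1.0, 2.0, 3.0, 4.0]
kappa = [0.2, 0.1, 0.2, 0.5]
shd = [0.0, 5.0, 0.0, 5.0]

raw = pearson(kappa, shd)
controlled = partial_correlation(kappa, shd, degree)
print(f"raw correlation: {raw:.3f}, partial correlation given degree: {controlled:.3f}")
```

The toy data are constructed so that the raw correlation is positive while the partial correlation is zero up to rounding, which is the situation that controlling for degree is meant to rule out. A nonzero partial correlation, as reported in the caption, indicates an association that the shared dependence on degree does not explain.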

Theorems & Definitions (54)

  • definition 1: ADMG
  • definition 2: causal discovery algorithm
  • definition 3: compatibility notion
  • definition 4: interventional compatibility
  • definition 5: latent ADMG
  • definition 6: graphical compatibility
  • definition 7: observational falsifiability
  • remark 1
  • lemma 1
  • theorem 1
  • ...and 44 more