Cross-validating causal discovery via Leave-One-Variable-Out

Daniela Schkoda, Philipp Faller, Patrick Blöbaum, Dominik Janzing

TL;DR

Simulations indicate that the LOVO prediction error is indeed correlated with the accuracy of the causal outputs, affirming the method's effectiveness.

Abstract

We propose a new approach to falsify causal discovery algorithms without ground truth, which is based on testing the causal model on a pair of variables that has been dropped when learning the causal model. To this end, we use the "Leave-One-Variable-Out (LOVO)" prediction where $Y$ is inferred from $X$ without any joint observations of $X$ and $Y$, given only training data from $X,Z_1,\dots,Z_k$ and from $Z_1,\dots,Z_k,Y$. We demonstrate that causal models on the two subsets, in the form of Acyclic Directed Mixed Graphs (ADMGs), often entail conclusions on the dependencies between $X$ and $Y$, enabling this type of prediction. The prediction error can then be estimated since the joint distribution $P(X, Y)$ is assumed to be available, and $X$ and $Y$ have only been omitted for the purpose of falsification. After presenting this graphical method, which is applicable to general causal discovery algorithms, we illustrate how to construct a LOVO predictor tailored towards algorithms relying on specific a priori assumptions, such as linear additive noise models. Simulations indicate that the LOVO prediction error is indeed correlated with the accuracy of the causal outputs, affirming the method's effectiveness.
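
The LOVO setting described above can be illustrated with a minimal sketch. It is not the paper's predictor. It assumes a hypothetical linear Gaussian toy model with a single observed common cause $Z$ and no link between $X$ and $Y$, so that $X \perp Y \mid Z$ and $E[Y \mid X] = E[\,E[Y \mid Z] \mid X\,]$. The coefficients, sample sizes, and the helper names `simulate`, `fit_linear`, and `lovo_predict` are assumptions made for illustration. The comparison uses a trivial mean baseline rather than the MaxEnt baseline defined in the paper.

```python
import numpy as np

def simulate(n, rng):
    """Toy model (assumed for illustration): Z causes X and Y, no X-Y link."""
    z = rng.normal(size=n)
    x = 1.5 * z + rng.normal(scale=0.5, size=n)
    y = -2.0 * z + rng.normal(scale=0.5, size=n)
    return x, z, y

def fit_linear(u, v):
    """Least-squares fit of v ≈ a * u + b; returns (a, b)."""
    design = np.column_stack([u, np.ones_like(u)])
    (a, b), *_ = np.linalg.lstsq(design, v, rcond=None)
    return a, b

rng = np.random.default_rng(0)

# Two training sets that never contain X and Y together.
x1, z1, _ = simulate(5000, rng)   # observations of (X, Z) only
_, z2, y2 = simulate(5000, rng)   # observations of (Z, Y) only

# Joint samples of (X, Y), used only to evaluate the prediction error.
x_test, _, y_test = simulate(5000, rng)

a_zx, b_zx = fit_linear(x1, z1)   # estimate of E[Z | X] from the first set
a_yz, b_yz = fit_linear(z2, y2)   # estimate of E[Y | Z] from the second set

def lovo_predict(x):
    """Compose the two regressions; valid here because X ⊥ Y | Z is assumed."""
    return a_yz * (a_zx * x + b_zx) + b_yz

lovo_mse = np.mean((y_test - lovo_predict(x_test)) ** 2)
baseline_mse = np.mean((y_test - y2.mean()) ** 2)  # ignores X entirely

print(f"LOVO MSE: {lovo_mse:.3f}, mean-baseline MSE: {baseline_mse:.3f}")
```

If the assumed graph were wrong, for example if $X$ had a direct effect on $Y$, the composed predictor would be biased and its error on the held-out joint samples would grow. That is the sense in which a LOVO error can serve as a falsification signal.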

Paper Structure

This paper contains 36 sections, 7 theorems, 32 equations, 9 figures, 2 tables.

Key Result

Lemma 1

Let $X,Y$ be real-valued variables whose conditional distributions $P(X|{\boldsymbol{Z}}={\boldsymbol{z}})$ and $P(Y|{\boldsymbol{Z}}={\boldsymbol{z}})$ have densities $p(x|{\boldsymbol{z}})$ and $p(y|{\boldsymbol{z}})$ with respect to the Lebesgue measure. Let ${\boldsymbol{Z}}=\{Z_1,\dots,Z_k\}$ be var

Figures (9)

  • Figure 1: Exclude edges based on the marginal graphs.
  • Figure 2: For Lemma 2 with small values of $q$, for Lemma 3 with $p \in [0.3, 0.7]$, and for Lemma 4 regardless of $p$, there are only a few graphs in which not a single unlinked pair can be detected, so LOVO is realizable in most cases.
  • Figure 3: When provided with the true marginal graphs $G_X$ and $G_Y$, the parent adjustment LOVO predictor and the LiNGAM LOVO predictor outperform the baseline.
  • Figure 4: The scatter plots show LOVO versus baseline loss for parent adjustment LOVO applied to graphs estimated with DirectLiNGAM and RCD, and for DL LOVO prediction.
  • Figure 5: The scatter plots show how LOVO performance correlates with causal discovery performance. The LOVO error increases with the number of pairs misidentified as unlinked and with the structural Hamming distance (SHD). The corresponding Spearman correlation coefficients, included in the titles, all deviate significantly from zero, with $p$-values $0.0$, $0.0$, $0.0$, and $4\cdot10^{-44}$.
  • ...and 4 more figures

Theorems & Definitions (9)

  • Lemma 1: No probabilistic law enables LOVO prediction
  • Lemma 2: excluding links in ADMGs
  • Lemma 3: excluding links from directed part
  • Lemma 4: determining edge types in DAGs
  • Theorem 5: LOVO by adjusting union of parents
  • Theorem 6: LOVO via LiNGAM
  • Definition 7: MaxEnt Baseline predictor
  • Lemma 8: MaxEnt baseline
  • proof