Shapley-PC: Constraint-based Causal Structure Learning with a Shapley Inspired Framework

Fabrizio Russo; Francesca Toni

TL;DR

Shapley-PC introduces a Shapley-value-based decision rule for orienting v-structures in constraint-based causal structure learning (CSL) and integrates it into PC-Stable. The method comes with theoretical guarantees of soundness, completeness, and asymptotic consistency, and shows empirical gains over existing PC-based methods, especially on denser graphs. It aggregates evidence across multiple conditional independence (CI) tests via Shapley Independence Values, which reduces sensitivity to individual test errors and improves robustness to near-unfaithful distributions. Simulations and pseudo-real data experiments show consistent improvements in collider identification and overall graph recovery, with manageable computational overhead. These results suggest a practical, theoretically grounded enhancement to constraint-based causal discovery that can be extended to related CSL frameworks and alternative CI metrics.

Abstract

Causal Structure Learning (CSL), also referred to as causal discovery, amounts to extracting causal relations among variables in data. CSL enables the estimation of causal effects from observational data alone, avoiding the need to perform real-life experiments. Constraint-based CSL leverages conditional independence tests to perform causal discovery. We propose Shapley-PC, a novel method to improve constraint-based CSL algorithms by using Shapley values over the possible conditioning sets to decide which variables are responsible for the observed conditional (in)dependences. We prove soundness, completeness, and asymptotic consistency of Shapley-PC and run a simulation study showing that our proposed algorithm is superior to existing versions of PC.

Paper Structure (49 sections, 4 theorems, 11 equations, 10 figures, 22 tables, 1 algorithm)

Introduction
Preliminaries
Graph Notions
Statistical Notions
Shapley Values
PC-based Methods: State-of-the-art
Shapley-PC
Shapley Decision Rule
The Shapley-PC algorithm
Theoretical Guarantees
Additional Properties
Empirical Evaluation
Data Generating Process (DGP)
Evaluation Metrics
Results
...and 34 more sections

Key Result

Lemma 2

Given a skeleton $\mathcal{C}$, an unshielded triple (UT) $X_i - X_j - X_k \in \mathcal{C}$ with $X_i, X_j, X_k \in \mathbf{V}$, and a perfect conditional independence test (CIT) $I_{\infty}$, the Shapley Independence Value (SIV) of variable $X_j$ satisfies $\phi_{I_{\infty}}(X_j, \{X_i, X_k\}) < 0$ if and only if $X_j$ is a collider for $X_i$ and $X_k$.
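
The sign condition in the lemma can be illustrated with a small sketch. This excerpt does not give the paper's exact definition of the SIV, so the code below assumes the standard Shapley formula, with candidate conditioning variables as players and the output of a CI test for $X_i$ and $X_k$ given a conditioning set as the value function. The oracles `collider_oracle` and `chain_oracle`, the extra variable `X_l`, and the convention that a perfect test returns 1.0 for independence and 0.0 for dependence are all hypothetical choices made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_value(player, players, value):
    """Exact Shapley value of `player` under the characteristic function `value`.

    `value` maps a frozenset of players (here: a conditioning set) to a number.
    """
    others = [p for p in players if p != player]
    n = len(players)
    total = 0.0
    for size in range(len(others) + 1):
        # Standard Shapley weight |S|! (n - |S| - 1)! / n!
        weight = factorial(size) * factorial(n - size - 1) / factorial(n)
        for subset in combinations(others, size):
            s = frozenset(subset)
            total += weight * (value(s | {player}) - value(s))
    return total


# Hypothetical perfect CI oracles for the pair (X_i, X_k):
# return 1.0 if X_i and X_k are independent given cond_set, else 0.0.
def collider_oracle(cond_set):
    # X_i -> X_j <- X_k: independent marginally, dependent once X_j is conditioned on.
    return 0.0 if "X_j" in cond_set else 1.0


def chain_oracle(cond_set):
    # X_i -> X_j -> X_k: dependent marginally, independent once X_j is conditioned on.
    return 1.0 if "X_j" in cond_set else 0.0


# X_l is an assumed extra neighbour that does not affect the test outcome.
players = ["X_j", "X_l"]
siv_collider = shapley_value("X_j", players, collider_oracle)
siv_chain = shapley_value("X_j", players, chain_oracle)
siv_irrelevant = shapley_value("X_l", players, collider_oracle)

print(siv_collider, siv_chain, siv_irrelevant)
```

Under these assumptions, conditioning on a collider destroys independence, so its marginal contribution is negative in every coalition, while a mediator in a chain contributes positively and the irrelevant variable contributes nothing. Averaging over all conditioning sets is what lets the rule pool evidence from several tests rather than rely on a single one.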

Figures (10)

Figure 1: Mean and standard deviation ArrowHead F1 score for four datasets generated from pseudo-real Bayesian Networks from the bnlearn repository.
Figure 2: ArrowHead F1 scores by proportional sample size ($s \in \{100, 500, 1000\}$) for the fully synthetic data in §\ref{sec:experiments}.
Figure 3: V-structure F1 scores by proportional sample size ($s \in \{100, 500, 1000\}$) for the fully synthetic data in §\ref{sec:experiments}.
Figure 4: V-structure F1 scores for the datasets in Fig. \ref{fig:pseudoreal} in the main text.
Figure 5: Structural Hamming Distance (SHD; lower is better) for the datasets in Fig. \ref{fig:pseudoreal} in the main text.
...and 5 more figures

Theorems & Definitions (6)

Example 1
Definition 1
Lemma 2
Theorem 3
Lemma 2
Theorem 3
