Scalable Contrastive Causal Discovery under Unknown Soft Interventions

Mingxuan Zhang; Khushi Desai; Sopho Kevlishvili; Elham Azizi

TL;DR

This paper proposes a scalable causal discovery model for paired observational and interventional data that share an underlying causal structure and differ by unknown soft interventions. The model improves causal structure recovery, generalizes to unseen graphs with held-out causal mechanisms, and scales to larger graphs.

Abstract

Observational causal discovery is only identifiable up to the Markov equivalence class. While interventions can reduce this ambiguity, in practice interventions are often soft with multiple unknown targets. In many realistic scenarios, only a single intervention regime is observed. We propose a scalable causal discovery model for paired observational and interventional settings with shared underlying causal structure and unknown soft interventions. The model aggregates subset-level PDAGs and applies contrastive cross-regime orientation rules to construct a globally consistent maximal PDAG under Meek closure, enabling generalization to both in-distribution and out-of-distribution settings. Theoretically, we prove that our model is sound with respect to a restricted $\Psi$-equivalence class induced solely by the information available in the subset-restricted setting. We further show that the model asymptotically recovers the corresponding identifiable PDAG and can orient additional edges compared to non-contrastive subset-restricted methods. Experiments on synthetic data demonstrate improved causal structure recovery, generalization to unseen graphs with held-out causal mechanisms, and scalability to larger graphs, with ablations supporting the theoretical results.
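
To make "Meek closure" concrete, the following is a minimal sketch that applies only the first two Meek orientation rules to a PDAG until nothing changes. It is not the paper's implementation: the function name `meek_r1_r2`, the edge-set representation, and the restriction to rules R1 and R2 are assumptions made for illustration, and a full closure would also apply the remaining Meek rules.

```python
def meek_r1_r2(directed, undirected):
    """Apply Meek rules R1 and R2 to a fixed point (illustrative subset only).

    directed:   iterable of (a, b) pairs meaning a -> b
    undirected: iterable of (a, b) pairs meaning a - b
    Returns (directed_set, undirected_set_of_frozensets).
    """
    directed = set(directed)
    undirected = {frozenset(e) for e in undirected}

    def adjacent(x, y):
        return (
            (x, y) in directed
            or (y, x) in directed
            or frozenset((x, y)) in undirected
        )

    changed = True
    while changed:
        changed = False
        for edge in list(undirected):
            b, c = tuple(edge)
            for x, y in ((b, c), (c, b)):
                # R1: s -> x, x - y, and s, y nonadjacent  =>  x -> y
                r1 = any(
                    t == x and s != y and not adjacent(s, y)
                    for (s, t) in directed
                )
                # R2: x -> m -> y and x - y  =>  x -> y
                r2 = any(
                    s == x and (t, y) in directed
                    for (s, t) in directed
                )
                if r1 or r2:
                    undirected.discard(edge)
                    directed.add((x, y))
                    changed = True
                    break
    return directed, undirected


# A chain 0 -> 1 - 2 - 3: R1 propagates the orientation down the chain.
chain_directed, chain_undirected = meek_r1_r2({(0, 1)}, {(1, 2), (2, 3)})

# A triangle 0 -> 1 -> 2 with 0 - 2: R2 orients 0 -> 2 to avoid a cycle.
tri_directed, tri_undirected = meek_r1_r2({(0, 1), (1, 2)}, {(0, 2)})
```

In this sketch an undirected edge is oriented only when a rule forces its direction, so edges whose direction is not implied stay undirected, which is the sense in which the closure yields a maximally oriented PDAG.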

Paper Structure (107 sections, 17 theorems, 153 equations, 4 figures, 8 tables)

Introduction
Related Work
Classic Causal Discovery
Differentiable Causal Discovery
Generalizable and scalable architectures
$\Psi$-Markov equivalence and Environment Invariance Models
Theoretical Results
Defining Restricted Equivalence Class and I-EG
Restricted $\Psi$-Sound Orientation Rules
Single Sided Invariance
Contrastive V-structure
Contrastive Discriminating Path
Global Restricted $\Psi$ I-EG Estimation
Model Architecture
Constructing marginal causal structures
...and 92 more sections

Key Result

Proposition 1

Let $S\in\mathcal{S}_{\mathrm{a}}$ with $\{i,j\}\subseteq S$ such that $i-j$ is an undirected adjacency in both population local PDAGs $E_S^{(0)}$ and $E_S^{(1)}$ (Definition 1). Let $Z:=S\setminus\{i,j\}$ such that $(j, Z), (i, Z) \in \mathcal{T}_{\mathrm{a}}$ and $Z \in \mathcal{W}(j)$, $Z$ ... then $i\to j$ is invariant across the restricted $\Psi$-equivalence class and thus is oriented iden...

Figures (4)

Figure 1: F1 (harmonic mean of precision and recall) and SHD (structural Hamming distance) for state-of-the-art methods that perform only in-distribution graph prediction, on graphs with 20 nodes and 20 edges. 10 graphs were generated using the polynomial mechanism. The horizontal red dashed line indicates the mean across the 10 graphs of predictions from the classic causal discovery baseline.
Figure 2: F1 and SHD for state-of-the-art methods compared to SCONE on predictions over 10 graphs with 20 nodes and 20 edges, 20 nodes and 30 edges, and 20 graphs with 50 nodes and 50 edges. SEA-FCI obtains an F1 score of 0.0 in the 20-node, 20-edge setting and is represented by a circle at the bottom of the axis.
Figure 3: Precision, recall, F1 and SHD for state-of-the-art methods that perform only in-distribution graph prediction, on graphs with 20 nodes and 20 edges. 10 graphs were generated using the polynomial mechanism. The horizontal red dashed line indicates the mean across the 10 graphs of predictions from the Polynomial-BIC baseline alone.
Figure 4: Precision, recall, F1 and SHD for state-of-the-art methods compared to SCONE on predictions over 10 graphs with 20 nodes and 20 edges, 20 nodes and 30 edges, and 20 graphs with 50 nodes and 50 edges. SEA-FCI obtains a precision, recall and F1 score of 0.0 in the 20-node, 20-edge setting and is represented by a circle at the bottom of the axis.
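
The two metrics reported in these figures can be sketched as follows. This is an illustrative computation, not the paper's evaluation code: the function names `edge_f1` and `shd`, the representation of a graph as a set of directed `(parent, child)` pairs without self-loops, and the convention that a reversed edge counts as a single SHD error are all assumptions (SHD conventions differ between papers).

```python
def edge_f1(true_edges, pred_edges):
    """F1 over directed edges: harmonic mean of precision and recall."""
    true_edges, pred_edges = set(true_edges), set(pred_edges)
    tp = len(true_edges & pred_edges)
    precision = tp / len(pred_edges) if pred_edges else 0.0
    recall = tp / len(true_edges) if true_edges else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def shd(true_edges, pred_edges):
    """Structural Hamming distance over node pairs.

    Each unordered pair contributes 1 if its edge status differs between
    the two graphs (missing, extra, or reversed edge). Assumes no self-loops.
    """
    true_edges, pred_edges = set(true_edges), set(pred_edges)
    pairs = {frozenset(e) for e in true_edges | pred_edges}
    distance = 0
    for pair in pairs:
        a, b = tuple(pair)
        both_directions = {(a, b), (b, a)}
        if both_directions & true_edges != both_directions & pred_edges:
            distance += 1
    return distance


# Toy example: one correct edge, one reversed edge, one missing, one extra.
true_graph = {(0, 1), (1, 2), (2, 3)}
pred_graph = {(0, 1), (2, 1), (0, 3)}
f1_score = edge_f1(true_graph, pred_graph)
shd_score = shd(true_graph, pred_graph)
```

In the toy example, precision and recall are both 1/3, so F1 is 1/3, and the reversed, missing, and extra edges each contribute one unit to the SHD. Higher F1 and lower SHD indicate better structure recovery.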

Theorems & Definitions (43)

Proposition 1
Proposition 2
Theorem 1: Contrastive aggregation guarantee and $G_{\mathrm{test}}$-soundness
Theorem 2: Consistency for the test-induced restricted $\Psi$ essential graph
Definition 1: Local PDAG
Definition 2: Restricted $\Psi$-Markov and Constraint Family
Definition 3: Restricted $\Psi$-equivalence
Definition 4: Test-induced $\Psi$ essential graph
Definition 5: Admissible witness family
Lemma 1: Two-regime conditional invariance as $C$-conditional independence
...and 33 more
