Adjustment Identification Distance: A gadjid for Causal Structure Learning

Leonard Henckel, Theo Würtzen, Sebastian Weichwald

TL;DR

This paper introduces a framework for identification distances between causal graphs, implemented in the gadjid package, that quantifies how a learned graph affects causal effect identification rather than merely counting edge differences. By pairing sound and complete identification strategies with verifiers, the authors define distances (e.g., Parent-AID, Ancestor-AID, Oset-AID) that include the structural intervention distance (SID) as a special case and extend it to CPDAGs and causal orders, with polynomial-time algorithms implemented in Rust. The resulting metrics are interpretable and task-aligned, and they are orders of magnitude faster to compute than the SID, which makes benchmarking of causal discovery methods feasible at graph sizes that were previously prohibitive. The framework also provides CPDAG-specific and cross-type (DAG/CPDAG/order) distances.

Abstract

Evaluating graphs learned by causal discovery algorithms is difficult: The number of edges that differ between two graphs does not reflect how the graphs differ with respect to the identifying formulas they suggest for causal effects. We introduce a framework for developing causal distances between graphs which includes the structural intervention distance for directed acyclic graphs as a special case. We use this framework to develop improved adjustment-based distances as well as extensions to completed partially directed acyclic graphs and causal orders. We develop new reachability algorithms to compute the distances efficiently and to prove their low polynomial time complexity. In our package gadjid (open source at https://github.com/CausalDisco/gadjid), we provide implementations of our distances; they are orders of magnitude faster with proven lower time complexity than the structural intervention distance and thereby provide a success metric for causal discovery that scales to graph sizes that were previously prohibitive.

Paper Structure

This paper contains 51 sections, 15 theorems, 14 equations, 8 figures, 5 tables, 3 algorithms.

Key Result

Proposition 7

Let $\mathcal{I}$ be a sound and complete identification strategy for DAGs. For any DAG $\mathcal{G}_\text{true}$, it holds that if $\mathcal{G}_\text{guess}$ is a super-DAG of $\mathcal{G}_\text{true}$, then $d^{\mathcal{I}}(\mathcal{G}_\text{true},\mathcal{G}_\text{guess})=0$.
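Proposition 7 can be checked by hand on a three-node example. The sketch below is a brute-force illustration, not gadjid's reachability algorithm and not its API: all function names are hypothetical, and it assumes the parent adjustment strategy works as in the SID (claim no effect if $Y$ is a parent of $T$ in the guess, otherwise adjust for the guessed parents of $T$), verified against the true DAG with the standard adjustment criterion and a moralization-based d-separation test.

```python
from itertools import permutations

# Hypothetical helper names; DAGs are dicts {node: set_of_children}.

def parents_of(children):
    """Invert a {node: children} DAG into {node: parents}."""
    pa = {v: set() for v in children}
    for u, cs in children.items():
        for c in cs:
            pa[c].add(u)
    return pa

def descendants(children, v):
    """Nodes reachable from v along directed edges, including v itself."""
    seen, stack = {v}, [v]
    while stack:
        for w in children[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def d_separated(children, t, y, z):
    """Test t _||_ y | z by moralizing the ancestral graph of {t, y} and z."""
    pa = parents_of(children)
    anc, stack = set(), [t, y, *z]
    while stack:
        u = stack.pop()
        if u not in anc:
            anc.add(u)
            stack.extend(pa[u])
    nbrs = {v: set() for v in anc}
    for v in anc:
        for p in pa[v]:
            nbrs[v].add(p)
            nbrs[p].add(v)
            nbrs[p] |= pa[v] - {p}  # connect parents that share the child v
    seen, stack = {t}, [t]
    while stack:
        for w in nbrs[stack.pop()]:
            if w not in seen and w not in z:
                seen.add(w)
                stack.append(w)
    return y not in seen

def valid_adjustment(true, t, y, z):
    """Adjustment criterion for the pair (t, y) and the set z in the DAG `true`."""
    # Nodes other than t that lie on a directed path from t to y.
    causal = {v for v in descendants(true, t) - {t} if y in descendants(true, v)}
    forbidden = set().union(*(descendants(true, v) for v in causal)) | {t}
    if z & forbidden or y in z:
        return False
    # Proper back-door graph: drop the first edge of every causal path out of t.
    backdoor = {u: (cs - causal if u == t else set(cs)) for u, cs in true.items()}
    return d_separated(backdoor, t, y, z)

def parent_aid(true, guess):
    """Count ordered pairs (t, y) that the guess would identify incorrectly."""
    pa_guess = parents_of(guess)
    errors = 0
    for t, y in permutations(true, 2):
        if y in pa_guess[t]:  # guess claims that t has no effect on y
            ok = y not in descendants(true, t)
        else:                 # guess adjusts for the guessed parents of t
            ok = valid_adjustment(true, t, y, pa_guess[t])
        errors += not ok
    return errors

def shd(g1, g2):
    """Structural Hamming distance: node pairs whose edge status differs."""
    nodes = sorted(g1)
    return sum(
        (v in g1[u], u in g1[v]) != (v in g2[u], u in g2[v])
        for i, u in enumerate(nodes) for v in nodes[i + 1:]
    )

chain = {"a": {"b"}, "b": {"c"}, "c": set()}           # a -> b -> c
super_dag = {"a": {"b", "c"}, "b": {"c"}, "c": set()}  # chain plus a -> c

print(parent_aid(chain, super_dag))  # 0: a super-DAG guess makes no mistakes
print(parent_aid(super_dag, chain))  # 1: the pair (c, a) is adjusted incorrectly
print(shd(chain, super_dag))         # 1: the SHD counts the extra edge either way
```

With the chain as the true graph, the super-DAG guess is at distance zero although the SHD is one, which is the contrast that Corollary 8 formalizes. Swapping the roles of the two graphs gives a nonzero distance, so the distance is not symmetric.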

Figures (8)

  • Figure 1: A CPDAG (left) and the two DAGs in the corresponding Markov equivalence class (right).
  • Figure 2: Fully connected and chain DAG in the example on causal order.
  • Figure 3: Empirical results on the algorithmic time complexity of calculating the Ancestor-AID $d^{\mathcal{I}_A}$ and the Oset-AID $d^{\mathcal{I}_O}$ between random sparse and dense graphs. We project the runtime under the different time complexities based on the smallest graphs in each panel and visualize the projected runtime as a fraction of the observed empirical runtime; if the relative projected runtime increases/decreases with increasing number of nodes, the considered time complexity suggests a faster/slower increase of runtime than empirically observed. The empirical analysis suggests that our implementation of the Ancestor-AID achieves the time complexity of $O(p^2)$ for sparse and $O(p^3)$ for dense graphs, and that the implementation of the Oset-AID achieves the time complexity of $O(p^3)$ for sparse and $O(p^4)$ for dense graphs. See the appendix on the complexity experiment for details.
  • Figure 4: DAGs for the example on zero optimal distance.
  • Figure 5: Scatter plot for the case that $\mathcal{G}_\text{true}$ is a random $30$-node dense graph and $\mathcal{G}_\text{guess}$ is $\mathcal{G}_\text{true}$ with one edge removed.
  • ...and 3 more figures

Theorems & Definitions (39)

  • Definition 1: Identification Strategy
  • Example 2: Parent Adjustment Strategy
  • Definition 3: Verifier
  • Example 4: Adjustment-Verifier for DAGs
  • Definition 5: $\mathcal{I}$-Specific Identification Distance
  • Example 6: SID is the $\mathcal{I}_{P}$-Specific Distance for DAGs
  • Proposition 7: Distance to Super-DAG is Zero
  • Corollary 8: Identification Distances Differ from the SHD
  • Lemma 9: Ancestors are Valid Adjustment Sets
  • Lemma 10: Parent-AID Misrepresents Causal Order
  • ...and 29 more