Graph Distance Based on Cause-Effect Estimands with Latents

Zhufeng Li, Niki Kilbertus

TL;DR

The paper tackles the evaluation of causal graphs under latent confounding by introducing the Fixing Identification Distance (FID), an estimand-based distance for acyclic directed mixed graphs (ADMGs). FID combines a fixing-based identification strategy with a canonicalization step and a symbolic verifier to compare the cause-effect estimands, such as $p(y \mid do(t))$, that two graphs imply across a set of treatment-outcome pairs. It defines directional, normalized, and symmetrized variants, enabling assessment of causal discovery methods and of the robustness of downstream causal conclusions. Empirical results on simulated ADMGs show that FID behaves coherently under perturbations and provides information complementary to traditional distances such as SHD and SD. The approach is also extended to CPDAGs via a range approach.
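For contrast with FID, the structural Hamming distance (SHD) only looks at edges. The following is a minimal sketch of one common SHD variant for mixed graphs, under the assumption that it counts every unordered node pair whose set of edges (directed either way, bidirected) differs between the two graphs. Conventions vary (for example in how reversals or bows are weighted), so the SHD used in the paper's experiments may differ; the helper names and the example graphs are hypothetical.

```python
from itertools import combinations

def pair_state(a, b, directed, bidirected):
    """Edge configuration between a and b, oriented relative to the pair (a, b)."""
    state = set()
    if (a, b) in directed:
        state.add("a->b")
    if (b, a) in directed:
        state.add("b->a")
    if frozenset((a, b)) in bidirected:
        state.add("a<->b")
    return frozenset(state)

def shd_admg(nodes, g, h):
    """Count unordered node pairs whose edge configuration differs between g and h.

    g and h are (directed, bidirected) tuples: `directed` holds (u, v) pairs
    meaning u -> v, and `bidirected` holds frozensets {u, v} meaning u <-> v.
    """
    return sum(
        1
        for a, b in combinations(sorted(nodes), 2)
        if pair_state(a, b, *g) != pair_state(a, b, *h)
    )

# Hypothetical example: H reverses X2 -> X3 and drops X1 <-> X3.
nodes = {"X1", "X2", "X3"}
G = ({("X1", "X2"), ("X2", "X3")}, {frozenset({"X1", "X3"})})
H = ({("X1", "X2"), ("X3", "X2")}, set())
print(shd_admg(nodes, G, H))  # 2
```

Both edits count once each, regardless of whether they change any cause-effect estimand, which is the gap an estimand-based distance such as FID is meant to address.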

Abstract

Causal discovery aims to recover graphs that represent causal relations among given variables from observations, and new methods are constantly being proposed. Increasingly, the community raises questions about how much progress is made, because properly evaluating discovered graphs remains notoriously difficult, particularly under latent confounding. We propose a graph distance measure for acyclic directed mixed graphs (ADMGs) based on the downstream task of cause-effect estimation under unobserved confounding. Our approach uses identification via fixing and a symbolic verifier to quantify how graph differences distort cause-effect estimands for different treatment-outcome pairs. We analyze the behavior of the measure under different graph perturbations and compare it against existing distance metrics.

Paper Structure

This paper contains 31 sections, 1 theorem, 16 equations, 6 figures, 6 tables, 3 algorithms.

Key Result

Theorem 1

For an observational distribution $p(\mathbf{x}_{\mathbf{V}})$ from a DAG model with latents $G(\mathbf{V} \cup \mathbf{L})$ and its latent projection $G(\mathbf{V})$, and distinct nodes $T, Y \in \mathbf{V}$, let $\mathbf{Y}^* = \mathrm{an}^{G(\mathbf{V})_{\mathbf{V} \setminus \{T\}}}_{Y} \cup \{Y\}$. If not (i.e., there exists $\mathbf{D} \in \mathcal{D}(G(\mathbf{V})_{\mathbf{Y}^*})$ that is not
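The two graph objects that Theorem 1 works with, $\mathbf{Y}^*$ and the districts $\mathcal{D}(G(\mathbf{V})_{\mathbf{Y}^*})$, can be computed directly. Below is a minimal sketch, assuming an ADMG is stored as a set of directed pairs `(u, v)` for u -> v and a set of frozensets `{u, v}` for u <-> v. The function names and the example graph are hypothetical. The identification condition itself is cut off in this summary, so the sketch stops at $\mathbf{Y}^*$ and its districts.

```python
def ancestors(y, nodes, directed):
    """an(y) in the subgraph induced by `nodes`, with y itself included."""
    anc, stack = {y}, [y]
    while stack:
        v = stack.pop()
        for u, w in directed:
            if w == v and u in nodes and u not in anc:
                anc.add(u)
                stack.append(u)
    return anc

def districts(nodes, bidirected):
    """Bidirected-connected components of the subgraph induced by `nodes`."""
    remaining, result = set(nodes), []
    while remaining:
        start = remaining.pop()
        comp, stack = {start}, [start]
        while stack:
            v = stack.pop()
            for e in bidirected:
                # Only bidirected edges with both endpoints inside `nodes` count.
                if v in e and e <= nodes:
                    (w,) = e - {v}
                    if w not in comp:
                        comp.add(w)
                        remaining.discard(w)
                        stack.append(w)
        result.append(comp)
    return result

def y_star(V, directed, T, Y):
    """Y* = ancestors of Y in G(V) restricted to V minus {T}, plus Y itself."""
    return ancestors(Y, V - {T}, directed) | {Y}

# Hypothetical ADMG: T -> M -> Y, W -> Y, M <-> W, T <-> Y.
V = {"T", "M", "Y", "W"}
D = {("T", "M"), ("M", "Y"), ("W", "Y")}
B = {frozenset({"M", "W"}), frozenset({"T", "Y"})}
ys = y_star(V, D, "T", "Y")
print(sorted(ys))  # ['M', 'W', 'Y']
print(sorted(sorted(d) for d in districts(ys, B)))  # [['M', 'W'], ['Y']]
```

Removing $T$ before taking ancestors drops the edge T -> M from consideration, and restricting to $\mathbf{Y}^*$ drops T <-> Y, which is why $Y$ ends up in its own district.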

Figures (6)

  • Figure 1: Example graphs: (a) A DAG without latents. (b) A DAG with latent variable $X_4$. (c) An ADMG, the latent projection of (b). (d) The CADMG resulting from fixing node $X_3$ in (c).
  • Figure 2: Left: Given graphs $G$ and $H$ and a fixed pair of nodes $(x_1, x_2)$, identification via fixing may branch according to different valid fixing sequences and lead to different symbolic expressions for $p(x_2 \mid do(x_1))$ (e.g., expr 1, …, expr n), and the number of expressions can differ between the two graphs. The red shaded nodes indicate non-fixable variables, so not every node sequence is a valid fixing sequence. Right: For a set of tuples of distinct nodes, we thus obtain a list of symbolic expressions for the corresponding causal effect with respect to each of the two graphs. Intuitively, our FID then measures the aggregated (averaged) overlap between the expressions for each tuple across the two graphs.
  • Figure 3: Boxplots show $\bar{d}_{\mathcal{I}}(G,H;\mathbf{S})$ (blue) and $\bar{d}_{\mathcal{I}}(H,G;\mathbf{S})$ (teal) under different aggregations as we vary the bidirected edge counts (left), total number of edits (middle), and directed edge probability (right).
  • Figure 4: FID ($\bar{d}^{\mathrm{sym}}_{\mathcal{I}}$) distributions for the different edit types.
  • Figure 5: Comparison of our FID ($\bar{d}^{\mathrm{sym}}_{\mathcal{I}}$) with SHD and ZL-SD on MAGs. FID is positively correlated with both but still captures information that the existing metrics do not.
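Figures 1(d) and 2 rely on fixing a node in a conditional ADMG (CADMG) and on some nodes being non-fixable. The sketch below covers only the graph side. It assumes the standard rule from the nested Markov literature cited for Theorem 1 (Richardson2023nested): a random vertex $v$ is fixable when $\mathrm{dis}(v) \cap \mathrm{de}(v) = \{v\}$, and fixing it moves $v$ to the fixed set and deletes every edge with an arrowhead into $v$. This summary does not state these rules, so treat them as background assumptions. The accompanying kernel update is not modeled, and the front-door graph is a hypothetical example rather than one of the paper's figures.

```python
from typing import NamedTuple

class CADMG(NamedTuple):
    random: frozenset      # random vertices V
    fixed: frozenset       # fixed vertices W (no arrowheads point into them)
    directed: frozenset    # pairs (u, v) meaning u -> v
    bidirected: frozenset  # frozensets {u, v} meaning u <-> v

def descendants(v, g):
    """de(v): v plus every vertex reachable from v along directed edges."""
    de, stack = {v}, [v]
    while stack:
        u = stack.pop()
        for a, b in g.directed:
            if a == u and b not in de:
                de.add(b)
                stack.append(b)
    return de

def district(v, g):
    """dis(v): random vertices joined to v by bidirected paths (v included)."""
    dis, stack = {v}, [v]
    while stack:
        u = stack.pop()
        for e in g.bidirected:
            if u in e:
                (w,) = e - {u}
                if w in g.random and w not in dis:
                    dis.add(w)
                    stack.append(w)
    return dis

def is_fixable(v, g):
    """v is fixable iff it is random and dis(v) & de(v) == {v}."""
    return v in g.random and (district(v, g) & descendants(v, g)) == {v}

def fix(v, g):
    """Graph part of fixing v: mark v fixed and drop all arrowheads into v."""
    if not is_fixable(v, g):
        raise ValueError(f"{v} is not fixable")
    return CADMG(
        random=g.random - {v},
        fixed=g.fixed | {v},
        directed=frozenset((a, b) for a, b in g.directed if b != v),
        bidirected=frozenset(e for e in g.bidirected if v not in e),
    )

# Hypothetical front-door graph: T -> M -> Y, T <-> Y.
G = CADMG(
    random=frozenset({"T", "M", "Y"}),
    fixed=frozenset(),
    directed=frozenset({("T", "M"), ("M", "Y")}),
    bidirected=frozenset({frozenset({"T", "Y"})}),
)
print(sorted(v for v in G.random if is_fixable(v, G)))  # ['M', 'Y']
H = fix("M", G)
print(is_fixable("T", H))  # True: T -> M is gone, so de(T) = {T}
```

$T$ starts out non-fixable but becomes fixable after $M$ is fixed, so which node sequences are valid fixing sequences depends on the order. This is the source of the branching shown in Figure 2.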

Theorems & Definitions (4)

  • Theorem 1: Richardson2023nested
  • Definition 1: Fixing Identification $\mathcal{I}$
  • Definition 2: Symbolic Verifier $\mathcal{V}$
  • Definition 3: Fixing Identification Distance (FID)
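Definitions 2 and 3 are only named in this summary, so the following is a hypothetical sketch of an FID-style computation, not the paper's definition. It rests on several assumptions. Each identified estimand is encoded as a set of summed-out variables plus a tuple of factor strings. Equality of a sorted canonical form stands in for the symbolic verifier $\mathcal{V}$; a real verifier would also catch equivalences beyond reordering. A pair's directional score is the fraction of one graph's expressions with no verified match in the other, averaged over the pairs in $\mathbf{S}$. `None` marks an unidentified pair. The normalized variant is not reproduced, and all names are illustrative.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical encoding: (variables summed out, factors multiplied together).
Expr = Tuple[FrozenSet[str], Tuple[str, ...]]
# (treatment, outcome) -> expressions from all valid fixing sequences, or None.
Estimands = Dict[Tuple[str, str], Optional[List[Expr]]]

def canonical(expr: Expr) -> Tuple[Tuple[str, ...], Tuple[str, ...]]:
    """Order-independent form: sorted summation variables and sorted factors."""
    summed, factors = expr
    return tuple(sorted(summed)), tuple(sorted(factors))

def verify(e1: Expr, e2: Expr) -> bool:
    """Stand-in verifier: equal canonical forms count as the same estimand."""
    return canonical(e1) == canonical(e2)

def pair_distance(exprs_g: Optional[List[Expr]], exprs_h: Optional[List[Expr]]) -> float:
    """Fraction of G's expressions without a match in H (assumed convention)."""
    if exprs_g is None or exprs_h is None:
        # Assumption: agreeing on non-identifiability costs 0, disagreeing costs 1.
        return 0.0 if exprs_g is None and exprs_h is None else 1.0
    unmatched = sum(1 for e in exprs_g if not any(verify(e, f) for f in exprs_h))
    return unmatched / len(exprs_g)  # identified pairs carry >= 1 expression

def directional_fid(est_g: Estimands, est_h: Estimands) -> float:
    """Average per-pair distance over the treatment-outcome pairs of est_g."""
    pairs = list(est_g)
    return sum(pair_distance(est_g[p], est_h[p]) for p in pairs) / len(pairs)

def symmetric_fid(est_g: Estimands, est_h: Estimands) -> float:
    """Mean of the two directional scores."""
    return 0.5 * (directional_fid(est_g, est_h) + directional_fid(est_h, est_g))

# Hypothetical toy estimands: G has X -> T, X -> Y, T -> Y; H lacks X -> Y.
est_g: Estimands = {
    ("X", "T"): [(frozenset(), ("p(t|x)",))],
    ("X", "Y"): [(frozenset(), ("p(y|x)",))],
    ("T", "Y"): [(frozenset({"x"}), ("p(y|t,x)", "p(x)"))],  # back-door adjustment
}
est_h: Estimands = {
    ("X", "T"): [(frozenset(), ("p(t|x)",))],
    ("X", "Y"): [(frozenset(), ("p(y|x)",))],
    ("T", "Y"): [(frozenset(), ("p(y|t)",))],
}
print(round(symmetric_fid(est_g, est_h), 3))  # 0.333
```

Only the $(T, Y)$ estimand changes (from a back-door adjustment to $p(y \mid t)$), so one of the three pairs contributes a distance of 1 in each direction.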