Characterization and Learning of Causal Graphs with Latent Confounders and Post-treatment Selection from Interventional Data

Gongxu Luo, Loka Li, Guangyi Chen, Haoyue Dai, Kun Zhang

TL;DR

The paper tackles post-treatment selection as a core obstacle in interventional causal discovery with latent confounders. It extends the DAG framework with latent variables and a selection mechanism, introducing Augmented DAGs and a fine-grained FI-Markov equivalence concept, represented compactly by the F-PAG. A provably sound and complete algorithm, F-FCI, then identifies causal relations, latent confounders, and post-treatment selection from observational and interventional data. Empirical results on synthetic and real-world data show that the method recovers true causal structure where traditional approaches fail, including distinguishing selection effects from genuine causation. This work broadens the applicability of interventional causal discovery to settings with post-intervention quality-control constraints common in biological studies and clinical research.

Abstract

Interventional causal discovery seeks to identify causal relations by leveraging distributional changes introduced by interventions, even in the presence of latent confounders. Beyond the spurious dependencies induced by latent confounders, we highlight a common yet often overlooked challenge: post-treatment selection, in which samples are selectively included in datasets after interventions. This challenge is widespread in biological studies; for example, in gene expression analysis, both observational and interventional samples are retained only if they meet quality control criteria (e.g., highly active cells). Neglecting post-treatment selection may introduce spurious dependencies and distributional changes under interventions, which can mimic causal responses, thereby distorting causal discovery results and challenging existing causal formulations. To address this, we introduce a novel causal formulation that explicitly models post-treatment selection and reveals how its differential reactions to interventions can distinguish causal relations from selection patterns, allowing us to go beyond traditional equivalence classes toward the underlying true causal structure. We then characterize its Markov properties and propose a Fine-grained Interventional equivalence class, named $\mathcal{FI}$-Markov equivalence, represented by a new graphical diagram, the $\mathcal{F}$-PAG. Finally, we develop a provably sound and complete algorithm, F-FCI, to identify causal relations, latent confounders, and post-treatment selection up to $\mathcal{FI}$-Markov equivalence, using both observational and interventional data. Experimental results on synthetic and real-world datasets demonstrate that our method recovers causal relations despite the presence of both selection and latent confounders.
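
The claim that post-treatment selection can mimic a causal response can be illustrated with a small simulation. The sketch below is not taken from the paper: the Gaussian variables, the threshold rule `x1 + x2 > threshold`, the shift of 2.0 used as an "intervention" on $X_1$, and the function name `simulate` are all assumptions chosen for illustration. $X_1$ and $X_2$ are generated independently, so there is no causal link between them, yet after selection they appear dependent, and shifting $X_1$ changes the distribution of $X_2$ among the retained samples.

```python
import numpy as np

def simulate(n, shift_x1, rng, threshold=1.0):
    """Draw independent X1 and X2, then keep only samples passing a selection filter.

    Hypothetical setup for illustration only: shift_x1 != 0 plays the role of an
    intervention on X1, and the filter plays the role of quality control.
    """
    x1 = rng.normal(loc=shift_x1, scale=1.0, size=n)  # intervened (or observational) X1
    x2 = rng.normal(loc=0.0, scale=1.0, size=n)       # X2 is generated without any input from X1
    keep = (x1 + x2) > threshold                      # selection S depends on both variables
    return x1[keep], x2[keep]

rng = np.random.default_rng(0)
obs_x1, obs_x2 = simulate(200_000, 0.0, rng)  # observational regime, after selection
int_x1, int_x2 = simulate(200_000, 2.0, rng)  # interventional regime, after selection

# Spurious dependence: correlation between X1 and X2 among retained observational samples.
corr_selected = np.corrcoef(obs_x1, obs_x2)[0, 1]

# Spurious "causal response": change in the mean of X2 after intervening on X1.
mean_shift_x2 = int_x2.mean() - obs_x2.mean()

print(f"correlation after selection: {corr_selected:.3f}")
print(f"shift in mean of X2 under intervention on X1: {mean_shift_x2:.3f}")
```

Both quantities come out clearly negative in this setup, even though neither would differ from zero (up to sampling noise) without the selection step. A method that ignores selection could read the second quantity as evidence that $X_1$ causes $X_2$.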

Paper Structure

This paper contains 29 sections, 13 theorems, 1 equation, 13 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

For positive interventional distributions $p^{(k)}(X)$ and observational distribution $p^{(0)}(X)$ generated from the DAG $\mathcal{G}$ in the presence of latent confounders $L$ and selection $S$ with intervention targets $\{I^{(k)}\}_{k\in \{0\} \cup [K]}$, let $\{Aug_{I_K}(\mathcal{G})\}_{k\in \{0

Figures (13)

  • Figure 1: Motivating examples. (a) and (b) exhibit the same dependence, with tails from $X_1$ and arrowheads into $X_2$, regardless of direct causation; (c) and (d) exhibit the same dependence, with tails on both $X_1$ and $X_2$, regardless of direct selection. Existing methods cannot distinguish these cases, whereas ours can.
  • Figure 2: Examples of graphical representations. (a) Augmented DAG with explicit intervention indicators ($\psi$). (b) Extension of the augmented DAG to include latent confounders. (c) Modeling post-treatment selection using the augmented DAG, with toy examples of selection on observational data (d) and selection after intervention (e), where the positively invariant $p(X_2 \mid X_1)$ is marked in red.
  • Figure 3: Illustration of a structural causal model (SCM) represented by an augmented DAG.
  • Figure 4: Characterizing Markov properties with CI patterns (i) of augmented DAGs (a)-(h). Red dashed lines indicate that CI patterns persist regardless of whether $X_1 \textcolor{red}{\dashrightarrow} X_2$ (b) or $X_1 \textcolor{red}{\dashrightarrow} S \textcolor{red}{\dashleftarrow} X_2$ (f), whereas $\psi_3 \not\!\perp\!\!\!\perp X_2$ provides evidence of the presence of a direct causal link or selection.
  • Figure 5: Illustrations of $\mathcal{F}$-PAG graphical representation for the $\mathcal{FI}$-Markov equivalence class.
  • ...and 8 more figures

Theorems & Definitions (30)

  • Definition 1: Augmented DAG
  • Theorem 1: CI and invariance implementation
  • Lemma 1: Additional dependencies induced by selections
  • Definition 2: $\mathcal{FI}$-Markov equivalence
  • Definition 3: Inducing path
  • Lemma 2: When are two variables dependent in observational data?
  • Lemma 3: When does intervention always alter marginal distribution?
  • Lemma 4: When does intervention always alter conditional distribution?
  • Theorem 2: Graphical criteria for $\mathcal{FI}$-Markov equivalence
  • Definition 4: Partial ancestral graph
  • ...and 20 more