Causal Discovery under Off-Target Interventions

Davin Choo, Kirankumar Shiragur, Caroline Uhler

TL;DR

The paper studies causal graph discovery under stochastic off-target interventions: each attempted action $A_i$ intervenes on a random subset of vertices drawn from a distribution $\mathcal{D}_i$, and the goal is to minimize the number of interventions performed. For verification, it proves a formal equivalence between off-target verification and stochastic set cover. This yields a polynomial-time adaptive policy with expected cost $O(\overline{\nu}(G^*) \log n)$, and it implies that approximating better than a $\log n$ factor is NP-hard. For search, it shows a hardness barrier and provides a polylogarithmic-approximation algorithm against the benchmark $\overline{\nu}^{\max}(G^*)$ with total cost $O(\overline{\nu}^{\max}(G^*) \log^4 n)$. The OffTargetSearch algorithm uses 1/2-clique separators and recursive partitioning to orient edges within the MEC, and it runs in polynomial time while competing against the max benchmark. Empirical results on synthetic and real graphs corroborate the theory, demonstrating competitive performance under various off-target distributions. The work lays a theoretical foundation for causal discovery with off-target interventions and outlines avenues for extending the guarantees to unknown distributions and finite-sample settings.

Abstract

Causal graph discovery is a significant problem with applications across various disciplines. However, with observational data alone, the underlying causal graph can only be recovered up to its Markov equivalence class, and further assumptions or interventions are necessary to narrow down the true graph. This work addresses the causal discovery problem in the setting of stochastic interventions, with the natural goal of minimizing the number of interventions performed. We propose the following stochastic intervention model, which subsumes existing adaptive noiseless interventions in the literature while capturing scenarios such as fat-hand interventions and CRISPR gene knockouts: any intervention attempt results in an actual intervention on a random subset of vertices, drawn from a distribution that depends on the attempted action. Under this model, we study the two fundamental problems in causal discovery, verification and search, provide approximation algorithms with polylogarithmic competitive ratios, and report some preliminary experimental results.
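To make the intervention model concrete, the following is a minimal sketch and not the paper's algorithm. It assumes each distribution $\mathcal{D}_i$ is known and given explicitly as a list of (subset, probability) pairs, and it uses a plain adaptive greedy rule for stochastic set cover: at each step, attempt the action that cuts the most remaining target edges in expectation, then observe which subset was actually intervened on. The names `adaptive_greedy`, `expected_new_cuts`, and the action labels are hypothetical.

```python
import random
from typing import Dict, FrozenSet, List, Set, Tuple

Edge = Tuple[str, str]
# Assumed explicit representation of D_i: (intervened subset, probability) pairs.
Dist = List[Tuple[FrozenSet[str], float]]


def cuts(subset: FrozenSet[str], edge: Edge) -> bool:
    """A subset cuts an edge when it contains exactly one endpoint."""
    u, v = edge
    return (u in subset) != (v in subset)


def expected_new_cuts(dist: Dist, uncut: Set[Edge]) -> float:
    """Expected number of still-uncut target edges cut by one attempt."""
    return sum(p * sum(cuts(s, e) for e in uncut) for s, p in dist)


def adaptive_greedy(actions: Dict[str, Dist], targets: Set[Edge],
                    rng: random.Random, max_steps: int = 10_000):
    """Attempt actions greedily until every target edge is cut (or no progress is possible)."""
    uncut = set(targets)
    history = []
    while uncut and len(history) < max_steps:
        # Sorting makes tie-breaking deterministic.
        best = max(sorted(actions), key=lambda a: expected_new_cuts(actions[a], uncut))
        if expected_new_cuts(actions[best], uncut) == 0:
            break  # no action can cut any remaining edge
        subsets, probs = zip(*actions[best])
        realized = rng.choices(subsets, weights=probs, k=1)[0]  # off-target outcome
        uncut = {e for e in uncut if not cuts(realized, e)}
        history.append((best, realized))
    return history, uncut


# Toy instance: attempting A_v hits {v} or, off-target, {v, w}; A_w is noiseless.
actions = {
    "A_v": [(frozenset({"v"}), 0.5), (frozenset({"v", "w"}), 0.5)],
    "A_w": [(frozenset({"w"}), 1.0)],
}
targets = {("u", "v"), ("v", "w")}
history, uncut = adaptive_greedy(actions, targets, random.Random(0))
print(len(history), uncut)
```

In this toy instance the policy attempts `A_v` first. If the realized subset is `{v}`, both target edges are cut after one step; if it is `{v, w}`, the edge between `v` and `w` is still uncut and the policy follows up with `A_w`.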

Paper Structure

This paper contains 30 sections, 26 theorems, 4 equations, 9 figures, 4 algorithms.

Key Result

Theorem 2

Any verifying set of a DAG $G$ must cut all the covered edges.
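The condition in Theorem 2 can be checked mechanically. The sketch below assumes the standard definition of a covered edge ($u \to v$ is covered when $\mathrm{Pa}(v) = \mathrm{Pa}(u) \cup \{u\}$) and treats an intervention set as cutting an edge when it contains exactly one endpoint. The function names are hypothetical, and the check covers only the necessary condition stated in the theorem.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]  # (u, v) means u -> v


def parents(edges: Set[Edge], v: str) -> Set[str]:
    """Parents of v in the DAG given by its directed edge set."""
    return {a for (a, b) in edges if b == v}


def covered_edges(edges: Set[Edge]) -> Set[Edge]:
    """u -> v is covered iff Pa(v) = Pa(u) ∪ {u}."""
    return {(u, v) for (u, v) in edges
            if parents(edges, v) == parents(edges, u) | {u}}


def cuts(intervention: Set[str], edge: Edge) -> bool:
    """An intervention cuts an edge when it contains exactly one endpoint."""
    u, v = edge
    return (u in intervention) != (v in intervention)


def cuts_all_covered_edges(edges: Set[Edge],
                           interventions: Iterable[Set[str]]) -> bool:
    """Necessary condition from Theorem 2 for a set of interventions."""
    interventions = list(interventions)
    return all(any(cuts(S, e) for S in interventions)
               for e in covered_edges(edges))


# Triangle u -> v, u -> w, v -> w (one orientation of the moral graph u ~ v ~ w ~ u).
dag = {("u", "v"), ("u", "w"), ("v", "w")}
print(covered_edges(dag))                        # the edges u -> v and v -> w
print(cuts_all_covered_edges(dag, [{"v", "w"}])) # False: v ~ w is not cut
print(cuts_all_covered_edges(dag, [{"v"}]))      # True
```

In this orientation of the triangle, the non-atomic intervention $\{v, w\}$ cuts $u \sim v$ and $u \sim w$ but leaves the covered edge $v \sim w$ uncut, so it fails the check, while the atomic intervention $\{v\}$ cuts both covered edges.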

Figures (9)

  • Figure 1: A moral DAG $G^*$ on $n=9$ nodes illustrating graphical concepts such as $\mathcal{E}(G^*)$, $\textrm{skel}(G^*)$, $C(G^*)$, 1/2-clique separators, and large chain components. The essential graph $\mathcal{E}(G^*)$ has a single chain component $H$ with $K_H$ as one possible 1/2-clique separator of $H$. Suppose we oriented $d \sim g$ through an intervention on $\{g\}$ due to action stochasticity in \ref{fig:off-target-intervention}. The resulting chain component $L_H$ after intervening on $\{g\}$ is large since $|V(L_H)| = |\{a,b,c,d,e,f,h,i\}| = 8 > n/2$. PerformPartitioning (\ref{alg:perform-partitioning}) breaks $L_H$ up by trying to intervene within $L_H[V(L_H) \cap N({\color{red}u_H})]$, where $d \equiv {\color{red}u_H}$ is the unique vertex from $K_H$ within $V({\color{red}K_H}) \cap V({\color{red}L_H})$. There are two chain components $H'_1$ and $H'_2$ in the induced subgraph $L_H[V(L_H) \cap N({\color{red}u_H})]$. Since $f$ is not a neighbor of $d$, the vertex $f$ is not part of $H'_2$. If we pick 1/2-clique separators $Z'_1 = \{b,c\}$ and $Z'_2 = \{e,h\}$ for $H'_1$ and $H'_2$, then $c \equiv {\color{purple}z'_1}$ and $e \equiv {\color{purple}z'_2}$ are the sources of $Z'_1$ and $Z'_2$ respectively.
  • Figure 2: Illustrating the difference between cutting an edge and intervening on one of the endpoints of that edge on the moral graph $u \sim v \sim w \sim u$. In both examples, the non-atomic intervention $\{v,w\}$ cuts the edges $u \sim v$ and $u \sim w$ but not $v \sim w$. In the first case, $v \sim w$ remains unoriented. In the second case, $v \sim w$ is oriented due to Meek rules.
  • Figure 3: An illustration of the four Meek rules.
  • Figure 4: A star graph with $v_n$ as the center.
  • Figure 5: Recall the moral DAG $G^*$ on $n=9$ nodes given in \ref{fig:toy-example}. Suppose we oriented $d \sim g$ through an intervention on $\{g\}$ due to action stochasticity in \ref{fig:appendix-off-target-intervention}. The resulting chain component $L_H$ after intervening on $\{g\}$ is large since $|V(L_H)| = |\{a,b,c,d,e,f,h,i\}| = 8 > n/2$. PerformPartitioning (\ref{alg:perform-partitioning}) breaks $L_H$ up by trying to intervene within $L_H[V(L_H) \cap N({\color{red}u_H})]$, where $d \equiv {\color{red}u_H}$ is the unique vertex from $K_H$ within $V({\color{red}K_H}) \cap V({\color{red}L_H})$. There are two chain components $H'_1$ and $H'_2$ in the induced subgraph $L_H[V(L_H) \cap N({\color{red}u_H})]$. Let us focus our remaining discussion on $H'_1$. If we pick 1/2-clique separator $Z'_1 = \{b,c\}$ for $H'_1$, then $c \equiv {\color{purple}z'_1}$ is the source of $Z'_1$. Now, suppose in \ref{fig:appendix-off-target-intervention2} while trying to orient $b \sim c$, we intervened on $\{a,c\}$. Then, the red edges $\{a \to b, a \to d, c \to b, c \to d\}$ will be cut and oriented, which then triggers Meek R2 to orient the blue edges $\{d \to e, d \to h, d \to i, e \to f\}$. \ref{fig:appendix-off-target-intervention3} illustrates the resulting chain component without the newly oriented edges. Observe that $u_H \not\sim z_{H'_1} \equiv d \not\sim c$ as expected while $u_H$ is still connected to a chain component $C = \{b\} \subseteq V(H'_1)$. As proven in \ref{lem:perform-partitioning-iterations}, we have $|V(C)| = |\{b\}| = 1 \leq 1.5 = |V(H')|/2$. Although $H'_1$ now has two components $\{b\}$ and $\{a,c\}$, Line 9 of PerformPartitioning will restrict $H'_1$ to just $\{b\}$ going forward.
  • ...and 4 more figures

Theorems & Definitions (42)

  • Definition 1: Minimum size/cost verifying set and verification number/cost
  • Theorem 2: Theorem 9 of choo2022verification
  • Theorem 3: Proposition 16 of hauser2012characterization, Theorem 7 of choo2023subset
  • Theorem 5
  • Theorem 6
  • Theorem 7
  • Lemma 7
  • proof
  • Lemma 7
  • proof
  • ...and 32 more