Causal Discovery under Off-Target Interventions

Davin Choo; Kirankumar Shiragur; Caroline Uhler

TL;DR

The paper studies causal graph discovery under stochastic off-target interventions, where each attempted action $A_i$ intervenes on a random subset of vertices drawn from $\mathcal{D}_i$, with the goal of minimizing the number of interventions. It proves a formal equivalence between off-target verification and stochastic set cover. This yields a polynomial-time adaptive policy for verification with expected cost $O(\overline{\nu}(G^*) \log n)$, and establishes that better-than-$\log n$ approximations are NP-hard. For search, it shows a hardness barrier and provides a polylogarithmic-approximation algorithm against $\overline{\nu}^{\max}(G^*)$ with total cost $O(\overline{\nu}^{\max}(G^*) \log^4 n)$. The OffTargetSearch algorithm exploits 1/2-clique separators and recursive partitioning to orient edges within the MEC, running in polynomial time while competing against the max benchmark. Empirical results on synthetic and real graphs corroborate the theory, demonstrating competitive performance under various off-target distributions. The work lays a theoretical foundation for causal discovery with off-target interventions and outlines avenues for extending guarantees to unknown distributions and finite-sample settings.

Abstract

Causal graph discovery is a significant problem with applications across various disciplines. However, with observational data alone, the underlying causal graph can only be recovered up to its Markov equivalence class, and further assumptions or interventions are necessary to narrow down the true graph. This work addresses the causal discovery problem under the setting of stochastic interventions, with the natural goal of minimizing the number of interventions performed. We propose the following stochastic intervention model, which subsumes existing adaptive noiseless interventions in the literature while capturing scenarios such as fat-hand interventions and CRISPR gene knockouts: any intervention attempt results in an actual intervention on a random subset of vertices, drawn from a distribution that depends on the attempted action. Under this model, we study the two fundamental problems in causal discovery, verification and search, and provide approximation algorithms with polylogarithmic competitive ratios, along with some preliminary experimental results.
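To make the intervention model concrete, here is a minimal Python sketch of an adaptive greedy policy in the style of stochastic set cover. It is an illustration under stated assumptions, not the paper's algorithm: the representation of each off-target distribution as a list of `(probability, subset)` pairs, the names `is_cut`, `expected_new_cuts`, `sample_subset`, and `adaptive_greedy`, and the stopping rule (stop once every target edge has been cut) are all hypothetical choices made for this example. An edge is treated as cut when exactly one of its endpoints lies in the realized intervention set.

```python
import random
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Edge = Tuple[str, str]
# Assumed representation: each action maps to a list of (probability, realized subset).
Distribution = List[Tuple[float, FrozenSet[str]]]


def is_cut(edge: Edge, subset: FrozenSet[str]) -> bool:
    """An edge is cut when exactly one endpoint is intervened on."""
    u, v = edge
    return (u in subset) != (v in subset)


def expected_new_cuts(dist: Distribution, uncut: Set[Edge]) -> float:
    """Expected number of still-uncut edges that one attempt of this action cuts."""
    return sum(p * sum(is_cut(e, s) for e in uncut) for p, s in dist)


def sample_subset(dist: Distribution, rng: random.Random) -> FrozenSet[str]:
    """Draw the realized intervention set for one attempt of an action."""
    subsets = [s for _, s in dist]
    weights = [p for p, _ in dist]
    return rng.choices(subsets, weights=weights, k=1)[0]


def adaptive_greedy(
    actions: Dict[str, Distribution],
    target_edges: Iterable[Edge],
    rng: random.Random,
    max_steps: int = 10_000,
) -> Tuple[List[Tuple[str, FrozenSet[str]]], Set[Edge]]:
    """Repeatedly attempt the action with the largest expected progress."""
    uncut = set(target_edges)
    history: List[Tuple[str, FrozenSet[str]]] = []
    while uncut and len(history) < max_steps:
        best = max(actions, key=lambda a: expected_new_cuts(actions[a], uncut))
        if expected_new_cuts(actions[best], uncut) == 0:
            break  # no available action can cut any remaining edge
        realized = sample_subset(actions[best], rng)
        history.append((best, realized))
        # Adapt: drop every edge that the realized subset happened to cut.
        uncut = {e for e in uncut if not is_cut(e, realized)}
    return history, uncut


# Toy instance (hypothetical): A_v always hits {v}; A_w hits {w} or {v, w}.
actions = {
    "A_v": [(1.0, frozenset({"v"}))],
    "A_w": [(0.5, frozenset({"w"})), (0.5, frozenset({"v", "w"}))],
}
history, uncut = adaptive_greedy(actions, [("u", "v"), ("v", "w")], random.Random(0))
```

In this toy instance `A_v` cuts both target edges with certainty, while `A_w` cuts one edge in expectation, so the greedy rule attempts `A_v` once and stops. The policy is adaptive because each choice depends on which subsets were actually realized in earlier attempts.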

Paper Structure (30 sections, 26 theorems, 4 equations, 9 figures, 4 algorithms)

Introduction
Our off-target intervention model
Our contributions
Preliminaries and related work
Results
Verification
Search
Why compare against the max?
Remark (What if a covered edge is never cut?)
A search algorithm with polylogarithmic approximation to the max
Experiments
Conclusion and discussion
Augmenting the preliminaries
Unknown off-target distributions
Meek rules
...and 15 more sections

Key Result

Theorem 2

Any verifying set of a DAG $G$ must cut all the covered edges.
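The condition in this theorem can be checked mechanically on small examples. The sketch below assumes the standard definition of a covered edge (an edge $u \to v$ is covered when the parents of $v$, excluding $u$, coincide with the parents of $u$) and treats an intervention set as cutting an edge when it contains exactly one endpoint. The function names and the parent-map representation of the DAG are hypothetical, chosen for this illustration only.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def covered_edges(parents: Dict[str, Set[str]]) -> List[Edge]:
    """Return all edges u -> v with Pa(v) minus {u} equal to Pa(u)."""
    result = []
    for v, pa_v in parents.items():
        for u in pa_v:
            if parents.get(u, set()) == pa_v - {u}:
                result.append((u, v))
    return sorted(result)


def cuts(intervention: Set[str], edge: Edge) -> bool:
    """An intervention set cuts an edge when it contains exactly one endpoint."""
    u, v = edge
    return (u in intervention) != (v in intervention)


def cuts_all_covered_edges(
    parents: Dict[str, Set[str]], interventions: Iterable[Set[str]]
) -> bool:
    """Check the necessary condition: every covered edge is cut by some set."""
    sets = list(interventions)
    return all(any(cuts(s, e) for s in sets) for e in covered_edges(parents))


# Triangle DAG u -> v, u -> w, v -> w, given as a map from vertex to parents.
triangle = {"u": set(), "v": {"u"}, "w": {"u", "v"}}

print(covered_edges(triangle))                        # [('u', 'v'), ('v', 'w')]
print(cuts_all_covered_edges(triangle, [{"v", "w"}])) # False: v ~ w is not cut
print(cuts_all_covered_edges(triangle, [{"v"}]))      # True
```

On the triangle, the non-atomic intervention $\{v, w\}$ contains both endpoints of $v \sim w$ and so fails to cut that covered edge, whereas the atomic intervention $\{v\}$ cuts both covered edges.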

Figures (9)

Figure 1: A moral DAG $G^*$ on $n=9$ nodes illustrating graphical concepts such as $\mathcal{E}(G^*)$, $\textrm{skel}(G^*)$, $C(G^*)$, 1/2-clique separators, and large chain components. The essential graph $\mathcal{E}(G^*)$ has a single chain component $H$ with $K_H$ as one possible 1/2-clique separator of $H$. Suppose we oriented $d \sim g$ through an intervention on $\{g\}$ due to action stochasticity in \ref{fig:off-target-intervention}. The resulting chain component $L_H$ after intervening on $\{g\}$ is large since $|V(L_H)| = |\{a,b,c,d,e,f,h,i\}| = 8 > n/2$. PerformPartitioning (\ref{alg:perform-partitioning}) breaks $L_H$ up by trying to intervene within $L_H[V(L_H) \cap N({\color{red}u_H})]$, where $d \equiv {\color{red}u_H}$ is the unique vertex from $K_H$ within $V({\color{red}K_H}) \cap V({\color{red}L_H})$. There are two chain components $H'_1$ and $H'_2$ in the induced subgraph $L_H[V(L_H) \cap N({\color{red}u_H})]$. Since $f$ is not a neighbor of $d$, the vertex $f$ is not part of $H'_2$. If we pick 1/2-clique separators $Z'_1 = \{b,c\}$ and $Z'_2 = \{e,h\}$ for $H'_1$ and $H'_2$, then $c \equiv {\color{purple}z'_1}$ and $e \equiv {\color{purple}z'_2}$ are the sources of $Z'_1$ and $Z'_2$ respectively.
Figure 2: Illustrating the difference between cutting an edge and intervening on one of the endpoints of that edge on the moral graph $u \sim v \sim w \sim u$. In both examples, the non-atomic intervention $\{v,w\}$ cuts the edges $u \sim v$ and $u \sim w$ but not $v \sim w$. In the first case, $v \sim w$ remains unoriented. In the second case, $v \sim w$ is oriented due to Meek rules.
Figure 3: An illustration of the four Meek rules
Figure 4: A star graph with $v_n$ as the center.
Figure 5: Recall the moral DAG $G^*$ on $n=9$ nodes given in \ref{fig:toy-example}. Suppose we oriented $d \sim g$ through an intervention on $\{g\}$ due to action stochasticity in \ref{fig:appendix-off-target-intervention}. The resulting chain component $L_H$ after intervening on $\{g\}$ is large since $|V(L_H)| = |\{a,b,c,d,e,f,h,i\}| = 8 > n/2$. PerformPartitioning (\ref{alg:perform-partitioning}) breaks $L_H$ up by trying to intervene within $L_H[V(L_H) \cap N({\color{red}u_H})]$, where $d \equiv {\color{red}u_H}$ is the unique vertex from $K_H$ within $V({\color{red}K_H}) \cap V({\color{red}L_H})$. There are two chain components $H'_1$ and $H'_2$ in the induced subgraph $L_H[V(L_H) \cap N({\color{red}u_H})]$. Let us focus our remaining discussion on $H'_1$. If we pick the 1/2-clique separator $Z'_1 = \{b,c\}$ for $H'_1$, then $c \equiv {\color{purple}z'_1}$ is the source of $Z'_1$. Now, suppose that in \ref{fig:appendix-off-target-intervention2}, while trying to orient $b \sim c$, we intervened on $\{a,c\}$. Then the red edges $\{a \to b, a \to d, c \to b, c \to d\}$ are cut and oriented, which in turn triggers Meek R2 to orient the blue edges $\{d \to e, d \to h, d \to i, e \to f\}$. \ref{fig:appendix-off-target-intervention3} illustrates the resulting chain component without the newly oriented edges. Observe that $u_H \not\sim z_{H'_1} \equiv d \not\sim c$ as expected, while $u_H$ is still connected to a chain component $C = \{b\} \subseteq V(H'_1)$. As proven in \ref{lem:perform-partitioning-iterations}, we have $|V(C)| = |\{b\}| = 1 \leq 1.5 = |V(H')|/2$. Although $H'_1$ now has two components $\{b\}$ and $\{a,c\}$, Line 9 of PerformPartitioning will restrict $H'_1$ to just $\{b\}$ going forward.
...and 4 more figures

Theorems & Definitions (42)

Definition 1: Minimum size/cost verifying set and verification number/cost
Theorem 2: Theorem 9 of choo2022verification
Theorem 3: Proposition 16 of hauser2012characterization, Theorem 7 of choo2023subset
Theorem 5
Theorem 6
Theorem 7
Lemma 7
proof
Lemma 7
proof
...and 32 more
