Trust Your $\nabla$: Gradient-based Intervention Targeting for Causal Discovery

Mateusz Olko, Michał Zając, Aleksandra Nowak, Nino Scherrer, Yashas Annadani, Stefan Bauer, Łukasz Kuciński, Piotr Miłoś

TL;DR

Causal discovery from observational data alone is often underdetermined; interventions improve identifiability but are costly. The authors propose Gradient-based Intervention Targeting (GIT), a plug-and-play acquisition method that scores each candidate intervention target by the expected magnitude of the gradient of the causal-structure loss, estimated on interventional data imagined (simulated) from the current model. Paired with a gradient-based discovery framework such as ENCO, GIT accelerates convergence, performs on par with competitive baselines (including mutual-information-based ones), surpasses them in the low-data regime, and closely matches the decisions of the oracle-like GIT-privileged variant. The paper gives a convergence guarantee for $\epsilon$-greedy GIT (Proposition 1) and reports experiments on synthetic and real-world graphs. The practical upshot is a gradient-driven way to choose interventions that reduces the number of experiments needed for causal discovery.

Abstract

Inferring causal structure from data is a challenging task of fundamental importance in science. Observational data are often insufficient to identify a system's causal structure uniquely. While conducting interventions (i.e., experiments) can improve the identifiability, such samples are usually challenging and expensive to obtain. Hence, experimental design approaches for causal discovery aim to minimize the number of interventions by estimating the most informative intervention target. In this work, we propose a novel Gradient-based Intervention Targeting method, abbreviated GIT, that 'trusts' the gradient estimator of a gradient-based causal discovery framework to provide signals for the intervention acquisition function. We provide extensive experiments in simulated and real-world datasets and demonstrate that GIT performs on par with competitive baselines, surpassing them in the low-data regime.

Paper Structure

This paper contains 64 sections, 11 theorems, 20 equations, 20 figures, 13 tables, 2 algorithms.

Key Result

Proposition 1

If the causal discovery algorithm $\mathcal{A}$ is guaranteed to converge given an infinite amount of samples from each possible intervention target, then $\mathcal{A}$ with ${\epsilon}$-greedy GIT is also guaranteed to converge.
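
One way to read the proposition: under an $\epsilon$-greedy rule every target keeps a nonzero chance of being picked in each round, so each target is visited infinitely often in the limit, which is the condition the proposition places on $\mathcal{A}$. The following is a minimal sketch of such a selection rule, not the authors' implementation; the function name `epsilon_greedy_target`, its parameters, and the uniform exploration distribution are assumptions made for illustration.

```python
import random
from typing import Callable, Optional, Sequence


def epsilon_greedy_target(
    targets: Sequence[int],
    score: Callable[[int], float],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> int:
    """Pick an intervention target (hypothetical helper, for illustration only).

    With probability `epsilon` a target is drawn uniformly at random
    (exploration); otherwise the highest-scoring target is returned
    (exploitation). `score` stands in for a GIT-style acquisition score.
    """
    if not targets:
        raise ValueError("targets must be non-empty")
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    rng = rng or random.Random()
    if rng.random() < epsilon:
        # Exploration: every target has probability epsilon / len(targets).
        return rng.choice(list(targets))
    # Exploitation: the target with the largest score.
    return max(targets, key=score)
```

With `epsilon = 0` the rule is purely greedy, and with `epsilon = 1` it reduces to uniformly random targeting; the proposition concerns the case in between, where exploration never fully stops.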

Figures (20)

  • Figure 1: Overview of GIT's usage in a gradient-based causal discovery framework. The framework infers a posterior distribution over graphs from observational and interventional data (denoted as $\mathcal{D}_{obs}$ and $\mathcal{D}_{int}$) through gradient-based optimization. The distribution over graphs and the gradient estimator $\nabla \mathcal{L}(\cdot)$ are then used by GIT in order to score the intervention targets based on the magnitude of the estimated gradients. The intervention target with the highest score is then selected, upon which the intervention is performed. New interventional data $\mathcal{D}_{int}^{new}$ are then collected and the procedure is repeated.
  • Figure 2: The distribution of SAUSHD (see equation \ref{eq:eaushd}), calculated over 25 seeds, for synthetic graphs (lower is better). The intense color (left-hand side of each violin plot) indicates the low-data regime ($N=1056$ samples). The faded color (right-hand side of each violin plot) represents a higher amount of data ($N=3200$ samples). Note that although solution quality improves when more samples are available, SAUSHD is typically smaller in the low-data regime. This is because it measures relative improvement over the random baseline, which for most methods is most visible when the number of samples is small.
  • Figure 3: The distribution of SAUSHD (see equation \ref{eq:eaushd}), calculated over 25 seeds, for real-world graphs (lower is better). The intense color (left-hand side of each violin plot) indicates the low-data regime ($N=1056$ samples). The faded color (right-hand side of each violin plot) represents a higher amount of data ($N=3200$ samples). Notice that the two plots have different scales.
  • Figure 4: The interventional target distributions obtained by different strategies on real-world data. The probability is represented by the intensity of the node's color. The green color represents the edges for which there exists a graph in the Markov Equivalence Class that has the corresponding connection reversed. The number below each graph denotes the entropy of the distribution.
  • Figure 5: Histograms of intervention targets chosen by GIT. In this experiment, a node $v$ was chosen (denoted by a red color; $v$'s parents are indicated by green). Parameters were initialized so that the model is only unsure about the neighborhood of $v$. The solid lines denote known edges and dashed ones are to be discovered.
  • ...and 15 more figures
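
The loop described in the Figure 1 caption (sample from the current distribution over graphs, imagine interventional data, measure the gradient magnitude, pick the highest-scoring target) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions rather than the paper's algorithm: the callables `sample_graph`, `imagine_data`, and `structure_grad` are hypothetical placeholders for components that a framework such as ENCO would supply, and the use of the Euclidean norm averaged over sampled graphs is an assumption about how "magnitude" is measured.

```python
import math
from typing import Any, Callable, Dict, List, Sequence


def git_scores(
    targets: Sequence[int],
    sample_graph: Callable[[], Any],
    imagine_data: Callable[[Any, int], Any],
    structure_grad: Callable[[Any, int], List[float]],
    n_graphs: int = 8,
) -> Dict[int, float]:
    """Score each candidate target by an average gradient norm.

    All three callables are hypothetical stand-ins:
      sample_graph()          -> a graph drawn from the current model
      imagine_data(g, t)      -> simulated data for intervening on t in g
      structure_grad(data, t) -> gradient of the structure loss on that data
    """
    scores: Dict[int, float] = {}
    for t in targets:
        total = 0.0
        for _ in range(n_graphs):
            graph = sample_graph()
            batch = imagine_data(graph, t)
            grad = structure_grad(batch, t)
            # Euclidean norm of the gradient (an assumed choice of magnitude).
            total += math.sqrt(sum(g * g for g in grad))
        scores[t] = total / n_graphs
    return scores


def select_target(scores: Dict[int, float]) -> int:
    """Return the target with the highest score."""
    return max(scores, key=scores.get)
```

In a full loop, the selected target would be intervened on in the real system, the new data $\mathcal{D}_{int}^{new}$ added to $\mathcal{D}_{int}$, the model updated, and the scores recomputed.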

Theorems & Definitions (22)

  • Proposition 1
  • proof
  • Remark 2
  • Remark 3
  • Remark 4
  • Remark 5
  • Theorem 6
  • Theorem 7
  • Theorem 8
  • Theorem 9
  • ...and 12 more