CORE: Towards Scalable and Efficient Causal Discovery with Reinforcement Learning

Andreas W. M. Sauter; Nicolò Botteghi; Erman Acar; Aske Plaat

TL;DR

CORE formalizes causal discovery with interventions as a partially observable Markov decision process and learns a dual-branch Q-learning policy that jointly identifies causal graphs and selects informative interventions. It generalizes to unseen structures with up to 10 variables and is sample-efficient, with per-graph inference times in the millisecond range. The approach outperforms the prior state of the art on structure estimation, and the results highlight the importance of jointly learning the intervention policy. The paper also discusses real-world applicability and limitations related to function class and confounding. Overall, CORE is a scalable, data-efficient framework for active causal discovery that uses reinforcement learning to plan interventions and reconstruct causal graphs, with potential impact on automated causal discovery in complex domains.

Abstract

Causal discovery is the challenging task of inferring causal structure from data. Motivated by Pearl's Causal Hierarchy (PCH), which tells us that passive observations alone are not enough to distinguish correlation from causation, there has been a recent push to incorporate interventions into machine learning research. Reinforcement learning provides a convenient framework for such an active approach to learning. This paper presents CORE, a deep reinforcement learning-based approach for causal discovery and intervention planning. CORE learns to sequentially reconstruct causal graphs from data while learning to perform informative interventions. Our results demonstrate that CORE generalizes to unseen graphs and efficiently uncovers causal structures. Furthermore, CORE scales to larger graphs with up to 10 variables and outperforms existing approaches in structure estimation accuracy and sample efficiency. All relevant code and supplementary material can be found at https://github.com/sa-and/CORE

Paper Structure (37 sections, 13 equations, 4 figures, 3 tables)

Introduction and Related Work
Preliminaries and Notation
Causal Models
Interventions
Reinforcement Learning
POMDP
RL:
Deep Q-Learning:
Learning a Causal Discovery Policy with Informative Interventions
POMDP Formulation of Causal Discovery Through Interventions
State Space:
Action Space:
Transition Dynamics:
Observations:
State Representation:
...and 22 more sections

Figures (4)

Figure 1: A simple graphical illustration of a (hard) intervention. Given the causal graph $G$ with endogenous variables $\mathcal{X}=\{X, Y, Z\}$ and the corresponding noise variables $\mathcal{U}=\{U_X, U_Y, U_Z\}$, intervening on variable $X$ (i.e., $do(X=x)$) results in modifying $G$ into $G'$ by pruning the incoming edges to node $X$ and assigning the value $x$.
Figure 2: Overview of CORE's training setup (right) and a minimal example of the transition dynamics for an SCM with two endogenous variables (left). At each step, the agent picks the intervention and structural actions according to an $\epsilon$-greedy policy on $Q_{in}$ and $Q_{st}$, respectively. The intervention is applied to the SCM $M^i$, leading to a post-interventional distribution $P_{M_{do(.)}}$ from which an observation is sampled. The agent receives a reward based on the structure action and the induced graph of $M_{do(\emptyset)}^i$. The observation is added to the history of observations and serves as input to the agent. At the beginning of each episode, a new $M^j$ is drawn from the training set and the observation history is cleared.
Figure 3: Two examples of how the learned CORE policy estimates the causal structure of two unseen SCMs described in Equations \ref{eq:scm1} and \ref{eq:scm2}. Green elements indicate the intervention ($do(c = 20)$) and the structural update (adding an edge) in the current step, respectively. The red arrow indicates the deletion of an edge.
Figure 4: Plot of the average SHD on the test set (lower is better). We present the means over three training runs of CORE with random interventions (blue) and when jointly learning an intervention policy (red) over graphs with 4 variables.
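
Two operations mentioned in the captions above can be illustrated with a short sketch: the graph surgery of a hard intervention (Figure 1) and the structural Hamming distance (SHD) used as the evaluation metric (Figure 4). This is a minimal illustration, not the authors' implementation. The function names, the edge-set representation, and the example graph are assumptions made here. Counting a reversed edge as a single error is one common SHD convention, and the paper may use a different one.

```python
# Illustrative sketch only: names, representation, and the example graph
# are assumptions, not taken from the CORE code base.

Edge = tuple[str, str]  # (parent, child)


def hard_intervention(edges: set[Edge], target: str) -> set[Edge]:
    """Return the edges of G' after do(target = x).

    A hard intervention prunes every edge pointing into the target node;
    edges leaving the target are kept.
    """
    return {(parent, child) for parent, child in edges if child != target}


def structural_hamming_distance(true_edges: set[Edge],
                                estimated_edges: set[Edge]) -> int:
    """Count missing, extra, and reversed edges between two directed graphs.

    Assumption: a reversed edge counts as one error, not two.
    """
    missing = true_edges - estimated_edges
    extra = estimated_edges - true_edges
    # A reversal shows up once in `missing` and once in `extra`.
    reversals = {(a, b) for (a, b) in missing if (b, a) in extra}
    return len(missing) + len(extra) - len(reversals)


# Hypothetical three-variable graph: Z -> X -> Y.
true_graph = {("Z", "X"), ("X", "Y")}

# do(X = x) removes the incoming edge Z -> X.
intervened_graph = hard_intervention(true_graph, "X")

# An estimate with Z -> X reversed and a spurious edge Z -> Y.
estimate = {("X", "Z"), ("X", "Y"), ("Z", "Y")}
shd = structural_hamming_distance(true_graph, estimate)
```

In this example `intervened_graph` contains only the edge from X to Y, and `shd` is 2: one reversed edge plus one spurious edge.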
