Tree Search in DAG Space with Model-based Reinforcement Learning for Causal Discovery

Victor-Alexandru Darvariu; Stephen Hailes; Mirco Musolesi

TL;DR

This work addresses causal discovery by reframing DAG construction as a sequential decision-making problem tackled with model-based RL. CD-UCT uses Monte Carlo Tree Search with a cycle-aware action space to build DAGs incrementally, backed by an efficient incremental algorithm for excluding cycle-inducing edges. Empirically, CD-UCT outperforms the model-free RL-BIC baseline and greedy search across real and synthetic datasets, scales to graphs with up to 50 nodes, and offers substantial speedups. The results advance combinatorial causal discovery by enabling deeper, more informed search in DAG space, with broad applicability to Bayesian networks over discrete and continuous variables. The approach also highlights the value of model-based planning in structured, NP-hard search problems beyond causal discovery.
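CD-UCT appears to take its name from UCT (Upper Confidence bounds applied to Trees), the standard MCTS rule for choosing which child to descend into. The sketch below shows the textbook UCB1-based selection step, assuming actions are candidate edges `(i, j)` that are already known to be valid (not present and not cycle-inducing). The exploration constant, return normalization and tie-breaking CD-UCT actually uses are not stated here, and `Node` and `uct_select` are hypothetical names.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical search-tree node for a partially built DAG."""
    visits: int = 0
    total_return: float = 0.0
    # Maps an action (a candidate edge (i, j)) to the child node it leads to.
    children: dict = field(default_factory=dict)


def uct_select(node: Node, c: float = math.sqrt(2)):
    """Return the child action with the highest UCT (UCB1) score.

    Any unvisited child is returned immediately so every valid edge is tried once.
    """
    best_action, best_score = None, -math.inf
    for action, child in node.children.items():
        if child.visits == 0:
            return action
        exploit = child.total_return / child.visits  # mean return so far
        explore = c * math.sqrt(math.log(node.visits) / child.visits)
        score = exploit + explore
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```

Setting `c=0` turns the rule into pure exploitation of mean returns; larger values push the search toward rarely visited edges, which is what lets the tree grow deeper than a one-step greedy choice.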

Abstract

Identifying causal structure is central to many fields ranging from strategic decision-making to biology and economics. In this work, we propose CD-UCT, a model-based reinforcement learning method for causal discovery based on tree search that builds directed acyclic graphs incrementally. We also formalize and prove the correctness of an efficient algorithm for excluding edges that would introduce cycles, which enables deeper discrete search and sampling in DAG space. The proposed method can be applied broadly to causal Bayesian networks with both discrete and continuous random variables. We conduct a comprehensive evaluation on synthetic and real-world datasets, showing that CD-UCT substantially outperforms the state-of-the-art model-free reinforcement learning technique and greedy search, constituting a promising advancement for combinatorial methods.

Paper Structure (23 sections, 2 theorems, 8 equations, 5 figures, 5 tables, 2 algorithms)

This paper contains 23 sections, 2 theorems, 8 equations, 5 figures, 5 tables, 2 algorithms.

INTRODUCTION
RELATED WORK
METHODS
Problem formulation
DAG construction as MDP
Incremental algorithm for detecting cycle-inducing edges
The CD-UCT Method
EXPERIMENTS
RESULTS
CONCLUSION
PROOF OF THEOREM 1
MONTE CARLO TREE SEARCH AND RELATIONSHIP TO GREEDY SEARCH AND RL-BIC
Search Problems and Shortcomings of Greedy Search
Monte Carlo Tree Search
Model-based versus Model-free: Why and How CD-UCT Outperforms RL-BIC
...and 8 more sections

Key Result

Theorem 1

Let $G_\tau$ denote a directed acyclic graph with known cycle-inducing candidate edges $\mathcal{C}_\tau$. Given that edge $e_{i,j}$ is chosen for addition at timestep $\tau$ ($e_{i,j} \in E_{\tau+1}$), the set $\mathcal{C}_{\tau+1}$ is equal to $\mathcal{C}_\tau \cup \Phi_{i,j}$, where $\Phi_{i,j} =
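Following the figure captions' description of this result (after adding $e_{i,j}$, any edge from $j$ or one of its descendants to $i$ or one of its ancestors would close a cycle), the update can be sketched in Python. This is a minimal illustration under that reading, not the authors' implementation: `CycleTracker`, `is_valid` and `add_edge` are hypothetical names, and ancestor/descendant sets are stored explicitly for clarity rather than efficiency.

```python
from itertools import product


class CycleTracker:
    """Track cycle-inducing edges while a DAG over n nodes is built edge by edge."""

    def __init__(self, n: int):
        self.n = n
        self.edges = set()
        self.anc = [set() for _ in range(n)]   # strict ancestors of each node
        self.desc = [set() for _ in range(n)]  # strict descendants of each node
        self.cyclic = set()                    # plays the role of C_tau

    def is_valid(self, i: int, j: int) -> bool:
        """An edge can be added if it is not a self-loop, duplicate, or cycle-inducing."""
        return i != j and (i, j) not in self.edges and (i, j) not in self.cyclic

    def add_edge(self, i: int, j: int) -> None:
        assert self.is_valid(i, j), "edge is a self-loop, duplicate, or closes a cycle"
        self.edges.add((i, j))
        sources = {i} | self.anc[i]   # i and its ancestors now reach j
        targets = {j} | self.desc[j]  # j and its descendants are now reachable from i
        # Phi_{i,j}: any edge from a target back to a source would close a cycle.
        self.cyclic |= set(product(targets, sources))
        # Keep the reachability relation (transitive closure) up to date.
        for s in sources:
            self.desc[s] |= targets
        for t in targets:
            self.anc[t] |= sources
```

For example, after adding `0 -> 1`, `1 -> 2` and `3 -> 0` to a 4-node graph, `t.cyclic` holds exactly the six reversed reachability pairs, while a shortcut such as `0 -> 2` remains valid. No traversal is needed when checking an action: validity is a set lookup.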

Figures (5)

Figure 1: Left: schematic comparison of Greedy Search and CD-UCT, which build shallow and deeper trees respectively to search in DAG space. Right: we propose an incremental algorithm to exclude cycle-inducing edges. It relies on the insight that, after adding edge $A \to B$, connecting a descendant of $B$ to an ancestor of $A$ would introduce a cycle in all subsequent timesteps. An illustration of all the algorithm steps can be found in the Supplementary Material.
Figure 2: Varying the simulation budget parameter $b_\text{sims}$ on the Sachs dataset. The subfigures show the construction reward, Structural Hamming Distance (SHD) for construction and pruning, and wall clock time. CD-UCT and Random Search both outperform RL-BIC, even when given $100\times$ fewer score function evaluations.
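Figure 2 reports Structural Hamming Distance. As a hedged sketch assuming the common convention that SHD counts missing, extra and reversed edges, with a reversal counted once (some tools count it as two errors), SHD between two DAG adjacency matrices can be computed with NumPy. `shd` is a hypothetical helper, not the paper's evaluation code.

```python
import numpy as np


def shd(true_adj, est_adj) -> int:
    """Structural Hamming Distance between two DAG adjacency matrices.

    adj[i, j] is truthy when the graph contains the edge i -> j.
    """
    true_adj = np.asarray(true_adj, dtype=bool)
    est_adj = np.asarray(est_adj, dtype=bool)
    # Skeletons: is the unordered pair {i, j} linked at all?
    true_skel = true_adj | true_adj.T
    est_skel = est_adj | est_adj.T
    # Pairs linked in exactly one graph (missing or extra edges), counted once per pair.
    skeleton_diff = np.triu(true_skel ^ est_skel, k=1).sum()
    # Pairs linked in both graphs but oriented differently (reversed edges).
    linked_in_both = np.triu(true_skel & est_skel, k=1)
    orientation_differs = np.triu(true_adj != est_adj, k=1)
    reversed_edges = (linked_in_both & orientation_differs).sum()
    return int(skeleton_diff + reversed_edges)
```

With a true graph `0 -> 1 -> 2` and an estimate containing `1 -> 0` and `0 -> 2`, the function counts one reversal, one extra edge and one missing edge, giving 3.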
Figure 3: Runtimes of CD-UCT with the incremental algorithm for excluding cycle-inducing edges against a naïve implementation that performs graph traversals to detect cycles.
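For comparison with Figure 3, a naïve cycle check can be sketched as below. Adding $i \to j$ closes a cycle exactly when $j$ can already reach $i$, so each query runs a fresh breadth-first search. This assumes the naïve baseline answers every query with a traversal; the exact baseline used in the paper is not described here, and `creates_cycle` and `valid_actions` are hypothetical helpers.

```python
from collections import deque


def creates_cycle(edges, n: int, i: int, j: int) -> bool:
    """Return True if adding i -> j to the DAG given by `edges` would form a cycle."""
    if i == j:
        return True  # a self-loop is a trivial cycle
    children = [[] for _ in range(n)]
    for u, v in edges:
        children[u].append(v)
    # Breadth-first search from j: a cycle appears iff i is reachable from j.
    seen, queue = {j}, deque([j])
    while queue:
        u = queue.popleft()
        for v in children[u]:
            if v == i:
                return True
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False


def valid_actions(edges, n: int):
    """Enumerate every addable edge by running the naive check on each candidate pair."""
    existing = set(edges)
    return [
        (i, j)
        for i in range(n)
        for j in range(n)
        if i != j and (i, j) not in existing and not creates_cycle(edges, n, i, j)
    ]
```

Enumerating the action space this way costs one traversal per candidate edge at every construction step, which is the kind of overhead an incremental set of cycle-inducing edges avoids.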
Figure 4: Results with problem instances of varying graph density and number of datapoints.
Figure 5: A full illustration of the proposed algorithm for tracking cycle-inducing edges. At each timestep $\tau$, an edge (shown with a dashed line) is introduced, and the candidate edges that would connect a descendant of the endpoint to an ancestor of the starting point are added to the set $\mathcal{C}_\tau$. This eliminates the need to explicitly check for cycles. We also note that this algorithm simply determines the invalid edges, with the choice of which edge to add being left to the higher-level causal discovery method.

Theorems & Definitions (3)

Theorem 1
Theorem 1
proof