MARLIN: Multi-Agent Reinforcement Learning for Incremental DAG Discovery

Dong Li, Zhengzhang Chen, Xujiang Zhao, Linlin Yu, Zhong Chen, Yi He, Haifeng Chen, Chen Zhao

Abstract

Uncovering causal structures from observational data is crucial for understanding complex systems and making informed decisions. While reinforcement learning (RL) has shown promise in identifying these structures in the form of a directed acyclic graph (DAG), existing methods often lack efficiency, making them unsuitable for online applications. In this paper, we propose MARLIN, an efficient multi-agent RL-based approach for incremental DAG learning. MARLIN uses a DAG generation policy that maps a continuous real-valued space to the DAG space as an intra-batch strategy, then incorporates two RL agents, one state-specific and one state-invariant, to uncover causal relationships, and integrates these agents into an incremental learning framework. Furthermore, the framework leverages a factored action space to enhance parallelization efficiency. Extensive experiments on synthetic and real datasets demonstrate that MARLIN outperforms state-of-the-art methods in terms of both efficiency and effectiveness.
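The abstract does not spell out how a continuous real-valued vector is mapped to a DAG. As a rough illustration only, the sketch below uses one common construction and should not be read as MARLIN's actual policy: the vector is split into per-node potentials and per-edge scores, and an edge i -> j is kept only when node i has a strictly lower potential than node j and the edge score exceeds a threshold. Because every kept edge points "uphill" in potential, the result is always acyclic. The names `vector_to_dag`, `is_acyclic`, and `threshold` are hypothetical and introduced here for illustration.

```python
import numpy as np


def vector_to_dag(z, d, threshold=0.0):
    """Map a real vector of length d + d*d to a d x d binary DAG adjacency matrix.

    Assumed layout (illustrative, not taken from the paper):
      z[:d]  -> node potentials p
      z[d:]  -> edge scores S, reshaped to (d, d)
    Edge i -> j is kept iff p[i] < p[j] and S[i, j] > threshold.
    """
    z = np.asarray(z, dtype=float)
    if z.shape != (d + d * d,):
        raise ValueError(f"expected a vector of length {d + d * d}")
    p = z[:d]
    scores = z[d:].reshape(d, d)
    # order_mask[i, j] is True when node i precedes node j in potential.
    order_mask = p[:, None] < p[None, :]
    return (order_mask & (scores > threshold)).astype(int)


def is_acyclic(adj):
    """Check acyclicity by repeatedly removing nodes with no incoming edges."""
    adj = np.array(adj, dtype=bool)
    remaining = np.ones(adj.shape[0], dtype=bool)
    while remaining.any():
        # In-degree of each remaining node, counting only remaining parents.
        in_degree = adj[remaining][:, remaining].sum(axis=0)
        sources = np.flatnonzero(remaining)[in_degree == 0]
        if sources.size == 0:
            return False  # every remaining node has a parent, so a cycle exists
        remaining[sources] = False
    return True


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 5
    adjacency = vector_to_dag(rng.normal(size=d + d * d), d)
    print(adjacency)
    print("acyclic:", is_acyclic(adjacency))
```

The appeal of a mapping like this is that an agent can act in an unconstrained continuous space while every action still decodes to a valid DAG, so no separate acyclicity penalty is needed in the reward.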

Paper Structure (21 sections, 9 equations, 4 figures, 3 tables)

Figures (4)

  • Figure 1: Comparison of the learning processes of offline (top) and online (bottom) RL-based DAG learning methods in an online data stream. The gradient colors on the RL agent represent its learning progress, with white indicating the initial state. When the color aligns with a data batch, it signifies that the agent has learned the current causal mechanism. Instead of learning from scratch, online DAG learning needs to incrementally and efficiently adapt to continuously arriving data batches and non-stationary data distributions.
  • Figure 2: (a) The pipeline of MARLIN across three consecutive system states. For each batch, MARLIN learns the DAG using the intra-batch single-step RL algorithm, which includes state-specific and state-invariant RL agents optimizing their policies through an actor-critic approach. Detailed network architecture and variables are explained in Section \ref{sec:incremental_learning}. MARLIN facilitates efficient incremental DAG learning by disentangling state-specific and state-invariant causal relationships.
  • Figure 3: Average performance of DAG learning across all states on synthetic Linear-Gaussian datasets, varying (a) DAG scale ($d=\{20,50,100\}$) and (b) transition noise rate ($e=\{0,1,5\}$) for MARLIN and other baselines. The shaded area represents the standard deviation; $\uparrow$ indicates that higher values are better, while $\downarrow$ indicates that lower values are better.
  • Figure 4: Overall performance on the SWaT dataset across (a) PR@$K$, (b) AP@$K$, and (c) MRR metrics.