Explainable Reinforcement Learning via a Causal World Model

Zhongwei Yu, Jingqing Ruan, Dengpeng Xing

TL;DR

This work tackles explainable reinforcement learning by learning a causal world model that requires no prior knowledge of the causal structure. It uses causal discovery to identify a sparse structural causal model (SCM) and attention-based inference networks to extract action influences, enabling causal-chain explanations that trace how actions affect states and rewards. The resulting explanation mechanism resembles an action influence model (AIM) while remaining accurate enough to support model-based RL; it is demonstrated on continuous and discrete domains, where it performs competitively with dense baselines. Because explanation and policy learning share one causal model, the method offers faithful explanations and supports safe, interpretable RL in unknown environments.

Abstract

Generating explanations for reinforcement learning (RL) is challenging because actions may have long-term effects. In this paper, we develop a novel framework for explainable RL by learning a causal world model without prior knowledge of the causal structure of the environment. The model captures the influence of actions, allowing us to interpret the long-term effects of actions through causal chains, which show how actions influence environmental variables and ultimately lead to rewards. Unlike most explanatory models, which suffer from low accuracy, our model remains accurate while improving explainability, making it applicable to model-based learning. As a result, we demonstrate that our causal model can serve as a bridge between explainability and learning.
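
To make the idea of a causal chain concrete, the sketch below enumerates directed paths from an action to the reward in a small causal graph. It is only an illustration under stated assumptions: the graph, the variable names, and the `causal_chains` helper are hypothetical and are not taken from the paper, which derives action influences from learned attention-based inference networks rather than from a hand-written edge list.

```python
from typing import Dict, List

# Hypothetical causal graph (invented for illustration): each key
# directly influences the variables listed as its value.
EDGES: Dict[str, List[str]] = {
    "action": ["position", "clean"],
    "position": ["clean"],
    "clean": ["reward"],
}


def causal_chains(
    edges: Dict[str, List[str]], source: str, target: str, max_steps: int
) -> List[List[str]]:
    """Return every directed path from source to target with at most max_steps edges."""
    chains: List[List[str]] = []

    def extend(path: List[str]) -> None:
        node = path[-1]
        if node == target:
            chains.append(path)
            return
        # Stop once the path already uses the allowed number of edges.
        if len(path) - 1 >= max_steps:
            return
        for child in edges.get(node, []):
            extend(path + [child])

    extend([source])
    return chains


if __name__ == "__main__":
    for chain in causal_chains(EDGES, "action", "reward", max_steps=3):
        print(" -> ".join(chain))
```

Bounding the path length with `max_steps` keeps the enumeration finite even if the graph contains self-influences (a variable affecting its own next value), which is common when a one-step graph is unrolled over time.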

Paper Structure

This paper contains 32 sections, 3 theorems, 38 equations, 8 figures, 2 tables, 1 algorithm.

Key Result

Theorem 1

Assume $\mathcal{G}$ is a DAG of a set of variables. Assume $\bm{x}$, $\bm{y}$, and $\bm{z}$ are disjoint subsets of variables. The following propositions hold:

Figures (8)

  • Figure 1: The illustration of causal models in the Vacuum world. (a) illustrates the Vacuum world, where $position=1$, $clean_1=True$, and $clean_2=False$. (b) and (c) respectively illustrate the causal graphs of the SCM and the AIM of the Vacuum world.
  • Figure 2: The illustration of the proposed framework. (a) shows an example of the causal graph identified by causal discovery. (b) illustrates the structure of the proposed model. (c) shows the inference network that approximates the structural equation of $s_3'$. (d) illustrates the causal chain analysis, where the causal chain is highlighted in bold and green.
  • Figure 3: The discovered causal graphs of two environments.
  • Figure 4: An example of a 4-step causal chain on Lunarlander-Continuous.
  • Figure 5: An example of a 4-step causal chain on Build-Marine.
  • ...and 3 more figures

Theorems & Definitions (12)

  • Definition 1: Markov Compatibility
  • Definition 2: d-separation
  • Theorem 1: d-separation Criterion
  • Definition 3: Causal Faithfulness
  • Theorem 2: Causal Discovery for Factorized MDP
  • proof
  • Definition 4: Explanan
  • Example 1: Complete Explanation for Build-Marine
  • Example 2: Minimally Complete Explanation for Build-Marine
  • Definition 5: Minimally Complete Contrastive Explanation
  • ...and 2 more