Towards Empowerment Gain through Causal Structure Learning in Model-Based RL

Hongye Cao, Fan Feng, Meng Fang, Shaokang Dong, Tianpei Yang, Jing Huo, Yang Gao

TL;DR

The paper addresses inefficiencies in model-based RL caused by spurious correlations by proposing Empowerment through Causal Learning (ECL), a framework that actively leverages causal structure to maximize empowerment and guide exploration. ECL combines three components: learning a causal dynamics model with a causal mask, empowerment-driven exploration to refine the causal structure, and policy learning with a curiosity-based reward to balance causality and task objectives. It is method-agnostic with respect to causal discovery, and is evaluated across six environments—including pixel-based tasks—showing superior causal discovery accuracy, sample efficiency, and asymptotic performance compared to baselines. The work demonstrates that integrating empowerment with active causal structure discovery yields robust, generalizable control, and it provides a scalable approach for improving downstream RL policies in complex environments.

Abstract

In Model-Based Reinforcement Learning (MBRL), incorporating causal structures into dynamics models provides agents with a structured understanding of their environments, enabling efficient decision-making. Empowerment, as an intrinsic motivation, enhances the ability of agents to actively control their environments by maximizing the mutual information between future states and actions. We posit that empowerment coupled with causal understanding can improve controllability, while enhanced empowerment gain can further facilitate causal reasoning in MBRL. To improve learning efficiency and controllability, we propose a novel framework, Empowerment through Causal Learning (ECL), in which an agent aware of the causal dynamics model performs empowerment-driven exploration and optimizes its causal structure for task learning. Specifically, ECL first trains a causal dynamics model of the environment on collected data. We then maximize empowerment under the causal structure for exploration, while simultaneously using the data gathered through exploration to update the causal dynamics model so that it is more controllable than a dense dynamics model without causal structure. In downstream task learning, an intrinsic curiosity reward is included to balance the causality and mitigate overfitting. Importantly, ECL is method-agnostic and can integrate various causal discovery methods. We evaluate ECL combined with 3 causal discovery methods across 6 environments, including pixel-based tasks, demonstrating its superior performance compared to other causal MBRL methods in terms of causal discovery, sample efficiency, and asymptotic performance.
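
To make the mutual-information view of empowerment concrete, the following is a minimal sketch, not the paper's implementation. It computes I(S'; A) for a small discrete system under a fixed action distribution. The function name `mutual_information` and the two toy transition tables are hypothetical and chosen only for illustration. Empowerment in the usual sense maximizes this quantity over action distributions; the sketch evaluates it for a uniform distribution only.

```python
import math

def mutual_information(p_a, p_next_given_a):
    """Return I(S'; A) in nats.

    p_a[a] is the probability of action a.
    p_next_given_a[a][s] is the probability of next state s given action a.
    """
    n_next = len(p_next_given_a[0])
    # Marginal over next states: p(s') = sum_a p(a) * p(s' | a)
    p_next = [
        sum(p_a[a] * p_next_given_a[a][s] for a in range(len(p_a)))
        for s in range(n_next)
    ]
    mi = 0.0
    for a, pa in enumerate(p_a):
        for s in range(n_next):
            joint = pa * p_next_given_a[a][s]
            if joint > 0.0:  # zero-probability terms contribute nothing
                mi += joint * math.log(p_next_given_a[a][s] / p_next[s])
    return mi

# Hypothetical toy systems with two actions and two next states.
controllable = [[1.0, 0.0], [0.0, 1.0]]    # each action fixes the next state
uncontrollable = [[0.5, 0.5], [0.5, 0.5]]  # actions have no effect
uniform = [0.5, 0.5]                       # assumed fixed action distribution

mi_controllable = mutual_information(uniform, controllable)
mi_uncontrollable = mutual_information(uniform, uncontrollable)
```

In the first toy system the action fully determines the next state, so the mutual information equals log 2 nats; in the second the actions carry no information about the next state and the value is zero. This is the sense in which higher empowerment corresponds to greater controllability.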

Paper Structure

This paper contains 74 sections, 18 equations, 31 figures, 6 tables, 1 algorithm.

Figures (31)

  • Figure 1: (a) An example of a robot manipulation task with three trajectories and three nodes: one target node (movable) and two noisy nodes (one movable, one unmovable). (b) Underlying causal structures with a factored MDP. Different nodes represent different state and action dimensions.
  • Figure 2: Overview of the ECL framework. Gold lines: model learning. Blue lines: model optimization alternating with empowerment-driven exploration (yellow lines). Green lines: policy learning.
  • Figure 3: Episodic reward during task learning for ECL-Con (ECL-C) and ECL-Sco (ECL-S) in three environments.
  • Figure 4: Learning curves of episodic reward in three different environments; the shaded region is the standard error.
  • Figure 5: Success rate in collider and manipulation environments; the shaded region is the standard error.
  • ...and 26 more figures