A Roadmap Towards Improving Multi-Agent Reinforcement Learning With Causal Discovery And Inference

Giovanni Briglia, Stefano Mariani, Franco Zambonelli

TL;DR

This work tackles the challenge of transferring causal reasoning to multi-agent reinforcement learning by proposing Causality-Driven Reinforcement Learning (CDRL), which learns a minimal Structural Causal Model and uses $do$-based causal inference to filter actions. The approach is algorithm-agnostic and aims to improve policy efficacy, learning efficiency, and safety across MARL tasks with partial observability and continuous spaces. Empirical results across navigation, flocking, and give-way scenarios show conditional gains and notable failures, underscoring the importance of cooperation structure and the need for sophisticated causal discovery and collaborative learning among agents. The paper further maps a roadmap for advancing causal MARL, including informative interventions, robust model assessment, soft interventions for continuous domains, and formal convergence considerations.
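The action-filtering idea can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a small discrete action set (the paper targets continuous spaces), and it replaces the learned Structural Causal Model with a simple count table. All names here (`TabularInterventionModel`, `causal_action_mask`, the `threshold` parameter) are hypothetical. The counts approximate an interventional quantity only if the recorded actions were set by the agent itself, for example at random, during the model-learning phase.

```python
from collections import defaultdict


class TabularInterventionModel:
    """Toy stand-in for a learned causal model (hypothetical, not from the paper).

    Estimates P(bad_outcome | do(action), obs) from counts. The estimate is
    interventional only under the assumption that actions were chosen
    independently of hidden factors (e.g. randomly) when data was recorded.
    """

    def __init__(self):
        self._bad = defaultdict(int)
        self._total = defaultdict(int)

    def record(self, obs, action, bad_outcome):
        """Store one (observation, action, outcome) sample."""
        key = (obs, action)
        self._total[key] += 1
        self._bad[key] += int(bad_outcome)

    def p_bad_given_do(self, obs, action):
        """Return the estimated probability of a bad outcome under do(action)."""
        key = (obs, action)
        if self._total[key] == 0:
            return 0.0  # no evidence yet: do not filter this action
        return self._bad[key] / self._total[key]


def causal_action_mask(model, obs, actions, threshold=0.5):
    """Return a boolean mask: True where the action is allowed."""
    mask = [model.p_bad_given_do(obs, a) < threshold for a in actions]
    if not any(mask):
        # Never leave the agent without options: fall back to all actions.
        mask = [True] * len(actions)
    return mask


# Usage: after some exploration, filter the actions offered to the policy.
model = TabularInterventionModel()
for outcome in (True, True, True, False):
    model.record("obstacle_ahead", "forward", outcome)
model.record("obstacle_ahead", "left", False)

actions = ["forward", "left", "right"]
mask = causal_action_mask(model, "obstacle_ahead", actions)
allowed = [a for a, ok in zip(actions, mask) if ok]
```

Because the mask only restricts which actions the policy may pick, a filter of this kind can sit in front of any learning algorithm, which is the sense in which the approach is described as algorithm-agnostic.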

Abstract

Causal reasoning is increasingly used in Reinforcement Learning (RL) to improve the learning process in several dimensions: efficacy of learned policies, efficiency of convergence, generalisation capabilities, safety and interpretability of behaviour. However, applications of causal reasoning to Multi-Agent RL (MARL) are still mostly unexplored. In this paper, we take the first step in investigating the opportunities and challenges of applying causal reasoning in MARL. We measure the impact of a simple form of causal augmentation in state-of-the-art MARL scenarios increasingly requiring cooperation, and with state-of-the-art MARL algorithms exploiting various degrees of collaboration between agents. Then, we discuss the positive as well as negative results achieved, giving us the chance to outline the areas where further research may help to successfully transfer causal RL to the multi-agent setting.

Paper Structure

This paper contains 21 sections, 1 equation, 14 figures, 1 algorithm.

Figures (14)

  • Figure 1: Causal augmentation architecture: the agent first interacts with the environment to learn a minimal causal model, and then to learn an action policy. During action selection, causal inference acts as a filter (action mask) on the action space.
  • Figure 2: Multi-agent tasks used in the experiments, taken from the VMAS simulator by Bettini et al. [DBLP:conf/dars/BettiniKBP22].
  • Figure 3: Aggregate scores for each algorithm and each task: median, IQM, and mean of normalized reward (higher, i.e. further right, is better), and optimality gap (lower, i.e. further left, is better).
  • Figure 4: Data distribution for metric $pos_{rew}$ measuring the closeness of agents to goal in the Navigation task (the higher the better).
  • Figure 5: Data distribution for reward metrics considering the inverse of the distance to goal and the collisions between agents, in the Flocking task.
  • ...and 9 more figures
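
The aggregate metrics named in Figure 3 can be sketched as follows. The source does not spell out their definitions, so this assumes the ones common in RL evaluation: the interquartile mean (IQM) is the mean of the middle 50% of scores, and the optimality gap is the average shortfall below a target normalized score, assumed here to be 1.0. The function names and sample scores are hypothetical.

```python
from statistics import mean, median


def interquartile_mean(scores):
    """Mean of the middle 50% of scores (drops the lowest and highest quarter)."""
    ordered = sorted(scores)
    cut = len(ordered) // 4  # number of scores trimmed from each end
    return mean(ordered[cut:len(ordered) - cut])


def optimality_gap(scores, target=1.0):
    """Average amount by which scores fall short of the target (assumed 1.0)."""
    return mean([max(target - s, 0.0) for s in scores])


# Usage with made-up normalized rewards from eight runs.
scores = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.0, 1.2]
summary = {
    "median": median(scores),
    "iqm": interquartile_mean(scores),
    "mean": mean(scores),
    "optimality_gap": optimality_gap(scores),
}
```

The IQM ignores the extreme quarters, so it is less sensitive to outlier runs than the mean while using more of the data than the median. The optimality gap gives no credit for scores above the target.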