Learning Reward Machines in Cooperative Multi-Agent Tasks

Leo Ardon; Daniel Furelos-Blanco; Alessandra Russo

TL;DR

This work tackles non-Markovian rewards in cooperative multi-agent RL by learning Reward Machines for sub-tasks and integrating them with decentralized Q-learning. Each agent maintains its own RM and Q-function, and synchronization on shared propositions ensures alignment toward the global objective. Experiments on ThreeButtons and Rendezvous demonstrate that learned per-agent RMs can achieve maximal collective rewards and speed up learning, while naive global RM learning can be intractable for larger problems. The approach enhances interpretability of learned policies and offers a scalable path for complex multi-agent environments.
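The per-agent scheme described above can be illustrated with a minimal tabular sketch: each agent keeps a reward machine (RM) and a Q-function indexed by the pair (environment state, RM state), which is what makes a non-Markovian reward learnable with ordinary Q-learning. The class `RewardMachine`, the function `q_update`, the proposition names, and the reward of 1 on entering the accepting state are all hypothetical choices made for illustration and are not taken from the paper. Synchronisation on shared propositions is not modelled here.

```python
from collections import defaultdict


class RewardMachine:
    """Finite-state machine whose transitions fire on observed propositions."""

    def __init__(self, initial, accepting, transitions):
        self.initial = initial
        self.accepting = accepting
        # Maps (rm_state, proposition) -> next_rm_state.
        self.transitions = transitions

    def step(self, u, label):
        """Return (next_rm_state, reward) for the set of propositions `label`.

        Assumption: reward is 1.0 only when the accepting state is entered.
        If several propositions match, the first in sorted order is used.
        """
        for prop in sorted(label):
            nxt = self.transitions.get((u, prop))
            if nxt is not None:
                entered_goal = nxt == self.accepting and u != self.accepting
                return nxt, (1.0 if entered_goal else 0.0)
        return u, 0.0  # no matching transition: stay in the same RM state


def q_update(q, s, u, a, r, s2, u2, actions, done, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update over the product state (s, u)."""
    best_next = 0.0 if done else max(q[(s2, u2, b)] for b in actions)
    q[(s, u, a)] += alpha * (r + gamma * best_next - q[(s, u, a)])


# Hypothetical RM for a single agent: press a button, then reach the goal.
rm = RewardMachine(
    initial="u0",
    accepting="uA",
    transitions={("u0", "button"): "u1", ("u1", "goal"): "uA"},
)
q = defaultdict(float)  # one such table per agent in the decentralised setting
```

Because the RM state is part of the Q-table key, the same grid cell can carry different values before and after the button has been pressed.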

Abstract

This paper presents a novel approach to Multi-Agent Reinforcement Learning (MARL) that combines cooperative task decomposition with the learning of reward machines (RMs) encoding the structure of the sub-tasks. The proposed method helps deal with the non-Markovian nature of the rewards in partially observable environments and improves the interpretability of the learnt policies required to complete the cooperative task. The RMs associated with each sub-task are learnt in a decentralised manner and then used to guide the behaviour of each agent. By doing so, the complexity of a cooperative multi-agent problem is reduced, allowing for more effective learning. The results suggest that our approach is a promising direction for future research in MARL, especially in complex environments with large state spaces and multiple agents.

Paper Structure (15 sections, 1 equation, 6 figures, 1 algorithm)

Introduction
Background
Reinforcement Learning
Reward Machines
Definitions
RL Algorithm
Multi-Agent Decomposition
Learning Reward Machines in Cooperative Multi-Agent Tasks
Learn a Global Reward Machine
Learn Individual Reward Machines
Experiments
ThreeButtons Task
Rendezvous Task
Related Work
Conclusion and Future Work

Figures (6)

Figure 1: Illustration of the ThreeButtons grid (a) and a reward machine modeling the task's structure (b) (Neary, Xu, Wu, and Topcu, 2021).
Figure 2: RMs for each of the agents in ThreeButtons (Neary, Xu, Wu, and Topcu, 2021).
Figure 3: Comparison between handcrafted RMs (RM Provided) and our approach learning the RMs from traces (RM Learnt) in the ThreeButtons environment.
Figure 4: Learnt RM for $A_2$.
Figure 5: Example of the Rendezvous task, where two agents must meet at the RDV point (green) before reaching their goal states, $G_1$ for agent $A_1$ and $G_2$ for agent $A_2$.
...and 1 more figure
