Advances in Multi-agent Reinforcement Learning: Persistent Autonomy and Robot Learning Lab Report 2024

Reza Azadeh

TL;DR

The paper addresses core MARL challenges in cooperative tasks with constraints, highlighting non-stationarity, dimensionality, and exploration difficulties. It presents three core contributions from the PeARL lab: (i) RA-VDN, a relational-network–augmented CTDE framework that alters team reward contributions to steer coordination without reward sharing, validated in Switch gridworlds and real-robot Turtlebot4 experiments; (ii) Mixed Q-Functionals (MQF), a value-based approach for continuous-action MARL that leverages Q-Functionals to enable parallel action evaluation and demonstrates superior convergence over policy-based baselines across six experiments; and (iii) relational-weight optimization for Multi-Agent Multi-Armed Bandits (MAMAB), which convexly optimizes graph edge weights to speed consensus on arm means, showing clear gains in large constrained teams. The results include both simulated and real-world robotic platforms, such as MaMuJoCo-Ant and Turtlebot4, underscoring improvements in adaptation to malfunctions and in cooperative learning under continuous-action constraints. Collectively, these contributions advance persistent autonomy by enabling relationally principled coordination and efficient learning in complex, constrained multi-robot systems.
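
The summary above describes RA-VDN only at a high level, so the following is a minimal sketch rather than the lab's implementation. It assumes that the relational network is a set of weighted directed edges and that the team reward is the edge-weighted sum of individual rewards, combined with the standard VDN rule that the joint Q-value is the sum of per-agent Q-values. The names `relational_team_reward`, `vdn_joint_q`, and `td_target` are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding (an assumption, not the paper's notation):
# an edge (i, j) with weight w means agent i places importance w on agent j's reward.
RelationalNetwork = Dict[Tuple[int, int], float]


def relational_team_reward(rewards: List[float], network: RelationalNetwork) -> float:
    """Team reward as the edge-weighted sum of individual rewards."""
    return sum(w * rewards[j] for (_i, j), w in network.items())


def vdn_joint_q(per_agent_q: List[float]) -> float:
    """VDN decomposition: the joint Q-value is the sum of per-agent Q-values."""
    return sum(per_agent_q)


def td_target(team_reward: float, next_max_q: List[float],
              gamma: float = 0.99, done: bool = False) -> float:
    """One-step TD target for the joint Q-value under the VDN decomposition."""
    bootstrap = 0.0 if done else gamma * vdn_joint_q(next_max_q)
    return team_reward + bootstrap


# Two agents; only agent 0 collected a reward on this step.
rewards = [1.0, 0.0]

# Self-interested network: each agent only values its own reward.
self_interest = {(0, 0): 1.0, (1, 1): 1.0}

# Network in which both agents value only agent 1's reward.
favor_agent_1 = {(0, 1): 1.0, (1, 1): 1.0}

r_self = relational_team_reward(rewards, self_interest)
r_favor = relational_team_reward(rewards, favor_agent_1)
target = td_target(r_self, next_max_q=[0.5, 0.5], gamma=0.5)
```

Under these assumptions the same joint transition yields a different training signal depending on the relational network, which is one way a network could steer coordination while each agent still receives only its own environment reward.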

Abstract

Multi-Agent Reinforcement Learning (MARL) approaches have emerged as popular solutions to address the general challenges of cooperation in multi-agent environments, where the success of achieving shared or individual goals critically depends on the coordination and collaboration between agents. However, existing cooperative MARL methods face several challenges intrinsic to multi-agent systems, such as the curse of dimensionality, non-stationarity, and the need for a global exploration strategy. Moreover, the presence of agents with constraints (e.g., limited battery life, restricted mobility) or distinct roles further exacerbates these challenges. This document provides an overview of recent advances in MARL made at the Persistent Autonomy and Robot Learning (PeARL) lab at the University of Massachusetts Lowell. We briefly discuss various research directions and present a selection of approaches proposed in our most recent publications. For each proposed approach, we also highlight potential future directions to further advance the field.

Paper Structure (5 sections, 9 figures)

This paper contains 5 sections, 9 figures.

Figures (9)

  • Figure 1: Switch environment results with varying number of agents and different relational networks.
  • Figure 2: (a) A multi-agent grid-world environment with four agents (circles) and four resources (dark green). Each agent's action set includes moving in four directions as well as remaining idle. The agents can produce a push behavior when one agent remains idle and another moves towards it. The relational networks used by RA-VDN (b) before the malfunction and (c) after the green agent malfunctions.
  • Figure 3: (left) Independent DQN, (middle) VDN, (right) RA-VDN. The malfunction happened at the 5000th episode.
  • Figure 4: Switch environment with four agents (circles) and four stations (colored boxes) and a bridge.
  • Figure 5: Results using VDN (left) and RA-VDN (right) with the shown relational network.
  • ...and 4 more figures