Collaborative Adaptation for Recovery from Unforeseen Malfunctions in Discrete and Continuous MARL Domains

Yasin Findik; Hunter Hasenfus; Reza Azadeh

TL;DR

This work addresses the challenge of rapid adaptation to unforeseen malfunctions in cooperative multi-agent reinforcement learning. It introduces Collaborative Adaptation (CA), a framework that embeds a relational network within the CTDE paradigm to guide inter-agent collaboration and accelerate recovery, yielding CA-VDN for discrete tasks and CA-MQF for continuous tasks. Empirical results in both a multi-agent grid-world and the MaMuJoCo ant domain show that CA improves teamwork and resilience after failures, outperforming strong baselines such as IDQN, VDN, IQF, and MADDPG. By steering collaboration through inter-agent relationships, CA offers a robust mechanism for malfunction recovery with practical implications for real-world robotic teams and autonomous systems.

Abstract

Cooperative multi-agent learning plays a crucial role in developing effective strategies to achieve individual or shared objectives in multi-agent teams. In real-world settings, agents may face unexpected failures, such as a robot's leg malfunctioning or a teammate's battery running out. These malfunctions decrease the team's ability to accomplish assigned task(s), especially if they occur after the learning algorithms have already converged onto a collaborative strategy. Current leading approaches in Multi-Agent Reinforcement Learning (MARL) often recover slowly -- if at all -- from such malfunctions. To overcome this limitation, we present the Collaborative Adaptation (CA) framework, which operates in both continuous and discrete domains. The framework improves agents' adaptability to unexpected failures by integrating inter-agent relationships into their learning processes, thereby accelerating recovery from malfunctions. We evaluate the framework through experiments in both discrete and continuous environments. Empirical results show that, in scenarios involving unforeseen malfunctions, state-of-the-art algorithms often converge on sub-optimal solutions, whereas the proposed CA framework mitigates the impact of the malfunction and recovers more effectively.
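This summary does not state how inter-agent relationships enter the learning process, so the following is only a minimal sketch under stated assumptions. It contrasts the standard VDN team value (a plain sum of per-agent Q-values) with a hypothetical relational variant in which a weight matrix `relations[i][j]` encodes how much agent `i` cares about agent `j`. The function names, the weight matrix, and the weighting rule are illustrative and are not taken from the paper.

```python
# Hypothetical sketch: the names and the weighting rule below are
# illustrative assumptions, not the paper's definitions.

def vdn_team_q(agent_qs):
    """Standard VDN mixing: the team value is the sum of per-agent Q-values."""
    return sum(agent_qs)


def relational_team_q(agent_qs, relations):
    """Assumed relational variant of the additive mixer.

    relations[i][j] is the weight agent i places on agent j's value,
    i.e. the adjacency matrix of a directed, weighted graph.
    """
    n = len(agent_qs)
    return sum(
        relations[i][j] * agent_qs[j]
        for i in range(n)
        for j in range(n)
    )


agent_qs = [1.0, 2.0, 0.5, -1.0]

# Identity relations (each agent cares only about itself) recover plain VDN.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

# Assumed example: agents 0-2 also attend to agent 3 (say, a malfunctioning
# teammate), so agent 3's low value weighs more heavily on the team value.
help_agent_3 = [row[:] for row in identity]
for i in range(3):
    help_agent_3[i][3] = 1.0

print(vdn_team_q(agent_qs))
print(relational_team_q(agent_qs, identity))
print(relational_team_q(agent_qs, help_agent_3))
```

In this toy setup, changing the weight matrix changes which teammates' values dominate the team objective, which is one plausible way a relational network could steer agents toward assisting a failed teammate.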

Paper Structure (10 sections, 14 equations, 4 figures, 2 tables, 1 algorithm)

Introduction
Background and Related Work
Proposed Method
Environments
Multi-agent Grid-world Environment
Multi-agent MuJoCo
Models and Hyperparameters
Experimental Results
Conclusion and Future Work
Acknowledgments

Figures (4)

Figure 1: (a) Multi-agent grid-world environment with four agents. (b-c) Relational networks employed in CA-VDN.
Figure 2: (a) Representation of an ant with four agents, each distinguished by a different color. (b) The MaMuJoCo-Ant simulation environment. (c-d) Relational networks used in CA-MQF.
Figure 3: Multi-agent Grid-world results: Average individual rewards of agents before and after the green agent's malfunction at the 5000th episode for (a) IDQN, (b) VDN and (c) CA-VDN.
Figure 4: MaMuJoCo-Ant results: (a) Average team rewards before and after the malfunction at the 30,000th episode. (b-c) Robot trajectories in the x-y plane (b) before and (c) after the malfunction, upon completing 30k and 60k training episodes, respectively. With CA-MQF, the robot covers more distance and earns higher rewards.
