Collaborative Adaptation for Recovery from Unforeseen Malfunctions in Discrete and Continuous MARL Domains
Yasin Findik, Hunter Hasenfus, Reza Azadeh
TL;DR
This work addresses the challenge of rapid adaptation to unforeseen malfunctions in cooperative multi-agent reinforcement learning. It introduces Collaborative Adaptation (CA), a framework that embeds a relational network within the centralized training with decentralized execution (CTDE) paradigm to guide inter-agent collaboration and accelerate recovery, yielding CA-VDN for discrete tasks and CA-MQF for continuous tasks. Empirical results in both a multi-agent grid-world and the MaMuJoCo ant domain show that CA improves teamwork and resilience after failures, outperforming strong baselines such as IDQN, VDN, IQF, and MADDPG. By steering collaboration through inter-agent relationships, CA offers a robust mechanism for malfunction recovery with practical implications for real-world robotic teams and autonomous systems.
Abstract
Cooperative multi-agent learning plays a crucial role in developing effective strategies to achieve individual or shared objectives in multi-agent teams. In real-world settings, agents may face unexpected failures, such as a robot's leg malfunctioning or a teammate's battery running out. These malfunctions reduce the team's ability to accomplish its assigned tasks, especially if they occur after the learning algorithms have already converged on a collaborative strategy. Current leading approaches in Multi-Agent Reinforcement Learning (MARL) often recover slowly, if at all, from such malfunctions. To overcome this limitation, we present the Collaborative Adaptation (CA) framework, which operates in both continuous and discrete domains. The framework enhances the adaptability of agents to unexpected failures by integrating inter-agent relationships into their learning processes, thereby accelerating recovery from malfunctions. We evaluate the framework through experiments in both discrete and continuous environments. Empirical results reveal that, in scenarios involving unforeseen malfunctions, state-of-the-art algorithms often converge on sub-optimal solutions, whereas the proposed CA framework mitigates the impact of malfunctions and recovers more effectively.
