Relational Q-Functionals: Multi-Agent Learning to Recover from Unforeseen Robot Malfunctions in Continuous Action Domains

Yasin Findik, Paul Robinette, Kshitij Jerath, Reza Azadeh

TL;DR

The paper tackles rapid adaptation to unforeseen robot malfunctions in cooperative multi-agent learning with continuous actions. It introduces Relational Q-Functionals (RQF), which embed a directed, weighted relational graph among agents into the Q-functionals framework under Centralized Training with Decentralized Execution (CTDE). Empirical results in MaMuJoCo-Ant show that RQF improves cooperative coordination and enables swift recovery after a malfunction by reweighting agent relationships, outperforming Independent Q-Functionals (IQF) in adaptive scenarios. The approach is sample-efficient and well-suited to robotic modules that must maintain coordination in the presence of failures. Possible extensions include more complex multi-agent malfunctions and comparisons with alternative MARL methods.

Abstract

Cooperative multi-agent learning methods are essential in developing effective cooperation strategies in multi-agent domains. In robotics, these methods extend beyond multi-robot scenarios to single-robot systems, where they enable coordination among different robot modules (e.g., robot legs or joints). However, current methods often struggle to quickly adapt to unforeseen failures, such as a malfunctioning robot leg, especially after the algorithm has converged to a strategy. To overcome this, we introduce the Relational Q-Functionals (RQF) framework. RQF leverages a relational network, representing agents' relationships, to enhance adaptability, providing resilience against malfunction(s). Our algorithm also efficiently handles continuous state-action domains, making it adept for robotic learning tasks. Our empirical results show that RQF enables agents to use these relationships effectively to facilitate cooperation and recover from an unexpected malfunction in single-robot systems with multiple interacting modules. Thus, our approach offers promising applications in multi-agent systems, particularly in scenarios with unforeseen malfunctions.
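
To make the idea of a relational network concrete, the following is a minimal Python sketch of a directed, weighted graph over four agents (for example, the four legs of the ant) and of what "reweighting" relationships after a malfunction could look like. The mixing rule, the function name `relational_values`, the convention that `weights[i][j]` is the importance agent i places on agent j, and all numeric values are assumptions made for illustration; they are not taken from the paper's equations.

```python
from typing import List


def relational_values(weights: List[List[float]], agent_values: List[float]) -> List[float]:
    """Mix per-agent values through a directed, weighted graph.

    Assumed convention (hypothetical): weights[i][j] is the importance
    agent i places on agent j, so agent i's mixed value is the weighted
    sum of all agents' values along row i.
    """
    n = len(agent_values)
    if len(weights) != n or any(len(row) != n for row in weights):
        raise ValueError("weights must be an n x n matrix")
    return [sum(w * v for w, v in zip(row, agent_values)) for row in weights]


# Four agents; initially each one places weight only on itself.
self_interested = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
values = [2.0, 1.0, 0.5, 3.0]  # made-up per-agent values
before = relational_values(self_interested, values)

# Hypothetical reweighting after agent 2 malfunctions: the other agents
# now also place some weight on agent 2's value.
reweighted = [row[:] for row in self_interested]
for i in (0, 1, 3):
    reweighted[i][2] = 0.5
after = relational_values(reweighted, values)
```

In this toy setup, changing a few edge weights changes what every healthy agent optimizes for without touching the per-agent values themselves, which is the intuition behind adapting through relationships rather than relearning from scratch.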

Paper Structure

This paper contains 13 sections, 6 equations, 3 figures, 1 table, and 1 algorithm.

Figures (3)

  • Figure 1: (a) Representation of an ant featuring four agents, each distinguished by a different color. (b) The MaMuJoCo-Ant simulation environment. (c-d) Relational networks used in RQF.
  • Figure 2: Average team rewards before and after the malfunction, which occurred at the $30000$th episode.
  • Figure 3: Robot trajectories in the x-y plane: (a) before and (b) after the malfunction, upon completing 30k and 60k training episodes, respectively. The robot covers more distance with RQF even after the malfunction.