Learning Multi-Robot Coordination through Locality-Based Factorized Multi-Agent Actor-Critic Algorithm

Chak Lam Shek, Amrit Singh Bedi, Anjon Basak, Ellen Novoseller, Nick Waytowich, Priya Narayanan, Dinesh Manocha, Pratap Tokekar

TL;DR

This work tackles the problem of relying on global rewards in decentralized multi-agent reinforcement learning for cooperative robotics. It proposes Loc-FACMAC, a locality-based factorized actor-critic method that first constructs a dependency graph to identify strongly related groups of robots and then trains one mixer per partition to estimate local joint-action values. The algorithm separates actor and critic updates to prevent distorted policy gradients and uses partition-level targets to refine policy updates. Empirical results on Hallway, Coupled Multi-Cart-Pole, and Bounded-Cooperative-Navigation (BCN) show improvements of up to 108% over baselines, with faster convergence when the locality structure is properly defined, which suggests the method can scale to real-world multi-robot coordination.

Abstract

In this work, we present a novel cooperative multi-agent reinforcement learning method called Locality-based Factorized Multi-Agent Actor-Critic (Loc-FACMAC). Existing state-of-the-art algorithms, such as FACMAC, rely on global reward information, which may not accurately reflect the quality of individual robots' actions in decentralized systems. We integrate the concept of locality into critic learning, where strongly related robots form partitions during training. Robots within the same partition have a greater impact on each other, leading to more precise policy evaluation. Additionally, we construct a dependency graph to capture the relationships between robots, facilitating the partitioning process. This approach mitigates the curse of dimensionality and prevents robots from using irrelevant information. Our method improves on existing algorithms by focusing on local rewards and leveraging partition-based learning to enhance training efficiency and performance. We evaluate the performance of Loc-FACMAC in three environments: Hallway, Multi-cartpole, and Bounded-Cooperative-Navigation. We explore the impact of partition sizes on performance and compare the results with baseline MARL algorithms such as LOMAQ, FACMAC, and QMIX. The experiments reveal that, if the locality structure is defined properly, Loc-FACMAC outperforms these baseline algorithms by up to 108%, indicating that exploiting the locality structure in the actor-critic framework improves MARL performance.
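
The locality idea in the abstract (a dependency graph, partitions of strongly related robots, and rewards shared only within a partition) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the five-robot chain graph, the two partitions, the reward values, and the helper names `k_hop_neighborhood` and `partition_rewards` are hypothetical and are not taken from the paper's implementation.

```python
from collections import deque


def k_hop_neighborhood(adjacency, source, k):
    """Return all robots within k hops of `source`, including `source` itself."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == k:
            continue  # do not expand beyond k hops
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, dist + 1))
    return seen


def partition_rewards(partitions, local_rewards):
    """Sum per-robot rewards inside each partition (one value per partition)."""
    return [sum(local_rewards[r] for r in part) for part in partitions]


# Hypothetical dependency graph: a chain of five robots (assumed, not from the paper).
adjacency = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

# Hypothetical partitioning: robots 1-3 form one group, robots 4-5 another.
partitions = [{1, 2, 3}, {4, 5}]

# Hypothetical local rewards: only robots 1 and 2 contributed to a delivery.
local_rewards = {1: 1.0, 2: 1.0, 3: 0.0, 4: 0.0, 5: 0.0}

one_hop_of_2 = k_hop_neighborhood(adjacency, 2, 1)
rewards = partition_rewards(partitions, local_rewards)
print(one_hop_of_2, rewards)
```

In this toy setting, the partition containing robots 4 and 5 receives no reward for a delivery it did not take part in, which is the contrast with a single global reward that the abstract describes.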

Paper Structure

This paper contains 15 sections, 4 equations, 10 figures, 2 tables, 2 algorithms.

Figures (10)

  • Figure 1: In a warehouse environment, the task is to deliver a package to the target location for a reward of +1 per delivery. Two robots are involved in the process while the other robots are not. (a) In the single mixer approach, all robots share the reward, even those not involved. (b) In Loc-FACMAC (ours), mixers are assigned to partitions, and only agents within the partition or k-hop neighborhood share the reward.
  • Figure 2: This figure considers a five-robot network and shows how different robots are connected. It also shows two ways of partitioning the network to leverage locality in the learning process. Robot 4 and robot 5 are away from robot 1, robot 2, and robot 3. Robot 4 and robot 5 can be grouped separately.
  • Figure 3: This figure presents the architecture of the proposed Loc-FACMAC framework, which consists of Actors, Critics, and Mixers.
  • Figure 4: This figure describes the partition for one specific multi-cartpole environment. The six carts can be divided, for example, into 2-2-2 or 3-3 partitions. Other partitions, such as 1-2-3, are also possible.
  • Figure 5: This figure describes the partition for one specific environment of Hallway. We can divide twelve robots into five partitions (2-2-2-3-3).
  • ...and 5 more figures
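
The partition labels in the captions of Figures 4 and 5 (2-2-2, 3-3, 1-2-3, and 2-2-2-3-3) can be read as group sizes. The sketch below turns such a label into explicit groups of agent indices. It assumes that agents are numbered from 1 and that each group consists of consecutively numbered agents; the captions do not state this, and the helper name `split_by_sizes` is hypothetical.

```python
def split_by_sizes(agent_ids, sizes):
    """Split `agent_ids` into consecutive groups with the given sizes."""
    if sum(sizes) != len(agent_ids):
        raise ValueError("group sizes must add up to the number of agents")
    groups, start = [], 0
    for size in sizes:
        groups.append(agent_ids[start:start + size])
        start += size
    return groups


# Six carts (multi-cartpole), numbered 1..6 by assumption.
carts = list(range(1, 7))
two_two_two = split_by_sizes(carts, [2, 2, 2])
three_three = split_by_sizes(carts, [3, 3])
one_two_three = split_by_sizes(carts, [1, 2, 3])

# Twelve robots (Hallway), numbered 1..12 by assumption.
hallway = split_by_sizes(list(range(1, 13)), [2, 2, 2, 3, 3])

print(two_two_two)
print(three_three)
print(one_two_three)
print(hallway)
```

Under this reading, each resulting group would be assigned its own mixer, so the choice of sizes controls how many mixers are trained and how many agents each one covers.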