GTDE: Grouped Training with Decentralized Execution for Multi-agent Actor-Critic

Mengxian Li, Qi Wang, Yongjun Xu

TL;DR

GTDE addresses the scalability bottlenecks of traditional MARL training paradigms by introducing grouped training with decentralized execution. It learns dynamic agent groupings from local observation histories via a Gumbel-Sigmoid-based adaptive grouping module and aggregates intra-group information through either matrix multiplication or a graph attention network, removing the need for a centralized module. Empirically, GTDE outperforms DTDE and CTDE on SMACv2, Battle, and Gather, especially as the number of agents grows, while substantially reducing the amount of information required during training. The approach offers a practical path to scalable multi-agent coordination in partially observable environments with limited communication.

Abstract

The rapid advancement of multi-agent reinforcement learning (MARL) has given rise to diverse training paradigms for learning the policy of each agent in a multi-agent system. The paradigms of decentralized training and execution (DTDE) and centralized training with decentralized execution (CTDE) have been proposed and widely applied. However, as the number of agents increases, the inherent limitations of these frameworks significantly degrade performance metrics such as win rate and total reward. To reduce the influence of the growing number of agents on these metrics, we propose a novel training paradigm, grouped training with decentralized execution (GTDE). This framework eliminates the need for a centralized module and relies solely on local information, effectively meeting the training requirements of large-scale multi-agent systems. Specifically, we first introduce an adaptive grouping module, which assigns agents to groups based on their observation histories. To enable end-to-end training, GTDE uses Gumbel-Sigmoid for efficient point-to-point sampling from the grouping distribution while preserving gradient backpropagation. Because the number of members in a group is not fixed, a group information aggregation module, implemented in two alternative ways, merges member information within each group. Empirical results show that in a cooperative environment with 495 agents, GTDE increased the total reward by an average of 382% compared to the baseline. In a competitive environment with 64 agents, GTDE achieved a 100% win rate against the baseline.
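
The two mechanisms named in the abstract, Gumbel-Sigmoid sampling of group membership and aggregation of member information by matrix multiplication, can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the function names (`gumbel_sigmoid`, `harden`, `aggregate`), the temperature `tau`, the 0.5 threshold, and the plain-sum aggregation are assumptions made for illustration, and the grouping logits here are hand-written numbers rather than outputs of a network over observation histories.

```python
import math
import random


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _gumbel(rng):
    # Gumbel(0, 1) noise by inverse transform; u is clamped away from 0 and 1.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def gumbel_sigmoid(logits, tau=1.0, rng=random):
    """Relaxed Bernoulli sample for every entry: sigmoid((logit + g1 - g2) / tau)."""
    return [[_sigmoid((x + _gumbel(rng) - _gumbel(rng)) / tau) for x in row]
            for row in logits]


def harden(soft, threshold=0.5):
    """Discretize the relaxed sample into a 0/1 membership mask (assumed threshold).

    In an autodiff framework, a straight-through estimator would typically use
    hard + soft - stop_gradient(soft) so gradients follow the relaxed sample.
    This sketch covers the forward pass only.
    """
    return [[1.0 if p > threshold else 0.0 for p in row] for row in soft]


def aggregate(mask, features):
    """Matrix product mask @ features: row i sums the features of agents k with mask[i][k] = 1."""
    n, d = len(features), len(features[0])
    return [[sum(row[k] * features[k][j] for k in range(n)) for j in range(d)]
            for row in mask]


# Hypothetical grouping logits for 3 agents: entry [i][k] scores "agent i uses agent k".
rng = random.Random(0)
logits = [[4.0, 4.0, -4.0],
          [-4.0, 4.0, -4.0],
          [-4.0, -4.0, 4.0]]
features = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]  # one feature vector per agent

soft = gumbel_sigmoid(logits, tau=0.5, rng=rng)
hard = harden(soft)
group_info = aggregate(hard, features)
```

Because the mask has one row per agent, the matrix product handles groups of any size without padding. A sum is used here for simplicity; whether the paper normalizes the aggregate is not stated in this summary.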

Paper Structure (26 sections, 9 equations, 7 figures, 4 tables)

Figures (7)

  • Figure 1: Each circle represents the observation of an agent, and a unidirectional arrow such as $o_i\rightarrow o_k$ indicates that agent $i$ needs the observation of agent $k$ during training.
  • Figure 2: Overview of GTDE framework. The red dashed line represents gradient flow.
  • Figure 3: Test curve of average win rate. The shaded area represents the range between the minimum and maximum values across 5 seeds, while the solid line in the center denotes the average value.
  • Figure 4: The total reward curve of the Battle scenario (64 vs. 64).
  • Figure 5: Partial links in the Battle scenario.
  • ...and 2 more figures

Theorems & Definitions (1)

  • Definition 1