Enhancing Multi-Agent Collaboration with Attention-Based Actor-Critic Policies

Hugo Garrido-Lestache Belinchon, Jeremy Kedziora

TL;DR

The paper tackles scaling collaboration in cooperative multi-agent reinforcement learning by addressing the exponential growth of joint action spaces. It proposes Team-Attention-Actor-Critic (TAAC), a Centralized Training/Centralized Execution method that injects multi-headed attention into both the actor and critic and includes a penalized loss to encourage diverse, complementary roles, enabling explicit inter-agent communication. TAAC is benchmarked against PPO and MAAC across Soccer, BoxJump, and Level-Based Foraging, showing superior performance in coordination-intensive tasks and competitive performance elsewhere. The results demonstrate TAAC's scalability with increasing agent counts and its ability to foster richer collaborative behaviors such as passing and synchronized positioning, highlighting its practical potential for large-scale cooperative systems.

Abstract

This paper introduces Team-Attention-Actor-Critic (TAAC), a reinforcement learning algorithm designed to enhance multi-agent collaboration in cooperative environments. TAAC employs a Centralized Training/Centralized Execution scheme incorporating multi-headed attention mechanisms in both the actor and critic. This design facilitates dynamic, inter-agent communication, allowing agents to explicitly query teammates, thereby efficiently managing the exponential growth of joint-action spaces while ensuring a high degree of collaboration. We further introduce a penalized loss function which promotes diverse yet complementary roles among agents. We evaluate TAAC in a simulated soccer environment against benchmark algorithms representing other multi-agent paradigms, including Proximal Policy Optimization and Multi-Agent Actor-Attention-Critic. We find that TAAC exhibits superior performance and enhanced collaborative behaviors across a variety of metrics (win rates, goal differentials, Elo ratings, inter-agent connectivity, balanced spatial distributions, and frequent tactical interactions such as ball possession swaps).
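The idea of agents "explicitly querying teammates" through multi-headed attention can be illustrated with a minimal sketch. This is not the authors' architecture: the names `team_attention` and `random_matrix`, the dimensions, the random stand-ins for learned projection matrices, and the choice to let each agent attend over all agents (itself included) are assumptions made only for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for a learned projection matrix."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def team_attention(embeddings, heads):
    """Scaled dot-product attention across agents, one pass per head.

    embeddings: list of n per-agent vectors, each of length d_model.
    heads: list of (W_q, W_k, W_v) triples, one triple per head.
    Returns (outputs, weights), where outputs[i] concatenates the head
    outputs for agent i and weights[h][i][j] is how strongly agent i
    attends to agent j in head h.
    """
    n = len(embeddings)
    outputs = [[] for _ in range(n)]
    weights = []
    for w_q, w_k, w_v in heads:
        queries = [matvec(w_q, e) for e in embeddings]
        keys = [matvec(w_k, e) for e in embeddings]
        values = [matvec(w_v, e) for e in embeddings]
        scale = math.sqrt(len(keys[0]))
        head_weights = []
        for i in range(n):
            # Agent i's query is scored against every agent's key.
            scores = [
                sum(q * k for q, k in zip(queries[i], keys[j])) / scale
                for j in range(n)
            ]
            attn = softmax(scores)
            head_weights.append(attn)
            # Attention-weighted mix of all agents' value vectors.
            mixed = [
                sum(attn[j] * values[j][c] for j in range(n))
                for c in range(len(values[0]))
            ]
            outputs[i].extend(mixed)
        weights.append(head_weights)
    return outputs, weights


# Toy setup (assumed sizes): 3 agents, 8-dim embeddings, 2 heads of width 4.
rng = random.Random(0)
n_agents, d_model, d_head, n_heads = 3, 8, 4, 2
embeddings = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(n_agents)]
heads = [
    tuple(random_matrix(d_head, d_model, rng) for _ in range(3))
    for _ in range(n_heads)
]
outputs, weights = team_attention(embeddings, heads)
```

Each agent ends up with one vector of length `n_heads * d_head`, and the cost of a pass grows with the square of the number of agents rather than with the size of the joint action space. In a trained model the projections would be learned and the per-agent outputs would feed the policy or value heads.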

Paper Structure

This paper contains 18 sections, 9 equations, 17 figures, 1 table.

Figures (17)

  • Figure 1: Architecture of Actor. Note that the output is a set of $n$ distributions over actions, one for each agent.
  • Figure 2: Architecture of Critic. Note that the output is a set of $n$ state-action values, one for each agent.
  • Figure 3: Evaluation environments.
  • Figure 4: BoxJump: Maximum height achieved during inference for each agent count and algorithm. In expectation, TAAC (ours) reaches greater heights as the number of agents increases. A Kolmogorov-Smirnov test and a bootstrap test suggest that the distributions for MAAC and TAAC are significantly different at 8 agents and above.
  • Figure 5: Top: LBF - Task: 2p-4f-5x5-c. Two players gather four food in a 5x5 grid world; each gathering requires all players. Bottom: LBF - Task: 3p-6f-6x6. Three players gather six food in a 6x6 grid world. Reward is normalized by food levels in both experiments.
  • ...and 12 more figures
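
The Figure 4 caption refers to a Kolmogorov-Smirnov test and a bootstrap test for comparing the MAAC and TAAC height distributions. The sketch below shows one way such a comparison could be set up. It is not the authors' procedure: the percentile bootstrap on the difference of means is an assumed variant, the function names are hypothetical, and the sample values are made-up toy numbers, not results from the paper.

```python
import random


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    points = sorted(set(a) | set(b))
    d = 0.0
    for x in points:
        cdf_a = sum(1 for v in a if v <= x) / len(a)
        cdf_b = sum(1 for v in b if v <= x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def bootstrap_mean_diff_ci(a, b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for mean(b) - mean(a)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping its original size.
        ra = [rng.choice(a) for _ in a]
        rb = [rng.choice(b) for _ in b]
        diffs.append(sum(rb) / len(rb) - sum(ra) / len(ra))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up toy samples for illustration only (not data from the paper).
toy_a = [1.0, 2.0, 2.0, 3.0, 3.0]
toy_b = [4.0, 5.0, 5.0, 6.0, 7.0]

d_stat = ks_statistic(toy_a, toy_b)
ci_low, ci_high = bootstrap_mean_diff_ci(toy_a, toy_b)
```

A KS statistic near 1 means the two empirical distributions barely overlap, and a bootstrap interval that excludes 0 points to a difference in means. Turning the KS statistic into a p-value would need an additional step that is omitted here.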