Enhancing Multi-Agent Collaboration with Attention-Based Actor-Critic Policies
Hugo Garrido-Lestache Belinchon, Jeremy Kedziora
TL;DR
The paper tackles the problem of scaling collaboration in cooperative multi-agent reinforcement learning, where the joint action space grows exponentially with the number of agents. It proposes Team-Attention-Actor-Critic (TAAC), a Centralized Training/Centralized Execution method that injects multi-headed attention into both the actor and the critic, enabling explicit inter-agent communication. It also adds a penalized loss that encourages diverse, complementary roles. TAAC is benchmarked against PPO and MAAC on Soccer, BoxJump, and Level-Based Foraging. It shows superior performance on coordination-intensive tasks and competitive performance elsewhere. The results demonstrate that TAAC scales with increasing agent counts and fosters richer collaborative behaviors such as passing and synchronized positioning, highlighting its practical potential for large-scale cooperative systems.
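The following worked equation makes the scaling problem concrete. Suppose each of N agents chooses from the same action set A. A centralized policy that selects all agents' actions at once must then choose from the product of the individual action sets. The numbers are purely illustrative: 5 actions per agent is an assumption, not a value taken from the paper.

```latex
|\mathcal{A}_{\text{joint}}| = \prod_{i=1}^{N} |\mathcal{A}_i| = |\mathcal{A}|^{N},
\qquad |\mathcal{A}| = 5:\quad N = 2 \Rightarrow 5^{2} = 25,\quad N = 10 \Rightarrow 5^{10} = 9{,}765{,}625.
```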
Abstract
This paper introduces Team-Attention-Actor-Critic (TAAC), a reinforcement learning algorithm designed to enhance multi-agent collaboration in cooperative environments. TAAC employs a Centralized Training/Centralized Execution scheme that incorporates multi-headed attention mechanisms in both the actor and the critic. This design facilitates dynamic inter-agent communication, allowing agents to explicitly query their teammates. As a result, TAAC efficiently manages the exponential growth of joint action spaces while ensuring a high degree of collaboration. We further introduce a penalized loss function that promotes diverse yet complementary roles among agents. We evaluate TAAC in a simulated soccer environment against benchmark algorithms representing other multi-agent paradigms, including Proximal Policy Optimization (PPO) and Multi-Agent Actor-Attention-Critic (MAAC). We find that TAAC exhibits superior performance and enhanced collaborative behaviors across a variety of metrics: win rates, goal differentials, Elo ratings, inter-agent connectivity, balanced spatial distributions, and frequent tactical interactions such as ball possession swaps.
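The attention-based communication described above can be illustrated with generic scaled dot-product multi-head self-attention over per-agent embeddings. This is a minimal sketch under stated assumptions, not TAAC's actual actor or critic. The function names (`init_params`, `agent_attention`), the random initialization, and the dimensions are hypothetical, and the paper's architecture may differ. In this sketch, each agent's row of attention weights is a probability distribution over all agents, itself included. That is one way an agent can "query" its teammates. Because the projections are shared across agents, the same parameters apply to any team size.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max along the axis for numerical stability.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(d_model, n_heads, rng):
    """Hypothetical random projections for query, key, value and output."""
    assert d_model % n_heads == 0, "d_model must be divisible by n_heads"
    scale = 1.0 / np.sqrt(d_model)
    params = {name: rng.normal(0.0, scale, (d_model, d_model))
              for name in ("Wq", "Wk", "Wv", "Wo")}
    params["n_heads"] = n_heads
    return params


def agent_attention(E, params):
    """Multi-head self-attention across agents (illustrative sketch).

    E has shape (n_agents, d_model): one embedding per agent.
    Returns the updated embeddings (n_agents, d_model) and the
    attention weights (n_heads, n_agents, n_agents).
    """
    n, d = E.shape
    h = params["n_heads"]
    dk = d // h

    def split_heads(X):
        # (n, d) -> (h, n, dk): each head works on its own slice of features.
        return X.reshape(n, h, dk).transpose(1, 0, 2)

    Q = split_heads(E @ params["Wq"])
    K = split_heads(E @ params["Wk"])
    V = split_heads(E @ params["Wv"])

    # Entry [head, i, j] scores how strongly agent i attends to agent j.
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(dk)
    weights = softmax(scores, axis=-1)  # each row sums to 1

    # Weighted sum of the attended agents' values, then merge the heads.
    heads = weights @ V                              # (h, n, dk)
    merged = heads.transpose(1, 0, 2).reshape(n, d)  # (n, d)
    return merged @ params["Wo"], weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_params(d_model=16, n_heads=4, rng=rng)
    team = rng.normal(size=(5, 16))  # 5 agents, 16-dim embeddings
    out, w = agent_attention(team, params)
    print(out.shape, w.shape)  # (5, 16) (4, 5, 5)
```

Reordering the agents simply reorders the outputs (permutation equivariance). As a result, a layer of this kind needs no fixed input slot per teammate. This property is one plausible reason attention suits growing agent counts.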
