Symmetries-enhanced Multi-Agent Reinforcement Learning
Nikolaos Bousias, Stefanos Pertigkiozoglou, Kostas Daniilidis, George Pappas
TL;DR
This work tackles generalization and scalability challenges in multi-agent reinforcement learning by introducing extrinsic symmetries as a policy inductive bias and formalizing them within a geometric framework. It proposes the Group Equivariant Graphormer, a modular, group-canonicalized architecture that can realize $G$-equivariance on tensorial graph features for distributed swarming tasks, including $SE(3)$-based scenarios. The authors show that, under a $G$-equivariant formulation, the optimal policy is itself $G$-equivariant and that non-equivariant dynamics can be lifted to an extended $G$-equivariant system with a symmetry-breaking projection to recover the original policy. Empirically, the method yields substantial gains in generalization and zero-shot scalability, achieving lower collision rates and higher task success across varying swarm sizes for symmetry-breaking quadrotors.
Abstract
Multi-agent reinforcement learning has emerged as a powerful framework for enabling agents to learn complex, coordinated behaviors, but it faces persistent challenges in generalization, scalability, and sample efficiency. Recent work has sought to alleviate these issues by embedding the intrinsic symmetries of the system in the policy. Yet most dynamical systems exhibit few or no symmetries to exploit. This paper presents a novel framework for embedding extrinsic symmetries in multi-agent system dynamics, which enables symmetry-enhanced methods to address systems with insufficient intrinsic symmetries and expands the scope of equivariant learning to a wide variety of MARL problems. Central to our framework is the Group Equivariant Graphormer, a group-modular architecture specifically designed for distributed swarming tasks. Extensive experiments on a swarm of symmetry-breaking quadrotors validate the effectiveness of our approach, showcasing its potential for improved generalization and zero-shot scalability. Our method achieves significant reductions in collision rates and improves task success rates across a diverse range of scenarios and varying swarm sizes.
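To make the notion of a group-canonicalized, $G$-equivariant policy concrete, the following is a minimal sketch and not the Group Equivariant Graphormer itself. It assumes the planar rotation group $SO(2)$ instead of $SE(3)$, a hand-picked canonical frame (the direction of the first agent's position), and a hypothetical non-equivariant map `arbitrary_policy`; all function names are illustrative. The wrapper rotates the inputs into the canonical frame, applies the arbitrary map, and rotates the outputs back, so that $\pi(g \cdot s) = g \cdot \pi(s)$ holds by construction.

```python
import numpy as np

def rot(theta):
    """2D rotation matrix for angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def arbitrary_policy(positions):
    """Hypothetical non-equivariant map from (N, 2) positions to (N, 2) actions."""
    W = np.array([[0.5, -1.0], [2.0, 0.3]])
    return np.tanh(positions @ W.T) + np.array([0.1, -0.2])

def canonicalized_policy(positions):
    """SO(2)-equivariant wrapper built by canonicalization.

    Assumption: the frame is the angle of agent 0's position, which must be nonzero.
    """
    ref = positions[0]
    theta = np.arctan2(ref[1], ref[0])        # frame angle; shifts by alpha under rotation by alpha
    canonical = positions @ rot(-theta).T     # rotate every agent into the canonical frame
    actions = arbitrary_policy(canonical)     # any map may be used here
    return actions @ rot(theta).T             # rotate the actions back to the original frame

rng = np.random.default_rng(0)
positions = rng.normal(size=(5, 2))           # 5 agents in the plane
g = rot(0.7)                                  # an arbitrary group element

# Equivariance check: acting on the input equals acting on the output.
lhs = canonicalized_policy(positions @ g.T)
rhs = canonicalized_policy(positions) @ g.T
print(np.allclose(lhs, rhs))
```

The same check applied to `arbitrary_policy` directly would generally fail, which is the gap that canonicalization closes in this toy setting.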
