Leveraging Partial Symmetry for Multi-Agent Reinforcement Learning
Xin Yu, Rongye Shi, Pu Feng, Yongkai Tian, Simin Li, Shuhao Liao, Wenjun Wu
TL;DR
This work addresses how to exploit symmetry in multi-agent reinforcement learning (MARL) when the symmetry is partial rather than perfect. It formalizes the partially symmetric Markov game, proves that the performance error introduced by symmetry-based training is bounded, and introduces the Partial Symmetry Exploitation (PSE) framework, which adaptively exploits symmetry through quantification, annealing, augmentation, and a symmetry-consistency loss. Experiments on classic MARL benchmarks and real-world robots show that PSE consistently improves sample efficiency and overall performance under symmetry-breaking conditions, outperforming strong baselines. The approach offers a practical way to incorporate inductive biases into realistic MARL deployments, with implications for robust coordination in heterogeneous, partially symmetric systems.
Abstract
Incorporating symmetry as an inductive bias into multi-agent reinforcement learning (MARL) has led to improvements in generalization, data efficiency, and physical consistency. While prior research has succeeded in using a perfect symmetry prior, partial symmetry in the multi-agent domain remains unexplored. To fill this gap, we introduce the partially symmetric Markov game, a new subclass of the Markov game. We then show theoretically that the performance error introduced by utilizing symmetry in MARL is bounded, implying that the symmetry prior can still be useful even when symmetry is only partial. Motivated by this insight, we propose the Partial Symmetry Exploitation (PSE) framework, which adaptively incorporates the symmetry prior in MARL under different symmetry-breaking conditions. Specifically, by adaptively adjusting how strongly symmetry is exploited, our framework achieves superior sample efficiency and overall performance for MARL algorithms. Extensive experiments demonstrate the superior performance of the proposed framework over baselines. Finally, we implement the proposed framework on a real-world multi-robot testbed to show its superiority.
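To make the ingredients named above more concrete, the following is a minimal illustrative sketch of symmetry-based data augmentation, a symmetry-consistency loss, and an annealed loss weight. It is not the PSE implementation. The 90-degree planar rotation, the linear decay schedule, the toy policies, and every function name (`rotate90`, `augment`, `consistency_loss`, `annealed_weight`) are assumptions chosen for illustration only.

```python
import numpy as np

def rotate90(v):
    """Rotate 2D vectors (last axis = (x, y)) by 90 degrees counter-clockwise."""
    x, y = v[..., 0], v[..., 1]
    return np.stack([-y, x], axis=-1)

def augment(obs, act):
    """Return the original batch followed by its rotated copy (assumed symmetry)."""
    aug_obs = np.concatenate([obs, rotate90(obs)], axis=0)
    aug_act = np.concatenate([act, rotate90(act)], axis=0)
    return aug_obs, aug_act

def consistency_loss(policy, obs):
    """Mean squared gap between policy(g . obs) and g . policy(obs)."""
    gap = policy(rotate90(obs)) - rotate90(policy(obs))
    return float(np.mean(gap ** 2))

def annealed_weight(step, total_steps, w0=1.0):
    """Hypothetical schedule: decay the symmetry-loss weight linearly from w0 to 0."""
    return w0 * max(0.0, 1.0 - step / total_steps)

# Toy policies mapping 2D observations to 2D actions.
def equivariant_policy(o):
    # Uniform scaling commutes with rotation, so this policy is equivariant.
    return -0.5 * o

def biased_policy(o):
    # A constant offset along x breaks the rotational symmetry.
    return -0.5 * o + np.array([0.3, 0.0])

obs = np.array([[1.0, 0.0], [0.5, -2.0]])
aug_obs, aug_act = augment(obs, equivariant_policy(obs))

loss_sym = consistency_loss(equivariant_policy, obs)     # zero for an equivariant policy
loss_broken = consistency_loss(biased_policy, obs)       # positive when symmetry is broken
weights = [annealed_weight(s, 100) for s in (0, 50, 200)]
```

In this sketch the consistency loss vanishes for the equivariant policy and is strictly positive for the biased one, so it can serve as a rough signal of how far a policy departs from the assumed symmetry. Scaling it by a decaying weight is one simple way to rely on the symmetry prior early in training and less later, when the prior may no longer match a partially symmetric environment.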
