R3DM: Enabling Role Discovery and Diversity Through Dynamics Models in Multi-agent Reinforcement Learning
Harsh Goel, Mohammad Omama, Behdad Chalaki, Vaishnav Tadiparthi, Ehsan Moradi Pari, Sandeep Chinchali
TL;DR
R3DM addresses a key limitation of role-based MARL, namely that roles are derived only from an agent's past experience, by tying each agent's role to its future behavior through a learned dynamics model, formalized as a mutual-information objective. It decomposes role learning into two parts: intermediate role embeddings learned via contrastive learning on past trajectories, and intrinsic rewards that promote diverse, role-consistent future behavior, both optimized within a centralized training with decentralized execution (CTDE) framework. Empirical results on SMAC and SMACv2 show improved coordination and sample efficiency, with notable gains on hard scenarios and qualitative evidence of distinct role differentiation. The approach combines model-based dynamics with information-theoretic role discovery to achieve more reliable cooperative multi-agent behavior.
Abstract
Multi-agent reinforcement learning (MARL) has achieved significant progress in large-scale traffic control, autonomous vehicles, and robotics. Drawing inspiration from biological systems where roles naturally emerge to enable coordination, role-based MARL methods have been proposed to enhance cooperation learning for complex tasks. However, existing methods derive roles exclusively from an agent's past experience during training, neglecting the influence of those roles on its future trajectories. This paper introduces a key insight: an agent's role should shape its future behavior to enable effective coordination. Hence, we propose Role Discovery and Diversity through Dynamics Models (R3DM), a novel role-based MARL framework that learns emergent roles by maximizing the mutual information between agents' roles, observed trajectories, and expected future behaviors. R3DM optimizes this objective by first applying contrastive learning to past trajectories to derive intermediate roles; these roles then shape intrinsic rewards that, through a learned dynamics model, promote diversity in future behaviors across different roles. Benchmarking on SMAC and SMACv2 environments demonstrates that R3DM outperforms state-of-the-art MARL approaches, improving multi-agent coordination to increase win rates by up to 20%. The code is available at https://github.com/UTAustin-SwarmLab/R3DM.
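To make the contrastive step concrete, the sketch below shows a generic InfoNCE loss, a common contrastive objective whose negative (plus the log of the batch size) is a standard lower bound on mutual information. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names `info_nce_loss` and `mi_lower_bound`, the cosine-similarity scoring, the temperature value, and the pairing of one role embedding with one trajectory embedding per row are all hypothetical choices made for this example.

```python
import numpy as np


def info_nce_loss(role_emb, traj_emb, temperature=0.1):
    """Generic InfoNCE loss (illustrative, not the R3DM objective itself).

    Row i of role_emb is treated as the positive match for row i of
    traj_emb; every other row in the batch serves as a negative.
    """
    # L2-normalize so that dot products are cosine similarities.
    r = role_emb / np.linalg.norm(role_emb, axis=1, keepdims=True)
    t = traj_emb / np.linalg.norm(traj_emb, axis=1, keepdims=True)

    # Pairwise similarity scores, scaled by the temperature.
    logits = (r @ t.T) / temperature

    # Row-wise log-softmax, shifted by the row max for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Cross-entropy with the diagonal entries as the correct "class".
    return float(-np.mean(np.diag(log_probs)))


def mi_lower_bound(loss, batch_size):
    """InfoNCE bound: I(role; trajectory) >= log(batch_size) - loss."""
    return float(np.log(batch_size) - loss)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    traj = rng.normal(size=(8, 16))                 # hypothetical trajectory embeddings
    roles = traj + 0.05 * rng.normal(size=(8, 16))  # roles correlated with trajectories
    loss = info_nce_loss(roles, traj)
    print("loss:", loss, "MI lower bound:", mi_lower_bound(loss, len(traj)))
```

Minimizing this loss pulls each role embedding toward its own trajectory embedding and away from those of other agents, which is one way a mutual-information term between roles and trajectories can be optimized in practice.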
