Revisiting Communication Efficiency in Multi-Agent Reinforcement Learning from the Dimensional Analysis Perspective
Chuxiong Sun, Peng He, Rui Wang, Changwen Zheng
TL;DR
This work addresses communication inefficiency in multi-agent reinforcement learning through dimensional analysis of message embeddings. The proposed method, DRMAC, combines a redundancy-reduction objective $\mathcal{L}_{RR}$, which decorrelates the dimensions of the embedded messages, with a learnable Information Selective Network (ISN), which applies a dimensional mask to emphasize decision-relevant information. The ISN is trained via meta-learning with second-order gradients. The method is designed to be plug-and-play: it improves existing MARL communication baselines (e.g., MASIA, SMS, TarMAC), and also non-communication baselines when combined with them, across the Hallway and StarCraft II SMAC environments. Empirically, DRMAC reduces dimensional redundancy and suppresses confounders, yielding better performance and robustness on diverse, complex tasks and generalizing across baselines and settings. The approach offers a practical route to more efficient inter-agent communication by focusing on the structure of the embeddings at the receiving end rather than solely on the sending end.
Abstract
In this work, we introduce a novel perspective, dimensional analysis, to address the challenge of communication efficiency in Multi-Agent Reinforcement Learning (MARL). Our findings reveal that simply optimizing the content and timing of communication at the sending end is insufficient to fully resolve communication efficiency issues. Even after optimized and gated messages are applied, dimensional redundancy and confounders persist in the integrated message embeddings at the receiving end, and they negatively impact communication quality and decision-making. To address these challenges, we propose Dimensional Rational Multi-Agent Communication (DRMAC), designed to mitigate both dimensional redundancy and confounders in MARL. DRMAC incorporates a redundancy-reduction regularization term that encourages the decoupling of information across dimensions within the learned representations of integrated messages. Additionally, we introduce a dimensional mask that dynamically adjusts gradient weights during training to eliminate the influence of decision-irrelevant dimensions. We evaluate DRMAC across a diverse set of multi-agent tasks, demonstrating its superior performance over existing state-of-the-art methods in complex scenarios. Furthermore, the plug-and-play nature of DRMAC's key modules highlights its generalizability: it serves as a valuable complement to, rather than a replacement for, existing multi-agent communication strategies.
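To make the two ideas in the abstract concrete, the following is a minimal sketch and not the authors' implementation. The exact form of the redundancy-reduction term and of the dimensional mask is not given in this section, so the sketch assumes a common decorrelation penalty (the sum of squared off-diagonal entries of the correlation matrix between embedding dimensions) and a fixed, hand-written mask in place of a learned one. All function names are hypothetical.

```python
import math


def correlation_matrix(embeddings):
    """Pearson correlation between embedding dimensions over a batch.

    `embeddings` is a list of equal-length lists: one row per sample,
    one column per embedding dimension.
    """
    n = len(embeddings)
    d = len(embeddings[0])
    eps = 1e-8  # guards against division by zero for constant dimensions
    means = [sum(row[j] for row in embeddings) / n for j in range(d)]
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in embeddings) / n)
        for j in range(d)
    ]
    # Standardize every dimension to zero mean and (roughly) unit variance.
    z = [
        [(row[j] - means[j]) / (stds[j] + eps) for j in range(d)]
        for row in embeddings
    ]
    return [
        [sum(z[k][i] * z[k][j] for k in range(n)) / n for j in range(d)]
        for i in range(d)
    ]


def redundancy_penalty(embeddings):
    """Assumed stand-in for a redundancy-reduction term.

    Sums the squared off-diagonal correlations, so the value is zero when
    all dimensions are pairwise uncorrelated and grows as they overlap.
    """
    c = correlation_matrix(embeddings)
    d = len(c)
    return sum(c[i][j] ** 2 for i in range(d) for j in range(d) if i != j)


def apply_mask(embedding, mask):
    """Scale each dimension by a weight in [0, 1] (assumed mask form).

    A weight of 0 removes a dimension's influence; a weight of 1 keeps it.
    """
    return [m * x for m, x in zip(mask, embedding)]


# Second dimension is an exact multiple of the first: fully redundant.
redundant = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
# Dimensions vary independently: no redundancy.
decorrelated = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]

penalty_redundant = redundancy_penalty(redundant)
penalty_decorrelated = redundancy_penalty(decorrelated)
masked = apply_mask([0.5, -2.0, 3.0], [1.0, 0.0, 0.5])
```

In this sketch the penalty is large for the redundant batch and zero for the decorrelated one, which is the behavior a redundancy-reduction regularizer is meant to reward. In an actual training setup the mask weights would be produced by a learned network rather than written by hand.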
