Robust Coordination under Misaligned Communication via Power Regularization
Nancirose Piazza, Amirhossein Karimi, Behnia Soleymani, Vahid Behzadan, Stefan Sarkadi
TL;DR
The paper addresses the vulnerability of communication-enabled multi-agent reinforcement learning to misaligned or adversarial messages. It introduces Communicative Power Regularization (CPR), which augments standard power regularization by explicitly accounting for the power of communication and incorporating adversarial messages during training. CPR combines standard power with a communication-specific term, integrating it into the agent value function via $V_i(s, a) = V_i^{\pi}(s, a) + \lambda V_i^{\pi, \rho_{ij}}(s, a)$ and defining $\rho_{ij}^{\text{CPR}} = \rho_{ij}^{\text{Standard}} + \rho_{ij}^{\text{Communication}}$ with $R_i^{\text{power}}(s, \pi) = -\rho_{ij}^{\text{CPR}}(\pi^i, \pi^j, s^i, m^j)$. Empirical results in Grid Coverage, Predator-Prey, and Red-Door-Blue-Door demonstrate that CPR enhances robustness to adversarial communication while preserving cooperative performance, and in some cases yields large improvements and better stability. Overall, CPR offers a scalable, practical approach to secure and resilient cooperative MARL under misaligned communication.
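To make the three formulas in the TL;DR concrete, here is a minimal Python sketch of how the terms compose. It is an illustration under stated assumptions, not the authors' implementation. The summary does not say how $\rho_{ij}^{\text{Standard}}$ and $\rho_{ij}^{\text{Communication}}$ are computed, so the helper `value_gap` uses a hypothetical proxy: the drop in agent $i$'s value under the worst-case perturbation by agent $j$. All function names and numbers are invented for the example, and the power value is taken over a single step so that it equals the power reward.

```python
from typing import Sequence


def value_gap(nominal_value: float, perturbed_values: Sequence[float]) -> float:
    """Hypothetical power proxy (an assumption, not from the paper):
    how much agent i's value drops under agent j's worst-case perturbation.
    Clamped at zero so a harmless perturbation contributes no power."""
    return max(0.0, nominal_value - min(perturbed_values))


def cpr_power(rho_standard: float, rho_communication: float) -> float:
    """rho^CPR = rho^Standard + rho^Communication."""
    return rho_standard + rho_communication


def power_reward(rho_cpr: float) -> float:
    """R^power = -rho^CPR: more power held over agent i means a larger penalty."""
    return -rho_cpr


def regularized_value(task_value: float, power_value: float, lam: float) -> float:
    """V_i = V_i^pi + lambda * V_i^{pi, rho_ij}."""
    return task_value + lam * power_value


# Illustrative numbers only.
nominal = 10.0
rho_std = value_gap(nominal, [8.0, 9.5])   # worst adversarial action by j
rho_com = value_gap(nominal, [6.0, 9.0])   # worst adversarial message m^j
rho = cpr_power(rho_std, rho_com)
r_power = power_reward(rho)
# Single-step case: the power value equals the power reward.
v_total = regularized_value(nominal, r_power, lam=0.5)
print(rho_std, rho_com, rho, r_power, v_total)
```

In this toy setting the adversarial message costs agent $i$ more value than the adversarial action, so the communication term dominates the penalty. Increasing `lam` pushes the agent toward policies that depend less on what agent $j$ says or does.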
Abstract
Effective communication in Multi-Agent Reinforcement Learning (MARL) can significantly enhance coordination and collaborative performance in complex and partially observable environments. However, reliance on communication can also introduce vulnerabilities when agents are misaligned, potentially leading to adversarial interactions that exploit implicit assumptions of cooperative intent. Prior work has addressed adversarial behavior through power regularization, which controls the influence one agent exerts over another, but has largely overlooked the role of communication in these dynamics. This paper introduces Communicative Power Regularization (CPR), which extends power regularization to communication channels. By explicitly quantifying and constraining agents' communicative influence during training, CPR mitigates vulnerabilities arising from misaligned or adversarial communication. Evaluations across the benchmark environments Red-Door-Blue-Door, Predator-Prey, and Grid Coverage demonstrate that our approach significantly enhances robustness to adversarial communication while preserving cooperative performance, offering a practical framework for secure and resilient cooperative MARL systems.
