Robust Coordination under Misaligned Communication via Power Regularization

Nancirose Piazza; Amirhossein Karimi; Behnia Soleymani; Vahid Behzadan; Stefan Sarkadi

TL;DR

The paper addresses the vulnerability of communication-enabled multi-agent reinforcement learning to misaligned or adversarial messages. It introduces Communicative Power Regularization (CPR), which augments standard power regularization by explicitly accounting for the power of communication and incorporating adversarial messages during training. CPR combines standard power with a communication-specific term, integrating it into the agent value function via $V_i(s, a) = V_i^{\pi}(s, a) + \lambda V_i^{\pi, \rho_{ij}}(s, a)$ and defining $\rho_{ij}^{\text{CPR}} = \rho_{ij}^{\text{Standard}} + \rho_{ij}^{\text{Communication}}$ with $R_i^{\text{power}}(s, \pi) = -\rho_{ij}^{\text{CPR}}(\pi^i, \pi^j, s^i, m^j)$. Empirical results in Grid Coverage, Predator-Prey, and Red-Door-Blue-Door demonstrate that CPR enhances robustness to adversarial communication while preserving cooperative performance, and in some cases yields large improvements and better stability. Overall, CPR offers a scalable, practical approach to secure and resilient cooperative MARL under misaligned communication.
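The three formulas in the TL;DR compose in a simple way, which the following minimal Python sketch illustrates. It is not the authors' implementation: the function names (`cpr_power`, `power_reward`, `regularized_value`) are hypothetical, and the two power terms are taken as already-computed numbers because this summary does not say how each one is estimated.

```python
def cpr_power(standard_power: float, communication_power: float) -> float:
    """rho_ij^CPR = rho_ij^Standard + rho_ij^Communication.

    Both inputs are assumed to be precomputed scalars; how they are
    estimated is not specified in this summary.
    """
    return standard_power + communication_power


def power_reward(standard_power: float, communication_power: float) -> float:
    """R_i^power = -rho_ij^CPR: more power held over agent i means a lower reward."""
    return -cpr_power(standard_power, communication_power)


def regularized_value(task_value: float, power_value: float, lam: float) -> float:
    """V_i = V_i^pi + lambda * V_i^{pi, rho_ij}.

    task_value  : value under the task reward (V_i^pi)
    power_value : value under the power reward (V_i^{pi, rho_ij})
    lam         : regularization weight lambda (a hyperparameter)
    """
    return task_value + lam * power_value


if __name__ == "__main__":
    # Illustrative numbers only, not results from the paper.
    rho = cpr_power(standard_power=0.5, communication_power=0.25)
    r_power = power_reward(0.5, 0.25)
    v = regularized_value(task_value=1.0, power_value=r_power, lam=0.5)
    print(rho, r_power, v)
```

Under these assumptions, setting `lam` to 0 recovers the unregularized task value, and larger values of `lam` weight the power penalty more heavily against task return.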

Abstract

Effective communication in Multi-Agent Reinforcement Learning (MARL) can significantly enhance coordination and collaborative performance in complex and partially observable environments. However, reliance on communication can also introduce vulnerabilities when agents are misaligned, potentially leading to adversarial interactions that exploit implicit assumptions of cooperative intent. Prior work has addressed adversarial behavior through power regularization, which controls the influence one agent exerts over another, but has largely overlooked the role of communication in these dynamics. This paper introduces Communicative Power Regularization (CPR), which extends power regularization specifically to communication channels. By explicitly quantifying and constraining agents' communicative influence during training, CPR actively mitigates vulnerabilities arising from misaligned or adversarial communication. Evaluations across the benchmark environments Red-Door-Blue-Door, Predator-Prey, and Grid Coverage demonstrate that our approach significantly enhances robustness to adversarial communication while preserving cooperative performance, offering a practical framework for secure and resilient cooperative MARL systems.

Paper Structure (9 sections, 13 equations, 2 figures, 6 tables)

Related Work
Preliminaries
Communicative MARL
Implicit Communication via Graph Neural Networks
Power
Power Regularization Over Communication
Communicative Power Regularization (CPR)
Experiment Results
Conclusion

Figures (2)

Figure 1: Grid Coverage. Average cooperative coverage percentage (over 100 trials) across varying team compositions, comparing agents trained with CPR (blue) and without CPR (red).
Figure 2: Grid Coverage. Cumulative average score of CPR-trained agents (5 cooperative, 100 trials), comparing performance with (blue triangles) and without (orange circles) communication.
