Multi-Agent Reinforcement Learning with Communication-Constrained Priors

Guang Yang, Tianpei Yang, Jingwen Qiao, Yanqing Wu, Jing Huo, Xingguo Chen, Yang Gao

TL;DR

Real-world MARL faces lossy, bandwidth-constrained communications that degrade cooperative policy learning. The authors propose a generalized communication-constrained prior model and a dual mutual information estimator (Du-MIE) to differentiate lossy from lossless messages and to quantify their impact on behavior. They integrate these signals into a communication-constrained MARL framework (CC-MADDPG) with reward shaping to emphasize reliable messages and suppress corrupted ones. Empirical results across Markov-based and distance-based constraints demonstrate robustness and improved performance over baselines.

Abstract

Communication is an effective means of improving cooperative policy learning in multi-agent systems. However, lossy communication is prevalent in most real-world scenarios. Existing multi-agent reinforcement learning methods with communication have limited scalability and robustness, and therefore struggle in complex and dynamic real-world environments. To address these challenges, we propose a generalized communication-constrained model that uniformly characterizes communication conditions across different scenarios. We then use this model as a learning prior to distinguish between lossy and lossless messages in specific scenarios. Additionally, we decouple the impact of lossy and lossless messages on distributed decision-making using a dual mutual information estimator, and introduce a communication-constrained multi-agent reinforcement learning framework that quantifies the impact of communication messages and incorporates it into the global reward. Finally, we validate the effectiveness of our approach across several communication-constrained benchmarks.
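To make the idea of a communication-constrained prior more concrete, the following is a minimal sketch of the two families of constraints named in the TL;DR (Markov-based and distance-based). It is not the authors' model: the two-state Markov chain (in the style of a Gilbert-Elliott channel), the hard range cutoff, and every name and parameter (`markov_link_mask`, `distance_link_mask`, `p_good_to_bad`, `p_bad_to_good`, `max_range`) are assumptions chosen for illustration. Each function returns a boolean mask marking which messages would arrive losslessly.

```python
import math
import random


def markov_link_mask(steps, p_good_to_bad, p_bad_to_good, rng):
    """Assumed two-state Markov link. True means the message at that step is lossless."""
    good = True  # assumption: the link starts in the good (lossless) state
    mask = []
    for _ in range(steps):
        mask.append(good)
        if good:
            # Stay good unless a good-to-bad transition fires.
            good = rng.random() >= p_good_to_bad
        else:
            # Recover only when a bad-to-good transition fires.
            good = rng.random() < p_bad_to_good
    return mask


def distance_link_mask(positions, max_range):
    """Assumed distance-based link. mask[i][j] is True when agent j is within range of agent i."""
    n = len(positions)
    return [
        [i != j and math.dist(positions[i], positions[j]) <= max_range for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    print(markov_link_mask(10, 0.2, 0.5, rng))
    print(distance_link_mask([(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)], max_range=2.0))
```

A mask of this kind could serve as the label that tells a learner which received messages to treat as lossy and which as lossless; how the paper actually constructs and uses its prior is not specified in this summary.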

Paper Structure

This paper contains 23 sections, 8 equations, 1 figure, 4 tables, 1 algorithm.

Figures (1)

  • Figure 1: The overall framework for communication-constrained MARL. It can be divided into three main steps: ➀ Distinguishing between lossy and lossless messages by constructing communication link priors; ➁ Shaping the global reward through learning Du-MIE for constrained communication; ➂ Stably optimizing multi-agent policies based on MARL algorithms.
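Step ➁ of the caption, shaping the global reward, can be illustrated with a small sketch. This summary does not give the paper's shaping formula, so the linear form below is an assumption, as are the function name `shaped_global_reward` and the weights `alpha` and `beta`. The two mutual information values are taken as precomputed inputs; the sketch does not implement Du-MIE itself.

```python
def shaped_global_reward(env_reward, mi_lossless, mi_lossy, alpha=0.1, beta=0.1):
    """Hypothetical linear shaping, not the paper's formula.

    env_reward:  global reward returned by the environment
    mi_lossless: estimated mutual information between lossless messages and actions
    mi_lossy:    estimated mutual information between lossy messages and actions
    alpha, beta: assumed non-negative weights for the bonus and the penalty
    """
    # Reward reliance on reliable messages, penalize reliance on corrupted ones.
    return env_reward + alpha * mi_lossless - beta * mi_lossy


if __name__ == "__main__":
    print(shaped_global_reward(1.0, mi_lossless=2.0, mi_lossy=1.0, alpha=0.5, beta=0.25))
```

With both weights set to zero the shaped reward reduces to the environment reward, which gives a simple baseline for checking whether the shaping terms help.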