Reducing Variance Caused by Communication in Decentralized Multi-agent Deep Reinforcement Learning

Changxi Zhu; Mehdi Dastani; Shihan Wang

TL;DR

This work analyzes how communication among critics affects policy-gradient variance in the Decentralized Communicating Critics and Decentralized Actors (DCCDA) setting of MADRL, proving that $\mathrm{Var}(\hat{g}^i_{DCCDA}) \geq \mathrm{Var}(\hat{g}^i_{CTDE})$ under both ideal and noisy conditions, where CTDE denotes centralized training with decentralized execution. It then introduces a modular variance-reduction framework comprising a message-dependent baseline (OB) and KL-based policy regularization that aligns actors with critic guidance, and shows that these techniques can be plugged into existing DCCDA methods. The authors validate their approach on the StarCraft Multi-Agent Challenge and Traffic Junction, reporting reduced gradient variance and improved learning performance for the OB-KL variants (e.g., GAAC-OB-KL, IPPO-Comm-OB-KL). The results suggest practical benefits for robust coordination in partially observable multi-agent systems with communication, with potential extension to continuous message spaces and broader Comm-MADRL settings.

Abstract

In decentralized multi-agent deep reinforcement learning (MADRL), communication can help agents understand the environment better and thus coordinate their behaviors. Nevertheless, communication may involve uncertainty, which potentially introduces variance into the learning of decentralized agents. In this paper, we focus on a specific decentralized MADRL setting with communication and conduct a theoretical analysis of the variance that communication causes in policy gradients. We propose modular techniques to reduce this variance during training. We integrate our modular techniques into two existing algorithms for decentralized MADRL with communication and evaluate them on multiple tasks in the StarCraft Multi-Agent Challenge and Traffic Junction domains. The results show that decentralized MADRL communication methods extended with our proposed techniques not only achieve high-performing agents but also reduce variance in policy gradients during training.
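As background for the variance-reduction idea, the following is a minimal sketch of the general baseline trick for score-function policy gradients. It is not the paper's message-dependent baseline or its KL regularizer. The single-agent, three-action softmax policy, the reward values, the unit-variance return noise, and the function names `sample_gradients` and `compare_baselines` are all assumptions made for illustration.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()


def sample_gradients(logits, mean_rewards, baseline, n_samples, seed):
    """Draw per-sample score-function gradient estimates.

    Each sample is grad_logits log pi(a) * (return - baseline), where the
    return is the action's mean reward plus unit-variance Gaussian noise
    (an illustrative stand-in for uncertainty in the learning signal).
    """
    rng = np.random.default_rng(seed)
    probs = softmax(logits)
    n_actions = len(logits)
    actions = rng.choice(n_actions, size=n_samples, p=probs)
    returns = mean_rewards[actions] + rng.normal(0.0, 1.0, size=n_samples)
    # For a softmax policy, grad_logits log pi(a) = one_hot(a) - probs.
    scores = np.eye(n_actions)[actions] - probs
    return scores * (returns - baseline)[:, None]


def compare_baselines(n_samples=20000, seed=0):
    """Compare gradient estimates with no baseline and with a mean baseline."""
    logits = np.zeros(3)                          # hypothetical uniform policy
    mean_rewards = np.array([10.0, 11.0, 12.0])   # hypothetical rewards
    # Action-independent baseline: the expected reward under the policy.
    baseline = float(softmax(logits) @ mean_rewards)
    # The same seed gives both estimators identical actions and noise.
    plain = sample_gradients(logits, mean_rewards, 0.0, n_samples, seed)
    centered = sample_gradients(logits, mean_rewards, baseline, n_samples, seed)
    return {
        "mean_plain": plain.mean(axis=0),
        "mean_centered": centered.mean(axis=0),
        "var_plain": float(plain.var(axis=0).sum()),
        "var_centered": float(centered.var(axis=0).sum()),
    }


if __name__ == "__main__":
    for name, value in compare_baselines().items():
        print(name, value)
```

Because the baseline does not depend on the sampled action, subtracting it leaves the expected gradient unchanged while shrinking the magnitude of each sample, so the two sample means agree up to sampling error and the summed variance drops. A baseline that also conditions on received messages, as the paper proposes, would need the same action-independence to remain unbiased.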
