Fully-Decentralized MADDPG with Networked Agents
Diego Bolliger, Lorenz Zauter, Robert Ziegler
TL;DR
This work tackles fully decentralized multi-agent reinforcement learning in partially observable stochastic games (POSG) by adapting MADDPG to operate on local information during both training and execution. It first introduces a fully decentralized MADDPG that uses local replay buffers and surrogate policies, which approximate the behavior of the other agents. It then adds a networked training paradigm with either hard consensus (averaging neighbors' critics) or soft consensus (penalized parameter alignment) to balance cooperation and decentralization. The authors extend these methods to adversarial and mixed settings by adjusting the gradient updates to account for potential adversaries and for observations of their actions. Empirical results in the multi-particle environment show that the decentralized variants achieve performance comparable to MADDPG while reducing computational cost. Soft consensus offers better stability, especially as the agent count grows, and the decentralized variants converge faster than fully centralized approaches in larger-scale scenarios. These findings demonstrate scalable decentralized MARL with local observations and point to future work on applying the approach to other algorithms, such as MAPPO, to further improve performance in large, networked teams.
Abstract
In this paper, we devise three actor-critic algorithms with decentralized training for multi-agent reinforcement learning in cooperative, adversarial, and mixed settings with continuous action spaces. To this end, we adapt the MADDPG algorithm by applying a networked communication approach between agents. We introduce surrogate policies in order to decentralize the training while allowing for local communication during training. In empirical tests, the decentralized algorithms achieve results comparable to the original MADDPG while reducing computational cost. The cost reduction is more pronounced with larger numbers of agents.
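The two networked training schemes named in the summary, hard consensus (averaging neighbors' critics) and soft consensus (penalized parameter alignment), can be illustrated with a minimal NumPy sketch. This is an illustration under stated assumptions, not the authors' implementation: critic parameters are flattened into vectors, the averaging weights are uniform, the soft penalty is a quadratic term `(penalty / 2) * ||theta_i - theta_j||^2` summed over neighbors, and the names `hard_consensus`, `soft_consensus_step`, `lr`, and `penalty` are hypothetical.

```python
import numpy as np

def hard_consensus(params, neighbors):
    """Replace each agent's critic parameters with the uniform average
    over itself and its neighbors (uniform weights are an assumption)."""
    return [
        np.mean([params[i]] + [params[j] for j in neighbors[i]], axis=0)
        for i in range(len(params))
    ]

def soft_consensus_step(params, grads, neighbors, lr=0.01, penalty=0.1):
    """One gradient step on each agent's local critic loss plus an assumed
    quadratic penalty (penalty / 2) * sum_j ||theta_i - theta_j||^2."""
    new_params = []
    for i, theta in enumerate(params):
        # Gradient of the penalty w.r.t. theta_i, neighbors held fixed.
        pull = sum((theta - params[j] for j in neighbors[i]),
                   np.zeros_like(theta))
        new_params.append(theta - lr * (grads[i] + penalty * pull))
    return new_params

# Three agents on a line graph: 0 - 1 - 2.
params = [np.array([0.0, 0.0]), np.array([2.0, 2.0]), np.array([4.0, 4.0])]
neighbors = {0: [1], 1: [0, 2], 2: [1]}
zero_grads = [np.zeros(2) for _ in params]

averaged = hard_consensus(params, neighbors)
aligned = soft_consensus_step(params, zero_grads, neighbors, lr=0.5, penalty=1.0)
```

In this sketch, hard consensus overwrites the local critic after each communication round, whereas soft consensus only pulls it toward the neighbors' parameters, so each agent keeps its own critic and the `penalty` coefficient controls how strongly the critics are aligned.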
