Consolidation via Policy Information Regularization in Deep RL for Multi-Agent Games
Tailia Malloy, Tim Klinger, Miao Liu, Matthew Riemer, Gerald Tesauro, Chris R. Sims
TL;DR
The paper addresses nonstationarity in multi-agent reinforcement learning (MARL) by introducing a capacity-limited constraint on policy information within the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm. It formalizes a mutual-information (MI) budget $\mathcal{I}(\pi(a|s)) \le \mathcal{C}$ and a reward-regularization weight $\beta$, linking policy complexity to generalization and consolidation. Because MADDPG policies are deterministic, the authors propose techniques for approximating MI, and they present the Capacity-Limited MADDPG algorithm, which integrates these terms into the centralized-critic framework. Empirical results across cooperative, competitive, and mixed environments show improved generalization and learning stability in most tasks, although some mixed-task dynamics are sensitive to the information budget. The resulting information-theoretic regularization framework can mitigate forgetting and improve robustness in nonstationary MARL settings, with practical implications for scalable multi-agent control.
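One common way to write the relationship between the budget $\mathcal{C}$ and the weight $\beta$ is as a constrained objective and its Lagrangian-style relaxation. The form below is an illustrative reading of the summary above, not a formula quoted from the paper; the discount factor $\gamma$ and the time index $t$ are assumptions introduced here.

$$\max_{\pi}\ \mathbb{E}_{\pi}\Big[\sum_{t} \gamma^{t}\, r(s_t, a_t)\Big] \quad \text{subject to} \quad \mathcal{I}(\pi(a|s)) \le \mathcal{C}$$

$$\max_{\pi}\ \mathbb{E}_{\pi}\Big[\sum_{t} \gamma^{t}\, r(s_t, a_t)\Big] - \beta\, \mathcal{I}(\pi(a|s))$$

Under this reading, a larger $\beta$ corresponds to a tighter budget $\mathcal{C}$: the agent gives up some reward in exchange for a policy whose actions depend less on the state.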
Abstract
This paper introduces an information-theoretic constraint on learned policy complexity in the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) reinforcement learning algorithm. Previous research with a related approach in continuous control experiments suggests that this method favors learning policies that are more robust to changing environment dynamics. The multi-agent game setting naturally requires this type of robustness, as other agents' policies change throughout learning, introducing a nonstationary environment. For this reason, recent methods in continual learning are compared to our approach, termed Capacity-Limited MADDPG. Results from experimentation in multi-agent cooperative and competitive tasks demonstrate that the capacity-limited approach is a good candidate for improving learning performance in these environments.
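To make the notion of policy information concrete, the sketch below computes the mutual information $I(S;A) = \sum_s p(s) \sum_a \pi(a|s) \log \frac{\pi(a|s)}{p(a)}$ for a small tabular policy and subtracts a $\beta$-weighted penalty from a reward. It is a minimal illustration under simplifying assumptions: states and actions are discrete, and the state distribution is known. It is not the approximation the authors use for deterministic MADDPG policies, and the function names `policy_mutual_information` and `regularized_reward` are hypothetical.

```python
import numpy as np


def policy_mutual_information(policy, state_probs):
    """Return I(S;A) in nats for a tabular policy, where policy[s, a] = pi(a|s).

    Assumes discrete states and actions and a known state distribution.
    """
    policy = np.asarray(policy, dtype=float)
    state_probs = np.asarray(state_probs, dtype=float)

    # Marginal action distribution: p(a) = sum_s p(s) * pi(a|s).
    marginal = state_probs @ policy

    # Guard against division by zero; entries with pi(a|s) = 0 contribute 0.
    safe_marginal = np.where(marginal > 0, marginal, 1.0)
    ratio = np.where(policy > 0, policy / safe_marginal, 1.0)

    # Per-state KL divergence between pi(.|s) and p(.), averaged over p(s).
    per_state_kl = np.sum(policy * np.log(ratio), axis=1)
    return float(state_probs @ per_state_kl)


def regularized_reward(reward, mutual_information, beta):
    """Penalize the reward by beta times the policy information (assumed form)."""
    return reward - beta * mutual_information


if __name__ == "__main__":
    state_probs = [0.5, 0.5]

    # A policy that ignores the state carries no information about it.
    uniform_policy = [[0.5, 0.5], [0.5, 0.5]]
    # A policy that picks a different action in each state carries log(2) nats.
    state_dependent_policy = [[1.0, 0.0], [0.0, 1.0]]

    for name, pol in [("uniform", uniform_policy),
                      ("state-dependent", state_dependent_policy)]:
        mi = policy_mutual_information(pol, state_probs)
        print(name, mi, regularized_reward(1.0, mi, beta=0.5))
```

The state-independent policy pays no penalty, while the fully state-dependent one pays $\beta \log 2$. This is the sense in which an information constraint favors simpler policies that may transfer better when other agents change their behavior.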
