Double Distillation Network for Multi-Agent Reinforcement Learning

Yang Zhou, Siying Wang, Wenyu Chen, Ruoning Zhang, Zhitong Zhao, Zixuan Zhang

TL;DR

The paper tackles non-stationarity and partial observability in cooperative multi-agent reinforcement learning (MARL) under the centralized training with decentralized execution (CTDE) framework. It introduces the Double Distillation Network (DDN), which combines an External Distillation Module (a Global Guiding Network and a Local Policy Network) with an Internal Distillation Module. Together, the two modules align centralized training with decentralized execution and drive exploration through intrinsic rewards derived from global state features. The external pathway uses personalized global information to narrow the gap between the global guiding network and the local policy network, while the internal pathway injects state-informed intrinsic rewards to boost exploration. Experiments on SMAC and Predator-Prey show that DDN improves coordination and training efficiency, and ablations confirm the contributions of personalization, multi-level distillation, and the intrinsic reward mechanism. Overall, DDN offers a practical way to exploit global state information while keeping execution decentralized in complex, partially observable MARL settings.

Abstract

Multi-agent reinforcement learning typically employs a centralized training with decentralized execution (CTDE) framework to alleviate the non-stationarity of the environment. However, partial observability during execution may cause agents to accumulate errors arising from the gap between global and local information, impairing the training of effective collaborative policies. To overcome this challenge, we introduce the Double Distillation Network (DDN), which incorporates two distillation modules aimed at enhancing robust coordination and facilitating collaboration under constrained information. The external distillation module uses a global guiding network and a local policy network, employing distillation to reconcile the gap between global training and local execution. In addition, the internal distillation module introduces intrinsic rewards, drawn from state information, to enhance the exploration capabilities of agents. Extensive experiments demonstrate that DDN significantly improves performance across multiple scenarios.
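
The abstract names two mechanisms without giving their loss functions: distillation from a global guiding network to a local policy network, and intrinsic rewards drawn from state information. The following is a minimal sketch of how such terms are commonly computed, not the paper's formulation. It assumes a mean-squared-error distillation loss and a prediction-error exploration bonus; the function names `distillation_loss`, `intrinsic_reward`, and `shaped_reward`, and the weight `beta`, are all hypothetical.

```python
import numpy as np


def distillation_loss(teacher_q, student_q):
    """Assumed MSE between the outputs of a global (teacher) network
    and a local (student) network. Hypothetical helper."""
    teacher_q = np.asarray(teacher_q, dtype=float)
    student_q = np.asarray(student_q, dtype=float)
    return float(np.mean((teacher_q - student_q) ** 2))


def intrinsic_reward(target_features, predicted_features):
    """Assumed exploration bonus: squared error between target features
    of the state and a predictor's estimate, summed over the feature axis."""
    diff = np.asarray(target_features, dtype=float) - np.asarray(
        predicted_features, dtype=float
    )
    return np.sum(diff ** 2, axis=-1)


def shaped_reward(extrinsic, intrinsic, beta=0.1):
    """Environment reward plus a weighted intrinsic bonus.
    beta is an illustrative weight, not a value from the paper."""
    return np.asarray(extrinsic, dtype=float) + beta * np.asarray(
        intrinsic, dtype=float
    )
```

Under these assumptions, a state whose features the predictor already matches earns no bonus, so the bonus shrinks as states become familiar, and the distillation loss is zero only when the local network reproduces the global network's outputs.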

Paper Structure

This paper contains 25 sections, 10 equations, 5 figures, 6 tables, 1 algorithm.

Figures (5)

  • Figure 1: The proposed DDN framework consists of two parts: (a) the External Distillation Module, which includes the global guiding network (on the left) and the local policy network (on the lower right), and (b) the Internal Distillation Module (on the upper right).
  • Figure 2: Knowledge distillation between the Personalization Fusion Block and the Independent Observation Block.
  • Figure 3: Detailed outline of Internal Distillation Module.
  • Figure 4: The win rates of different algorithms across the 6 combat scenarios in SMAC.
  • Figure 5: Comparison results on Predator-Prey.