PTDE: Personalized Training with Distilled Execution for Multi-Agent Reinforcement Learning

Yiqun Chen; Hangyu Mao; Jiaxin Mao; Shiguang Wu; Tianle Zhang; Bin Zhang; Wei Yang; Hongxing Chang


TL;DR

This paper addresses a limitation of giving every agent the same global information in multi-agent reinforcement learning (MARL) by introducing Personalized Training with Distilled Execution (PTDE). PTDE first learns agent-personalized global information through a Global Information Personalization (GIP) module during centralized training, then distills that knowledge into a student network that uses only local information, so agents can execute in a decentralized way with minimal performance loss. The approach improves performance on StarCraft II, Google Research Football, and Learning to Rank, and it applies across different algorithm families (e.g., QMIX, VDN, MAPPO). Its two-stage training framework addresses the distributional challenges of knowledge distillation and offers a practical way to exploit global information without giving up decentralized execution. Overall, PTDE offers a general paradigm for improving multi-agent coordination under partial observability.

Abstract

Centralized Training with Decentralized Execution (CTDE) has emerged as a widely adopted paradigm in multi-agent reinforcement learning, emphasizing the utilization of global information for learning an enhanced joint $Q$-function or centralized critic. In contrast, our investigation delves into harnessing global information to directly enhance individual $Q$-functions or individual actors. Notably, we discover that applying identical global information universally across all agents proves insufficient for optimal performance. Consequently, we advocate for the customization of global information tailored to each agent, creating agent-personalized global information to bolster overall performance. Furthermore, we introduce a novel paradigm named Personalized Training with Distilled Execution (PTDE), wherein agent-personalized global information is distilled into the agent's local information. This distilled information is then utilized during decentralized execution, resulting in minimal performance degradation. PTDE can be seamlessly integrated with state-of-the-art algorithms, leading to notable performance enhancements across diverse benchmarks, including the SMAC benchmark, Google Research Football (GRF) benchmark, and Learning to Rank (LTR) task.
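The distillation step described above can be illustrated with a toy sketch. It is not the authors' implementation: the linear student, the squared-error objective, the synthetic data, and every function name below are assumptions chosen for brevity. A hypothetical teacher computes a personalized embedding from the global state and the agent's local observation, and a student that sees only the local observation is regressed onto it.

```python
import random


def teacher_embedding(state, obs):
    """Hypothetical teacher: mixes the global state with the agent's local observation."""
    return 0.5 * sum(state) + 2.0 * obs[0] - 1.0 * obs[1]


def student_embedding(weights, obs):
    """Hypothetical student: a linear map of the local observation only."""
    return weights[0] * obs[0] + weights[1] * obs[1] + weights[2]


def mse(weights, samples):
    """Mean squared error between student and teacher outputs."""
    total = 0.0
    for state, obs in samples:
        total += (student_embedding(weights, obs) - teacher_embedding(state, obs)) ** 2
    return total / len(samples)


def distill(samples, lr=0.05, epochs=200):
    """Fit the student to the teacher with plain stochastic gradient descent."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for state, obs in samples:
            err = student_embedding(weights, obs) - teacher_embedding(state, obs)
            # Gradient of 0.5 * err**2 with respect to each weight.
            weights[0] -= lr * err * obs[0]
            weights[1] -= lr * err * obs[1]
            weights[2] -= lr * err
    return weights


random.seed(0)
samples = []
for _ in range(200):
    obs = [random.uniform(-1, 1), random.uniform(-1, 1)]
    # Assumption: the global state is only partly predictable from the local view.
    state = [obs[0] + random.gauss(0, 0.1), random.gauss(0, 0.1)]
    samples.append((state, obs))

before = mse([0.0, 0.0, 0.0], samples)
weights = distill(samples)
after = mse(weights, samples)
print(f"distillation error before: {before:.3f}, after: {after:.3f}")
```

In this toy setting the student cannot recover the part of the teacher's output that depends on state components invisible to the agent, so a small residual error remains. That mirrors, in a simplified way, the "minimal performance degradation" the abstract attributes to executing with distilled rather than true global information.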


Paper Structure (20 sections, 8 equations, 11 figures, 8 tables, 2 algorithms)


Introduction
Background
Dec-POMDP
Typical MARL Algorithms
Knowledge Distillation
Method
Naive Use of Global Information
Global Information Personalization
Knowledge Distillation
The Overall PTDE Paradigm
Experiments
StarCraft II
Google Research Football
Scenario Universality of PTDE Paradigm
Algorithm Universality of PTDE Paradigm
...and 5 more sections

Figures (11)

Figure 1: The framework of QMIX_GIU. (b) is the detail of the Global Information Unification (GIU) module.
Figure 2: The structure of the Global Information Personalization (GIP) module.
Figure 3: How the GIP module is used in value-decomposition-based methods (i.e., GIP_Q) and actor-critic-based methods (i.e., GIP_AC).
Figure 4: The knowledge distillation framework.
Figure 5: The framework of PTDE: Two-Stage Training and Decentralized Execution.
...and 6 more figures
