JointPPO: Diving Deeper into the Effectiveness of PPO in Multi-Agent Reinforcement Learning

Chenxing Liu; Guizhong Liu

TL;DR

JointPPO is a Centralized Training with Centralized Execution (CTCE) method that uses Proximal Policy Optimization (PPO) to directly optimize the joint policy of a multi-agent system. It effectively handles a large joint action space and extends PPO to the multi-agent setting in a clear and concise manner.

Abstract

While Centralized Training with Decentralized Execution (CTDE) has become the prevailing paradigm in Multi-Agent Reinforcement Learning (MARL), it may not be suitable for scenarios in which agents can fully communicate and share observations with each other. Fully centralized methods, also known as Centralized Training with Centralized Execution (CTCE) methods, can fully utilize the observations of all agents by treating the entire system as a single agent. However, traditional CTCE methods suffer from scalability issues because the joint action space grows exponentially with the number of agents. To address this challenge, we propose JointPPO, a CTCE method that uses Proximal Policy Optimization (PPO) to directly optimize the joint policy of the multi-agent system. JointPPO decomposes the joint policy into conditional probabilities, transforming the decision-making process into a sequence generation task. A Transformer-based joint policy network is constructed and trained with a PPO loss tailored for the joint policy. JointPPO effectively handles a large joint action space and extends PPO to the multi-agent setting in a clear and concise manner. Extensive experiments on the StarCraft Multi-Agent Challenge (SMAC) testbed demonstrate the superiority of JointPPO over strong baselines. Ablation experiments and analyses are conducted to explore the factors influencing JointPPO's performance.
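
The decomposition described in the abstract can be illustrated with a small sketch. By the chain rule, the probability of a joint action is the product of each agent's conditional probability given the actions already chosen earlier in the sequence, so the joint log-probability is a sum of per-agent terms. The code below then applies the standard PPO clipped surrogate to the resulting joint probability ratio. This is an assumption-laden illustration and not the paper's exact loss: the function names, the list-of-probabilities input, and the default `eps=0.2` are hypothetical choices made here.

```python
import math


def joint_log_prob(cond_probs):
    """Log-probability of a joint action under a chain-rule factorization.

    cond_probs[i] is the (hypothetical) probability the policy assigned to
    agent i's chosen action, conditioned on the observations and on the
    actions of the agents decided earlier in the sequence.
    """
    return sum(math.log(p) for p in cond_probs)


def clipped_surrogate(new_cond_probs, old_cond_probs, advantage, eps=0.2):
    """Standard PPO clipped objective evaluated on the joint probability ratio.

    Assumption: one scalar advantage for the whole joint action.
    """
    # Ratio of joint probabilities, computed in log space for stability.
    ratio = math.exp(joint_log_prob(new_cond_probs) - joint_log_prob(old_cond_probs))
    # Clamp the ratio to [1 - eps, 1 + eps].
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    # PPO takes the pessimistic (smaller) of the two surrogate terms.
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    # Two agents: the joint probability rises from 0.5 * 0.4 to 0.6 * 0.5.
    old = [0.5, 0.4]
    new = [0.6, 0.5]
    print(clipped_surrogate(new, old, advantage=2.0))
    print(clipped_surrogate(new, old, advantage=-1.0))
```

Because the factorization yields one term per agent, the policy only ever outputs a distribution over a single agent's actions at each step, which is how a sequence-generation formulation avoids enumerating the exponentially large joint action space.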

Paper Structure (21 sections, 8 equations, 9 figures, 4 tables, 2 algorithms)

Introduction
Related Works
Preliminaries
POMDP
Multi-Agent Transformer
Method
Problem Modeling
Transformer-Based Joint Policy Network
Joint PPO Loss
Decision Order Designation Mechanism
Experiments
SMAC Testbed
JointPPO's Performance
Ablation Studies
PPO Training Epochs and Clipping Parameter
...and 6 more sections

Figures (9)

Figure 1: Different learning paradigms in MARL.
Figure 2: Action generation process.
Figure 3: Illustration of the general framework of solving MARL using sequence generation model.
Figure 4: Architecture of the Transformer-based policy network.
Figure 5: Illustration of the graph generative model based mechanism.
...and 4 more figures
