XP-MARL: Auxiliary Prioritization in Multi-Agent Reinforcement Learning to Address Non-Stationarity

Jianye Xu; Omar Sobhy; Bassam Alrifaee

TL;DR

This work proposes XP-MARL, an open-source framework that augments MARL with auxiliary prioritization to address non-stationarity in cooperative settings. It relies on a proposed mechanism called action propagation, in which higher-priority agents act first and communicate their actions, giving lower-priority agents a more stationary environment.

Abstract

Non-stationarity poses a fundamental challenge in Multi-Agent Reinforcement Learning (MARL), arising from agents simultaneously learning and altering their policies. This creates a non-stationary environment from the perspective of each individual agent, often leading to suboptimal or even unconverged learning outcomes. We propose an open-source framework named XP-MARL, which augments MARL with auxiliary prioritization to address this challenge in cooperative settings. XP-MARL is 1) founded upon our hypothesis that prioritizing agents and letting higher-priority agents establish their actions first would stabilize the learning process and thus mitigate non-stationarity and 2) enabled by our proposed mechanism called action propagation, where higher-priority agents act first and communicate their actions, providing a more stationary environment for others. Moreover, instead of using a predefined or heuristic priority assignment, XP-MARL learns priority-assignment policies with an auxiliary MARL problem, leading to a joint learning scheme. Experiments in a motion-planning scenario involving Connected and Automated Vehicles (CAVs) demonstrate that XP-MARL improves the safety of a baseline model by 84.4% and outperforms a state-of-the-art approach, which improves the baseline by only 12.8%. Code: github.com/cas-lab-munich/sigmarl
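The action-propagation idea described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the XP-MARL codebase: the function names, the policy signature, and the toy policy are all hypothetical, and the priority scores are given as fixed inputs here, whereas XP-MARL learns them with an auxiliary MARL problem.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical policy signature (an assumption, not the XP-MARL API):
# (own observation, actions already chosen by higher-priority agents) -> own action
Policy = Callable[[float, Dict[int, int]], int]


def priority_order(scores: Sequence[float]) -> List[int]:
    """Return agent indices sorted by priority score, highest first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def act_with_propagation(
    observations: Sequence[float],
    scores: Sequence[float],
    policies: Sequence[Policy],
) -> Dict[int, int]:
    """Let agents act one after another in priority order."""
    chosen: Dict[int, int] = {}
    for i in priority_order(scores):
        # Agent i receives a snapshot of the actions already fixed by
        # all higher-priority agents before choosing its own action.
        chosen[i] = policies[i](observations[i], dict(chosen))
    return chosen


def avoid_taken(obs: float, propagated: Dict[int, int]) -> int:
    """Toy policy: prefer 'straight' (1), else 'left' (0), else 'right' (2),
    skipping any action a higher-priority agent has already claimed."""
    taken = set(propagated.values())
    return next(a for a in (1, 0, 2) if a not in taken)


# Two agents; agent 1 has the higher priority score, so it acts first.
actions = act_with_propagation(
    observations=[0.0, 0.0],
    scores=[0.2, 0.9],
    policies=[avoid_taken, avoid_taken],
)
```

In this toy run, agent 1 acts first and chooses "straight"; agent 0 then sees that choice and picks "left" instead. From agent 0's point of view, the higher-priority agent's action is a known input rather than a moving target, which is the stabilizing effect the hypothesis relies on.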

Paper Structure (20 sections, 4 equations, 5 figures, 2 algorithms)

Introduction
Related Work
Centralized Critic
Opponent Modeling
Paper Contributions
Notation
Paper Structure
Problem Formulation
Hypothesis on Prioritization
Our XP-MARL Framework
Bi-Stage MARL Problem
Priority-Assignment Stage
Decision-Making Stage
Overview of XP-MARL
Experiments
...and 5 more sections

Figures (5)

Figure 1: A navigation game with two agents. Each agent $i \in \{1, 2\}$ has three actions: turn left, go straight, and turn right. The right side shows the team rewards.
Figure 2: Our XP-MARL framework, time arguments omitted. For each agent $i$, the diagram shows its observation, action, policy $\pi^{(i)}$, and priority rank $\mathcal{R}_{\textbf{P},i}$. The $\operatorname{arg\,sort}$ operator returns the indices that sort the agents' priority scores in descending order.
Figure 3: CPM scenario. Train only on the intersection (gray area) with 4 agents. Test on the entire map with 15 agents.
Figure 4: Training curves and testing results of 32 experiments.
Figure 5: Two agents avoiding a collision at an on-ramp by dynamically switching their priorities.

Theorems & Definitions (2)

Definition 1
Remark 1
