XP-MARL: Auxiliary Prioritization in Multi-Agent Reinforcement Learning to Address Non-Stationarity

Jianye Xu, Omar Sobhy, Bassam Alrifaee

TL;DR

This work proposes XP-MARL, an open-source framework that augments MARL with auxiliary prioritization to address non-stationarity in cooperative settings. It is enabled by a proposed mechanism called action propagation: higher-priority agents act first and communicate their actions, providing a more stationary environment for lower-priority agents.

Abstract

Non-stationarity poses a fundamental challenge in Multi-Agent Reinforcement Learning (MARL), arising from agents simultaneously learning and altering their policies. This creates a non-stationary environment from the perspective of each individual agent, often leading to suboptimal or even non-convergent learning outcomes. We propose an open-source framework named XP-MARL, which augments MARL with auxiliary prioritization to address this challenge in cooperative settings. XP-MARL is 1) founded upon our hypothesis that prioritizing agents and letting higher-priority agents establish their actions first would stabilize the learning process and thus mitigate non-stationarity and 2) enabled by our proposed mechanism called action propagation, where higher-priority agents act first and communicate their actions, providing a more stationary environment for others. Moreover, instead of using a predefined or heuristic priority assignment, XP-MARL learns priority-assignment policies with an auxiliary MARL problem, leading to a joint learning scheme. Experiments in a motion-planning scenario involving Connected and Automated Vehicles (CAVs) demonstrate that XP-MARL improves the safety of a baseline model by 84.4% and outperforms a state-of-the-art approach, which improves the baseline by only 12.8%. Code: github.com/cas-lab-munich/sigmarl
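The action-propagation idea described above can be illustrated as a sequential loop over agents in priority order. The following is a minimal sketch, not the sigmarl implementation. It assumes that each agent's policy is a plain function of its own observation and of the actions already communicated by higher-priority agents, and that the priority order is given. In XP-MARL, that order comes from learned priority-assignment policies. The names `Policy`, `action_propagation` and `pick_free_lane` are hypothetical.

```python
from typing import Any, Callable, Dict, Sequence

# Hypothetical policy interface (an assumption, not the sigmarl API): an agent
# maps its own observation plus the actions already communicated by
# higher-priority agents to its own action.
Policy = Callable[[Any, Dict[int, Any]], Any]


def action_propagation(
    order: Sequence[int],
    observations: Dict[int, Any],
    policies: Dict[int, Policy],
) -> Dict[int, Any]:
    """Let agents act one after another, from highest to lowest priority.

    `order` lists agent indices with the highest-priority agent first. Each
    agent sees the actions chosen by every agent ahead of it, so within this
    time step those actions are fixed from its point of view.
    """
    communicated: Dict[int, Any] = {}
    for agent in order:
        # Hand over a copy so a policy cannot alter the shared messages.
        action = policies[agent](observations[agent], dict(communicated))
        communicated[agent] = action  # propagate to lower-priority agents
    return communicated


# Toy example (hypothetical): three agents share lanes 0, 1, 2 and each takes
# the lowest-numbered lane not already claimed by a higher-priority agent.
def pick_free_lane(lanes: Sequence[int], higher_actions: Dict[int, Any]) -> int:
    taken = set(higher_actions.values())
    return min(lane for lane in lanes if lane not in taken)


observations = {i: [0, 1, 2] for i in range(3)}
policies = {i: pick_free_lane for i in range(3)}
actions = action_propagation([2, 0, 1], observations, policies)
print(actions)  # {2: 0, 0: 1, 1: 2}
```

Agent 2 acts first and takes lane 0. Agents 0 and 1 then adapt to the lanes that are already claimed. If all three agents chose simultaneously, each would see no communicated actions, and all of them would pick lane 0. This toy case reflects the intuition that communicated higher-priority actions give lower-priority agents a more stationary situation to respond to.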

Paper Structure (20 sections, 4 equations, 5 figures, 2 algorithms)

Figures (5)

  • Figure 1: A navigation game with two agents. Each agent $i \in \{1, 2\}$ has three actions: turn left, go straight, and turn right. The right side shows the team rewards.
  • Figure 2: Our XP-MARL framework (time arguments omitted). For each agent $i$, the diagram shows its observation, action, policy $\pi^{(i)}$, and priority rank $\mathcal{R}_{\textbf{P},i}$. $\operatorname{arg\,sort}$ returns the agent indices that sort the priority scores in descending order.
  • Figure 3: CPM scenario. Train only on the intersection (gray area) with 4 agents. Test on the entire map with 15 agents.
  • Figure 4: Training curves and testing results of 32 experiments.
  • Figure 5: Two agents avoiding a collision at an on-ramp by dynamically switching their priorities.
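The Figure 2 caption says that the agents are ordered by an arg sort of their priority scores in descending order. The following minimal Python sketch covers only that step. It assumes the scores are already available as plain numbers, whereas in XP-MARL they are produced by the learned priority-assignment policies. It also assumes 0-based agent indices and 1-based ranks. `priority_order` and `priority_ranks` are hypothetical helper names.

```python
from typing import List, Sequence


def priority_order(scores: Sequence[float]) -> List[int]:
    """Return agent indices sorted by priority score, highest first.

    This is a descending arg sort. Python's sort is stable (also with
    reverse=True), so agents with equal scores keep their index order; that
    tie-breaking rule is an assumption of this sketch.
    """
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def priority_ranks(scores: Sequence[float]) -> List[int]:
    """Return each agent's rank, where rank 1 is the highest priority (assumed convention)."""
    ranks = [0] * len(scores)
    for position, agent in enumerate(priority_order(scores), start=1):
        ranks[agent] = position
    return ranks


scores = [0.2, 0.9, 0.5]  # hypothetical priority scores for agents 0, 1, 2
print(priority_order(scores))  # [1, 2, 0]
print(priority_ranks(scores))  # [3, 1, 2]
```

The resulting order is what an action-propagation loop would consume: agent 1 acts first, then agent 2, then agent 0. For a NumPy array, `np.argsort(-np.asarray(scores), kind="stable")` yields the same order, including the same tie-breaking by index.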

Theorems & Definitions (2)

  • Definition 1
  • Remark 1