SigmaRL: A Sample-Efficient and Generalizable Multi-Agent Reinforcement Learning Framework for Motion Planning

Jianye Xu, Pan Hu, Bassam Alrifaee

TL;DR

SigmaRL tackles the challenge of sample efficiency and generalization in multi-agent reinforcement learning for motion planning under partial observability. It introduces five information-dense observation strategies and implements them in a decentralized MARL framework based on multi-agent PPO with a centralized critic, evaluated in a VMAS-based environment with a nonlinear single-track vehicle model. The key findings are that training can be completed in under one hour on a CPU and that the agents exhibit zero-shot generalization to unseen scenarios, with the full five-strategy model performing best. These results highlight the importance of carefully designed observations for enabling scalable, generalizable motion planning in connected and automated vehicles.

Abstract

This paper introduces an open-source, decentralized framework named SigmaRL, designed to enhance both sample efficiency and generalization of multi-agent Reinforcement Learning (RL) for motion planning of connected and automated vehicles. Most RL agents exhibit a limited capacity to generalize, often focusing narrowly on specific scenarios, and are usually evaluated in similar or even the same scenarios seen during training. Various methods have been proposed to address these challenges, including experience replay and regularization. However, how observation design in RL affects sample efficiency and generalization remains an under-explored area. We address this gap by proposing five strategies to design information-dense observations, focusing on general features that are applicable to most traffic scenarios. We train our RL agents using these strategies on an intersection and evaluate their generalization through numerical experiments across completely unseen traffic scenarios, including a new intersection, an on-ramp, and a roundabout. Incorporating these information-dense observations reduces training times to under one hour on a single CPU, and the evaluation results reveal that our RL agents can effectively zero-shot generalize. Code: github.com/bassamlab/SigmaRL

Paper Structure (21 sections, 2 equations, 5 figures, 1 table)


Figures (5)

  • Figure 1: Overview of the proposed decentralized MARL framework SigmaRL. $t$: time step; $i \in \{1,\dots,N\}$: agent index.
  • Figure 2: Kinematic single-track model. $C$: center of gravity; $x, y$: $x$- and $y$-coordinates; $v$: velocity; $\beta$: side-slip angle; $\psi$: yaw angle; $\delta$: steering angle; $L$: wheelbase.
  • Figure 3: Observations of agent $i$. Red: efficient observation (ours). Green: inefficient observation (not ours).
  • Figure 4: Training and testing scenarios. Train only on the intersection of the CPM Scenario (see gray area). Test in all four scenarios, with the depicted numbers of agents.
  • Figure 5: Mean reward per episode when training the models with indices $i \in \{0,\dots,5\}$. Model 0 incorporates all five observation-design strategies proposed in the section on observation design, whereas Models 1 to 5 each omit one of these strategies.
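
The symbols listed for Figure 2 can be made concrete with a minimal sketch of one forward-Euler step of the standard kinematic single-track (bicycle) model referenced at the center of gravity. This is an illustration under stated assumptions, not the SigmaRL implementation: the rear-axle distance `rear_length`, the acceleration input `accel`, the time step `dt`, and all default numeric values are hypothetical and do not come from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class State:
    x: float    # x-coordinate of the center of gravity C
    y: float    # y-coordinate of the center of gravity C
    psi: float  # yaw angle [rad]
    v: float    # velocity [m/s]


def side_slip_angle(delta: float, wheelbase: float, rear_length: float) -> float:
    """Side-slip angle beta at C for steering angle delta.

    rear_length (distance from the rear axle to C) is an assumed parameter.
    """
    return math.atan(rear_length / wheelbase * math.tan(delta))


def step(state: State, accel: float, delta: float,
         wheelbase: float = 0.16, rear_length: float = 0.08,
         dt: float = 0.05) -> State:
    """One forward-Euler step of the kinematic single-track model.

    All default values are hypothetical placeholders.
    """
    beta = side_slip_angle(delta, wheelbase, rear_length)
    x = state.x + dt * state.v * math.cos(state.psi + beta)
    y = state.y + dt * state.v * math.sin(state.psi + beta)
    # Yaw rate: v / L * cos(beta) * tan(delta)
    psi = state.psi + dt * state.v / wheelbase * math.cos(beta) * math.tan(delta)
    v = state.v + dt * accel
    return State(x, y, psi, v)


if __name__ == "__main__":
    s = State(x=0.0, y=0.0, psi=0.0, v=1.0)
    for _ in range(10):
        s = step(s, accel=0.0, delta=0.1)
    print(s)
```

With zero steering the side-slip angle vanishes and the vehicle moves straight along its heading; a positive steering angle yields a positive yaw rate, so the vehicle turns left.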

Theorems & Definitions (4)

  • Definition 1: Sample Efficiency
  • Definition 2: Adapted from hansen2024dynamic
  • Remark 1: Generalization and Sample Efficiency
  • Remark 2: Practicability