Robust and Safe Multi-Agent Reinforcement Learning with Communication for Autonomous Vehicles: From Simulation to Hardware

Keshawn Smith, Zhili Zhang, H M Sabbir Ahmad, Ehsan Sabouni, Maniak Mondal, Song Han, Wenchao Li, Fei Miao

TL;DR

The paper tackles the sim-to-real gap in multi-agent reinforcement learning for autonomous vehicles by introducing RSR-RSMARL, a Real-Sim-Real framework that couples robust MARL with a modular Safety Shield based on Control Barrier Functions. It aligns state and action representations with real-world sensing and actuation, explicitly models V2V communication delays during training, and supports both PID and MPC low-level controllers to enforce safety during execution. The approach is validated through extensive CARLA simulations and zero-shot transfers to 1/10th-scale hardware, demonstrating zero collisions, improved safety, and coordinated behaviors across varying scenarios. The work highlights that integrating a safety shield with communication-aware training, and selecting appropriate low-level control backends, is crucial for practical, scalable multi-agent autonomy in real-world traffic.

Abstract

Deep multi-agent reinforcement learning (MARL) has been demonstrated effectively in simulations for multi-robot problems. For autonomous vehicles, the development of vehicle-to-vehicle (V2V) communication technologies provides opportunities to further enhance system safety. However, zero-shot transfer of simulator-trained MARL policies to dynamic hardware systems remains challenging, and there are few hardware demonstrations of how to leverage communication and shared information for MARL. The problem is complicated by discrepancies between simulated and physical states, system state and model uncertainties, practical shared information design, and the need for safety guarantees in both simulation and hardware. This paper designs RSR-RSMARL, a novel Robust and Safe MARL framework that supports Real-Sim-Real (RSR) policy adaptation for multi-agent systems with communication among agents, with both simulation and hardware demonstrations. RSR-RSMARL formulates the MARL problem with state representations (including state information shared among agents) and action representations that account for real-system complexities. The MARL policy is trained with a robust MARL algorithm to enable zero-shot transfer to hardware despite the sim-to-real gap. A safety shield module using Control Barrier Functions (CBFs) provides a safety guarantee for each individual agent. Experimental results on 1/10th-scale autonomous vehicles with V2V communication demonstrate the ability of the RSR-RSMARL framework to enhance driving safety and coordination across multiple configurations. These findings emphasize the importance of jointly designing robust policy representations and modular safety architectures to enable scalable, generalizable RSR transfer in multi-agent autonomy.

Paper Structure

This paper contains 14 sections, 1 equation, 6 figures, 2 tables, 1 algorithm.

Figures (6)

  • Figure 1: Overview of the RSR-RSMARL framework. Real-world sensor data informs the state and action design used in simulation training. The MARL training process incorporates safety shields and communication delays. Trained policies are then transferred back to hardware, where high-level actions are converted into safe control inputs.
  • Figure 2: Policy execution pipeline for agent $i$, with all other agents following the same procedure. During training, both the critic and the worst-case Q network update the actor’s policy under observation delays. At test time, the actor samples a high-level action $a_i$ from uncertain observations, which is first filtered by the CBF-based Safety Shield to ensure safety. If a safe action is available, it is passed to the Controller Module, which integrates sensor data, trajectory planning, and a hybrid low-level controller (PID or MPC) to generate safe control commands $\boldsymbol{u}_i$. If no safe action exists, the system triggers an emergency stop. This layered architecture ensures robust decision-making, safety enforcement, and reliable real-world execution.
  • Figure 3: Hardware evaluation setting with 1/10th-scale F1TENTH vehicles. Green boxes denote ego vehicles running RSR-RSMARL policies, while red boxes indicate obstacle vehicles used to vary scenario complexity.
  • Figure 4: Discounted Efficiency Returns during Training with PID Controller. The RSR-RSMARL method (orange curve) consistently achieves the highest discounted returns compared to all baselines, highlighting its superior robustness and safety during training.
  • Figure 5: Discounted Efficiency Returns during Training with MPC Controller. Compared to the baseline RSMARL policy (blue curve), our RSR-RSMARL method (orange curve) achieves consistently higher returns and demonstrates smoother convergence, confirming the benefit of integrating the CBF-based Safety Shield and real-sim-real training.
  • ...and 1 more figure
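
The test-time flow in the Figure 2 caption (policy proposes an action, the shield filters it, and the system either tracks a safe command or triggers an emergency stop) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the one-dimensional car-following model, the barrier function `h = gap - D_MIN - HEADWAY * v_ego`, the discrete-time CBF condition `h(x_next) >= (1 - GAMMA) * h(x)`, the constants, and all function names are hypothetical choices made for the example. The paper's actual state, action set, and CBF design are not given in this section.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# All constants below are assumed values for illustration only.
D_MIN = 0.5    # minimum allowed gap to the lead vehicle [m]
HEADWAY = 0.5  # time headway used in the barrier [s]
GAMMA = 0.5    # discrete-time CBF decay rate, in (0, 1]
DT = 0.1       # control period [s]


@dataclass
class State:
    gap: float     # distance to the lead vehicle [m]
    v_ego: float   # ego speed [m/s]
    v_lead: float  # lead vehicle speed [m/s]


def barrier(s: State) -> float:
    """Hypothetical barrier: h(x) >= 0 means the headway constraint holds."""
    return s.gap - D_MIN - HEADWAY * s.v_ego


def step(s: State, accel: float) -> State:
    """One-step prediction with a simple kinematic model (an assumption)."""
    v_next = max(0.0, s.v_ego + accel * DT)
    gap_next = s.gap + (s.v_lead - s.v_ego) * DT
    return State(gap_next, v_next, s.v_lead)


def is_safe(s: State, accel: float) -> bool:
    """Discrete-time CBF condition: h(x_next) >= (1 - GAMMA) * h(x)."""
    return barrier(step(s, accel)) >= (1.0 - GAMMA) * barrier(s)


def shield(s: State, proposed: float, candidates: Sequence[float]) -> Optional[float]:
    """Return the safe candidate closest to the proposed action, or None."""
    safe = [a for a in candidates if is_safe(s, a)]
    if not safe:
        return None
    return min(safe, key=lambda a: abs(a - proposed))


def execute(s: State, proposed: float, candidates: Sequence[float]) -> Tuple[str, float]:
    """Filter the policy's action; fall back to hardest braking if none is safe."""
    a = shield(s, proposed, candidates)
    if a is None:
        return ("emergency_stop", min(candidates))
    return ("track", a)


if __name__ == "__main__":
    # Large gap: the proposed acceleration passes through unchanged.
    print(execute(State(5.0, 1.0, 1.0), 1.0, [-2.0, -1.0, 0.0, 1.0]))
    # Closing on a stopped vehicle: the shield overrides with braking.
    print(execute(State(1.2, 1.0, 0.0), 1.0, [-2.0, -1.0, 1.0]))
    # No candidate satisfies the condition: emergency stop.
    print(execute(State(1.05, 1.0, 0.0), 1.0, [-1.0, 0.0, 1.0]))
```

In this sketch the shield picks the safe candidate nearest to the policy's proposal, which is one common way to keep the intervention minimal. In the pipeline described by the caption, the returned command would then be handed to the PID or MPC low-level controller rather than applied directly.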