Safety Guaranteed Robust Multi-Agent Reinforcement Learning with Hierarchical Control for Connected and Automated Vehicles

Zhili Zhang, H M Sabbir Ahmad, Ehsan Sabouni, Yanchao Sun, Furong Huang, Wenchao Li, Fei Miao

TL;DR

The paper tackles safe coordination of CAVs in mixed traffic under imperfect state information by introducing Safe-RMM, a hierarchical framework that combines a Robust MAPPO policy with a robust MPC controller leveraging CBFs. The high-level policy optimizes discrete actions under worst-case state perturbations via a worst-case Q-network, while the low-level MPC-CBF module enforces collision-free safety and tracks high-level plans despite uncertainty. Empirical CARLA experiments across intersections and highways show Safe-RMM achieves zero collisions under tested perturbations and outperforms baselines in safety and efficiency, with ablations highlighting the complementary benefits of robust MARL and robust MPC. The work demonstrates that integrating learning-based planning with safety-guaranteed execution yields superior performance in complex, uncertain multi-agent driving environments, offering a practical path toward reliable CAV deployment in mixed traffic.

Abstract

We address the problem of coordination and control of Connected and Automated Vehicles (CAVs) under imperfect observations in mixed traffic environments. A commonly used approach is learning-based decision-making, such as reinforcement learning (RL). However, most existing safe RL methods suffer from two limitations: (i) they assume accurate state information, and (ii) they generally define safety over the expectation of trajectories. It remains challenging to design optimal coordination among multiple agents while enforcing hard safety constraints at every time step under system state uncertainties (e.g., those arising from noisy sensor measurements, communication, or state estimation). We propose a safety-guaranteed hierarchical coordination and control scheme called Safe-RMM to address this challenge. Specifically, the high-level coordination policy of CAVs in mixed traffic is trained with the Robust Multi-Agent Proximal Policy Optimization (RMAPPO) method. Although trained without uncertainty, our method leverages a worst-case Q network to keep the model's performance robust when state uncertainties are present during testing. The low-level controller is implemented using model predictive control (MPC) with robust Control Barrier Functions (CBFs) to guarantee safety through their forward invariance property. We compare our method with baselines on different road networks in the CARLA simulator. Results show that our method achieves the best safety and efficiency among the evaluated methods in challenging mixed traffic environments with uncertainties.
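
The forward-invariance idea behind CBFs can be illustrated with a toy example. The following is a minimal sketch and not the paper's MPC-CBF controller: it assumes a 1-D single-integrator vehicle, a static obstacle, a measurement error bounded by `eps`, and the discrete-time CBF condition h(x_next) >= (1 - gamma) * h(x). All function names and parameter values are hypothetical and chosen only for illustration.

```python
import random


def robust_cbf_filter(u_nominal, x_measured, x_obstacle, d_min, eps, gamma, dt):
    """Return the largest speed command <= u_nominal that meets the robust CBF bound.

    Assumed model (illustrative only): x_next = x + u * dt, with barrier
    h(x) = x_obstacle - x - d_min, so h >= 0 means "safe".
    """
    # Worst-case barrier value consistent with |x_measured - x_true| <= eps.
    h_worst = (x_obstacle - x_measured - d_min) - eps
    # h(x_next) = h(x) - u * dt, so h(x_next) >= (1 - gamma) * h(x)
    # is equivalent to u <= gamma * h(x) / dt; use h_worst in place of h(x).
    u_max = gamma * h_worst / dt
    return min(u_nominal, u_max)


def simulate(steps=200, seed=0):
    """Drive toward the obstacle with noisy position measurements.

    Returns the smallest true barrier value seen during the rollout.
    """
    rng = random.Random(seed)
    x, x_obstacle, d_min = 0.0, 20.0, 2.0
    eps, gamma, dt = 0.5, 0.3, 0.1
    min_h = x_obstacle - x - d_min
    for _ in range(steps):
        # The controller only sees a perturbed position.
        x_measured = x + rng.uniform(-eps, eps)
        u = robust_cbf_filter(5.0, x_measured, x_obstacle, d_min, eps, gamma, dt)
        x += u * dt
        min_h = min(min_h, x_obstacle - x - d_min)
    return min_h


if __name__ == "__main__":
    print("smallest true barrier value:", simulate())
```

Because the filter uses the worst-case barrier value, the commanded step never exceeds what the true state allows, so the true barrier value stays nonnegative in this toy model as long as it starts nonnegative. The sketch places no actuator limits on `u`; with bounded control, feasibility would have to be checked separately.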

Paper Structure (17 sections, 12 equations, 6 figures, 1 table, 1 algorithm)

Figures (6)

  • Figure 1: Intersection scenario: two HDVs run the red light while two CAVs are passing through the box. CAVs are in green; HDVs are in red. (a): CAVs using our method successfully avoid collision during testing with state uncertainties; (b): CAV1 using the benchmark (MCP) collides with an HDV because the perturbed location of HDV1 (marked with a yellow triangle) misleads the CAV into believing that no collision will occur.
  • Figure 2: The Safe-RMM algorithm. The figure shows one agent's decision pipeline; all other agents follow the same procedure. During training, both the value network and the worst-case Q network contribute to updating the actor's policy. During testing, agent $i$ feeds its uncertain state observations to its actor and samples the high-level action $a_i$, which the robust MPC controller then uses for path planning and for generating the safe control $\boldsymbol{u}_{i}$.
  • Figure 3: Illustration of the ellipsoidal safety set.
  • Figure 4: Intersection (a) and Highway (b) scenarios for testing. (a) 3 CAVs and 2 HDVs participate (left); CAVs can either dodge (middle) or collide with HDVs (right). (b) 3 CAVs and 3 HDVs participate in a multi-lane Highway scenario in which one HDV brakes suddenly.
  • Figure 5: Discounted Efficiency Returns during Training in Intersection.
  • ...and 1 more figures