Performance Comparison of Deep RL Algorithms for Mixed Traffic Cooperative Lane-Changing

Xue Yao, Shengren Hou, Serge P. Hoogendoorn, Simeon C. Calvert

TL;DR

This paper addresses lane-changing (LC) by connected and automated vehicles (CAVs) in mixed traffic, where the behavior of human-driven vehicles (HVs) is uncertain. The authors formulate cooperative lane-changing in mixed traffic (CLCMT) as a Markov decision process with continuous actions and evaluate four state-of-the-art DRL algorithms (DDPG, TD3, SAC, PPO) on this task. The main contributions are twofold: extending CLCMT to include HV uncertainty and microscopic HV–CAV interactions, and providing a like-for-like empirical comparison of the four algorithms. In that comparison PPO generally outperforms the others in safety, comfort, and ecology; DDPG and TD3 also deliver strong efficiency; SAC struggles to converge. The results point to PPO's greater stability and policy quality for safe, comfortable, and eco-friendly LC maneuvers, and offer practical guidance for deploying DRL-driven LC planning in mixed-traffic scenarios. The work highlights the importance of modeling HV heterogeneity and identifies driving-behavior heterogeneity as a direction for future research to further improve robustness.

Abstract

Lane-changing (LC) is a challenging scenario for connected and automated vehicles (CAVs) because of the complex dynamics and high uncertainty of the traffic environment. Deep reinforcement learning (DRL) approaches can handle this challenge thanks to their data-driven, model-free nature. Our previous work proposed a cooperative lane-changing in mixed traffic (CLCMT) mechanism based on TD3 to facilitate an optimal lane-changing strategy. This study enhances the CLCMT mechanism by considering both the uncertainty of human-driven vehicles (HVs) and the microscopic interactions between HVs and CAVs. Four state-of-the-art (SOTA) DRL algorithms, DDPG, TD3, SAC, and PPO, are used to solve the formulated MDP with continuous actions. A performance comparison among the four algorithms demonstrates that DDPG, TD3, and PPO can cope with uncertainty in the traffic environment and learn well-performing LC strategies in terms of safety, efficiency, comfort, and ecology. PPO outperforms the other three algorithms, achieving a higher reward, fewer exploration mistakes and crashes, and a more comfortable and eco-friendly LC strategy. These improvements give the CLCMT mechanism greater advantages in the LC motion planning of CAVs.

Paper Structure (11 sections, 16 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Lane-changing scenario in mixed traffic: An illustrative example
  • Figure 2: Compositions of leader-follower types.
  • Figure 3: Average (a) total reward, (b) time steps, (c) move-on reward, (d) lane-changing reward
  • Figure 4: Average (a) crash rate, (b) warning times, (c) comfort cost, (d) fuel consumption and emissions cost
  • Figure 5: Trajectories of ego vehicle based on (a) DDPG, (b) SAC, (c) TD3 and (d) PPO algorithm