Discretionary Lane-Change Decision and Control via Parameterized Soft Actor-Critic for Hybrid Action Space

Yuan Lin, Xiao Liu, Zishun Zheng

TL;DR

The paper tackles discretionary lane-change decisions in autonomous driving by formulating a hybrid-action reinforcement learning approach using Parameterized Soft Actor–Critic (PASAC) and comparing it against Model Predictive Control (MPC). By implementing PASAC in SUMO and designing discrete lane-change decisions with continuous acceleration, the authors demonstrate that PASAC can achieve zero collision rates like MPC while delivering higher average speeds and more favorable rewards, albeit with generalization challenges at higher traffic densities. A detailed comparison includes training dynamics, testing outcomes, and generalization analyses across traffic densities, along with a clear MPC baseline that shares the same cost structure. Overall, PASAC shows competitive performance and potential practical benefits for real-time, hybrid-action lane-change control, while highlighting areas for theoretical and robustness improvements.

Abstract

This study focuses on a crucial task in the field of autonomous driving: autonomous lane change. Autonomous lane change plays a pivotal role in improving traffic flow, alleviating driver burden, and reducing the risk of traffic accidents. However, due to the complexity and uncertainty of lane-change scenarios, autonomous lane change remains challenging. In this research, we conducted autonomous lane-change simulations using both deep reinforcement learning (DRL) and model predictive control (MPC). Specifically, we used the parameterized soft actor-critic (PASAC) algorithm to train a DRL-based lane-change strategy that outputs both discrete lane-change decisions and continuous longitudinal vehicle acceleration. We also used MPC for lane selection, choosing the lane with the smallest predicted car-following cost. For the first time, we compared the performance of DRL and MPC in the context of lane-change decisions. The simulation results indicated that, under the same reward/cost function and traffic flow, both MPC and PASAC achieved a collision rate of 0%. PASAC demonstrated performance comparable to MPC in terms of average rewards/costs and vehicle speeds.

Paper Structure (21 sections, 22 equations, 6 figures, 4 tables, 2 algorithms)

Figures (6)

  • Figure S1: (a) Left: the standard SAC architecture, designed for continuous actions. The actor outputs mean and standard deviation vectors $\mu$ and $\sigma$, which are combined with standard normal noise $\epsilon$ and passed through a tanh nonlinearity to keep the actions within a bounded range. The critic estimates the corresponding Q value from the state and the actor's action $a_c$. (b) Right: the parameterized SAC structure. The actor again outputs the mean $\mu$ and the standard deviation $\sigma$ for the continuous components, and produces the continuous actions $a_c$ and $k_d$. The discrete action is the one whose $k_d$ is largest. The critic network still takes the state $s$ and the continuous actions $a_c$ and $k_d$ as inputs.
  • Figure S2: Lane change scenario in SUMO.
  • Figure S3: Reward (cost) comparison between MPC and PASAC.
  • Figure S4: Lane change in the simulation.
  • Figure S5: Acceleration and jerk during the simulation.
  • ...and 1 more figures
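
The action-selection step described in the Figure S1 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `sample_hybrid_action`, the vector layout (continuous components first, then the $k_d$ scores), the choice of three discrete options, and the example values of $\mu$ and $\sigma$ are all assumptions made here for demonstration.

```python
import numpy as np


def sample_hybrid_action(mu, sigma, n_continuous, rng):
    """Sample a tanh-squashed Gaussian action and split it into (a_c, k_d, index).

    Assumed layout (hypothetical): the first `n_continuous` entries are the
    continuous action a_c, the remaining entries are the scores k_d, one per
    discrete option.
    """
    eps = rng.standard_normal(mu.shape)   # standard normal noise epsilon
    squashed = np.tanh(mu + sigma * eps)  # tanh keeps every component in (-1, 1)
    a_c = squashed[:n_continuous]         # continuous part, e.g. acceleration
    k_d = squashed[n_continuous:]         # continuous scores for discrete options
    discrete = int(np.argmax(k_d))        # discrete action = index of largest k_d
    return a_c, k_d, discrete


rng = np.random.default_rng(0)
# Hypothetical actor output: one continuous component and three discrete scores.
mu = np.array([0.2, -1.0, 2.0, 0.0])
sigma = np.array([0.1, 0.05, 0.05, 0.05])
a_c, k_d, lane_choice = sample_hybrid_action(mu, sigma, n_continuous=1, rng=rng)
```

Because the discrete choice is derived from continuous outputs, the critic can still be fed the state together with `a_c` and `k_d`, as the caption notes. How `a_c` is rescaled to a physical acceleration range is not specified in this section and is left out of the sketch.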