Discretionary Lane-Change Decision and Control via Parameterized Soft Actor-Critic for Hybrid Action Space
Yuan Lin, Xiao Liu, Zishun Zheng
TL;DR
The paper tackles discretionary lane-change decisions in autonomous driving by formulating a hybrid-action reinforcement learning approach based on Parameterized Soft Actor-Critic (PASAC) and comparing it against Model Predictive Control (MPC). PASAC is implemented in SUMO, with discrete lane-change decisions paired with continuous acceleration. The authors show that PASAC matches MPC's 0% collision rate while delivering higher average speeds and more favorable rewards, albeit with generalization challenges at higher traffic densities. The comparison covers training dynamics, testing outcomes, and generalization across traffic densities, against an MPC baseline that shares the same cost structure. Overall, PASAC shows competitive performance and potential practical benefits for real-time, hybrid-action lane-change control, while leaving room for theoretical and robustness improvements.
Abstract
This study focuses on a crucial task in autonomous driving: autonomous lane change. Autonomous lane change plays a pivotal role in improving traffic flow, alleviating driver burden, and reducing the risk of traffic accidents. However, because lane-change scenarios are complex and uncertain, autonomous lane change remains challenging. In this research, we conducted autonomous lane-change simulations using both deep reinforcement learning (DRL) and model predictive control (MPC). Specifically, we used the parameterized soft actor-critic (PASAC) algorithm to train a DRL-based lane-change strategy that outputs both discrete lane-change decisions and continuous longitudinal vehicle acceleration. We also used MPC for lane selection, choosing the lane with the smallest predicted car-following cost. For the first time, we compared the performance of DRL and MPC in the context of lane-change decisions. The simulation results indicated that, under the same reward/cost function and traffic flow, both MPC and PASAC achieved a collision rate of 0%. PASAC demonstrated performance comparable to MPC in terms of average rewards/costs and vehicle speeds.
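To make the hybrid action space concrete, the following is a minimal sketch of the two decision styles described in the abstract. It is not the authors' implementation. The action set ("left", "keep", "right"), the acceleration bound of 3 m/s^2, the tanh-squashed Gaussian for acceleration, and all function and variable names are assumptions chosen for illustration.

```python
import math
import random
from dataclasses import dataclass

# Hypothetical discrete action set and acceleration bound (not from the paper).
LANE_DECISIONS = ("left", "keep", "right")
ACCEL_LIMIT = 3.0  # m/s^2, assumed for illustration only


@dataclass
class HybridAction:
    lane_decision: str   # discrete part of the action
    acceleration: float  # continuous part of the action, in m/s^2


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_hybrid_action(logits, accel_mean, accel_log_std, rng):
    """Sample a discrete lane decision and a bounded continuous acceleration.

    `logits`, `accel_mean`, and `accel_log_std` stand in for the outputs of a
    policy network; here they are plain numbers supplied by the caller.
    """
    probs = softmax(logits)
    lane_decision = rng.choices(LANE_DECISIONS, weights=probs, k=1)[0]
    raw = rng.gauss(accel_mean, math.exp(accel_log_std))
    acceleration = ACCEL_LIMIT * math.tanh(raw)  # squash into [-limit, limit]
    return HybridAction(lane_decision, acceleration)


def select_lane_by_cost(predicted_costs):
    """MPC-style lane choice: pick the lane with the smallest predicted cost."""
    return min(predicted_costs, key=predicted_costs.get)


rng = random.Random(0)
action = sample_hybrid_action(
    [0.1, 2.0, -1.0], accel_mean=0.5, accel_log_std=-1.0, rng=rng
)
print(action)
print(select_lane_by_cost({"left": 4.2, "keep": 3.1, "right": 5.0}))
```

In this sketch the learned policy samples both parts of the action at once, whereas the MPC-style selector compares one predicted cost per candidate lane and takes the minimum. In the paper's setting, those costs would come from a car-following prediction over the MPC horizon rather than from fixed numbers.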
