Autonomous vehicle decision and control through reinforcement learning with traffic flow randomization
Yuan Lin, Antai Xie, Xiao Liu
TL;DR
The paper tackles the sim-to-real transfer problem for reinforcement learning-based autonomous driving by showing that domain randomization of rule-based microscopic traffic flows improves generalization to diverse traffic scenes. It randomizes IDM and SL2015 parameters with Gaussian distributions to create varied driving behaviors during training in SUMO, and compares the resulting policies against those trained under non-randomized and LimSim high-fidelity traffic flows in merging and freeway scenarios. Policies trained under domain-randomized traffic achieve high success rates and rewards across multiple traffic-flow types and densities, while those trained without randomization struggle under domain mismatch. High-fidelity traffic flow provides a stronger testing environment but is less effective for training because of longer run times and poorer generalization. The work demonstrates a practical path toward robust sim-to-real transfer for autonomous vehicle decision and control, with future work extending validation to real vehicles.
Abstract
Most current studies on reinforcement learning-based decision-making and control for autonomous vehicles are conducted in simulated environments. Training and testing in these studies are carried out under rule-based microscopic traffic flow, with little consideration given to transferring the trained policies to real or near-real environments to test their performance. This may lead to degraded performance when the trained model is tested in more realistic traffic scenes. In this study, we propose a method to randomize the driving style and behavior of surrounding vehicles by randomizing certain parameters of the car-following model and the lane-changing model of rule-based microscopic traffic flow in SUMO. We trained policies with deep reinforcement learning algorithms under the domain-randomized rule-based microscopic traffic flow in freeway and merging scenes, and then tested them separately in rule-based microscopic traffic flow and high-fidelity microscopic traffic flow. Results indicate that the policy trained under domain-randomized traffic flow achieves a significantly better success rate and cumulative reward than the models trained under other microscopic traffic flows.
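
As an illustration of the randomization idea described above, the following minimal Python sketch draws car-following parameters for each surrounding vehicle from a Gaussian distribution and plugs them into a common form of the IDM acceleration equation. It is a sketch under stated assumptions, not the authors' implementation: the names `IDMParams`, `PARAM_SPEC`, `sample_idm_params`, and `idm_acceleration` are hypothetical, the means, standard deviations, and bounds are placeholder values not taken from the paper, and clipping each sample to a plausible range is an added assumption. The sketch does not call SUMO; lane-changing (SL2015) parameters could presumably be sampled in the same way.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class IDMParams:
    desired_speed: float   # v0, free-flow speed [m/s]
    time_headway: float    # T, desired time gap to the leader [s]
    min_gap: float         # s0, standstill gap [m]
    max_accel: float       # a, maximum acceleration [m/s^2]
    comfort_decel: float   # b, comfortable deceleration [m/s^2]


# Hypothetical (mean, std, lower bound, upper bound) per parameter.
# These numbers are illustrative placeholders, not values from the paper.
PARAM_SPEC = {
    "desired_speed": (30.0, 3.0, 20.0, 40.0),
    "time_headway": (1.5, 0.3, 0.8, 2.5),
    "min_gap": (2.5, 0.5, 1.0, 4.0),
    "max_accel": (2.0, 0.4, 1.0, 3.5),
    "comfort_decel": (3.0, 0.5, 1.5, 4.5),
}


def sample_idm_params(rng: random.Random) -> IDMParams:
    """Draw one driving style: Gaussian sample per parameter, clipped to bounds."""
    values = {}
    for name, (mean, std, low, high) in PARAM_SPEC.items():
        values[name] = min(high, max(low, rng.gauss(mean, std)))
    return IDMParams(**values)


def idm_acceleration(p: IDMParams, speed: float, gap: float,
                     approach_rate: float, delta: float = 4.0) -> float:
    """IDM acceleration for a follower.

    speed: follower speed [m/s]; gap: bumper-to-bumper distance to leader [m];
    approach_rate: follower speed minus leader speed [m/s].
    """
    interaction = speed * p.time_headway + speed * approach_rate / (
        2.0 * math.sqrt(p.max_accel * p.comfort_decel)
    )
    desired_gap = p.min_gap + max(0.0, interaction)
    return p.max_accel * (
        1.0 - (speed / p.desired_speed) ** delta - (desired_gap / gap) ** 2
    )


if __name__ == "__main__":
    rng = random.Random(0)
    # Each surrounding vehicle gets its own sampled driving style.
    for vehicle_id in range(3):
        params = sample_idm_params(rng)
        accel = idm_acceleration(params, speed=25.0, gap=30.0, approach_rate=2.0)
        print(vehicle_id, params, round(accel, 3))
```

Resampling the parameters for every vehicle (or every episode) exposes the learning agent to a spread of aggressive and cautious neighbors instead of a single fixed driving style, which is the intuition behind training under domain-randomized traffic flow.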
