RACE-SM: Reinforcement Learning Based Autonomous Control for Social On-Ramp Merging
Jordan Poots
TL;DR
This work tackles parallel-style autonomous on-ramp merging in human-controlled traffic by introducing a reinforcement learning framework with a Social Value Orientation (SVO)-based reward that balances ego and surrounding vehicle (SV) utilities. The method uses PPO within an OpenAI Gym-SUMO environment on a two-lane highway with taper and parallel ramps, and evaluates across a range of SVO values, finding that a prosocial setting around $\varphi = \pi/4$ yields socially courteous, collision-free merging. The results show that incorporating SV considerations improves safety and social compatibility compared to ego-centric DRL baselines, with performance maintained across traffic densities. The approach highlights the practical significance of social considerations in autonomous driving and suggests future work on generalizing to diverse road networks and sensing constraints, including higher-fidelity simulators such as CARLA.
Abstract
Autonomous parallel-style on-ramp merging in human-controlled traffic remains an open problem for autonomous vehicle control. Existing non-learning-based solutions rely primarily on rules and optimization, and these methods have been seen to present significant challenges. Recent advances in Deep Reinforcement Learning have shown promise and attracted significant academic interest; however, the available learning-based approaches pay inadequate attention to other highway vehicles and often rely on inaccurate road traffic assumptions. In addition, the parallel-style case is rarely considered. This work proposes a novel learning-based model for acceleration and lane-change decision making that explicitly considers the utility to both the ego vehicle and its surrounding vehicles, which may be cooperative or uncooperative, in order to produce socially acceptable behaviour. The novel reward function is divided into ego-vehicle and surrounding-vehicle utility, which are weighted according to the model's designated Social Value Orientation, so that this value sets the vehicle's level of social cooperation. A two-lane highway with an on-ramp divided into a taper-style and a parallel-style section is considered. Simulation results indicate the importance of considering surrounding vehicles in reward function design and show that the proposed model matches or surpasses those in the literature in terms of collisions, while also introducing socially courteous behaviour that avoids near misses and anti-social behaviour through direct consideration of the effect of merging on surrounding vehicles.
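
To make the weighting idea concrete, the following is a minimal sketch of how an SVO angle might blend the two utility terms. It assumes the cosine/sine weighting commonly used in the SVO literature; the abstract does not state the exact form of the reward, and the function name `svo_reward` and the scalar utilities are hypothetical placeholders rather than the paper's definitions.

```python
import math


def svo_reward(ego_utility: float, sv_utility: float, phi: float) -> float:
    """Blend ego and surrounding-vehicle utility using an SVO angle phi (radians).

    Assumption: the common cos/sin SVO weighting, not necessarily the
    exact reward used in the paper.
    """
    return math.cos(phi) * ego_utility + math.sin(phi) * sv_utility


# phi = 0      -> purely egoistic (only ego utility counts)
# phi = pi / 4 -> prosocial (both utilities weighted equally)
# phi = pi / 2 -> purely altruistic (only surrounding-vehicle utility counts)
egoistic = svo_reward(1.0, -2.0, 0.0)
prosocial = svo_reward(1.0, -2.0, math.pi / 4)
```

Under this assumed form, a merge that benefits the ego vehicle but penalises its neighbours scores positively for an egoistic agent and negatively for a prosocial one, which is the kind of trade-off the prosocial setting $\varphi = \pi/4$ is meant to capture.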
