RACE-SM: Reinforcement Learning Based Autonomous Control for Social On-Ramp Merging

Jordan Poots

TL;DR

This work tackles parallel-style autonomous on-ramp merging in human-controlled traffic by introducing a reinforcement learning framework with a Social Value Orientation (SVO)-based reward that balances the utilities of the ego vehicle and the surrounding vehicles (SVs). The method uses PPO within an OpenAI Gym-SUMO environment on a two-lane highway whose on-ramp has taper-style and parallel-style sections, and evaluates a range of SVO values, finding that a prosocial setting around $\varphi = \pi/4$ yields socially courteous, collision-free merging. The results show that incorporating SV considerations improves safety and social compatibility compared to ego-centric DRL baselines, with performance maintained across traffic densities. The approach highlights the practical significance of social considerations in autonomous driving and suggests future work to generalize to diverse road networks and sensing constraints, including higher-fidelity simulators such as CARLA.

Abstract

Autonomous parallel-style on-ramp merging in human-controlled traffic remains an open problem for autonomous vehicle control. Existing non-learning-based solutions rely primarily on rules and optimization, and these methods present significant challenges. Recent advances in Deep Reinforcement Learning have shown promise and received significant academic interest; however, the available learning-based approaches pay inadequate attention to other highway vehicles and often rely on inaccurate road traffic assumptions. In addition, the parallel-style case is rarely considered. This paper proposes a novel learning-based model for acceleration and lane-change decision making that explicitly considers the utility to both the ego vehicle and its surrounding vehicles, which may be cooperative or uncooperative, to produce socially acceptable behaviour. The novel reward function uses Social Value Orientation to set the vehicle's level of social cooperation: it is divided into ego-vehicle utility and surrounding-vehicle utility, which are weighted according to the model's designated Social Value Orientation. A two-lane highway with an on-ramp divided into a taper-style section and a parallel-style section is considered. Simulation results indicate the importance of considering surrounding vehicles in reward function design and show that the proposed model matches or surpasses those in the literature in terms of collisions, while also introducing socially courteous behaviour that avoids near misses and anti-social behaviour through direct consideration of the effect of merging on surrounding vehicles.
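
The SVO weighting described in the abstract can be illustrated with a minimal sketch. It assumes the cosine/sine weighting commonly used with the SVO ring; this page does not give the paper's actual reward equation, so the formula, the function name svo_reward, and the scalar utilities are hypothetical. The restriction of the angle to the first quadrant follows the figure list, which notes that a single quadrant of the ring is used.

```python
import math


def svo_reward(ego_utility: float, sv_utility: float, phi: float) -> float:
    """Blend ego and surrounding-vehicle utility using an SVO angle phi (radians).

    Assumption: the common SVO form cos(phi) * ego + sin(phi) * sv.
    The paper's exact reward terms are not reproduced here.
    """
    # Assumption: only the first quadrant of the SVO ring is used,
    # from fully egoistic (phi = 0) to fully altruistic (phi = pi/2).
    if not 0.0 <= phi <= math.pi / 2:
        raise ValueError("phi is assumed to lie in [0, pi/2]")
    return math.cos(phi) * ego_utility + math.sin(phi) * sv_utility


# At phi = pi/4 (the prosocial setting mentioned in the TL;DR),
# both utilities receive the same weight.
prosocial = svo_reward(ego_utility=1.0, sv_utility=1.0, phi=math.pi / 4)
```

Under this assumed form, phi = 0 reduces to a purely ego-centric reward, which is the kind of baseline the TL;DR contrasts against.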

Paper Structure (23 sections, 11 equations, 10 figures, 6 tables, 1 algorithm)

This paper contains 23 sections, 11 equations, 10 figures, 6 tables, 1 algorithm.

Introduction
Related Work
Heuristic control
Optimization approaches
Reinforcement learning approaches
Social value orientation
Contribution
Background
Reinforcement learning
Proximal policy optimisation
Methodology
Road network
Action space
State space
Reward function
...and 8 more sections

Figures (10)

Figure 1: A comparison between on-ramps with a taper-style section only and on-ramps with both a taper-style and parallel-style section.
Figure 2: The social value orientation ring proposed by Griesinger et al. [Griesinger1973]. The highlighted quadrant was used in the reward function design.
Figure 3: Reinforcement learning training cycle visualisation. The environment and agent are both indicated on the diagram.
Figure 4: A visualisation of the environment observation space used. Human-controlled vehicles are shown in red and the ego vehicle is shown in yellow. V represents a velocity and G represents a gap. X is the longitudinal distance to the end of the merging lane and Y is the vehicle's distance from the centre of the lane.
Figure 5: The social value orientation ring quadrant to be used within the reward function.
...and 5 more figures
