Multi-Task Lane-Free Driving Strategy for Connected and Automated Vehicles: A Multi-Agent Deep Reinforcement Learning Approach

Mehran Berahman; Majid Rostami-Shahrbabaki; Klaus Bogenberger

TL;DR

The paper addresses decision-making for lane-free driving of connected and automated vehicles (CAVs) under non-stationary traffic. It introduces a Multi-Agent Deep Deterministic Policy Gradient (MADDPG) framework with centralized training and decentralized execution. The framework uses dynamic elliptical safety borders together with inter-vehicle nudging and repulsion forces to enable overtaking, merging, and safe following in lane-free environments. Trained on a lane-free ring road in SUMO and evaluated on a 4 km freeway, the approach demonstrates substantial capacity gains (up to about 17,000 veh/h) and emergent lateral sorting of vehicles by desired speed, while prioritizing safety and passenger comfort. One limitation is that the agents were trained in relatively low-density traffic; the authors plan transfer learning and cooperative multi-agent extensions to improve performance in denser networks.

Abstract

Deep reinforcement learning has shown promise in various engineering applications, including vehicular traffic control. The non-stationary nature of traffic, especially in the lane-free environment with more degrees of freedom in vehicle behaviors, poses challenges for decision-making since a wrong action might lead to a catastrophic failure. In this paper, we propose a novel driving strategy for Connected and Automated Vehicles (CAVs) based on a competitive Multi-Agent Deep Deterministic Policy Gradient approach. The developed multi-agent deep reinforcement learning algorithm creates a dynamic and non-stationary scenario, mirroring real-world traffic complexities and making trained agents more robust. The algorithm's reward function is strategically and uniquely formulated to cover multiple vehicle control tasks, including maintaining desired speeds, overtaking, collision avoidance, and merging and diverging maneuvers. Moreover, additional considerations for both lateral and longitudinal passenger comfort and safety criteria are taken into account. We employed inter-vehicle forces, known as nudging and repulsive forces, to manage the maneuvers of CAVs in a lane-free traffic environment. The proposed driving algorithm is trained and evaluated on lane-free roads using the Simulation of Urban Mobility platform. Experimental results demonstrate the algorithm's efficacy in handling different objectives, highlighting its potential to enhance safety and efficiency in autonomous driving within lane-free traffic environments.

Paper Structure (19 sections, 27 equations, 11 figures, 2 tables)

Introduction
Related works
Problem statement and formulation
Multi-agent deep reinforcement learning
Lane-free traffic environment
Artificial Safety Border
Calculation of nudge and repulsion forces
Problem formulation in Markov Decision Process framework
State Space definition
Action Space
Reward function
Proposed multi-agent deep reinforcement learning algorithm
Simulation set up and performance analysis
Setup for the DRL Algorithm
Evaluation of the training phase
...and 4 more sections

Figures (11)

Figure 1: Lane-free circular freeway used to train the CAV agents
Figure 2: Artificial semi-ellipses and their corresponding forces. $N_{1,2}$ and $N_{2,3}$ are lateral nudging forces applied to agent 2 and agent 3 from agent 1 and agent 2, respectively. $R_{1,3}$ and $R_{2,3}$ are lateral repulsive forces acting on agent 1 and agent 2 from agent 3.
Figure 3: Definition of left and right freedom, $fr_l$ and $fr_r$, for CAVs in lane-free traffic freeway.
Figure 4: The structure and operational sequence of the proposed MADDPG algorithm. Each agent, denoted as $p$, where $p=1,2,\dotsc ,P$, comprises two primary components: an original actor network $\mu_p$, with a corresponding target actor network $\mu_p'$, as well as an original critic network $Q_p$, accompanied by a target critic network $Q_p'$
Figure 5: Average episode reward and collision occurrences during the MADDPG training process
...and 6 more figures
