Multi-agent Reinforcement Learning for Cooperative Lane Changing of Connected and Autonomous Vehicles in Mixed Traffic

Wei Zhou, Dong Chen, Jun Yan, Zhaojian Li, Huilin Yin, Wanchen Ge

TL;DR

This paper addresses cooperative lane changing for connected autonomous vehicles in mixed traffic by formulating it as a multi-agent reinforcement learning problem. It introduces MA2C, a parameter-sharing, multi-agent actor-critic framework with a novel local reward design that jointly optimizes safety, efficiency, and passenger comfort. Through comprehensive experiments across three traffic densities and varying HDV aggressiveness, MA2C consistently outperforms state-of-the-art MARL baselines in key metrics and demonstrates robust adaptability and interpretable cooperative behaviors. The work advances practical autonomous driving in realistic, heterogeneous traffic by balancing performance with ride quality and safety considerations, with potential implications for scalable deployment in mixed-traffic highways.

Abstract

Autonomous driving has attracted significant research interest in the past two decades as it offers many potential benefits, including releasing drivers from exhausting driving and mitigating traffic congestion. Despite promising progress, lane changing remains a great challenge for autonomous vehicles (AVs), especially in mixed and dynamic traffic scenarios. Recently, reinforcement learning (RL), a powerful data-driven control method, has been widely explored for lane-changing decision making in AVs, with encouraging results. However, the majority of those studies focus on a single-vehicle setting, and lane changing in the context of multiple AVs coexisting with human-driven vehicles (HDVs) has received scarce attention. In this paper, we formulate the lane-changing decision making of multiple AVs in a mixed-traffic highway environment as a multi-agent reinforcement learning (MARL) problem, where each AV makes lane-changing decisions based on the motions of both neighboring AVs and HDVs. Specifically, a multi-agent advantage actor-critic network (MA2C) is developed with a novel local reward design and a parameter sharing scheme. In particular, a multi-objective reward function is proposed to incorporate fuel efficiency, driving comfort, and safety of autonomous driving. Comprehensive experimental results, conducted under three different traffic densities and various levels of human driver aggressiveness, show that our proposed MARL framework consistently outperforms several state-of-the-art benchmarks in terms of efficiency, safety, and driver comfort.
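
To make the "multi-objective reward" and "local reward" ideas in the abstract concrete, here is a minimal sketch and not the paper's actual formulation. It assumes each AV's reward is a weighted sum of safety, efficiency, and comfort terms, and that the local reward averages this quantity over the agent and its neighboring AVs. The weights, speed range, term definitions, and the names `RewardWeights`, `individual_reward`, and `local_reward` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RewardWeights:
    # Hypothetical weights; the abstract does not give numeric values.
    safety: float = 1.0
    efficiency: float = 1.0
    comfort: float = 1.0


def individual_reward(collided: bool, speed: float, accel_change: float,
                      w: RewardWeights,
                      v_min: float = 20.0, v_max: float = 30.0) -> float:
    """Weighted sum of three assumed terms for a single AV."""
    # Safety: penalize a collision (assumed binary penalty).
    safety_term = -1.0 if collided else 0.0
    # Efficiency: speed normalized to [0, 1] over an assumed speed range.
    efficiency_term = min(max((speed - v_min) / (v_max - v_min), 0.0), 1.0)
    # Comfort: penalize large changes in acceleration (assumed proxy for jerk).
    comfort_term = -abs(accel_change)
    return (w.safety * safety_term
            + w.efficiency * efficiency_term
            + w.comfort * comfort_term)


def local_reward(agent: str, rewards: Dict[str, float],
                 neighbors: Dict[str, List[str]]) -> float:
    """Average the individual rewards over the agent and its neighbors."""
    group = [agent] + neighbors.get(agent, [])
    return sum(rewards[j] for j in group) / len(group)
```

Under these assumptions, an agent with no neighbors simply receives its own individual reward, while an agent surrounded by other AVs is credited or penalized for their outcomes as well, which is one way a local reward can encourage cooperation without relying on a single global signal.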

Paper Structure

This paper contains 19 sections, 5 equations, 9 figures, 2 tables, 1 algorithm.

Figures (9)

  • Figure 1: Illustration of the considered lane-changing scenario (green: AVs, blue: HDVs, arrow curve: a possible trajectory of the ego vehicle AV1 to make the lane change).
  • Figure 2: The architecture of the proposed MA2C network with shared actor-critic network design, where $x$ and $y$ are the longitudinal and lateral positions of the observed vehicle relative to the ego vehicle, and $v_x$ and $v_y$ are the longitudinal and lateral speeds of the observed vehicle relative to the ego vehicle.
  • Figure 3: Performance comparisons between local and global reward designs. The shaded region denotes the standard deviation over 2 random seeds.
  • Figure 4: Performance comparisons with and without actor-critic network sharing.
  • Figure 5: Comparison of acceleration between reward designs with and without the comfort measurement.
  • ...and 4 more figures
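
The Figure 2 caption describes each observed vehicle by its position and speed relative to the ego vehicle ($x$, $y$, $v_x$, $v_y$). The sketch below shows one way such an observation could be assembled. It is an illustration under stated assumptions, not the paper's implementation: the `VehicleState` type, the `relative_observation` function, the nearest-first ordering, the fixed row count `max_vehicles`, and the zero padding are all hypothetical choices.

```python
from typing import List, NamedTuple


class VehicleState(NamedTuple):
    x: float   # longitudinal position
    y: float   # lateral position
    vx: float  # longitudinal speed
    vy: float  # lateral speed


def relative_observation(ego: VehicleState, others: List[VehicleState],
                         max_vehicles: int = 5) -> List[List[float]]:
    """Build rows of [x, y, v_x, v_y] relative to the ego vehicle."""
    rows = [[o.x - ego.x, o.y - ego.y, o.vx - ego.vx, o.vy - ego.vy]
            for o in others]
    # Assumed ordering: nearest vehicles first (squared Euclidean distance).
    rows.sort(key=lambda r: r[0] ** 2 + r[1] ** 2)
    rows = rows[:max_vehicles]
    # Assumed padding: zero rows so the observation has a fixed shape.
    rows += [[0.0] * 4 for _ in range(max_vehicles - len(rows))]
    return rows
```

A fixed-shape observation like this is convenient when a single network is shared across agents, since every AV then feeds the same input dimensions to the same parameters.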