Multi-agent Reinforcement Learning for Cooperative Lane Changing of Connected and Autonomous Vehicles in Mixed Traffic
Wei Zhou, Dong Chen, Jun Yan, Zhaojian Li, Huilin Yin, Wanchen Ge
TL;DR
This paper addresses cooperative lane changing for connected autonomous vehicles (AVs) in mixed traffic by formulating it as a multi-agent reinforcement learning (MARL) problem. It introduces MA2C, a multi-agent advantage actor-critic framework with parameter sharing and a novel local reward design that jointly optimizes safety, efficiency, and passenger comfort. In experiments across three traffic densities and varying levels of human-driven vehicle (HDV) aggressiveness, MA2C consistently outperforms several state-of-the-art MARL baselines in efficiency, safety, and comfort. The work targets autonomous driving in realistic, heterogeneous highway traffic, where performance must be balanced against ride quality and safety.
Abstract
Autonomous driving has attracted significant research interest in the past two decades, as it offers many potential benefits, including relieving drivers of exhausting driving and mitigating traffic congestion. Despite promising progress, lane changing remains a major challenge for autonomous vehicles (AVs), especially in mixed and dynamic traffic scenarios. Recently, reinforcement learning (RL), a powerful data-driven control method, has been widely explored for lane-changing decision making in AVs, with encouraging results. However, the majority of those studies focus on a single-vehicle setting, and lane changing in the context of multiple AVs coexisting with human-driven vehicles (HDVs) has received scarce attention. In this paper, we formulate the lane-changing decision making of multiple AVs in a mixed-traffic highway environment as a multi-agent reinforcement learning (MARL) problem, where each AV makes lane-changing decisions based on the motions of both neighboring AVs and HDVs. Specifically, a multi-agent advantage actor-critic network (MA2C) is developed with a novel local reward design and a parameter sharing scheme. In particular, a multi-objective reward function is proposed to incorporate fuel efficiency, driving comfort, and safety of autonomous driving. Comprehensive experimental results, conducted under three different traffic densities and various levels of human driver aggressiveness, show that our proposed MARL framework consistently outperforms several state-of-the-art benchmarks in terms of efficiency, safety, and driver comfort.
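To make the idea of a multi-objective local reward concrete, the following is a minimal sketch and not the reward used in the paper. It assumes a weighted sum of three terms (a collision penalty for safety, speed tracking as a stand-in for efficiency, and a jerk penalty for comfort) and assumes the local reward of an AV is the average of these per-vehicle rewards over itself and its neighboring AVs. All names, weights, and thresholds (`VehicleState`, `individual_reward`, `local_reward`, `target_speed`, `max_jerk`, and so on) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VehicleState:
    """Hypothetical per-vehicle quantities needed by the sketch."""
    speed: float        # current speed, m/s
    accel: float        # current acceleration, m/s^2
    prev_accel: float   # acceleration at the previous step, m/s^2
    collided: bool      # whether the vehicle collided at this step


def individual_reward(v, target_speed=30.0, dt=0.1, max_jerk=10.0,
                      w_safety=1.0, w_efficiency=1.0, w_comfort=0.1):
    """Weighted sum of safety, efficiency, and comfort terms.

    The weights and the form of each term are assumptions made for
    illustration; the paper's actual reward may differ.
    """
    # Safety: fixed penalty on collision.
    safety = -1.0 if v.collided else 0.0
    # Efficiency: penalize relative deviation from a target speed.
    efficiency = -abs(v.speed - target_speed) / target_speed
    # Comfort: penalize jerk (rate of change of acceleration), clipped to [0, 1].
    jerk = (v.accel - v.prev_accel) / dt
    comfort = -min(abs(jerk) / max_jerk, 1.0)
    return w_safety * safety + w_efficiency * efficiency + w_comfort * comfort


def local_reward(ego, neighbors, states):
    """Average the individual rewards over the ego AV and its neighboring AVs.

    `neighbors` maps an agent index to the indices of nearby AVs, so each
    agent is rewarded for the outcome of its local group rather than the
    whole fleet.
    """
    group = [ego] + list(neighbors[ego])
    return sum(individual_reward(states[i]) for i in group) / len(group)
```

Averaging over a neighborhood, rather than over all vehicles, is one common way to encourage cooperation while keeping each agent's reward tied to events it can influence. With parameter sharing, every AV would evaluate the same policy network on its own observation and be trained on its own local reward.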
