Mix Q-learning for Lane Changing: A Collaborative Decision-Making Method in Multi-Agent Deep Reinforcement Learning
Xiaojun Bi, Mingjie He, Yiwen Sun
TL;DR
This work addresses multi-agent lane-changing under CVIS by introducing Mix Q-learning for Lane Changing (MQLC), a collaborative framework with two interconnected value networks: an individual Q and a global Q. It augments decision-making with a deep intent-prediction module and a trajectory-informed observation pipeline, and formalizes coordination through a consistency regularization loss $\mathcal{L}_{reg}$ weighted by $\lambda$ and an urgency-based priority $\varepsilon$ guiding joint-action arbitration. The approach leverages a GCN-enabled value network and a GCN+MLP encoder to capture spatial and environmental context, enabling robust, safe, and efficient lane changes across varying traffic densities. Empirical results in highway-env with B-GAP show MQLC outperforming strong baselines (including QCOMBO and DQN variants) in safety and reward metrics, highlighting the benefits of explicit inter-agent coordination for traffic efficiency.
Abstract
Lane-changing decisions, which are crucial for autonomous vehicle path planning, face practical challenges due to rule-based constraints and limited data. Deep reinforcement learning has become a major research focus due to its advantages in data acquisition and interpretability. However, current models often overlook collaboration, which not only reduces overall traffic efficiency but also hinders the vehicle's own normal driving in the long run. To address this issue, this paper proposes a method named Mix Q-learning for Lane Changing (MQLC) that integrates a hybrid value Q network, taking into account both collective and individual benefits. At the collective level, our method coordinates the individual Q and global Q networks by utilizing global information. This enables agents to effectively balance their individual interests with the collective benefit. At the individual level, we integrate a deep learning-based intent recognition module into the observation and enhance the decision network. These changes provide agents with richer decision information and more accurate feature extraction for improved lane-changing decisions. This strategy enables the multi-agent system to learn and formulate effective decision-making strategies. In extensive experiments, our MQLC model outperforms other state-of-the-art multi-agent decision-making methods, achieving significantly safer and faster lane-changing decisions. The code is available at https://github.com/pku-smart-city/source_code/tree/main/MQLC.
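To make the coordination between the individual Q and global Q networks more concrete, the following is a minimal sketch of how a mixed objective with a consistency term weighted by $\lambda$ might be assembled. It is an illustration under stated assumptions, not the authors' implementation: this section does not give the exact form of $\mathcal{L}_{reg}$, so the regularizer below (squared gap between the global Q and the mean individual Q) is a hypothetical stand-in, and the function names `td_loss`, `consistency_reg`, and `mixed_loss` are invented for this example.

```python
from typing import Sequence


def td_loss(q_value: float, reward: float, next_q_max: float, gamma: float = 0.99) -> float:
    """Squared one-step temporal-difference error for a single transition."""
    target = reward + gamma * next_q_max
    return (q_value - target) ** 2


def consistency_reg(individual_qs: Sequence[float], global_q: float) -> float:
    """Hypothetical consistency term (an assumption, not the paper's definition):
    squared gap between the global Q and the mean of the individual Qs."""
    mean_individual = sum(individual_qs) / len(individual_qs)
    return (global_q - mean_individual) ** 2


def mixed_loss(
    individual_losses: Sequence[float],
    global_loss: float,
    reg: float,
    lam: float,
) -> float:
    """Mean individual TD loss + global TD loss + lam * consistency term."""
    mean_individual_loss = sum(individual_losses) / len(individual_losses)
    return mean_individual_loss + global_loss + lam * reg


if __name__ == "__main__":
    # Two agents with made-up Q estimates and rewards, for illustration only.
    ind_qs = [1.0, 3.0]
    ind_losses = [td_loss(q, reward=0.5, next_q_max=1.0) for q in ind_qs]
    glob_loss = td_loss(2.5, reward=1.0, next_q_max=2.0)
    reg = consistency_reg(ind_qs, global_q=2.5)
    print(mixed_loss(ind_losses, glob_loss, reg, lam=0.1))
```

In this sketch, a larger `lam` pulls the individual and global estimates closer together, while `lam = 0` reduces the objective to independent individual and global TD losses.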
