Cooperative Autonomous Driving in Diverse Behavioral Traffic: A Heterogeneous Graph Reinforcement Learning Approach
Qi Liu, Xueyuan Li, Zirui Li, Juhui Gim
TL;DR
This work tackles autonomous-vehicle (AV) decision-making in heterogeneous traffic by formulating the problem as a Markov decision process (MDP) $(\mathcal{S},\mathcal{A},P,R,\gamma)$ and introducing a heterogeneous graph representation that captures interactions between the ego vehicle and human vehicles (HVs). It presents HGNN-EM, a four-part neural architecture (pre-encoders, an expert model, a Relational Graph Attention Network, and policy fusion) that encodes heterogeneous traffic and fuses expert guidance with a graph reinforcement learning (GRL) policy, trained via Double DQN. Case studies at a four-way intersection show zero collisions and better safety, efficiency, stability, and convergence speed than the baselines, while maintaining real-time performance. The results indicate that combining domain knowledge with graph-based perception helps an AV handle diverse driving styles robustly in complex traffic.
Abstract
Navigating heterogeneous traffic environments with diverse driving styles poses a significant challenge for autonomous vehicles (AVs) because of the inherent complexity and dynamic interactions of such environments. This paper addresses this challenge by proposing a heterogeneous graph reinforcement learning (GRL) framework enhanced with an expert system to improve AV decision-making performance. First, a heterogeneous graph representation is introduced to capture the intricate interactions among vehicles. Then, a heterogeneous graph neural network with an expert model (HGNN-EM) is proposed to encode diverse vehicle features and produce driving instructions informed by domain-specific knowledge. Finally, the double deep Q-learning (DDQN) algorithm is used to train the decision-making model. A case study on a typical four-way intersection, involving human vehicles (HVs) with various driving styles, demonstrates that the proposed method outperforms several baselines in safety, efficiency, stability, and convergence rate, while maintaining favorable real-time performance.
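As a rough illustration of the DDQN training signal mentioned above, the sketch below computes the standard Double DQN bootstrap target for a single transition: the online network selects the next action and the target network evaluates it. It is a minimal sketch of the generic algorithm, not the authors' implementation. The function name `double_dqn_target`, the three-action setup, the discount factor, and all Q-values are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def double_dqn_target(
    reward: float,
    done: bool,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    gamma: float = 0.99,
) -> float:
    """Double DQN bootstrap target for one transition (generic form)."""
    if done:
        # Terminal transition: no bootstrapping from the next state.
        return reward
    # Action selection uses the online network's Q-values for the next state.
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # Action evaluation uses the target network's Q-value for that action.
    return reward + gamma * q_target_next[best_action]


# Hypothetical Q-values for three discrete actions in the next state.
q_online_next = [1.0, 3.0, 2.0]   # online network prefers action 1
q_target_next = [0.5, 1.5, 4.0]   # target network would prefer action 2

y = double_dqn_target(
    reward=1.0,
    done=False,
    q_online_next=q_online_next,
    q_target_next=q_target_next,
    gamma=0.9,
)
print(round(y, 6))
```

With these made-up numbers the target bootstraps from the target network's value for action 1 (1.5) rather than from its maximum (4.0), which is how Double DQN reduces the overestimation that a plain max over the target network can introduce.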
