Learning to Model Diverse Driving Behaviors in Highly Interactive Autonomous Driving Scenarios with Multi-Agent Reinforcement Learning
Liu Weiwei, Hu Wenxuan, Jing Wei, Lei Lanxin, Gao Lingping, Liu Yong
TL;DR
This work tackles the challenge of diverse driving styles in multi-agent reinforcement learning for autonomous driving by introducing the Personality Modeling Network (PeMN). PeMN decomposes each agent's reward into self and cooperative components with a cooperation value function, balanced by a personality parameter $\alpha$, and it trains diverse background traffic to improve ego-vehicle generalization. Using MAPPO under centralized training with decentralized execution, the framework learns both self and cooperative value functions, enabling adaptive cooperation in highly interactive scenarios. Empirical results in MetaDrive show that PeMN generalizes better to unseen personalities, is more robust to diverse traffic, and yields safer, more efficient driving policies by leveraging diverse interaction data during training.
Abstract
Autonomous vehicles trained through Multi-Agent Reinforcement Learning (MARL) have shown impressive results in many driving scenarios. However, the performance of these trained policies can degrade when they encounter diverse driving styles and personalities, particularly in highly interactive situations. This is because conventional MARL algorithms usually assume fully cooperative behavior among all agents and focus on maximizing team rewards during training. To address this issue, we introduce the Personality Modeling Network (PeMN), which includes a cooperation value function and personality parameters to model the varied interactions in highly interactive scenarios. PeMN also enables the training of a background traffic flow with diverse behaviors, thereby improving the performance and generalization of the ego vehicle. Our extensive experimental studies, which incorporate different personality parameters in highly interactive driving scenarios, show that the personality parameters effectively model diverse driving styles and that policies trained with PeMN generalize better than those trained with traditional MARL methods.
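
To make the role of the personality parameter $\alpha$ concrete, the following is a minimal sketch of how a self term and a cooperative term might be blended per agent. The convex combination used here is an assumption for illustration only; the summary above does not state the exact form PeMN uses. The function names `mix_advantage` and `sample_personalities`, and the choice of drawing $\alpha$ uniformly from $[0, 1]$, are likewise hypothetical.

```python
import random


def mix_advantage(self_adv: float, coop_adv: float, alpha: float) -> float:
    """Blend a self term and a cooperative term with personality alpha.

    ASSUMPTION: a convex combination with alpha in [0, 1], where
    alpha = 0 is purely self-interested and alpha = 1 purely cooperative.
    The paper's actual mixing rule may differ.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * self_adv + alpha * coop_adv


def sample_personalities(num_agents: int, seed: int = 0) -> list[float]:
    """Draw one hypothetical personality value per background vehicle.

    ASSUMPTION: uniform sampling over [0, 1]; used here only to
    illustrate a traffic flow with varied driving styles.
    """
    rng = random.Random(seed)
    return [rng.uniform(0.0, 1.0) for _ in range(num_agents)]


if __name__ == "__main__":
    # Each background agent weighs the same two signals differently.
    for alpha in sample_personalities(num_agents=3):
        blended = mix_advantage(self_adv=1.0, coop_adv=-0.5, alpha=alpha)
        print(f"alpha={alpha:.2f} -> blended advantage {blended:.3f}")
```

Under this sketch, two agents that observe identical self and cooperative signals still receive different learning signals whenever their $\alpha$ values differ, which is one way a single training setup could produce a spread of driving styles.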
