Learning to Model Diverse Driving Behaviors in Highly Interactive Autonomous Driving Scenarios with Multi-Agent Reinforcement Learning

Liu Weiwei; Hu Wenxuan; Jing Wei; Lei Lanxin; Gao Lingping; Liu Yong

TL;DR

This work addresses diverse driving styles in multi-agent reinforcement learning for autonomous driving by introducing the Personality Modeling Network (PeMN). PeMN decomposes each agent's reward into a self component and a cooperative component, models the cooperative part with a cooperation value function, and balances the two with a personality parameter $\alpha$. It also trains background traffic with diverse behaviors to improve ego-vehicle generalization. Training uses MAPPO under centralized training with decentralized execution and learns both self and cooperative value functions, enabling adaptive cooperation in highly interactive scenarios. Experiments in MetaDrive show that PeMN generalizes better to unseen personalities, is more robust to diverse traffic, and yields safer, more efficient driving policies by exploiting diverse interaction data during training.

Abstract

Autonomous vehicles trained through Multi-Agent Reinforcement Learning (MARL) have shown impressive results in many driving scenarios. However, the performance of these trained policies can degrade when they face diverse driving styles and personalities, particularly in highly interactive situations. This is because conventional MARL algorithms usually assume fully cooperative behavior among all agents and focus on maximizing team rewards during training. To address this issue, we introduce the Personality Modeling Network (PeMN), which includes a cooperation value function and personality parameters to model the varied interactions in highly interactive scenarios. PeMN also enables the training of a background traffic flow with diverse behaviors, thereby improving the performance and generalization of the ego vehicle. Our extensive experiments, which use different personality parameters in highly interactive driving scenarios, demonstrate that the personality parameters effectively model diverse driving styles and that policies trained with PeMN generalize better than those trained with traditional MARL methods.
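The role of a personality parameter can be illustrated with a small sketch. This page does not give the exact form of PeMN's reward, so the convex blend below, the names `StepRewards` and `blend_reward`, and the numeric values are all assumptions made for illustration, not the authors' formulation. The sketch only shows how a single scalar $\alpha$ could shift an agent between self-interested and cooperative behavior.

```python
from dataclasses import dataclass


@dataclass
class StepRewards:
    """Per-step reward components for one agent (hypothetical structure)."""
    self_reward: float  # reward for the agent's own progress and safety
    coop_reward: float  # reward reflecting the outcome for other agents


def blend_reward(rewards: StepRewards, alpha: float) -> float:
    """Blend self and cooperative rewards with a personality parameter.

    ASSUMPTION: a convex combination, alpha * self + (1 - alpha) * coop.
    The paper's actual weighting may differ.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * rewards.self_reward + (1.0 - alpha) * rewards.coop_reward


if __name__ == "__main__":
    # Made-up step: the agent gains by pushing ahead, the other car loses.
    step = StepRewards(self_reward=1.0, coop_reward=-0.5)
    for alpha in (1.0, 0.4, 0.0):
        print(f"alpha={alpha}: blended reward = {blend_reward(step, alpha):.2f}")
```

Under this assumed blend, an agent with $\alpha = 1.0$ scores the aggressive step purely by its own gain, while lower values of $\alpha$ increasingly penalize it for the cost imposed on the other vehicle.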

Paper Structure

This paper contains 18 sections, 11 equations, 8 figures, 1 table, 1 algorithm.

Introduction
RELATED WORK
RL and MARL Algorithms
Diverse Behaviour Modeling of Agents with RL
Application of RL in autonomous driving
BACKGROUND
Reward Determines Behavior
METHODS
Decompose action-value function
Personality Parameterized Reward Function
Training
Results and Discussions
Experiment Setup
Results Comparison
Diverse Behaviours and Diverse Interaction Data with Personality Differences
...and 3 more sections

Figures (8)

Figure 1: Highly interactive driving scenarios: meeting with different road geometry.
Figure 2: The algorithm framework of PeMN.
Figure 3: Performance of PeMN and baseline algorithms in different scenarios. PeMN[1.0, 0.4] denotes that the personality parameters $\alpha_l$ and $\alpha_r$ of the left and right cars are 1.0 and 0.4, respectively; [1.0, 0.4] is called a personality pair.
Figure 4: Simulated vehicle trajectories, visualized to show how driving behavior differs between PeMN and the baseline algorithms. The color gradient from dark to light indicates the vehicle's position over the past 25 steps.
Figure 5: The difference between the success rate and the collision rate with different personality pairs. Darker color represents higher success rate or collision rate.
...and 3 more figures
