CHARMS: A Cognitive Hierarchical Agent for Reasoning and Motion Stylization in Autonomous Driving

Jingyi Wang, Duanfeng Chu, Zejian Deng, Liping Lu, Jinxiang Wang, Chen Sun

TL;DR

CHARMS addresses the need for interactive and diverse driving behaviors in autonomous systems by integrating Level-k cognitive hierarchy with Social Value Orientation (SVO). It employs a two-stage training pipeline—reinforcement learning pretraining via Double DQN followed by supervised fine-tuning on real trajectories from the HighD dataset—alongside Poisson cognitive hierarchy-based scenario generation to produce varied driving styles. The approach yields eight distinct Level-2 policies and enables controllable, realistic scenario generation in traffic, improving safety and realism compared to baselines. The work demonstrates improved ego-vehicle decision-making and richer environment dynamics, with potential impact on training, testing, and evaluating autonomous driving systems in closed-loop simulations.
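The two ingredients named above, an SVO-shaped reward and Double DQN pretraining, can be illustrated with a minimal sketch. It assumes the commonly used angle-based SVO form (cos/sin weighting of ego and other-agent rewards) and the standard Double DQN target; the function names, arguments, and default discount are hypothetical and are not taken from the CHARMS code.

```python
import math


def svo_reward(r_ego, r_other, svo_angle_rad):
    """Blend ego and other-agent rewards by an SVO angle (assumed form).

    An angle of 0 gives a purely egoistic reward; pi/4 weights both equally.
    """
    return math.cos(svo_angle_rad) * r_ego + math.sin(svo_angle_rad) * r_other


def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Standard Double DQN target for a single transition.

    The online network selects the next action and the target network
    evaluates it, which reduces the overestimation of vanilla DQN.
    q_online_next and q_target_next are lists of Q-values for the next state.
    """
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best_action]


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    r = svo_reward(r_ego=1.0, r_other=2.0, svo_angle_rad=0.0)
    y = double_dqn_target(r, False, [0.1, 0.9], [5.0, 2.0], gamma=0.5)
    print(r, y)
```

Varying the SVO angle is one plausible way to obtain policies with different driving styles; how CHARMS arrives at its eight Level-2 policies is not specified in this summary.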

Abstract

To address the challenge of insufficient interactivity and behavioral diversity in autonomous driving decision-making, this paper proposes a Cognitive Hierarchical Agent for Reasoning and Motion Stylization (CHARMS). By leveraging Level-k game theory, CHARMS captures human-like reasoning patterns through a two-stage training pipeline comprising reinforcement learning pretraining and supervised fine-tuning. This enables the resulting models to exhibit diverse and human-like behaviors, enhancing their decision-making capacity and interaction fidelity in complex traffic environments. Building upon this capability, we further develop a scenario generation framework that utilizes the Poisson cognitive hierarchy theory to control the distribution of vehicles with different driving styles through Poisson and binomial sampling. Experimental results demonstrate that CHARMS is capable of both making intelligent driving decisions as an ego vehicle and generating diverse, realistic driving scenarios as environment vehicles. The code for CHARMS is released at https://github.com/chuduanfeng/CHARMS.
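As a rough illustration of the scenario-generation idea, the sketch below draws how many vehicles reason at each level from a truncated Poisson distribution, using sequential binomial draws. It assumes the standard Poisson cognitive hierarchy frequency f(k) = e^(-tau) tau^k / k!, truncated at a maximum level and renormalized. The function names, the parameter tau, and the exact way the binomial draws are combined are assumptions for illustration, not the procedure from the paper or its repository.

```python
import math
import random


def poisson_level_probs(tau, max_level):
    """Poisson probabilities over levels 0..max_level, renormalized to sum to 1."""
    weights = [
        math.exp(-tau) * tau ** k / math.factorial(k) for k in range(max_level + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def sample_level_counts(n_vehicles, tau, max_level, rng):
    """Split n_vehicles across reasoning levels with sequential binomial draws.

    For each level in turn, every still-unassigned vehicle joins that level
    with the conditional probability p_k / (probability mass not yet used).
    Vehicles left at the end are assigned to the highest level.
    """
    probs = poisson_level_probs(tau, max_level)
    counts = []
    remaining = n_vehicles
    remaining_prob = 1.0
    for p in probs[:-1]:
        q = min(1.0, p / remaining_prob)
        # Binomial(remaining, q) drawn as a sum of Bernoulli trials.
        drawn = sum(rng.random() < q for _ in range(remaining))
        counts.append(drawn)
        remaining -= drawn
        remaining_prob -= p
    counts.append(remaining)
    return counts


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the illustration is repeatable
    print(poisson_level_probs(tau=1.5, max_level=2))
    print(sample_level_counts(n_vehicles=20, tau=1.5, max_level=2, rng=rng))
```

Under these assumptions, a larger tau shifts the mix toward higher-level (more strategic) vehicles, which is one way a single parameter could control the composition of a generated scenario.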

Paper Structure

This paper contains 11 sections, 13 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Comparison of Our Approach with Existing Methods. (a) No interaction (rule-based models): The model treats other vehicles as obstacles and follows a fixed behavior pattern. (b) Unidirectional interaction (game-theoretic or learning-based models): The ego vehicle reacts to the actions of other vehicles, reflecting a one-step reasoning process. (c) Bidirectional interaction (CHARMS): The ego vehicle first predicts how other vehicles anticipate its behavior, then infers their actions based on this prediction, and responds accordingly, exhibiting a two-step reasoning process.
  • Figure 2: Overall framework of CHARMS. Eight distinct behavior policies are obtained via a two-stage reinforcement learning process followed by supervised fine-tuning, and used to generate diverse and controllable scenarios based on PCH theory.
  • Figure 3: Reward curves of DRL training and loss curves of supervised fine-tuning.
  • Figure 4: A typical edge case generated by CHARMS with PCH theory.
  • Figure 5: Comparison of speed distributions during car-following across different behavior models.
  • ...and 1 more figure