Reinforcement Learning from Human Feedback for Lane Changing of Autonomous Vehicles in Mixed Traffic

Yuting Wang, Lu Liu, Maonan Wang, Xi Xiong

TL;DR

This work addresses the challenge of aligning autonomous-vehicle lane-changing behavior with human preferences in mixed traffic. It proposes a three-stage RLHF framework that pre-trains a lane-change policy with PPO, collects human trajectory preferences, and learns a reward model via an LSTM to guide policy refinement. By formulating lane changing as an MDP and validating conservative and aggressive driving styles in SUMO-based obstacle avoidance and mixed-autonomy scenarios, it demonstrates that RLHF can diversify driving styles while maintaining safety. The findings suggest RLHF as a viable approach to improve human-like interaction and integration of AVs into human-driven traffic, with potential for broader application in complex driving contexts.

Abstract

The burgeoning field of autonomous driving necessitates the seamless integration of autonomous vehicles (AVs) with human-driven vehicles, calling for more predictable AV behavior and enhanced interaction with human drivers. Human-like driving, particularly during lane-changing maneuvers on highways, is a critical area of research due to its significant impact on safety and traffic flow. Traditional rule-based decision-making approaches often fail to encapsulate the nuanced boundaries of human behavior in diverse driving scenarios, while crafting reward functions for learning-based methods introduces its own set of complexities. This study investigates the application of Reinforcement Learning from Human Feedback (RLHF) to emulate human-like lane-changing decisions in AVs. An initial RL policy is pre-trained to ensure safe lane changes. Subsequently, this policy is employed to gather data, which is then annotated by humans to train a reward model that discerns lane changes aligning with human preferences. This human-informed reward model supersedes the original, guiding the refinement of the policy to reflect human-like preferences. The effectiveness of RLHF in producing human-like lane changes is demonstrated through the development and evaluation of conservative and aggressive lane-changing models within obstacle-rich environments and mixed autonomy traffic scenarios. The experimental outcomes underscore the potential of RLHF to diversify lane-changing behaviors in AVs, suggesting its viability for enhancing the integration of AVs into the fabric of human-driven traffic.
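The reward-model step described above, learning from human comparisons of trajectory segments, can be illustrated with a small sketch. The abstract does not state the training loss, so the Bradley-Terry preference formulation used here is an assumption borrowed from common RLHF practice, not the authors' confirmed method. The function names (`segment_return`, `preference_probability`, `preference_loss`) are hypothetical, and the per-step rewards stand in for the outputs of a learned reward model such as the LSTM mentioned in the TL;DR.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def segment_return(step_rewards):
    """Sum the per-step rewards predicted for one trajectory segment.

    Assumption: the reward model scores each step and a segment's score
    is the plain sum of those step rewards.
    """
    return sum(step_rewards)


def preference_probability(rewards_a, rewards_b):
    """Bradley-Terry probability that segment A is preferred over segment B."""
    return _sigmoid(segment_return(rewards_a) - segment_return(rewards_b))


def preference_loss(rewards_a, rewards_b, label):
    """Cross-entropy between the predicted preference and the human label.

    label = 1.0 if the annotator preferred A, 0.0 if B,
    and 0.5 if the two segments were judged equally good.
    """
    eps = 1e-12  # guards against log(0) when the prediction saturates
    p = preference_probability(rewards_a, rewards_b)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))


if __name__ == "__main__":
    # Hypothetical predicted step rewards for two lane-change segments.
    cautious = [0.4, 0.5, 0.6]
    abrupt = [0.1, -0.3, 0.2]
    print(preference_probability(cautious, abrupt))
    print(preference_loss(cautious, abrupt, label=1.0))
```

Under this assumed formulation, minimizing the loss over many labeled pairs pushes the reward model to score the preferred segment higher. Labeling the same pairs according to a conservative or an aggressive preference would then yield different reward models, and so different refined policies.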

Paper Structure

This paper contains 18 sections, 10 equations, 11 figures, and 2 tables.

Figures (11)

  • Figure 1: The overall framework of the lane changing model using RLHF.
  • Figure 2: Illustration of a Three-Lane Vehicle Lane-Change Scenario.
  • Figure 3: The architecture diagram of the feature extraction network. The blue color indicates the range of parameters that need to be fixed during the subsequent fine-tuning process.
  • Figure 4: Collection of human feedback using visualized trajectory segments.
  • Figure 5: The structure of the reward model.
  • ...and 6 more figures