Trustworthy Human-AI Collaboration: Reinforcement Learning with Human Feedback and Physics Knowledge for Safe Autonomous Driving

Zilin Huang; Zihao Sheng; Sikai Chen

TL;DR

PE-RLHF introduces a Physics-enhanced Human-AI (PE-HAI) collaborative paradigm for dynamic action selection between human and physics-based actions, employs a reward-free approach with a proxy value function to capture human preferences, and incorporates a minimal intervention mechanism to reduce the cognitive load on human mentors.

Abstract

In the field of autonomous driving, developing safe and trustworthy autonomous driving policies remains a significant challenge. Recently, Reinforcement Learning with Human Feedback (RLHF) has attracted substantial attention due to its potential to enhance training safety and sampling efficiency. Nevertheless, existing RLHF-enabled methods often falter when faced with imperfect human demonstrations, potentially leading to training oscillations or even worse performance than rule-based approaches. Inspired by the human learning process, we propose Physics-enhanced Reinforcement Learning with Human Feedback (PE-RLHF). This novel framework synergistically integrates human feedback (e.g., human intervention and demonstration) and physics knowledge (e.g., traffic flow model) into the training loop of reinforcement learning. The key advantage of PE-RLHF is its guarantee that the learned policy will perform at least as well as the given physics-based policy, even when human feedback quality deteriorates, thus ensuring trustworthy safety improvements. PE-RLHF introduces a Physics-enhanced Human-AI (PE-HAI) collaborative paradigm for dynamic action selection between human and physics-based actions, employs a reward-free approach with a proxy value function to capture human preferences, and incorporates a minimal intervention mechanism to reduce the cognitive load on human mentors. Extensive experiments across diverse driving scenarios demonstrate that PE-RLHF significantly outperforms traditional methods, achieving state-of-the-art (SOTA) performance in safety, efficiency, and generalizability, even with varying quality of human feedback. The philosophy behind PE-RLHF not only advances autonomous driving technology but can also offer valuable insights for other safety-critical domains. Demo video and code are available at: https://zilin-huang.github.io/PE-RLHF-website/

Paper Structure (59 sections, 4 theorems, 46 equations, 20 figures, 13 tables, 2 algorithms)

Introduction
Related Works
Safety Guarantees of RL-based Decision-Making
RLHF for Driving Policy Learning
Problem Formulation
Preliminaries
Problem Statement
Physics-enhanced Human-AI Collaborative Paradigm
Inspiration
Human Policy Generation
Human-AI Shared Control
The Form of Switch Function
Physics-based Policy Generation
Action Selection Mechanism
Value Estimator Construction
...and 44 more sections

Key Result

Lemma 1

The state distribution discrepancy between the human policy $\pi_{\text{human}}$ and the AV policy $\pi_{\text{AV}}$ is bounded by their expected policy discrepancy:

Figures (20)

Figure 1: Motivation for this work. (a) Fundamentals and limitations of existing IL/RL-based methods for driving policy learning. (b) Fundamentals and limitations of RLHF-based methods. (c) Fundamentals of our proposed PE-RLHF framework, which achieves trustworthy safety improvements.
Figure 2: The proposed PE-HAI paradigm for autonomous driving. (a) Inspiration from the human learning process, where a student learns from both a native speaker and a grammar book. (b) The main components of the PE-HAI paradigm, where the AV is equipped with a human policy $\pi_{\text{human}}$, a physics-based policy $\pi_{\text{phy}}$, and an AV policy $\pi_{\text{AV}}$. When human takeover occurs, the selection function $\mathcal{T}_{\text{select}}(s)$ determines whether to execute the action generated by the $\pi_{\text{human}}$ or the $\pi_{\text{phy}}$ based on their expected Q values estimated by the ensemble of Q-networks $\mathbf{Q}^\phi$.
Figure 3: PE-HAI's action selection process in a roadblock avoidance scenario. Traditional RL methods (green path) would collide with the roadblock before learning to avoid it. In PE-HAI, humans perceive danger and take over, making a left lane change. As $a_{\text{human}}$'s expected Q value exceeds $a_{\text{phy}}$ at this point, PE-HAI adopts the human action (gray path). However, humans may subsequently make erroneous maneuvers such as deviating from the road. While this could cause training failure in traditional RLHF, PE-HAI switches to the physics-based action when $a_{\text{phy}}$'s Q value surpasses $a_{\text{human}}$'s as the vehicle nears road departure (orange path), thus ensuring training safety. Ultimately, the agent learns a safe and efficient obstacle avoidance strategy from this hybrid policy (blue path).
Figure 4: Overview of the proposed PE-RLHF framework.
Figure 5: Illustration of various driving scenarios generated in the MetaDrive simulator.
...and 15 more figures
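
The action selection described in the captions of Figures 2 and 3 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `expected_q`, `select_action`, and `make_toy_q` are hypothetical, averaging over the ensemble is only one assumed way to obtain an "expected Q value", ties are assumed to fall back to the physics-based action, and the toy Q-functions stand in for the learned Q-networks $\mathbf{Q}^\phi$.

```python
from statistics import mean
from typing import Callable, Sequence, Tuple

State = Tuple[float, ...]
Action = Tuple[float, ...]
QFunction = Callable[[State, Action], float]


def expected_q(q_ensemble: Sequence[QFunction], state: State, action: Action) -> float:
    """Aggregate the ensemble's estimates by averaging (an assumed rule)."""
    return mean(q(state, action) for q in q_ensemble)


def select_action(
    q_ensemble: Sequence[QFunction],
    state: State,
    a_human: Action,
    a_phy: Action,
) -> Tuple[Action, str]:
    """Pick the human or physics-based action with the higher expected Q value.

    Ties go to the physics-based action (an assumption made for this sketch).
    """
    q_human = expected_q(q_ensemble, state, a_human)
    q_phy = expected_q(q_ensemble, state, a_phy)
    if q_human > q_phy:
        return a_human, "human"
    return a_phy, "physics"


def make_toy_q(scale: float) -> QFunction:
    """Hypothetical Q-function: penalize the lateral offset after steering.

    state = (lateral_offset,), action = (steering,). Purely illustrative.
    """
    def q(state: State, action: Action) -> float:
        next_offset = state[0] + action[0]
        return -abs(next_offset) * scale
    return q


# A small ensemble whose members disagree slightly on the penalty scale.
ensemble = [make_toy_q(s) for s in (0.9, 1.0, 1.1)]

# Vehicle centered: the large human steering input would push it off center,
# so the physics-based action has the higher expected Q value.
centered_choice = select_action(ensemble, (0.0,), (0.5,), (0.1,))

# Vehicle offset to one side: the human correction recenters it,
# so the human action has the higher expected Q value.
offset_choice = select_action(ensemble, (-0.5,), (0.5,), (0.1,))
```

In this toy setting the selection flips between the two sources depending on the state, which mirrors the switch from the human action to the physics-based action that Figure 3 describes as the vehicle nears road departure.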

Theorems & Definitions (4)

Lemma 1
Theorem 1
Theorem 2
Theorem 3
