HAIM-DRL: Enhanced Human-in-the-loop Reinforcement Learning for Safe and Efficient Autonomous Driving

Zilin Huang; Zihao Sheng; Chengyuan Ma; Sikai Chen

TL;DR

This work tackles the challenge of learning safe and efficient autonomous driving policies in mixed traffic by introducing HAIM-DRL, an enhanced human-in-the-loop reinforcement learning framework. It replaces reward engineering with reward-free learning guided by explicit and implicit human interventions, using a proxy value function learned from partial human demonstrations and takeover signals. The method combines four components: a reward-free off-policy actor-critic architecture; an offline Learning-from-Explicit-Intervention pipeline based on Conservative Q-Learning; entropy-based exploration; and a disturbance-cost-based implicit intervention that minimizes downstream traffic disruption. A takeover-cost mechanism additionally reduces the human mentor's cognitive load. Experiments in MetaDrive and CARLA show that HAIM-DRL achieves higher safety, faster convergence, better generalization, and smoother traffic flow, with substantially fewer human interactions, than traditional imitation learning (IL), offline RL, and conventional human-in-the-loop (HL) baselines. These results point to safer and more efficient deployment of AVs in mixed-traffic environments.

Abstract

Despite significant progress in autonomous vehicles (AVs), the development of driving policies that ensure both the safety of AVs and traffic flow efficiency has not yet been fully explored. In this paper, we propose an enhanced human-in-the-loop reinforcement learning method, termed the Human as AI mentor-based deep reinforcement learning (HAIM-DRL) framework, which facilitates safe and efficient autonomous driving in mixed traffic platoons. Drawing inspiration from the human learning process, we first introduce an innovative learning paradigm that effectively injects human intelligence into AI, termed Human as AI mentor (HAIM). In this paradigm, the human expert serves as a mentor to the AI agent. While allowing the agent to sufficiently explore uncertain environments, the human expert can take control in dangerous situations and demonstrate correct actions to avoid potential accidents. In addition, the agent can be guided to minimize traffic flow disturbance, thereby optimizing traffic flow efficiency. In detail, HAIM-DRL uses data collected from free exploration and from partial human demonstrations as its two training sources. Remarkably, we circumvent the intricate process of manually designing reward functions; instead, we directly derive proxy state-action values from partial human demonstrations to guide the agent's policy learning. Additionally, we employ a minimal intervention technique to reduce the human mentor's cognitive load. Comparative results show that HAIM-DRL outperforms traditional methods in driving safety, sampling efficiency, mitigation of traffic flow disturbance, and generalizability to unseen traffic scenarios. The code and demo videos for this paper can be accessed at: https://zilin-huang.github.io/HAIM-DRL-website/
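
The abstract states that proxy state-action values are derived from partial human demonstrations but does not give the objective. The following is a minimal sketch, assuming a conservative, CQL-style contrast in which, on steps where the human took over, the value of the agent's overridden action is pushed down and the value of the human's action is pushed up. The names `Transition` and `proxy_value_loss` and the scalar state and action types are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Transition:
    """One logged step (hypothetical layout, for illustration only)."""
    state: float
    agent_action: float   # action the agent proposed
    human_action: float   # action the human mentor actually applied
    takeover: bool        # True if the human was in control on this step


def proxy_value_loss(
    q: Callable[[float, float], float],
    batch: Sequence[Transition],
) -> float:
    """Mean of Q(s, a_agent) - Q(s, a_human) over takeover steps.

    Minimizing this quantity lowers the value of overridden agent actions
    and raises the value of demonstrated human actions. Steps without a
    takeover carry no preference signal and are skipped.
    """
    gaps = [
        q(t.state, t.agent_action) - q(t.state, t.human_action)
        for t in batch
        if t.takeover
    ]
    return sum(gaps) / len(gaps) if gaps else 0.0


def toy_q(state: float, action: float) -> float:
    """Stand-in critic; a real critic would be a learned network."""
    return state + action


example_batch = [
    Transition(state=0.0, agent_action=1.0, human_action=0.0, takeover=True),
    Transition(state=0.0, agent_action=0.2, human_action=0.2, takeover=False),
    Transition(state=1.0, agent_action=0.5, human_action=1.0, takeover=True),
]
example_loss = proxy_value_loss(toy_q, example_batch)
```

In this sketch only takeover steps contribute, which reflects the idea that a human intervention is itself the learning signal and no hand-designed reward is needed. A practical version would average over mini-batches from a replay buffer and add the critic's temporal-difference term.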

Paper Structure (47 sections, 43 equations, 15 figures, 12 tables, 2 algorithms)

Introduction
IL and RL Methods for Driving Policy Learning
Human-in-the-loop Learning Methods
Contributions
Preliminaries
Markov Decision Process
Longitudinal Dynamical Modeling of Mixed Platoon
Human as AI Mentor Paradigm
Inspiration Behind the HAIM Paradigm
Details of the HAIM Paradigm
Explicit Intervention Mechanism
Implicit Intervention Mechanism
HAIM-based DRL Framework for Driving Policy Learning
HAIM-DRL Overview
Reward-Free Off-Policy Actor-Critic Architecture
...and 32 more sections

Figures (15)

Figure 1: Overview of our proposed '$X+1+N$' scenario and enhanced human-in-the-loop RL method. (a) The '$X+1+N$' mixed traffic platoon is a novel concept uniting the transportation and robotics domains. This scenario is characterized by a combination of HVs and an AV navigating through an uncertain traffic environment. (b) HAIM is an innovative learning paradigm that integrates human intelligence into AI, thereby enhancing the learning capabilities of AI agents. Additionally, HAIM-DRL can be seamlessly embedded into the MetaDrive/CARLA driving environment for testing.
Figure 2: Illustration of HAIM in the driving school scenario. (a) The traditional RL paradigm learns from trial and error. (b) The passive human involvement paradigm merely provides suggestions about which actions are good or evaluates collected trajectories. (c) The proposed HAIM paradigm enables the human expert to assume control in hazardous situations and demonstrate the correct actions to prevent potential accidents. Additionally, the agent is instructed to reduce its disturbance to the traffic flow of other road users.
Figure 3: Explicit and implicit intervention mechanism in HAIM. (a) The former involves humans actively taking direct control of the AV agent, guiding it through correct behaviors in hazardous scenarios. (b) The latter involves penalizing the agent for actions that disrupt traffic, indirectly indicating that it should avoid such actions in the future.
Figure 4: Schematic of the HAIM-DRL framework for driving policy learning.
Figure 5: Experimental scenes in the MetaDrive and CARLA simulators.
...and 10 more figures
