From Learning to Mastery: Achieving Safe and Efficient Real-World Autonomous Driving with Human-In-The-Loop Reinforcement Learning

Li Zeqiao, Wang Yijing, Wang Haoyu, Li Zheng, Li Peng, Liu Wenfei, Zuo Zhiqiang

TL;DR

This work tackles safe, sample-efficient real-world autonomous driving by marrying human expertise with distributional RL. The proposed method, H-DSAC, introduces a distributional proxy value function within DSAC to encode human intent and propagate guidance to unlabeled states, enabling reward-free TD learning and safer exploration. Empirical results from both MetaDrive simulations and real-world UGV experiments show that H-DSAC achieves higher returns, lower safety costs, and higher success rates than standard RL, offline RL, imitation learning, and other HIL baselines, while requiring manageable human involvement. The approach demonstrates practical feasibility for end-to-end autonomous driving in real environments, balancing human guidance with autonomous discovery to realize robust, real-time policy learning.

Abstract

Autonomous driving with reinforcement learning (RL) has significant potential. However, applying RL in real-world settings remains challenging due to the need for safe, efficient, and robust learning. Incorporating human expertise into the learning process can help overcome these challenges by reducing risky exploration and improving sample efficiency. In this work, we propose a reward-free, active human-in-the-loop learning method called Human-Guided Distributional Soft Actor-Critic (H-DSAC). Our method combines Proxy Value Propagation (PVP) and Distributional Soft Actor-Critic (DSAC) to enable efficient and safe training in real-world environments. The key innovation is the construction of a distributional proxy value function within the DSAC framework. This function encodes human intent by assigning higher expected returns to expert demonstrations and lower expected returns to actions that required human intervention. By extrapolating these labels to unlabeled states, the policy is guided toward expert-like behavior. With a well-designed state space, our method achieves real-world driving policy learning within practical training times. Results from both simulation and real-world experiments demonstrate that our framework enables safe, robust, and sample-efficient learning for autonomous driving.
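
The labeling idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a hypothetical critic callable `q_mean` that returns the mean of the predicted return distribution, a symmetric label of +1 for the human's action and -1 for the overridden agent action, and a standard soft TD target with the reward fixed to zero. The names `Transition`, `proxy_value_loss`, and `reward_free_td_target` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical critic interface: maps (state, action) to the MEAN of the
# predicted return distribution. All names here are illustrative assumptions.
QMean = Callable[[Sequence[float], float], float]


@dataclass
class Transition:
    state: Sequence[float]
    novice_action: float  # action proposed by the learning policy
    human_action: float   # action applied by the human during a takeover
    intervened: bool      # True when the human took over at this step


def proxy_value_loss(q_mean: QMean, batch: Sequence[Transition],
                     label: float = 1.0) -> float:
    """Squared-error proxy loss, averaged over intervened steps only.

    Pulls the expected return of the human action toward +label and that of
    the overridden novice action toward -label (assumed label values).
    """
    total, count = 0.0, 0
    for t in batch:
        if not t.intervened:
            continue  # steps without intervention carry no proxy label
        total += (q_mean(t.state, t.human_action) - label) ** 2
        total += (q_mean(t.state, t.novice_action) + label) ** 2
        count += 1
    return total / count if count else 0.0


def reward_free_td_target(next_q_mean: float, next_log_prob: float,
                          gamma: float = 0.99, alpha: float = 0.0) -> float:
    """Soft TD target with the environment reward fixed to zero.

    With no reward term, the only values that propagate to unlabeled states
    are those injected by the proxy labels above.
    """
    return gamma * (next_q_mean - alpha * next_log_prob)


if __name__ == "__main__":
    batch = [
        Transition(state=(0.0,), novice_action=-0.3, human_action=0.5,
                   intervened=True),
        Transition(state=(1.0,), novice_action=0.1, human_action=0.1,
                   intervened=False),
    ]
    # An uninformed critic that always predicts 0 pays the full penalty.
    print(proxy_value_loss(lambda s, a: 0.0, batch))
```

In this sketch only the mean of the return distribution is labeled. How the variance of a distributional critic is trained alongside the labels is left out here and would depend on the details of the method.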

Paper Structure

This paper contains 13 sections, 21 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: Overall framework of H-DSAC.
  • Figure 2: Simulation environment and human interfaces.
  • Figure 3: Routes for training and testing in real-world experiments.
  • Figure 4: Hardware architecture and real-world setup of UGV platform.
  • Figure 5: Radar-based obstacle detection and observation space visualization.
  • ...and 4 more figures