Data-Efficient Learning from Human Interventions for Mobile Robots

Zhenghao Peng, Zhizheng Liu, Bolei Zhou

TL;DR

This paper tackles the data-efficiency and safety bottlenecks of applying imitation and reinforcement learning to mobile robots in the real world. It introduces PVP4Real, a reward-free, online human-in-the-loop framework that blends imitation and reinforcement learning, keeps human-intervened and non-intervened data in two separate replay buffers, and combines TD learning with behavior cloning to learn from online demonstrations and interventions. The method is validated in simulation and on two real robots in Safe Navigation and Human Following tasks. It trains from scratch in about 15 minutes with minimal human input and outperforms pure behavior cloning and other baselines, particularly in unsafe or unpredictable states. The results suggest a practical path to safe, data-efficient real-world robot learning that reduces reliance on large demonstration datasets and extensive reward engineering.
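The TL;DR names two learning signals, reward-free TD learning and behavior cloning, fed by two replay buffers. The toy tabular sketch below illustrates one way these can fit together. It assumes a Proxy-Value-Propagation-style labeling for the TD part: the human action is pushed toward a value of +1, the overridden agent action toward -1, and every TD target carries zero reward. This labeling is an assumption, not something this summary states. The names `Transition`, `update_q`, and `bc_step`, as well as the learning rate and discount `gamma`, are hypothetical, and a real robot would use neural networks rather than lookup tables.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import exp
from typing import DefaultDict, List, Optional, Sequence, Tuple

Key = Tuple[int, int]  # (state, action) in a toy discrete problem


@dataclass
class Transition:
    s: int
    a_agent: int            # action proposed by the learning agent
    a_human: Optional[int]  # human action, or None if the human did not intervene
    s_next: int


def update_q(q: DefaultDict[Key, float], human_buf: List[Transition],
             agent_buf: List[Transition], actions: Sequence[int],
             gamma: float = 0.99, lr: float = 0.5) -> None:
    """One tabular pass: proxy labels on intervened data, then reward-free TD."""
    # Assumed proxy values: human action -> +1, overridden agent action -> -1.
    for t in human_buf:
        q[(t.s, t.a_human)] += lr * (1.0 - q[(t.s, t.a_human)])
        if t.a_agent != t.a_human:
            q[(t.s, t.a_agent)] += lr * (-1.0 - q[(t.s, t.a_agent)])
    # Reward-free TD on both buffers: the target is gamma * max_a' Q(s', a') with no reward term.
    for t in human_buf + agent_buf:
        a = t.a_human if t.a_human is not None else t.a_agent  # executed action
        target = gamma * max(q[(t.s_next, b)] for b in actions)
        q[(t.s, a)] += lr * (target - q[(t.s, a)])


def bc_step(logits: DefaultDict[Key, float], human_buf: List[Transition],
            actions: Sequence[int], lr: float = 0.5) -> None:
    """Behavior cloning: one gradient step on -log softmax(logits)[a_human]."""
    for t in human_buf:
        z = [logits[(t.s, b)] for b in actions]
        m = max(z)  # subtract the max for numerical stability
        exps = [exp(v - m) for v in z]
        total = sum(exps)
        for b, e in zip(actions, exps):
            # d(-log p_a)/dz_b = p_b - 1[b == a]
            grad = e / total - (1.0 if b == t.a_human else 0.0)
            logits[(t.s, b)] -= lr * grad


if __name__ == "__main__":
    q, logits = defaultdict(float), defaultdict(float)
    human_buf = [Transition(s=0, a_agent=1, a_human=0, s_next=1)]
    agent_buf = [Transition(s=1, a_agent=0, a_human=None, s_next=2)]
    for _ in range(50):
        update_q(q, human_buf, agent_buf, actions=[0, 1])
        bc_step(logits, human_buf, actions=[0, 1])
    print(q[(0, 0)] > 0 > q[(0, 1)], logits[(0, 0)] > logits[(0, 1)])  # True True
```

In this sketch, the value table and the cloned policy agree: both prefer the human's action in the state where the human intervened, even though no reward was ever provided.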

Abstract

Mobile robots are essential in applications such as autonomous delivery and hospitality services. Learning-based methods for mobile robot tasks have gained popularity because of their robustness and generalizability. Traditional methods such as Imitation Learning (IL) and Reinforcement Learning (RL) offer adaptability, but they require large datasets and carefully crafted reward functions and suffer from sim-to-real gaps, which makes efficient and safe real-world deployment challenging. We propose PVP4Real, an online human-in-the-loop learning method that combines IL and RL to address these issues. PVP4Real learns policies efficiently in real time from online human interventions and demonstrations, without rewards or any pretraining, significantly improving data efficiency and training safety. We validate our method by training two different robots (a legged quadruped and a wheeled delivery robot) on two mobile robot tasks, one of which uses raw RGB-D images as observations. Training finishes within 15 minutes. Our experiments show the promise of human-in-the-loop learning for addressing data efficiency in real-world robotic tasks. More information is available at: https://metadriverse.github.io/pvp4real/

Paper Structure

This paper contains 12 sections, 6 equations, 5 figures, 2 tables, 1 algorithm.

Figures (5)

  • Figure 1: We train mobile robots in dynamic environments with a human-in-the-loop learning method. We can solve challenging tasks from raw camera input in as little as 15 minutes, training from scratch without reward and learning only from online human-in-the-loop interventions and demonstrations.
  • Figure 2: Method Overview. In human-in-the-loop learning, the human subject supervises the learning agent's action $a_n$ and decides whether to intervene and use the human action $a_h$ instead. Depending on whether an intervention occurs, each transition is stored in one of two separate replay buffers. Data from both buffers are sampled to update the policy and value networks in real time, alongside the ongoing interactions between the agent, human, and environment. Neither reward nor prior knowledge is needed.
  • Figure 3: Experiment in the Simulation Environment. Human-in-the-loop methods achieve much better sample efficiency than the RL baselines.
  • Figure 4: Overview of the tasks. (A) In Safe Navigation, the agent is given raw RGB and depth images. The goal is to traverse the corridor environment while avoiding dynamic obstacles such as a walking person, and to perform emergency stops when necessary. (B) In Human Following, the robot is given the 2D bounding box of the tracked human. It must follow the human within a certain distance and keep them in view while the human performs different behaviors. We experiment with two embodiments that have different robot dynamics and camera parameters.
  • Figure 5: Qualitative results for the Safe Navigation and Human Following tasks. (A) Stacked trajectories for the "static obstacle avoidance" subtask. (B) Stacked trajectories for the "sharp turn" subtask.
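The data flow in the Figure 2 caption can be illustrated with a toy online loop. In this loop, the agent proposes an action, a supervisor may override it, the transition goes to one of two buffers depending on whether an intervention happened, and each update samples from both buffers. Everything concrete here is a hypothetical stand-in for illustration: the 1-D corridor, the rule-based `human_supervisor` standing in for a person, the `Buffers` container, and the equal human/agent split in `sample_batch`. The summary does not say how PVP4Real mixes the two buffers.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# (state, agent action, human action or None, next state)
Step = Tuple[int, int, Optional[int], int]


@dataclass
class Buffers:
    human: List[Step] = field(default_factory=list)  # intervened transitions
    agent: List[Step] = field(default_factory=list)  # non-intervened transitions


def store(buffers: Buffers, s: int, a_agent: int, a_human: Optional[int], s_next: int) -> None:
    """Route a transition by whether the human intervened."""
    if a_human is not None:
        buffers.human.append((s, a_agent, a_human, s_next))
    else:
        buffers.agent.append((s, a_agent, None, s_next))


def sample_batch(buffers: Buffers, batch_size: int, rng: random.Random) -> List[Step]:
    """Draw from both buffers; the 50/50 split is a hypothetical choice."""
    if not buffers.human and not buffers.agent:
        return []
    if not buffers.human or not buffers.agent:
        pool = buffers.human or buffers.agent  # whichever buffer is non-empty
        return rng.choices(pool, k=batch_size)
    half = batch_size // 2
    return rng.choices(buffers.human, k=half) + rng.choices(buffers.agent, k=batch_size - half)


def human_supervisor(s: int, a_agent: int, lo: int = 0, hi: int = 4) -> Optional[int]:
    """Hypothetical stand-in for a person: override only moves that leave the corridor."""
    if not lo <= s + a_agent <= hi:
        return -a_agent
    return None


def run_online(steps: int, rng: random.Random) -> Buffers:
    buffers = Buffers()
    s = 2  # start in the middle of a 1-D corridor with cells 0..4
    for _ in range(steps):
        a_agent = rng.choice([-1, 1])           # untrained agent proposes a random move
        a_human = human_supervisor(s, a_agent)  # supervisor decides whether to intervene
        a = a_human if a_human is not None else a_agent  # executed action
        s_next = s + a
        store(buffers, s, a_agent, a_human, s_next)
        batch = sample_batch(buffers, 8, rng)
        # A real learner would update its policy and value networks on `batch` here.
        s = s_next
    return buffers
```

Because the supervisor only steps in near the corridor walls, most transitions land in the agent buffer. Sampling from both buffers on every update keeps the comparatively rare intervention data in each batch.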