Robot Learning on the Job: Human-in-the-Loop Autonomy and Learning During Deployment

Huihan Liu; Soroush Nasiriany; Lance Zhang; Zhiyao Bao; Yuke Zhu

TL;DR

This work addresses the brittleness of state-of-the-art robot learning by introducing Sirius, a human-in-the-loop framework that couples deployment-time human interventions with continual policy learning. It combines a two-thread architecture (one thread for deployment, one for policy updates), an intervention-guided weighting scheme for weighted behavioral cloning, and memory-management strategies for operating under a fixed storage budget. The approach improves policy success rate by 8% in simulation and 27% on real hardware while reducing human workload and enabling faster model updates. These contributions enable safer, more reliable long-term robotic manipulation and point toward scalable, continually improving autonomy in real-world settings.
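The TL;DR mentions memory-management strategies for operating under fixed storage without spelling them out. The Python sketch below is illustrative only and rests on stated assumptions: trajectories are stored whole, capacity is counted in samples, and two hypothetical eviction rules are shown (first-in-first-out, and evicting the trajectory with the fewest intervention samples first). The `Trajectory` record, the `FixedSizeMemory` class, and the strategy names are hypothetical, not the paper's code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    """Hypothetical record of one deployment episode."""
    traj_id: int
    num_samples: int
    num_intervention_samples: int


class FixedSizeMemory:
    """Illustrative fixed-capacity trajectory store (capacity counted in samples).

    Both strategies are hypothetical examples:
      - "fifo": evict the oldest stored trajectory first.
      - "least_intervention_first": evict the trajectory with the fewest
        human-intervention samples, breaking ties by age (oldest first).
    """

    STRATEGIES = ("fifo", "least_intervention_first")

    def __init__(self, capacity: int, strategy: str = "fifo") -> None:
        if strategy not in self.STRATEGIES:
            raise ValueError(f"unknown strategy: {strategy}")
        self.capacity = capacity
        self.strategy = strategy
        self.trajectories: List[Trajectory] = []

    def size(self) -> int:
        return sum(t.num_samples for t in self.trajectories)

    def add(self, traj: Trajectory) -> None:
        self.trajectories.append(traj)
        # Evict older trajectories until the store fits; the newest one is kept.
        while self.size() > self.capacity and len(self.trajectories) > 1:
            self._evict_one()

    def _evict_one(self) -> None:
        if self.strategy == "fifo":
            idx = 0
        else:
            candidates = range(len(self.trajectories) - 1)  # exclude the newest
            idx = min(candidates,
                      key=lambda i: (self.trajectories[i].num_intervention_samples, i))
        self.trajectories.pop(idx)

    def ids(self) -> List[int]:
        return [t.traj_id for t in self.trajectories]


# Usage: three 4-sample trajectories pushed into a 10-sample store.
results = {}
for strategy in FixedSizeMemory.STRATEGIES:
    memory = FixedSizeMemory(capacity=10, strategy=strategy)
    for traj in (Trajectory(0, 4, 2), Trajectory(1, 4, 0), Trajectory(2, 4, 1)):
        memory.add(traj)
    results[strategy] = memory.ids()
print(results)  # {'fifo': [1, 2], 'least_intervention_first': [0, 2]}
```

The two rules differ only in which stored trajectory is dropped when the budget is exceeded: FIFO discards the oldest episode, while the intervention-based rule keeps episodes that contain more human corrections.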

Abstract

With the rapid growth of computing power and recent advances in deep learning, we have witnessed impressive demonstrations of novel robot capabilities in research settings. Nonetheless, these learning systems exhibit brittle generalization and require excessive training data for practical tasks. To harness the capabilities of state-of-the-art robot learning models while embracing their imperfections, we present Sirius, a principled framework for humans and robots to collaborate through a division of work. In this framework, partially autonomous robots handle the major portion of decision-making where they work reliably, while human operators monitor the process and intervene in challenging situations. Such a human-robot team ensures safe deployments in complex tasks. Further, we introduce a new learning algorithm that improves the policy's performance using the data collected from task executions. The core idea is to re-weight training samples according to approximated human trust and to optimize the policy with weighted behavioral cloning. We evaluate Sirius in simulation and on real hardware, showing that it consistently outperforms baselines across a collection of contact-rich manipulation tasks: compared with state-of-the-art methods, it improves policy success rate by 8% in simulation and 27% on real hardware, with twice as fast convergence and an 85% reduction in memory size. Videos and more details are available at https://ut-austin-rpl.github.io/sirius/
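The abstract's core idea, re-weighting samples by approximated human trust and then running weighted behavioral cloning, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the four sample categories follow the Figure 3 caption, but the category weights and the pre-intervention window length are hypothetical values chosen for illustration, and the policy is a toy one-dimensional linear model fitted in closed form rather than the paper's BC-RNN network. The names `label_episode`, `fit_weighted_bc`, and `weighted_bc_loss` are hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical category weights (NOT the paper's values): human interventions
# are up-weighted and pre-intervention robot steps are down-weighted.
CATEGORY_WEIGHTS: Dict[str, float] = {
    "demo": 1.0,
    "robot": 1.0,
    "intervention": 2.0,
    "pre_intervention": 0.1,
}

# Hypothetical number of robot steps just before a human takeover that are
# relabeled as pre-intervention samples.
PRE_INTERVENTION_WINDOW = 2


def label_episode(is_human: Sequence[bool], is_demo: bool = False,
                  window: int = PRE_INTERVENTION_WINDOW) -> List[str]:
    """Tag each timestep as demo, robot, intervention, or pre_intervention."""
    if is_demo:
        return ["demo"] * len(is_human)
    labels = ["intervention" if h else "robot" for h in is_human]
    for t in range(1, len(is_human)):
        if is_human[t] and not is_human[t - 1]:  # robot -> human takeover at t
            for k in range(max(0, t - window), t):
                if labels[k] == "robot":
                    labels[k] = "pre_intervention"
    return labels


def fit_weighted_bc(states: Sequence[float], actions: Sequence[float],
                    labels: Sequence[str],
                    weights: Dict[str, float] = CATEGORY_WEIGHTS) -> float:
    """Weighted BC for a toy linear policy a = k * s with squared-error loss.

    Minimizing sum_i w_i (k s_i - a_i)^2 gives the closed form
    k = sum_i w_i s_i a_i / sum_i w_i s_i^2.
    """
    num = sum(weights[c] * s * a for s, a, c in zip(states, actions, labels))
    den = sum(weights[c] * s * s for s, c in zip(states, labels))
    return num / den


def weighted_bc_loss(k: float, states: Sequence[float], actions: Sequence[float],
                     labels: Sequence[str],
                     weights: Dict[str, float] = CATEGORY_WEIGHTS) -> float:
    """Weight-normalized squared-error BC loss of the policy a = k * s."""
    total = sum(weights[c] for c in labels)
    return sum(weights[c] * (k * s - a) ** 2
               for s, a, c in zip(states, actions, labels)) / total


# Usage: robot steps output 0.0; the human takes over at t=3 and outputs 1.0.
is_human = [False, False, False, True, True, False]
labels = label_episode(is_human)
states = [1.0] * len(is_human)
actions = [1.0 if h else 0.0 for h in is_human]
k_weighted = fit_weighted_bc(states, actions, labels)
k_uniform = fit_weighted_bc(states, actions, labels,
                            weights={c: 1.0 for c in CATEGORY_WEIGHTS})
print(labels)
print(round(k_weighted, 3), round(k_uniform, 3))  # 0.645 0.333
```

With uniform weights the fitted policy stays close to the robot's own (corrected) behavior; up-weighting interventions and down-weighting the steps that preceded them pulls the fit toward the human's corrective actions.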

Paper Structure

This paper contains 22 sections, 4 equations, 11 figures, and 5 tables.

Introduction
Related Work
Background and Overview
Problem Formulation
Weighted Behavioral Cloning Methods
Sirius: Human-in-the-loop Learning and Deployment
Human-in-the-loop Deployment Framework
Human-in-the-loop Policy Learning
Memory Management
Implementation Details
Experiments
Tasks
Baselines and Evaluation Protocol
Experiment Results
Conclusion
...and 7 more sections

Figures (11)

Figure 1: Overview of Sirius, our human-in-the-loop learning and deployment framework. Sirius enables a human and a robot to collaborate on manipulation tasks through shared control. The human monitors the robot's autonomous execution and intervenes to provide corrections through teleoperation. Our algorithm uses data from deployments to improve the robot's policy over consecutive rounds of policy learning.
Figure 2: Illustration of the workflow in Sirius. Robot deployment and policy update co-occur in two parallel threads. Deployment data are passed to policy training, while a newly trained policy is deployed to the target environment for task execution.
Figure 3: Overview of our human-in-the-loop learning model. We maintain an ever-growing database of diverse experiences spanning four categories: human demonstrations, autonomous robot data, human interventions, and transitions preceding interventions which we call pre-interventions. We set weights according to these four categories, with a high weight given to interventions over other categories. We use these weighted samples to continually learn vision-based manipulation policies during deployment.
Figure 4: Policy Architecture. Our vision-based policy uses BC-RNN as its backbone. Its inputs are images from a workspace camera and an eye-in-hand camera, along with the robot's proprioceptive state.
Figure 5: Quantitative evaluations. We compare our method with human-in-the-loop learning, imitation learning, and offline reinforcement learning baselines. Our results in simulated and real-world tasks show steady performance improvements of the autonomous policies over rounds. Our model achieves the highest performance in all four tasks after three rounds of deployments and policy updates. Solid line: human-in-the-loop; dashed line: offline learning on data from our method.
...and 6 more figures
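The workflow in Figure 2, where deployment and policy updates run in two parallel threads, can be mimicked with Python's standard `threading` and `queue` modules. This is a minimal sketch, not the paper's implementation: the function `run_two_threads`, the integer placeholder "samples", and the version counter standing in for a trained policy are all hypothetical.

```python
import queue
import threading
from typing import List, Tuple


def run_two_threads(num_rounds: int, samples_per_round: int) -> Tuple[int, int, List[int]]:
    """Run a toy deployment thread and a toy policy-update thread concurrently.

    Returns (number of aggregated samples, final policy version,
    policy version used in each deployment round).
    """
    data_queue: queue.Queue = queue.Queue()  # deployment -> training
    policy_lock = threading.Lock()           # guards the shared policy
    policy = {"version": 0}                  # stand-in for network weights
    dataset: List[int] = []                  # aggregated deployment data
    deployed_versions: List[int] = []

    def deployment_worker() -> None:
        for _ in range(num_rounds):
            with policy_lock:
                version = policy["version"]  # pick up the newest policy
            deployed_versions.append(version)
            # Placeholder rollout: tag each collected sample with the policy version.
            data_queue.put([version] * samples_per_round)
        data_queue.put(None)  # sentinel: deployment is finished

    def update_worker() -> None:
        while True:
            batch = data_queue.get()
            if batch is None:
                break
            dataset.extend(batch)
            # Placeholder for retraining on the whole aggregated dataset.
            with policy_lock:
                policy["version"] += 1

    threads = [threading.Thread(target=deployment_worker),
               threading.Thread(target=update_worker)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(dataset), policy["version"], deployed_versions


num_samples, final_version, versions = run_two_threads(num_rounds=3, samples_per_round=5)
print(num_samples, final_version)  # 15 3
```

The two workers share only the queue and a lock-protected policy, so deployment does not wait for a training step to finish; each round simply deploys whichever policy version is newest at that moment, which is why the per-round versions can vary between runs while the totals do not.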
