PATO: Policy Assisted TeleOperation for Scalable Robot Data Collection

Shivin Dass; Karl Pertsch; Hejia Zhang; Youngwoon Lee; Joseph J. Lim; Stefanos Nikolaidis

TL;DR

The paper addresses the slow, costly nature of large-scale robotic data collection by introducing Policy Assisted TeleOperation (PATO), a hierarchical, uncertainty-aware assistive system. PATO learns from diverse, multi-modal demonstrations to automate repetitive subtasks with a high-level subgoal predictor and a low-level subgoal-reaching policy, and it uses task and policy uncertainty measures to decide when human input is needed. The key contributions are a conditional-VAE-based subgoal predictor, an LSTM-based low-level controller, an ensemble-based uncertainty mechanism, and empirical validation showing reduced operator workload and improved throughput in both real-robot and simulated multi-robot settings. The work demonstrates a feasible path toward scalable robotic data collection: a single operator can supervise multiple robots, which could accelerate downstream robot learning pipelines.

Abstract

Large-scale data is an essential component of machine learning, as demonstrated by recent advances in natural language processing and computer vision research. However, collecting large-scale robotic data is much slower and more expensive, because each operator can control only a single robot at a time. To make this costly data collection process efficient and scalable, we propose Policy Assisted TeleOperation (PATO), a system which automates part of the demonstration collection process using a learned assistive policy. PATO autonomously executes repetitive behaviors in data collection and asks for human input only when it is uncertain about which subtask or behavior to execute. We conduct teleoperation user studies both with a real robot and a simulated robot fleet and demonstrate that our assisted teleoperation system reduces human operators' mental load while improving data collection efficiency. Further, it enables a single operator to control multiple robots in parallel, which is a first step towards scalable robotic data collection. For code and video results, see https://clvrai.com/pato

Paper Structure

This paper contains 14 sections, 2 equations, 10 figures, 4 tables.

Introduction
Related Work
Approach
Problem Formulation
Learning Assistive Policies from Multi-Modal Data
Deciding When to Request User Input
Experiments
Reducing Mental Load during Data Collection
Scaling Data Collection to Multiple Robots
Conclusion
Implementation Details
Sub-goal Predictor
Low-level Sub-goal Reaching Policy
Q-Function (ThriftyDAgger)

Figures (10)

Figure 1: Policy Assisted TeleOperation (PATO) enables large-scale data collection by minimizing the human operator's inputs and mental effort with an assistive policy that autonomously performs repetitive subtasks. This allows a human operator to manage multiple robots simultaneously.
Figure 2: PATO is hierarchical, consisting of a high-level subgoal predictor $p(s_g \vert s, z)$ and a low-level subgoal-reaching policy $\pi_{LL}(a \vert s, s_g)$. To decide when to follow the assistive policy, we measure the uncertainty of both high-level (subgoal predictor) and low-level (subgoal-reaching policy) decisions. The task uncertainty is estimated using the subgoal predictor's variance, and the policy uncertainty is estimated as the disagreement among an ensemble of subgoal-reaching policies.
Figure 3: Our hierarchical assistive policy is trained using a pre-collected dataset $\mathcal{D}_\text{pre}$. From a sampled trajectory $(s_1, a_1, \dots, a_{\mathcal{H}-1}, s_\mathcal{H})$ of length $\mathcal{H}$, a subgoal predictor $p(s_g \vert s_1, z)$ is trained as a conditional VAE to cover a multi-modal subgoal distribution, where $s_g = s_\mathcal{H}$. Then, an ensemble of subgoal-reaching policies $\pi^{(k)}_{LL}(a_t \vert s_t, s_g)$ is trained to predict the ground-truth actions. The gray dashed lines represent supervision for the prediction tasks of the subgoal predictor and subgoal-reaching policies.
Figure 4: Our approach asks for human input when the assistive policy is uncertain about which subtask or action to take. If both the task uncertainty and the policy uncertainty are lower than their thresholds, our assistive policy can reliably perform a subtask, reducing the workload of the human operator.
Figure 5: User study setup. (left) A Kinova Jaco arm, front-view and in-hand cameras, and objects for kitchen-inspired tasks are placed on the workspace. (right) A human operator can watch a monitor, which shows either the camera inputs or a side task. The operator uses a gamepad to control the robot, and uses a keyboard to solve the side task.
...and 5 more figures
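
The captions of Figures 2 to 4 describe two mechanisms that a short sketch may make more concrete: how training targets are cut from a demonstration window (the subgoal is the last state of the window, $s_g = s_\mathcal{H}$), and how the two uncertainty checks gate the request for human input. The Python below is an illustration under assumptions, not the authors' implementation. All function names are hypothetical, the neural networks are left out (arrays stand in for their outputs), and summing per-dimension variance into a single scalar is an assumed choice, since the captions only say "variance" and "disagreement".

```python
import numpy as np

def make_training_examples(states, actions, horizon):
    """Cut one demonstration into supervision targets (hypothetical helper).

    states:  array of shape (T, state_dim)
    actions: array of shape (T - 1, action_dim), actions[t] leads from
             states[t] to states[t + 1]
    Returns (s_1, s_g) pairs for the subgoal predictor and
    (s_t, s_g, a_t) triples for the subgoal-reaching policies.
    """
    subgoal_pairs, policy_triples = [], []
    for start in range(len(states) - horizon + 1):
        s_g = states[start + horizon - 1]  # s_g = s_H, last state of the window
        subgoal_pairs.append((states[start], s_g))
        for t in range(start, start + horizon - 1):
            policy_triples.append((states[t], s_g, actions[t]))
    return subgoal_pairs, policy_triples

def task_uncertainty(sampled_subgoals):
    """Variance across subgoals decoded from different latent samples z.

    sampled_subgoals: array of shape (n_samples, state_dim).
    Assumption: per-dimension variances are summed into one scalar.
    """
    return float(np.var(sampled_subgoals, axis=0).sum())

def policy_uncertainty(ensemble_actions):
    """Disagreement among K ensemble members for the same (s_t, s_g).

    ensemble_actions: array of shape (K, action_dim).
    Assumption: disagreement is measured as summed per-dimension variance.
    """
    return float(np.var(ensemble_actions, axis=0).sum())

def needs_human(sampled_subgoals, ensemble_actions, task_thresh, policy_thresh):
    """Request human input unless BOTH uncertainties are below their thresholds."""
    return (task_uncertainty(sampled_subgoals) >= task_thresh
            or policy_uncertainty(ensemble_actions) >= policy_thresh)
```

In this sketch the thresholds are free parameters. When sampled subgoals spread across several distinct modes (for example, two different objects the robot could reach for), the task uncertainty rises and control is handed to the operator, while tightly clustered samples and agreeing ensemble members let the assistive policy continue on its own.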
