RoboCopilot: Human-in-the-loop Interactive Imitation Learning for Robot Manipulation

Philipp Wu, Yide Shentu, Qiayuan Liao, Ding Jin, Menglong Guo, Koushil Sreenath, Xingyu Lin, Pieter Abbeel

TL;DR

RoboCopilot tackles the inefficiency of passive imitation learning by introducing a human-in-the-loop interactive imitation learning framework for bi-manual manipulation. It combines HG-DAgger with a compliant, bilateral teleoperation system and a continual learning loop, enabling seamless handovers and targeted corrective demonstrations. Through simulation and real-world experiments across picking, transport, and long-horizon tasks, the approach demonstrates improved data quality and higher task success with fewer human interventions, especially when using Batched DAgger. The work highlights a practical, cost-conscious path to scalable interactive learning for contact-rich robotics with long-horizon goals.
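The TL;DR names HG-DAgger and Batched DAgger but does not spell out the loop. As a rough illustration only, the sketch below runs HG-DAgger on a hypothetical 1-D toy task. A human gate takes over when the state leaves a safe band and hands control back once it recovers, and only the human-labelled steps are aggregated into the training set. We assume "Batched DAgger" means freezing the policy for a batch of episodes and retraining once per batch; the paper's exact definition may differ. The dynamics, thresholds, initial gain `k0`, and all function names are our own assumptions.

```python
import random
from typing import List, Tuple


def expert_action(x: float, rng: random.Random) -> float:
    # Hypothetical human operator: a noisy version of the ideal action -x.
    return -x + rng.gauss(0.0, 0.05)


def fit_gain(data: List[Tuple[float, float]]) -> float:
    # Least-squares fit of a linear policy a = k * x to aggregated (x, a) pairs.
    den = sum(x * x for x, _ in data)
    return sum(x * a for x, a in data) / den if den > 0 else 0.0


def run_episode(k: float, rng: random.Random, horizon: int = 20,
                enter: float = 0.5, leave: float = 0.25):
    """One HG-DAgger rollout on the toy system x' = x + 0.5 * a + noise."""
    x = rng.uniform(-0.3, 0.3)
    corrections, takeovers, human = [], 0, False
    for _ in range(horizon):
        if not human and abs(x) > enter:
            human, takeovers = True, takeovers + 1  # human judges the policy is failing
        elif human and abs(x) < leave:
            human = False                           # control handed back to the policy
        if human:
            a = expert_action(x, rng)
            corrections.append((x, a))              # only human-labelled steps are kept
        else:
            a = k * x                               # autonomous policy acts
        x = x + 0.5 * a + rng.gauss(0.0, 0.02)
    return corrections, takeovers


def batched_hg_dagger(k0: float = 0.3, rounds: int = 4, batch_size: int = 5, seed: int = 0):
    """Freeze the policy for batch_size episodes, then refit once on all aggregated data."""
    rng = random.Random(seed)
    k, dataset, takeovers_per_round = k0, [], []
    for _ in range(rounds):
        round_takeovers = 0
        for _ in range(batch_size):
            corrections, n = run_episode(k, rng)
            dataset.extend(corrections)
            round_takeovers += n
        takeovers_per_round.append(round_takeovers)
        if dataset:
            k = fit_gain(dataset)
    return k, takeovers_per_round
```

Calling `batched_hg_dagger()` returns the final gain and the number of takeovers in each round. In this toy setup the takeovers should fall to zero once the refit gain approaches the simulated human's behaviour (`k` near -1).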

Abstract

Learning from human demonstration is an effective approach for acquiring complex manipulation skills. However, existing approaches focus heavily on learning from passive human demonstration data because such data is simple to collect. Interactive human teaching has appealing theoretical and practical properties, but it is not well supported by existing human-robot interfaces. This paper proposes a novel system that enables seamless control switching between a human operator and an autonomous policy for bi-manual manipulation tasks, enabling more efficient learning of new tasks. This is achieved through a compliant, bilateral teleoperation system. Through simulation and hardware experiments, we demonstrate the value of our system for interactive human teaching of complex bi-manual manipulation skills.
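One plausible way a bilateral leader-follower setup can make handovers seamless is sketched below, under our own assumptions. While the policy drives the robot, the leader device is gently servoed toward the follower's joint positions, so that when the human takes over, the follower's new target (the leader's pose) is already close to where the robot is. `BilateralArbiter`, `leader_gain`, and the two-mode design are hypothetical and are not taken from the paper; the actual system may also render force feedback, which this sketch omits.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Mode(Enum):
    POLICY = "policy"
    HUMAN = "human"


@dataclass
class BilateralArbiter:
    """Hypothetical per-arm control arbiter operating in joint space."""

    leader_gain: float = 0.3  # assumed fraction of the leader-follower gap closed per tick
    mode: Mode = Mode.POLICY

    def step(
        self,
        leader_q: List[float],
        follower_q: List[float],
        policy_q: List[float],
    ) -> Tuple[List[float], Optional[List[float]]]:
        """Return (follower_target, leader_target); leader_target is None when the leader is free."""
        if self.mode is Mode.POLICY:
            follower_target = list(policy_q)
            # Servo the leader toward the robot so a later takeover starts aligned.
            leader_target = [lq + self.leader_gain * (fq - lq)
                             for lq, fq in zip(leader_q, follower_q)]
            return follower_target, leader_target
        # Human mode: the follower mirrors the leader; the leader is left to the operator.
        return list(leader_q), None

    def toggle(self) -> None:
        """Switch between policy control and human teleoperation."""
        self.mode = Mode.HUMAN if self.mode is Mode.POLICY else Mode.POLICY
```

For example, after enough policy-mode ticks the leader converges to the follower's pose, so toggling to human mode yields a follower target with essentially no jump.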

Paper Structure

This paper contains 21 sections, 1 equation, 10 figures, 5 tables, 1 algorithm.

Figures (10)

  • Figure 1: The RoboCopilot system, which consists of a mobile bimanual robot with 20 degrees of freedom and a bilateral teleoperation device. The system supports easy teleoperation and lets a human take over at any time, making it an effective human-in-the-loop system for interactive learning.
  • Figure 2: Overview of our interactive teaching system. (a) Workflow for learning a single skill: we start with a set of human demonstrations to pre-train the initial policy. Then, in the interactive teaching stage, the policy is deployed and a human intervenes upon policy failure. The policy is continually fine-tuned on these new demonstrations. As policy performance improves, less human intervention is needed. (b) During robot execution, the robot policy takes sensor observations and outputs actions. The human decides when to switch from the policy to teleoperation. This lets the human interrupt the robot upon policy failure and correct the mistake, with the correction stored in the dataset. The model is continually trained and updated.
  • Figure 3: Comparison of end-effector human interface designs across teleoperation systems. Left: the handheld interfaces of Aloha zhao2023aloha and GELLO wu2023gello. Middle: the RoboCopilot layout, where we attached the Quest2 controller to the end of our GELLO device. Right: the key map of our end-effector human input interface. We optimized the layout for efficient gripper control and interactive human-in-the-loop teaching.
  • Figure 4: An illustration of the evaluation protocol for the industrial part transport tasks. During training, the beam is placed in different positions within a defined boundary (highlighted in green). Industrial picking requires the robot to locate and manipulate the long beam or the short beam and place it in the bin. Mobile industrial picking considers only the long beam, but the bin is farther away, so the robot must drive its base before placing the beam. We label the poses of the beams and the bin to ensure consistency during evaluation.
  • Figure 5: Overview of the toy kitchen task. (Subtask 1) The robot first opens the spring-loaded cabinet door and holds it open. (Subtask 2) The robot picks the tomato and transfers it to the stove area. (Subtask 3) The robot puts the tomato into the pot; note that the pot can be at different locations. (Subtask 4) Finally, the robot turns the correct dial, depending on which stove the pot is on.
  • ...and 5 more figures
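Figure 2 describes a policy that is continually fine-tuned while it is deployed. A minimal sketch of that pattern follows, assuming a background trainer thread and a versioned weight store that the robot reads before each episode. The scalar linear policy, `PolicyStore`, `InterventionDataset`, and `train_loop` are hypothetical stand-ins for a real network, dataset, and training job.

```python
import threading
from typing import List, Tuple


class PolicyStore:
    """Thread-safe holder for the latest policy weights and a version counter."""

    def __init__(self, weights: float):
        self._lock = threading.Lock()
        self._weights = weights
        self._version = 0

    def publish(self, weights: float) -> None:
        with self._lock:
            self._weights = weights
            self._version += 1

    def latest(self) -> Tuple[int, float]:
        with self._lock:
            return self._version, self._weights


class InterventionDataset:
    """Thread-safe, append-only store of human-labelled (observation, action) pairs."""

    def __init__(self):
        self._lock = threading.Lock()
        self._samples: List[Tuple[float, float]] = []

    def add(self, samples: List[Tuple[float, float]]) -> None:
        with self._lock:
            self._samples.extend(samples)

    def snapshot(self) -> List[Tuple[float, float]]:
        with self._lock:
            return list(self._samples)


def train_loop(dataset: InterventionDataset, store: PolicyStore,
               n_updates: int, lr: float = 0.5) -> None:
    """Fine-tune a scalar policy a = k * x by gradient descent on mean squared error.

    A real trainer would run until stopped; n_updates bounds it for this sketch."""
    _, k = store.latest()
    for _ in range(n_updates):
        data = dataset.snapshot()  # re-read so newly added corrections are used
        if not data:
            continue
        grad = sum(2.0 * (k * x - a) * x for x, a in data) / len(data)
        k -= lr * grad
        store.publish(k)           # the deployed policy can pick this up immediately
```

On the robot side, the rollout loop would call `store.latest()` before each episode and `dataset.add(...)` after each intervention, so new corrections reach the trainer without stopping deployment.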
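The evaluation protocol in Figure 4 relies on labelled, repeatable object poses. A small sketch of one way to keep evaluation consistent is below, assuming a rectangular boundary and a fixed grid of beam poses. The boundary size, grid resolution, yaw set, and the `BeamPose` type are hypothetical and are not the poses used in the paper.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class BeamPose:
    x: float        # metres from the boundary corner (hypothetical frame)
    y: float
    yaw_deg: float


def evaluation_poses(width: float = 0.4, depth: float = 0.3, nx: int = 3, ny: int = 2,
                     yaws: Sequence[float] = (0.0, 45.0)) -> List[BeamPose]:
    """Deterministic grid of labelled poses at cell centres inside a width x depth boundary."""
    xs = [width * (i + 0.5) / nx for i in range(nx)]
    ys = [depth * (j + 0.5) / ny for j in range(ny)]
    return [BeamPose(x, y, yaw) for x, y, yaw in itertools.product(xs, ys, yaws)]


def success_rate(results: Dict[BeamPose, List[bool]]) -> float:
    """Pool all trials across poses and return the fraction that succeeded."""
    trials = [ok for outcomes in results.values() for ok in outcomes]
    return sum(trials) / len(trials) if trials else 0.0
```

Because the pose list is generated deterministically, every policy checkpoint can be evaluated on exactly the same placements.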
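For a long-horizon task like the toy kitchen in Figure 5, one common way to report progress is the fraction of trials that complete each subtask in order. The sketch below assumes that metric, plus a hypothetical mapping from pot location to the correct dial. The subtask names, trial format, and `DIAL_FOR_POT` are illustrative only and may not match how the paper scores this task.

```python
from typing import Dict, List

SUBTASKS = ["open_door", "pick_tomato", "tomato_in_pot", "turn_dial"]

# Hypothetical mapping from the pot's stove to the dial that must be turned.
DIAL_FOR_POT = {"left_stove": "left_dial", "right_stove": "right_dial"}


def subtasks_completed(trial: Dict) -> int:
    """Count subtasks completed in order; a later stage counts only if all earlier ones did."""
    done = 0
    for name in SUBTASKS:
        if name == "turn_dial":
            ok = trial.get("dial_turned") == DIAL_FOR_POT[trial["pot_location"]]
        else:
            ok = bool(trial.get(name, False))
        if not ok:
            break
        done += 1
    return done


def stage_success_rates(trials: List[Dict]) -> List[float]:
    """Fraction of trials that reached and completed each subtask."""
    counts = [subtasks_completed(t) for t in trials]
    return [sum(c > i for c in counts) / len(trials) for i in range(len(SUBTASKS))]
```

Turning the wrong dial counts as completing only the first three subtasks, which captures the dependency between the pot's location and the final step.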