
Human-Robot Copilot for Data-Efficient Imitation Learning

Rui Yan, Zaitian Gongye, Lars Paulsen, Xuxin Cheng, Xiaolong Wang

Abstract

Collecting human demonstrations via teleoperation is a common approach for teaching robots task-specific skills. However, when only a limited number of demonstrations are available, policies are prone to entering out-of-distribution (OOD) states due to compounding errors or environmental stochasticity. Existing interactive imitation learning and human-in-the-loop methods address this issue by following the Human-Gated DAgger (HG-DAgger) paradigm, which augments demonstrations through selective human intervention during policy execution. Nevertheless, these approaches struggle to balance dexterity and generality: they either provide fine-grained corrections but are limited to specific kinematic structures, or achieve generality at the cost of precise control. To overcome this limitation, we propose the Human-Robot Copilot framework, which leverages a tunable scaling factor for dexterous teleoperation while remaining compatible with a wide range of industrial and research manipulators. Experimental results demonstrate that our framework achieves higher performance with the same number of demonstration trajectories. Moreover, since corrective interventions are required only intermittently, the overall data collection process is faster and less labor-intensive.
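The human-gated intervention loop described above can be sketched schematically. This is a minimal illustration of the HG-DAgger-style data flow, not the paper's implementation; the `env`, `human`, and `policy` interfaces are hypothetical placeholders:

```python
def hg_dagger_round(policy, env, human, dataset):
    """One data-collection round in the HG-DAgger paradigm (schematic).

    The policy acts autonomously; whenever the human gate engages, the
    human's corrective action (not the policy's) is executed, and only
    those corrective (obs, action) pairs are aggregated into the dataset.
    """
    obs = env.reset()
    done = False
    while not done:
        if human.gate_engaged(obs):
            action = human.action(obs)
            dataset.append((obs, action))  # aggregate corrections only
        else:
            action = policy(obs)
        obs, done = env.step(action)
    return dataset
```

After each round, the policy would be fine-tuned on the aggregated dataset before the next deployment, matching the workflow of Figure C2.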

Paper Structure

This paper contains 17 sections, 1 equation, 5 figures, 3 tables, and 1 algorithm.

Figures (5)

  • Figure B1: Bidirectional control and observation communication. Forward and inverse kinematics (FK/IK) are continuously computed for both the leader and follower robots. Dashed lines denote control signals, of which only one is selected for synchronization between the two robots. The human teleoperator determines which control signal the two robots execute.
  • Figure C1: Illustration of task workspaces under different scaling factors. The black cube represents the workspace of the leader arm, while the two blue cubes correspond to task workspaces under different scaling factors. A larger task workspace facilitates rapid large-scale movements, whereas a smaller task workspace supports precise and accurate actions for high-precision tasks.
  • Figure C2: Training and data augmentation workflow. The base policy is first initialized through regular imitation learning. It is then deployed to identify potential failure modes. During deployment, a human teleoperator intervenes when necessary, providing corrective actions. These corrective demonstrations are recorded and incorporated into the original dataset, which is subsequently used to fine-tune the policy.
  • Figure D1: The real-world experiments and their key challenges. Fig. A illustrates the Tower of Hanoi insertion task. In addition to the narrow tolerance required for insertion, grasping the disk itself is challenging. From the top view (top-right), the gripper must align precisely with the disk’s center; otherwise, the disk slips out. From the side view (bottom-right), the gripper must also engage below the midpoint of the disk’s curved edge. Fig. B presents the cube sorting task, where cubes of different colors must be placed into their corresponding containers. The six objects are randomly distributed within a 45 cm × 35 cm workspace, creating a highly randomized environment that significantly increases the difficulty of learning correct actions.
  • Figure D2: End-effector trajectories under different scaling factors ($\alpha=2.0$ vs $\alpha=0.5$) during Tower of Hanoi insertion. Top: lateral position over time showing the full transport-to-alignment trajectory. Bottom: lateral and forward deviations from the target position during the alignment phase (starting from the first velocity zero-crossing). The alignment-phase RMS deviations are reported below.
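The scaling-factor mechanism illustrated in Figures C1 and D2 amounts to scaling leader end-effector displacements before they are sent to the follower. A minimal sketch, assuming the scaling is a uniform multiplication of Cartesian deltas (the function name and interface here are illustrative, not from the paper):

```python
import numpy as np

def scale_leader_delta(leader_delta, alpha):
    """Map a leader end-effector displacement to a follower command.

    alpha > 1 enlarges the follower's task workspace (rapid, large-scale
    motions); alpha < 1 shrinks it (precise, high-accuracy motions),
    as illustrated in Figure C1.
    """
    return alpha * np.asarray(leader_delta, dtype=float)

# A 10 cm leader motion along x:
delta = [0.10, 0.0, 0.0]
scale_leader_delta(delta, 2.0)  # coarse mode: larger follower motion
scale_leader_delta(delta, 0.5)  # precise mode: smaller follower motion
```

Under this view, the α = 2.0 vs α = 0.5 comparison in Figure D2 corresponds to trading transport speed for alignment precision.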