
Online Imitation Learning for Manipulation via Decaying Relative Correction through Teleoperation

Cheng Pan, Hung Hon Cheng, Josie Hughes

TL;DR

This work tackles the data inefficiency of imitation learning for robotic manipulation by introducing Decaying Relative Correction (DRC) delivered through a cable-driven teleoperation interface. The method couples an online imitation learning pipeline based on diffusion-model behavior cloning with a corrective feedback mechanism that decays over time, reducing the need for continuous expert input. Empirical results show a roughly 30% decrease in expert interventions and rapid improvements in task success for raspberry harvesting and stain removal, with demonstrated transfer to unseen objects and expansion to multi-arm setups. The approach offers a practical pathway to scalable, expert-efficient learning of manipulation policies in dynamic environments.

Abstract

Teleoperated robotic manipulators enable the collection of demonstration data, which can be used to train control policies through imitation learning. However, such methods can require large amounts of training data to develop robust policies or to adapt them to new and unseen tasks. While expert feedback can significantly enhance policy performance, providing continuous feedback is cognitively demanding and time-consuming for experts. To address this challenge, we propose a cable-driven teleoperation system that provides spatial corrections with six degrees of freedom to the trajectories generated by a policy model. Specifically, we propose a correction method termed Decaying Relative Correction (DRC), which is based on the spatial offset vector provided by the expert, persists only temporarily, and reduces the number of intervention steps required from the expert. Our results demonstrate that DRC reduces the required expert intervention rate by 30% compared to a standard absolute corrective method. Furthermore, we show that integrating DRC within an online imitation learning framework rapidly increases the success rate of manipulation tasks such as raspberry harvesting and cloth wiping.
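
The idea of a temporary, decaying offset can be illustrated with a minimal sketch. It is not the authors' implementation: the geometric decay schedule, the function name `apply_decaying_correction`, and the parameters `start_step` and `decay_rate` are assumptions made for illustration. Only the translational part of the offset is shown, since composing orientation offsets would need proper rotation handling.

```python
from typing import List, Sequence


def apply_decaying_correction(
    policy_actions: Sequence[Sequence[float]],
    offset: Sequence[float],
    start_step: int,
    decay_rate: float = 0.8,
) -> List[List[float]]:
    """Add an expert offset to policy actions, shrinking it after start_step.

    Assumption (not from the paper): the offset is scaled by
    decay_rate ** k, where k is the number of steps since the correction.
    """
    if not 0.0 <= decay_rate < 1.0:
        raise ValueError("decay_rate must be in [0, 1)")

    corrected = []
    for t, action in enumerate(policy_actions):
        # No offset before the expert intervenes; full offset at start_step,
        # then a geometrically shrinking share of it on later steps.
        scale = 0.0 if t < start_step else decay_rate ** (t - start_step)
        corrected.append([a + scale * o for a, o in zip(action, offset)])
    return corrected


# Hypothetical usage: a policy that outputs the same position at every step,
# corrected once at step 1 with an offset of (1, 0, -2).
trajectory = apply_decaying_correction(
    policy_actions=[[0.0, 0.0, 0.0]] * 5,
    offset=[1.0, 0.0, -2.0],
    start_step=1,
    decay_rate=0.5,
)
```

Under these assumptions the offset is applied in full at the correction step and halves on each later step, so the trajectory drifts back toward the policy output without further expert input. An absolute correction would instead replace the policy action outright for as long as the expert holds control.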

Paper Structure

This paper contains 16 sections, 2 equations, 8 figures.

Figures (8)

  • Figure 1: Architecture of our proposed approach. The initial training data for online imitation learning is first collected through expert demonstration. The trained policy model is then deployed to the robot, while still allowing the operator to correct motions in real time. Based on the corrected trajectories, the model is retrained and redeployed iteratively.
  • Figure 2: Two types of correction methods. (a) Relative correction: a decaying offset is applied to the original motion, allowing the robot to temporarily shift its trajectory to complete the task. The offset gradually decreases at a decaying rate, ensuring the state remains within the target area temporarily, thereby reducing the need for frequent interventions. (b) Absolute correction: the operator fully overwrites the motion to provide high-quality demonstration data. However, this requires manually repositioning the robot to the end state after correction.
  • Figure 3: Human teleoperated correction when the policy model struggles. (a) Switching from autonomous policy control to human intervention to correct the robot state via the cable-driven teleoperation handle. Once the correction is complete, control is returned to the robot. (b) Experimental robot trajectory data: after a failed attempt, an expert corrects the robot’s motion using Decaying Relative Correction (DRC). The collected real-world data corresponds to Fig. 2(a).
  • Figure 4: Experiments on two manipulation tasks: harvesting and cleaning motion. The target object is randomly placed in a defined area, and the robotic arm must locate it and execute the task autonomously. Small object sizes increase the task difficulty and highlight the effectiveness of the proposed method.
  • Figure 5: The expert intervention rate (ratio of time steps during the task where the expert is providing corrections) for the two tasks for the pre-trained policy model (trained on 20 demonstrations), and 2 further rounds of online imitation learning, where 10 additional corrected trajectories are used to update the policy model.
  • ...and 3 more figures
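
The expert intervention rate described in the Figure 5 caption (the ratio of time steps during the task in which the expert is providing corrections) can be computed as in the sketch below. The function name `intervention_rate` and the per-step boolean log format are assumptions for illustration, not part of the paper.

```python
from typing import Sequence


def intervention_rate(expert_active: Sequence[bool]) -> float:
    """Fraction of time steps in one episode where the expert was correcting.

    expert_active[t] is True when the expert provided a correction at step t
    (a hypothetical logging format).
    """
    if not expert_active:
        raise ValueError("episode has no time steps")
    return sum(1 for flag in expert_active if flag) / len(expert_active)


# Hypothetical episode log: the expert intervened on 2 of 8 steps.
episode = [False, False, True, True, False, False, False, False]
rate = intervention_rate(episode)
```

Averaging this quantity over the episodes of each training round gives one value per round, which is the kind of comparison the caption describes across the pre-trained policy and the later online rounds.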