Online Imitation Learning for Manipulation via Decaying Relative Correction through Teleoperation
Cheng Pan, Hung Hon Cheng, Josie Hughes
TL;DR
This work tackles the data inefficiency of imitation learning for robotic manipulation by introducing Decaying Relative Correction (DRC), delivered through a cable-driven teleoperation interface. The method couples an online imitation learning pipeline based on diffusion-model behavior cloning with a corrective feedback mechanism that decays over time, reducing the need for continuous expert input. Empirical results show a roughly 30% lower expert intervention rate than a standard absolute correction method, and rapid improvements in task success for raspberry harvesting and stain removal, with demonstrated transfer to unseen objects and extension to multi-arm setups. The approach offers a practical pathway to scalable, expert-efficient learning of manipulation policies in dynamic environments.
Abstract
Teleoperated robotic manipulators enable the collection of demonstration data, which can be used to train control policies through imitation learning. However, such methods can require significant amounts of training data to develop robust policies or to adapt them to new and unseen tasks. While expert feedback can significantly enhance policy performance, providing continuous feedback can be cognitively demanding and time-consuming for experts. To address this challenge, we propose to use a cable-driven teleoperation system that can provide spatial corrections with six degrees of freedom to the trajectories generated by a policy model. Specifically, we propose a correction method termed Decaying Relative Correction (DRC), which is based on the spatial offset vector provided by the expert, persists only temporarily, and reduces the number of intervention steps required from the expert. Our results demonstrate that DRC reduces the required expert intervention rate by 30% compared to a standard absolute corrective method. Furthermore, we show that integrating DRC within an online imitation learning framework rapidly increases the success rate of manipulation tasks such as raspberry harvesting and cloth wiping.
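
As a rough illustration of the idea behind DRC, the sketch below adds a single expert offset to the waypoints that a policy has planned and lets that offset fade over subsequent steps. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the decay schedule, so the geometric decay, the `decay` parameter, and the function name `apply_decaying_correction` are all hypothetical. Only the translational part of the correction is shown; an orientation offset would need to be composed as a rotation rather than added.

```python
from typing import List, Sequence


def apply_decaying_correction(
    planned: Sequence[Sequence[float]],
    offset: Sequence[float],
    decay: float = 0.8,
) -> List[List[float]]:
    """Shift planned waypoints by an expert offset that shrinks each step.

    ASSUMPTION: geometric decay (scale *= decay per step) is chosen for
    illustration only; the source does not state the actual schedule.
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")

    corrected = []
    scale = 1.0  # full offset on the first corrected waypoint
    for waypoint in planned:
        # Relative correction: offset is added to the policy's own waypoint,
        # rather than replacing it with an absolute expert pose.
        corrected.append([p + scale * o for p, o in zip(waypoint, offset)])
        scale *= decay  # the correction fades, handing control back to the policy
    return corrected


if __name__ == "__main__":
    # Four planned waypoints at the origin, expert nudges by +1 in x and -2 in z.
    plan = [[0.0, 0.0, 0.0] for _ in range(4)]
    for wp in apply_decaying_correction(plan, [1.0, 0.0, -2.0], decay=0.5):
        print(wp)
```

With `decay=0.5`, successive waypoints receive 1, 0.5, 0.25, and 0.125 of the expert's displacement, so one brief intervention influences several later steps without requiring the expert to keep holding the correction.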
