Real-Time Operator Takeover for Visuomotor Diffusion Policy Training

Nils Ingelhag; Jesper Munkeby; Michael C. Welle; Marco Moletta; Danica Kragic

TL;DR

The paper addresses robustness gaps in imitation learning for visuomotor diffusion policies by introducing Real-Time Operator Takeover (RTOT), which lets a human operator intervene in real time to correct undesirable states and records only the takeover trajectories for subsequent retraining. The approach interleaves initial demonstrations with targeted takeover demonstrations and retrains the policy iteratively to cover failure modes while reducing total demonstration time. An analysis of Mahalanobis distances computed on image and pose embeddings evaluates out-of-distribution detection as a diagnostic tool during deployment: high distances correlate with failure states but do not strictly predict final task performance. On a cyclic rice scooping task, RTOT-trained policies outperform baselines trained on the same or greater amounts of initial demonstration data, indicating improved data efficiency and robustness and demonstrating practical benefits for real-world robotic manipulation.
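The takeover mechanism can be pictured as a control loop in which an operator command, when present, overrides the policy's action, and only the operator-controlled segments are kept for retraining. This is a minimal sketch, not the authors' implementation: the `policy`, `operator`, and `env_step` callables and the `TakeoverRecorder` class are hypothetical, and the loop uses a single-step action interface (a diffusion policy typically predicts chunks of future actions, which this sketch ignores).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Obs = Tuple[float, ...]
Action = Tuple[float, ...]
Segment = List[Tuple[Obs, Action]]


@dataclass
class TakeoverRecorder:
    """Hypothetical helper that keeps only the operator-controlled segments."""
    segments: List[Segment] = field(default_factory=list)
    _current: Optional[Segment] = None

    def record(self, obs: Obs, action: Action) -> None:
        if self._current is None:  # operator just took over: open a segment
            self._current = []
        self._current.append((obs, action))

    def release(self) -> None:
        if self._current:  # operator handed control back: close the segment
            self.segments.append(self._current)
        self._current = None


def run_episode(
    obs: Obs,
    steps: int,
    policy: Callable[[Obs], Action],
    operator: Callable[[Obs], Optional[Action]],
    env_step: Callable[[Obs, Action], Obs],
    recorder: TakeoverRecorder,
) -> Obs:
    """One rollout in which an operator action, if any, overrides the policy."""
    for _ in range(steps):
        human = operator(obs)  # None means the operator is not intervening
        if human is not None:
            recorder.record(obs, human)  # only takeover steps are stored
            action = human
        else:
            recorder.release()  # control returns to the policy
            action = policy(obs)
        obs = env_step(obs, action)
    recorder.release()  # close a segment still open at the end of the episode
    return obs
```

Between rounds, the collected `recorder.segments` would be appended to the initial demonstrations and the policy retrained on the combined dataset; the training step itself is omitted here.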

Abstract

We present a Real-Time Operator Takeover (RTOT) paradigm that enables operators to seamlessly take control of a live visuomotor diffusion policy, guiding the system back into desirable states or reinforcing specific demonstrations. We present new insights into using the Mahalanobis distance to automatically identify undesirable states. Once the operator has intervened and redirected the system, control is seamlessly returned to the policy, which resumes generating actions until further intervention is required. We demonstrate that incorporating the targeted takeover demonstrations significantly improves policy performance compared to training solely with an equivalent number of longer initial demonstrations. We provide an in-depth analysis of using the Mahalanobis distance to detect out-of-distribution states, illustrating its utility for identifying critical failure points during execution. Supporting materials, including videos of the initial and takeover demonstrations and of all rice scooping experiments, are available on the project website: https://operator-takeover.github.io/
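The Mahalanobis-distance check can be illustrated by fitting a mean and covariance to embeddings of training states, scoring a new state by D_M(x) = sqrt((x - mu)^T Sigma^-1 (x - mu)), and flagging it when the score exceeds a threshold. This is an assumption-laden sketch, not the authors' pipeline: it treats each embedding as a plain vector of floats (how the image and pose embeddings are produced and combined is not specified here), and the ridge term and quantile-based threshold are hypothetical choices. It uses only the Python standard library and solves a linear system instead of inverting the covariance.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def fit_gaussian(embeddings: Sequence[Sequence[float]], ridge: float = 1e-6) -> Tuple[Vector, Matrix]:
    """Mean and sample covariance of training embeddings (needs >= 2 samples)."""
    n, dim = len(embeddings), len(embeddings[0])
    mean = [sum(e[j] for e in embeddings) / n for j in range(dim)]
    cov = [[0.0] * dim for _ in range(dim)]
    for e in embeddings:
        d = [e[j] - mean[j] for j in range(dim)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += d[i] * d[j] / (n - 1)
    for i in range(dim):
        cov[i][i] += ridge  # hypothetical regularizer keeping cov invertible
    return mean, cov


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def mahalanobis(x: Sequence[float], mean: Vector, cov: Matrix) -> float:
    """D_M(x) = sqrt(d^T cov^-1 d) with d = x - mean."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    z = solve(cov, d)  # z = cov^-1 d, computed without forming the inverse
    return math.sqrt(sum(di * zi for di, zi in zip(d, z)))


def quantile_threshold(train: Sequence[Sequence[float]], mean: Vector, cov: Matrix, q: float = 0.95) -> float:
    """Hypothetical threshold: the q-quantile of training-set distances."""
    dists = sorted(mahalanobis(e, mean, cov) for e in train)
    return dists[min(len(dists) - 1, int(q * len(dists)))]


def is_out_of_distribution(x: Sequence[float], mean: Vector, cov: Matrix, threshold: float) -> bool:
    return mahalanobis(x, mean, cov) > threshold
```

A high score is best read as a warning sign rather than a guaranteed failure predictor, since distance alone does not strictly predict final task performance. With realistic embedding sizes, a numerical library would replace the hand-written solver.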
