Hybrid-Diffusion Models: Combining Open-loop Routines with Visuomotor Diffusion Policies
Jonne Van Haastregt, Bastian Orthmann, Michael C. Welle, Yuchong Zhang, Danica Kragic
TL;DR
This work tackles the gap between imitation-based visuomotor policies and real-world manipulation by integrating Teleoperation Augmentation Primitives (TAPs) into a Hybrid-Diffusion framework. TAPs (axis locking, perching waypoints, and open-loop routines) are triggered by the operator during demonstrations and can be invoked autonomously by the policy during inference, enabling embodiment-aware actions. Experiments on vial aspiration, open-container liquid transfer, and container unscrewing show task-dependent benefits, with open-loop routines delivering clear gains in morphologically favorable tasks. The approach offers a scalable way to combine structured, robot-specific routines with learned policies, improving robustness and deployment potential in complex manipulation. Future work points to richer TAP types, automatic discovery, hierarchical composition, and cross-embodiment transfer to broaden applicability.
Abstract
Although visuomotor policies obtained via imitation learning perform well in complex manipulation tasks, they usually struggle to match the accuracy and speed of traditional control-based methods. In this work, we introduce Hybrid-Diffusion models that combine open-loop routines with visuomotor diffusion policies. We develop Teleoperation Augmentation Primitives (TAPs) that allow the operator to trigger predefined routines seamlessly during demonstrations, such as locking specific axes, moving to perching waypoints, or running task-specific routines. Our Hybrid-Diffusion method learns to trigger these TAPs during inference. We validate the method on challenging real-world tasks: Vial Aspiration, Open-Container Liquid Transfer, and Container Unscrewing. All experimental videos are available on the project's website: https://hybriddiffusion.github.io/
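To make the idea of a policy that triggers TAPs at inference time more concrete, the following is a minimal sketch and not the authors' implementation. It assumes, purely for illustration, that each predicted action carries a continuous motion command plus a discrete TAP trigger; the names HybridAction, NO_TAP, lock_axes, and dispatch are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# Hypothetical convention: trigger id 0 means "no TAP, use the continuous motion".
NO_TAP = 0


@dataclass
class HybridAction:
    """One predicted step: a continuous command plus a discrete TAP trigger (assumed layout)."""
    motion: List[float]  # e.g. a delta pose for the end effector
    tap_id: int          # NO_TAP, or the id of a predefined routine


def lock_axes(motion: List[float], locked: Set[int]) -> List[float]:
    """Axis-locking sketch: zero the motion components whose indices are locked."""
    return [0.0 if i in locked else v for i, v in enumerate(motion)]


def dispatch(
    action: HybridAction,
    routines: Dict[int, Callable[[], None]],
    send_motion: Callable[[List[float]], None],
) -> str:
    """Run a predefined routine if one is triggered, otherwise forward the motion."""
    if action.tap_id != NO_TAP and action.tap_id in routines:
        # Open loop: the routine runs to completion without consulting the policy.
        routines[action.tap_id]()
        return "tap"
    # No known TAP was triggered, so the closed-loop motion command is sent as usual.
    send_motion(action.motion)
    return "motion"
```

In this sketch a controller loop would call dispatch once per predicted action, with routines mapping each trigger id to a robot-specific function. Treating the trigger as an extra output next to the motion is one possible design, chosen here because it lets the same demonstrations supervise both when to move and when to hand over to a routine.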
