Hybrid-Diffusion Models: Combining Open-loop Routines with Visuomotor Diffusion Policies

Jonne Van Haastregt, Bastian Orthmann, Michael C. Welle, Yuchong Zhang, Danica Kragic

TL;DR

This work tackles the gap between imitation-based visuomotor policies and real-world manipulation by integrating Teleoperation Augmentation Primitives (TAPs) into a Hybrid-Diffusion framework. The three TAP types (axis locking, perching waypoints, and open-loop routines) are triggered by the operator during demonstrations and can be invoked autonomously by the policy during inference, enabling embodiment-aware actions. Experiments on vial aspiration, open-container liquid transfer, and container unscrewing show task-dependent benefits, with open-loop routines delivering clear gains in morphologically favorable tasks. The approach offers a scalable way to combine structured, robot-specific routines with learned policies, improving robustness and deployment potential in complex manipulation. Future work points to richer TAP types, automatic discovery, hierarchical composition, and cross-embodiment transfer to broaden applicability.

Abstract

Although visuomotor policies obtained via imitation learning perform well in complex manipulation tasks, they usually struggle to match the accuracy and speed of traditional control-based methods. In this work, we introduce Hybrid-Diffusion models that combine open-loop routines with visuomotor diffusion policies. We develop Teleoperation Augmentation Primitives (TAPs) that allow the operator to seamlessly perform predefined routines during demonstrations, such as locking specific axes, moving to perching waypoints, or triggering task-specific routines. Our Hybrid-Diffusion method learns to trigger such TAPs during inference. We validate the method on challenging real-world tasks: Vial Aspiration, Open-Container Liquid Transfer, and Container Unscrewing. All experimental videos are available on the project's website: https://hybriddiffusion.github.io/
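To make the idea of a policy that "learns to trigger TAPs during inference" concrete, here is a minimal sketch of one way such a dispatch step could look. It is not the authors' implementation: the action layout (`HybridAction`), the `NO_TAP` slot convention, the 0.5 threshold, and the `select_tap` and `execute` helpers are all assumptions made for illustration. The sketch only shows the control flow: each decoded policy output either passes its motion command through or hands control to a predefined open-loop routine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

NO_TAP = 0  # hypothetical convention: slot 0 means "no TAP triggered"


@dataclass
class HybridAction:
    """One decoded policy output (assumed layout, not taken from the paper)."""
    motion: List[float]      # closed-loop command, e.g. an end-effector delta
    tap_scores: List[float]  # one score per TAP slot; slot 0 is "no TAP"


def select_tap(tap_scores: List[float], threshold: float = 0.5) -> int:
    """Pick the highest-scoring TAP slot, or NO_TAP if it is below threshold."""
    best = max(range(len(tap_scores)), key=lambda i: tap_scores[i])
    if best != NO_TAP and tap_scores[best] >= threshold:
        return best
    return NO_TAP


def execute(
    action: HybridAction,
    routines: Dict[int, Tuple[str, Callable[[], None]]],
    send_motion: Callable[[List[float]], None],
) -> str:
    """Run either the predicted motion or the triggered open-loop routine."""
    tap = select_tap(action.tap_scores)
    if tap == NO_TAP:
        send_motion(action.motion)
        return "motion"
    name, routine = routines[tap]
    routine()  # open-loop: runs to completion without visual feedback
    return name


# Usage with stand-in callbacks instead of a real robot interface.
log: List[object] = []
routines = {1: ("unscrew", lambda: log.append("unscrew routine"))}
execute(HybridAction([0.01, 0.0, 0.0], [0.9, 0.1]), routines, log.append)
execute(HybridAction([0.0, 0.0, 0.0], [0.2, 0.8]), routines, log.append)
```

Under these assumptions, the first call forwards the motion command and the second triggers the stand-in unscrewing routine. A thresholded argmax is only one possible trigger rule; how the trigger signal is actually encoded and decoded is specified in the paper itself, not here.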

Paper Structure

This paper contains 10 sections, 4 figures, 2 algorithms.

Figures (4)

  • Figure 1: Overview of our Hybrid-Diffusion model. During teleoperation, the expert can trigger a Teleoperation Augmentation Primitive (TAP) via speech or (AR) controller inputs. The Hybrid-Diffusion model learns to trigger such TAP routines itself during execution, making use of the routines during tasks.
  • Figure 2: Different ways of triggering TAPs: via speech command (a), AR button interfaces (b), or direct mapping with a haptic pattern as confirmation (c, for expert users).
  • Figure 3: Different types of TAPs and example tasks. (a) Vial aspiration, where rotational axis locking is useful to the operator; (b) open-container liquid transfer, which uses perching waypoints to bring the respective container into view; (c) container unscrewing, which triggers an open-loop unscrewing routine once the robot is in the right place.
  • Figure 4: Experimental results on three tasks, each evaluated from seven novel starting positions with every position repeated three times, comparing Hybrid-Diffusion (HD) with the baseline diffusion (D) method.