FODMP: Fast One-Step Diffusion of Movement Primitives Generation for Time-Dependent Robot Actions

Xirui Shi, Arya Ebrahimi, Yi Hu, Jun Jin

Abstract

Diffusion models are increasingly used for robot learning, but current designs face a clear trade-off. Action-chunking diffusion policies like ManiCM are fast to run, yet they only predict short segments of motion. This makes them reactive but unable to capture time-dependent motion primitives, such as following a spring-damper-like behavior with built-in dynamic profiles of acceleration and deceleration. Recently, Movement Primitive Diffusion (MPD) has partially addressed this limitation by parameterizing full trajectories using Probabilistic Dynamic Movement Primitives (ProDMPs), thereby enabling the generation of temporally structured motions. Nevertheless, MPD integrates the motion decoder directly into a multi-step diffusion process, resulting in prohibitively high inference latency that limits its applicability in real-time control settings. We propose FODMP (Fast One-Step Diffusion of Movement Primitives), a new framework that distills diffusion models into the ProDMPs trajectory parameter space and generates motion using a single-step decoder. FODMP retains the temporal structure of movement primitives while eliminating the inference bottleneck through single-step consistency distillation. This enables robots to execute time-dependent primitives at high inference speed, suitable for closed-loop vision-based control. On standard manipulation benchmarks (MetaWorld, ManiSkill), FODMP runs up to 10 times faster than MPD and 7 times faster than action-chunking diffusion policies, while matching or exceeding their success rates. Beyond speed, by generating fast acceleration-deceleration motion primitives, FODMP allows the robot to intercept and securely catch a fast-flying ball, whereas action-chunking diffusion policies and MPD respond too slowly for real-time interception.
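
To make the inference path concrete, the following is a minimal sketch, under stated assumptions rather than the authors' released implementation, of one-step generation in ProDMPs parameter space: a student network maps an observation and pure noise to movement-primitive parameters in a single forward pass, and a precomputed basis-function decoder turns those parameters into a trajectory. The MLP backbone, all dimensions, and the simplified linear decoder are illustrative assumptions (the paper describes a Transformer-based model and a ProDMPs ODE-solution decoder).

```python
import torch

class OneStepStudent(torch.nn.Module):
    # Hypothetical one-step student: maps (observation, noise) to ProDMPs
    # parameters. The MLP backbone and all sizes are illustrative; the
    # paper describes a Transformer-based model.
    def __init__(self, obs_dim=64, n_basis=10, n_dof=7):
        super().__init__()
        self.out_dim = n_basis * n_dof + n_dof  # basis weights + goal term
        self.net = torch.nn.Sequential(
            torch.nn.Linear(obs_dim + self.out_dim, 256),
            torch.nn.ReLU(),
            torch.nn.Linear(256, self.out_dim),
        )

    def forward(self, obs, noisy_params):
        return self.net(torch.cat([obs, noisy_params], dim=-1))

@torch.no_grad()
def generate_trajectory(student, obs, basis, n_dof=7):
    # Single denoising step: map pure noise directly to ProDMPs parameters.
    noise = torch.randn(obs.shape[0], student.out_dim)
    params = student(obs, noise)
    n_basis = basis.shape[1]
    w = params[:, : n_basis * n_dof].view(-1, n_dof, n_basis)
    # Decode positions as a weighted sum of precomputed basis functions,
    # a stand-in for the ProDMPs ODE-solution decoder (goal term omitted
    # here for brevity); basis has shape (T, n_basis).
    return torch.einsum("bdk,tk->btd", w, basis)  # (batch, T, n_dof)
```

Because decoding reduces to a single matrix product against precomputed basis functions, per-cycle latency is dominated by one network forward pass, which is the structural source of the speedups claimed above.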

Paper Structure

This paper contains 34 sections, 12 equations, 9 figures, 1 table, and 1 algorithm.

Figures (9)

  • Figure 1: Comparison of Policy Designs. (a) Action chunking–based diffusion policies predict discrete sequences of actions without explicitly modeling time-dependent motion, resulting in temporally unstructured behavior. (b) MPD introduces a motion decoder into the diffusion process to recover time-dependent trajectories; however, motion parameters are still inferred through multi-step denoising, leading to high inference latency. (c) Our method directly predicts movement primitive parameters via one-step diffusion, followed by a single decoder pass to generate time-dependent motion. This design enables low-latency inference while explicitly modeling continuous motion evolution over time.
  • Figure 2: Method Overview. Our approach conditions a Transformer-based one-step diffusion model on multi-modal observations to directly predict ProDMPs parameters. The predicted parameters are decoded by a ProDMPs ODE solver, enabling fast generation of smooth, structured robot trajectories across diverse robot tasks.
  • Figure 3: Consistency Distillation Pipeline. Given observations, a multi-step teacher performs k-step denoising to obtain cleaner parameters, which supervise a one-step student model. The student is trained using a consistency objective against a target model, whose parameters are updated via EMA from the student. (A generic sketch of this training step follows the figure list.)
  • Figure 4: Simulation Environments. We evaluate our method on a diverse set of manipulation tasks from MetaWorld and ManiSkill, grouped by difficulty into easy, medium, and hard categories.
  • Figure 5: Real-World Experiment Setups. (a) Push-T task setup, where a Franka arm pushes a T-shaped object using visual observations from two RealSense cameras. (b) Ball-catching task setup, where the robot intercepts a human-thrown ball using depth observations.
  • ...and 4 more figures
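
Complementing the Figure 3 caption, here is a simplified, hypothetical sketch of one consistency-distillation training step: the frozen teacher runs k denoising steps to produce a cleaner parameter estimate, the one-step student is regressed toward the EMA target model's prediction from that cleaner point, and the target tracks the student. The teacher.denoise sampler, the single noise level sigma, and the plain MSE consistency loss are assumptions for illustration; the paper's exact noise schedule and objective may differ.

```python
import torch

def ema_update(target, student, decay=0.99):
    # Target network tracks the student via an exponential moving average.
    for p_t, p_s in zip(target.parameters(), student.parameters()):
        p_t.data.mul_(decay).add_(p_s.data, alpha=1.0 - decay)

def distillation_step(student, target, teacher, obs, clean_params,
                      optimizer, k=4, sigma=1.0):
    # Corrupt ground-truth ProDMPs parameters with Gaussian noise.
    x_noisy = clean_params + sigma * torch.randn_like(clean_params)
    with torch.no_grad():
        # Hypothetical multi-step teacher sampler: k denoising steps
        # toward a cleaner parameter estimate.
        x_cleaner = teacher.denoise(obs, x_noisy, num_steps=k)
        # One-step prediction of the EMA target model from the cleaner point.
        ref = target(obs, x_cleaner)
    # Consistency objective: the student's one-step prediction from the
    # noisier point should agree with the target's from the cleaner one.
    loss = torch.nn.functional.mse_loss(student(obs, x_noisy), ref)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    ema_update(target, student)
    return loss.item()

# Typical setup (hypothetical): target = copy.deepcopy(student); teacher is
# a pretrained multi-step diffusion model kept frozen during distillation.
```

Keeping the target as an EMA copy of the student stabilizes the consistency objective: the regression target drifts slowly even though the student is updated at every step.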