FRMD: Fast Robot Motion Diffusion with Consistency-Distilled Movement Primitives for Smooth Action Generation

Xirui Shi, Jun Jin

TL;DR

This work tackles the latency and motion quality limitations of diffusion-based robot motion generation by introducing FRMD, a consistency-distilled movement primitives framework. FRMD shifts learning from raw action sequences to trajectory parameters using ProDMPs, and leverages Consistency Models with PF-ODE to enable single-step, fast inference while preserving temporal coherence. A teacher MPD-based diffusion model provides a structured prior, and a consistency-distilled student learns to map noisy inputs directly to movement primitive parameters, achieving real-time motion with smoother trajectories. Evaluations on MetaWorld and ManiSkill show FRMD delivering higher success rates and substantially lower inference times (e.g., 17.2 ms) compared to state-of-the-art diffusion methods, highlighting its potential for real-time, multi-task robotic manipulation.
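
The teacher/student setup can be made concrete with a minimal sketch. It follows the standard consistency-distillation recipe from the Consistency Models literature and is not FRMD's actual implementation. The constants `SIGMA_DATA` and `EPS`, the toy networks, and the one-step Euler teacher are all assumptions chosen for illustration. In this recipe the target network tracks the student through an EMA; FRMD's exact update scheme may differ.

```python
import math

SIGMA_DATA = 0.5  # assumed data scale (illustrative)
EPS = 0.002       # assumed smallest noise level (illustrative)

def c_skip(t):
    # Skip coefficient; equals 1 at t == EPS.
    return SIGMA_DATA**2 / ((t - EPS)**2 + SIGMA_DATA**2)

def c_out(t):
    # Output coefficient; equals 0 at t == EPS.
    return SIGMA_DATA * (t - EPS) / math.sqrt(SIGMA_DATA**2 + t**2)

def consistency_fn(network, x, t):
    """f(x, t) = c_skip(t) * x + c_out(t) * network(x, t), so f(x, EPS) == x."""
    return [c_skip(t) * xi + c_out(t) * fi for xi, fi in zip(x, network(x, t))]

def distillation_loss(student, target, teacher_step, x_noisy, t_hi, t_lo):
    """Squared distance between the student at t_hi and the target at t_lo."""
    x_less_noisy = teacher_step(x_noisy, t_hi, t_lo)  # teacher moves along the PF-ODE
    pred = consistency_fn(student, x_noisy, t_hi)
    ref = consistency_fn(target, x_less_noisy, t_lo)  # treated as a constant in training
    return sum((p - r) ** 2 for p, r in zip(pred, ref))

def ema_update(target_params, student_params, decay=0.95):
    """Target parameters follow the student via an exponential moving average."""
    return [decay * tp + (1 - decay) * sp
            for tp, sp in zip(target_params, student_params)]

# Hypothetical stand-ins for real networks: a network that outputs zeros, and a
# teacher that takes one Euler step of the PF-ODE for data concentrated at 0.
def zero_net(x, t):
    return [0.0 for _ in x]

def toy_teacher_step(x, t_hi, t_lo):
    return [xi * t_lo / t_hi for xi in x]

x = [1.0, -2.0]
loss = distillation_loss(zero_net, zero_net, toy_teacher_step, x, 1.0, 0.5)
```

At inference time only the student is needed: a single call to `consistency_fn` maps a noisy sample to a clean estimate, which is where the single-step speedup comes from.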

Abstract

We consider the problem of using diffusion models to generate fast, smooth, and temporally consistent robot motions. Although diffusion models have demonstrated superior performance in robot learning due to their task scalability and multi-modal flexibility, they suffer from two fundamental limitations: (1) they often produce non-smooth, jerky motions due to their inability to capture temporally consistent movement dynamics, and (2) their iterative sampling process incurs prohibitive latency for many robotic tasks. Inspired by classic robot motion generation methods such as DMPs and ProMPs, which capture the temporally and spatially consistent dynamics of trajectories using low-dimensional vectors, and by recent advances in diffusion-based image generation that use consistency models with probability flow ODEs to accelerate the denoising process, we propose Fast Robot Motion Diffusion (FRMD). FRMD uniquely integrates Movement Primitives (MPs) with Consistency Models to enable efficient, single-step trajectory generation. By leveraging probability flow ODEs and consistency distillation, our method models trajectory distributions while learning a compact, time-continuous motion representation within an encoder-decoder architecture. This unified approach eliminates the slow, multi-step denoising process of conventional diffusion models, enabling efficient one-step inference and smooth robot motion generation. We extensively evaluate FRMD on the well-recognized Meta-World and ManiSkill benchmarks, on tasks ranging from simple to more complex manipulation, and compare its performance against state-of-the-art baselines. Our results show that FRMD generates smoother trajectories significantly faster while achieving higher success rates.
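
To illustrate why a low-dimensional movement-primitive vector yields smooth motion, the sketch below decodes a handful of weights into a dense one-degree-of-freedom trajectory using normalized Gaussian basis functions. This is a simplified ProMP-style decoder, not the ProDMP formulation that FRMD uses; the function names, basis width, and weight values are assumptions made for illustration.

```python
import math

def rbf_basis(t, num_basis, width=None):
    """Normalized Gaussian basis values at phase t in [0, 1]."""
    centers = [i / (num_basis - 1) for i in range(num_basis)]
    if width is None:
        width = 1.0 / num_basis  # assumed heuristic for the basis width
    raw = [math.exp(-0.5 * ((t - c) / width) ** 2) for c in centers]
    total = sum(raw)
    return [r / total for r in raw]

def decode_trajectory(weights, num_steps):
    """Map a low-dimensional weight vector to a dense 1-DoF trajectory."""
    traj = []
    for k in range(num_steps):
        t = k / (num_steps - 1)  # phase variable in [0, 1]
        phi = rbf_basis(t, len(weights))
        traj.append(sum(p * w for p, w in zip(phi, weights)))
    return traj

# Five weights describe a 100-step reaching motion (values are illustrative).
weights = [0.0, 0.2, 0.8, 1.0, 1.0]
traj = decode_trajectory(weights, 100)
```

Because every point is a convex combination of the weights under smooth basis functions, the decoded trajectory stays within the range of the weights and varies continuously with the phase, whatever the generative model predicts for the weights.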

Paper Structure

This paper contains 28 sections, 13 equations, 3 figures, 2 tables, 1 algorithm.

Figures (3)

  • Figure 1: Overview of the FRMD training framework. Given observations $o_i$, a raw action sequence $\tau^0$, and the initial state $(y_0, \dot{y}_0)$ from the robot datasets, we first perform forward diffusion to add noise over $n+k$ steps. The resulting noisy sequence $\tau^{n+k}$ is then fed into both the student model and the teacher model, which predict the action sequences $\tau^0$ and $\tau^n$, respectively. The target model uses the teacher network’s $k$-step estimate to predict the action sequence. The student model is trained via consistency distillation, and its weights are updated through an Exponential Moving Average (EMA).
  • Figure 2: Learning curve comparison of different methods across various robotic tasks. We compare FRMD (ours), Diffusion Policy (DP), and Movement Primitives Diffusion (MPD) across 12 tasks from the MetaWorld and ManiSkill benchmarks. For each method, the success rate is computed by evaluating 10 episodes with random seeds in each environment every 5k training steps until convergence. The mean success rate is plotted as a solid line, and the variance is shown as a shaded area. Our method consistently achieves higher success rates with significantly lower inference latency than the baselines (10x faster than MPD [scheikl2024movement] and 7x faster than DP [chi2023diffusion], as shown in Table 1). Note that the intermediate success rate of our method in the initial training steps is due to model distillation using the teacher model.
  • Figure 3: Trajectories generated in the PlugCharger-v1 task. The top plot shows the trajectory generated by DP, while the bottom plot shows the trajectory produced by our method (FRMD). The green circles highlight regions where the transition between data points exhibits significant non-smoothness, determined by comparing the curvature $k_i$ with a threshold $k_{max}=1$, as defined in Section IV-B. In comparison, our method produces a significantly smoother trajectory with fewer oscillations, demonstrating improved motion stability, especially near the start and goal points.
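
The curvature test described in the Figure 3 caption could be sketched as follows. The paper's own curvature definition (Section IV-B) is not reproduced in this summary, so the sketch assumes a three-point Menger curvature on 2D points and flags a point when its curvature exceeds the threshold; both the estimator and the "exceeds" rule are assumptions, and the function names are hypothetical.

```python
import math

def menger_curvature(a, b, c):
    """Curvature of the circle through three 2D points (0 for collinear points)."""
    ab, bc, ca = math.dist(a, b), math.dist(b, c), math.dist(c, a)
    if ab * bc * ca == 0.0:
        return 0.0  # repeated points: curvature undefined, treated as 0 here
    # Twice the signed triangle area via the 2D cross product.
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return 2.0 * abs(cross) / (ab * bc * ca)

def non_smooth_indices(points, k_max=1.0):
    """Indices i whose curvature k_i exceeds the threshold k_max."""
    return [
        i for i in range(1, len(points) - 1)
        if menger_curvature(points[i - 1], points[i], points[i + 1]) > k_max
    ]

# A straight segment followed by a sharp right-angle turn (illustrative data).
path = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]
flagged = non_smooth_indices(path)
```

On this toy path only the corner point is flagged. Note that the estimator depends on the spacing between samples, so the threshold is only meaningful for a fixed sampling rate and unit of length.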