FRMD: Fast Robot Motion Diffusion with Consistency-Distilled Movement Primitives for Smooth Action Generation
Xirui Shi, Jun Jin
TL;DR
This work addresses the latency and motion-quality limitations of diffusion-based robot motion generation by introducing FRMD, a consistency-distilled movement primitives framework. FRMD shifts learning from raw action sequences to trajectory parameters using ProDMPs, and uses consistency models built on the probability flow ODE (PF-ODE) to enable fast, single-step inference while preserving temporal coherence. A teacher MPD-based diffusion model provides a structured prior, and a consistency-distilled student learns to map noisy inputs directly to movement primitive parameters, yielding real-time motion generation with smoother trajectories. Evaluations on Meta-World and ManiSkill show that FRMD achieves higher success rates and substantially lower inference times (e.g., 17.2 ms) than state-of-the-art diffusion methods, highlighting its potential for real-time, multi-task robotic manipulation.
Abstract
We consider the problem of using diffusion models to generate fast, smooth, and temporally consistent robot motions. Although diffusion models have demonstrated superior performance in robot learning due to their task scalability and multi-modal flexibility, they suffer from two fundamental limitations: (1) they often produce non-smooth, jerky motions because they fail to capture temporally consistent movement dynamics, and (2) their iterative sampling process incurs prohibitive latency for many robotic tasks. Inspired by classic robot motion generation methods such as DMPs and ProMPs, which capture the temporally and spatially consistent dynamics of trajectories using low-dimensional vectors, and by recent advances in diffusion-based image generation that use consistency models with probability flow ODEs to accelerate the denoising process, we propose Fast Robot Motion Diffusion (FRMD). FRMD integrates Movement Primitives (MPs) with Consistency Models to enable efficient, single-step trajectory generation. By leveraging probability flow ODEs and consistency distillation, our method models trajectory distributions while learning a compact, time-continuous motion representation within an encoder-decoder architecture. This unified approach eliminates the slow, multi-step denoising process of conventional diffusion models, enabling efficient one-step inference and smooth robot motion generation. We extensively evaluate FRMD on the widely used Meta-World and ManiSkill benchmarks, on tasks ranging from simple to more complex manipulation, and compare its performance against state-of-the-art baselines. Our results show that FRMD generates smoother trajectories significantly faster while achieving higher success rates.
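To make the single-step idea concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes plain normalized radial basis functions as the movement primitive decoder (ProMP-style weights rather than the ProDMP formulation), the skip/output scaling commonly used for consistency models, and a hypothetical `placeholder_network` standing in for a trained student. All names and constants are assumptions, and the resulting trajectory values are not meaningful because nothing is trained.

```python
import numpy as np

# Assumed noise-schedule constants; not taken from the paper.
SIGMA_MIN, SIGMA_MAX, SIGMA_DATA = 0.002, 80.0, 0.5


def rbf_basis(num_steps, num_basis, width=0.02):
    """Normalized radial basis functions over a phase variable in [0, 1]."""
    t = np.linspace(0.0, 1.0, num_steps)[:, None]        # (T, 1)
    centers = np.linspace(0.0, 1.0, num_basis)[None, :]  # (1, K)
    phi = np.exp(-((t - centers) ** 2) / (2.0 * width))
    return phi / phi.sum(axis=1, keepdims=True)          # (T, K), rows sum to 1


def decode_trajectory(weights, num_steps=50):
    """Map low-dimensional weights (K, D) to a smooth trajectory (T, D)."""
    return rbf_basis(num_steps, weights.shape[0]) @ weights


def placeholder_network(noisy_weights, sigma, start, goal):
    """Hypothetical stand-in for a trained student network.

    It ignores the noisy input and returns weights ramping from start to goal.
    """
    return np.linspace(start, goal, noisy_weights.shape[0])  # (K, D)


def consistency_fn(noisy_weights, sigma, start, goal):
    """Skip-connection form: identity at sigma = SIGMA_MIN (boundary condition)."""
    c_skip = SIGMA_DATA**2 / ((sigma - SIGMA_MIN) ** 2 + SIGMA_DATA**2)
    c_out = SIGMA_DATA * (sigma - SIGMA_MIN) / np.sqrt(SIGMA_DATA**2 + sigma**2)
    return c_skip * noisy_weights + c_out * placeholder_network(
        noisy_weights, sigma, start, goal
    )


def sample_trajectory(start, goal, num_basis=8, num_steps=50, seed=0):
    """Draw noise in weight space, denoise with ONE function call, then decode."""
    rng = np.random.default_rng(seed)
    start = np.asarray(start, dtype=float)
    goal = np.asarray(goal, dtype=float)
    noisy = SIGMA_MAX * rng.standard_normal((num_basis, start.shape[0]))
    weights = consistency_fn(noisy, SIGMA_MAX, start, goal)  # single evaluation
    return decode_trajectory(weights, num_steps)


trajectory = sample_trajectory(start=[0.0, 0.0], goal=[1.0, 0.5])
```

The point of the sketch is structural: sampling happens in the low-dimensional weight space with a single function evaluation, and smoothness comes from the basis-function decoder rather than from per-timestep action prediction.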
