DM1: MeanFlow with Dispersive Regularization for 1-Step Robotic Manipulation
Guowei Zou, Haitao Wang, Hejun Wu, Yukun Qian, Yuhang Wang, Weibing Li
TL;DR
DM1 introduces a MeanFlow-based policy augmented with dispersive regularization to enable true one-step action generation in vision-based robotic manipulation. Dispersive losses are applied to intermediate embeddings ($H^{(T)}$, $H^{(R)}$, $H^{(\text{Cond})}$) to prevent representation collapse without architectural changes, and four variants (InfoNCE-L2, InfoNCE-Cosine, Hinge, Covariance) are evaluated. On RoboMimic benchmarks, DM1 delivers $20$–$40\times$ faster inference and gains of $10$–$20$ percentage points in success rate, with Lift reaching near-perfect performance. Real-robot validation on a Franka Emika Panda confirms sim-to-real transfer and real-time control above $50$ Hz. These results indicate that representation regularization can sustain multimodal control signals in flow-based policies, enabling practical, real-time manipulation.
Abstract
The ability to learn multi-modal action distributions is indispensable for robotic manipulation policies to perform precise and robust control. Flow-based generative models have recently emerged as a promising solution to learning distributions of actions, offering one-step action generation and thus much higher sampling efficiency than diffusion-based methods. However, existing flow-based policies suffer from representation collapse, i.e., the inability to distinguish similar visual representations, which leads to failures in precise manipulation tasks. We propose DM1 (MeanFlow with Dispersive Regularization for One-Step Robotic Manipulation), a novel flow matching framework that integrates dispersive regularization into MeanFlow to prevent collapse while maintaining one-step efficiency. DM1 applies multiple dispersive regularization variants at different intermediate embedding layers, encouraging diverse representations within each training batch without introducing additional network modules or specialized training procedures. Experiments on RoboMimic benchmarks show that DM1 achieves 20-40 times faster inference (0.07 s vs. 2-3.5 s) and improves success rates by 10-20 percentage points, with the Lift task reaching 99% success versus 85% for the baseline. Real-robot deployment on a Franka Panda further validates that DM1 transfers effectively from simulation to the physical world. To the best of our knowledge, this is the first work to leverage representation regularization to enable flow-based policies to achieve strong performance in robotic manipulation, establishing a simple yet powerful approach for efficient and robust manipulation.
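The abstract does not state the loss formulas, so the following is only an illustrative sketch of how a dispersive term can penalize collapsed embeddings within a batch. It assumes the InfoNCE-L2 variant takes the form $\log \operatorname{mean}_{i \neq j} \exp(-\lVert z_i - z_j \rVert^2 / \tau)$ over distinct pairs; the function name `dispersive_infonce_l2`, the temperature `tau`, and the toy batches are hypothetical and not taken from the paper.

```python
import math
from itertools import combinations


def dispersive_infonce_l2(embeddings, tau=0.5):
    """Assumed InfoNCE-L2 dispersive loss (not the paper's confirmed formula).

    Computes the log-mean-exp of -||z_i - z_j||^2 / tau over all distinct
    pairs in the batch. Lower values mean more spread-out embeddings.
    """
    if len(embeddings) < 2:
        raise ValueError("need at least two embeddings to form a pair")
    # Negative squared L2 distance, scaled by the temperature, per distinct pair.
    logits = [
        -sum((a - b) ** 2 for a, b in zip(zi, zj)) / tau
        for zi, zj in combinations(embeddings, 2)
    ]
    # Numerically stable log-mean-exp.
    m = max(logits)
    return m + math.log(sum(math.exp(v - m) for v in logits)) - math.log(len(logits))


# Toy batches: identical embeddings (collapsed) vs. well-separated ones.
collapsed = [[1.0, 2.0]] * 4
spread = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]]
loss_collapsed = dispersive_infonce_l2(collapsed)
loss_spread = dispersive_infonce_l2(spread)
```

Under this assumed form, the collapsed batch gives a loss of zero (its maximum), while the spread batch gives a negative loss, so minimizing the term alongside the policy objective pushes embeddings apart without any extra network modules.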
