ManiCM: Real-time 3D Diffusion Policy via Consistency Model for Robotic Manipulation
Guanxing Lu, Zifeng Gao, Tianxing Chen, Wenxun Dai, Ziwei Wang, Wenbo Ding, Yansong Tang
TL;DR
The paper tackles slow inference, the main bottleneck of diffusion-based policies for 3D robotic manipulation. It proposes ManiCM, a manipulation consistency model that enforces self-consistency so that actions can be generated in one step, conditioned on 3D point clouds. The model is trained by consistency distillation from a teacher diffusion model. The approach yields about a 10x speedup in average inference while maintaining competitive success rates across 31 tasks in Adroit and Metaworld, and it is further validated on real UR3e hardware. By reducing action generation to a single step, the work moves diffusion-based policies closer to real-time deployment in complex 3D manipulation and to high-frequency robot control.
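For reference, the self-consistency property can be written in the notation of standard consistency models (Song et al., 2023). The symbols here are illustrative assumptions rather than the paper's exact notation: $a_t$ is the noisy action at time $t$ on one probability-flow ODE trajectory, $o$ is the point-cloud observation, $\epsilon$ is the smallest time, and $F_\theta$, $c_{\text{skip}}$, $c_{\text{out}}$ form the usual skip/output parameterization. The first row states that every point on a trajectory maps to the same output. The second row shows how the boundary condition $f_\theta(a_\epsilon, \epsilon, o) = a_\epsilon$ is built in. The third row is the resulting one-step generation from pure noise.

```latex
\begin{aligned}
&f_\theta(a_t, t, o) = f_\theta(a_{t'}, t', o) && \forall\, t, t' \in [\epsilon, T] \\
&f_\theta(a_t, t, o) = c_{\text{skip}}(t)\, a_t + c_{\text{out}}(t)\, F_\theta(a_t, t, o), && c_{\text{skip}}(\epsilon) = 1,\ c_{\text{out}}(\epsilon) = 0 \\
&\hat{a}_0 = f_\theta(a_T, T, o), && a_T \sim \mathcal{N}(0, I)
\end{aligned}
```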
Abstract
Diffusion models have proven effective at modeling complex distributions, from natural images to motion trajectories. Recent diffusion-based methods show impressive performance in 3D robotic manipulation tasks, but they suffer from severe runtime inefficiency because of their multiple denoising steps, especially with high-dimensional observations. To address this, we propose ManiCM, a real-time robotic manipulation model that imposes a consistency constraint on the diffusion process so that it can generate robot actions with one-step inference. Specifically, we formulate a consistent diffusion process in the robot action space, conditioned on the point cloud input, in which the original action must be directly recoverable by denoising from any point along the ODE trajectory. To model this process, we design a consistency distillation technique that predicts the action sample directly, rather than predicting the noise as is common in the vision community. This enables fast convergence on the low-dimensional action manifold. We evaluate ManiCM on 31 robotic manipulation tasks from Adroit and Metaworld, and the results demonstrate that our approach speeds up the state-of-the-art method by 10 times in average inference speed while maintaining a competitive average success rate.
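The consistency distillation objective described above can be sketched in NumPy. This is a minimal, hypothetical illustration that follows the common consistency-distillation recipe popularized by Latent Consistency Models. It rests on several assumptions that do not come from ManiCM's code:

- a DDPM-style linear noise schedule;
- a deterministic DDIM step as the teacher's ODE solver;
- a skip interval `k` between adjacent timesteps;
- an EMA target network;
- a tiny tanh-linear layer (`x0_network`) standing in for the point-cloud-conditioned network.

All function names, constants, and shapes are illustrative. The network predicts the clean action directly (sample prediction, not noise), and the `c_skip`/`c_out` coefficients enforce the boundary condition at `t = 0`, which plays the role of the smallest time.

```python
import numpy as np

# Hypothetical discrete noise schedule (assumption: DDPM-style linear betas).
T = 100
betas = np.linspace(1e-4, 0.02, T)
# alpha_bar[0] = 1 marks t = 0 as the clean action; alpha_bar[t] covers t = 1..T.
alpha_bar = np.concatenate([[1.0], np.cumprod(1.0 - betas)])
SIGMA_DATA = 0.5  # assumed data scale used in the skip/out coefficients


def skip_out_coeffs(t):
    """Return (c_skip, c_out) with c_skip(0) = 1 and c_out(0) = 0 (boundary condition)."""
    s = t / T * 10.0  # hypothetical rescaling of the timestep
    c_skip = SIGMA_DATA**2 / (s**2 + SIGMA_DATA**2)
    c_out = s / np.sqrt(s**2 + SIGMA_DATA**2)
    return c_skip, c_out


def x0_network(W, a_t, t, obs_feat):
    """Hypothetical stand-in for F_theta: predicts the clean action directly
    from the noisy action, the timestep, and a point-cloud feature vector."""
    x = np.concatenate([a_t, obs_feat, [t / T]])
    return np.tanh(x @ W)


def consistency_fn(W, a_t, t, obs_feat):
    """f_theta(a_t, t, o) = c_skip(t) * a_t + c_out(t) * F_theta(a_t, t, o)."""
    c_skip, c_out = skip_out_coeffs(t)
    return c_skip * a_t + c_out * x0_network(W, a_t, t, obs_feat)


def ddim_step(a_t, t, s, x0_hat):
    """Deterministic DDIM update from timestep t > 0 to an earlier timestep s >= 0."""
    eps_hat = (a_t - np.sqrt(alpha_bar[t]) * x0_hat) / np.sqrt(1.0 - alpha_bar[t])
    return np.sqrt(alpha_bar[s]) * x0_hat + np.sqrt(1.0 - alpha_bar[s]) * eps_hat


def distillation_loss(W_student, W_target, W_teacher, a0, obs_feat, n, k, rng):
    """One consistency-distillation term between timesteps t = n + k and s = n."""
    t, s = n + k, n
    noise = rng.standard_normal(a0.shape)
    a_t = np.sqrt(alpha_bar[t]) * a0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    # Frozen teacher (also sample-predicting) takes one ODE step from t to s.
    a_s = ddim_step(a_t, t, s, x0_network(W_teacher, a_t, t, obs_feat))
    pred = consistency_fn(W_student, a_t, t, obs_feat)
    # Target network (EMA of the student); real training stops gradients here.
    target = consistency_fn(W_target, a_s, s, obs_feat)
    return float(np.mean((pred - target) ** 2))


def one_step_sample(W, obs_feat, action_dim, rng):
    """Single-step inference: map Gaussian noise at t = T straight to an action."""
    a_T = rng.standard_normal(action_dim)
    return consistency_fn(W, a_T, T, obs_feat)


# Usage with hypothetical sizes: 4-D actions, 8-D point-cloud feature.
rng = np.random.default_rng(0)
A, D = 4, 8
W = rng.standard_normal((A + D + 1, A)) * 0.1
obs = rng.standard_normal(D)
a0 = rng.standard_normal(A)
loss = distillation_loss(W, W.copy(), W, a0, obs, n=20, k=20, rng=rng)
action = one_step_sample(W, obs, A, rng)
```

The sketch only evaluates the loss. In an actual training loop, an autodiff framework would minimize it with respect to the student weights, and the target weights would track an exponential moving average of the student. Predicting the sample rather than the noise means `x0_network` outputs an action directly, which is the choice the abstract motivates for the low-dimensional action space.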
