ManiFlow: A General Robot Manipulation Policy via Consistency Flow Training

Ge Yan; Jiyue Zhu; Yuquan Deng; Shiqi Yang; Ri-Zhao Qiu; Xuxin Cheng; Marius Memmel; Ranjay Krishna; Ankit Goyal; Xiaolong Wang; Dieter Fox

TL;DR

ManiFlow tackles the challenge of general robot manipulation by learning a dexterous visuomotor policy from multi-modal observations. It combines flow matching with a continuous-time consistency objective and deploys a DiT-X Transformer with AdaLN-Zero conditioning to efficiently fuse visual, language, and proprioceptive cues. In extensive simulations and real-world experiments, ManiFlow outperforms diffusion-based and prior flow-matching policies, achieving higher success rates, faster few-step inference, and strong generalization to unseen objects and backgrounds. The work also provides thorough ablations on time-sampling strategies and perceptual encoders, and demonstrates scalable performance with larger demonstration datasets.
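To make the AdaLN-Zero conditioning mentioned above concrete, here is a minimal pure-Python sketch of the general technique as it is commonly described for diffusion transformers. It is not the DiT-X implementation: the function names, dimensions, and the toy `sublayer` are assumptions chosen for illustration. A conditioning vector is projected to a per-feature shift, scale, and gate. Because that projection starts at zero, the block acts as the identity at initialization.

```python
import math

def silu(c):
    # SiLU activation applied to the conditioning vector before projection.
    return [ci / (1.0 + math.exp(-ci)) for ci in c]

def layer_norm(x, eps=1e-6):
    # Normalization without learned affine parameters; the conditioning supplies them.
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    return [(xi - mean) / math.sqrt(var + eps) for xi in x]

def linear(weights, bias, c):
    # Plain matrix-vector product: one output per weight row.
    return [sum(w * ci for w, ci in zip(row, c)) + b for row, b in zip(weights, bias)]

def adaln_zero_block(x, cond, weights, bias, sublayer):
    """Hypothetical AdaLN-Zero block: x + gate * sublayer(LN(x) * (1 + scale) + shift)."""
    d = len(x)
    params = linear(weights, bias, silu(cond))  # length 3 * d
    shift, scale, gate = params[:d], params[d:2 * d], params[2 * d:]
    h = [n * (1.0 + s) + sh for n, s, sh in zip(layer_norm(x), scale, shift)]
    h = sublayer(h)
    return [xi + g * hi for xi, g, hi in zip(x, gate, h)]

# Toy inputs (assumed values, not taken from the paper).
x = [1.0, -2.0, 0.5]                      # one action token with 3 features
cond = [0.2, -0.4]                        # e.g. an embedding of time or observations
toy_sublayer = lambda h: [2.0 * v for v in h]  # stand-in for attention or an MLP

# Zero-initialized projection: the "Zero" in AdaLN-Zero.
zero_w = [[0.0, 0.0] for _ in range(9)]
zero_b = [0.0] * 9
out_init = adaln_zero_block(x, cond, zero_w, zero_b, toy_sublayer)

# After training the projection is non-zero, so conditioning changes the output.
trained_w = [[0.1, -0.2] for _ in range(9)]
out_trained = adaln_zero_block(x, cond, trained_w, zero_b, toy_sublayer)
```

With the zero-initialized projection the gate is zero, so `out_init` equals `x`. Once the projection weights are non-zero, the conditioning modulates the token.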

Abstract

This paper introduces ManiFlow, a visuomotor imitation learning policy for general robot manipulation that generates precise, high-dimensional actions conditioned on diverse visual, language and proprioceptive inputs. We leverage flow matching with consistency training to enable high-quality dexterous action generation in just 1-2 inference steps. To handle diverse input modalities efficiently, we propose DiT-X, a diffusion transformer architecture with adaptive cross-attention and AdaLN-Zero conditioning that enables fine-grained feature interactions between action tokens and multi-modal observations. ManiFlow demonstrates consistent improvements across diverse simulation benchmarks and nearly doubles success rates on real-world tasks across single-arm, bimanual, and humanoid robot setups with increasing dexterity. The extensive evaluation further demonstrates the strong robustness and generalizability of ManiFlow to novel objects and background changes, and highlights its strong scaling capability with larger-scale datasets. Our website: maniflow-policy.github.io.
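As a rough illustration of why a flow-based policy can generate actions in 1-2 steps, the sketch below integrates a velocity field with fixed-step Euler updates. It is not ManiFlow's model. `toy_velocity` is a hand-written, hypothetical stand-in for a learned network, and the time convention (noise at t = 0, action at t = 1) is an assumption. For a straight-line path toward a single target action, the ideal velocity at (x, t) is (target - x) / (1 - t), so even one Euler step lands on the target.

```python
import random

def toy_velocity(x, t, target):
    """Hypothetical stand-in for a learned velocity network.

    For a straight path from noise (t=0) to one target action (t=1),
    the ideal velocity at (x, t) is (target - x) / (1 - t).
    """
    return [(a - xi) / (1.0 - t) for xi, a in zip(x, target)]

def sample_action(velocity_fn, target, dim, num_steps, seed=0):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from Gaussian noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # the last evaluated t is (num_steps - 1) / num_steps, so t < 1
        v = velocity_fn(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Assumed 3-dimensional action, for illustration only.
target_action = [0.3, -0.1, 0.8]
one_step = sample_action(toy_velocity, target_action, dim=3, num_steps=1)
two_step = sample_action(toy_velocity, target_action, dim=3, num_steps=2)
```

In this toy setting both `one_step` and `two_step` match `target_action` up to floating-point error. A learned velocity field is only approximately straight, which is the gap the abstract's consistency training is meant to narrow for few-step inference.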
