MP1: MeanFlow Tames Policy Learning in 1-step for Robotic Manipulation

Juyi Sheng, Ziyi Wang, Peiming Li, Mengyuan Liu

TL;DR

MP1 introduces MeanFlow-based one-step trajectory generation for robot manipulation conditioned on 3D point clouds, eliminating the need for iterative denoising and explicit consistency losses. It adds a lightweight Dispersive Loss to improve few-shot generalization by dispersing latent embeddings, while employing Classifier-Free Guidance to enhance trajectory controllability. Empirical results on Adroit and Meta-World benchmarks show MP1 outperforms diffusion- and flow-based baselines in both success rate and inference speed, achieving 6.8 ms average latency. Real-world experiments on a dual-arm robot corroborate MP1's robustness and rapid execution, highlighting its practical potential for real-time robotics.
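A toy problem shows why one-step generation removes the need for iterative denoising. The sketch below is an illustration under stated assumptions, not MP1's code. It follows the original MeanFlow convention: noise sits at t = 1, data sits at t = 0, and the one-step update is $z_r = z_t - (t - r)\,u(z_t, r, t)$. The learned network is replaced by a hypothetical 1-D velocity field v(z, t) = z, whose exact interval-averaged velocity u has a closed form. A single jump with u reaches the ODE solution. An Euler solver driven by the instantaneous velocity v needs many steps to get close to it.

```python
import math


# Hypothetical toy field standing in for a learned policy: v(z, t) = z.
# Its ODE dz/dt = z has the closed-form solution z(r) = z(t) * exp(r - t).
def velocity(z, t):
    return z


def mean_velocity(z_t, r, t):
    """Exact average velocity u(z_t, r, t) = (z_t - z_r) / (t - r) for t > r."""
    s = t - r
    return z_t * (1.0 - math.exp(-s)) / s


def one_step_meanflow(z1):
    # 1-NFE: jump directly from noise at t = 1 to the sample at r = 0
    # using z_r = z_t - (t - r) * u(z_t, r, t).
    return z1 - (1.0 - 0.0) * mean_velocity(z1, 0.0, 1.0)


def euler_sampler(z1, steps):
    # Multi-step baseline: integrate the instantaneous velocity from t = 1 down to t = 0.
    z, dt = z1, 1.0 / steps
    for k in range(steps):
        t = 1.0 - k * dt
        z = z - dt * velocity(z, t)
    return z


z1 = 2.0                      # "noise" sample at t = 1
exact = z1 * math.exp(-1.0)   # true ODE solution at t = 0
print("one-step error:", abs(one_step_meanflow(z1) - exact))  # float round-off only
for n in (1, 10, 100):
    print(n, "Euler steps, error:", abs(euler_sampler(z1, n) - exact))  # shrinks as n grows
```

In MP1, u would be predicted by the network conditioned on the point cloud and robot state. The one-step result is therefore only as accurate as the learned average velocity.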

Abstract

Robot learning has become a prevailing approach to robot manipulation. However, generative models in this field face a fundamental trade-off between the slow, iterative sampling of diffusion models and the architectural constraints of faster flow-based methods, which often rely on explicit consistency losses. To address these limitations, we introduce MP1, which pairs 3D point-cloud inputs with the MeanFlow paradigm to generate action trajectories in a single network function evaluation (1-NFE). By directly learning the interval-averaged velocity via the "MeanFlow Identity", our policy avoids any additional consistency constraints. This formulation eliminates the numerical errors of ODE solvers during inference, yielding more precise trajectories. MP1 further incorporates Classifier-Free Guidance (CFG) for improved trajectory controllability while retaining 1-NFE inference and without reintroducing structural constraints. Because subtle scene-context variations are critical for robot learning, especially in few-shot settings, we introduce a lightweight Dispersive Loss that repels state embeddings during training, boosting generalization without slowing inference. We validate our method on the Adroit and Meta-World benchmarks, as well as in real-world scenarios. Experimental results show that MP1 achieves superior average task success rates, outperforming DP3 by 10.2% and FlowPolicy by 7.3%. Its average inference time is only 6.8 ms, 19x faster than DP3 and nearly 2x faster than FlowPolicy. Our project page is available at https://mp1-2254.github.io/, and the code can be accessed at https://github.com/LogSSim/MP1.
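The MeanFlow Identity follows from the definition of the average velocity, $u(z_t, r, t) = \frac{1}{t - r}\int_r^t v(z_\tau, \tau)\,d\tau$. Multiply both sides by $(t - r)$ and differentiate with respect to $t$. This gives $u(z_t, r, t) = v(z_t, t) - (t - r)\,\frac{d}{dt}u(z_t, r, t)$, where $\frac{d}{dt}u = v\,\partial_z u + \partial_t u$ is the total derivative along the trajectory. In the original MeanFlow formulation, the right-hand side serves as a regression target, typically computed with a Jacobian-vector product under stop-gradient.

The minimal sketch below checks the identity numerically. It reuses a hypothetical toy field v(z, t) = z, with a closed-form u standing in for the learned network. It also approximates the Jacobian-vector product with a central finite difference. It therefore illustrates the target and does not reproduce MP1's training code.

```python
import math


# Hypothetical toy field (not MP1's network): v(z, t) = z, whose trajectories
# satisfy z(r) = z(t) * exp(r - t).
def velocity(z, t):
    return z


def mean_velocity(z, r, t):
    """Stand-in for u_theta(z, r, t): exact average velocity over [r, t]."""
    s = t - r
    if s == 0.0:
        return velocity(z, t)  # limit r -> t: the average velocity equals v
    return z * (1.0 - math.exp(-s)) / s


def meanflow_target(z, r, t, eps=1e-5):
    """u_tgt = v - (t - r) * du/dt, the right-hand side of the MeanFlow Identity.

    du/dt is the total derivative along the trajectory, i.e. the directional
    derivative of u along the tangent (dz, dr, dt) = (v, 0, 1). It is
    approximated here with a central finite difference instead of a JVP.
    """
    v = velocity(z, t)
    du_dt = (mean_velocity(z + eps * v, r, t + eps)
             - mean_velocity(z - eps * v, r, t - eps)) / (2.0 * eps)
    return v - (t - r) * du_dt


z, r, t = 1.5, 0.2, 0.9
print(mean_velocity(z, r, t), meanflow_target(z, r, t))  # agree up to finite-difference error
```

In the MeanFlow formulation, v is the conditional velocity of the interpolation path; for the linear path $z_t = (1 - t)x + t\epsilon$, this is $\epsilon - x$. No consistency loss is needed, because a network whose output matches this target satisfies the identity.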

Paper Structure

This paper contains 24 sections, 8 equations, 6 figures, and 5 tables.

Figures (6)

  • Figure 1: MP1 outperforms SOTA methods (DP3 [dp3] and FlowPolicy [flowpolicy]) on the Adroit and Meta-World tasks, achieving both faster inference and higher success rates, as shown in the comparison plot.
  • Figure 2: Overview of MP1. MP1 takes the historical observation point cloud and the robot's state as inputs. These inputs are processed by a visual encoder and a state encoder, respectively, and then serve as conditional inputs to the UNet-integrated MeanFlow. The model then computes a regression loss ($\mathcal{L}_{cfg}$) between the mean velocity generated from the initial noise and the target velocity. This $\mathcal{L}_{cfg}$ is combined with a Dispersive Loss ($\mathcal{L}_{disp}$) imposed on the UNet’s hidden states to jointly optimize the network parameters.
  • Figure 3: Qualitative comparison of MP1 and the previous SOTA method (FlowPolicy [flowpolicy]) on the simulated Adroit Hammer task and a real-world hammer task. MP1 is faster, taking 7.1 ms in the simulated Hammer task and 18.6 s in the real-world scenario. Moreover, MP1 completes the real-world hammer task, whereas FlowPolicy fails.
  • Figure 4: Success rate curves of different methods on multiple Meta-World tasks. We compare the performance of MP1, FlowPolicy, and DP3 on four tasks. The x-axis represents training steps, and the y-axis shows the success rate. Shaded areas represent the standard deviation across different random seeds. The proposed method achieves higher success rates with smaller variance.
  • Figure 5: Effect of the number of demonstrations on each method. Success rates improve gradually as the number of demonstrations increases.
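Figure 2 describes the training objective as $\mathcal{L}_{cfg}$ combined with a Dispersive Loss on the UNet's hidden states. A minimal sketch of that combination follows, under several assumptions:

- It uses the InfoNCE-style L2 variant of the Dispersive Loss. This is the log of the mean of $\exp(-D(z_i, z_j)/\tau)$ over all pairs in a batch, with $D$ the squared L2 distance divided by the embedding dimension.
- It combines the two terms additively, $\mathcal{L} = \mathcal{L}_{cfg} + \lambda\,\mathcal{L}_{disp}$, with a hypothetical $\lambda = 0.5$.

MP1's exact variant, temperature, and weight may differ. The code computes loss values only; in training, gradients would be obtained through an autodiff framework.

```python
import math


def dispersive_loss(embeddings, tau=1.0):
    """InfoNCE-style (L2) dispersive loss over a batch of embeddings.

    Returns log(mean_{i,j} exp(-D(z_i, z_j) / tau)), averaged over all B x B
    pairs including i == j, where D is the squared L2 distance divided by the
    embedding dimension. Lower values mean the embeddings are more spread out.
    """
    b, dim = len(embeddings), len(embeddings[0])
    total = 0.0
    for z_i in embeddings:
        for z_j in embeddings:
            d = sum((a - c) ** 2 for a, c in zip(z_i, z_j)) / dim
            total += math.exp(-d / tau)
    return math.log(total / (b * b))


def total_loss(l_cfg, hidden_states, lam=0.5):
    # Assumed additive objective L = L_cfg + lam * L_disp; lam = 0.5 is hypothetical.
    return l_cfg + lam * dispersive_loss(hidden_states)


collapsed = [[0.1, 0.2], [0.1, 0.2], [0.1, 0.2]]  # identical hidden states
spread = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]     # dispersed hidden states
print(dispersive_loss(collapsed))  # 0.0, the largest possible value
print(dispersive_loss(spread))     # negative, so dispersion is rewarded
```

The loss has no positive pairs, so it only pushes embeddings apart. It reaches its maximum of 0 when every embedding in the batch is identical. It is a training-time regularizer, which is consistent with the claim that it does not slow inference.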