One-Step Flow Policy: Self-Distillation for Fast Visuomotor Policies

Shaolong Li, Lichao Sun, Yongchao Chen

Abstract

Generative flow and diffusion models provide the continuous, multimodal action distributions needed for high-precision robotic policies. However, their reliance on iterative sampling introduces severe inference latency, which lowers control frequency and harms performance in time-sensitive manipulation. To address this problem, we propose the One-Step Flow Policy (OFP), a from-scratch self-distillation framework for high-fidelity, single-step action generation that requires no pre-trained teacher. OFP unifies a self-consistency loss, which enforces coherent transport across time intervals, with a self-guided regularization, which sharpens predictions toward high-density expert modes. In addition, a warm-start mechanism leverages temporal action correlations to reduce the generative transport distance. Evaluations across 56 diverse simulated manipulation tasks show that one-step OFP achieves state-of-the-art results, outperforming 100-step diffusion and flow policies while accelerating action generation by over $100\times$. We further integrate OFP into the $\pi_{0.5}$ model on RoboTwin 2.0, where one-step OFP surpasses the original 10-step policy. These results establish OFP as a practical, scalable solution for accurate, low-latency robot control.
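
To make the one-step idea concrete, the sketch below uses a hypothetical 1-D linear ODE (not from the paper) whose exact flow map is known, so the interval-averaged velocity can be computed in closed form instead of being learned. It shows that a single jump with the interval-averaged velocity over $[0, 1]$ (NFE = 1) lands on the ODE solution, whereas 100-step Euler integration of the instantaneous field only approximates it, and it evaluates the nested-interval identity that a self-consistency loss could penalize. All names (`v`, `flow`, `u`, `consistency_gap`, the constant `A`) are illustrative assumptions; in OFP the interval velocity is a learned network, and this is not the authors' implementation.

```python
import math

# Hypothetical toy setting (not from the paper): a 1-D linear ODE
# dx/dt = v(x, t) = -A * x, whose exact flow map is known in closed form.
A = 1.5


def v(x: float, t: float) -> float:
    """Instantaneous velocity field of the toy ODE (time-invariant here)."""
    return -A * x


def flow(x: float, t: float, r: float) -> float:
    """Exact solution map: the state at time r when starting from x at time t."""
    return x * math.exp(-A * (r - t))


def u(x: float, t: float, r: float) -> float:
    """Interval-averaged velocity: displacement over [t, r] divided by its length."""
    return (flow(x, t, r) - x) / (r - t)


def euler_sample(x0: float, steps: int) -> float:
    """Multi-step Euler integration of the instantaneous field from t=0 to t=1."""
    x, h = x0, 1.0 / steps
    for k in range(steps):
        x = x + h * v(x, k * h)
    return x


def one_step_sample(x0: float) -> float:
    """A single jump over [0, 1] using the interval-averaged velocity (NFE = 1)."""
    return x0 + (1.0 - 0.0) * u(x0, 0.0, 1.0)


def consistency_gap(x: float, t: float, s: float, r: float) -> float:
    """Residual of the nested-interval identity
    (r - t) u(x, t, r) = (s - t) u(x, t, s) + (r - s) u(x_s, s, r),
    where x_s is reached from x by jumping over [t, s]."""
    x_s = x + (s - t) * u(x, t, s)
    lhs = (r - t) * u(x, t, r)
    rhs = (s - t) * u(x, t, s) + (r - s) * u(x_s, s, r)
    return lhs - rhs


if __name__ == "__main__":
    x0 = 2.0
    print("one-step (NFE=1):", one_step_sample(x0))
    print("Euler (NFE=100): ", euler_sample(x0, 100))
    print("exact solution:  ", flow(x0, 0.0, 1.0))
    print("consistency gap: ", consistency_gap(x0, 0.0, 0.3, 1.0))
```

Because `u` is exact here, the consistency gap vanishes up to floating-point error; with a learned interval velocity the gap is generally nonzero and can serve as a training signal.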

Paper Structure (54 sections, 1 theorem, 76 equations, 8 figures, 8 tables, 1 algorithm)

Key Result

Proposition 1

Under the assumption that the interval-velocity field is continuously differentiable and globally Lipschitz continuous in state, when the contracting factor $\rho(s) \to 0$ and the EMA teacher $\mathbf{u}_{\theta^-}$ is accurate, $\mathbf{u}_{\text{target}}$ becomes a consistent supervision signal for …
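
For orientation, the standard identity behind interval-velocity self-consistency can be written out as follows. This is a sketch under assumptions: the underlying flow ODE is written with an instantaneous velocity $\mathbf{v}$, the interval-averaged velocity $\mathbf{u}$ is defined as the time average of $\mathbf{v}$ along the trajectory, and the symbols $\mathbf{v}$, $t$, $s$, $r$ are our notation. The summary above does not give OFP's exact form of $\mathbf{u}_{\text{target}}$ or how $\rho(s)$ enters it.

```latex
% Flow ODE and interval-averaged velocity over [t, r] (notation assumed)
\frac{d\mathbf{x}_\tau}{d\tau} = \mathbf{v}(\mathbf{x}_\tau, \tau), \qquad
\mathbf{u}(\mathbf{x}_t, t, r) = \frac{1}{r - t}\int_t^r \mathbf{v}(\mathbf{x}_\tau, \tau)\, d\tau,
\qquad\text{so}\qquad
\mathbf{x}_r = \mathbf{x}_t + (r - t)\,\mathbf{u}(\mathbf{x}_t, t, r).

% Splitting the integral at an intermediate time s \in (t, r):
(r - t)\,\mathbf{u}(\mathbf{x}_t, t, r)
  = (s - t)\,\mathbf{u}(\mathbf{x}_t, t, s) + (r - s)\,\mathbf{u}(\mathbf{x}_s, s, r),
\qquad
\mathbf{x}_s = \mathbf{x}_t + (s - t)\,\mathbf{u}(\mathbf{x}_t, t, s).
```

A self-consistency loss can regress a prediction $\mathbf{u}_\theta(\mathbf{x}_t, t, r)$ onto the right-hand side divided by $(r - t)$, evaluated with a frozen EMA teacher $\mathbf{u}_{\theta^-}$; such a bootstrapped target is only as reliable as the teacher, which is why Proposition 1 assumes the teacher is accurate.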

Figures (8)

  • Figure 1: Averaged across 56 tasks. Evaluated at NFE=1, OFP outperforms all other single-step baselines and accelerates generation by over 100$\times$ compared to DP3 and FM Policy (NFE=100).
  • Figure 2: Self-Distillation of One-Step Flow Policies. (a) Self-Consistency Training: The model learns an interval-averaged velocity field by matching predictions across nested sub-intervals, enforcing temporal coherence along the marginal flow trajectory. (b) Self-Guided Training: By leveraging Classifier-Free Guidance on the model's own predictions, we extract a distribution-level correction signal. The regularization repels single-step predictions from the unconditional prior and sharpens the generated actions toward the high-density modes of the expert data.
  • Figure 3: Warm-Start Action Prior for One-Step Inference. The unexecuted suffix of the previously generated action chunk is shifted and padded with the terminal action to form a full-length temporal prior. By starting closer to the target data manifold rather than from pure Gaussian noise, this initialization reduces the required transport distance.
  • Figure 4: Data Scaling Behavior. OFP extracts higher utility from sparse data (20 demos) and continues to scale cleanly as data increases, avoiding the performance degradation seen in MP1 at 150 demos.
  • Figure 5: Transfer to $\pi_{0.5}$ on RoboTwin 2.0. All acceleration methods are evaluated at NFE=1, and the $\pi_{0.5}$ baseline operates at NFE=10. OFP achieves the best average success rate across four tasks, showing that OFP remains effective even for large-scale VLA models with richer multi-modal inputs.
  • ...and 3 more figures
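
The warm-start construction described in the Figure 3 caption (drop the executed prefix, shift the unexecuted suffix forward, pad with the terminal action) can be sketched as follows. This is a minimal illustration in plain Python, assuming an action chunk is a list of per-step action vectors and that `n_executed` actions of the previous chunk have already been sent to the robot; the function name and signature are hypothetical, and how OFP combines this prior with noise or picks the starting time of the single generation step is not specified in this summary.

```python
from typing import List, Sequence

Action = List[float]  # one action vector per control step (illustrative)


def warm_start_prior(prev_chunk: Sequence[Sequence[float]], n_executed: int) -> List[Action]:
    """Build a full-length temporal prior from the previous action chunk (hypothetical helper).

    The first `n_executed` actions were already executed, so they are dropped;
    the remaining (unexecuted) suffix is shifted to the front, and the chunk is
    padded back to its original length by repeating the terminal action.
    """
    horizon = len(prev_chunk)
    if horizon == 0:
        raise ValueError("previous chunk must contain at least one action")
    if not 0 <= n_executed <= horizon:
        raise ValueError("n_executed must lie in [0, len(prev_chunk)]")
    # Unexecuted suffix, shifted to the start of the new chunk (copied to avoid aliasing).
    suffix = [list(a) for a in prev_chunk[n_executed:]]
    # Terminal action of the previous chunk.
    terminal = list(prev_chunk[-1])
    # Pad with copies of the terminal action so the prior keeps the full horizon.
    padding = [list(terminal) for _ in range(n_executed)]
    return suffix + padding


if __name__ == "__main__":
    previous = [[0.0, 0.1], [0.2, 0.3], [0.4, 0.5], [0.6, 0.7]]
    prior = warm_start_prior(previous, n_executed=2)
    print(prior)  # [[0.4, 0.5], [0.6, 0.7], [0.6, 0.7], [0.6, 0.7]]
```

With a four-step chunk of which two steps were executed, the prior is the last two planned actions followed by two copies of the final action, so a one-step generator started from it begins from a chunk that already agrees with the previous plan rather than from pure Gaussian noise.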

Theorems & Definitions (1)

  • Proposition 1