One-Step Flow Policy: Self-Distillation for Fast Visuomotor Policies

Shaolong Li; Lichao Sun; Yongchao Chen

Abstract

Generative flow and diffusion models provide the continuous, multimodal action distributions needed for high-precision robotic policies. However, their reliance on iterative sampling introduces severe inference latency, degrading control frequency and harming performance in time-sensitive manipulation. To address this problem, we propose the One-Step Flow Policy (OFP), a from-scratch self-distillation framework for high-fidelity, single-step action generation without a pre-trained teacher. OFP unifies a self-consistency loss, which enforces coherent transport across time intervals, with a self-guided regularization, which sharpens predictions toward high-density expert modes. In addition, a warm-start mechanism leverages temporal action correlations to reduce the generative transport distance. Evaluations across 56 diverse simulated manipulation tasks demonstrate that one-step OFP achieves state-of-the-art results, outperforming 100-step diffusion and flow policies while accelerating action generation by over $100\times$. We further integrate OFP into the $\pi_{0.5}$ model on RoboTwin 2.0, where one-step OFP surpasses the original 10-step policy. These results establish OFP as a practical, scalable solution for highly accurate and low-latency robot control.

Paper Structure (54 sections, 1 theorem, 76 equations, 8 figures, 8 tables, 1 algorithm)

Introduction
Preliminaries
Flow Matching
Flow-Based Generative Policy
One-Step Flow Policy
Self-Consistency Training
Self-Guided Regularization
Warm-Start for One-Step Inference
Experiments
Experimental Setup
Experimental Results
Integration with VLA Models
Related Work
Generative Policies for Robot Control
Distillation and Acceleration
...and 39 more sections

Key Result

Proposition 1

Under the assumption that the interval-velocity field is continuously differentiable and globally Lipschitz continuous in state, when the contracting factor $\rho(s) \to 0$ and the EMA teacher $\mathbf u_{\theta^-}$ is accurate, $\mathbf{u}_{\text{target}}$ becomes a consistent supervision signal fo

Figures (8)

Figure 1: Averaged across 56 tasks. Evaluated with a single function evaluation (NFE=1), OFP outperforms all other single-step baselines and accelerates generation by over $100\times$ compared to DP3 and FM Policy (NFE=100).
Figure 2: Self-Distillation of One-Step Flow Policies. (a) Self-Consistency Training: The model learns an interval-averaged velocity field by matching predictions across nested sub-intervals, enforcing temporal coherence along the marginal flow trajectory. (b) Self-Guided Training: By leveraging Classifier-Free Guidance on the model's own predictions, we extract a distribution-level correction signal. The regularization repels single-step predictions from the unconditional prior and sharpens the generated actions toward the high-density modes of the expert data.
Figure 3: Warm-Start Action Prior for One-Step Inference. The unexecuted suffix of the previously generated action chunk is shifted and padded with the terminal action to form a full-length temporal prior. By starting closer to the target data manifold rather than from pure Gaussian noise, this initialization reduces the required transport distance.
Figure 4: Data Scaling Behavior. OFP extracts higher utility from sparse data (20 demos) and continues to scale cleanly as data increases, avoiding the performance degradation seen in MP1 at 150 demos.
Figure 5: Transfer to $\pi_{0.5}$ on RoboTwin 2.0. All acceleration methods are evaluated at NFE=1, and the $\pi_{0.5}$ baseline operates at NFE=10. OFP achieves the best average success rate across four tasks, showing that OFP remains effective even for large-scale VLA models with richer multi-modal inputs.
...and 3 more figures
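
The warm-start construction described in the Figure 3 caption (shift the unexecuted suffix of the previous action chunk, then pad with the terminal action) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `warm_start_prior`, the argument `n_executed`, and the list-of-lists chunk representation are assumptions made for illustration, and how the resulting prior is combined with noise at inference time is not specified in the text above.

```python
from typing import List, Sequence


def warm_start_prior(
    prev_chunk: Sequence[Sequence[float]], n_executed: int
) -> List[List[float]]:
    """Build a full-length action prior from the previous action chunk.

    prev_chunk: the previously generated chunk, one action vector per step.
    n_executed: how many leading actions of that chunk were already executed
                (hypothetical parameter name, chosen for this sketch).
    """
    horizon = len(prev_chunk)
    if not 0 <= n_executed < horizon:
        raise ValueError("n_executed must satisfy 0 <= n_executed < len(prev_chunk)")

    # Unexecuted suffix, shifted to the front of the new chunk.
    suffix = [list(action) for action in prev_chunk[n_executed:]]

    # Pad with copies of the terminal action to restore the full horizon.
    padding = [list(suffix[-1]) for _ in range(n_executed)]

    return suffix + padding


if __name__ == "__main__":
    # A toy chunk of 4 two-dimensional actions, of which the first was executed.
    previous = [[0.0, 0.1], [0.2, 0.3], [0.4, 0.5], [0.6, 0.7]]
    print(warm_start_prior(previous, n_executed=1))
```

The returned prior has the same length as the original chunk, so it can stand in for a pure Gaussian sample as the starting point of generation, which is the property the caption credits with reducing the required transport distance.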

Theorems & Definitions (1)

Proposition 1
