FASTER: Rethinking Real-Time Flow VLAs

Yuxiang Lu; Zhe Liu; Xianzhe Fan; Zhenya Yang; Jinghua Hou; Junyi Li; Kaixin Ding; Hengshuang Zhao

Abstract

Real-time execution is crucial for deploying Vision-Language-Action (VLA) models in the physical world. Existing asynchronous inference methods primarily optimize trajectory smoothness but neglect the latency in reacting to environmental changes. By rethinking the notion of reaction in action chunking policies, this paper presents a systematic analysis of the factors governing reaction time. We show that reaction time follows a uniform distribution determined jointly by the Time to First Action (TTFA) and the execution horizon. Moreover, we reveal that the standard practice of applying a constant timestep schedule in flow-based VLAs can be inefficient: it forces the system to complete all sampling steps before any movement can start, which becomes the bottleneck in reaction latency. To overcome this issue, we propose Fast Action Sampling for ImmediaTE Reaction (FASTER). By introducing a Horizon-Aware Schedule, FASTER adaptively prioritizes near-term actions during flow sampling, compressing the denoising of the immediate reaction into a single step (a tenfold reduction in, e.g., $\pi_{0.5}$ and X-VLA) while preserving the quality of the long-horizon trajectory. Coupled with a streaming client-server pipeline, FASTER substantially reduces the effective reaction latency on real robots, especially when deployed on consumer-grade GPUs. Real-world experiments, including a highly dynamic table tennis task, demonstrate that FASTER unlocks unprecedented real-time responsiveness for generalist policies, enabling rapid generation of accurate and smooth trajectories.
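To make the uniform-distribution claim concrete, the following is a minimal sketch under assumptions that are not taken from the paper: an environmental change is assumed to occur at a uniformly random moment within one inference-execution cycle of length `horizon_steps * control_dt_s`, and the robot is assumed to react one TTFA after the next observation is captured. The function names, parameters, and exact bounds are hypothetical and serve only to illustrate how TTFA and the execution horizon could jointly set the range of reaction times.

```python
import random


def sample_reaction_times(ttfa_s, horizon_steps, control_dt_s, n=100_000, seed=0):
    """Draw reaction times under an ASSUMED model (not the paper's derivation).

    Assumption: the wait until the next observation is captured is uniform
    over one cycle of length horizon_steps * control_dt_s, and the first
    reacting action arrives ttfa_s seconds after that observation.
    """
    rng = random.Random(seed)
    cycle_s = horizon_steps * control_dt_s
    return [ttfa_s + rng.uniform(0.0, cycle_s) for _ in range(n)]


def expected_reaction_time(ttfa_s, horizon_steps, control_dt_s):
    """Mean of Uniform(ttfa_s, ttfa_s + cycle_s) under the same assumption."""
    return ttfa_s + 0.5 * horizon_steps * control_dt_s


if __name__ == "__main__":
    # Hypothetical numbers chosen for illustration only.
    samples = sample_reaction_times(ttfa_s=0.1, horizon_steps=10, control_dt_s=0.02)
    print("best case :", min(samples))
    print("worst case:", max(samples))
    print("mean      :", sum(samples) / len(samples))
```

Under this assumed model, lowering TTFA shifts the whole distribution toward zero, while shortening the execution horizon narrows it, which matches the qualitative statement that both quantities govern reaction time.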

Paper Structure (29 sections, 10 equations, 13 figures, 11 tables, 2 algorithms)

Introduction
Related Work
Analysis on Action Chunking Policy Inference
Methodology
Preliminaries
Pilot Study on Action Chunk Sampling
FASTER
Experiments
Experimental Analysis on Reaction Speed
Real-world Experiments
Simulation Experiments
Conclusion
Details of Asynchronous Inference Pipeline
Additional Results in Pilot Study
Additional Methodological Details
...and 14 more sections

Figures (13)

Figure 1: We propose FASTER to alleviate the reaction latency bottleneck in action chunking flow policies. By compressing the sampling iterations of the immediate reaction into a single step, FASTER (bottom) achieves 10$\times$ acceleration compared to the original $\pi_{0.5}$ and X-VLA (top). This enables real-time responsiveness in highly dynamic tasks such as playing table tennis. FASTER is a plug-and-play solution for flow-based VLAs, demanding no architectural modifications or additional training.
Figure 2: Temporal pipelines of (a) synchronous and (b) asynchronous inference in a robotic system composed of an action chunking policy server and a robot client. As indicated by the best and worst cases, reaction time depends on both inference latency and the interval between consecutive inference-execution cycles. We also illustrate the decomposition of two adjacent action chunks to clarify the discretized inference delay $d$ and the execution horizon $s$ (bounded by $s_{\text{min}}$ and $H-d$) in the asynchronous client.
Figure 3: Visualizations of (a) straightness $S(\mathbf{A})$ of the denoising path during sampling of the action chunk, and (b) differences between the intermediate clean action estimates $\tilde{\mathbf{A}}_{t}^{\tau\rightarrow 0}$ at each sampling timestep $\tau$ and the final output $\mathbf{A}_{t}^{0}$.
Figure 4: Illustration of (a) constant timestep schedule used in conventional flow sampling and (b) Horizon-Aware Schedule (HAS) used in FASTER that allocates adaptive hit times across the action chunk and accelerates the sampling of early actions, enabling streaming output.
Figure 5: Comparison of real-world reaction speed on the table tennis task. Left: Visualization of rollouts on an RTX 4090; the third column corresponds to the contact moment, and the interval between consecutive images in a row is 166.7 ms. Right: Quantitative completion scores on two GPUs.
...and 8 more figures
