ProbeFlow: Training-Free Adaptive Flow Matching for Vision-Language-Action Models

Zhou Fang, Jiaqi Wang, Yi Zhou, Qiongfeng Shi

Abstract

Recent Vision-Language-Action (VLA) models equipped with Flow Matching (FM) action heads achieve state-of-the-art performance in complex robot manipulation. However, the multi-step iterative ODE solving required by FM introduces inference latency that precludes responsive physical control. While current acceleration efforts optimize the Vision-Language Model (VLM) backbone, the action head bottleneck remains overlooked. To address this, we propose ProbeFlow, a training-free adaptive inference framework tailored for continuous robotic control. By evaluating geometric trajectory complexity via the cosine similarity between initial and lookahead velocity vectors, ProbeFlow dynamically schedules integration steps to prune redundant network evaluations. On the MetaWorld benchmark, it accelerates action decoding by 14.8x (reducing average steps from N = 50 to 2.6) and cuts end-to-end system latency by 2.8x without compromising the manipulation success rate. On the long-horizon LIBERO benchmark, the probe automatically allocates a denser schedule to navigate semantic bottlenecks, effectively resolving the flow solver delay. Real-world physical deployments confirm that ProbeFlow successfully mitigates action decoding latency while ensuring execution stability, offering a highly practical solution for low-latency continuous generative policies.

Paper Structure (20 sections, 11 equations, 5 figures, 5 tables, 1 algorithm)

Figures (5)

  • Figure 1: Overview of the proposed ProbeFlow framework. Left (Flow Matching Principle): Illustrates how generative probability paths exhibit varying degrees of curvature from prior to target action distributions. Middle (Lookahead Linearity Probe & Dynamic Step Scheduler): The probe evaluates trajectory complexity via the cosine similarity $\mathcal{S}$ between initial and lookahead velocities. The scheduler then dynamically maps $\mathcal{S}$ to an adaptive step count $N$ bounded by real-time budget constraints. Right (Curvature Profile): In linear regions ($\mathcal{S} \approx 1$), it performs sparse integration by fully reusing probed states; in curved regions ($\mathcal{S} \ll 1$), it executes dense integration to bound truncation errors while still reusing the initial evaluation.
  • Figure 2: Geometric principle of the Lookahead Linearity Probe in ProbeFlow. Left (Linear Region): The linear probe step $\boldsymbol{x}_{\mathrm{probe}}$ stays aligned with the true trajectory. The local field $\boldsymbol{v}_{\mathrm{probe}}$ (blue) perfectly aligns with the initial velocity $\boldsymbol{v}_{\mathrm{start}}$ (red), yielding high similarity ($\mathcal{S} \approx 1$) and allowing aggressive step pruning. Right (Curved Region): The linear probe overshoots the true curved trajectory. The resulting angular deviation $\theta$ between the ghost initial velocity (dashed red) and the actual local field (blue) causes a sharp drop in cosine similarity ($\mathcal{S} = \cos \theta \ll 1$), correctly triggering a denser integration schedule to bound truncation errors.
  • Figure 3: Qualitative analysis of adaptive step scheduling. Top: In the complex Basketball task, ProbeFlow dynamically allocates denser steps (red) near critical interaction bottlenecks, such as object grasping and precise insertion. Bottom: In the Button Press task, it reliably assigns minimal steps (blue) during the linear transit phase.
  • Figure 4: Ablation study on the Lookahead Probe Horizon ($\Delta t_{\mathrm{probe}}$) on MetaWorld. The dual-axis plot illustrates the critical trade-off between manipulation success rate (solid blue) and computational efficiency (dashed orange).
  • Figure 5: Task progress of the real-world "Pick-and-Place" experiment. The sequence demonstrates ProbeFlow successfully executing continuous manipulation with a low-latency control loop.
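
The probe-and-schedule loop described in the Figure 1 and Figure 2 captions can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `velocity`, `schedule_steps`, and `probe_and_integrate`, the linear mapping from $\mathcal{S}$ to $N$, the bounds `n_min` and `n_max`, the default `dt_probe`, and the use of a plain Euler solver are all hypothetical choices. The sketch reuses only the initial velocity evaluation; the fuller reuse of probed states mentioned in the Figure 1 caption is not reproduced here.

```python
import numpy as np


def cosine_similarity(a, b, eps=1e-8):
    """Cosine of the angle between two velocity vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def schedule_steps(s, n_min=1, n_max=10):
    """Hypothetical scheduler: linear map from S in [0, 1] to a step count.

    S = 1 (straight path) gives n_min; S <= 0 (sharp turn) gives n_max.
    """
    s = min(max(s, 0.0), 1.0)
    return int(round(n_max - s * (n_max - n_min)))


def probe_and_integrate(velocity, x0, dt_probe=0.5, n_min=1, n_max=10):
    """Probe the field once, pick a step count, then Euler-integrate t: 0 -> 1.

    `velocity(x, t)` stands in for the flow-matching action head (assumed).
    Returns (final state, similarity S, step count N, network evaluations).
    """
    v_start = velocity(x0, 0.0)               # initial velocity
    x_probe = x0 + dt_probe * v_start         # linear lookahead step
    v_probe = velocity(x_probe, dt_probe)     # velocity at the probed state
    s = cosine_similarity(v_start, v_probe)   # linearity score S
    n = schedule_steps(s, n_min, n_max)       # adaptive step count N

    h = 1.0 / n
    x = x0 + h * v_start                      # first step reuses v_start
    evals = 2                                 # v_start and v_probe
    for k in range(1, n):
        x = x + h * velocity(x, k * h)
        evals += 1
    return x, s, n, evals
```

With a constant velocity field, the probe reports $\mathcal{S} \approx 1$ and the sketch takes a single step; with a rotating field, $\mathcal{S}$ drops and the step count grows. A linear mapping is used only because it is the simplest monotone choice; any decreasing function of $\mathcal{S}$ clipped to the step budget would fit the captions' description equally well.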