
Realtime-VLA V2: Learning to Run VLAs Fast, Smooth, and Accurate

Chen Yang, Yucheng Hu, Yunchao Ma, Yunhuan Yang, Jing Tan, Haoqiang Fan

Abstract

When deploying VLA models to real-world robotic tasks, execution speed matters. In previous work (arXiv:2510.26742) we analyzed how to make the neural computation of VLAs fast on the GPU. However, we left open the question of how to actually deploy the VLA system on real robots. In this report we describe a set of practical techniques that achieve the end-to-end result of running a VLA-driven robot at an impressive speed in real-world tasks that require both accuracy and dexterity. The technology stack spans calibration, planning and control, and a learning-based method for identifying the optimal execution speed. In the tasks we show, the robot executes at a speed on par with casual human operation, even approaching the hardware limit of our lightweight arm. The unaccelerated videos and inference traces are provided at https://dexmal.github.io/realtime-vla-v2/.

Figures (8)

  • Figure 1: Example tasks used in this paper. See https://dexmal.github.io/realtime-vla-v2/ for unaccelerated videos.
  • Figure 2: Time delay in the system and train-inference discrepancy.
  • Figure 3: Effect of pre-amplification of the robot command. We have to increase the magnitude of change in the robot command so that the actual position tracks the model's target (a minimal sketch follows this list).
  • Figure 4: Post-processing framework of VLA's trajectory.
  • Figure 5: Distribution of failure timestamps during autonomous rollouts at $1\times$, $2\times$, and $3\times$ execution speeds. At each speed, failures cluster around a distinct bottleneck stage, revealing a small set of phases that dominate the overall failure rate.
  • ...and 3 more figures
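
To make the idea behind Figure 3 concrete, below is a minimal sketch of command pre-amplification, assuming a position-controlled arm whose low-level controller systematically undershoots its setpoint within one control step. The function name `pre_amplify`, the per-step formulation, and the gain value are illustrative assumptions for this sketch, not the paper's implementation.

```python
import numpy as np

def pre_amplify(current_pos: np.ndarray,
                target_pos: np.ndarray,
                gain: float = 1.5) -> np.ndarray:
    """Amplify the commanded change in joint position.

    The low-level controller only realizes part of each commanded
    step, so we command a larger step (gain > 1) in the hope that
    the actual position lands on the model's target.
    """
    delta = target_pos - current_pos  # change requested by the VLA model
    return current_pos + gain * delta  # over-commanded setpoint

# Example: the model asks for a 0.10 rad step on joint 0; with
# gain = 1.5 we command a 0.15 rad step, expecting the controller
# to realize roughly the intended 0.10 rad.
current = np.array([0.00, 0.50])
target = np.array([0.10, 0.45])
print(pre_amplify(current, target))  # [0.15, 0.425]
```

In practice the gain would presumably be identified per joint during calibration, since how much the controller undershoots depends on the arm's dynamics and the command rate.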