DyQ-VLA: Temporal-Dynamic-Aware Quantization for Embodied Vision-Language-Action Models

Zihao Zheng; Hangyu Cao; Sicheng Tian; Jiayu Chen; Maoliang Li; Xinhao Sun; Hailong Zou; Zhaobo Zhang; Xuanzhe Liu; Donggang Cao; Hong Mei; Xiang Chen

TL;DR

DyQ-VLA is a dynamic quantization framework for VLAs that requires only 30.9% of the original memory footprint while maintaining 99.5% of the original performance, achieving a 1.49x speedup in simulation and up to a 1.43x speedup in real-world deployment.

Abstract

Vision-Language-Action (VLA) models are dominant in embodied intelligence but are constrained by inference overheads. While model quantization alleviates these bottlenecks for edge deployment, static quantization approaches remain suboptimal for VLAs due to two critical challenges: (1) Temporal-dynamic sensitivity, where fixed precision wastes resources by ignoring stage-varying error tolerances; and (2) Real-time allocation, where identifying real-time sensitivity to guide bit allocation remains unsolved. To address these challenges, we propose DyQ-VLA, a dynamic quantization framework for VLAs. Specifically, a sensitivity-aware switching strategy leverages real-time kinematic proxies to trigger the bit-width switch, while a kinematic-guided module dynamically allocates the optimal bit-width. Experiments show that DyQ-VLA requires only 30.9% of the original memory footprint while maintaining 99.5% of its original performance, achieving 1.49x simulation and up to 1.43x real-world speedups.
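As a rough illustration of the idea of using a kinematic proxy to trigger bit-width switching, the following minimal sketch estimates jerk from recent joint-angle samples and maps it to an activation bit-width. It is not the DyQ-VLA implementation: the function names `jerk_magnitude` and `select_bit_width`, the threshold values, the candidate bit-widths, and the assumption that a larger proxy value means higher quantization sensitivity are all hypothetical choices made for this example.

```python
from typing import Sequence


def jerk_magnitude(angles: Sequence[float], dt: float) -> float:
    """Estimate |jerk| from the last four samples with a third finite difference.

    Hypothetical proxy for illustration; returns 0.0 until four samples exist.
    """
    if len(angles) < 4:
        return 0.0
    a0, a1, a2, a3 = angles[-4:]
    # Third-order backward difference approximates the third time derivative.
    return abs(a3 - 3 * a2 + 3 * a1 - a0) / dt ** 3


def select_bit_width(
    proxy: float,
    thresholds: Sequence[float] = (1.0, 10.0),  # assumed values, not from the paper
    bit_widths: Sequence[int] = (4, 8, 16),     # assumed candidate precisions
) -> int:
    """Map a sensitivity proxy to a bit-width.

    Assumption: a larger proxy means a more error-sensitive stage,
    so it is given more bits.
    """
    for threshold, bits in zip(thresholds, bit_widths):
        if proxy < threshold:
            return bits
    return bit_widths[-1]


if __name__ == "__main__":
    smooth = [0.0, 1.0, 2.0, 3.0]   # constant velocity, zero jerk
    abrupt = [0.0, 0.0, 0.0, 1.0]   # sudden change in the last step
    for name, trajectory in (("smooth", smooth), ("abrupt", abrupt)):
        proxy = jerk_magnitude(trajectory, dt=0.5)
        print(name, proxy, select_bit_width(proxy))
```

In this sketch a smooth, constant-velocity motion falls into the lowest-precision bucket, while an abrupt change raises the proxy and selects a higher bit-width. A practical system would likely also need smoothing or hysteresis to avoid switching precision on every step.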

Paper Structure (37 sections, 6 equations, 7 figures, 4 tables, 1 algorithm)

Introduction
Preliminary
VLA Models
Model Architecture
Autoregressive Inference
Model Quantization
Quantization for VLA Models
Challenge for VLA Model Quantization
Observation and Motivation
Temporal Dynamics of VLA's Quantization Sensitivity
Correlation Between Kinematic Metrics and Sensitivity
DyQ-VLA Framework
Sensitivity-Aware Precision Switching Strategy
Static W-Quant and Dynamic A-Quant Paradigm
Kinematic-Driven Sensitivity Fusion
...and 22 more sections

Figures (7)

Figure 1: (a) Challenges for VLA Model Quantization. (b) Overview of the proposed DyQ-VLA Framework
Figure 2: (a) Non-linear relationship between local action error and success rate. (b) Temporal-dynamic profiling of the sensitivity metric
Figure 3: (a) Smooth macro-alignment of Motion Fineness with sensitivity. (b) High-variance spike alignment of Angular Jerk with sensitivity.
Figure 4: Overview of the DyQ-VLA framework. (a) The sensitivity-aware precision switching strategy. (b) The kinematic-guided bit allocation module.
Figure 5: System Implementation of DyQ-VLA Framework
...and 2 more figures
