Quantization-Aware Imitation-Learning for Resource-Efficient Robotic Control
Seongmin Park, Hyungmin Kim, Wonseok Jeon, Juyoung Yang, Byeongwook Jeon, Yoonseon Oh, Jungwook Choi
TL;DR
This paper tackles the deployment of imitation-learning (IL) policies on resource-limited hardware by proposing Quantization-Aware Imitation Learning (QAIL) and Quantization-Robust Behavior Cloning (QBC). QAIL trains quantized policies, and QBC aligns them with their full-precision (FP) counterparts, which mitigates the quantization errors that accumulate over sequential decisions. Across robot manipulation, autonomous driving, and physics simulation, 4-bit weight quantization and 4-bit weight-and-activation quantization yield substantial speedups and energy savings while keeping performance close to FP; additional 8-bit results show further gains on CPU. The methods make it practical to deploy IL-based policies on edge devices, offering a scalable path to efficient, robust, on-device control for robots and autonomous systems.
Abstract
Deep neural network (DNN)-based policy models, such as vision-language-action (VLA) models, are transformative in automating complex decision-making across applications by interpreting multi-modal data. However, scaling these models greatly increases computational cost, which is a challenge in fields such as robot manipulation and autonomous driving that require quick, accurate responses. To address the need for deployment on resource-limited hardware, we propose a new quantization framework for imitation learning (IL)-based policy models that fine-tunes parameters during training to make them robust to low-bit precision errors, thereby maintaining efficiency and reliability under constrained conditions. Our evaluations of representative robot manipulation policies with 4-bit weight quantization on a real edge GPU demonstrate that our framework achieves up to 2.5x speedup and 2.5x energy savings while preserving accuracy. For self-driving models with 4-bit weight and activation quantization, the framework achieves up to 3.7x speedup and 3.1x energy savings on a low-end GPU. These results highlight the practical potential of deploying IL-based policy models on resource-constrained devices.
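To make the idea of low-bit weight quantization and alignment with a full-precision policy concrete, the following is a minimal illustrative sketch, not the paper's implementation. It assumes symmetric per-tensor uniform quantization, a toy linear policy, and a mean-squared-error alignment term; the names `fake_quantize`, `policy`, and `alignment_loss` are hypothetical and chosen only for this example.

```python
import numpy as np

def fake_quantize(w, bits=4):
    """Quantize then dequantize weights (symmetric, per-tensor; an assumption)."""
    qmax = 2 ** (bits - 1) - 1           # 7 for signed 4-bit
    max_abs = np.max(np.abs(w))
    if max_abs == 0:
        return w.copy()                  # nothing to quantize
    scale = max_abs / qmax               # step size between quantization levels
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                     # back to float, now on a coarse grid

def policy(w, obs):
    """Toy linear policy (hypothetical): one action vector per observation."""
    return obs @ w

def alignment_loss(w, obs, bits=4):
    """MSE between actions of the quantized and the full-precision policy."""
    a_fp = policy(w, obs)
    a_q = policy(fake_quantize(w, bits), obs)
    return float(np.mean((a_q - a_fp) ** 2))

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 2))              # full-precision weights
obs = rng.normal(size=(16, 8))           # a batch of observations
loss4 = alignment_loss(w, obs, bits=4)
loss8 = alignment_loss(w, obs, bits=8)
print(f"4-bit action MSE: {loss4:.6f}, 8-bit action MSE: {loss8:.6f}")
```

In a sequential task, each small action error of this kind shifts the next observation, which is why such errors can accumulate over a rollout. In actual quantization-aware training the rounding step is not differentiable, so its gradient is typically approximated, for example with a straight-through estimator; that part is omitted here.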
