Quantization-Aware Imitation-Learning for Resource-Efficient Robotic Control

Seongmin Park, Hyungmin Kim, Wonseok Jeon, Juyoung Yang, Byeongwook Jeon, Yoonseon Oh, Jungwook Choi

TL;DR

This paper tackles the deployment of imitation-learning (IL) policies on resource-limited hardware by proposing Quantization-Aware Imitation Learning (QAIL) and Quantization-Robust Behavior Cloning (QBC). Quantized policies are trained with QAIL and aligned to their full-precision (FP) counterparts via QBC, which mitigates the quantization errors that accumulate over sequential decisions. Across robot manipulation, autonomous driving, and physics simulation, 4-bit weight quantization and 4-bit weight-and-activation quantization yield substantial speedups and energy savings while keeping performance close to FP; additional 8-bit results show further gains on CPU. The methods make it practical to deploy IL-based policies on edge devices, offering a scalable path to efficient, robust, on-device robotic control and autonomous systems.
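As a rough illustration of the idea of training a quantized policy against expert demonstrations while keeping it close to the full-precision policy, the sketch below combines an imitation term with an alignment term. This summary does not give the paper's actual loss functions, so the use of mean squared error, the function names, and the `align_weight` parameter are all assumptions made for illustration only.

```python
# Hypothetical sketch: the loss form, names, and weighting are assumptions,
# not the paper's definitions of QAIL or QBC.

def mse(a, b):
    """Mean squared error between two equal-length action vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def combined_loss(quant_action, fp_action, expert_action, align_weight=1.0):
    """Imitation term (quantized vs. expert) plus alignment term (quantized vs. FP)."""
    imitation = mse(quant_action, expert_action)
    alignment = mse(quant_action, fp_action)
    return imitation + align_weight * alignment


# Toy 2-D actions for a single timestep.
expert_action = [1.0, 0.0]   # demonstrated action
fp_action = [0.9, 0.1]       # full-precision policy output
quant_action = [0.5, 0.5]    # quantized policy output, drifted by quantization error

loss = combined_loss(quant_action, fp_action, expert_action)
aligned_loss = combined_loss(fp_action, fp_action, expert_action)
```

In this toy case the loss falls when the quantized policy reproduces the full-precision action, which is the behavior an alignment term is meant to encourage.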

Abstract

Deep neural network (DNN)-based policy models, such as vision-language-action (VLA) models, are transformative in automating complex decision-making across applications by interpreting multi-modal data. However, scaling these models greatly increases computational costs, which presents challenges in fields like robot manipulation and autonomous driving that require quick, accurate responses. To address the need for deployment on resource-limited hardware, we propose a new quantization framework for imitation-learning (IL)-based policy models that fine-tunes parameters during training to enhance robustness against low-bit precision errors, thereby maintaining efficiency and reliability under constrained conditions. Our evaluations of representative robot manipulation models with 4-bit weight quantization on a real edge GPU demonstrate that our framework achieves up to 2.5x speedup and 2.5x energy savings while preserving accuracy. For self-driving models with 4-bit weight and activation quantization, the framework achieves up to 3.7x speedup and 3.1x energy savings on a low-end GPU. These results highlight the practical potential of deploying IL-based policy models on resource-constrained devices.
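To make "low-bit precision errors" concrete, here is a minimal sketch of uniform symmetric 4-bit weight quantization, a generic textbook scheme. It is not necessarily the quantizer used in the paper; the per-tensor scale and the example weights are assumptions chosen for illustration.

```python
# Generic symmetric uniform quantization; an illustrative assumption,
# not the specific quantizer described in the paper.

def quantize_symmetric(weights, num_bits=4):
    """Map floats to signed integer codes using one shared scale."""
    qmax = 2 ** (num_bits - 1) - 1       # 7 for 4-bit
    qmin = -(2 ** (num_bits - 1))        # -8 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]


weights = [0.70, -0.32, 0.11, 0.02, -0.66]   # hypothetical weight values
codes, scale = quantize_symmetric(weights)
recovered = dequantize(codes, scale)

# Per-weight rounding error; bounded by half a quantization step.
errors = [abs(w - r) for w, r in zip(weights, recovered)]
```

With only 16 representable levels, each weight can be off by up to half a step (`scale / 2`). In a policy that acts repeatedly, these small per-step deviations can compound over a trajectory, which is the failure mode that quantization-aware fine-tuning aims to reduce.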

Paper Structure

This paper contains 31 sections, 10 equations, 12 figures, 14 tables, 1 algorithm.

Figures (12)

  • Figure 1: (a) Differences in robot action between INT4 quantization and Bfloat16 in OpenVLA on LIBERO. The Bfloat16 model successfully places the mug inside the microwave and closes the door, whereas the INT4-quantized model fails to act precisely enough to place the mug inside the microwave, so the task fails. (b) Driving differences of the quantized agent CILRS (W4A4) at intersections, depending on benchmark difficulty. (Left) In the relatively easier NoCrash-busy benchmark, the agent drives through intersections without collisions, but (Right) in the more challenging NoCrash-dense benchmark, with many pedestrians and vehicles, it collides with other vehicles. Note that $t$ represents the timestep.
  • Figure 2: Comparison of the structure and number of parameters of DNN-based policy models.
  • Figure 3: Overview of QAIL+QBC.
  • Figure 4: Action accuracy comparison.
  • Figure 5: Comparison of attention visualization for tasks successfully completed on the LIBERO-Spatial benchmark. Additional examples are provided in the appendix.
  • ...and 7 more figures