Modular Quantization-Aware Training for 6D Object Pose Estimation

Saqib Javed, Chengkun Li, Andrew Price, Yinlin Hu, Mathieu Salzmann

TL;DR

This paper introduces Modular Quantization-Aware Training (MQAT), an adaptive, mixed-precision quantization-aware training strategy that exploits the modular structure of modern 6D pose estimation architectures. The resulting quantized models outperform those produced by state-of-the-art uniform and mixed-precision quantization techniques.

Abstract

Edge applications, such as collaborative robotics and spacecraft rendezvous, demand efficient 6D object pose estimation on resource-constrained embedded platforms. Existing 6D pose estimation networks are often too large for such deployments, necessitating compression while maintaining reliable performance. To address this challenge, we introduce Modular Quantization-Aware Training (MQAT), an adaptive and mixed-precision quantization-aware training strategy that exploits the modular structure of modern 6D pose estimation architectures. MQAT guides a systematic gradated modular quantization sequence and determines module-specific bit precisions, leading to quantized models that outperform those produced by state-of-the-art uniform and mixed-precision quantization techniques. Our experiments showcase the generality of MQAT across datasets, architectures, and quantization algorithms. Remarkably, MQAT-trained quantized models achieve a significant accuracy boost (>7%) over the baseline full-precision network while reducing model size by a factor of 4 or more. Our project website is at: https://saqibjaved1.github.io/MQAT_/
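
To make the size claim concrete, the following is a minimal sketch of how module-specific bit precisions translate into a compression ratio, together with a simple symmetric uniform quantizer. It is not the authors' implementation and does not reproduce MQAT's module ordering or training procedure. The module names follow the backbone, feature aggregation, and heads split used in the paper, but the parameter counts, the bit-widths, and the function names `compression_ratio` and `fake_quantize` are assumptions chosen for illustration only.

```python
# Hypothetical parameter counts per module (illustrative, not from the paper).
PARAMS = {
    "backbone": 40_000_000,
    "feature_aggregation": 5_000_000,
    "heads": 3_000_000,
}


def model_size_bits(params, bits):
    """Total weight storage in bits for a per-module bit-width assignment."""
    return sum(params[name] * bits[name] for name in params)


def compression_ratio(params, bits, full_precision_bits=32):
    """Size of the full-precision model divided by the quantized size."""
    full = sum(params.values()) * full_precision_bits
    return full / model_size_bits(params, bits)


def fake_quantize(values, num_bits):
    """Symmetric uniform quantization followed by dequantization.

    Each value is mapped to the nearest of the integer levels
    -qmax..qmax (scaled), as is commonly done to simulate
    low-precision weights during quantization-aware training.
    """
    if num_bits < 2:
        raise ValueError("this sketch assumes at least 2 bits")
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)
    scale = max_abs / qmax
    return [round(v / scale) * scale for v in values]


# Uniform 8-bit quantization of 32-bit weights gives exactly 4x.
uniform_8 = {name: 8 for name in PARAMS}
print(compression_ratio(PARAMS, uniform_8))

# A hypothetical mixed assignment with a more aggressive middle module.
mixed = {"backbone": 8, "feature_aggregation": 2, "heads": 8}
print(compression_ratio(PARAMS, mixed))
```

Under these assumed numbers, lowering the precision of a single module pushes the ratio above the 4x obtained with uniform 8-bit weights, which is the kind of trade-off a module-specific bit-width assignment can exploit.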

Paper Structure (37 sections, 5 equations, 8 figures, 10 tables, 1 algorithm)


Figures (8)

  • Figure 1: Summary of this Work. In contrast to uniform and mixed-precision quantization, MQAT accounts for the modularity of typical 6D object pose estimation frameworks. Uniform QAT quantizes an entire network simultaneously and uniformly; mixed-precision QAT quantizes layers to varying bit precisions regardless of their position; MQAT instead quantizes network modules in a proposed order, with each module assigned its optimal bit precision. MQAT not only reduces the memory footprint of the network but can also yield an accuracy boost that neither uniform nor mixed-precision quantization has demonstrated. The results shown in the figure compare different quantization methods applied to the WDR network, evaluated on the SwissCube dataset.
  • Figure 2: Performance Comparison on Occluded-LINEMOD. The marker size is proportional to the memory footprint. Individual Models refers to methods training one model for each object. One Model refers to methods training a single model for all objects.
  • Figure 3: Representative 6D Object Pose Estimation Network with $K=3$ modules. From left to right, we denote them as backbone, feature aggregation, and heads.
  • Figure 4: A heuristic search over quantization flow sequences, demonstrating quantization flow optimality for the $K=3$ module WDR network. This corresponds to lines 1-7 of Algorithm 1 (MQAT).
  • Figure 5: Ablation study on the FPN bit-width. We compare the performance by varying the bit-width of the feature aggregation module in each model.
  • ...and 3 more figures