SynQ: Accurate Zero-shot Quantization by Synthesis-aware Fine-tuning

Minjun Kim; Jongjin Kim; U Kang

Abstract

How can we accurately quantize a pre-trained model without any data? Quantization algorithms are widely used for deploying neural networks on resource-constrained edge devices. Zero-shot Quantization (ZSQ) addresses the crucial and practical scenario where training data are inaccessible for privacy or security reasons. However, three significant challenges hinder the performance of existing ZSQ methods: 1) noise in the synthetic dataset, 2) predictions based on off-target patterns, and 3) misguidance by erroneous hard labels. In this paper, we propose SynQ (Synthesis-aware Fine-tuning for Zero-shot Quantization), a carefully designed ZSQ framework that overcomes the limitations of existing methods. SynQ minimizes the noise in the generated samples by applying a low-pass filter. Then, SynQ trains the quantized model to improve accuracy by aligning its class activation map with that of the pre-trained model. Furthermore, SynQ mitigates misguidance from the pre-trained model's errors by leveraging only soft labels for difficult samples. Extensive experiments show that SynQ achieves state-of-the-art accuracy over existing ZSQ methods.
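The third idea in the abstract, using only soft labels for difficult samples, can be illustrated with a small sketch. This is not the authors' implementation. It assumes that the difficulty of a sample is one minus the pre-trained (teacher) model's probability for the sample's assigned label, that the soft-label term is a KL divergence between teacher and quantized (student) predictions, and that the threshold is 0.5. The function names `difficulty` and `sample_loss` and the parameter `threshold` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def difficulty(teacher_probs, label):
    """Assumed difficulty measure: 1 - teacher probability of the assigned label."""
    return 1.0 - teacher_probs[label]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions (soft-label term)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(q, label, eps=1e-12):
    """Cross-entropy of prediction q against a hard label."""
    return -math.log(q[label] + eps)


def sample_loss(teacher_logits, student_logits, label, threshold=0.5):
    """Per-sample loss: soft labels always, hard label only for easy samples.

    The threshold of 0.5 is an assumption for illustration.
    """
    p = softmax(teacher_logits)   # soft label from the pre-trained model
    q = softmax(student_logits)   # prediction of the quantized model
    loss = kl_divergence(p, q)    # soft-label term, used for every sample
    if difficulty(p, label) <= threshold:
        # Easy sample: the hard label is likely reliable, so add it.
        loss += cross_entropy(q, label)
    return loss


# An easy sample (confident teacher) keeps the hard-label term,
# while a difficult sample is trained with the soft label only.
easy = sample_loss([4.0, 0.0, 0.0], [4.0, 0.0, 0.0], label=0)
hard = sample_loss([0.2, 0.0, 0.1], [0.2, 0.0, 0.1], label=0)
```

In this sketch, a student that matches the teacher exactly incurs zero loss on the difficult sample, because a possibly erroneous hard label never enters the objective for that sample.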

Paper Structure (36 sections, 1 theorem, 8 equations, 13 figures, 10 tables, 1 algorithm)

Introduction
Preliminaries and Problem Definition
Zero-shot Quantization
Difficulty of an Image
Problem Definition
Observation
Proposed Method
Overview
Low-pass Filter
Alignment of Class Activation Map
Soft Labels for Difficult Samples
Objective Function
Experiments
Experimental Setup
Accuracy in CNN Quantization (Q1)
...and 21 more sections

Key Result

Theorem 1

Given a model with an inference complexity of $O(T_{\theta})$, the time complexity of the quantization procedure of SynQ (Algorithm 1) is $O(NLT_{\theta})$.

Figures (13)

Figure 1: Comparison between (a) real images in the ImageNet dataset and (b) generated samples in the synthetic dataset from TexQ. Each set displays samples labeled as timber wolf, tobacco shop, aircraft carrier, and beaker. We present the average magnitude spectrum for a randomly selected batch of 256 images from each dataset, highlighting their distinct differences.
Figure 2: Grad-CAM plots of (a) the input image, produced by (b) the pre-trained ResNet-18 model on the ImageNet dataset, (c) the 3-bit quantized model by TexQ, and (d) the 3-bit quantized model by SynQ. While TexQ fails to capture the correct image region, SynQ captures a region closely matching that of the pre-trained model.
Figure 3: Error rates of pre-trained ResNet-20 on CIFAR-10 (yellow) and CIFAR-100 (green), and ResNet-18 on ImageNet (purple) by difficulty. The error rate grows rapidly once the difficulty exceeds 0.5.
Figure 4: Overall architecture of SynQ. Our main ideas are 1) low-pass filter, 2) alignment of class activation map, and 3) soft labels for difficult samples. See the Proposed Method section for details.
Figure 5: Comparison of amplitude distribution among (a) ImageNet dataset, (b) synthetic dataset by TexQ, and (c) filtered samples. After filtering, the distribution closely aligns with that of real images.
...and 8 more figures

Theorems & Definitions (3)

Theorem 1: Time Complexity of SynQ
proof
proof
