Data-Free Quantization via Mixed-Precision Compensation without Fine-Tuning

Jun Chen; Shipeng Bai; Tianxin Huang; Mengmeng Wang; Guanzhong Tian; Yong Liu

TL;DR

The paper tackles data-free quantization at ultra-low precision by proposing Data-Free Mixed-Precision Compensation (DF-MPC), which compensates for the error of a low-bitwidth layer by reconstructing the subsequent high-bitwidth layer, without using original or synthetic data. It formalizes a layer-wise mixed-precision framework and derives a closed-form solution that minimizes a feature-map reconstruction loss between the full-precision and mixed-precision models. Empirical results on CIFAR and ImageNet show that DF-MPC can outperform prior data-free methods across multiple architectures in both accuracy and efficiency, while yielding a flatter loss landscape and a weight distribution whose mean is closer to zero after compensation. The work highlights a practical, data-free path to deploying ultra-low-precision networks, albeit with some performance gaps relative to synthetic-data methods, and suggests feature-map estimation as a direction for future improvement.

Abstract

Neural network quantization is a very promising solution in the field of model compression, but the resulting accuracy depends heavily on a training/fine-tuning process and requires the original data. This not only incurs heavy computation and time costs but also hinders the protection of private and sensitive information. Therefore, a few recent works have started to focus on data-free quantization. However, data-free quantization does not perform well at ultra-low precision. Although researchers partially address this problem with generative methods that synthesize data, data synthesis itself requires substantial computation and time. In this paper, we propose a data-free mixed-precision compensation (DF-MPC) method to recover the performance of an ultra-low precision quantized model without any data or fine-tuning process. By assuming that the quantization error caused by a low-precision quantized layer can be restored via the reconstruction of a high-precision quantized layer, we mathematically formulate the reconstruction loss between the pre-trained full-precision model and its layer-wise mixed-precision quantized model. Based on this formulation, we theoretically derive the closed-form solution by minimizing the reconstruction loss of the feature maps. Since DF-MPC does not require any original/synthetic data, it is a more efficient method to approximate the full-precision model. Experimentally, DF-MPC achieves higher accuracy for an ultra-low precision quantized model than recent methods, without any data or fine-tuning process.
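The idea of a closed-form compensation can be illustrated with a deliberately simplified sketch. It is not the paper's solution: the paper minimizes a feature-map reconstruction loss, whereas this example assumes a symmetric uniform quantizer and fits one scalar coefficient per channel by ordinary least squares on the weights alone. The names `quantize`, `compensation_coefficient`, and `sq_error`, and the example weights, are hypothetical.

```python
def quantize(weights, bits):
    """Symmetric uniform quantizer (an assumption; the paper's quantizer may differ)."""
    levels = 2 ** (bits - 1) - 1  # 2 bits -> 1 level, i.e. values in {-s, 0, +s}
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    scale = max_abs / levels
    return [round(w / scale) * scale for w in weights]


def compensation_coefficient(full, quant):
    """Closed-form least-squares c that minimizes sum((full - c * quant)^2)."""
    denom = sum(q * q for q in quant)
    if denom == 0.0:
        return 1.0  # nothing to rescale if the channel quantized to all zeros
    return sum(f * q for f, q in zip(full, quant)) / denom


def sq_error(a, b):
    """Squared L2 distance between two weight vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# One hypothetical output channel of a layer quantized to 2 bits.
w = [0.9, -0.4, 0.35, -0.8, 0.1]
q = quantize(w, bits=2)
c = compensation_coefficient(w, q)
compensated = [c * x for x in q]

print("error before compensation:", sq_error(w, q))
print("error after compensation: ", sq_error(w, compensated))
```

Because `c = 1` is always a candidate, the least-squares coefficient can never increase the weight error. In DF-MPC the coefficient is not applied to the low-bitwidth layer itself; it is absorbed into the following high-bitwidth layer, so the low-bitwidth weights stay on their quantization grid.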

Paper Structure (15 sections, 11 equations, 5 figures, 4 tables)

Introduction
Related Work
Quantization-Aware Training (QAT)
Post-Training Quantization (PTQ)
Data-Free Quantization (DFQ)
Problem Formulation of Data-Free Quantization
Background and Notations
Problem Statement
Proposed Method of Mixed-Precision Compensation
Compensation Assumption
Experiments
Ablation Study on CIFAR
Experiments on ImageNet
Visualization
Conclusion

Figures (5)

Figure 1: The overview of our DF-MPC method, where the filter in the $l$-th layer is quantized to low-bitwidth and the filter in the $(l+1)$-th layer is quantized to high-bitwidth. The output of $(l+1)$-th convolutional layer can be restored by multiplying the compensation coefficient with respect to the input channel of the high-bitwidth filter, which is equivalent to multiplying the compensation coefficient with respect to the output channel of the low-bitwidth filter. Note that the reconstruction loss is the output difference of $(l+1)$-th layer from the pre-trained full-precision model and its layer-wise mixed-precision quantized model.
Figure 2: The layer-wise mixed-precision structures of some main deep neural networks. (a): a building block for ResNet18/ResNet34. (b): a bottleneck block for ResNet50/ResNet101. (c): a dense block for DenseNet. (d): a building block for deep neural networks.
Figure 3: The accuracy comparison of different $\lambda_1$ and $\lambda_2$ values in Eq. (\ref{solution}). On CIFAR10 with ResNet56, $\lambda_1$ and $\lambda_2$ vary from 0.1 to 0.6 and from 0 to 0.01, respectively.
Figure 4: The 6-bit quantized weight distribution before and after compensation on CIFAR10 dataset. The mean of the compensated weight distribution is closer to zero.
Figure 5: The loss surfaces of the mixed-precision ResNet56 before and after compensation on CIFAR10 dataset, which reflects the sharpness/flatness of different quantized weights.
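
The equivalence stated in the Figure 1 caption (scaling the input channels of the high-bitwidth filter versus scaling the output channels of the low-bitwidth filter) can be checked numerically. The sketch below is a toy under stated assumptions: both layers are treated as plain matrices (as for 1x1 convolutions), batch normalization is omitted, the activation is ReLU, and the per-channel coefficients are positive so that they commute with ReLU. All weights, coefficients, and helper names are hypothetical.

```python
def matvec(matrix, vec):
    """Multiply a matrix (list of rows, one row per output channel) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def relu(vec):
    return [max(0.0, v) for v in vec]


# Layer l (low bitwidth): 2 inputs -> 3 channels.
w_low = [[1.0, -1.0], [0.5, 2.0], [-1.0, 0.5]]
# Layer l+1 (high bitwidth): 3 channels -> 2 outputs.
w_high = [[0.2, -0.3, 0.5], [1.0, 0.4, -0.6]]
# One positive compensation coefficient per intermediate channel (made-up values).
coeff = [0.9, 1.1, 1.25]
x = [0.7, -0.2]

# Option A: scale each output channel (row) of the low-bitwidth layer.
scaled_low = [[c * w for w in row] for c, row in zip(coeff, w_low)]
out_a = matvec(w_high, relu(matvec(scaled_low, x)))

# Option B: scale each input channel (column) of the high-bitwidth layer.
scaled_high = [[c * w for c, w in zip(coeff, row)] for row in w_high]
out_b = matvec(scaled_high, relu(matvec(w_low, x)))

print(out_a)
print(out_b)
```

Both options give the same output up to floating-point rounding, because `relu(c * z) == c * relu(z)` for any `c > 0`. Option B is the one that keeps the low-bitwidth weights untouched, which is why the compensation can be folded into the higher-precision layer.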
