Sharpness-Aware Data Generation for Zero-shot Quantization

Dung Hoang-Anh, Cuong Pham, Trung Le, Jianfei Cai, Thanh-Toan Do

TL;DR

This work tackles zero-shot quantization by generating synthetic data that actively reduces the sharpness of the resulting quantized model. It introduces Sharpness-Aware Data Generation (SADAG), which links sharpness minimization to gradient matching of reconstruction losses and, inspired by Sharpness-Aware Minimization (SAM), replaces the real validation data that this matching would otherwise require with gradient matching between each generated sample and its neighbors. The final objective combines batch normalization (BN) statistics alignment, data diversity, and gradient-based calibration. The method demonstrates state-of-the-art performance on CIFAR-100 and ImageNet under low-bit quantization, with only modest additional computation. The results suggest that accounting for model sharpness during synthetic data generation yields better generalization for quantized models in data-free settings, offering a practical approach for deploying low-bit quantized networks on resource-constrained devices.

Abstract

Zero-shot quantization aims to learn a quantized model from a pre-trained full-precision model without access to the original training data. The common idea in zero-shot quantization approaches is to generate synthetic data for quantizing the full-precision model. While it is well known that deep neural networks with low sharpness generalize better, no previous zero-shot quantization work considers the sharpness of the quantized model as a criterion for generating training data. This paper introduces a novel methodology that takes the sharpness of the quantized model into account during synthetic data generation to enhance generalization. Specifically, we first demonstrate that, under certain assumptions, sharpness minimization can be attained by maximizing the match between the reconstruction loss gradients computed on synthetic data and on real validation data. Because no real validation set is available in the zero-shot setting, we then approximate this objective with gradient matching between each generated sample and its neighbors. Experimental evaluations on the CIFAR-100 and ImageNet datasets demonstrate the superiority of the proposed method over state-of-the-art techniques in low-bit quantization settings.
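
As a rough illustration of the neighbor-based gradient matching idea described in the abstract, the following is a minimal sketch on a toy single-neuron linear model. It is not the paper's implementation. Everything in it is an assumption made for illustration: the uniform rounding quantizer, the squared-error reconstruction loss, the uniform random perturbation used to form neighbors, the use of cosine similarity as the matching score, and all function names (`quantize`, `recon_grad`, `neighbor_gradient_matching`).

```python
import math
import random


def quantize(weights, step=0.25):
    """Uniform rounding to a fixed step (assumed stand-in for a real quantizer)."""
    return [round(w / step) * step for w in weights]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def recon_grad(w_q, w_fp, x):
    """Gradient w.r.t. w_q of the toy reconstruction loss 0.5 * (w_q.x - w_fp.x)^2."""
    err = dot(w_q, x) - dot(w_fp, x)
    return [err * xi for xi in x]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot(a, b) / (na * nb)


def neighbor_gradient_matching(w_q, w_fp, x, radius=0.05, n_neighbors=8, seed=0):
    """Mean cosine similarity between the gradient at x and at random neighbors of x.

    Neighbors are drawn by adding uniform noise in [-radius, radius] to each
    coordinate (an assumption; the paper defines its own neighborhood).
    """
    rng = random.Random(seed)
    g = recon_grad(w_q, w_fp, x)
    scores = []
    for _ in range(n_neighbors):
        neighbor = [xi + rng.uniform(-radius, radius) for xi in x]
        scores.append(cosine(g, recon_grad(w_q, w_fp, neighbor)))
    return sum(scores) / len(scores)


# Hypothetical toy values, chosen only to exercise the functions.
w_fp = [0.3, -0.7, 0.55]
w_q = quantize(w_fp)
x = [1.0, 2.0, -1.0]
score = neighbor_gradient_matching(w_q, w_fp, x)
```

In a data generation loop, a score of this kind would be one term to maximize with respect to the synthetic sample `x`, alongside the other terms of the objective. Here it is only evaluated, not optimized.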

Paper Structure

This paper contains 26 sections, 21 equations, 1 figure, 6 tables, 1 algorithm.

Figures (1)

  • Figure 1: Warm-up images, the corresponding images generated by our proposed method SADAG, and heat maps of their differences.