Data Generation for Hardware-Friendly Post-Training Quantization

Lior Dikstein, Ariel Lapid, Arnon Netzer, Hai Victor Habi

TL;DR

This work addresses zero-shot post-training quantization under privacy constraints. It analyzes BN-based data generation and identifies three gaps between synthetic and real data: the scope over which BN statistics are aggregated, the handling of training-time data augmentation, and a distribution mismatch at the output layers. It introduces DGH, a method that combines global BN-statistics aggregation, augmentation-aware preprocessing with natural image priors, and an output distribution stretching loss, so that synthetic data aligns with real data across all quantized layers. On ImageNet and COCO, covering classification and object detection, DGH improves accuracy by up to 30% in hardware-friendly ZSQ and matches real-data performance in many settings. The approach is compatible with multiple PTQ algorithms. The work also includes ablations that isolate the impact of each component, and the code is released and integrated into open-source toolkits. More accurate calibration with synthetic data makes fully quantized models more practical to deploy on resource-constrained hardware.

Abstract

Zero-shot quantization (ZSQ) using synthetic data is a key approach for post-training quantization (PTQ) under privacy and security constraints. However, existing data generation methods often struggle to effectively generate data suitable for hardware-friendly quantization, where all model layers are quantized. We analyze existing data generation methods based on batch normalization (BN) matching and identify several gaps between synthetic and real data: 1) Current generation algorithms do not optimize the entire synthetic dataset simultaneously; 2) Data augmentations applied during training are often overlooked; and 3) A distribution shift occurs in the final model layers due to the absence of BN in those layers. These gaps negatively impact ZSQ performance, particularly in hardware-friendly quantization scenarios. In this work, we propose Data Generation for Hardware-friendly quantization (DGH), a novel method that addresses these gaps. DGH jointly optimizes all generated images, regardless of the image set size or GPU memory constraints. To address data augmentation mismatches, DGH includes a preprocessing stage that mimics the augmentation process and enhances image quality by incorporating natural image priors. Finally, we propose a new distribution-stretching loss that aligns the support of the feature map distribution between real and synthetic data. This loss is applied to the model's output and can be adapted to various tasks. DGH demonstrates significant improvements in quantization performance across multiple tasks, achieving up to a 30% increase in accuracy for hardware-friendly ZSQ in both classification and object detection, often performing on par with real data.
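The first gap, aggregation scope, can be illustrated with a small sketch. It is not the authors' implementation: the function names (`batch_stats`, `aggregate_stats`, `bn_matching_loss`) and the squared-error form of the loss are assumptions made for illustration, and activations are reduced to scalars for brevity. The sketch shows that per-batch means and variances can be combined into statistics of the whole image set without holding all images in memory at once, which is the kind of set-level quantity a global BN-matching objective would compare against the stored BN statistics.

```python
from statistics import fmean, pvariance


def batch_stats(batch):
    """Return (mean, biased variance, count) for one batch of scalar activations."""
    return fmean(batch), pvariance(batch), len(batch)


def aggregate_stats(stats):
    """Combine per-batch (mean, var, count) triples into set-level mean and variance."""
    total = sum(n for _, _, n in stats)
    global_mean = sum(m * n for m, _, n in stats) / total
    # Law of total variance: within-batch variance plus the spread of batch means.
    global_var = sum(n * (v + (m - global_mean) ** 2) for m, v, n in stats) / total
    return global_mean, global_var


def bn_matching_loss(mean, var, bn_mean, bn_var):
    """Hypothetical squared-error loss between measured and stored BN statistics."""
    return (mean - bn_mean) ** 2 + (var - bn_var) ** 2


# Toy activations for eight "images", split into batches of two.
activations = [0.5, 1.5, -2.0, 3.0, 0.0, 4.0, 1.0, -1.0]
per_batch = [batch_stats(activations[i:i + 2]) for i in range(0, len(activations), 2)]

# Set-level statistics recovered from per-batch statistics only.
set_mean, set_var = aggregate_stats(per_batch)

# Assumed BN target statistics (illustrative values, not from the paper).
bn_mean, bn_var = 0.875, 3.5

# Matching each batch separately versus matching the aggregated statistics.
separate_loss = fmean(bn_matching_loss(m, v, bn_mean, bn_var) for m, v, _ in per_batch)
global_loss = bn_matching_loss(set_mean, set_var, bn_mean, bn_var)
```

In this toy setting `separate_loss` and `global_loss` differ: each small batch is pushed to reproduce the BN statistics on its own, whereas the aggregated objective only requires the set as a whole to do so.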

Paper Structure

This paper contains 25 sections, 12 equations, 9 figures, 5 tables, 1 algorithm.

Figures (9)

  • Figure 1: The blue curve represents the Top-1 accuracy on the ImageNet-1k validation set of ResNet-18 quantized to W3A8-bit precision using AdaRound (Nagel et al., 2020) with $1024$ generated images, where each batch is optimized separately according to the BN loss equation (\ref{eq:bn_loss}). The x-axis denotes the statistics aggregation scope (batch size), $K$, used in the image generation process. The red star indicates the result of our proposed aggregation algorithm, which utilizes the statistics of all images collectively.
  • Figure 2: Two-dimensional t-SNE (van der Maaten and Hinton, 2008) visualization of ResNet-18 embeddings comparing real images (blue) with those generated with global optimization (red) and ZeroQ (green).
  • Figure 3: A layer-wise comparison of the MSE for mean (left) and standard deviation (right) of model activations relative to BN statistics in a ResNet-18 model. The blue line indicates MSE values calculated using augmented real data, while the orange line represents values from non-augmented real data. Each variant runs 1024 images to calculate the respective activation statistics, with MSE averaged over five experiments.
  • Figure 4: A comparison of the output distributions of a ResNet-18 model evaluated on three different data sources: real images, synthetic images generated using only the BNS loss, and synthetic images enhanced with our proposed ODSL.
  • Figure 5: Top-1 accuracy on the ImageNet-1k validation set of ResNet-18 quantized to W4A4-bit precision using AdaRound with 1024 generated images. The blue curve represents optimizing with ODSL, while the orange curve represents optimizing without ODSL. The x-axis denotes the statistics aggregation scope (batch size) used in the image generation process, where each batch is optimized separately according to the BN loss equation (\ref{eq:bn_loss}). The red stars indicate the results of using DGH's aggregation algorithm.
  • ...and 4 more figures