GenQ: Quantization in Low Data Regimes with Generative Synthetic Data

Yuhang Li, Youngeun Kim, Donghyun Lee, Souvik Kundu, Priyadarshini Panda

TL;DR

GenQ tackles the challenge of quantizing deep networks when data are scarce or restricted by employing Stable Diffusion to generate high-quality synthetic data. It introduces two regimes—data-free and data-scarce—and a two-stage filtering pipeline (energy-based and distribution-based) to ensure the synthetic data align with the real data distribution. In the data-scarce setting, GenQ further learns a token embedding to guide generation, enabling effective data synthesis with minimal real data. Across PTQ and QAT on CNNs and ViTs, GenQ achieves state-of-the-art accuracy (e.g., 4-bit QAT on ResNet-50 at 76.10% on ImageNet) and demonstrates strong transferability of synthetic data between models, significantly advancing quantization in low-data regimes.
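The first stage of the filtering pipeline can be illustrated with a small sketch. It assumes that the energy-based filter scores each synthetic image with the standard energy score used in energy-based out-of-distribution detection, E(x) = -T · logsumexp(f(x)/T), computed from the full-precision model's logits f(x), and that it keeps the lowest-energy (most in-distribution) fraction of a batch. The temperature `T`, the `keep_ratio` rule, and the function names are hypothetical choices for illustration, not GenQ's released code.

```python
import math
from typing import List, Sequence, Tuple


def energy_score(logits: Sequence[float], temperature: float = 1.0) -> float:
    """Energy E(x) = -T * logsumexp(f(x) / T); lower energy ~ more in-distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for a numerically stable log-sum-exp
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return -temperature * lse


def energy_filter(batch_logits: Sequence[Sequence[float]],
                  keep_ratio: float = 0.5,
                  temperature: float = 1.0) -> List[int]:
    """Hypothetical filter: keep indices of the keep_ratio fraction with lowest energy."""
    scored: List[Tuple[float, int]] = [
        (energy_score(logits, temperature), i)
        for i, logits in enumerate(batch_logits)
    ]
    scored.sort()  # ascending energy: most confident samples first
    n_keep = max(1, int(round(keep_ratio * len(scored))))
    return sorted(i for _, i in scored[:n_keep])
```

In use, `batch_logits` would be the full-precision model's logits on a batch of generated images. For the toy batch `[[8.0, 0.1, 0.2], [0.3, 0.2, 0.1], [5.0, 4.9, 0.0], [0.0, 0.0, 0.0]]`, `energy_filter(batch, keep_ratio=0.5)` keeps indices `[0, 2]`, the two samples with a dominant logit.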

Abstract

In deep neural network deployment, low-bit quantization is a promising way to improve computational efficiency. However, it often depends on training data to mitigate quantization errors, which is a significant challenge when data are scarce or restricted by privacy or copyright concerns. To address this, we introduce GenQ, a novel approach that uses an advanced generative AI model to produce photorealistic, high-resolution synthetic data, overcoming the limitations of traditional methods that struggle to accurately mimic complex objects in large-scale datasets such as ImageNet. Our method relies on two robust filtering mechanisms designed to ensure that the synthetic data closely align with the intrinsic characteristics of the actual training data. When only limited real data are available, they are used to guide the synthetic data generation process, improving fidelity through the inversion of learnable token embeddings. Through rigorous experimentation, GenQ establishes new benchmarks in data-free and data-scarce quantization, significantly outperforming existing methods in accuracy and efficiency, thereby setting a new standard for quantization in low data regimes. Code is released at https://github.com/Intelligent-Computing-Lab-Yale/GenQ.
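The second, distribution-based filter can be sketched in the same spirit. This sketch assumes the filter compares each synthetic image's per-channel activation statistics at a BatchNorm layer of the full-precision model with that layer's stored running mean and variance, similar to the BN-statistic matching used by data-free quantization methods, and discards images whose statistics drift too far. The squared-L2 distance, the single-layer simplification, the `max_distance` threshold, and all function names are hypothetical; a real implementation would typically sum the distance over every BN layer.

```python
from statistics import fmean, pvariance
from typing import List, Sequence


def bn_stat_distance(channels: Sequence[Sequence[float]],
                     running_mean: Sequence[float],
                     running_var: Sequence[float]) -> float:
    """Squared L2 gap between one image's per-channel (mean, var) and BN running stats.

    channels[c] holds the pre-BN activations of channel c at every spatial position.
    """
    dist = 0.0
    for acts, mu_r, var_r in zip(channels, running_mean, running_var):
        mu = fmean(acts)
        var = pvariance(acts, mu)  # population (biased) variance over spatial positions
        dist += (mu - mu_r) ** 2 + (var - var_r) ** 2
    return dist


def bn_filter(batch_features: Sequence[Sequence[Sequence[float]]],
              running_mean: Sequence[float],
              running_var: Sequence[float],
              max_distance: float) -> List[int]:
    """Hypothetical filter: keep images whose BN-statistic distance is within max_distance."""
    return [
        i for i, chans in enumerate(batch_features)
        if bn_stat_distance(chans, running_mean, running_var) <= max_distance
    ]
```

With running statistics `mean = [0.0, 1.0]`, `var = [1.0, 1.0]`, an image whose channels are `[-1, 1, -1, 1]` and `[0, 2, 0, 2]` matches them exactly (distance 0), while a flat image with channels `[5, 5, 5, 5]` and `[1, 1, 1, 1]` has distance 27 and is dropped at `max_distance=1.0`.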

Paper Structure (17 sections, 13 equations, 6 figures, 5 tables)

Figures (6)

  • Figure 1: Comparison of GenQ with existing methods on ImageNet. (1) Data-Free PTQ, (2) Data-Free QAT, (3) Data Generation Speed. Real* denotes using real ImageNet data in zero-shot quantization.
  • Figure 2: The overall image synthesis and filtering procedure of GenQ. ① For data-free (DFQ) synthesis, we directly use the class label as the text prompt, and ② we use several metrics to filter out out-of-distribution synthetic images. ③ For data-scarce synthesis, real images are used to optimize the prompt token $\{\mathtt{S}\}$ (see the few-shot optimization equation in the paper). We then generate the synthetic images with the optimized prompt.
  • Figure 3: Visualization of DF/DS GenQ and existing data synthesis method for quantization in low data regimes.
  • Figure 4: Evaluation of data-scarce GenQ.
  • Figure 5: Accuracy vs. number of synthetic images for data-free PTQ using Genie-M.
  • ...and 1 more figure
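Step ③ of Figure 2 learns the embedding of a new prompt token $\{\mathtt{S}\}$ from the few available real images while the diffusion model stays frozen. Assuming this follows the standard textual-inversion objective for latent diffusion models, it can be written as below; the symbols are our own notation and the paper's equation may differ in detail. Here $x$ is a real image, $\mathcal{E}$ the frozen VAE encoder, $z_t$ the noised latent at timestep $t$, $c_\phi$ the frozen text encoder with $v_{\mathtt{S}}$ substituted for the token $\mathtt{S}$ in prompt $y$, and $\epsilon_\theta$ the frozen denoising network; only $v_{\mathtt{S}}$ is optimized.

```latex
v_{\mathtt{S}}^{*} = \arg\min_{v_{\mathtt{S}}}\;
\mathbb{E}_{z = \mathcal{E}(x),\, y,\, \epsilon \sim \mathcal{N}(0, I),\, t}
\Big[ \big\lVert \epsilon - \epsilon_\theta\big(z_t,\, t,\, c_\phi(y;\, v_{\mathtt{S}})\big) \big\rVert_2^2 \Big]
```

After optimization, prompts containing $\mathtt{S}$ steer Stable Diffusion toward images resembling the real few-shot data, and these generated images then go through the same filtering stages as in the data-free case.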