QNCD: Quantization Noise Correction for Diffusion Models

Huanpeng Chu, Wei Wu, Chengjie Zang, Kun Yuan

TL;DR

QNCD is a unified Quantization Noise Correction Scheme that diminishes quantization noise throughout the sampling process. It outperforms previous quantization methods for diffusion models, achieving lossless results in W4A8 and W8A8 quantization settings on ImageNet (LDM-4).

Abstract

Diffusion models have revolutionized image synthesis, setting new benchmarks in quality and creativity. However, their widespread adoption is hindered by the intensive computation required during the iterative denoising process. Post-training quantization (PTQ) presents a solution to accelerate sampling, albeit at the expense of sample quality, especially in low-bit settings. Addressing this, our study introduces a unified Quantization Noise Correction Scheme (QNCD), aimed at diminishing quantization noise throughout the sampling process. We identify two primary quantization challenges: intra and inter quantization noise. Intra quantization noise, mainly exacerbated by embeddings in the resblock module, extends activation quantization ranges, increasing disturbances in each single denoising step. Inter quantization noise, in contrast, stems from cumulative quantization deviations across the entire denoising process, altering data distributions step by step. QNCD combats these through embedding-derived feature smoothing for eliminating intra quantization noise and an effective runtime noise estimation module for dynamically filtering inter quantization noise. Extensive experiments demonstrate that our method outperforms previous quantization methods for diffusion models, achieving lossless results in W4A8 and W8A8 quantization settings on ImageNet (LDM-4). Code is available at: https://github.com/huanpengchu/QNCD
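
The claim that outlier-heavy activations extend the quantization range and raise per-step error can be illustrated with a small NumPy sketch. This is a toy under stated assumptions, not the QNCD implementation: the feature tensor, the embedding-like multiplier `emb`, the helper names, and the per-channel factor `s` are all hypothetical, and `s` is a generic per-channel rescaling rather than the paper's embedding-derived smoothing factor.

```python
import numpy as np

def quantize_uniform(x, n_bits=8):
    """Min-max uniform fake quantization: quantize to n_bits, then dequantize."""
    lo, hi = float(x.min()), float(x.max())
    levels = 2 ** n_bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = np.clip(np.round((x - lo) / scale), 0, levels)
    return q * scale + lo

def error_on_normal_channels(x, s=None, n_outlier=4):
    """MSE on the non-outlier channels, optionally dividing by s before quantizing."""
    if s is None:
        s = np.ones(x.shape[1])
    xq = quantize_uniform(x / s) * s  # rescale, quantize, then undo the rescaling
    return float(np.mean((xq[:, n_outlier:] - x[:, n_outlier:]) ** 2))

rng = np.random.default_rng(0)
features = rng.normal(0.0, 1.0, size=(64, 512))  # hypothetical resblock features

# Hypothetical embedding-like multiplier that inflates a few channels.
emb = np.ones(512)
emb[:4] = 20.0
mixed = features * emb

# Generic per-channel rescaling (an assumption, not the paper's S):
# equalize each channel's absolute maximum to the median across channels.
s = np.abs(mixed).max(axis=0)
s = s / np.median(s)

err_plain = error_on_normal_channels(features)
err_mixed = error_on_normal_channels(mixed)
err_smoothed = error_on_normal_channels(mixed, s)

print(f"A8 error without outliers:      {err_plain:.2e}")
print(f"A8 error with outlier channels: {err_mixed:.2e}")
print(f"A8 error after rescaling:       {err_smoothed:.2e}")
```

In this toy, the few inflated channels widen the shared min-max range, so every other channel is quantized with a coarser step; rescaling per channel before quantization narrows the range again. How the rescaling is absorbed at inference time (for example, into neighboring weights) is not modeled here.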

Paper Structure (23 sections, 13 equations, 8 figures, 4 tables)

Figures (8)

  • Figure 1: Comparison of metrics for denoising processes w.r.t. timestep ($t$). LPIPS distance between the outputs of the quantized Stable Diffusion model (W8A8) and its floating-point counterpart on MS-COCO, along with their respective CLIP scores and FID (Fréchet Inception Distance) scores.
  • Figure 2: (a) shows the similarity of individual layer features across the entire UNet model during a single sampling step, illustrating that quantization noise primarily arises from the incorporation of embeddings. (b) illustrates the distribution of activations before and after the incorporation of the embedding (within the last Resblock). When combined with embeddings, outliers in features are amplified, which can be efficiently mitigated using our smoothing factor.
  • Figure 3: (a) shows the mean and std of outputs across all time steps, while (b) visualizes the output distribution at a specific step, revealing a substantial discrepancy between the output of the quantized diffusion model (orange) and that of the full-precision model (gray). The gray dashed line in (a) marks the steps at which our noise estimation module runs.
  • Figure 4: Visualization of $scale_t$ and smoothing factor $S$ in heatmap representation. For ease of visualization, we select only 12 values from the 512-dimensional data.
  • Figure 5: The pipeline of our proposed method. We begin by saving the accurate embedding and deducing the smoothing factor $S$ in the calibration stage. During the inference stage, the pre-computed $S$ is applied to smooth the features $h_t$, thereby diminishing the intra quantization noise. In addition, at periodic intervals, the inter quantization noise ${q_{\theta}({\widetilde{x}}_{t},t)}$ is estimated through our noise estimation module and filtered out of the output distribution.
  • ...and 3 more figures