PTQD: Accurate Post-Training Quantization for Diffusion Models

Yefei He, Luping Liu, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang

TL;DR

Diffusion models are expensive at inference time, and naive post-training quantization (PTQ) degrades sample quality: quantization noise shifts the estimated mean, mismatches the predetermined variance schedule, and lowers the signal-to-noise ratio (SNR) in later denoising steps. PTQD introduces a unified noise model that splits quantization noise into a part correlated with the full-precision output and an uncorrelated residual. The correlated part is corrected with an estimated correlation coefficient k; the residual is handled by subtracting its bias and calibrating the variance schedule to absorb the excess variance. A step-aware mixed-precision scheme then selects the bitwidth for each denoising step to keep the SNR high throughout sampling. On ImageNet 256×256, PTQD raises FID by only 0.06 relative to full-precision LDM-4 while saving 19.9x bit operations. It outperforms prior PTQ methods on both class-conditioned and unconditional generation tasks, while also offering deployment speedups.

Abstract

Diffusion models have recently dominated image synthesis tasks. However, the iterative denoising process is expensive in computations at inference time, making diffusion models less practical for low-latency and scalable real-world applications. Post-training quantization (PTQ) of diffusion models can significantly reduce the model size and accelerate the sampling process without re-training. Nonetheless, applying existing PTQ methods directly to low-bit diffusion models can significantly impair the quality of generated samples. Specifically, for each denoising step, quantization noise leads to deviations in the estimated mean and mismatches with the predetermined variance schedule. As the sampling process proceeds, the quantization noise may accumulate, resulting in a low signal-to-noise ratio (SNR) during the later denoising steps. To address these challenges, we propose a unified formulation for the quantization noise and diffusion perturbed noise in the quantized denoising process. Specifically, we first disentangle the quantization noise into its correlated and residual uncorrelated parts regarding its full-precision counterpart. The correlated part can be easily corrected by estimating the correlation coefficient. For the uncorrelated part, we subtract the bias from the quantized results to correct the mean deviation and calibrate the denoising variance schedule to absorb the excess variance resulting from quantization. Moreover, we introduce a mixed-precision scheme for selecting the optimal bitwidth for each denoising step. Extensive experiments demonstrate that our method outperforms previous post-training quantized diffusion models, with only a 0.06 increase in FID score compared to full-precision LDM-4 on ImageNet 256x256, while saving 19.9x bit operations. Code is available at https://github.com/ziplab/PTQD.
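
The correction steps described in the abstract can be illustrated with a minimal sketch. It assumes the quantization noise is modeled as `k * fp_out + residual` and that `k` is fitted by least squares through the origin on calibration data; the function names, the fitting procedure, and the plain-list data layout are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
from statistics import fmean, pvariance


def disentangle(fp_out, q_out):
    """Split quantization noise into k * fp_out plus a residual.

    fp_out and q_out are flat lists holding the full-precision and
    quantized network outputs for the same inputs (assumed layout).
    """
    noise = [q - f for q, f in zip(q_out, fp_out)]
    # Least-squares slope through the origin: k = <noise, fp> / <fp, fp>.
    k = sum(n * f for n, f in zip(noise, fp_out)) / sum(f * f for f in fp_out)
    residual = [n - k * f for n, f in zip(noise, fp_out)]
    return k, residual


def residual_stats(residual, k):
    """Bias and variance of the residual after rescaling by 1 / (1 + k)."""
    scaled = [r / (1.0 + k) for r in residual]
    return fmean(scaled), pvariance(scaled)


def correct(q_out, k, bias):
    """Undo the correlated part, then subtract the residual bias.

    Since q = (1 + k) * fp + residual, dividing by (1 + k) leaves
    fp + residual / (1 + k); subtracting the bias re-centers it.
    """
    return [q / (1.0 + k) - bias for q in q_out]


def calibrate_variance(sigma2, excess_var):
    """Shrink the scheduled variance by the excess quantization variance.

    excess_var is assumed to be already scaled by whatever
    sampler-dependent coefficient applies; the result is clamped at zero.
    """
    return max(sigma2 - excess_var, 0.0)
```

In use, `k`, the bias, and the residual variance would be estimated once per denoising step from calibration samples and then applied during sampling. The clamp reflects that a scheduled variance cannot absorb more excess variance than it contains.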

Paper Structure

This paper contains 28 sections, 21 equations, 14 figures, 10 tables, 1 algorithm.

Figures (14)

  • Figure 1: Comparison of samples generated by Q-Diffusion (Li et al., 2023), PTQD, and full-precision LDM-4 (Rombach et al., 2021) on the CelebA-HQ $256\times256$ dataset. Here, W$x$A$y$ indicates that the weights are quantized to $x$ bits and the activations to $y$ bits.
  • Figure 2: The correlation between the quantization noise (Y-axis) and the output of the full-precision noise prediction network (X-axis). Each data point corresponds to one entry of these vectors. Data were collected by generating samples with $4$-bit LDM-8 (Rombach et al., 2021) for $200$ steps on LSUN-Churches (Yu et al., 2015).
  • Figure 3: The distribution of uncorrelated quantization noise collected from W4A8 LDM-4 on the LSUN-Bedrooms $256\times256$ dataset, where the x-axis represents the range of values and the y-axis their frequency.
  • Figure 4: Comparison of the signal-to-noise ratio (SNR) at each step of LDM-4 on LSUN-Bedrooms across various bitwidths.
  • Figure A: Results of the normality test for the residual quantization noise across various steps. Data were collected from W4A4 LDM-8 on LSUN-Churches.
  • ...and 9 more figures
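
The per-step SNR comparison in Figure 4 and the per-step bitwidth selection mentioned in the abstract can be sketched as follows. This is a hedged illustration only: it assumes SNR is the ratio of the squared norm of the full-precision output to the squared norm of the quantization noise, and that a step takes the lowest bitwidth meeting a caller-supplied SNR requirement. The names `snr`, `pick_bitwidth`, and `required_snr` are hypothetical, and the paper's actual selection criterion may differ.

```python
def snr(fp_out, q_out):
    """Signal-to-noise ratio of a quantized output (assumed definition).

    Signal is the squared norm of the full-precision output; noise is the
    squared norm of the difference between quantized and full-precision.
    """
    signal = sum(f * f for f in fp_out)
    noise = sum((q - f) ** 2 for q, f in zip(q_out, fp_out))
    return float("inf") if noise == 0.0 else signal / noise


def pick_bitwidth(snr_by_bits, required_snr):
    """Return the lowest bitwidth whose SNR meets the requirement.

    snr_by_bits maps a candidate bitwidth to the SNR measured at one
    denoising step. Falls back to the highest candidate if none qualify.
    """
    for bits in sorted(snr_by_bits):
        if snr_by_bits[bits] >= required_snr:
            return bits
    return max(snr_by_bits)
```

Running `pick_bitwidth` once per denoising step, with SNR values measured on calibration data, yields a step-to-bitwidth schedule. Steps where low-bit SNR falls below the requirement are the ones that get promoted to a higher bitwidth.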