Timestep-Aware Correction for Quantized Diffusion Models

Yuzhe Yao, Feng Tian, Jun Chen, Haonan Lin, Guang Dai, Yong Liu, Jingdong Wang

TL;DR

Diffusion models deliver high-fidelity images but are computationally intensive, and post-training quantization (PTQ) introduces error accumulation that degrades quality. The authors propose TAC-Diffusion, a timestep-aware correction framework that dynamically mitigates quantization errors during diffusion denoising via Noise Estimation Reconstruction (NER) and Input Bias Correction (IBC), without additional training. They derive a convex, closed-form solution for per-timestep correction coefficients and use a masked loss with relative distortion (rQNSR) to reconstruct noise estimates, while correcting input bias per timestep. Extensive experiments across CIFAR-10, LSUN, and Stable Diffusion show TAC-Diffusion achieves state-of-the-art performance among low-precision diffusion models, significantly narrowing the gap with full-precision models and enabling efficient deployment on resource-constrained devices.
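The idea of a convex, closed-form per-timestep correction coefficient can be illustrated with a minimal sketch. It assumes a plain least-squares objective and a single scalar coefficient per timestep; the paper's actual objective (a masked loss with rQNSR) is not spelled out in this summary, so the function name `fit_correction_coefficients` and the array layout are hypothetical, not the authors' implementation.

```python
import numpy as np


def fit_correction_coefficients(eps_fp, eps_q, tiny=1e-12):
    """Fit one scalar k_t per timestep by ordinary least squares.

    Assumption (not from the paper): k_t minimizes
        || eps_fp[t] - k_t * eps_q[t] ||^2,
    which is convex in k_t and has the closed form
        k_t = <eps_fp[t], eps_q[t]> / <eps_q[t], eps_q[t]>.

    eps_fp, eps_q: arrays of shape (T, ...) holding full-precision and
    quantized noise estimates collected on the same calibration inputs.
    """
    num_steps = eps_fp.shape[0]
    fp = eps_fp.reshape(num_steps, -1)
    q = eps_q.reshape(num_steps, -1)
    numerator = np.sum(fp * q, axis=1)
    denominator = np.sum(q * q, axis=1)
    # Guard against an all-zero quantized estimate at some timestep.
    return numerator / np.maximum(denominator, tiny)


# Synthetic check: shrink each timestep's estimate by a known factor
# and confirm the fitted coefficient undoes that factor.
rng = np.random.default_rng(0)
eps_fp = rng.standard_normal((4, 8, 3, 16, 16))
true_scale = np.array([0.9, 1.0, 1.1, 1.25])
eps_q = eps_fp / true_scale[:, None, None, None, None]

k = fit_correction_coefficients(eps_fp, eps_q)
eps_corrected = k[:, None, None, None, None] * eps_q
```

Because the coefficients are fitted offline from calibration data, applying them at inference time costs one multiplication per noise estimate, which is in line with the negligible overhead the summary describes.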

Abstract

Diffusion models have marked a significant breakthrough in the synthesis of semantically coherent images. However, their extensive noise estimation networks and the iterative generation process limit their wider application, particularly on resource-constrained platforms like mobile devices. Existing post-training quantization (PTQ) methods have managed to compress diffusion models to low precision. Nevertheless, due to the iterative nature of diffusion models, quantization errors tend to accumulate throughout the generation process. This accumulation of error becomes particularly problematic in low-precision scenarios, leading to significant distortions in the generated images. We attribute this accumulation issue to two main causes: error propagation and exposure bias. To address these problems, we propose a timestep-aware correction method for quantized diffusion models, which dynamically corrects the quantization error. By leveraging the proposed method in low-precision diffusion models, a substantial enhancement of output quality can be achieved with only negligible computation overhead. Extensive experiments underscore our method's effectiveness and generalizability. By employing the proposed correction strategy, we achieve state-of-the-art (SOTA) results on low-precision models.

Paper Structure (26 sections, 24 equations, 13 figures, 6 tables, 2 algorithms)

This paper contains 26 sections, 24 equations, 13 figures, 6 tables, 2 algorithms.

Figures (13)

  • Figure 1: (Upper) Illustration of error accumulation in diffusion models. Inherent to the design of DMs, discrepancy in the input not only propagates to the next timestep but also leads to significant discrepancy in noise estimation, due to exposure bias. This cascading effect amplifies errors in subsequent stages, cumulatively impairing the quality of the final output. (Lower-left) To address this challenge, our method focuses on reducing the accumulated error $\Delta\mathbf{x}_{0}$ through two key strategies: 1) minimizing the discrepancy $\Delta\mathbf{x}_{t-1}$ at each timestep $t < T$, and 2) decomposing $\Delta\mathbf{x}_{t-1}$ into two distinct components—the input discrepancy $\Delta\mathbf{x}_t$ and the noise estimation discrepancy $\Delta \boldsymbol{\epsilon}_t$—and rectifying them separately to enhance error correction.
  • Figure 2: 256 $\times$ 256 unconditional image generation results with W3A8 LDM-8 (Rombach et al., 2022) on the LSUN-Church dataset. Red boxes highlight areas where our model preserves intricate details more effectively. Blue boxes show regions where our model maintains structural accuracy, closely resembling the full-precision model's output.
  • Figure 3: Comparative analysis of 256 $\times$ 256 unconditional image generation with a 200-step latent diffusion model on the LSUN-Bedroom dataset. Red boxes showcase our model's enhanced detail preservation, and blue boxes emphasize its superior structural accuracy.
  • Figure 4: Comparison between different correction strategies in 256 $\times$ 256 unconditional generation on LSUN-Church with a 500-step W3A8 LDM-8.
  • Figure 5: The activation distribution of multiple layers in full-precision LDM-8 on LSUN-Church. The distribution varies during the denoising process. This dynamic nature of the activations is the main source of clipping error in low-precision diffusion models.
  • ...and 8 more figures
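
The decomposition described in the Figure 1 caption can be written out under one common assumption. The sketch below assumes a standard DDPM ancestral sampling step, which this summary does not confirm as the sampler used in the paper's derivation. The symbols $\alpha_t$, $\beta_t = 1 - \alpha_t$, $\bar{\alpha}_t$, $\sigma_t$ and $\mathbf{z}$ are the usual noise-schedule quantities and are not taken from the source; the full-precision and quantized runs are assumed to share the same sampling noise $\mathbf{z}$, so it cancels in the difference.

```latex
% One DDPM sampling step (assumed form), with noise estimate \epsilon_t:
\mathbf{x}_{t-1}
  = \frac{1}{\sqrt{\alpha_t}}
    \left( \mathbf{x}_t
           - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\,\boldsymbol{\epsilon}_t
    \right)
    + \sigma_t \mathbf{z}

% Subtracting the full-precision step from the quantized step:
\Delta\mathbf{x}_{t-1}
  = \frac{1}{\sqrt{\alpha_t}}\,\Delta\mathbf{x}_t
    - \frac{\beta_t}{\sqrt{\alpha_t}\,\sqrt{1-\bar{\alpha}_t}}\,
      \Delta\boldsymbol{\epsilon}_t
```

Under these assumptions the discrepancy at step $t-1$ is linear in the input discrepancy $\Delta\mathbf{x}_t$ and the noise estimation discrepancy $\Delta\boldsymbol{\epsilon}_t$, which is why the two terms can be corrected separately.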