TCAQ-DM: Timestep-Channel Adaptive Quantization for Diffusion Models
Haocheng Huang, Jiaxin Chen, Jinyang Guo, Ruiyi Zhan, Yunhong Wang
TL;DR
This work addresses the high memory and computation demands of diffusion models by focusing on post-training quantization (PTQ) and its failure modes in diffusion settings. It introduces TCAQ-DM, a three-part framework: Timestep-Channel Joint Reparameterization (TCR), which balances activation ranges across timesteps and channels; a Dynamically Adaptive Quantization (DAQ) module, which adapts the quantizer to the timestep-specific distributions of post-Softmax activations; and Progressively Aligned Reconstruction (PAR), which reduces the mismatch between the inputs seen during quantization and those seen during iterative inference. Empirical results on CIFAR-10, LSUN, and ImageNet show that TCAQ-DM outperforms prior PTQ methods, reaching FID comparable to the full-precision model on CIFAR-10 under W6A6, performing strongly under W8A8, and still producing usable images under the challenging W4A4 setting. The approach offers a practical path to efficient diffusion-model deployment, with substantially lower quantization error and less performance degradation at ultra-low bit-widths.
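The TCR idea summarized above, balancing activation ranges jointly over timesteps and channels, can be illustrated with a small sketch. It assumes a SmoothQuant-style per-channel scale s_c = a_c^alpha / w_c^(1 - alpha), where a_c is the largest activation magnitude of input channel c pooled over all calibration timesteps and w_c is the largest weight magnitude in the matching weight column. The max-pooling over timesteps, alpha = 0.5, and every function name here are illustrative assumptions, not the paper's exact TCR formulation. Dividing activations by s and multiplying the weight columns by s leaves the layer output unchanged while making channel ranges far more even.

```python
import math

# Illustrative sketch only (assumption): a SmoothQuant-style channel rescaling
# with activation ranges pooled over timesteps. Not the TCAQ-DM implementation.


def channel_absmax(vectors):
    """Per-channel max |x| over a list of equal-length vectors."""
    return [max(abs(v[c]) for v in vectors) for c in range(len(vectors[0]))]


def timestep_channel_scales(acts_by_t, weight, alpha=0.5, eps=1e-8):
    """Hypothetical helper: one positive scale per input channel.

    acts_by_t maps a timestep t to calibration activations (each of length C_in);
    weight is a list of C_out rows of length C_in, so y = W x.
    """
    # Assumption: pool activation ranges over ALL timesteps by taking the max.
    pooled = [x for xs in acts_by_t.values() for x in xs]
    a_max = channel_absmax(pooled)
    w_max = channel_absmax(weight)  # max over rows = per input-channel weight range
    return [max(a, eps) ** alpha / max(w, eps) ** (1 - alpha)
            for a, w in zip(a_max, w_max)]


def reparameterize(x, weight, s):
    """Divide activations by s and fold s into the weight columns."""
    x_hat = [xi / si for xi, si in zip(x, s)]
    w_hat = [[wij * sj for wij, sj in zip(row, s)] for row in weight]
    return x_hat, w_hat


def matvec(w, x):
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def range_ratio(vectors):
    """Widest / narrowest per-channel range; 1.0 means perfectly balanced."""
    r = channel_absmax(vectors)
    return max(r) / min(r)


if __name__ == "__main__":
    # Toy calibration data: channel 0 grows sharply with t, channel 1 stays tiny.
    acts = {0: [[10.0, 0.1], [-8.0, 0.2]],
            500: [[40.0, 0.3], [-35.0, -0.1]]}
    W = [[0.5, 1.0], [-0.2, 2.0]]
    s = timestep_channel_scales(acts, W)

    x = acts[500][0]
    x_hat, W_hat = reparameterize(x, W, s)
    # W x == (W diag(s)) (x / s) up to floating-point rounding.
    assert all(math.isclose(a, b) for a, b in zip(matvec(W, x), matvec(W_hat, x_hat)))

    pooled = [v for vs in acts.values() for v in vs]
    scaled = [[vi / si for vi, si in zip(v, s)] for v in pooled]
    print(range_ratio(pooled), range_ratio(scaled))
```

On this toy data the ratio between the widest and narrowest channel range drops from about 133 to about 5.8, so a single low-bit activation scale wastes far fewer quantization levels on the small channel.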
Abstract
Diffusion models have achieved remarkable success in image and video generation tasks. Nevertheless, they often incur substantial memory and time overhead during inference, due to their complex network architectures and the large number of timesteps required for iterative diffusion. Recently, post-training quantization (PTQ) has proven to be a promising way to reduce inference cost by converting floating-point operations into low-bit ones. However, most existing PTQ methods fail to handle the large variations in activation distributions across channels and timesteps, as well as the inconsistency between the inputs used during quantization and those encountered during inference, leaving much room for improvement. To address these issues, we propose a novel method dubbed Timestep-Channel Adaptive Quantization for Diffusion Models (TCAQ-DM). Specifically, we develop a timestep-channel joint reparameterization (TCR) module that balances activation ranges along both the timestep and channel dimensions, facilitating the subsequent reconstruction procedure. We then employ a dynamically adaptive quantization (DAQ) module that mitigates quantization error by selecting an optimal quantizer for each post-Softmax layer according to its distribution type. Moreover, we present a progressively aligned reconstruction (PAR) strategy to mitigate the bias caused by the input mismatch. Extensive experiments on various benchmarks and diffusion models demonstrate that the proposed method substantially outperforms state-of-the-art approaches in most cases, notably yielding FID comparable to the full-precision model on CIFAR-10 in the W6A6 setting (6-bit weights and 6-bit activations), while still generating usable images in the W4A4 setting.
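The DAQ step described in the abstract picks a quantizer for each post-Softmax layer according to its distribution. A minimal sketch, assuming only two candidates (a uniform quantizer on [0, 1] and a log2 quantizer with one code reserved for zero) and mean squared error on calibration values as the selection criterion; the paper's actual candidate set and criterion may differ, and all helper names are hypothetical.

```python
import math

# Illustrative sketch only (assumption): choose between two candidate quantizers
# by calibration MSE. Not the TCAQ-DM implementation.


def uniform_quant(p, n_bits):
    """Uniform quantizer on [0, 1]; post-Softmax values never exceed 1."""
    levels = 2 ** n_bits - 1
    return [round(v * levels) / levels for v in p]


def log2_quant(p, n_bits):
    """Log2 quantizer: v -> 2**-k, k in [0, 2**n_bits - 2]; one code is kept for 0."""
    max_k = 2 ** n_bits - 2
    out = []
    for v in p:
        if v <= 0.0:
            out.append(0.0)
            continue
        k = max(round(-math.log2(v)), 0)  # round in the log domain
        out.append(2.0 ** -k if k <= max_k else 0.0)  # too small -> zero code
    return out


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def select_quantizer(calib, n_bits):
    """Hypothetical DAQ-style choice: keep the candidate with the lowest MSE."""
    candidates = {"uniform": uniform_quant, "log2": log2_quant}
    errors = {name: mse(calib, q(calib, n_bits)) for name, q in candidates.items()}
    return min(errors, key=errors.get), errors


if __name__ == "__main__":
    flat = [0.02] * 50  # attention spread over many tokens: only small values
    peaked = [0.88, 0.06, 0.03, 0.015, 0.008, 0.004, 0.002, 0.001]  # one dominant token
    for name, p in (("flat", flat), ("peaked", peaked)):
        best, errors = select_quantizer(p, n_bits=4)
        print(name, best, errors)
```

At 4 bits the flat row selects log2, because the uniform grid rounds every 0.02 down to 0, whereas the peaked row selects uniform, because log2 rounds 0.88 up to 1.0. This illustrates why one fixed quantizer can fit some post-Softmax distributions well and others poorly.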
