TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models

Yushi Huang, Ruihao Gong, Jing Liu, Tianlong Chen, Xianglong Liu

TL;DR

The paper addresses the long inference times and large memory footprint of diffusion models with Temporal Feature Maintenance Quantization (TFMQ-DM), a post-training quantization (PTQ) framework that explicitly preserves the temporal features derived from the time-step. It introduces a Temporal Information Block, together with Temporal Information Aware Reconstruction (TIAR) and Finite Set Calibration (FSC), to keep the quantized temporal features aligned with their full-precision counterparts across the finite time-step set $t \in \{1, \ldots, T\}$. Experiments on multiple datasets and diffusion architectures show that 4-bit weight quantization reaches performance nearly on par with the full-precision model and outperforms prior PTQ methods, while the quantization procedure itself runs about 2× faster than previous work on LSUN-Bedrooms $256 \times 256$. Because no retraining is required, the method makes diffusion models more practical to deploy, with implications for real-time and large-scale image generation.

Abstract

The Diffusion model, a prevalent framework for image generation, encounters significant challenges in terms of broad applicability due to its extended inference times and substantial memory requirements. Efficient Post-training Quantization (PTQ) is pivotal for addressing these issues in traditional models. Different from traditional models, diffusion models heavily depend on the time-step $t$ to achieve satisfactory multi-round denoising. Usually, $t$ from the finite set $\{1, \ldots, T\}$ is encoded to a temporal feature by a few modules totally irrespective of the sampling data. However, existing PTQ methods do not optimize these modules separately. They adopt inappropriate reconstruction targets and complex calibration methods, resulting in a severe disturbance of the temporal feature and denoising trajectory, as well as a low compression efficiency. To solve these, we propose a Temporal Feature Maintenance Quantization (TFMQ) framework building upon a Temporal Information Block which is just related to the time-step $t$ and unrelated to the sampling data. Powered by the pioneering block design, we devise temporal information aware reconstruction (TIAR) and finite set calibration (FSC) to align the full-precision temporal features in a limited time. Equipped with the framework, we can maintain the most temporal information and ensure the end-to-end generation quality. Extensive experiments on various datasets and diffusion models prove our state-of-the-art results. Remarkably, our quantization approach, for the first time, achieves model performance nearly on par with the full-precision model under 4-bit weight quantization. Additionally, our method incurs almost no extra computational cost and accelerates quantization time by $2.0 \times$ on LSUN-Bedrooms $256 \times 256$ compared to previous works. Our code is publicly available at https://github.com/ModelTC/TFMQ-DM.
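
The abstract's key observation is that the temporal feature depends only on $t$, and $t$ ranges over a finite set. The following minimal sketch illustrates why that makes calibration simple: the whole set $\{1, \ldots, T\}$ can be enumerated without any sampling data. It is not the authors' FSC implementation. The sinusoidal embedding, the min/max range estimate, and the names `timestep_embedding`, `calibrate_range`, and `fake_quantize` are assumptions chosen for illustration.

```python
import math

def timestep_embedding(t, dim=8, max_period=10000.0):
    """Sinusoidal time-step embedding (assumed stand-in for a temporal-feature
    module). Its output depends only on t, never on the sampled image."""
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * k / half) for k in range(half)]
    return [math.cos(t * f) for f in freqs] + [math.sin(t * f) for f in freqs]

def calibrate_range(T, dim=8):
    """Enumerate the full finite set {1, ..., T} and record the activation range."""
    lo, hi = float("inf"), float("-inf")
    for t in range(1, T + 1):
        emb = timestep_embedding(t, dim)
        lo, hi = min(lo, min(emb)), max(hi, max(emb))
    return lo, hi

def fake_quantize(x, lo, hi, bits=8):
    """Uniform asymmetric quantize-dequantize of a list of floats."""
    scale = (hi - lo) / (2 ** bits - 1)
    return [lo + round((min(max(v, lo), hi) - lo) / scale) * scale for v in x]

# Hypothetical setting: T = 100 time-steps, 8-bit activations.
T = 100
lo, hi = calibrate_range(T)

# Worst-case quantization error over every time-step the model can ever see.
max_err = max(
    abs(q - v)
    for t in range(1, T + 1)
    for v, q in zip(timestep_embedding(t),
                    fake_quantize(timestep_embedding(t), lo, hi))
)
```

Because the range is measured over every possible input, no value is ever clipped at inference time, and the worst-case error stays within half a quantization step. A data-dependent layer offers no such guarantee, since its calibration set can only sample the input space.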

Paper Structure (22 sections, 11 equations, 12 figures, 11 tables)

This paper contains 22 sections, 11 equations, 12 figures, 11 tables.

Figures (12)

  • Figure 1: Overview of the proposed Temporal Feature Maintenance Quantization. (a) Temporal Feature $\mathbf{emb}_{t,i}$, belonging to a finite set representing temporal information, has been overlooked in previous works due to inappropriate reconstruction targets (box with a solid line). (b) This oversight leads to a severe disturbance for $\mathbf{emb}_{t,i}$ and results in the mismatch of crucial temporal information for the diffusion model's generation, causing a deviation in the denoising trajectory and a significant drop in accuracy. (c) Based on these analyses, we introduce a Temporal Information Block that exclusively correlates with the time-step $t$. Leveraging this $\mathbf{x}_t$-unrelated block, we enable Temporal Information Aware Reconstruction and Finite Set Calibration (utilizing the finite number of $t$). This approach achieves the maintenance of temporal features and yields state-of-the-art results.
  • Figure 2: (Left) Temporal feature disturbance. The inflection points serve as indicators of temporal feature errors at different time-steps, and they highlight the significant phenomenon of temporal feature disturbance. (Right) Temporal information mismatch. The coordinates of the inflection points on the blue curve can be denoted as $(t, t+\delta_{t, i})$. This indicates that $\mathbf{emb}_{t+\delta_{t, i}, i}$ exhibits the highest similarity with $\widehat{\mathbf{emb}_{t,i}}$.
  • Figure 3: Denoising process of full-precision (Upper) and w4a8 quantized (Lower) Stable-Diffusion $(T = 50)$ under the same experiment settings and prompt: A man in the snow on a snow board. We represent $\{\mathbf{emb}_{t,i}\}_{i = 0, \ldots, n}$ and $\{\widehat{\mathbf{emb}_{t,i}}\}_{i = 0, \ldots, n}$ as $\mathbf{EMB}_t$ and $\widehat{\mathbf{EMB}_t}$, respectively. Additionally, we denote $\widehat{\mathbf{x}_t}$ as $\mathbf{x}_t$ in the context of the quantized diffusion model. It is noteworthy that, in the quantized model employed here, to showcase the impact of temporal features, only the layers in Temporal Information Block are quantized and the components unrelated to the generation of temporal features are maintained in full precision.
  • Figure 4: Temporal feature errors across different PTQ methods.
  • Figure I: Activation ranges within sampling data-unrelated components for LDM-4 on LSUN-Bedrooms $256\times 256$ with 50 denoising steps. We randomly select 4 linear or convolutional layers' activations in these components to demonstrate the range variation.
  • ...and 7 more figures
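
The mismatch described in the Figure 2 (Right) caption can be sketched as a nearest-neighbour search: for each $t$, find the time-step whose full-precision feature is most similar to the quantized feature at $t$, and record the offset $\delta_t$. This is only an illustrative sketch. Cosine similarity as the metric, the sinusoidal embedding, the fixed $[-1, 1]$ quantization range, and the names `embed`, `fake_quantize`, and `mismatch_offsets` are all assumptions, not details taken from the paper.

```python
import math

def embed(t, dim=8, max_period=10000.0):
    """Sinusoidal time-step embedding (assumed stand-in for emb_t)."""
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * k / half) for k in range(half)]
    return [math.cos(t * f) for f in freqs] + [math.sin(t * f) for f in freqs]

def fake_quantize(x, bits, lo=-1.0, hi=1.0):
    """Uniform quantize-dequantize on an assumed fixed range [lo, hi]."""
    scale = (hi - lo) / (2 ** bits - 1)
    return [lo + round((min(max(v, lo), hi) - lo) / scale) * scale for v in x]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm > 0 else 0.0

def mismatch_offsets(T, bits):
    """For each t, return delta_t such that the full-precision embedding at
    t + delta_t is the one most similar to the quantized embedding at t."""
    fp = {t: embed(t) for t in range(1, T + 1)}
    offsets = {}
    for t in range(1, T + 1):
        q = fake_quantize(fp[t], bits)
        best = max(fp, key=lambda s: cosine(q, fp[s]))
        offsets[t] = best - t
    return offsets

# Hypothetical comparison: a fine quantizer versus a very coarse one.
offsets_16bit = mismatch_offsets(T=50, bits=16)
offsets_2bit = mismatch_offsets(T=50, bits=2)
```

A nonzero offset means the quantized feature at $t$ looks more like the full-precision feature of a different time-step, which is the mismatch the caption describes. With the 16-bit quantizer in this toy setting, every offset is zero.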