
TR-DQ: Time-Rotation Diffusion Quantization

Yihua Shao, Deyang Lin, Fanhu Zeng, Minxi Yan, Muyang Zhang, Siyu Chen, Yuxuan Fan, Ziyang Yan, Haozhe Wang, Jingcai Guo, Yan Wang, Haotong Qin, Hao Tang

TL;DR

TR-DQ tackles diffusion-model quantization by introducing time-step-aware, rotation-based quantization that adapts rotations, diagonal scalings, and permutations per time step to smooth activations and shift hard-to-quantize dynamics into the weights, formalized with time-dependent matrices $\mathbf{R}_t$, $\boldsymbol{\Delta}_t$, and $\mathbf{P}_t$. It further leverages Attention-Sharing to exploit the high similarity between CFG and non-CFG attention blocks, reducing computation without significant quality loss. The approach achieves state-of-the-art performance on image and video generation after quantization, delivering a practical speedup of $1.38$–$1.89\times$ and a memory reduction of $1.97$–$2.58\times$ compared to existing quantization methods. By enabling finer-grained, time-aware quantization and targeted attention sharing, TR-DQ facilitates efficient deployment of diffusion models on resource-constrained hardware while maintaining high visual fidelity and temporal coherence.
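To make the roles of $\mathbf{R}_t$, $\boldsymbol{\Delta}_t$, and $\mathbf{P}_t$ concrete, here is a minimal NumPy sketch of a computationally invariant transform for one linear layer and one time-step group. It assumes (these choices are not confirmed by the source) that the three matrices compose as $\mathbf{T}_t = \boldsymbol{\Delta}_t^{-1}\mathbf{P}_t\mathbf{R}_t$, that $\boldsymbol{\Delta}_t$ holds SmoothQuant-style per-channel scales, that $\mathbf{P}_t$ sorts channels by activation magnitude, and that $\mathbf{R}_t$ is a normalized Hadamard matrix. Because $\mathbf{T}_t$ is invertible, $\mathbf{X}\mathbf{W} = (\mathbf{X}\mathbf{T}_t)(\mathbf{T}_t^{-1}\mathbf{W})$ holds exactly, so only the quantization error changes. The helper names are hypothetical.

```python
import numpy as np


def fake_quant(x: np.ndarray, n_bits: int = 4) -> np.ndarray:
    """Symmetric per-tensor fake quantization: quantize to n_bits, then dequantize."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.clip(np.round(x / scale), -qmax, qmax) * scale


def hadamard(d: int) -> np.ndarray:
    """Orthonormal Sylvester-Hadamard matrix (d must be a power of two)."""
    h = np.array([[1.0]])
    while h.shape[0] < d:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(d)


def build_transform(x_calib: np.ndarray, w: np.ndarray, alpha: float = 0.5):
    """Hypothetical T_t = Delta_t^{-1} P_t R_t and its inverse for one time-step group."""
    d = w.shape[0]
    # Delta_t: per-input-channel smoothing scales (SmoothQuant-style, an assumption)
    s = np.abs(x_calib).max(axis=0) ** alpha / np.abs(w).max(axis=1) ** (1 - alpha)
    # P_t: permutation matrix ordering channels by activation magnitude (assumption)
    perm = np.argsort(np.abs(x_calib).max(axis=0))
    p = np.eye(d)[:, perm]
    # R_t: orthogonal rotation that spreads any remaining outlier across channels
    r = hadamard(d)
    t = np.diag(1.0 / s) @ p @ r
    t_inv = r.T @ p.T @ np.diag(s)  # exact inverse, since R and P are orthogonal
    return t, t_inv


rng = np.random.default_rng(0)
x = rng.normal(size=(256, 64))
x[:, 3] *= 50.0                      # inject one "massive activation" channel
w = rng.normal(size=(64, 32)) / 8.0

t, t_inv = build_transform(x, w)
x_rot, w_rot = x @ t, t_inv @ w
y = x @ w

err_plain = np.abs(fake_quant(x) @ fake_quant(w) - y).mean()
err_rot = np.abs(fake_quant(x_rot) @ fake_quant(w_rot) - y).mean()
print(f"max|X|: {np.abs(x).max():.1f} -> {np.abs(x_rot).max():.1f}")
print(f"W4A4 mean abs error: {err_plain:.3f} -> {err_rot:.3f}")
```

In a time-aware scheme, `build_transform` would be re-run on calibration activations from each time-step group, giving one $(\mathbf{T}_t, \mathbf{T}_t^{-1})$ pair per group; each group then needs its own precomputed $\mathbf{T}_t^{-1}\mathbf{W}$ or an online transform.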

Abstract

Diffusion models have been widely adopted in image and video generation. However, their complex network architectures lead to high inference overhead during generation. Existing diffusion quantization methods primarily focus on quantizing the model structure while ignoring the impact of time-step variation during sampling. Moreover, most current approaches fail to account for massive activations that cannot be eliminated, resulting in substantial performance degradation after quantization. To address these issues, we propose Time-Rotation Diffusion Quantization (TR-DQ), a novel quantization method incorporating time-step and rotation-based optimization. TR-DQ first divides the sampling process by time step and applies a rotation matrix to smooth activations and weights dynamically. A dedicated hyperparameter is introduced for each time step for adaptive temporal modeling, enabling dynamic quantization across time steps. Additionally, we explore the compression potential of Classifier-Free Guidance (CFG) to lay a foundation for subsequent work. TR-DQ achieves state-of-the-art (SOTA) performance on image and video generation tasks, with a 1.38-1.89x speedup and a 1.97-2.58x memory reduction in inference compared to existing quantization methods.
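The per-time-step hyperparameter can be illustrated with a small calibration loop. The NumPy sketch below assumes (these details are not given in the source) that the sampling steps are split into equal contiguous groups and that the per-group hyperparameter is a SmoothQuant-style migration strength $\alpha_k$, chosen by grid search to minimize the W4A4 output error on that group's calibration activations. `group_of`, `calibrate_alphas`, and the synthetic drifting-outlier data are hypothetical.

```python
import numpy as np


def fake_quant(x: np.ndarray, n_bits: int) -> np.ndarray:
    """Symmetric per-tensor fake quantization (quantize, then dequantize)."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.clip(np.round(x / scale), -qmax, qmax) * scale


def group_of(t: int, num_steps: int, num_groups: int) -> int:
    """Map sampling step t in [0, num_steps) to one of num_groups contiguous segments."""
    return min(t * num_groups // num_steps, num_groups - 1)


def smoothed_error(x, w, alpha, n_bits):
    """Output error of a fake-quantized matmul after per-channel smoothing with strength alpha."""
    s = np.abs(x).max(axis=0) ** alpha / np.abs(w).max(axis=1) ** (1 - alpha)
    # (x / s) @ (s * w) == x @ w, so only quantization error remains
    y_q = fake_quant(x / s, n_bits) @ fake_quant(w * s[:, None], n_bits)
    return np.abs(y_q - x @ w).mean()


def calibrate_alphas(acts_per_step, w, num_groups,
                     grid=(0.0, 0.25, 0.5, 0.75, 1.0), n_bits=4):
    """Pick one smoothing hyperparameter per time-step group by grid search."""
    num_steps = len(acts_per_step)
    alphas = []
    for k in range(num_groups):
        steps = [t for t in range(num_steps) if group_of(t, num_steps, num_groups) == k]
        x = np.concatenate([acts_per_step[t] for t in steps], axis=0)
        errors = [smoothed_error(x, w, a, n_bits) for a in grid]
        alphas.append(grid[int(np.argmin(errors))])
    return alphas


rng = np.random.default_rng(0)
num_steps, num_groups = 50, 5
w = rng.normal(size=(16, 8))
acts = []
for t in range(num_steps):
    x = rng.normal(size=(32, 16))
    x[:, 0] *= 1.0 + t      # outlier channel whose magnitude drifts with the time step
    acts.append(x)

alphas = calibrate_alphas(acts, w, num_groups)
print(alphas)  # one alpha per time-step group
```

At sampling step `t`, a layer would look up `alphas[group_of(t, num_steps, num_groups)]`. The grouping assumes `num_groups <= num_steps`; otherwise some groups receive no calibration data.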

Paper Structure

This paper contains 14 sections, 11 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: Contribution of TR-DQ. (a) TR-DQ addresses both massive activations and the time-step-dependent activation distribution in diffusion models, and makes the first attempt to exploit CFG for compression. (b) Models compressed with TR-DQ outperform the current SOTA in image and video generation. (c) Models compressed with TR-DQ have significantly lower latency and memory usage.
  • Figure 2: Main pipeline of TR-DQ. TR-DQ applies a rotation matrix to the activations to reduce massive outliers and rearranges the weights, making the model smoother overall and easier to quantize. For CFG and non-CFG branches whose attention is highly similar, TR-DQ performs weight sharing, which further reduces computational cost.
  • Figure 3: Effect of Time-Rotation on data distribution. With Time-Rotation, the data distribution is smoother. $X$ denotes the activations and $W$ the weights.
  • Figure 4: Heat maps of multi-head self-attention under conditional and unconditional settings. Each square shows the similarity between the conditional and unconditional attention: redder squares indicate higher similarity, and bluer squares lower similarity.
  • Figure 5: Visualization of generated images. TR-DQ, and TR-DQ with weight sharing added, generate higher-quality images.
  • ...and 1 more figure
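Figures 2 and 4 describe exploiting the high similarity between conditional and unconditional CFG attention. The NumPy sketch below illustrates one possible reuse scheme for a single self-attention head: measure the cosine similarity of the two attention maps on calibration inputs and, above a threshold, let the unconditional branch reuse the conditional map instead of recomputing it. The sharing rule, the 0.95 threshold, and all helper names are hypothetical, and the toy inputs only mimic the high similarity reported in Figure 4.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def attention_map(q: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention probabilities."""
    return softmax(q @ k.T / np.sqrt(q.shape[-1]))


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def cfg_self_attention(cond, uncond, share: bool):
    """Self-attention for both CFG branches; cond/uncond are (q, k, v) tuples.

    With share=True the unconditional branch reuses the conditional attention map
    (hypothetical sharing rule), skipping one QK^T product and softmax."""
    q_c, k_c, v_c = cond
    q_u, k_u, v_u = uncond
    a_c = attention_map(q_c, k_c)
    a_u = a_c if share else attention_map(q_u, k_u)
    return a_c @ v_c, a_u @ v_u


rng = np.random.default_rng(0)
n_tokens, dim = 64, 32
cond = tuple(rng.normal(size=(n_tokens, dim)) for _ in range(3))
# Toy unconditional branch: nearly identical q/k (high similarity), different v.
uncond = (cond[0] + 0.02 * rng.normal(size=(n_tokens, dim)),
          cond[1] + 0.02 * rng.normal(size=(n_tokens, dim)),
          rng.normal(size=(n_tokens, dim)))

sim = cosine(attention_map(*cond[:2]), attention_map(*uncond[:2]))
share = sim > 0.95                   # hypothetical similarity threshold
out_c, out_u = cfg_self_attention(cond, uncond, share=share)
out_c_full, out_u_full = cfg_self_attention(cond, uncond, share=False)
rel_err = np.linalg.norm(out_u - out_u_full) / np.linalg.norm(out_u_full)
print(f"attention similarity {sim:.4f}, share={share}, relative error {rel_err:.4f}")
```

Sharing saves one $QK^\top$ product and softmax per shared layer in the unconditional pass, at the cost of the small output deviation measured by `rel_err`; whether the reused quantity is the attention map, the attention output, or the weights is an assumption here.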