AdaTSQ: Pushing the Pareto Frontier of Diffusion Transformers via Temporal-Sensitivity Quantization

Shaoqiu Zhang, Zizhong Ding, Kaicheng Yang, Junyi Wu, Xianglong Yan, Xi Li, Bingnan Duan, Jianping Fang, Yulun Zhang

TL;DR

This work tackles the high compute and memory demands of Diffusion Transformers (DiTs) with AdaTSQ, a temporally aware post-training quantization (PTQ) framework. AdaTSQ has two components. The first is a Pareto-aware, timestep-dynamic bit-width allocation found by constrained beam search. The second is a Fisher-guided temporal calibration that reweights calibration data by per-timestep sensitivity and plugs into Hessian-based weight optimization. The allocation is a timestep-specific policy b_t chosen to minimize end-to-end reconstruction error under a target average bit-width. The calibration uses weights α_{t,l}, derived from temporal Fisher information, to build a risk-aware Hessian H'_l = Σ_t α_{t,l} X_{t,l} X_{t,l}^T for each layer l, where X_{t,l} denotes the input activations of layer l at timestep t. Across four state-of-the-art image and video DiTs, AdaTSQ outperforms SVDQuant and ViDiT-Q and enables robust W4A4 and even W3A3 generation with substantial reductions in FLOPs and model size; the authors state that code will be released. These results point to a practical path toward diffusion synthesis on edge devices, broadening access to high-fidelity generative models.
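A minimal sketch of the calibration-side formula H'_l = Σ_t α_{t,l} X_{t,l} X_{t,l}^T for a single layer, in dependency-free Python. The summary does not say exactly how α_{t,l} is derived from the Fisher information, so `fisher_weights` assumes a hypothetical rule: α is proportional to a per-timestep Fisher score and rescaled to average 1, so equal scores reduce to ordinary uniform calibration. Both function names are illustrative and are not taken from the AdaTSQ code.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def fisher_weights(fisher_scores: Sequence[float]) -> List[float]:
    """Map per-timestep Fisher scores of one layer to weights alpha_t.

    Hypothetical rule: alpha_t is proportional to the score and rescaled so the
    weights average to 1 (equal scores give uniform calibration).
    """
    total = sum(fisher_scores)
    if total <= 0:
        raise ValueError("Fisher scores must have a positive sum")
    n = len(fisher_scores)
    return [n * f / total for f in fisher_scores]


def weighted_hessian(activations: Sequence[Matrix], alphas: Sequence[float]) -> Matrix:
    """Compute H' = sum_t alpha_t * X_t X_t^T.

    Each X_t is a (d_in x n_tokens) matrix of layer-input activations at timestep t.
    """
    if len(activations) != len(alphas):
        raise ValueError("need one alpha per timestep")
    d = len(activations[0])
    h = [[0.0] * d for _ in range(d)]
    for x, a in zip(activations, alphas):
        for i in range(d):
            for j in range(d):
                # (X X^T)[i][j] is the dot product of rows i and j of X.
                h[i][j] += a * sum(xi * xj for xi, xj in zip(x[i], x[j]))
    return h


if __name__ == "__main__":
    # Two toy timesteps with d_in = 2 and two tokens each (made-up values).
    x_a = [[1.0, 0.0], [0.0, 1.0]]
    x_b = [[1.0, 1.0], [1.0, 1.0]]
    alphas = fisher_weights([3.0, 1.0])  # timestep a is assumed more sensitive
    print(alphas)  # → [1.5, 0.5]
    print(weighted_hessian([x_a, x_b], alphas))  # → [[2.5, 1.0], [1.0, 2.5]]
```

The reweighted H'_l would stand in for the uniform Hessian Σ_t X_{t,l} X_{t,l}^T inside a Hessian-based weight solver (GPTQ is one common choice; the source does not name the solver). As a result, timesteps with high Fisher scores dominate the error that the solver minimizes.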

Abstract

Diffusion Transformers (DiTs) have emerged as the state-of-the-art backbone for high-fidelity image and video generation. However, their massive computational cost and memory footprint hinder deployment on edge devices. While post-training quantization (PTQ) has proven effective for large language models (LLMs), applying existing methods directly to DiTs yields suboptimal results because they neglect the temporal dynamics unique to diffusion processes. In this paper, we propose AdaTSQ, a novel PTQ framework that pushes the Pareto frontier of efficiency and quality by exploiting the temporal sensitivity of DiTs. First, we propose a Pareto-aware timestep-dynamic bit-width allocation strategy. We model the quantization policy search as a constrained pathfinding problem and use a beam search guided by end-to-end reconstruction error to assign layer-wise bit-widths dynamically across timesteps. Second, we propose a Fisher-guided temporal calibration mechanism, which leverages temporal Fisher information to prioritize calibration data from highly sensitive timesteps and integrates seamlessly with Hessian-based weight optimization. Extensive experiments on four advanced DiTs (Flux-Dev, Flux-Schnell, Z-Image, and Wan2.1) demonstrate that AdaTSQ significantly outperforms state-of-the-art methods such as SVDQuant and ViDiT-Q. Our code will be released at https://github.com/Qiushao-E/AdaTSQ.
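The timestep-dynamic allocation can be pictured as a path through the timesteps: each step picks a bit-width, and the complete path must respect an average-bit budget. Below is a minimal beam-search sketch of that idea, not the authors' implementation. `error_fn` is a hypothetical stand-in for the end-to-end reconstruction error, which in practice would come from running the quantized sampler. The candidate bit-widths, budget, and beam width are illustrative assumptions. For brevity the sketch assigns one bit-width per timestep, whereas the abstract assigns layer-wise bit-widths per timestep.

```python
from typing import Callable, List, Sequence, Tuple

Policy = Tuple[int, ...]  # one bit-width per timestep, b_0 ... b_{T-1}


def beam_search_bits(
    num_steps: int,
    candidate_bits: Sequence[int],
    target_avg_bits: float,
    error_fn: Callable[[Policy], float],
    beam_width: int = 4,
) -> Policy:
    """Choose a bit-width per timestep that minimises error_fn under an average-bit budget."""
    min_bits = min(candidate_bits)
    budget = target_avg_bits * num_steps
    beam: List[Tuple[float, Policy]] = [(0.0, ())]
    for t in range(num_steps):
        remaining = num_steps - t - 1
        expanded: List[Tuple[float, Policy]] = []
        for _, policy in beam:
            for b in candidate_bits:
                new_policy = policy + (b,)
                # Prune partial paths that would exceed the budget even if every
                # remaining timestep used the smallest candidate bit-width.
                if sum(new_policy) + remaining * min_bits > budget:
                    continue
                expanded.append((error_fn(new_policy), new_policy))
        if not expanded:
            raise ValueError("no policy satisfies the average bit-width budget")
        # Keep the beam_width lowest-error partial policies.
        expanded.sort(key=lambda item: item[0])
        beam = expanded[:beam_width]
    return beam[0][1]


if __name__ == "__main__":
    # Toy, additive error model with made-up per-timestep sensitivities.
    sensitivity = [4.0, 1.0, 1.0, 2.0]

    def toy_error(policy: Policy) -> float:
        return sum(sensitivity[t] * 2.0 ** -b for t, b in enumerate(policy))

    print(beam_search_bits(4, (3, 4, 6), target_avg_bits=4.0, error_fn=toy_error))
    # → (6, 3, 3, 4)
```

With this toy error, the beam recovers the best feasible policy, (6, 3, 3, 4), spending spare bits on the most sensitive timesteps while keeping the average at 4 bits. One plausible reason to prefer beam search over an exact per-step dynamic program is that a true end-to-end error is generally not additive across timesteps, since quantization error at one step propagates through later denoising steps.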

Paper Structure (22 sections, 7 equations, 8 figures, 4 tables)

Figures (8)

  • Figure 1: Visual comparison of AdaTSQ with FP16 and SVDQuant [li2024svdquant] under different low-bit quantization settings. The comparison includes three text-to-image models (Flux-Dev, Flux-Schnell, Z-Image) and one text-to-video model (Wan2.1-1.3B).
  • Figure 2: Holistic Performance Comparison. Radar charts comparing AdaTSQ (blue) with baselines across four DiT models. Axes denote normalized metrics for fidelity, alignment, and consistency. AdaTSQ consistently covers the largest area, indicating the strongest overall performance across all modalities and sampling schedules.
  • Figure 3: Overview of the AdaTSQ framework. The upper panel illustrates the Pareto-aware Timestep-Dynamic Allocation, which employs beam search to find the optimal bit-width schedule. The lower panel depicts the Fisher-Guided Temporal Calibration, which leverages temporal sensitivity to re-weight the Hessian for risk-aware weight optimization.
  • Figure 4: Temporal Heterogeneity in DiTs. (a) Normalized Fisher Information reveals that layer sensitivity varies drastically across different phases of the denoising process (e.g., structure formation vs. texture refinement). (b) Violin plots illustrate significant shifts in activation distributions across timesteps. These observations collectively motivate our timestep-dynamic quantization strategy.
  • Figure 5: Reconstruction error of transformer.blocks.18.ff.net.2 on Flux-Dev across timesteps. Compared with standard uniform calibration (blue), our Fisher-guided calibration (pink) suppresses quantization noise in high-sensitivity regions, yielding a smoother, lower-risk error profile.