AdaTSQ: Pushing the Pareto Frontier of Diffusion Transformers via Temporal-Sensitivity Quantization
Shaoqiu Zhang, Zizhong Ding, Kaicheng Yang, Junyi Wu, Xianglong Yan, Xi Li, Bingnan Duan, Jianping Fang, Yulun Zhang
TL;DR
This work tackles the high compute and memory demands of Diffusion Transformers (DiTs) with AdaTSQ, a temporally aware post-training quantization (PTQ) framework built from two components. First, a Pareto-aware, timestep-dynamic bit-width allocation uses constrained beam search to find a timestep-specific policy $b_t$ that minimizes end-to-end reconstruction error under a target average bit-width. Second, Fisher-guided temporal calibration reweights calibration data according to per-timestep sensitivity and plugs into Hessian-based weight optimization: weights $\alpha_{t,l}$ derived from temporal Fisher information define the risk-aware Hessian $H'_l = \sum_t \alpha_{t,l} X_{t,l} X_{t,l}^\top$. Across four state-of-the-art DiTs for image and video generation, AdaTSQ outperforms SVDQuant and ViDiT-Q and enables robust W4A4 and even W3A3 generation with substantial reductions in FLOPs and model size. The authors state that their code will be released. These results point to a practical, scalable path toward running high-fidelity diffusion models on edge devices.
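A minimal pure-Python sketch of the risk-aware Hessian for a single layer $l$, $H'_l = \sum_t \alpha_{t,l} X_{t,l} X_{t,l}^\top$. The text does not say exactly how temporal Fisher information becomes $\alpha_{t,l}$, so the sum-normalization in the hypothetical `fisher_weights` helper is an assumption, as are both function names. In a Hessian-based solver (for example a GPTQ-style update, again our assumption), this matrix would stand in for the unweighted $X X^\top$.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def fisher_weights(fisher: Sequence[float]) -> List[float]:
    """Turn per-timestep Fisher estimates F_{t,l} into weights alpha_{t,l} summing to 1.

    ASSUMPTION (hypothetical): plain sum-normalization of an empirical Fisher
    proxy such as a mean squared gradient at timestep t; AdaTSQ's exact
    estimator and normalization may differ.
    """
    total = sum(fisher)
    if total <= 0:
        # No sensitivity signal: fall back to uniform weighting over timesteps.
        return [1.0 / len(fisher)] * len(fisher)
    return [f / total for f in fisher]


def weighted_hessian(activations: Sequence[Matrix], alphas: Sequence[float]) -> Matrix:
    """Compute H'_l = sum_t alpha_{t,l} * X_{t,l} X_{t,l}^T for one layer l.

    Each X_{t,l} is a d x n matrix (d input features, n calibration tokens),
    stored as a list of d rows of length n.
    """
    d = len(activations[0])
    H = [[0.0] * d for _ in range(d)]
    for X, alpha in zip(activations, alphas):
        for i in range(d):
            for j in range(d):
                # (X X^T)[i][j] is the dot product of feature rows i and j.
                H[i][j] += alpha * sum(a * b for a, b in zip(X[i], X[j]))
    return H


if __name__ == "__main__":
    X0 = [[1.0, 0.0], [0.0, 1.0]]  # X0 X0^T = identity
    X1 = [[1.0, 1.0], [1.0, 1.0]]  # X1 X1^T = [[2, 2], [2, 2]]
    alphas = fisher_weights([1.0, 3.0])  # -> [0.25, 0.75]
    print(weighted_hessian([X0, X1], alphas))  # -> [[1.75, 1.5], [1.5, 1.75]]
```

With Fisher estimates of 1.0 and 3.0, the second timestep receives three times the weight of the first, so its activation statistics dominate $H'_l$ and the weight update prioritizes reconstruction accuracy at that more sensitive timestep.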
Abstract
Diffusion Transformers (DiTs) have emerged as the state-of-the-art backbone for high-fidelity image and video generation. However, their massive computational cost and memory footprint hinder deployment on edge devices. While post-training quantization (PTQ) has proven effective for large language models (LLMs), directly applying existing methods to DiTs yields suboptimal results because they neglect the temporal dynamics unique to diffusion processes. In this paper, we propose AdaTSQ, a novel PTQ framework that pushes the Pareto frontier of efficiency and quality by exploiting the temporal sensitivity of DiTs. First, we propose a Pareto-aware timestep-dynamic bit-width allocation strategy. We model the quantization policy search as a constrained pathfinding problem and use a beam search algorithm, guided by end-to-end reconstruction error, to dynamically assign layer-wise bit-widths across timesteps. Second, we propose a Fisher-guided temporal calibration mechanism, which leverages temporal Fisher information to prioritize calibration data from highly sensitive timesteps and integrates seamlessly with Hessian-based weight optimization. Extensive experiments on four advanced DiTs (Flux-Dev, Flux-Schnell, Z-Image, and Wan2.1) demonstrate that AdaTSQ significantly outperforms state-of-the-art methods such as SVDQuant and ViDiT-Q. Our code will be released at https://github.com/Qiushao-E/AdaTSQ.
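Treating bit-width allocation as constrained pathfinding can be illustrated with a small beam search over timesteps. This is a simplified sketch under stated assumptions, not AdaTSQ's implementation: it assigns one bit-width per timestep rather than per layer, and `score` is a hypothetical stand-in for the end-to-end reconstruction error, which in practice would require running the quantized model. Partial paths are pruned whenever even the smallest bit-width on all remaining timesteps could not meet the target average.

```python
from typing import Callable, List, Sequence, Tuple

Policy = Tuple[int, ...]  # one bit-width per timestep (simplification)


def beam_search_bits(
    num_steps: int,
    choices: Sequence[int],
    target_avg: float,
    score: Callable[[Policy], float],
    beam_width: int = 4,
) -> Policy:
    """Choose a bit-width per timestep, minimizing score() under an average-bit budget.

    Simplifying ASSUMPTIONS (hypothetical): a single bit-width per timestep
    instead of per layer, and `score` is any callable estimating the
    reconstruction error of a partial policy.
    """
    min_bits = min(choices)
    budget = target_avg * num_steps
    beam: List[Policy] = [()]
    for t in range(num_steps):
        remaining = num_steps - (t + 1)
        candidates: List[Policy] = []
        for policy in beam:
            for b in choices:
                new = policy + (b,)
                # Keep a path only if it can still meet the budget when every
                # remaining timestep uses the smallest bit-width.
                if sum(new) + remaining * min_bits <= budget:
                    candidates.append(new)
        if not candidates:
            raise ValueError("no policy meets the target average bit-width")
        # Retain the beam_width partial policies with the lowest estimated error.
        candidates.sort(key=score)
        beam = candidates[:beam_width]
    return beam[0]


if __name__ == "__main__":
    # Toy error proxy: sensitive timesteps pay more for dropping below 8 bits.
    sensitivity = [3.0, 1.0, 1.0, 2.0]

    def toy_score(policy: Policy) -> float:
        return sum(s * (8 - b) for s, b in zip(sensitivity, policy))

    best = beam_search_bits(4, [4, 8], target_avg=6.0, score=toy_score)
    print(best)  # -> (8, 4, 4, 8)
```

In the toy run, a 6-bit average over four timesteps allows exactly two 8-bit steps, and the search gives them to the two most sensitive timesteps (weights 3.0 and 2.0). Because pruning assumes the cheapest choice for all remaining steps, every surviving path stays feasible, so the returned policy never exceeds the budget.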
