DVD-Quant: Data-free Video Diffusion Transformers Quantization

Zhiteng Li, Hanxuan Li, Junyi Wu, Kai Liu, Haotong Qin, Linghe Kong, Guihai Chen, Yulun Zhang, Xiaokang Yang

TL;DR

Diffusion Transformers enable high-fidelity video generation but are hindered by heavy compute and memory demands. DVD-Quant is a data-free PTQ framework for Video DiTs that combines Bounded-init Grid Refinement, Auto-scaling Rotated Quantization, and δ-Guided Bit Switching to reduce quantization error without calibration data while adapting bit-width across timesteps. The approach achieves an approximately $2\times$ speedup over full-precision baselines while maintaining visual fidelity, and it is the first to enable W4A4 PTQ for Video DiTs without compromising video quality; it is also compatible with cache-based acceleration such as TeaCache. These results make high-quality video diffusion models more practical to deploy on resource-constrained hardware.

Abstract

Diffusion Transformers (DiTs) have emerged as the state-of-the-art architecture for video generation, yet their computational and memory demands hinder practical deployment. While post-training quantization (PTQ) presents a promising approach to accelerate Video DiT models, existing methods suffer from two critical limitations: (1) dependence on computation-heavy and inflexible calibration procedures, and (2) considerable performance deterioration after quantization. To address these challenges, we propose DVD-Quant, a novel data-free quantization framework for Video DiTs. Our approach integrates three key innovations: (1) Bounded-init Grid Refinement (BGR) and (2) Auto-scaling Rotated Quantization (ARQ) for calibration data-free quantization error reduction, as well as (3) $\delta$-Guided Bit Switching ($\delta$-GBS) for adaptive bit-width allocation. Extensive experiments across multiple video generation benchmarks demonstrate that DVD-Quant achieves an approximately $2\times$ speedup over full-precision baselines on advanced DiT models while maintaining visual fidelity. Notably, DVD-Quant is the first to enable W4A4 PTQ for Video DiTs without compromising video quality. Code and models will be available at https://github.com/lhxcs/DVD-Quant.
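
To make the bit-width notation concrete: W4A4 denotes 4-bit weights and 4-bit activations. The sketch below shows plain symmetric min-max quantization, the generic baseline scheme, and is not the BGR or ARQ procedure from the paper. The function names and the sample weights are hypothetical and chosen only for illustration.

```python
def minmax_quantize(values, bits):
    """Symmetric min-max fake quantization (generic baseline, not BGR/ARQ).

    Maps each value to a signed `bits`-bit integer code, then dequantizes.
    Returns (dequantized values, integer codes, scale).
    """
    qmax = 2 ** (bits - 1) - 1            # 7 for signed 4-bit, 127 for 8-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest grid point and clamp to the representable range.
    codes = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return [c * scale for c in codes], codes, scale


def mean_squared_error(a, b):
    """Average squared difference between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical weight values, used only to compare bit-widths.
weights = [0.02, -0.15, 0.07, 0.31, -0.28, 0.11, -0.05, 0.19]
w4, codes4, scale4 = minmax_quantize(weights, bits=4)
w8, codes8, scale8 = minmax_quantize(weights, bits=8)
err4 = mean_squared_error(weights, w4)
err8 = mean_squared_error(weights, w8)
print(f"4-bit MSE: {err4:.2e}, 8-bit MSE: {err8:.2e}")
```

The 4-bit grid has only 16 levels, so its rounding error is far larger than at 8 bits; this gap is the kind of degradation that low-bit PTQ methods aim to close.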

Paper Structure

This paper contains 15 sections, 9 equations, 6 figures, 5 tables, 1 algorithm.

Figures (6)

  • Figure 1: DVD-Quant generates high-fidelity videos under both W4A6 (mixed-precision) and W4A4 settings, while baseline methods fail under low-bit activation quantization. DVD-Quant remains effective even in such extreme scenarios.
  • Figure 2: Overview of DVD-Quant. Bounded-init Grid Refinement and Auto-scaling Rotated Quantization are data-free methods designed to reduce quantization errors for weights and activations, respectively. $\delta$-Guided Bit Switching adaptively assigns bit-widths to different time steps.
  • Figure 3: Quantization error comparison: MinMax vs. BGR across layers on HunyuanVideo.
  • Figure 4: Visualization of activation distribution before and after rotation.
  • Figure 5: Visual comparisons on HunyuanVideo between DVD-Quant, the BF16 baseline, and the quantization methods MinMax, SmoothQuant, QuaRot, and ViDiT-Q. * indicates 8 for baselines (W4A8) and 6 for DVD-Quant (W4A6, mixed-precision).
  • ...and 1 more figure
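
The effect that Figure 4 refers to, a flatter activation distribution after rotation, can be illustrated with a minimal sketch. It assumes a normalized Walsh-Hadamard transform as the rotation, which is one common orthogonal choice in rotation-based quantization; it does not reproduce the auto-scaling step of ARQ, and the activation and weight values are hypothetical.

```python
import math


def hadamard_rotate(x):
    """Apply a normalized Walsh-Hadamard transform (an orthogonal rotation).

    The input length must be a power of two.
    """
    y = list(x)
    n = len(y)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        # Butterfly step: combine pairs that are h positions apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = y[i], y[i + h]
                y[i], y[i + h] = a + b, a - b
        h *= 2
    norm = math.sqrt(n)
    return [v / norm for v in y]


# Hypothetical activation vector with one large outlier channel.
activations = [0.1, -0.2, 0.05, 8.0, 0.15, -0.1, 0.2, -0.05]
weights = [0.3, -0.1, 0.2, 0.05, -0.25, 0.4, -0.15, 0.1]

rot_a = hadamard_rotate(activations)
rot_w = hadamard_rotate(weights)

peak_before = max(abs(v) for v in activations)
peak_after = max(abs(v) for v in rot_a)
dot_before = sum(a * w for a, w in zip(activations, weights))
dot_after = sum(a * w for a, w in zip(rot_a, rot_w))

print(f"peak |activation|: {peak_before:.3f} -> {peak_after:.3f}")
print(f"dot product preserved: {math.isclose(dot_before, dot_after, abs_tol=1e-9)}")
```

Because the rotation is orthogonal, rotating both operands leaves the layer output unchanged, while the outlier's energy is spread across all channels. The smaller peak value lets a min-max quantizer use a finer step size for the same bit-width.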