ConvRot: Rotation-Based Plug-and-Play 4-bit Quantization for Diffusion Transformers

Feice Huang; Zuliang Han; Xing Zhou; Yihuang Chen; Lifei Zhu; Haoqian Wang

TL;DR

The paper tackles the memory and latency costs of diffusion transformers by introducing ConvRot, a group-wise rotation based on the regular Hadamard transform (RHT) that suppresses both row-wise and column-wise outliers. This enables training-free W4A4 inference through the ConvLinear4bit module, which reduces memory use and latency while preserving image quality. The authors provide a theoretical foundation for regular Hadamard matrices, including a Kronecker construction, and report a 2.26$\times$ speedup and 4.05$\times$ memory reduction on FLUX.1-dev with competitive visual fidelity. The work positions rotation-based quantization as a practical tool for deploying large diffusion models on commodity hardware.
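To make the Kronecker construction mentioned above concrete, here is a minimal sketch in plain Python. It relies on a standard fact: the Kronecker product of two regular Hadamard matrices is again a regular Hadamard matrix, with row sums that multiply. The 4x4 base matrix `H4` and the helper names `kron`, `is_hadamard`, and `is_regular` are illustrative assumptions, not taken from the paper.

```python
# Assumed base matrix: a 4x4 regular Hadamard matrix (every row and column sums to 2).
H4 = [
    [1, 1, 1, -1],
    [1, 1, -1, 1],
    [1, -1, 1, 1],
    [-1, 1, 1, 1],
]


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def is_hadamard(h):
    """Check entries are +/-1 and H @ H^T == n * I."""
    n = len(h)
    if any(v not in (1, -1) for row in h for v in row):
        return False
    for i in range(n):
        for j in range(n):
            dot = sum(h[i][k] * h[j][k] for k in range(n))
            if dot != (n if i == j else 0):
                return False
    return True


def is_regular(h):
    """Check that all row sums and all column sums are equal."""
    sums = [sum(row) for row in h] + [sum(col) for col in zip(*h)]
    return len(set(sums)) == 1


# Build a 16x16 regular Hadamard matrix from the 4x4 base.
H16 = kron(H4, H4)
```

Under these assumptions, `H16` has constant row and column sum 4 (the product of the base row sums), and repeating `kron` yields regular Hadamard matrices of order 4^k.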

Abstract

Diffusion transformers have demonstrated strong capabilities in generating high-quality images. However, as model size increases, the growing memory footprint and inference latency pose significant challenges for practical deployment. Recent studies on large language models (LLMs) show that rotation-based techniques can smooth outliers and enable 4-bit quantization, but these approaches often incur substantial overhead and struggle with the row-wise outliers found in diffusion transformers. To address these challenges, we propose ConvRot, a group-wise rotation-based quantization method that leverages the regular Hadamard transform (RHT) to suppress both row-wise and column-wise outliers while reducing complexity from quadratic to linear. Building on this, we design ConvLinear4bit, a plug-and-play module that integrates rotation, quantization, GEMM, and dequantization, enabling W4A4 inference without retraining while preserving visual quality. Experiments on FLUX.1-dev demonstrate a 2.26$\times$ speedup and 4.05$\times$ memory reduction while maintaining image fidelity. To our knowledge, this is the first application of rotation-based quantization for plug-and-play W4A4 inference in diffusion transformers.
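The pipeline described in the abstract (rotation, quantization, GEMM, dequantization) can be sketched on a single dot product in plain Python. This is a hedged illustration, not the ConvLinear4bit implementation: the group size of 4, the base matrix `H4`, the per-tensor symmetric int4 scaling, and the names `rotate_groups` and `quantize_int4` are all assumptions made for the example.

```python
import math

# Assumed 4x4 regular Hadamard matrix; H4 / 2 is orthonormal.
H4 = [
    [1, 1, 1, -1],
    [1, 1, -1, 1],
    [1, -1, 1, 1],
    [-1, 1, 1, 1],
]


def rotate_groups(x, h):
    """Rotate each contiguous group of len(h) entries by h / sqrt(len(h)).

    Cost is O(len(x) * g) for group size g, i.e. linear in len(x) for fixed g,
    compared with O(len(x)^2) for one full-size rotation matrix.
    """
    g = len(h)
    assert len(x) % g == 0, "length must be a multiple of the group size"
    norm = math.sqrt(g)
    out = []
    for start in range(0, len(x), g):
        block = x[start:start + g]
        out.extend(sum(h[i][j] * block[j] for j in range(g)) / norm for i in range(g))
    return out


def quantize_int4(v):
    """Symmetric per-tensor quantization to integers in [-8, 7] (assumed scheme)."""
    scale = max(abs(t) for t in v) / 7 or 1.0
    q = [max(-8, min(7, round(t / scale))) for t in v]
    return q, scale


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


# Activations with one large outlier, and a matching weight vector.
x = [0.1, 0.2, -0.1, 8.0, 0.3, -0.2, 0.1, 0.0]
w = [0.5, -0.4, 0.3, 0.2, -0.1, 0.6, -0.3, 0.4]

# Rotate both operands; an orthonormal rotation leaves the dot product unchanged.
xr, wr = rotate_groups(x, H4), rotate_groups(w, H4)

# Quantize, multiply in integers, then dequantize with the two scales.
qx, sx = quantize_int4(xr)
qw, sw = quantize_int4(wr)
approx = sx * sw * dot(qx, qw)
exact = dot(x, w)
```

In this toy input the rotation spreads the outlier across its group, so the largest activation magnitude drops from 8.0 to about 4.2 before quantization, which leaves more of the int4 range for the small entries.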
