Pioneering 4-Bit FP Quantization for Diffusion Models: Mixup-Sign Quantization and Timestep-Aware Fine-Tuning
Maosen Zhao, Pengtao Chen, Chong Yu, Yan Wen, Xudong Tan, Tao Chen
TL;DR
This work tackles 4-bit floating-point (FP) quantization of diffusion models, a setting where conventional integer (INT) quantization and post-training quantization (PTQ) fine-tuning struggle. It introduces Mixup-Sign Floating-Point Quantization (MSFP), which handles asymmetric activations by applying unsigned FP quantization with a zero point to layers with anomalous activation distributions while keeping signed FP for normal layers. It also uses a timestep-aware LoRA router (TALoRA) that allocates multiple LoRAs across diffusion timesteps. To align fine-tuning with the actual impact of quantization, the method adds denoising-factor loss alignment (DFA), which scales the loss by the denoising factor $\gamma_t$. Extensive experiments with DDIM and LDM pipelines on multiple datasets show state-of-the-art 4-bit FP performance for diffusion models, 6-bit results that closely approach full precision, and clear gains over PTQ baselines. Together, these results suggest the approach is practical for efficient diffusion-model deployment.
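A minimal NumPy sketch of the mixup-sign idea follows. It compares signed FP4 against unsigned FP4 with a zero point and keeps whichever has lower quantization error on calibration data. Several details are assumptions and do not come from the paper. The formats are signed E2M1, with 15 levels up to ±6, and unsigned E2M2, with 16 levels up to 7. Scaling is per-tensor min-max. An MSE-based selection rule stands in for the paper's detection of anomalous-activation layers. All function names are hypothetical.

```python
import numpy as np


def fp_grid(exp_bits: int, man_bits: int, signed: bool) -> np.ndarray:
    """Every finite value of a tiny IEEE-style float format (no inf/NaN codes)."""
    bias = 2 ** (exp_bits - 1) - 1
    mags = []
    for e in range(2 ** exp_bits):
        for m in range(2 ** man_bits):
            frac = m / 2 ** man_bits
            if e == 0:  # subnormals: no implicit leading 1
                mags.append(frac * 2.0 ** (1 - bias))
            else:       # normals: implicit leading 1
                mags.append((1.0 + frac) * 2.0 ** (e - bias))
    grid = np.unique(mags)
    if signed:  # mirror magnitudes; +0 and -0 collapse into one level
        grid = np.unique(np.concatenate([-grid, grid]))
    return grid


SIGNED_E2M1 = fp_grid(2, 1, signed=True)     # 15 levels in [-6, 6]
UNSIGNED_E2M2 = fp_grid(2, 2, signed=False)  # 16 levels in [0, 7]


def round_to_grid(v: np.ndarray, grid: np.ndarray) -> np.ndarray:
    """Round each element to the nearest representable grid value."""
    idx = np.abs(v[..., None] - grid).argmin(axis=-1)
    return grid[idx]


def quant_signed(x: np.ndarray, grid: np.ndarray = SIGNED_E2M1) -> np.ndarray:
    """Signed FP: symmetric range, max|x| maps to the largest code."""
    scale = max(np.abs(x).max() / grid.max(), 1e-12)
    return round_to_grid(x / scale, grid) * scale


def quant_unsigned_zp(x: np.ndarray, grid: np.ndarray = UNSIGNED_E2M2) -> np.ndarray:
    """Unsigned FP with a zero point: shift so min(x) lands on code 0."""
    zero_point = x.min()
    scale = max((x.max() - zero_point) / grid.max(), 1e-12)
    return round_to_grid((x - zero_point) / scale, grid) * scale + zero_point


def choose_format(x: np.ndarray) -> str:
    """Hypothetical per-layer rule: keep whichever 4-bit format has lower MSE."""
    err_signed = np.mean((quant_signed(x) - x) ** 2)
    err_unsigned = np.mean((quant_unsigned_zp(x) - x) ** 2)
    return "unsigned" if err_unsigned < err_signed else "signed"


if __name__ == "__main__":
    t = np.linspace(-5.0, 5.0, 1001)
    silu = t / (1.0 + np.exp(-t))  # asymmetric: min is about -0.28
    gauss = np.random.default_rng(0).standard_normal(10_000)  # roughly symmetric
    print(choose_format(silu))   # unsigned
    print(choose_format(gauss))  # signed
```

SiLU outputs barely go below zero, so a signed format spends about half of its levels on a negative range that is almost empty. The zero-point shift instead lets every unsigned level cover the observed range. For symmetric, bell-shaped data the signed format wins, because it keeps its finest spacing around zero, where most values lie.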
Abstract
Model quantization reduces the bit-width of weights and activations, improving memory efficiency and inference speed in diffusion models. However, 4-bit quantization remains challenging. Existing methods, primarily based on integer quantization and post-training quantization (PTQ) fine-tuning, deliver inconsistent performance. Inspired by the success of floating-point (FP) quantization in large language models, we explore low-bit FP quantization for diffusion models and identify three key challenges. First, signed FP quantization fails to handle asymmetric activation distributions. Second, fine-tuning gives insufficient consideration to the temporal complexity of the denoising process. Third, the fine-tuning loss is misaligned with the quantization error. To address these challenges, we propose the mixup-sign floating-point quantization (MSFP) framework, the first to introduce unsigned FP quantization into model quantization. MSFP is paired with timestep-aware LoRA (TALoRA) and denoising-factor loss alignment (DFA), which together enable precise and stable fine-tuning. Extensive experiments show that our method is the first to achieve superior performance with 4-bit FP quantization of diffusion models. It also outperforms existing PTQ fine-tuning methods that use 4-bit INT quantization.
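The minimal NumPy sketch below shows the forward computations of the two fine-tuning components, TALoRA and DFA. Several assumptions are not taken from the paper:

- The noise schedule is a linear DDPM $\beta$ schedule with T = 1000.
- $\gamma_t$ is the coefficient on the predicted noise in the DDPM reverse step, $\gamma_t = \beta_t / (\sqrt{\alpha_t}\sqrt{1-\bar\alpha_t})$.
- The router is a single linear map on a sinusoidal timestep embedding, followed by hard argmax selection.
- DFA is the noise-prediction MSE multiplied by $\gamma_t$.

All class and function names (TimestepRoutedLoRALinear, dfa_loss, timestep_embedding) are hypothetical. Training of the router and LoRA weights is omitted.

```python
import numpy as np

# Assumed linear DDPM noise schedule with T = 1000 steps (index t = 0 .. T-1).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)
# Assumed denoising factor: the weight on the predicted noise in the DDPM step
#   x_{t-1} = x_t / sqrt(alpha_t) - gamma_t * eps_theta(x_t, t) + sigma_t * z
gammas = betas / (np.sqrt(alphas) * np.sqrt(1.0 - alpha_bars))


def timestep_embedding(t: int, dim: int = 16) -> np.ndarray:
    """Transformer-style sinusoidal embedding of an integer timestep."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    return np.concatenate([np.sin(t * freqs), np.cos(t * freqs)])


class TimestepRoutedLoRALinear:
    """Frozen (e.g. quantized) linear layer plus K LoRA branches.

    Hypothetical router: one linear map on the timestep embedding, then a hard
    argmax that selects a single LoRA for the whole step.
    """

    def __init__(self, w_frozen, num_loras=4, rank=4, emb_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = w_frozen.shape
        self.w = w_frozen                                   # never updated
        self.A = rng.standard_normal((num_loras, rank, d_in)) * 0.01
        self.B = np.zeros((num_loras, d_out, rank))        # usual LoRA init: B = 0
        self.router_w = rng.standard_normal((num_loras, emb_dim))
        self.emb_dim = emb_dim

    def route(self, t: int) -> int:
        """Index of the LoRA branch used at timestep t."""
        logits = self.router_w @ timestep_embedding(t, self.emb_dim)
        return int(np.argmax(logits))

    def __call__(self, x: np.ndarray, t: int) -> np.ndarray:
        k = self.route(t)
        # base path + low-rank correction from the LoRA chosen for this timestep
        return x @ self.w.T + (x @ self.A[k].T) @ self.B[k].T


def dfa_loss(eps_full: np.ndarray, eps_quant: np.ndarray, t: int) -> float:
    """Assumed DFA form: noise-prediction MSE scaled by gamma_t."""
    return float(gammas[t] * np.mean((eps_full - eps_quant) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    layer = TimestepRoutedLoRALinear(rng.standard_normal((8, 16)))
    x = rng.standard_normal((2, 16))
    for t in (10, 500, 990):
        print(t, layer.route(t), layer(x, t).shape)
    print(gammas[[0, 500, 999]])
```

Every B matrix starts at zero, so the layer initially reproduces the frozen layer's output exactly. Fine-tuning would then update only the LoRA factors while the quantized weight stays fixed; a learned-router variant would also update the router. Under the assumed definition, weighting by $\gamma_t$ makes a timestep count more in the loss when its noise-prediction error moves $x_{t-1}$ further.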
