Memory-Efficient Fine-Tuning for Quantized Diffusion Model
Hyogon Ryu, Seohyun Lim, Hyunjung Shim
TL;DR
TuneQDM is a memory-efficient method for fine-tuning quantized, billion-parameter diffusion models. It decomposes the quantization scales into inter- and intra-channel components and assigns distinct scales to each timestep interval, capturing both weight-update patterns and the differing roles of timesteps without expanding the parameter count. In experiments, TuneQDM achieves subject and prompt fidelity close to that of full-precision models while using significantly less memory, and it outperforms the PEQA baseline, especially in 4-bit settings. By reducing memory and computation without sacrificing performance, this work makes personalization and deployment of diffusion models practical on resource-constrained platforms.
Abstract
The emergence of billion-parameter diffusion models such as Stable Diffusion XL, Imagen, and DALL-E 3 has significantly advanced generative AI. However, their large-scale architectures make fine-tuning and deployment challenging because of high resource demands and slow inference. This paper explores the relatively unexplored yet promising problem of fine-tuning quantized diffusion models. Our analysis reveals that the PEQA baseline neglects two things when fine-tuning a diffusion model: the distinct patterns in the model weights and the different roles that timesteps play. To address these limitations, we introduce TuneQDM, a novel memory-efficient fine-tuning method designed specifically for quantized diffusion models. Our approach models the quantization scales as separable functions to capture inter-channel weight patterns. It then optimizes these scales in a timestep-specific manner to reflect the role of each timestep. TuneQDM achieves performance on par with its full-precision counterpart while offering significant memory efficiency. Experimental results demonstrate that our method consistently outperforms the baseline in both single- and multi-subject generation, with subject fidelity and prompt fidelity comparable to those of the full-precision model.
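To make the two ideas in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that a "separable" scale can be written as the product of a per-output-channel factor and a per-input-channel factor, and that "timestep-specific" means one such pair of factors per contiguous interval of timesteps. The class name `SeparableTimestepScales`, the helper `quantize_per_channel`, the equal-width interval bucketing, and every parameter are hypothetical and chosen only for illustration. The sketch also makes no attempt to match the parameter budget reported for TuneQDM.

```python
import numpy as np


def quantize_per_channel(w, bits=4):
    """Uniform asymmetric quantization with one scale per output channel (row)."""
    qmax = 2 ** bits - 1
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # guard against constant rows
    zero = np.round(-w_min / scale)
    q = np.clip(np.round(w / scale) + zero, 0, qmax)
    return q, scale, zero


class SeparableTimestepScales:
    """Frozen integer weights with separable, per-interval scales (illustrative)."""

    def __init__(self, w, bits=4, num_intervals=4, num_timesteps=1000):
        # The integer weights and zero points stay fixed during fine-tuning.
        self.q, s0, self.zero = quantize_per_channel(w, bits)
        out_ch, in_ch = w.shape
        self.num_intervals = num_intervals
        self.num_timesteps = num_timesteps
        # Assumed parameterization: one (output-channel, input-channel) pair
        # of scale vectors per timestep interval. Only these would be trained.
        self.s_out = np.repeat(s0[None, :, :], num_intervals, axis=0)  # (K, out, 1)
        self.s_in = np.ones((num_intervals, 1, in_ch))                 # (K, 1, in)

    def interval(self, t):
        """Map a timestep to its interval index using equal-width buckets."""
        k = t * self.num_intervals // self.num_timesteps
        return min(k, self.num_intervals - 1)

    def dequantize(self, t):
        """Reconstruct the weight matrix used at timestep t."""
        k = self.interval(t)
        # Broadcasting forms the outer product of the two scale vectors.
        return self.s_out[k] * self.s_in[k] * (self.q - self.zero)

    def num_trainable(self):
        """Number of scale parameters under this assumed parameterization."""
        return self.s_out.size + self.s_in.size


rng = np.random.default_rng(0)
w = rng.normal(size=(8, 16))
layer = SeparableTimestepScales(w, bits=4, num_intervals=4, num_timesteps=1000)
print(layer.dequantize(0).shape, layer.num_trainable(), w.size)
```

In this sketch the input-channel factors start at one, so every interval initially reproduces the ordinary per-channel dequantized weights. Fine-tuning would then update only `s_out` and `s_in`, which lets the effective weights differ between early and late denoising steps while the integer matrix stays shared.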
