Memory-Efficient Fine-Tuning for Quantized Diffusion Model

Hyogon Ryu, Seohyun Lim, Hyunjung Shim

TL;DR

TuneQDM addresses the resource bottlenecks of fine-tuning billion-parameter diffusion models by introducing memory-efficient techniques for quantized weights. It decomposes quantization scales into inter- and intra-channel components and assigns distinct scales per timestep interval, capturing both weight-update patterns and timestep roles without expanding parameter count. Empirical results show TuneQDM achieving high subject and prompt fidelity close to full-precision models while using significantly less memory, outperforming the baseline PEQA approach, especially in 4-bit settings. This work enables practical personalization and deployment of diffusion models on resource-constrained platforms by reducing memory and computation without sacrificing performance.

Abstract

The emergence of billion-parameter diffusion models such as Stable Diffusion XL, Imagen, and DALL-E 3 has significantly advanced generative AI. However, their large-scale architectures make fine-tuning and deployment challenging because of high resource demands and slow inference. This paper explores the relatively unexplored yet promising area of fine-tuning quantized diffusion models. Our analysis reveals that the baseline neglects both the distinct patterns in model weights and the different roles of individual timesteps when fine-tuning the diffusion model. To address these limitations, we introduce TuneQDM, a novel memory-efficient fine-tuning method designed specifically for quantized diffusion models. Our approach formulates quantization scales as separable functions to account for inter-channel weight patterns, and then optimizes these scales in a timestep-specific manner to reflect the role of each timestep. TuneQDM achieves performance on par with its full-precision counterpart while offering significant memory efficiency. Experimental results demonstrate that our method consistently outperforms the baseline in both single- and multi-subject generation, exhibiting subject fidelity and prompt fidelity comparable to the full-precision model.
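
The two ideas named above, separable quantization scales and timestep-specific scales, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the per-output-channel quantizer, the choice of four equal-width timestep intervals, and the initialization (input-side scales set to one) are all assumptions made for illustration. The integer weights stay frozen, and only the scale vectors would be trained.

```python
import numpy as np

def quantize_uniform(w, n_bits=4):
    """Asymmetric per-output-channel uniform quantization (assumed scheme).
    Performed once; the integer weights are frozen afterwards."""
    q_max = 2 ** n_bits - 1
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / q_max               # shape (out, 1)
    scale = np.where(scale == 0, 1.0, scale)      # guard against constant rows
    zero = np.round(-w_min / scale)               # shape (out, 1)
    w_int = np.clip(np.round(w / scale) + zero, 0, q_max).astype(np.uint8)
    return w_int, scale, zero

def interval_index(t, num_timesteps=1000, num_intervals=4):
    """Map a diffusion timestep to one of several equal-width intervals
    (the interval count and widths are hypothetical)."""
    return min(t * num_intervals // num_timesteps, num_intervals - 1)

def dequantize(w_int, zero, s_out, s_in, t):
    """Rebuild the weight from frozen integers and the separable scale
    s_out[k] * s_in[k] that belongs to the interval k containing t."""
    k = interval_index(t, num_intervals=s_out.shape[0])
    scale_2d = s_out[k] * s_in[k]                 # (out, 1) * (1, in) -> (out, in)
    return scale_2d * (w_int.astype(np.float32) - zero)

# Toy weight matrix standing in for one layer of a pretrained model.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 16)).astype(np.float32)
w_int, scale, zero = quantize_uniform(w, n_bits=4)

# Trainable parameters: one pair of scale vectors per timestep interval.
# Initialized so that every interval reproduces the plain dequantized weight.
num_intervals = 4
s_out = np.repeat(scale[None, :, :], num_intervals, axis=0)        # (K, out, 1)
s_in = np.ones((num_intervals, 1, w.shape[1]), dtype=np.float32)   # (K, 1, in)

w_hat = dequantize(w_int, zero, s_out, s_in, t=500)
print(w_hat.shape)
```

In this sketch, each interval stores one vector per output channel and one per input channel rather than a full matrix of scales, which is what keeps the trainable state small relative to the weights.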

Paper Structure (30 sections, 5 equations, 14 figures, 8 tables, 1 algorithm)

Figures (14)

  • Figure 1: Comparison between fine-tuning a full-precision (fp) model and fine-tuning a quantized diffusion model with the baseline. Unlike the fp model, the baseline cannot achieve both prompt fidelity and subject fidelity simultaneously. Up to 400 iterations, it retains high image quality but fails to accurately reflect reference features. After 500 iterations, the ocean disappears, and further training leads to noticeable artifacts. Blue boxes indicate where the ocean is present, while red boxes highlight areas where the ocean should be but is missing. A unique token, [V], is used as an identifier describing images provided by users.
  • Figure 2: Weight change ratio after fine-tuning. The left side shows the weight change ratio of the fp model and the baseline as 2D image plots, and the right side shows it as inter-channel-wise boxplots. There is a clear difference between the baseline and the fp model.
  • Figure 3: Scenario of utilizing quantized pretrained models. Above: The full model is loaded and fine-tuned. Below: The quantized model is loaded and used. As model sizes increase, fine-tuning requires significant computational cost. Therefore, directly fine-tuning the quantized model offers various efficiency advantages for users.
  • Figure 4: Multi-channel-wise scale. Left: Our method requires quantization to be performed only once on a pretrained model, enabling subsequent fine-tuning across various tasks without large computation costs. Right: When switching tasks, the scale pairs should be switched together. This simplifies task switching and allows the quantized model to be easily adapted to different tasks.
  • Figure 5: Qualitative comparisons of single-subject generation. We compared the fp model, TuneQDM, and the baseline, each fine-tuned on the target images. Subject fidelity and prompt fidelity were assessed for images generated by both TuneQDM and the baseline.
  • ...and 9 more figures
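
The task switching described in the Figure 4 caption can be sketched as follows. This is a hypothetical illustration, not the paper's code. The class `QuantizedLinear`, the method `load_scales`, and the task names are invented for the example, and the toy integer weights and scale values are arbitrary. It assumes that the integer weights are quantized once and shared, and that each task owns only a pair of scale vectors that are swapped together.

```python
import numpy as np

class QuantizedLinear:
    """Toy linear layer holding frozen integer weights (hypothetical class)."""

    def __init__(self, w_int, zero):
        self.w_int = w_int      # quantized once, shared by every task
        self.zero = zero        # zero point per output channel
        self.s_out = None       # output-channel scales, shape (out, 1)
        self.s_in = None        # input-channel scales, shape (1, in)

    def load_scales(self, s_out, s_in):
        # Both halves of the scale pair are replaced together.
        self.s_out, self.s_in = s_out, s_in

    def __call__(self, x):
        w = (self.s_out * self.s_in) * (self.w_int.astype(np.float32) - self.zero)
        return x @ w.T

# Shared 4-bit integer weights for a layer with 3 inputs and 2 outputs.
w_int = np.array([[0, 15, 7], [3, 8, 12]], dtype=np.uint8)
zero = np.array([[8.0], [8.0]], dtype=np.float32)
layer = QuantizedLinear(w_int, zero)

# One scale pair per task; only these small vectors differ between tasks.
task_scales = {
    "task_a": (np.array([[0.1], [0.2]], dtype=np.float32),
               np.ones((1, 3), dtype=np.float32)),
    "task_b": (np.array([[0.1], [0.2]], dtype=np.float32),
               np.array([[1.0, 0.5, 2.0]], dtype=np.float32)),
}

x = np.ones((1, 3), dtype=np.float32)
layer.load_scales(*task_scales["task_a"])
y_a = layer(x)
layer.load_scales(*task_scales["task_b"])
y_b = layer(x)
print(y_a.shape, y_b.shape)
```

Because the integer weights are never modified, returning to an earlier task only requires reloading its scale pair.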