
SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation

Teng Hu, Jiangning Zhang, Ran Yi, Hongrui Huang, Yabiao Wang, Lizhuang Ma

TL;DR

Fine-tuning diffusion models remains costly when adapting large pre-trained models to new domains. This work introduces SaRA, a progressive sparse low-rank adaptation method that reuses temporarily ineffective parameters (the entries with the smallest $|p|$) through sparse updates while preserving the model's priors. A nuclear-norm-based low-rank constraint mitigates overfitting, and a progressive parameter adjustment strategy reselects the parameters that remain below the threshold so that as many of them as possible are used. Unstructural backpropagation further reduces memory cost, and SaRA can be integrated with a single line of code while remaining compatible with existing methods. It outperforms LoRA in prior preservation and often achieves better VLHI across SD variants. Overall, SaRA offers a model-agnostic, memory-efficient, plug-and-play PEFT alternative that enhances downstream generative capabilities while maintaining pre-trained priors.

Abstract

In recent years, the development of diffusion models has led to significant progress in image and video generation tasks, with pre-trained models like the Stable Diffusion series playing a crucial role. Inspired by model pruning, which lightens large pre-trained models by removing unimportant parameters, we propose a novel model fine-tuning method that makes full use of these ineffective parameters and equips the pre-trained model with new task-specific capabilities. In this work, we first investigate the importance of parameters in pre-trained diffusion models and discover that the smallest 10% to 20% of parameters by absolute value do not contribute to the generation process. Based on this observation, we propose a method termed SaRA that re-utilizes these temporarily ineffective parameters, which is equivalent to optimizing a sparse weight matrix to learn the task-specific knowledge. To mitigate overfitting, we propose a nuclear-norm-based low-rank sparse training scheme for efficient fine-tuning. Furthermore, we design a new progressive parameter adjustment strategy to make full use of the re-trained/finetuned parameters. Finally, we propose a novel unstructural backpropagation strategy, which significantly reduces memory costs during fine-tuning. Our method enhances the generative capabilities of pre-trained models in downstream applications and outperforms traditional fine-tuning methods like LoRA in maintaining the model's generalization ability. We validate our approach through fine-tuning experiments on SD models, demonstrating significant improvements. SaRA also offers a practical advantage: it requires only a single line of code modification for efficient implementation and is seamlessly compatible with existing methods.
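The core idea in the abstract (pick the smallest-magnitude weights, train only those, and penalize the nuclear norm of the resulting sparse update) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the helper names `below_threshold_mask` and `nuclear_norm`, the random stand-in weight matrix, the 10% fraction, and the choice to apply the penalty to the sparse update `delta` are all assumptions made for illustration.

```python
import numpy as np

def below_threshold_mask(weights, fraction):
    """Boolean mask marking roughly `fraction` of entries with the smallest |w|.

    Hypothetical helper: the threshold is taken as a quantile of |w|.
    """
    threshold = np.quantile(np.abs(weights), fraction)
    return np.abs(weights) <= threshold

def nuclear_norm(matrix):
    """Sum of singular values, a convex surrogate for matrix rank."""
    return np.linalg.svd(matrix, compute_uv=False).sum()

rng = np.random.default_rng(0)

# Stand-in for one pre-trained weight matrix (roughly Gaussian around 0).
W = rng.normal(scale=0.05, size=(64, 32))

# Select the 10% of entries with the smallest absolute value.
mask = below_threshold_mask(W, 0.10)

# A sparse update lives only on the selected entries; here it is random,
# whereas in training it would be learned.
delta = np.zeros_like(W)
delta[mask] = rng.normal(scale=0.01, size=int(mask.sum()))

# All other pre-trained entries are left exactly as they were.
W_finetuned = W + delta

# Regularization term that would be added to the task loss (assumed form).
penalty = nuclear_norm(delta)
```

Because `delta` is zero outside the mask, about 90% of the pre-trained entries are untouched, which is the property the abstract credits for preserving the model prior.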

Paper Structure (30 sections, 13 equations, 28 figures, 10 tables, 1 algorithm)


Figures (28)

  • Figure 1: The reparameterized fine-tuning methods (b) address the additional inference latency introduced by additive fine-tuning methods (a) through reparameterizing the pre-trained weights from a global view. Selective fine-tuning methods (c) improve upon global parameter updates by employing sparse updates, which better preserve the model prior by freezing most of the pre-trained parameters. Our SaRA (d) further enhances (c) by significantly reducing memory costs and achieving superior performance in both adaptation capability and prior preservation.
  • Figure 2: (a) Weight distributions of the pre-trained parameters in Stable Diffusion (SD) 1.4, 1.5, 2.0, and 3.0. All are close to a Gaussian distribution, so a large number of parameters lie near $0$. (b) The performance (FID and CLIP Score) of SD models when the pre-trained parameters with absolute values smaller than a certain threshold are set to $0$.
  • Figure 3: The changes in the parameters whose absolute values are below the $1\%$ threshold $\theta_t$ during full-parameter fine-tuning. The blue and yellow curves show the proportions of parameters originating from the initially below-threshold set $P_{M}$ and the initially above-threshold set $P_{1-M}$.
  • Figure 4: Visualization of our unstructural backpropagation. (a) LoRA stores an additional intermediate variable $X'_{i+1}$ in each LoRA layer, and (b) selective PEFT methods store gradients for the entire parameter matrices, wasting memory and computation. (c) In contrast, our unstructural backpropagation method extracts the trainable parameters, sets them as independent leaf nodes, and retains gradients only for them, which greatly reduces the memory cost.
  • Figure 5: Computation cost on memory and time of different PEFT methods.
  • ...and 23 more figures
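
The memory argument in the Figure 4 caption (keep gradients only for the selected entries rather than for whole parameter matrices) can be sketched for a single linear layer $y = xW$. The paper describes this in terms of autograd leaf nodes; the NumPy version below is a hand-derived stand-in under assumptions, with a hypothetical helper `selected_entry_grads`, a random weight matrix, a 20% threshold, and a sum-of-squares loss chosen only for illustration.

```python
import numpy as np

def selected_entry_grads(x, grad_out, rows, cols):
    """Gradient of the loss w.r.t. W[rows[k], cols[k]] for y = x @ W.

    The dense gradient is x.T @ grad_out; this computes only the selected
    entries, so the stored result has one value per trainable parameter.
    """
    return np.einsum("bk,bk->k", x[:, rows], grad_out[:, cols])

rng = np.random.default_rng(0)

# Stand-in pre-trained weight and a batch of inputs/targets (all assumed).
W = rng.normal(scale=0.05, size=(16, 8))
x = rng.normal(size=(4, 16))
target = rng.normal(size=(4, 8))

# Trainable entries: the 20% of weights with the smallest absolute value.
threshold = np.quantile(np.abs(W), 0.2)
rows, cols = np.nonzero(np.abs(W) <= threshold)

# Forward pass and gradient of sum((y - target)**2) w.r.t. y.
y = x @ W
grad_out = 2.0 * (y - target)

# Gradients for the trainable entries only.
sparse_grads = selected_entry_grads(x, grad_out, rows, cols)

# One SGD step that touches only the selected entries.
lr = 0.01
W_new = W.copy()
W_new[rows, cols] -= lr * sparse_grads

# Dense reference, computed here only to check the sparse result.
full_grad = x.T @ grad_out
```

In this sketch `sparse_grads` holds one value per trainable entry instead of a full 16×8 matrix, which mirrors the saving the caption attributes to retaining gradients only for the extracted parameters.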