FouRA: Fourier Low Rank Adaptation

Shubhankar Borse, Shreya Kadambi, Nilesh Prasad Pandey, Kartikeya Bhardwaj, Viswanath Ganapathy, Sweta Priyadarshi, Risheek Garrepalli, Rafael Esteves, Munawar Hayat, Fatih Porikli

TL;DR

FouRA tackles distribution collapse and data copying in LoRA-fine-tuned diffusion models by performing low-rank adaptation in the Fourier frequency domain and introducing an input-dependent adaptive rank gate. It learns decorrelated, compact spectral subspaces and folds the adapter strength $\alpha$ into the low-rank subspace, which makes merging multiple adapters more robust. The work provides theoretical results showing that frequency-domain tuning yields a more compact singular-value spectrum and lower generalization error for an optimal rank. Empirically, it improves image quality and diversity (measured with LPIPS) and delivers competitive GLUE results on language tasks. Overall, FouRA is a flexible approach to parameter-efficient fine-tuning across vision and language domains that produces diverse, high-quality outputs and supports multi-adapter fusion.

Abstract

While Low-Rank Adaptation (LoRA) has proven beneficial for efficiently fine-tuning large models, LoRA fine-tuned text-to-image diffusion models lack diversity in the generated images, as the model tends to copy data from the observed training samples. This effect becomes more pronounced at higher values of adapter strength and for adapters with higher ranks which are fine-tuned on smaller datasets. To address these challenges, we present FouRA, a novel low-rank method that learns projections in the Fourier domain along with learning a flexible input-dependent adapter rank selection strategy. Through extensive experiments and analysis, we show that FouRA successfully solves the problems related to data copying and distribution collapse while significantly improving the generated image quality. We demonstrate that FouRA enhances the generalization of fine-tuned models thanks to its adaptive rank selection. We further show that the learned projections in the frequency domain are decorrelated and prove effective when merging multiple adapters. While FouRA is motivated for vision tasks, we also demonstrate its merits for language tasks on the GLUE benchmark.
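
The mechanism described in the abstract (projections learned in a frequency domain, combined with an input-dependent choice of rank) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The orthonormal DCT applied along the feature axis, the sigmoid-threshold gate, and the names `dct_matrix`, `freq_adapter_forward`, and `gate_w` are all assumptions made for illustration; the paper's exact transform and gating function are not given in this summary.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of shape (n, n); its inverse is its transpose."""
    k = np.arange(n)[:, None]   # frequency index
    i = np.arange(n)[None, :]   # sample index
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (i + 0.5) * k / n)
    d[0, :] /= np.sqrt(2.0)     # rescale the DC row so rows are orthonormal
    return d

def freq_adapter_forward(x, w0, a, b, gate_w, alpha=1.0):
    """Frozen linear layer plus a low-rank update computed in a frequency basis.

    x: (batch, d_in), w0: (d_out, d_in) frozen base weight,
    a: (r, d_in) down projection, b: (d_out, r) up projection,
    gate_w: (r, r) weights of a hypothetical gate (an assumption, not the paper's).
    """
    t_in = dct_matrix(x.shape[1])
    t_out = dct_matrix(w0.shape[0])
    x_f = x @ t_in.T                       # transform features to the frequency basis
    z = x_f @ a.T                          # project down to rank r
    # Hypothetical input-dependent gate: switches individual ranks on or off per input.
    gate = (1.0 / (1.0 + np.exp(-(z @ gate_w.T))) > 0.5).astype(x.dtype)
    z = alpha * gate * z                   # adapter strength applied inside the low-rank space
    delta = (z @ b.T) @ t_out              # project up, then apply the inverse transform
    return x @ w0.T + delta, gate

# Small random example with illustrative sizes.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w0 = rng.normal(size=(6, 8))
a = rng.normal(size=(2, 8))
b = rng.normal(size=(6, 2))
gate_w = rng.normal(size=(2, 2))
y, gate = freq_adapter_forward(x, w0, a, b, gate_w, alpha=0.8)
```

In this sketch the gate zeroes out individual rank components for each input, so the rank actually used varies with the input, and $\alpha$ scales the signal inside the low-rank subspace rather than the final output.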

Paper Structure (52 sections, 2 theorems, 15 equations, 25 figures, 10 tables)

Key Result

Corollary 4.0.1

Additionally, the generalization bound is more stable when the singular value distribution of adapter weights $\Delta\mathbf{W}$ is more compact.
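
One way to make "compact singular-value distribution" concrete is to compare the spread of the singular values of $\Delta\mathbf{W}$ across adapters. The sketch below assumes $\Delta\mathbf{W} = \mathbf{B}\mathbf{A}$ as in standard LoRA and uses the coefficient of variation (standard deviation divided by mean) as a hypothetical compactness measure. The paper's own measure is not stated in this summary, and the function names are illustrative.

```python
import numpy as np

def nonzero_singular_values(delta_w, rel_tol=1e-10):
    """Singular values of delta_w, dropping those that are numerically zero."""
    s = np.linalg.svd(delta_w, compute_uv=False)
    if s.size == 0 or s.max() == 0.0:
        return s[:0]
    return s[s > rel_tol * s.max()]

def singular_value_spread(delta_w):
    """Coefficient of variation of the nonzero singular values (0.0 means all equal).

    This is an assumed proxy for 'compactness': smaller means more compact.
    """
    s = nonzero_singular_values(delta_w)
    if s.size == 0:
        return 0.0
    return float(np.std(s) / np.mean(s))

# Illustrative comparison of two random rank-4 updates with different scaling.
rng = np.random.default_rng(0)
a = rng.normal(size=(4, 32))
b = rng.normal(size=(16, 4))
spread_even = singular_value_spread(b @ a)
spread_skewed = singular_value_spread((b * np.array([10.0, 1.0, 0.1, 0.01])) @ a)
```

Under this proxy, an adapter whose singular values are all of similar magnitude scores close to zero, while one dominated by a few large singular values scores higher.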

Figures (25)

  • Figure 1: Distribution collapse with LoRA. Visual results generated by the Realistic Vision 3.0 model trained with LoRA and FouRA, for "Blue Fire" and "Origami" style adapters across four seeds. While LoRA images suffer from distribution collapse and lack diversity, we observe diverse images generated by FouRA.
  • Figure 2: LoRA vs. FouRA. For FouRA, we transform feature maps to the frequency domain, where we learn up and down adapter projections along with our proposed adaptive rank gating module.
  • Figure 3: Operational diagram of FouRA. Illustrating the components of Eq. \ref{eq:freq}.
  • Figure 4: Singular value spread for FouRA vs. LoRA.
  • Figure 5: Average Effective Rank of FouRA. Panels (a) and (b) plot the average effective rank for various layers of the FouRA U-Net (darker lines correspond to higher resolutions), and panel (c) compares the average effective rank of FouRA to that of SoRA. FouRA's effective rank decreases with the feature resolution, and it also decreases as the diffusion process proceeds, because fewer changes are required towards the end.
  • ...and 20 more figures

Theorems & Definitions (4)

  • Corollary 4.0.1
  • proof
  • proof
  • Proposition 1