Spectrum-Aware Parameter Efficient Fine-Tuning for Diffusion Models
Xinxi Zhang, Song Wen, Ligong Han, Felix Juefei-Xu, Akash Srivastava, Junzhou Huang, Hao Wang, Molei Tao, Dimitris N. Metaxas
TL;DR
The paper tackles efficient adaptation of large pre-trained diffusion models by proposing Spectral Orthogonal Decomposition Adaptation (SODA), a spectrum-aware fine-tuning method that jointly tunes the spectral magnitudes and the basis vectors of pre-trained weight matrices. SODA decomposes each weight as $\mathbf{W}_0 = \mathbf{W}_0^{spec} \mathbf{W}_0^{basis}$, then learns a spectrum shift $\Delta\boldsymbol{S}$ while updating the orthogonal basis through a Kronecker-structured rotation $\mathbf{R}$ optimized on the Stiefel manifold. Two decomposition variants are offered, SVD-based and QR/LQ-based, so that fine-tuning stays parameter-efficient yet expressive. The method is evaluated on text-to-image diffusion personalization tasks (subject and style) with extensive ablations. The results show that SODA surpasses strong baselines such as LoRA and OFT in both fidelity and style-preserving compositional generation, highlighting the value of exploiting spectral priors for high-capacity yet efficient fine-tuning of diffusion models.
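The SVD-based variant summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the placement of the rotation on the right singular basis, and the toy shapes are all assumptions made for illustration. It assumes $\mathbf{W}_0^{spec} = \mathbf{U}\,\mathrm{diag}(\boldsymbol{S})$ and $\mathbf{W}_0^{basis} = \mathbf{V}^\top$, and builds $\mathbf{R}$ as the Kronecker product of two small orthogonal factors.

```python
import numpy as np

def random_orthogonal(n, rng):
    """Draw an n x n orthogonal matrix as the Q factor of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

def spectrum_aware_update(w0, delta_s, r1, r2):
    """Hypothetical SODA-style update (an assumption, not the paper's exact rule).

    w0      : (m, n) pre-trained weight
    delta_s : (min(m, n),) shift added to the singular values
    r1, r2  : orthogonal factors whose Kronecker product is an n x n rotation
    """
    u, s, vt = np.linalg.svd(w0, full_matrices=False)
    rot = np.kron(r1, r2)             # Kronecker-structured rotation R
    spec = u * (s + delta_s)          # U diag(S + delta_S): scales the columns of U
    basis = vt @ rot                  # rotated basis; its rows stay orthonormal
    return spec @ basis

rng = np.random.default_rng(0)
w0 = rng.standard_normal((6, 8))      # toy weight, 8 = 2 * 4
delta_s = np.full(6, 0.1)             # toy spectrum shift
r1 = random_orthogonal(2, rng)
r2 = random_orthogonal(4, rng)
w = spectrum_aware_update(w0, delta_s, r1, r2)
```

Because the rotated basis remains orthonormal, the singular values of `w` in this sketch are exactly the shifted spectrum, so the spectrum and the basis are controlled by separate parameters.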
Abstract
Adapting large-scale pre-trained generative models in a parameter-efficient manner is gaining traction. Traditional methods such as low-rank adaptation achieve parameter efficiency by imposing constraints, but may not be optimal for tasks requiring high representation capacity. We propose a novel spectrum-aware adaptation framework for generative models. Our method adjusts both the singular values and the basis vectors of pre-trained weights. Using the Kronecker product and efficient Stiefel optimizers, we achieve parameter-efficient adaptation of orthogonal matrices. We introduce Spectral Orthogonal Decomposition Adaptation (SODA), which balances computational efficiency and representation capacity. Extensive evaluations on text-to-image diffusion models demonstrate SODA's effectiveness, offering a spectrum-aware alternative to existing fine-tuning methods.
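To illustrate why a Kronecker product makes an orthogonal matrix cheap to parameterize, the following NumPy sketch builds a large rotation from small orthogonal factors and compares entry counts. The factor sizes (4, 8, 16) and the helper name `kron_orthogonal` are hypothetical choices for illustration. The Stiefel optimizer that would keep each factor orthogonal during training is not shown.

```python
import numpy as np

def kron_orthogonal(factors):
    """Kronecker product of a list of square orthogonal factors."""
    rot = np.eye(1)
    for f in factors:
        rot = np.kron(rot, f)
    return rot

rng = np.random.default_rng(0)
# Small orthogonal factors taken as Q factors of Gaussian matrices (toy sizes).
factors = [np.linalg.qr(rng.standard_normal((k, k)))[0] for k in (4, 8, 16)]

rot = kron_orthogonal(factors)
n = rot.shape[0]                               # 4 * 8 * 16 = 512
factor_params = sum(f.size for f in factors)   # 16 + 64 + 256 entries stored
dense_params = n * n                           # entries of an unstructured n x n matrix
orth_error = np.abs(rot.T @ rot - np.eye(n)).max()  # deviation from orthogonality
```

In this toy setting the factors store 336 entries while a dense 512 x 512 matrix would store 262,144, and the product remains orthogonal up to floating-point error because the Kronecker product of orthogonal matrices is itself orthogonal.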
