Spectral Adapter: Fine-Tuning in Spectral Space

Fangzhao Zhang; Mert Pilanci

TL;DR

This work studies how to enhance current PEFT methods by incorporating the spectral information of pretrained weight matrices into the fine-tuning procedure, and shows that the proposed fine-tuning model improves parameter efficiency and tuning performance and also benefits multi-adapter fusion.

Abstract

Recent developments in Parameter-Efficient Fine-Tuning (PEFT) methods for pretrained deep neural networks have captured widespread interest. In this work, we study the enhancement of current PEFT methods by incorporating the spectral information of pretrained weight matrices into the fine-tuning procedure. We investigate two spectral adaptation mechanisms, namely additive tuning and orthogonal rotation of the top singular vectors; both first carry out a Singular Value Decomposition (SVD) of the pretrained weights and then fine-tune the top spectral space. We provide a theoretical analysis of spectral fine-tuning and show that our approach improves the rank capacity of low-rank adapters given a fixed trainable parameter budget. We show through extensive experiments that the proposed fine-tuning model enables better parameter efficiency and tuning performance, and also benefits multi-adapter fusion.
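The additive mechanism described in the abstract can be illustrated in a few lines. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the helper name `spectral_adapter_additive`, the offsets `A_U` and `A_V`, and the zero initialization are hypothetical. It adds trainable offsets only to the top-$r$ left and right singular vectors of a frozen weight, then compares the trainable-parameter count and the rank of the resulting weight update against a rank-$r$ LoRA update $BA$ with the same budget.

```python
import numpy as np


def spectral_adapter_additive(W, r, A_U, A_V):
    """Hypothetical helper: return [U_r + A_U, U_rest] diag(s) [V_r + A_V, V_rest]^T."""
    # Thin SVD of the frozen pretrained weight: W = U diag(s) V^T.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    V = Vt.T.copy()
    # Only the top-r left/right singular vectors receive trainable offsets.
    U[:, :r] += A_U
    V[:, :r] += A_V
    return U @ np.diag(s) @ V.T


rng = np.random.default_rng(0)
n, m, r = 8, 12, 2
W = rng.standard_normal((n, m))

# Zero offsets (an assumed initialisation) reproduce the pretrained weight.
W0 = spectral_adapter_additive(W, r, np.zeros((n, r)), np.zeros((m, r)))
print(np.allclose(W0, W))  # True

# Rank-r LoRA factors versus rank-r additive spectral offsets: same budget r * (n + m).
B, A = rng.standard_normal((n, r)), rng.standard_normal((r, m))
A_U, A_V = rng.standard_normal((n, r)), rng.standard_normal((m, r))
print(B.size + A.size, A_U.size + A_V.size)  # 40 40

lora_update = B @ A  # rank <= r
# The spectral update equals (U_r + A_U) S_r A_V^T + A_U S_r V_r^T,
# a sum of two rank-r terms, so its rank is at most 2r.
spectral_update = spectral_adapter_additive(W, r, A_U, A_V) - W
print(np.linalg.matrix_rank(lora_update), np.linalg.matrix_rank(spectral_update))
```

For random offsets the LoRA update has rank $r$ while the spectral update reaches rank $2r$ under the same parameter count, which is one way to see why spectral tuning can enlarge rank capacity. In a real fine-tuning run, `A_U` and `A_V` would be the only trainable tensors while `U`, `s` and `V` stay frozen; NumPy is used here only to keep the sketch dependency-light.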

Paper Structure

This paper contains 27 sections, 2 theorems, 8 equations, 16 figures, and 8 tables.

Introduction
Spectral Adapter: Incorporating Spectral Information into Fine-Tuning
Theoretical Insights
Adapter Rank Capacity
Weight Subspace Alignment
Empirical Results: The Impact of Spectral Information
Language Model Fine-Tuning: Enhancing Fine-Tuning Results with Spectral Adapter$^A$
Diffusion Model Fusion: Improving Multi-Object Fine-Tuning with Spectral Adapter$^A$
Diffusion Model Expressiveness: Improving Parameter Efficiency with Spectral Adapter$^R$
Final Note: A Closer Look at SVD Cost
Conclusion and Limitations
Acknowledgement
Prior Work
Rank Capacity Proof
Cayley Parameterization Proof
...and 12 more sections

Key Result

Lemma 3.1

Suppose that $W\in\mathbb{R}^{n\times m}$ is an arbitrary full-row-rank matrix with $n\le m$ (without loss of generality). Consider rank-$r$ LoRA and the rank-$r$ additive spectral adapter, which have an equal number of trainable parameters. We have

Figures (16)

Figure 1: Training loss of fine-tuning the Llama3 8B model on the Orca Math dataset [mitra2024orcamath] and evaluation score on the GSM8K benchmark [cobbe2021training]. We follow the experimental setup in [qdora]; see Appendix \ref{loss_image_detail} for details. All methods except full fine-tuning maintain approximately $0.23\%$ trainable parameters.
Figure 2: Compared to LoRA, which adds low-rank trainable matrices to pretrained weights, we study two types of spectral adapters: Spectral Adapter$^A$ additively tunes the top columns of the singular vector matrices, and Spectral Adapter$^R$ orthogonally rotates the top columns of the singular vector matrices.
Figure 3: The top singular vector of the pretrained weight identifies a more ideal neuron direction. Illustration for Section \ref{subspace_sec}.
Figure 4: Distributing different concept tunings along different spectral subspaces helps with identity preservation in multi-adapter fusion; see Section \ref{a_exp} for details.
Figure 5: Generation results of the Chilloutmix diffusion model [chillout] with different fused adapters tuned on three custom animal concepts. See Section \ref{fourobj} for details.
...and 11 more figures
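Figure 2 describes Spectral Adapter$^R$ as orthogonally rotating the top columns of the singular vector matrices. The following is a minimal NumPy sketch, assuming the orthogonality constraint is enforced with a Cayley map $Q = (I - K)(I + K)^{-1}$ of a skew-symmetric $K$ (the outline lists a Cayley parameterization proof, but the authors' exact formulation is not shown here). The names `cayley`, `spectral_adapter_rotation`, `P_U` and `P_V` are hypothetical. The sketch also checks one consequence of rotating within the top singular subspaces: the singular values of the tuned weight are unchanged.

```python
import numpy as np


def cayley(K):
    """Cayley map: skew-symmetric K (K.T == -K) -> orthogonal Q = (I - K)(I + K)^{-1}."""
    I = np.eye(K.shape[0])
    # I + K is always invertible for skew-symmetric K, and (I - K) commutes with
    # (I + K)^{-1}, so solving (I + K) Q = (I - K) gives the same Q.
    return np.linalg.solve(I + K, I - K)


def spectral_adapter_rotation(W, r, P_U, P_V):
    """Hypothetical helper: rotate the top-r singular vectors, U_r -> U_r Q_U, V_r -> V_r Q_V."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    V = Vt.T
    # Unconstrained r x r parameters are made skew-symmetric before the Cayley map.
    Q_U = cayley(P_U - P_U.T)
    Q_V = cayley(P_V - P_V.T)
    U_new, V_new = U.copy(), V.copy()
    U_new[:, :r] = U[:, :r] @ Q_U
    V_new[:, :r] = V[:, :r] @ Q_V
    return U_new @ np.diag(s) @ V_new.T


rng = np.random.default_rng(0)
n, m, r = 6, 10, 3
W = rng.standard_normal((n, m))
P_U = rng.standard_normal((r, r))  # hypothetical trainable parameters
P_V = rng.standard_normal((r, r))
W_rot = spectral_adapter_rotation(W, r, P_U, P_V)

# Orthogonal rotations inside the singular subspaces keep the singular values of W.
print(np.allclose(np.linalg.svd(W_rot, compute_uv=False),
                  np.linalg.svd(W, compute_uv=False)))  # True
```

Because the rotated columns remain orthonormal and span the same top-$r$ subspaces, the tuned weight still has a valid SVD with the original singular values; only the singular directions move. With `P_U` and `P_V` set to zero, the Cayley map returns the identity and the pretrained weight is recovered.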

Theorems & Definitions (4)

Lemma 3.1
Lemma 4.1
proof
proof
