Weight Spectra Induced Efficient Model Adaptation

Chongjie Si, Xuankun Yang, Muqing Liu, Yadao Wang, Xiaokang Yang, Wenbo Su, Bo Zheng, Wei Shen

TL;DR

The paper investigates how parameter-efficient fine-tuning (PEFT) updates weight matrices of large foundation models. Through a systematic spectral analysis, it shows that fine-tuning mostly amplifies the top singular values and reorients the corresponding dominant directions, while the rest of the spectrum remains stable, implying task knowledge is injected into a low-dimensional subspace. Motivated by these findings, the authors propose SpecLoRA, a spectral-directed extension of LoRA that learns to rescale the top singular directions via a learnable spectral mask, implemented efficiently with a Hadamard-based approach. Empirical results across NLP, commonsense reasoning, and vision benchmarks demonstrate that SpecLoRA consistently outperforms strong baselines with minimal parameter overhead, confirming the value of aligning adaptation with the spectral structure of pre-trained weights.

Abstract

Large-scale foundation models have demonstrated remarkable versatility across a wide range of downstream tasks. However, fully fine-tuning these models incurs prohibitive computational costs, motivating the development of Parameter-Efficient Fine-Tuning (PEFT) methods such as LoRA, which introduces low-rank updates to pre-trained weights. Despite their empirical success, the underlying mechanisms by which PEFT modifies model parameters remain underexplored. In this work, we present a systematic investigation into the structural changes of weight matrices during full fine-tuning. Through singular value decomposition (SVD), we reveal that fine-tuning predominantly amplifies the top singular values while leaving the remainder largely intact, suggesting that task-specific knowledge is injected into a low-dimensional subspace. Furthermore, we find that the dominant singular vectors are reoriented in task-specific directions, whereas the non-dominant subspace remains stable. Building on these insights, we propose a novel method that leverages learnable rescaling of top singular directions, enabling precise modulation of the most influential components without disrupting the global structure. Our approach achieves consistent improvements over strong baselines across multiple tasks, highlighting the efficacy of structurally informed fine-tuning.
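
To make "rescaling of top singular directions" concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the TL;DR describes a learnable spectral mask realized with a Hadamard-based approach, whereas this sketch takes an explicit SVD and uses fixed numbers for the scales. The function name `rescale_top_directions` and the `scales` argument are hypothetical, chosen only for illustration.

```python
import numpy as np

def rescale_top_directions(w, scales):
    """Multiply the top-k singular values of w by `scales` (length k).

    Illustrative assumption: in a trainable method, `scales` would be
    learned parameters; here they are plain numbers.
    """
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    mask = np.ones_like(s)             # 1.0 leaves a direction untouched
    mask[:len(scales)] = scales        # rescale only the top-k directions
    return (u * (s * mask)) @ vt       # same as U @ diag(s * mask) @ Vt

rng = np.random.default_rng(0)
w = rng.standard_normal((6, 4))
w_scaled = rescale_top_directions(w, [1.5, 1.2])

s_before = np.linalg.svd(w, compute_uv=False)
s_after = np.linalg.svd(w_scaled, compute_uv=False)
print(np.round(s_after / s_before, 3))
```

Only the first two singular values change; the remaining ones, and all singular vectors, are left as they were, which is the sense in which the global structure is preserved.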

Paper Structure

This paper contains 22 sections, 8 equations, 4 figures, 7 tables.

Figures (4)

  • Figure 1: Singular value distributions of selected weight matrices before and after fine-tuning. We visualize the singular value spectra of attention Q, K, V matrices and MLP Up/Down projection matrices from randomly selected layers of LLaMA3-8B. Fine-tuning primarily amplifies the top singular values while leaving the rest largely unchanged.
  • Figure 2: Cosine similarity between corresponding singular vectors of pre-trained and fine-tuned weights. For each selected layer and matrix, we compute the cosine similarity between singular vectors at the same index. Top singular directions exhibit low similarity, while lower directions remain closely aligned.
  • Figure 3: Ablation study on the number of trainable parameters (i.e., rank setting) of SpecLoRA.
  • Figure 4: Ablation study on the number of trainable parameters (i.e., rank setting) of SpecLoRA.
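
The two measurements behind Figures 1 and 2 (singular value spectra and per-index cosine similarity of singular vectors) can be sketched as follows. This is a hedged illustration on synthetic matrices, not the paper's analysis code or data: `spectral_comparison` is a hypothetical helper, and the "fine-tuned" matrix is constructed by hand so that its top two singular values are doubled and its top two directions are rotated by a small angle.

```python
import numpy as np

def spectral_comparison(w_pre, w_ft):
    """Return both spectra and the per-index |cosine| of singular vectors."""
    u0, s0, vt0 = np.linalg.svd(w_pre, full_matrices=False)
    u1, s1, vt1 = np.linalg.svd(w_ft, full_matrices=False)
    # Singular vectors are defined only up to sign, so compare |cosine|.
    cos_u = np.abs(np.sum(u0 * u1, axis=0))    # columns of U
    cos_v = np.abs(np.sum(vt0 * vt1, axis=1))  # rows of V^T
    return s0, s1, cos_u, cos_v

# Synthetic "pre-trained" matrix with a known, distinct spectrum.
rng = np.random.default_rng(0)
n, k, theta = 8, 2, 0.5
u, _ = np.linalg.qr(rng.standard_normal((n, n)))
v, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = np.linspace(10.0, 1.0, n)
w_pre = (u * s) @ v.T

# Synthetic "fine-tuned" matrix: amplify the top-k singular values and
# rotate the top two directions by theta within their own plane.
rot = np.eye(n)
rot[:2, :2] = [[np.cos(theta), -np.sin(theta)],
               [np.sin(theta), np.cos(theta)]]
s_ft = s.copy()
s_ft[:k] *= 2.0
w_ft = ((u @ rot) * s_ft) @ (v @ rot).T

s0, s1, cos_u, cos_v = spectral_comparison(w_pre, w_ft)
print(np.round(s1 / s0, 3))   # ratio of fine-tuned to pre-trained spectrum
print(np.round(cos_u, 3))     # alignment of left singular vectors by index
```

On real checkpoints, `w_pre` and `w_ft` would be the same layer's weight matrix taken from the pre-trained and fine-tuned models. Matching vectors by index assumes the singular values are well separated; when they are nearly tied, the ordering can swap and the per-index similarity becomes harder to interpret.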