SC-LoRA: Balancing Efficient Fine-tuning and Knowledge Preservation via Subspace-Constrained LoRA

Minrui Luo, Fuhang Kuang, Yu Wang, Zirui Liu, Tianxing He

TL;DR

The paper tackles the challenge of efficiently fine-tuning large language models while preserving pre-trained world knowledge and safety, a key problem in parameter-efficient fine-tuning (PEFT). It introduces Subspace-Constrained LoRA (SC-LoRA), which identifies a low-rank subspace $S$ spanned by the top eigenvectors of $\Delta\mathrm{Cov} = (1-\beta)\mathrm{Cov}_+ - \beta\mathrm{Cov}_-$ and initializes the LoRA adapters $B_{init}$ and $A_{init}$ so that $B_{init}A_{init}x = \Pi_S(h)$, constraining the adapter output to $S$. The hyperparameter $\beta$ balances learning the fine-tuning data against retaining the knowledge to be preserved, and the method demonstrates superior performance on world knowledge and safety preservation tasks compared to existing LoRA initializations. These results suggest SC-LoRA provides a practical, theory-grounded approach to knowledge-preserving, efficient fine-tuning for LLMs, with broad implications for safe and reliable PEFT deployments.

Abstract

Parameter-Efficient Fine-Tuning (PEFT) methods, particularly Low-Rank Adaptation (LoRA), are indispensable for efficiently customizing Large Language Models (LLMs). However, vanilla LoRA suffers from slow convergence speed and knowledge forgetting problems. Recent studies have leveraged the power of designed LoRA initialization, to enhance the fine-tuning efficiency, or to preserve knowledge in the pre-trained LLM. However, none of these works can address the two cases at the same time. To this end, we introduce Subspace-Constrained LoRA (SC-LoRA), a novel LoRA initialization framework engineered to navigate the trade-off between efficient fine-tuning and knowledge preservation. We achieve this by constraining the output of trainable LoRA adapters in a low-rank subspace, where the context information of fine-tuning data is most preserved while the context information of preserved knowledge is least retained, in a balanced way. Such constraint enables the trainable weights to primarily focus on the main features of fine-tuning data while avoiding damaging the preserved knowledge features. We provide theoretical analysis on our method, and conduct extensive experiments including safety preservation and world knowledge preservation, on various downstream tasks. In our experiments, SC-LoRA succeeds in delivering superior fine-tuning performance while markedly diminishing knowledge forgetting, surpassing contemporary LoRA initialization methods.

Paper Structure

This paper contains 25 sections, 2 theorems, 18 equations, 6 figures, 5 tables, 1 algorithm.

Key Result

Theorem 1

Let $\mathrm{Cov}_+, \mathrm{Cov}_-$ be the covariance matrices of random vectors $X_+\sim\mathcal{P}_+$ and $X_-\sim\mathcal{P}_-$, respectively, and let $\Delta\mathrm{Cov} = (1-\beta)\mathrm{Cov}_+ - \beta\mathrm{Cov}_-$. Take the eigenvalue decomposition of $\Delta\mathrm{Cov}$ and select the $r$ eigenvectors $\{q_i\}_{i\in [r]}$ with the largest eigenvalues. Then, if the following condition holds, the reward $R(S)$ is maximized:
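The eigenvector selection in Theorem 1 can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function name `top_r_subspace`, the sample matrices, and the use of an uncentered second-moment estimate for each covariance are assumptions made here (substitute a centered estimate if that is what the paper intends).

```python
import numpy as np

def top_r_subspace(x_plus, x_minus, r, beta):
    """Return the r leading eigenvectors of Delta Cov and their eigenvalues.

    x_plus, x_minus: arrays of shape (n_samples, d), one sample per row.
    Hypothetical helper; names and estimator are illustrative assumptions.
    """
    # Assumption: uncentered second-moment estimates of Cov_+ and Cov_-.
    cov_plus = x_plus.T @ x_plus / x_plus.shape[0]
    cov_minus = x_minus.T @ x_minus / x_minus.shape[0]

    # Delta Cov = (1 - beta) * Cov_+ - beta * Cov_-
    delta_cov = (1.0 - beta) * cov_plus - beta * cov_minus

    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(delta_cov)
    order = np.argsort(eigvals)[::-1][:r]  # indices of the r largest eigenvalues
    return eigvecs[:, order], eigvals[order]

# Toy data standing in for samples of X_+ and X_-.
rng = np.random.default_rng(0)
x_plus = rng.normal(size=(200, 8))
x_minus = rng.normal(size=(200, 8))
q_r, top_vals = top_r_subspace(x_plus, x_minus, r=3, beta=0.5)
```

The columns of `q_r` are orthonormal because `np.linalg.eigh` returns an orthonormal eigenbasis for a symmetric matrix. Note that $\Delta\mathrm{Cov}$ can have negative eigenvalues, so sorting by signed value (not magnitude) matters.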

Figures (6)

  • Figure 1: Comparison of LoRA with default Kaiming initialization and our proposed SC-LoRA. (a) LoRA initializes the down-projection matrix $A$ with Gaussian noise and the up-projection matrix $B$ with the zero matrix. (b) Our SC-LoRA initializes $A$ with $Q_r^\top W_0$ and $B$ with $Q_r$, where $Q_r$ consists of $r$ orthonormal vectors as columns obtained by Algorithm 1.
  • Figure 2: Relations between $\beta$ and knowledge preservation performance. The experiment setting of the left figure is described in the benign fine-tuning subsection, while that of the right figure is described in the world knowledge subsection. A lower harmfulness score or a higher world knowledge score indicates better knowledge preservation.
  • Figure 3: Relations between $\beta$ and fine-tuning performance. The experiment setting of the left figure is described in the benign fine-tuning subsection, while that of the right figure is described in the world knowledge subsection. The right figure shows clear monotonicity with $\beta$, while no such trend appears in the left figure.
  • Figure 4: Sensitivity to the rank $r$ on Samsum.
  • Figure 5: Pareto plot of learning rate fine-tuning on task Samsum.
  • ...and 1 more figure
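The initialization in Figure 1(b) can be illustrated with a small NumPy sketch. It is a hedged example under stated assumptions: `sc_lora_init` is a hypothetical helper name, $h = W_0 x$ is assumed to be the output of the pre-trained linear layer, and a random orthonormal matrix stands in for the $Q_r$ that Algorithm 1 would produce. How the frozen weights are handled alongside the adapters is not covered here.

```python
import numpy as np

def sc_lora_init(w0, q_r):
    """Build adapter matrices as in Figure 1(b): B = Q_r, A = Q_r^T W_0.

    w0:  pre-trained weight, shape (d_out, d_in).
    q_r: orthonormal columns spanning the subspace S, shape (d_out, r).
    Hypothetical helper for illustration only.
    """
    b_init = q_r.copy()   # up-projection, shape (d_out, r)
    a_init = q_r.T @ w0   # down-projection, shape (r, d_in)
    return a_init, b_init

rng = np.random.default_rng(0)
d_out, d_in, r = 6, 4, 2
w0 = rng.normal(size=(d_out, d_in))

# Stand-in for Q_r: any matrix with orthonormal columns (here from a QR factorization).
q_r, _ = np.linalg.qr(rng.normal(size=(d_out, r)))

a_init, b_init = sc_lora_init(w0, q_r)

# Check that B A x equals the orthogonal projection of h = W_0 x onto S.
x = rng.normal(size=d_in)
h = w0 @ x                    # assumed definition of h
proj_h = q_r @ (q_r.T @ h)    # Pi_S(h) for an orthonormal basis Q_r
adapter_out = b_init @ (a_init @ x)
```

Because $Q_r$ has orthonormal columns, $Q_r Q_r^\top$ is the orthogonal projector onto $S$, so `adapter_out` and `proj_h` agree up to floating-point error.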

Theorems & Definitions (8)

  • Definition 1
  • Definition 2
  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • proof
  • proof