RoseLoRA: Row and Column-wise Sparse Low-rank Adaptation of Pre-trained Language Model for Knowledge Editing and Fine-tuning

Haoyu Wang; Tianci Liu; Ruirui Li; Monica Cheng; Tuo Zhao; Jing Gao

TL;DR

RoseLoRA is a novel PEFT method that performs row and column-wise sparse low-rank adaptation and guarantees a lower bound on the sparsity of the matrix product, ensuring efficient and precise model updates.

Abstract

Pre-trained language models, trained on large-scale corpora, demonstrate strong generalizability across various NLP tasks. Fine-tuning these models for specific tasks typically involves updating all parameters, which is resource-intensive. Parameter-efficient fine-tuning (PEFT) methods, such as the popular LoRA family, introduce low-rank matrices to learn only a few parameters efficiently. However, during inference, the product of these matrices updates all pre-trained parameters, complicating tasks like knowledge editing that require selective updates. We propose a novel PEFT method, which conducts row and column-wise sparse low-rank adaptation (RoseLoRA), to address this challenge. RoseLoRA identifies and updates only the most important parameters for a specific task, maintaining efficiency while preserving other model knowledge. By adding a sparsity constraint on the product of low-rank matrices and converting it to row and column-wise sparsity, we ensure efficient and precise model updates. Our theoretical analysis guarantees a lower bound on the sparsity of the matrix product. Extensive experiments on five benchmarks across twenty datasets demonstrate that RoseLoRA outperforms baselines in both general fine-tuning and knowledge editing tasks.

Paper Structure (26 sections, 3 theorems, 17 equations, 3 figures, 8 tables)

Introduction
Related Works
Parameter Efficient Fine-Tuning (PEFT)
Knowledge Editing
Preliminary
Low-rank Adaptation
Sensitivity-based Importance Score for Pruning
Methodology
Row and Column-wise Sparse Low-rank Adaptation
Optimization
Experiment
Datasets and Experiment Settings
Datasets.
Baselines.
Performance Comparison
...and 11 more sections

Key Result

Proposition 1

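The sparsity of $\bm{B}\bm{A}$ is greater than or equal to $\max\{0,1+\sum_{i=1}^{r}(s(\bm{A}_{i*})+s(\bm{B}_{*i})-s(\bm{A}_{i*})s(\bm{B}_{*i}))-r\}$, where $s(\cdot)$ denotes sparsity (the fraction of zero entries), $\bm{A}_{i*}$ is the $i$-th row of $\bm{A}$, $\bm{B}_{*i}$ is the $i$-th column of $\bm{B}$, and $r$ is the rank.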
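
The bound in Proposition 1 can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes that sparsity means the fraction of exactly-zero entries, and the helper names (`sparsity`, `sparse_vector`, `lower_bound`), the matrix shapes, and the 0.9 sparsity level are all hypothetical choices made for illustration. The intuition is that $\bm{B}\bm{A}$ is a sum of $r$ outer products $\bm{B}_{*i}\bm{A}_{i*}$, each of which can be nonzero only where both factors are nonzero, so the densities of the $r$ terms add up to at most the density of the product.

```python
import numpy as np


def sparsity(x):
    """Fraction of entries that are exactly zero (assumed definition)."""
    return float(np.mean(x == 0))


def sparse_vector(n, s, rng):
    """Random vector of length n with round(s * n) entries set to zero."""
    v = rng.standard_normal(n)
    zero_idx = rng.choice(n, size=int(round(s * n)), replace=False)
    v[zero_idx] = 0.0
    return v


def lower_bound(A, B):
    """Evaluate max{0, 1 + sum_i (s_A + s_B - s_A * s_B) - r} for B @ A."""
    r = A.shape[0]
    total = 0.0
    for i in range(r):
        s_a = sparsity(A[i, :])  # sparsity of the i-th row of A
        s_b = sparsity(B[:, i])  # sparsity of the i-th column of B
        total += s_a + s_b - s_a * s_b
    return max(0.0, 1.0 + total - r)


# Hypothetical sizes: a (d_out x d_in) update of rank r.
rng = np.random.default_rng(0)
d_out, d_in, r = 64, 48, 4
A = np.stack([sparse_vector(d_in, 0.9, rng) for _ in range(r)])            # shape (r, d_in)
B = np.stack([sparse_vector(d_out, 0.9, rng) for _ in range(r)], axis=1)   # shape (d_out, r)

delta_w = B @ A                # the low-rank update added to the frozen weight
bound = lower_bound(A, B)
actual = sparsity(delta_w)

print(f"lower bound on sparsity of BA: {bound:.4f}")
print(f"measured sparsity of BA:       {actual:.4f}")
```

With dense factors the bound collapses to 0, which matches the observation in the abstract that a standard LoRA product touches all pre-trained parameters.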

Figures (3)

Figure 1: The framework of the proposed RoseLoRA.
Figure 2: The sparsity of the product of matrices $\bm{B}$ and $\bm{A}$ under different column and row sparsity levels.
Figure 3: Accuracy of LoRA and RoseLoRA with different amounts of Math10K training data on GSM8K and SVAMP.

Theorems & Definitions (6)

Example 1
Proposition 1
Lemma 1
proof
Proposition 1
proof
