PMSS: Pretrained Matrices Skeleton Selection for LLM Fine-tuning

Qibin Wang; Xiaolin Hu; Weikai Xu; Wei Liu; Jian Luan; Bin Wang

TL;DR

PMSS (Pre-trained Matrices Skeleton Selection) enables high-rank updates at low cost while leveraging the semantic and linguistic information inherent in pre-trained weights.

Abstract

Low-rank adaptation (LoRA) and its variants have recently attracted much interest because they avoid excessive inference costs. However, LoRA still faces two challenges: (1) the limitation of its low-rank assumption, and (2) a potentially suboptimal initialization method. To this end, we propose PMSS (Pre-trained Matrices Skeleton Selection), which enables high-rank updates at low cost while leveraging the semantic and linguistic information inherent in pre-trained weights. It achieves this by selecting skeletons from the pre-trained weight matrix and learning only a small matrix. Experiments demonstrate that PMSS outperforms LoRA and other fine-tuning methods across tasks with far fewer trainable parameters. We demonstrate its effectiveness especially on complex tasks such as the DROP benchmark (+3.4%/+5.9% on LLaMA2-7B/13B) and math reasoning on GSM8K (+12.89%/+5.61%/+3.11% on LLaMA2-7B, Mistral-7B, and Gemma-7B). The code and model will be released soon.
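
To make the "high-rank updates at low cost" claim concrete, the following back-of-the-envelope sketch compares trainable-parameter counts. It assumes that LoRA trains two factors of shapes $m \times r$ and $r \times n$, while a PMSS-style update $C U R$ trains only a $k \times k$ matrix $U$ between frozen skeletons $C$ ($m \times k$) and $R$ ($k \times n$). The $4096 \times 4096$ layer shape and LoRA rank 8 are illustrative assumptions, not the paper's experimental configuration, and the function names are hypothetical.

```python
import math


def lora_trainable_params(m: int, n: int, r: int) -> int:
    """LoRA trains B (m x r) and A (r x n): r * (m + n) parameters."""
    return r * (m + n)


def pmss_trainable_params(k: int) -> int:
    """Assumed PMSS-style update C @ U @ R: only U (k x k) is trainable."""
    return k * k


def pmss_rank_for_budget(budget: int, m: int, n: int) -> int:
    """Largest skeleton count k with k * k <= budget, capped at min(m, n).

    Since rank(C @ U @ R) <= k, k is an upper bound on the update's rank.
    """
    return min(math.isqrt(budget), m, n)


if __name__ == "__main__":
    m = n = 4096  # hypothetical square projection layer
    r = 8         # hypothetical LoRA rank
    budget = lora_trainable_params(m, n, r)
    k = pmss_rank_for_budget(budget, m, n)
    print(f"LoRA rank {r}: {budget} trainable params")
    print(f"PMSS-style update, same budget: k = {k}, {pmss_trainable_params(k)} params")
```

Under these assumptions, the 65,536-parameter budget of a rank-8 LoRA adapter allows a skeleton size of $k = 256$, i.e. an update whose rank can reach 256 rather than 8. This is only an upper bound; the rank actually achieved also depends on the learned $U$ and on the selected skeletons.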

Paper Structure (27 sections, 17 equations, 2 figures, 11 tables, 1 algorithm)

Introduction
Related Work
Intrinsic Dimension and Subspace Learning
Column Subset Selection, CUR and Interpolative Decomposition
Parameter-Efficient Fine-Tuning
Preliminary
Methodology
Formulation of PMSS
Fine-tuning Happens in Constraining Skeleton Subspaces
Parameter Efficiency and Low-Cost High-Rank Updates
Comparison with Other Works
Experiments
DROP Benchmark
Commonsense Reasoning
Math Reasoning
...and 12 more sections

Figures (2)

Figure 1: An overview of LoRA and our proposed PMSS. The key difference is that PMSS freezes $C$ and $R$ and updates only $U$ during fine-tuning. Here, "select" denotes selecting row and column skeletons from the original pre-trained matrices to construct $C$ and $R$, which ensures that the update happens in the subspace spanned by the skeletons of the original weight. Furthermore, $C$ and $R$ can be represented compactly by one-dimensional index vectors.
Figure 2: Benchmark of different fine-tuning methods on the DROP dataset with LLaMA2-7B as the base model: $F_1$ score (y-axis) versus the ratio (%) of trainable parameters (x-axis).
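
The mechanism in Figure 1 can be illustrated with a small NumPy sketch. Several details are assumptions rather than the paper's specification: the update is taken to be $\Delta W = C U R$ added to the frozen weight $W$, $U$ is zero-initialized so fine-tuning starts exactly from the pre-trained model, and skeletons are chosen by a greedy column-pivoted (Gram-Schmidt) selection that stands in for whatever selection rule PMSS actually uses. The names `select_skeleton` and `SkeletonAdapter` are hypothetical. As the caption notes, $C$ and $R$ need not be stored; only two index vectors into $W$ are kept.

```python
import numpy as np


def select_skeleton(A: np.ndarray, k: int) -> np.ndarray:
    """Greedy column-pivoted selection of up to k column indices of A.

    Stand-in heuristic (not necessarily PMSS's rule): repeatedly pick the
    column with the largest residual norm, then project it out of all columns.
    """
    residual = np.array(A, dtype=float, copy=True)
    chosen: list[int] = []
    for _ in range(k):
        norms = np.linalg.norm(residual, axis=0)
        if chosen:
            norms[chosen] = -1.0  # never pick the same column twice
        j = int(np.argmax(norms))
        if norms[j] <= 1e-12:
            break  # A has fewer than k independent columns
        chosen.append(j)
        q = residual[:, j] / norms[j]
        residual -= np.outer(q, q @ residual)
    return np.array(chosen, dtype=int)


class SkeletonAdapter:
    """Frozen W (out x in) plus an assumed update C @ U @ R; only U is trainable."""

    def __init__(self, W: np.ndarray, k: int):
        self.W = W                               # frozen pre-trained weight
        self.col_idx = select_skeleton(W, k)     # C = W[:, col_idx]
        self.row_idx = select_skeleton(W.T, k)   # R = W[row_idx, :]
        self.U = np.zeros((len(self.col_idx), len(self.row_idx)))  # assumed zero init

    def _skeletons(self) -> tuple[np.ndarray, np.ndarray]:
        # C and R are gathered from W on the fly using the index vectors.
        return self.W[:, self.col_idx], self.W[self.row_idx, :]

    def delta(self) -> np.ndarray:
        C, R = self._skeletons()
        return C @ self.U @ R

    def forward(self, x: np.ndarray) -> np.ndarray:
        """x: (batch, in). Returns x @ (W + C U R).T without forming C U R."""
        C, R = self._skeletons()
        return x @ self.W.T + ((x @ R.T) @ self.U.T) @ C.T

    def grad_U(self, x: np.ndarray, grad_y: np.ndarray) -> np.ndarray:
        """Gradient w.r.t. U given dL/dy of shape (batch, out): C^T (grad_y^T x) R^T."""
        C, R = self._skeletons()
        return (grad_y @ C).T @ (x @ R.T)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((64, 48))  # toy stand-in for a pre-trained weight
    layer = SkeletonAdapter(W, k=16)
    x = rng.standard_normal((8, 48))
    assert np.allclose(layer.forward(x), x @ W.T)  # zero-init U leaves W unchanged
    layer.U = rng.standard_normal(layer.U.shape)   # pretend U has been trained
    print("trainable params:", layer.U.size)
    print("rank of update:", np.linalg.matrix_rank(layer.delta()))
```

Because $C$ consists of columns of $W$, every column of $C U R$ lies in the span of those selected columns, which is one way to read the caption's statement that the update happens in the subspace spanned by the skeletons. Only the $k \times k$ matrix $U$ receives gradients, so the trainable-parameter count is independent of the layer's width.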