PMSS: Pretrained Matrices Skeleton Selection for LLM Fine-tuning

Qibin Wang, Xiaolin Hu, Weikai Xu, Wei Liu, Jian Luan, Bin Wang

TL;DR

PMSS (Pre-trained Matrices Skeleton Selection) enables high-rank updates at low cost while leveraging the semantic and linguistic information inherent in pre-trained weights.

Abstract

Low-rank adaptation (LoRA) and its variants have recently gained much interest due to their ability to avoid excessive inference costs. However, LoRA still faces two challenges: (1) the limitations of the low-rank assumption; and (2) a potentially suboptimal initialization method. To this end, we propose PMSS (Pre-trained Matrices Skeleton Selection), which enables high-rank updates at low cost while leveraging the semantic and linguistic information inherent in pre-trained weights. It achieves this by selecting skeletons from the pre-trained weight matrix and learning only a small matrix. Experiments demonstrate that PMSS outperforms LoRA and other fine-tuning methods across tasks with far fewer trainable parameters. We demonstrate its effectiveness especially on complex tasks such as the DROP benchmark (+3.4%/+5.9% on LLaMA2-7B/13B) and math reasoning (+12.89%/+5.61%/+3.11% on GSM8K with LLaMA2-7B, Mistral-7B, and Gemma-7B, respectively). The code and model will be released soon.
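To make the "high-rank updates at low cost" claim concrete, here is a back-of-the-envelope sketch comparing trainable parameter counts. It assumes (the abstract does not spell this out) that the update has the form $\Delta W = C U R$ with frozen $C \in \mathbb{R}^{m \times k}$ and $R \in \mathbb{R}^{k \times n}$ and a trainable square matrix $U \in \mathbb{R}^{k \times k}$. The function names and the 4096 x 4096 layer size are illustrative, not taken from the paper.

```python
import math


def lora_trainable_params(m: int, n: int, r: int) -> int:
    """LoRA trains two factors: B (m x r) and A (r x n)."""
    return r * (m + n)


def skeleton_trainable_params(k: int) -> int:
    """Assumed PMSS-style update: only the k x k matrix U is trained."""
    return k * k


def max_skeletons_for_budget(budget: int) -> int:
    """Largest k whose k x k matrix U fits within a parameter budget."""
    return math.isqrt(budget)


# Hypothetical layer size and LoRA rank, chosen only for illustration.
m = n = 4096
r = 8

budget = lora_trainable_params(m, n, r)   # parameters LoRA would train
k = max_skeletons_for_budget(budget)      # skeleton count with the same budget

print(f"LoRA rank {r}: {budget} trainable parameters")
print(f"Same budget allows k = {k}, i.e. an update of rank at most {k}")
```

Under these assumptions the rank bound of the update grows with the square root of the parameter budget rather than linearly in $r$, which is one way to read the abstract's claim. The actual ranks and budgets used in the paper are not given in this excerpt.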

Paper Structure

This paper contains 27 sections, 17 equations, 2 figures, 11 tables, 1 algorithm.

Figures (2)

  • Figure 1: An overview of LoRA and our proposed PMSS. The distinction is that PMSS freezes $C$ and $R$ and updates only $U$ during fine-tuning. Here, "select" denotes choosing row and column skeletons from the original pre-trained matrices to construct $C$ and $R$, which ensures that the update happens in the subspace spanned by the skeletons of the original weight. Furthermore, $C$ and $R$ can be compactly represented by one-dimensional index vectors.
  • Figure 2: Benchmark of different fine-tuning methods on the DROP dataset, showing the $F_1$ score (y-axis) against the ratio (%) of trainable parameters (x-axis) with LLaMA2-7B as the base model.
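
The construction described in the Figure 1 caption can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the norm-based choice of skeleton indices, the zero initialization of $U$, and all function names are hypothetical, since this excerpt does not specify the selection rule or the initialization.

```python
import numpy as np


def select_skeleton_indices(W: np.ndarray, k: int):
    """Pick k column and k row indices of W.

    Assumption: keep the columns/rows with the largest L2 norm.
    The selection rule used by PMSS is not given in this excerpt.
    """
    col_idx = np.argsort(np.linalg.norm(W, axis=0))[-k:]
    row_idx = np.argsort(np.linalg.norm(W, axis=1))[-k:]
    return col_idx, row_idx


def skeleton_update(W: np.ndarray, col_idx, row_idx, U: np.ndarray) -> np.ndarray:
    """Compute the update C @ U @ R from frozen skeletons and trainable U."""
    C = W[:, col_idx]   # (m, k) column skeleton, frozen
    R = W[row_idx, :]   # (k, n) row skeleton, frozen
    return C @ U @ R


rng = np.random.default_rng(0)
W = rng.standard_normal((64, 48))   # stand-in for a pre-trained weight matrix
k = 8

# C and R are fully described by these two 1-D index vectors.
col_idx, row_idx = select_skeleton_indices(W, k)

# Assumption: U starts at zero so the adapted weight equals W before training.
U = np.zeros((k, k))
W_init = W + skeleton_update(W, col_idx, row_idx, U)

# Stand-in for a trained U; in practice U would be learned by gradient descent.
U_trained = 0.01 * rng.standard_normal((k, k))
delta = skeleton_update(W, col_idx, row_idx, U_trained)
delta_rank = np.linalg.matrix_rank(delta)

# Every column of delta lies in the column span of C.
C = W[:, col_idx]
coeffs, *_ = np.linalg.lstsq(C, delta, rcond=None)
span_residual = np.linalg.norm(C @ coeffs - delta)
```

Only `col_idx`, `row_idx`, and `U` need to be stored alongside the frozen base weights, which matches the caption's remark that $C$ and $R$ reduce to one-dimensional index vectors.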