SBoRA: Low-Rank Adaptation with Regional Weight Updates

Lai-Man Po; Yuyang Liu; Haoxuan Wu; Tianqi Zhang; Wing-Yin Yu; Zhuohan Wang; Zeyu Jiang; Kun Li


TL;DR

Empirical results show that SBoRA-FA outperforms LoRA on a range of fine-tuning tasks, including commonsense reasoning and arithmetic reasoning, and that QSBoRA is effective on quantized LLaMA models of varying scales, which points to its potential for efficient adaptation to new tasks.

Abstract

This paper introduces Standard Basis LoRA (SBoRA), a novel parameter-efficient fine-tuning approach for Large Language Models that builds upon the pioneering works of Low-Rank Adaptation (LoRA) and Orthogonal Adaptation. Compared with LoRA, SBoRA either halves the number of trainable parameters or doubles the rank at a similar number of trainable parameters, while improving learning performance. By utilizing orthogonal standard basis vectors to initialize one of the low-rank matrices (either $\mathbf{A}$ or $\mathbf{B}$), SBoRA facilitates regional weight updates and memory-efficient fine-tuning. This results in two variants, SBoRA-FA and SBoRA-FB, where only one of the matrices is updated, leading to a sparse update matrix $\mathrm{\Delta}\mathbf{W}$ with predominantly zero rows or columns. Consequently, most of the fine-tuned model's weights $(\mathbf{W}_0+\mathrm{\Delta}\mathbf{W})$ remain unchanged from the pre-trained weights, akin to the modular organization of the human brain, which efficiently adapts to new tasks. Our empirical results demonstrate the superiority of SBoRA-FA over LoRA in various fine-tuning tasks, including commonsense reasoning and arithmetic reasoning. Furthermore, we evaluate the effectiveness of QSBoRA on quantized LLaMA models of varying scales, highlighting its potential for efficient adaptation to new tasks. Code is available at https://github.com/cityuhkai/SBoRA
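To make the idea of a regional weight update concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the usual LoRA convention $\mathrm{\Delta}\mathbf{W} = \mathbf{B}\mathbf{A}$ with $\mathbf{B} \in \mathbb{R}^{d \times r}$ and $\mathbf{A} \in \mathbb{R}^{r \times k}$, and it assumes that "FA" and "FB" denote the variants in which $\mathbf{A}$ or $\mathbf{B}$, respectively, is the fixed standard-basis matrix. The helper `standard_basis_rows`, the toy sizes, and the random values standing in for trained weights are all hypothetical.

```python
import numpy as np

def standard_basis_rows(r, n, rng):
    """Return an r x n matrix whose rows are r distinct standard basis
    vectors of R^n, together with the (sorted) selected indices.
    Hypothetical helper; random index selection is an assumption."""
    idx = np.sort(rng.choice(n, size=r, replace=False))
    M = np.zeros((r, n))
    M[np.arange(r), idx] = 1.0
    return M, idx

rng = np.random.default_rng(0)
d, k, r = 6, 8, 2  # toy sizes: W0 is d x k, rank r

# FA-style (assumed: A is fixed, B is trained).
A_fixed, cols = standard_basis_rows(r, k, rng)
B_trained = rng.normal(size=(d, r))      # stand-in for learned values
delta_fa = B_trained @ A_fixed           # d x k update
nonzero_cols = np.flatnonzero(np.any(delta_fa != 0, axis=0))

# FB-style (assumed: B is fixed, A is trained).
Bt, rows = standard_basis_rows(r, d, rng)
B_fixed = Bt.T                           # d x r, columns are basis vectors
A_trained = rng.normal(size=(r, k))      # stand-in for learned values
delta_fb = B_fixed @ A_trained           # d x k update
nonzero_rows = np.flatnonzero(np.any(delta_fb != 0, axis=1))

# Trainable parameter counts per adapted weight matrix.
lora_params = r * (d + k)   # LoRA trains both A and B
fa_params = d * r           # only B is trained
fb_params = r * k           # only A is trained

print("FA update touches columns:", nonzero_cols, "selected:", cols)
print("FB update touches rows:   ", nonzero_rows, "selected:", rows)
print("trainable params (LoRA, FA, FB):", lora_params, fa_params, fb_params)
```

Under these assumptions, the FA-style update is nonzero only in the `r` selected columns and the FB-style update only in the `r` selected rows, so all other entries of $\mathbf{W}_0$ are left as they were. When $d = k$, either variant trains exactly half as many parameters as LoRA at the same rank, which matches the trade-off described in the abstract.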


Paper Structure (14 sections, 13 equations, 4 figures, 5 tables)

This paper contains 14 sections, 13 equations, 4 figures, 5 tables.

Introduction
Related Work
Full Fine-Tuning (FFT) and Parameter-Efficient Fine-Tuning (PEFT)
Low-Rank Adaptation (LoRA)
Variants of LoRA
Standard Basis Low-Rank Adaptation (SBoRA)
Orthogonal Standard Basis
Regional Weight Update
Complexity Analysis of SBoRA
Experiment
Evaluating SBoRA on Commonsense Reasoning Tasks
Evaluating SBoRA on Arithmetic Reasoning
QSBoRA Evaluation on MMLU
Conclusion

Figures (4)

Figure 1: Four fine-tuning strategies: (a) Full Fine-Tuning (FFT), (b) LoRA, (c) SBoRA-FA, and (d) SBoRA-FB.
Figure 2: The diagram illustrates the regional weight update process of SBoRA, showing the distinct procedures for computing $\mathbf{W}_0+\mathrm{\Delta}\mathbf{W}$ in SBoRA-FA (upper) and SBoRA-FB (lower). Different colors represent frozen, trainable, and zero parameters.
Figure 3: GPU usage and training time for LLaMA-7B on arithmetic reasoning tasks. Results for rank 64 (left) and 32 (right) are displayed. Y-axis: GPU usage; X-axis: training time. Total training time is labeled for each method.
Figure 4: GPU usage and training time for LLaMA3-8B on arithmetic reasoning tasks. Results for rank 64 (left) and 32 (right) are displayed. Y-axis: GPU usage; X-axis: training time. Total training time is labeled for each method.
