ResLoRA: Identity Residual Mapping in Low-Rank Adaption

Shuhua Shi; Shaohan Huang; Minghui Song; Zhoujun Li; Zihan Zhang; Haizhen Huang; Furu Wei; Weiwei Deng; Feng Sun; Qi Zhang

TL;DR

ResLoRA is an improved LoRA framework that achieves better results in fewer training steps, with no extra trainable parameters or inference cost compared to LoRA. To the authors' knowledge, it is the first work that combines the residual path with LoRA.

Abstract

As one of the most popular parameter-efficient fine-tuning (PEFT) methods, low-rank adaptation (LoRA) is commonly applied to fine-tune large language models (LLMs). However, updating the weights of LoRA blocks effectively and expeditiously is challenging because of the long calculation path in the original model. To address this, we propose ResLoRA, an improved framework of LoRA. By adding residual paths during training and using merging approaches to eliminate these extra paths during inference, our method achieves better results in fewer training steps, with no extra trainable parameters or inference cost compared to LoRA. Experiments on natural language generation (NLG), natural language understanding (NLU), and text-to-image tasks demonstrate the effectiveness of our method. To the best of our knowledge, ResLoRA is the first work that combines the residual path with LoRA. Our code (https://github.com/microsoft/LMOps/tree/main/reslora) is publicly available.
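The "no inference cost" claim rests on the fact that a low-rank branch can be folded into the frozen weight once training is done. The sketch below illustrates this for plain LoRA only, using NumPy; it does not reproduce ResLoRA's residual paths or its merging approaches, which are covered in the paper's Method section. The matrix sizes, the `alpha / r` scaling, and the helper names `lora_forward` and `merge` are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and scaling; none of these values come from the paper.
d_out, d_in, r, alpha = 6, 8, 2, 4.0
W = rng.standard_normal((d_out, d_in))  # frozen pretrained weight
A = rng.standard_normal((r, d_in))      # trainable down-projection
B = rng.standard_normal((d_out, r))     # trainable up-projection (random here so the update is non-zero)
scale = alpha / r


def lora_forward(x):
    """Training-time path: frozen layer plus the low-rank branch."""
    return W @ x + scale * (B @ (A @ x))


def merge(W, A, B, scale):
    """Fold the low-rank update into a single dense weight (hypothetical helper)."""
    return W + scale * (B @ A)


x = rng.standard_normal(d_in)
W_merged = merge(W, A, B, scale)

# After merging, one matrix-vector product reproduces the two-branch output.
max_gap = float(np.max(np.abs(lora_forward(x) - W_merged @ x)))
```

Because `W_merged` has the same shape as `W`, the merged layer costs the same at inference as the original layer; `max_gap` should be at the level of floating-point round-off.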

Paper Structure (20 sections, 16 equations, 7 figures, 7 tables)

Introduction
Related Works
Method
LoRA Blocks
ResLoRA Blocks
Merging Approaches
Mathematics Analyse
Experiments
Experimental Setup
Natural Language Generating
Natural Language Understanding
Text to Image
Ablation studies
Analysis
Conclusion
...and 5 more sections

Figures (7)

Figure 1: Accuracy of the ResLoRA method on SVAMP. ResLoRA achieves a 2.5x faster convergence speed and improves performance by 14.3%.
Figure 2: Structures of LoRA and ResLoRA
Figure 3: Results of text-to-image task. We compare images generated by LoRA and ResLoRA$_{is}$.
Figure 4: Training loss with different $pre\_num$ values on SVAMP. $pre\_num=-1$ means each ResLoRA block uses all previous ResLoRA blocks.
Figure 5: Difference in the weights of trained matrices between LoRA and ResLoRA$_{bs}$ blocks. Both models are fine-tuned on SVAMP for 20 epochs before the comparison. The ResLoRA$_{bs}$ blocks have been merged.
...and 2 more figures
