Dual LoRA: Enhancing LoRA with Magnitude and Direction Updates
Yixing Xu, Chao Li, Xuanwu Yin, Spandan Tiwari, Dong Li, Ashish Sirasao, Emad Barsoum
TL;DR
The paper addresses the limitations of LoRA in parameter-efficient fine-tuning by introducing Dual LoRA, which splits the low-rank update into a magnitude group and a direction group, passed through ReLU and Sign respectively, to better emulate the gradient-based parameter updates of full fine-tuning. The architecture uses four low-rank matrices, with separate magnitude and direction components, and a straight-through estimator to backpropagate through the Sign function, which enables a higher effective update rank. Across NLG, NLU, and commonsense reasoning tasks on models including the LLaMA and GPT-2 families, Dual LoRA consistently surpasses LoRA and other state-of-the-art PEFT methods under the same trainable parameter budget. The work shows that the added inductive bias yields higher-rank updates and improved adaptability, suggesting broad applicability to PEFT enhancements beyond LoRA.
Abstract
Low-rank adaptation (LoRA) is one of the most popular parameter-efficient fine-tuning (PEFT) methods for adapting pre-trained large language models (LLMs) to specific downstream tasks. However, models trained with LoRA often perform unsatisfactorily because of its low-rank assumption. In this paper, we propose a novel method called Dual LoRA that improves performance by incorporating an inductive bias into the original LoRA. Specifically, we separate the low-rank matrices into two groups: the magnitude group, which controls whether and how far a parameter should be updated, and the direction group, which decides whether the parameter should move forward or backward. This better simulates the parameter updating process of full fine-tuning with gradient-based optimization algorithms. We show that this can be achieved simply by adding a ReLU function to the magnitude group and a sign function to the direction group. We conduct experiments over a wide range of NLP tasks, including natural language generation (NLG), natural language understanding (NLU), and commonsense reasoning datasets, with GPT-2, RoBERTa, DeBERTa, and LLaMA-1/2/3 as baseline models. The results show that Dual LoRA consistently outperforms LoRA and its state-of-the-art variants with the same number of trainable parameters.
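
The magnitude/direction split can be illustrated with a minimal pure-Python sketch. The exact formulation is not given in this summary, so the sketch assumes the update takes the form ReLU(BA) ⊙ Sign(DC), where ⊙ is the element-wise product. The matrix names B, A, D, C, the helper functions, and the identity-gradient form of the straight-through estimator are all illustrative assumptions, not the authors' implementation; any scaling factor or initialization scheme is omitted.

```python
def matmul(X, Y):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def relu(M):
    """Element-wise ReLU: negative entries become zero (no update)."""
    return [[max(0.0, v) for v in row] for row in M]

def sign(M):
    """Element-wise sign in {-1, 0, +1}."""
    return [[(v > 0) - (v < 0) for v in row] for row in M]

def hadamard(X, Y):
    """Element-wise product of two equally shaped matrices."""
    return [[a * b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def dual_lora_update(B, A, D, C):
    """Assumed form of the update: ReLU(B A) * Sign(D C), element-wise."""
    magnitude = relu(matmul(B, A))   # whether and how far to move
    direction = sign(matmul(D, C))   # forward (+1) or backward (-1)
    return hadamard(magnitude, direction)

def sign_ste_grad(upstream):
    """Straight-through estimator (identity variant, an assumption):
    the backward pass treats Sign as the identity, so the upstream
    gradient is passed through unchanged."""
    return [row[:] for row in upstream]

def rank(M, tol=1e-9):
    """Matrix rank via Gaussian elimination with partial pivoting."""
    M = [[float(v) for v in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        pivot = max(range(r, rows), key=lambda i: abs(M[i][c]))
        if abs(M[pivot][c]) < tol:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            f = M[i][c] / M[r][c]
            for j in range(c, cols):
                M[i][j] -= f * M[r][j]
        r += 1
        if r == rows:
            break
    return r

# Toy example with inner rank r = 1 on a 2x2 weight.
B = [[1.0], [-1.0]]
A = [[1.0, -1.0]]
D = [[1.0], [1.0]]
C = [[1.0, -1.0]]

plain_lora = matmul(B, A)              # ordinary rank-1 LoRA update
dual = dual_lora_update(B, A, D, C)    # update after ReLU and Sign

print(rank(plain_lora))  # 1
print(rank(dual))        # 2
```

In this toy case the product BA has rank 1, but applying ReLU zeroes its negative entries and leaves a diagonal matrix, so the combined update has rank 2. This is one small instance of how the nonlinearities can lift the update beyond the rank of the underlying factors; it is not evidence about the behavior at realistic scale.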
