
Dual LoRA: Enhancing LoRA with Magnitude and Direction Updates

Yixing Xu, Chao Li, Xuanwu Yin, Spandan Tiwari, Dong Li, Ashish Sirasao, Emad Barsoum

TL;DR

The paper addresses a key limitation of LoRA in parameter-efficient fine-tuning (PEFT), its low-rank assumption, by introducing Dual LoRA. Dual LoRA splits the low-rank update into a magnitude group passed through a ReLU and a direction group passed through a sign function, so that it better emulates the updates of gradient-based full fine-tuning (FFT). The resulting architecture uses four low-rank matrices, one pair for magnitude and one pair for direction, and relies on a straight-through estimator to backpropagate through the non-differentiable sign function, which allows a higher effective update rank. Across NLG, NLU, and commonsense reasoning tasks on GPT-2, RoBERTa, DeBERTa, and LLaMA-1/2/3, Dual LoRA consistently outperforms LoRA and other state-of-the-art PEFT methods with the same trainable parameter budget. The work demonstrates that this inductive bias yields higher-rank updates and improved adaptability, and suggests it may also benefit PEFT methods beyond LoRA.
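The four-matrix update described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the magnitude pair (`B_m`, `A_m`, rank r1) and the direction pair (`B_d`, `A_d`, rank r2) are combined by an element-wise product, ΔW = ReLU(B_m A_m) ⊙ sign(B_d A_d), and it omits any scaling factor the paper may use. The helper names `lora_delta` and `dual_lora_delta` are hypothetical. In the spirit of Figure 3, it also measures the rank of each component to show how the nonlinearities can lift the update above the rank of a plain LoRA update with the same parameter count.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def lora_delta(B, A):
    """Plain LoRA update Delta W = B @ A; its rank is at most r."""
    return B @ A

def dual_lora_delta(B_m, A_m, B_d, A_d):
    """Hypothetical Dual LoRA update. ASSUMPTION: the two groups are combined
    element-wise, Delta W = ReLU(B_m @ A_m) * sign(B_d @ A_d)."""
    magnitude = relu(B_m @ A_m)       # >= 0: whether and how far each weight moves
    direction = np.sign(B_d @ A_d)    # +1 / -1: move forward or backward
    return magnitude * direction

rng = np.random.default_rng(0)
d, k = 64, 64      # output / input size of the adapted weight (illustrative)
r1, r2 = 4, 4      # magnitude and direction ranks
r = r1 + r2        # LoRA rank with the same trainable-parameter count

# Random factors for illustration; real LoRA initialises B to zero so Delta W starts at 0.
B, A = rng.standard_normal((d, r)), rng.standard_normal((r, k))
B_m, A_m = rng.standard_normal((d, r1)), rng.standard_normal((r1, k))
B_d, A_d = rng.standard_normal((d, r2)), rng.standard_normal((r2, k))

ranks = {
    "lora": int(np.linalg.matrix_rank(lora_delta(B, A))),
    "magnitude": int(np.linalg.matrix_rank(relu(B_m @ A_m))),
    "direction": int(np.linalg.matrix_rank(np.sign(B_d @ A_d))),
    "dual": int(np.linalg.matrix_rank(dual_lora_delta(B_m, A_m, B_d, A_d))),
}
print(ranks)

# Adapted layer forward pass: h = W0 x + Delta W x (W0 stays frozen)
W0 = rng.standard_normal((d, k))
x = rng.standard_normal(k)
h = W0 @ x + dual_lora_delta(B_m, A_m, B_d, A_d) @ x
```

With random factors, the ReLU and sign outputs typically have rank far above r, which is the intuition behind the higher-effective-rank claim; the ranks actually reached after fine-tuning are what Figure 3 reports.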

Abstract

Low-rank adaptation (LoRA) is one of the most popular parameter-efficient fine-tuning (PEFT) methods for adapting pre-trained large language models (LLMs) to specific downstream tasks. However, models trained with LoRA often perform unsatisfactorily because of its low-rank assumption. In this paper, we propose a novel method called Dual LoRA that improves performance by incorporating an inductive bias into the original LoRA. Specifically, we separate the low-rank matrices into two groups: a magnitude group that controls whether, and how far, a parameter should be updated, and a direction group that decides whether the parameter should move forward or backward. This split better simulates how parameters are updated during full fine-tuning with gradient-based optimization algorithms. We show that it can be achieved simply by applying a ReLU function to the magnitude group and a sign function to the direction group. We conduct experiments over a wide range of NLP tasks, including natural language generation (NLG), natural language understanding (NLU), and commonsense reasoning, using GPT-2, RoBERTa, DeBERTa, and LLaMA-1/2/3 as baseline models. The results show that Dual LoRA consistently outperforms LoRA and its state-of-the-art variants with the same number of trainable parameters.
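Because the sign function has zero derivative almost everywhere, the direction group would receive no gradient without a surrogate; the TL;DR notes that Dual LoRA uses a straight-through estimator (STE) here. The NumPy sketch below writes out the backward pass by hand under two assumptions: the element-wise combination ΔW = ReLU(B_m A_m) ⊙ sign(B_d A_d), and an identity pass-through STE. The function names (`forward`, `backward`, `central_diff`) and the upstream gradient `G` are hypothetical. A finite-difference check confirms the exact ReLU path, and shows that the true gradient through sign is zero while the STE still supplies a training signal.

```python
import numpy as np

def forward(B_m, A_m, B_d, A_d):
    """Assumed Dual LoRA update: Delta W = ReLU(B_m A_m) * sign(B_d A_d)."""
    pre_m = B_m @ A_m                 # magnitude pre-activation
    pre_d = B_d @ A_d                 # direction pre-activation
    mag = np.maximum(pre_m, 0.0)      # ReLU: 0 means "leave this weight alone"
    dirn = np.sign(pre_d)             # sign: +1 forward, -1 backward
    return mag * dirn, (pre_m, mag, dirn)

def backward(G, B_m, A_m, B_d, A_d, cache):
    """Gradients of L = sum(G * Delta W) w.r.t. the four factors.

    ReLU uses its exact derivative. For sign we use an identity
    straight-through estimator (an assumed STE variant).
    """
    pre_m, mag, dirn = cache
    g_pre_m = (G * dirn) * (pre_m > 0)   # chain rule through ReLU
    g_pre_d = G * mag                    # STE: d sign(u)/du treated as 1
    return {
        "B_m": g_pre_m @ A_m.T,
        "A_m": B_m.T @ g_pre_m,
        "B_d": g_pre_d @ A_d.T,
        "A_d": B_d.T @ g_pre_d,
    }

rng = np.random.default_rng(0)
d, k, r1, r2 = 16, 16, 2, 2
B_m, A_m = rng.standard_normal((d, r1)), rng.standard_normal((r1, k))
B_d, A_d = rng.standard_normal((d, r2)), rng.standard_normal((r2, k))
G = rng.standard_normal((d, k))          # stand-in for dL/d(Delta W)

def loss(B_m, A_m, B_d, A_d):
    delta_w, _ = forward(B_m, A_m, B_d, A_d)
    return float(np.sum(G * delta_w))

_, cache = forward(B_m, A_m, B_d, A_d)
grads = backward(G, B_m, A_m, B_d, A_d, cache)

def central_diff(name, i, j, eps=1e-6):
    """Finite-difference derivative of loss w.r.t. one entry of one factor."""
    params = {"B_m": B_m, "A_m": A_m, "B_d": B_d, "A_d": A_d}
    plus = {n: p.copy() for n, p in params.items()}
    minus = {n: p.copy() for n, p in params.items()}
    plus[name][i, j] += eps
    minus[name][i, j] -= eps
    return (loss(**plus) - loss(**minus)) / (2 * eps)

fd_m = central_diff("B_m", 0, 0)   # should match the analytic ReLU-path gradient
fd_d = central_diff("B_d", 0, 0)   # true gradient through sign (no sign flip)
print(fd_m, grads["B_m"][0, 0])
print(fd_d, grads["B_d"][0, 0])     # STE gives a nonzero training signal
```

A clipped pass-through (zeroing the STE gradient where the pre-activation magnitude exceeds 1) is a common alternative to the identity variant; which STE form the paper uses is not stated in this summary.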

Paper Structure

This paper contains 18 sections, 16 equations, 3 figures, and 7 tables.

Figures (3)

  • Figure 1: The architecture of the original LoRA and our proposed Dual LoRA. The low-rank update matrices are separated into the magnitude group and the direction group.
  • Figure 2: Average accuracy on commonsense reasoning datasets using LLaMA3-8B as the baseline model, with $r_1 \in \{2, 4, \ldots, 30\}$ and $r_2 = 32 - r_1$. The red line is the proposed Dual LoRA; the blue and orange lines represent DoRA and LoRA, respectively, with different ranks.
  • Figure 3: The average rank of $\Delta W$ for LoRA, magnitude group of Dual LoRA, direction group of Dual LoRA, and the overall Dual LoRA. The experiments are conducted on LLaMA2-7B.
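The Figure 2 sweep keeps $r_1 + r_2 = 32$, so every Dual LoRA configuration should cost the same as a rank-32 LoRA adapter. The short count below checks this under an assumed layout (a rank-$r_1$ magnitude pair plus a rank-$r_2$ direction pair, each factored as $d \times r$ and $r \times k$); the helper names and the 4096 × 4096 weight shape are hypothetical choices for illustration.

```python
def lora_params(d, k, r):
    """Trainable parameters of one LoRA adapter: B is d x r, A is r x k."""
    return r * (d + k)

def dual_lora_params(d, k, r1, r2):
    """Assumed Dual LoRA layout: a rank-r1 magnitude pair plus a rank-r2 direction pair."""
    return lora_params(d, k, r1) + lora_params(d, k, r2)

d = k = 4096          # illustrative square projection weight
total_rank = 32       # Figure 2 setting: r2 = 32 - r1

sweep = {r1: dual_lora_params(d, k, r1, total_rank - r1) for r1 in range(2, 31, 2)}
lora_32 = lora_params(d, k, total_rank)

print(lora_32)                      # 262144
print(set(sweep.values()))          # {262144}
print(lora_32 / (d * k))            # 0.015625, about 1.6% of the full matrix
```

Since each pair costs $r(d + k)$ parameters, any split of the same total rank gives an identical budget, so the curves in Figure 2 compare configurations at equal cost.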