RefLoRA: Refactored Low-Rank Adaptation for Efficient Fine-Tuning of Large Models

Yilang Zhang, Bingcong Li, Georgios B. Giannakis

TL;DR

Refactored low-rank adaptation (RefLoRA) identifies, at each step, the optimal low-rank factorization that minimizes an upper bound on the loss. The resulting method promotes a flatter loss landscape, along with consistent and balanced weight updates, thus speeding up stable convergence.

Abstract

Low-Rank Adaptation (LoRA) lowers the computational and memory overhead of fine-tuning large models by updating a low-dimensional subspace of the pre-trained weight matrix. Albeit efficient, LoRA exhibits suboptimal convergence and noticeable performance degradation, due to inconsistent and imbalanced weight updates induced by its nonunique low-rank factorizations. To overcome these limitations, this article identifies the optimal low-rank factorization per step that minimizes an upper bound on the loss. The resultant refactored low-rank adaptation (RefLoRA) method promotes a flatter loss landscape, along with consistent and balanced weight updates, thus speeding up stable convergence. Extensive experiments evaluate RefLoRA on natural language understanding and commonsense reasoning tasks with popular large language models including DeBERTaV3, LLaMA-7B, LLaMA2-7B, and LLaMA3-8B. The numerical tests corroborate that RefLoRA converges faster, outperforms various benchmarks, and incurs negligible computational overhead compared to state-of-the-art LoRA variants.
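The nonuniqueness mentioned in the abstract can be illustrated numerically. The following is a minimal NumPy sketch, not the authors' implementation. It assumes the low-rank update is parametrized as a product `A @ B.T`, that one plain gradient-descent step is taken on the factors, and that `G` is a random stand-in for the loss gradient with respect to the full weight. The sizes `m`, `n`, `r`, the step size `eta`, the matrix `P`, and the scale `c` are all hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 6, 5, 2        # hypothetical layer sizes and rank
eta = 0.1                # hypothetical step size

A = rng.standard_normal((m, r))
B = rng.standard_normal((n, r))
G = rng.standard_normal((m, n))  # stand-in for the gradient w.r.t. the full weight


def product_change(A, B, G, eta):
    """Change in A @ B.T after one gradient-descent step on both factors."""
    A_new = A - eta * G @ B      # gradient w.r.t. A is G B
    B_new = B - eta * G.T @ A    # gradient w.r.t. B is G^T A
    return A_new @ B_new.T - A @ B.T


# Refactor with an invertible (non-orthogonal) P: same product, new factors.
P = np.array([[2.0, 1.0],
              [0.0, 0.5]])
A_ref = A @ P
B_ref = B @ np.linalg.inv(P).T

same_product = np.allclose(A @ B.T, A_ref @ B_ref.T)
same_update = np.allclose(product_change(A, B, G, eta),
                          product_change(A_ref, B_ref, G, eta))

# Imbalance: rescaling the factors by c and 1/c keeps the product but
# multiplies the ratio of the two factor-gradient norms by c**2.
c = 10.0
ratio_orig = np.linalg.norm(G.T @ A) / np.linalg.norm(G @ B)
ratio_scaled = np.linalg.norm(G.T @ (c * A)) / np.linalg.norm(G @ (B / c))

print("same product:", same_product)
print("same update: ", same_update)
print("gradient-norm ratio grew by:", ratio_scaled / ratio_orig)
```

Both factorizations represent the same weight, yet a single step moves that weight differently, and a simple rescaling already skews how much each factor is updated. This is the kind of inconsistency and imbalance that motivates choosing the factorization deliberately at each step.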

Paper Structure

This paper contains 32 sections, 8 theorems, 58 equations, 4 figures, 9 tables, 2 algorithms.

Key Result

Lemma 1

With the full-rank assumption in effect, it holds that […]. Moreover, if $\mathbf{P}_t \in \mathrm{O}(r)$, then $\Delta \tilde{\mathbf{W}}_t = \Delta \mathbf{W}_t$.
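The second claim of the lemma can be checked numerically. The sketch below is an illustration under stated assumptions rather than the paper's procedure: it assumes the refactoring maps the factors to `A @ P` and `B @ inv(P).T` (a parametrization not spelled out in this excerpt), uses one plain gradient-descent step, and takes a random `G` as a stand-in for the loss gradient. The helper `delta_after_refactor` and all numeric values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r, eta = 6, 5, 3, 0.05     # hypothetical sizes, rank, and step size
A = rng.standard_normal((m, r))
B = rng.standard_normal((n, r))
G = rng.standard_normal((m, n))  # stand-in for the gradient w.r.t. the full weight


def delta_after_refactor(P):
    """Weight change from one gradient step taken after refactoring with P."""
    A_t = A @ P
    B_t = B @ np.linalg.inv(P).T          # keeps A_t @ B_t.T equal to A @ B.T
    A_new = A_t - eta * G @ B_t
    B_new = B_t - eta * G.T @ A_t
    return A_new @ B_new.T - A_t @ B_t.T


baseline = delta_after_refactor(np.eye(r))          # no refactoring

# An orthogonal P, obtained from a QR decomposition of a random matrix.
Q, _ = np.linalg.qr(rng.standard_normal((r, r)))
orthogonal_matches = np.allclose(delta_after_refactor(Q), baseline)

# Under this parametrization the update depends on P only through P @ P.T:
# L and L @ Q share the same P @ P.T and therefore give the same update.
L = np.diag([2.0, 1.0, 0.5]) + 0.5 * np.tril(np.ones((r, r)), -1)
same_S_matches = np.allclose(delta_after_refactor(L),
                             delta_after_refactor(L @ Q))
nonorthogonal_differs = not np.allclose(delta_after_refactor(L), baseline)

print(orthogonal_matches, same_S_matches, nonorthogonal_differs)
```

Under these assumptions, the only part of `P` that affects the step is the symmetric matrix `P @ P.T`, which is consistent with the Figure 1 caption describing LoRA as the case $\mathbf{S}_t = \mathbf{I}_r$.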

Figures (4)

  • Figure 1: Visualization of the loss $\ell (\mathbf{W}_t + \Delta \tilde{\mathbf{W}}_t)$ and its upper bound. LoRA corresponds to $\mathbf{S}_t = \mathbf{I}_r$, while our refactoring (ref.) optimizes $\mathbf{S}_t$.
  • Figure 2: Comparison of LoRA, ScaledGD, and RefLoRA for matrix factorization.
  • Figure 3: Images generated from Stable Diffusion fine-tuned with different approaches.
  • Figure 4: Convergence and complexity comparison.

Theorems & Definitions (15)

  • Lemma 1
  • Proposition 2
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Theorem 6
  • proof
  • proof
  • proof
  • proof
  • ...and 5 more