
RandLoRA: Full-rank parameter-efficient fine-tuning of large models

Paul Albert, Frederic Z. Zhang, Hemanth Saratchandran, Cristian Rodriguez-Opazo, Anton van den Hengel, Ehsan Abbasnejad

TL;DR

RandLoRA addresses LoRA's limitation of low-rank updates by introducing a full-rank PEFT that combines fixed random low-rank bases through learned diagonal scalings, preserving memory efficiency. The method is motivated by theoretical convergence considerations and validated across vision, language, and vision-language tasks, where it bridges or eliminates the gap to standard fine-tuning at equivalent parameter budgets, particularly for CLIP-like architectures. Activation and loss-landscape analyses show RandLoRA yields representations and optimization trajectories closer to full fine-tuning than LoRA, with sparse random bases offering potential compute savings. While RandLoRA incurs some training-time overhead, it provides a scalable, practical route to robust full-rank fine-tuning on large models and opens avenues for hybrids and compute-optimized variants.

Abstract

Low-Rank Adaptation (LoRA) and its variants have shown impressive results in reducing the number of trainable parameters and memory requirements of large transformer networks while maintaining fine-tuning performance. The low-rank nature of the weight update inherently limits the representation power of fine-tuned models, however, thus potentially compromising performance on complex tasks. This raises a critical question: when a performance gap between LoRA and standard fine-tuning is observed, is it due to the reduced number of trainable parameters or the rank deficiency? This paper aims to answer this question by introducing RandLoRA, a parameter-efficient method that performs full-rank updates using learned linear combinations of low-rank, non-trainable random matrices. Our method limits the number of trainable parameters by restricting optimization to diagonal scaling matrices applied to the fixed random matrices. This allows us to effectively overcome the low-rank limitations while maintaining parameter and memory efficiency during training. Through extensive experimentation across vision, language, and vision-language benchmarks, we systematically evaluate the limitations of LoRA and existing random basis methods. Our findings reveal that full-rank updates are beneficial across vision and language tasks individually, and even more so for vision-language tasks, where RandLoRA significantly reduces -- and sometimes eliminates -- the performance gap between standard fine-tuning and LoRA, demonstrating its efficacy.
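
The idea of a full-rank update built from fixed random low-rank matrices and trainable diagonal scalings can be illustrated with a small NumPy sketch. It assumes the update takes the form $\Delta W = \sum_{j=1}^{n} B_j \Lambda_j A_j \Gamma_j$, with $B_j \in \mathbb{R}^{D \times r}$ and $A_j \in \mathbb{R}^{r \times d}$ frozen and only the diagonals of $\Lambda_j$ and $\Gamma_j$ trained. This form, the helper name `randlora_update`, the toy sizes, and the random (rather than learned) scaling values are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def randlora_update(B, A, lam, gam):
    """Return sum_j B_j @ diag(lam_j) @ A_j @ diag(gam_j) (assumed update form)."""
    D, d = B[0].shape[0], A[0].shape[1]
    delta = np.zeros((D, d))
    for B_j, A_j, lam_j, gam_j in zip(B, A, lam, gam):
        # Broadcasting scales columns: B_j * lam_j == B_j @ diag(lam_j),
        # and A_j * gam_j == A_j @ diag(gam_j).
        delta += (B_j * lam_j) @ (A_j * gam_j)
    return delta

rng = np.random.default_rng(0)
D, d, r = 32, 16, 4      # hypothetical layer shape and base rank
n = d // r               # number of random bases, chosen so that n * r == d

# Fixed (non-trainable) random low-rank bases.
B = [rng.standard_normal((D, r)) for _ in range(n)]
A = [rng.standard_normal((r, d)) for _ in range(n)]

# Stand-ins for the trainable diagonals; random values here, learned in practice.
lam = [rng.standard_normal(r) for _ in range(n)]
gam = [rng.standard_normal(d) for _ in range(n)]

delta_w = randlora_update(B, A, lam, gam)
single_term = (B[0] * lam[0]) @ (A[0] * gam[0])

single_rank = int(np.linalg.matrix_rank(single_term))  # one rank-r term, LoRA-like
combined_rank = int(np.linalg.matrix_rank(delta_w))    # sum of n such terms
trainable = n * (r + d)                                # diagonal entries only

print(single_rank, combined_rank, trainable, D * d)
```

In this toy setting a single term has rank $r$, while the sum of $n = d/r$ terms reaches rank $d$, even though only the $n(r + d)$ diagonal entries would be trained instead of the $D \cdot d$ entries of a dense update.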

Paper Structure

This paper contains 48 sections, 3 theorems, 17 equations, 5 figures, 15 tables.

Key Result

Theorem 4.1

Let $W$ be a fixed $D \times d$ matrix, with $D > d$ and $\text{rank}(W) = d$. Fix $1 \leq n \leq d$, such that $d = nr$. The matrix $W$ can be factorized using SVD as $W = \sum_{j=1}^{n} U_j \Sigma_j V_j$, where $U_j \in \mathbb{R}^{D\times r}$, $V_j \in \mathbb{R}^{r \times d}$ are partitions of the left and right singular vectors, and $\Sigma_j \in \mathbb{R}^{r \times r}$ contains $r$ singular values. For each $1 \leq j \leq n$, let
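
The block-wise SVD factorization that the theorem statement refers to can be checked numerically. The sketch below assumes the factorization reads $W = \sum_{j=1}^{n} U_j \Sigma_j V_j$, where each block takes $r$ consecutive singular triplets; the matrix sizes and variable names are hypothetical, and the check covers only this partition step, not the remainder of the theorem.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d, r = 12, 6, 2       # hypothetical sizes with D > d and d = n * r
n = d // r
W = rng.standard_normal((D, d))  # a random Gaussian matrix has rank d almost surely

# Thin SVD: U is D x d, s holds d singular values, Vt is d x d.
U, s, Vt = np.linalg.svd(W, full_matrices=False)

blocks = []
for j in range(n):
    idx = slice(j * r, (j + 1) * r)
    U_j = U[:, idx]            # D x r block of left singular vectors
    Sigma_j = np.diag(s[idx])  # r x r block of singular values
    V_j = Vt[idx, :]           # r x d block of right singular vectors (as rows)
    blocks.append(U_j @ Sigma_j @ V_j)

W_rebuilt = sum(blocks)
max_error = float(np.max(np.abs(W - W_rebuilt)))
block_ranks = [int(np.linalg.matrix_rank(b)) for b in blocks]

print(max_error, block_ranks)
```

Each block has rank $r$, and the $n$ blocks together recover $W$ up to floating-point error, which is the sense in which a full-rank matrix splits into a sum of low-rank pieces.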

Figures (5)

  • Figure 1: LoRA becomes limited by the rank of its update. We train DinoV2 and CLIP to classify 21 image datasets and LLama3-8B to solve 8 commonsense reasoning tasks.
  • Figure 2: Tuning CLIP and DinoV2 vision encoders for image classification. Accuracy averaged over 21 datasets. We additionally report max GPU VRAM usage during training.
  • Figure 3: Tuning CLIP's vision and language encoders for image classification. Accuracy averaged over 22 datasets. We additionally report max GPU VRAM usage during training.
  • Figure 4: How close do RandLoRA and LoRA get to standard fine-tuning? We compare CKA scores of RandLoRA and LoRA with fine-tuned activations (top) and the mode connectivity in the loss landscape of UCF101 (bottom).
  • Figure 5: Mode connectivity in the loss landscape when tuning CLIP for image classification. Interactive 3D figures are available in the supplementary material.

Theorems & Definitions (7)

  • Theorem 4.1
  • Lemma D.1
  • proof
  • Lemma D.2
  • proof
  • proof
  • proof