1LoRA: Summation Compression for Very Low-Rank Adaptation

Alessio Quercia, Zhuo Cao, Arya Bangun, Richard D. Paul, Abigail Morrison, Ira Assent, Hanno Scharr

TL;DR

1LoRA introduces a summation-based, extremely memory-efficient fine-tuning approach for very low-rank adaptation, reducing trainable parameters per layer to $d$ by using a fixed input compression (the feature sum) and a single trainable decompression vector. It outperforms state-of-the-art PEFT baselines across depth estimation, vision-language tasks, diffusion-model fine-tuning, and image classification while offering favorable memory and compute profiles that enable broader, layer-wise tuning on large models. The method also benefits from complementary integrations with normalization fine-tuning and shows robust applicability across diverse domains, suggesting strong practical impact for scalable deployment of large pretrained models. Overall, 1LoRA demonstrates that fixed, interpretable input compression combined with a tiny trainable vector can match or exceed more expressive low-rank methods while dramatically reducing resource demands.

Abstract

Parameter-Efficient Fine-Tuning (PEFT) methods have transformed the approach to fine-tuning large models for downstream tasks by enabling the adjustment of significantly fewer parameters than those in the original model matrices. In this work, we study the "very low rank regime", where we fine-tune the smallest number of parameters per linear layer that each considered PEFT method allows. We propose 1LoRA (Summation Low-Rank Adaptation), a compute-, parameter-, and memory-efficient fine-tuning method that uses the feature sum as a fixed compression and a single trainable vector as decompression. Unlike state-of-the-art PEFT methods such as LoRA, VeRA, and the recent MoRA, 1LoRA uses fewer parameters per layer, reducing the memory footprint and the computational cost. We extensively evaluate our method against state-of-the-art PEFT methods on multiple fine-tuning tasks and show that it not only outperforms them but is also more parameter-, memory-, and compute-efficient. Moreover, thanks to its memory efficiency, 1LoRA allows fine-tuning more evenly across layers instead of focusing on specific ones (e.g. attention layers), which further improves performance.
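The mechanism described in the abstract (a fixed feature sum as compression, a single trainable vector as decompression) can be illustrated with a minimal sketch. It assumes the adapted layer computes $Wx + b\,(\mathds{1}^T x)$ for a frozen weight $W$, which follows from the update $\Delta W = b\mathds{1}^T$ given in the Figure 1 caption. The function name `one_lora_forward`, the absence of a scaling factor, and the toy values are illustrative assumptions, not details taken from the paper.

```python
def one_lora_forward(W, b, x):
    """Return W x + b * sum(x) for a single input vector x.

    W: frozen pretrained weight, d rows of length k (list of lists).
    b: trainable decompression vector of length d.
    x: input feature vector of length k.
    No scaling factor is applied (an assumption of this sketch).
    """
    feature_sum = sum(x)  # fixed compression: no trainable parameters
    frozen_out = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]
    return [y_i + b_i * feature_sum for y_i, b_i in zip(frozen_out, b)]


# Toy example with d = 2 outputs and k = 3 inputs (hypothetical values).
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]
b = [0.5, -1.0]
x = [1.0, 2.0, 3.0]
y = one_lora_forward(W, b, x)  # [10.0, -4.0]
```

With `b` set to all zeros the sketch reduces to the frozen layer, so only the `d` entries of `b` would need gradients and optimizer state.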

Paper Structure

This paper contains 29 sections, 3 equations, 11 figures, 14 tables.

Figures (11)

  • Figure 1: Comparing our method 1LoRA to LoRA. Left: LoRA learns the low-rank decomposition $\Delta W = BA$, where $A\in \mathbb{R}^{r \times k}$ and $B\in \mathbb{R}^{d \times r}$. Right: 1LoRA replaces the matrices $A$ and $B$ with a sum over the input features $x$ as compression and a learnable vector $b\in \mathbb{R}^{d \times 1}$ as decompression: $\Delta W = b\mathds{1}^T$, where the feature sum is $\mathds{1}^Tx = \sum_{i=1}^k x_i$, with $\mathds{1}$ being a vector of length $k$ containing only ones. This reduces the trainable parameters from $r \times k + d \times r$ in LoRA to $d$ per layer for 1LoRA.
  • Figure 2: RMSE ($\downarrow$) of pre-trained DepthAnything model fine-tuned using PEFT methods. Bubble size is proportional to the number of parameters, except for "All", which is capped due to space limitations. Bottom left (and smallest bubble) is better.
  • Figure 3: Validation loss of pre-trained LLaMA-2 7b fine-tuned to Meta-Math. Bubble size is proportional to the number of parameters. Bottom left (and smallest bubble) is better.
  • Figure 4: Validation loss of pre-trained LLaMA-2 13b fine-tuned to Meta-Math. Missing competitors cannot be fine-tuned with the given memory budget. Bubble size is proportional to the number of parameters. Bottom left (and smallest bubble) is better.
  • Figure 5: FID of pretrained DiT fine-tuned to Food-101. Note that DoRA is missing as it cannot be fine-tuned with the given memory budget. Bubble size is proportional to the number of parameters. Bottom left (and smallest bubble) is better.
  • ...and 6 more figures
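
The parameter counts in the Figure 1 caption can be checked with a short worked example. The sketch below evaluates $r \times k + d \times r$ for LoRA and $d$ for 1LoRA on a single layer, and confirms on a toy input that applying the materialized update $\Delta W = b\mathds{1}^T$ gives the same result as scaling $b$ by the feature sum. The function names and the layer size 4096 are illustrative assumptions, not values reported in the paper, and $b$ is treated as a column of length $d$.

```python
def lora_trainable_params(d, k, r):
    """Trainable parameters of LoRA for one d x k layer: A is r x k, B is d x r."""
    return r * k + d * r


def one_lora_trainable_params(d):
    """Trainable parameters of 1LoRA for one layer: only the vector b of length d."""
    return d


def materialize_delta_w(b, k):
    """Build Delta W = b 1^T explicitly: row i is b[i] repeated k times."""
    return [[b_i] * k for b_i in b]


# Hypothetical square layer; 4096 is an illustrative size, not a value from the paper.
d, k = 4096, 4096
print(lora_trainable_params(d, k, r=1))  # 8192
print(one_lora_trainable_params(d))      # 4096

# Applying the materialized update matches scaling b by the feature sum.
b, x = [0.5, -1.0], [1.0, 2.0, 3.0]
delta_w = materialize_delta_w(b, k=len(x))
via_matrix = [sum(w * x_j for w, x_j in zip(row, x)) for row in delta_w]
via_sum = [b_i * sum(x) for b_i in b]
```

For a square layer with $d = k$, rank-1 LoRA trains $2d$ values while 1LoRA trains $d$. The update never has to be stored as a $d \times k$ matrix, since the feature sum and an elementwise scaling of $b$ produce the same output.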