
LoFT: Low-Rank Adaptation That Behaves Like Full Fine-Tuning

Nurbek Tastan, Stefanos Laskaridis, Martin Takac, Karthik Nandakumar, Samuel Horvath

TL;DR

LoFT addresses a key gap in parameter-efficient fine-tuning by aligning both the gradient updates and the optimizer dynamics within a low-rank subspace, so that training closely mirrors full fine-tuning. It introduces alternating updates, gradient scaling, optimizer-state calibration, projected full updates, and gradient clipping to ensure the low-rank adaptation tracks full-model optimization; in the full-rank limit it exactly recovers AdamW. The approach narrows the performance gap to full fine-tuning across language and vision tasks and remains robust at very low ranks, all without extra inference cost or additional hyperparameter tuning. Experiments on LLaMA variants and ViT-Base show that LoFT often matches or surpasses both full fine-tuning and the standard LoRA/DoRA baselines while retaining the memory and compute efficiency of low-rank adaptation.

Abstract

Large pre-trained models are commonly adapted to downstream tasks using parameter-efficient fine-tuning methods such as Low-Rank Adaptation (LoRA), which injects small trainable low-rank matrices instead of updating all weights. While LoRA dramatically reduces the number of trainable parameters with little overhead, it can still underperform full fine-tuning in accuracy and often converges more slowly. We introduce LoFT, a novel low-rank adaptation method that behaves like full fine-tuning by aligning the optimizer's internal dynamics with those of updating all model weights. LoFT not only learns weight updates in a low-rank subspace (like LoRA) but also projects the optimizer's first and second moments (Adam's momentum and second-moment estimates) into the same subspace, mirroring full-model updates. By aligning the low-rank update itself with the full update, LoFT eliminates the need to tune extra hyperparameters such as the LoRA scaling factor $\alpha$. Empirically, this approach substantially narrows the performance gap between adapter-based tuning and full fine-tuning and consistently outperforms standard LoRA-style methods, all without increasing inference cost.
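One way to read "aligning the low-rank update with the full update" is as a least-squares projection. This sketch makes two assumptions, both inferred from the figure captions rather than stated by the paper: the adapted weight is factored as $W = UV^\top$ with $U \in \mathbb{R}^{m \times r}$ and $V \in \mathbb{R}^{n \times r}$, and $V$ is held fixed while $U$ is updated. Under these assumptions, the change $\Delta U\,V^\top$ closest in Frobenius norm to a desired full update $\Delta W$ is $\Delta U = \Delta W\,V(V^\top V)^{-1}$. That choice moves $W$ by $\Delta W P_V$, with $P_V = V(V^\top V)^{-1}V^\top$. A plain gradient step on $U$ uses $\Delta W\,V$ without the $(V^\top V)^{-1}$ factor, so it reproduces $\Delta W P_V$ only when $V$ has orthonormal columns. The NumPy check below illustrates this; the helper names are hypothetical and this is not the authors' implementation.

```python
import numpy as np


def project_update_onto_U(delta_W: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Hypothetical helper: turn a full update delta_W (m x n) into an update of U (m x r).

    With V (n x r, full column rank) held fixed, the result minimizes
    ||delta_U @ V.T - delta_W||_F, i.e. delta_U = delta_W @ V @ inv(V.T @ V).
    """
    gram = V.T @ V  # r x r Gram matrix, symmetric positive definite
    # Solve gram @ X = V.T @ delta_W.T rather than inverting gram explicitly.
    return np.linalg.solve(gram, V.T @ delta_W.T).T


def projector(V: np.ndarray) -> np.ndarray:
    """Orthogonal projector P_V = V (V^T V)^{-1} V^T onto the column span of V."""
    return V @ np.linalg.solve(V.T @ V, V.T)


rng = np.random.default_rng(0)
m, n, r = 6, 5, 2
V = rng.standard_normal((n, r))
delta_W = rng.standard_normal((m, n))
delta_U = project_update_onto_U(delta_W, V)

# The induced change of W = U V^T is delta_W with its rows projected onto span(V).
print(np.allclose(delta_U @ V.T, delta_W @ projector(V)))  # True
# A plain gradient-style step (no Gram inverse) generally lands elsewhere.
print(np.allclose(delta_W @ V @ V.T, delta_W @ projector(V)))  # False
```

Solving against the small $r \times r$ Gram matrix keeps the extra cost per step at $O(r^3)$ plus matrix products, and no inverse is formed explicitly.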

Paper Structure

This paper contains 43 sections, 2 theorems, 28 equations, 11 figures, 11 tables, and 1 algorithm.

Key Result

Lemma 1

Let $U_0 = \tilde{U}_r X_0$ and $V_0 = \tilde{V}_r Y_0$, where $X_0, Y_0 \in \mathbb{R}^{r \times r}$ are full-rank matrices. Then LoFT-GD with momentum applied to the matrix factorization problem exactly recovers GD with momentum applied to $f(W) = \frac{1}{2}\|W - A\|_F^2$ initialized at $W_0 = \dots$
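The lemma can be checked numerically on a toy problem. The sketch rests on several assumptions that the truncated statement does not confirm:

- $A$ has rank $r$, with thin SVD $A = \tilde{U}_r \Sigma \tilde{V}_r^\top$.
- The factorization is $W = UV^\top$, initialized at $W_0 = U_0 V_0^\top$.
- "LoFT-GD with momentum" is modeled as a hypothetical alternating scheme. Heavy-ball momentum is built from projected gradients, and $U$ and $V$ are updated on alternate steps through $\Delta U = \Delta W\,V(V^\top V)^{-1}$ (and its transpose for $V$).

For readability the momentum buffer is stored as a dense $m \times n$ matrix, which a memory-efficient implementation would avoid. Under these assumptions every projection acts as the identity on the gradients, so the factored iterates should coincide with full GD with momentum.

```python
import numpy as np


def right_solve(D: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Return D @ B @ inv(B.T @ B) without forming the inverse explicitly."""
    return np.linalg.solve(B.T @ B, B.T @ D.T).T


def loft_gd_momentum(A, U, V, lr, beta, steps):
    """Hypothetical alternating LoFT-style GD with heavy-ball momentum.

    Minimizes f(W) = 0.5 * ||U V^T - A||_F^2. Even steps update U (V fixed),
    odd steps update V (U fixed). Returns the sequence of products U V^T.
    """
    M = np.zeros_like(A)  # dense momentum buffer, for illustration only
    history = []
    for t in range(steps):
        G = U @ V.T - A  # gradient of f with respect to W
        if t % 2 == 0:
            P_V = V @ np.linalg.solve(V.T @ V, V.T)
            M = beta * M + G @ P_V            # momentum from the projected gradient
            U = U - lr * right_solve(M, V)    # U V^T moves by -lr * M P_V
        else:
            P_U = U @ np.linalg.solve(U.T @ U, U.T)
            M = beta * M + P_U @ G
            V = V - lr * right_solve(M.T, U)  # U V^T moves by -lr * P_U M
        history.append(U @ V.T)
    return history


def full_gd_momentum(A, W, lr, beta, steps):
    """Reference: heavy-ball GD on f(W) = 0.5 * ||W - A||_F^2 over all entries of W."""
    M = np.zeros_like(A)
    history = []
    for _ in range(steps):
        M = beta * M + (W - A)
        W = W - lr * M
        history.append(W)
    return history


rng = np.random.default_rng(0)
m, n, r = 8, 6, 3
U_tilde, _ = np.linalg.qr(rng.standard_normal((m, r)))  # orthonormal columns
V_tilde, _ = np.linalg.qr(rng.standard_normal((n, r)))
A = U_tilde @ np.diag([3.0, 2.0, 1.0]) @ V_tilde.T     # rank-r target (assumption)
X0 = Y0 = 0.5 * np.eye(r)                               # full-rank r x r factors
U0, V0 = U_tilde @ X0, V_tilde @ Y0

loft = loft_gd_momentum(A, U0, V0, lr=0.1, beta=0.9, steps=50)
full = full_gd_momentum(A, U0 @ V0.T, lr=0.1, beta=0.9, steps=50)
print(all(np.allclose(a, b) for a, b in zip(loft, full)))  # True
```

With a generic full-rank $A$, the projections discard part of each gradient, so exact agreement with full GD should not be expected in that setting.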

Figures (11)

  • Figure 1: LoFT visualization. LoFT can be interpreted as the tightest approximation to full fine-tuning under the constraint that each update lies in the subspace defined by $V$ (when updating $U$). The LoFT-AdamW update consists of a momentum and second-moment estimate constructed using projected gradients. The final update is then projected back onto the subspace of $V$ to respect the low-rank constraint. When $V$ is the updated component instead of $U$, the roles of $U$ and $V$ are simply exchanged, and the update is applied to $W^\top$ instead of $W$.
  • Figure 2: Comparison of LoRA, LoFT, and Full Fine-tuning with Adam on $f(W) = \|W - A\|_F^2$.
  • Figure 3: Rank-wise comparison of LoFT against LoRA (left) and DoRA (right) on LLaMA-7B across commonsense reasoning tasks. LoFT maintains significantly higher accuracy, especially at low ranks. Percentage gains denote improvement of LoFT over the respective baseline at each rank.
  • Figure 4: Task-wise performance comparison across LoRA (green), DoRA (red), and LoFT (blue) at lower ranks ($r \in \{4, 2, 1\}$) on LLaMA-7B. LoFT maintains high performance across all tasks, even under extreme compression, unlike baselines that degrade sharply on several benchmarks.
  • Figure 5: Training loss (log-scale) on HAM10000.
  • ...and 6 more figures
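Figure 1's description of LoFT-AdamW can be sketched in dense form. This hypothetical sketch assumes $W = UV^\top$ with no frozen base weight, and it updates $U$ while $V$ is held fixed. The step proceeds in four parts:

1. Project the full gradient onto the row space spanned by $V$.
2. Accumulate Adam's first and second moments from the projected gradient.
3. Map the bias-corrected Adam direction back to $U$ through $\Delta U = \Delta W\,V(V^\top V)^{-1}$.
4. Apply decoupled weight decay to $U$, which scales $UV^\top$ by the same factor.

The dense $m \times n$ moment buffers defeat the memory savings and are here only for illustration; the helper names are ours, not the paper's. When $V$ is square and invertible, $P_V = I$, so the step should reduce to a standard AdamW step on $W$. The usage code checks this against a textbook AdamW update.

```python
import numpy as np


def adam_direction(g, state, betas, eps):
    """Update Adam moment estimates in `state` with gradient g; return the bias-corrected step."""
    beta1, beta2 = betas
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * g
    state["v"] = beta2 * state["v"] + (1 - beta2) * g * g
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return m_hat / (np.sqrt(v_hat) + eps)


def loft_adamw_step_U(U, V, grad_W, state, lr, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.0):
    """Hypothetical dense LoFT-AdamW-style step on U with V held fixed."""
    gram = V.T @ V
    P_V = V @ np.linalg.solve(gram, V.T)                        # projector onto span(V)
    delta_W = adam_direction(grad_W @ P_V, state, betas, eps)  # moments from projected gradient
    delta_U = np.linalg.solve(gram, V.T @ delta_W.T).T          # delta_W V (V^T V)^{-1}
    # U V^T becomes (1 - lr * wd) * W - lr * delta_W P_V
    return (1 - lr * weight_decay) * U - lr * delta_U


def adamw_step(W, grad_W, state, lr, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.0):
    """Reference AdamW step on the full matrix W (decoupled weight decay)."""
    return (1 - lr * weight_decay) * W - lr * adam_direction(grad_W, state, betas, eps)


def new_state(shape):
    return {"t": 0, "m": np.zeros(shape), "v": np.zeros(shape)}


# Full-rank limit check: V is square and invertible, so P_V = I.
rng = np.random.default_rng(1)
m, n = 5, 4
A = rng.standard_normal((m, n))
U = rng.standard_normal((m, n))
V = rng.standard_normal((n, n))
W = U @ V.T
s_loft, s_full = new_state((m, n)), new_state((m, n))
for _ in range(10):
    # Gradient of 0.5 * ||W - A||_F^2, evaluated at the current iterate of each method.
    U = loft_adamw_step_U(U, V, U @ V.T - A, s_loft, lr=1e-2, weight_decay=0.01)
    W = adamw_step(W, W - A, s_full, lr=1e-2, weight_decay=0.01)
print(np.allclose(U @ V.T, W))  # True
```

When $V$ is the factor being updated, the caption says the roles are exchanged and the update is applied to $W^\top$. In this sketch that would mean calling `loft_adamw_step_U(V, U, grad_W.T, state_T, ...)` with moment buffers kept in transposed layout. The sketch does not reproduce how LoFT stores compressed optimizer states or calibrates them when switching factors.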

Theorems & Definitions (10)

  • Remark 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Remark 5
  • Remark 6
  • Lemma 1
  • Proof
  • Lemma 2
  • Proof