LoFT: Low-Rank Adaptation That Behaves Like Full Fine-Tuning
Nurbek Tastan, Stefanos Laskaridis, Martin Takac, Karthik Nandakumar, Samuel Horvath
TL;DR
LoFT aligns both gradient updates and optimizer dynamics within a low-rank subspace so that low-rank adaptation mirrors full fine-tuning. It combines alternating updates, gradient scaling, optimizer-state calibration, projected full updates, and gradient clipping so that the low-rank adaptation tracks full-model optimization; in the full-rank limit it exactly recovers AdamW. The approach narrows the performance gap to full fine-tuning across language and vision tasks and remains robust at very low ranks, without extra inference cost and without tuning additional hyperparameters such as the LoRA scaling factor. Empirical results on LLaMA variants and ViT-Base show that LoFT often matches or surpasses full fine-tuning and standard LoRA/DoRA baselines while maintaining memory and compute efficiency.
Abstract
Large pre-trained models are commonly adapted to downstream tasks using parameter-efficient fine-tuning methods such as Low-Rank Adaptation (LoRA), which injects small trainable low-rank matrices instead of updating all weights. While LoRA dramatically reduces the number of trainable parameters with little overhead, it can still underperform full fine-tuning in accuracy and often converges more slowly. We introduce LoFT, a novel low-rank adaptation method that behaves like full fine-tuning by aligning the optimizer's internal dynamics with those of updating all model weights. LoFT not only learns weight updates in a low-rank subspace (like LoRA) but also projects the optimizer's first and second moments (Adam's momentum and variance) into the same subspace, mirroring full-model updates. By aligning the low-rank update itself with the full update, LoFT eliminates the need to tune extra hyperparameters such as the LoRA scaling factor $\alpha$. Empirically, this approach substantially narrows the performance gap between adapter-based tuning and full fine-tuning and consistently outperforms standard LoRA-style methods, all without increasing inference cost.
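For readers unfamiliar with the baseline, the following is a minimal NumPy sketch of standard LoRA, not of LoFT itself. It shows how the frozen weight is augmented by a scaled low-rank product and how the gradients of the two factors follow from the full-weight gradient by the chain rule. The function names `lora_forward` and `lora_factor_grads`, the squared-error loss, and the layer sizes are assumptions chosen for illustration; the commonly used scaling $\alpha / r$ is assumed as well.

```python
import numpy as np


def lora_forward(x, W0, A, B, alpha):
    """Apply a linear layer whose weight is W0 + (alpha / r) * B @ A.

    Shapes: x (n, d_in), W0 (d_out, d_in), B (d_out, r), A (r, d_in).
    W0 stays frozen; only A and B would be trained.
    """
    r = A.shape[0]
    W_eff = W0 + (alpha / r) * (B @ A)
    return x @ W_eff.T


def lora_factor_grads(G, A, B, alpha):
    """Map the full-weight gradient G = dL/dW_eff to the factor gradients.

    By the chain rule, dL/dB = s * G @ A.T and dL/dA = s * B.T @ G,
    where s = alpha / r is the LoRA scaling factor.
    """
    s = alpha / A.shape[0]
    grad_B = s * (G @ A.T)
    grad_A = s * (B.T @ G)
    return grad_A, grad_B


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d_in, d_out, r, alpha = 8, 64, 32, 4, 8.0  # illustrative sizes only

    W0 = rng.normal(size=(d_out, d_in))
    A = rng.normal(size=(r, d_in))
    B = np.zeros((d_out, r))  # zero init: the adapted layer starts equal to W0
    x = rng.normal(size=(n, d_in))
    y = rng.normal(size=(n, d_out))

    # Squared-error loss L = 0.5 * ||x W_eff^T - y||^2 (an assumed toy objective).
    residual = lora_forward(x, W0, A, B, alpha) - y
    G = residual.T @ x  # gradient with respect to the full weight, shape (d_out, d_in)
    grad_A, grad_B = lora_factor_grads(G, A, B, alpha)

    print("full-weight parameters:", W0.size)
    print("LoRA parameters:", A.size + B.size)
    print("factor gradient shapes:", grad_A.shape, grad_B.shape)
```

The sketch makes one point concrete: the factors only ever see the full gradient through `A` and `B`, and the scale `alpha / r` multiplies both factor gradients. This is the baseline behavior, with its extra hyperparameter, that the abstract contrasts with LoFT's alignment of the update and the optimizer moments.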
