Initialization using Update Approximation is a Silver Bullet for Extremely Efficient Low-Rank Fine-Tuning
Kaustubh Ponkshe, Raghav Singhal, Eduard Gorbunov, Alexey Tumanov, Samuel Horvath, Praneeth Vepakomma
TL;DR
LoRA-SB introduces a principled initialization strategy that simulates full fine-tuning (FT) within low-rank subspaces. It addresses core limitations of LoRA-XS by ensuring an optimal low-rank approximation of the gradient and by removing the need to tune a scaling factor. The method derives a closed-form solution for the update in the LoRA-XS subspace and initializes the low-rank factors from a truncated SVD of the first full fine-tuning step, yielding orthonormal bases that preserve update directions and guarantee loss reduction. Empirically, LoRA-SB outperforms LoRA and LoRA-XS across arithmetic, commonsense, and natural language understanding tasks while using 27–90x fewer trainable parameters and incurring negligible initialization overhead. The approach improves the parameter efficiency of parameter-efficient fine-tuning (PEFT), reaching performance close to full fine-tuning with substantial computational savings and practical deployment benefits.
Abstract
Low-rank adapters have become standard for efficiently fine-tuning large language models, but they often fall short of achieving the performance of full fine-tuning. We propose a method, LoRA Silver Bullet or LoRA-SB, that approximates full fine-tuning within low-rank subspaces using a carefully designed initialization strategy. We theoretically demonstrate that the architecture of LoRA-XS, which inserts a learnable r × r matrix between B and A while keeping other matrices fixed, provides the precise conditions needed for this approximation. We leverage its constrained update space to achieve optimal scaling for high-rank gradient updates while removing the need for scaling factor tuning. We prove that our initialization offers an optimal low-rank approximation of the initial gradient and preserves update directions throughout training. Extensive experiments across mathematical reasoning, commonsense reasoning, and language understanding tasks demonstrate that our approach exceeds the performance of LoRA (and baselines) while using 27-90 times fewer learnable parameters, and comprehensively outperforms LoRA-XS. Our findings establish that it is possible to simulate full fine-tuning in low-rank subspaces and achieve significant parameter efficiency gains without sacrificing performance. Our code is publicly available at: https://github.com/CERT-Lab/lora-sb.
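
The initialization described above can be illustrated with a minimal NumPy sketch. It is not taken from the released code: the function name lora_sb_style_init, the matrix sizes, and the random matrix standing in for the first full fine-tuning update are all assumptions made for illustration. The sketch takes a truncated SVD of that update, uses the top-r singular vectors as the fixed matrices B and A, and places the top-r singular values in the trainable r × r matrix R.

```python
import numpy as np

def lora_sb_style_init(first_step_update, rank):
    """Hypothetical helper: build (B, R, A) from a truncated SVD.

    first_step_update: (m, n) array standing in for the first full
    fine-tuning update of one weight matrix (an assumption here).
    """
    U, S, Vt = np.linalg.svd(first_step_update, full_matrices=False)
    B = U[:, :rank]        # (m, r), orthonormal columns, kept fixed
    R = np.diag(S[:rank])  # (r, r), the only trainable matrix
    A = Vt[:rank, :]       # (r, n), orthonormal rows, kept fixed
    return B, R, A, S

# Toy sizes chosen only for the example.
m, n, r = 64, 48, 4
rng = np.random.default_rng(0)
delta_w = rng.standard_normal((m, n))  # stand-in for the first FT update

B, R, A, S = lora_sb_style_init(delta_w, r)

# Rank-r reconstruction of the update inside the B ... A subspace.
approx = B @ R @ A

# With orthonormal B and A, projecting the full update into the
# subspace is a pair of matrix products and recovers R exactly.
R_proj = B.T @ delta_w @ A.T

# Frobenius error of the truncation; by the Eckart-Young theorem this
# equals the norm of the discarded singular values.
err = np.linalg.norm(delta_w - approx)
tail = np.sqrt(np.sum(S[r:] ** 2))

# Trainable parameters for this one matrix: r*r versus r*(m+n) for LoRA.
params_rxr = r * r
params_lora = r * (m + n)
```

In this toy setting the trainable matrix has r² = 16 entries, against r(m + n) = 448 for a standard LoRA pair of the same rank. The 27-90x figures quoted above come from the paper's experiments, not from this example.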
