The Primacy of Magnitude in Low-Rank Adaptation

Zicheng Zhang; Haoran Li; Yifeng Zhang; Guoqiang Gong; Jiaxing Wang; Junxing Hu; Pengzhang Liu; Qixia Jiang

TL;DR

This work reframes LoRA training dynamics around the magnitude of weight updates, showing that this magnitude governs both convergence and expressiveness. The authors prove that low-rank structure inherently bounds update magnitudes, and that spectral initializations improve performance primarily by amplifying updates rather than by embedding pretrained knowledge. To obtain the gains of spectral methods without their cost, they propose LoRAM, a magnitude-driven initialization that uses deterministic orthogonal bases (DST) scaled by pretrained weight statistics, which eliminates the SVD overhead. Empirical results on NLP and vision-language benchmarks show that LoRAM matches or surpasses spectral methods while retaining LoRA’s parameter, memory, and compute efficiency. The result is a unified view in which learning rate, scaling factor, and initialization all act through update magnitude, with practical implications for robust, scalable PEFT deployment.
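
The idea of a deterministic, SVD-free "Basis & Basis" initialization can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that DST refers to the discrete sine transform (type I, orthonormal form), that the gain is derived from the standard deviation of the pretrained weight, and that the initial product is subtracted from the frozen weight so the model output is unchanged at step zero. The names `dst_basis`, `basis_and_basis_init`, `gain`, and `scale` are hypothetical.

```python
import numpy as np


def dst_basis(n: int, r: int) -> np.ndarray:
    """Return the first r rows of the orthonormal DST-I matrix of size n x n."""
    k = np.arange(1, r + 1)[:, None]   # basis (frequency) index
    j = np.arange(1, n + 1)[None, :]   # coordinate index
    return np.sqrt(2.0 / (n + 1)) * np.sin(np.pi * k * j / (n + 1))


def basis_and_basis_init(w0: np.ndarray, rank: int, scale: float = 1.0):
    """Initialize both LoRA factors from scaled orthogonal bases (illustrative).

    The gain rule below is an assumption made for this sketch; the paper's
    exact scaling by pretrained weight statistics may differ.
    """
    d_out, d_in = w0.shape
    gain = np.sqrt(w0.std())                # hypothetical magnitude rule
    a = gain * dst_basis(d_in, rank)        # shape (rank, d_in), orthogonal rows
    b = gain * dst_basis(d_out, rank).T     # shape (d_out, rank), orthogonal columns
    # Assumption: fold the non-zero initial product into the frozen weight
    # so that w_res + scale * b @ a reproduces w0 exactly at initialization.
    w_res = w0 - scale * (b @ a)
    return a, b, w_res


rng = np.random.default_rng(0)
w0 = rng.normal(scale=0.02, size=(12, 16))  # stand-in for a pretrained weight
a, b, w_res = basis_and_basis_init(w0, rank=4, scale=2.0)
```

Because both factors are built from closed-form sine bases, no decomposition of the pretrained weight is needed; only a scalar statistic of it is read.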

Abstract

Low-Rank Adaptation (LoRA) offers a parameter-efficient paradigm for tuning large models. While recent spectral initialization methods improve convergence and performance over the naive "Noise & Zeros" scheme, their extra computational and storage overhead undermines efficiency. In this paper, we establish update magnitude as the fundamental driver of LoRA performance and propose LoRAM, a magnitude-driven "Basis & Basis" initialization scheme that matches spectral methods without their inefficiencies. Our key contributions are threefold: (i) The magnitude of weight updates determines convergence. We prove that low-rank structures intrinsically bound update magnitudes, which unifies the tuning of learning rate, scaling factor, and initialization as mechanisms for regulating magnitude. (ii) Spectral initialization succeeds via magnitude amplification. We show that the presumed knowledge-driven benefit of the spectral components arises essentially from the boost in weight update magnitude. (iii) A novel and compact initialization strategy, LoRAM, scales deterministic orthogonal bases using pretrained weight magnitudes to simulate spectral gains. Extensive experiments show that LoRAM serves as a strong baseline, retaining the full efficiency of LoRA while matching or outperforming spectral initialization across benchmarks.
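
The claim in (i) that learning rate, scaling factor, and initialization all act on update magnitude can be checked numerically for the simplest case. The sketch below assumes the standard LoRA parameterization W = W0 + s·B·A with the "Noise & Zeros" start (B = 0, A random) and one step of plain SGD. These are assumptions for illustration and not the paper's general setting, and `first_step_delta_w` is a hypothetical helper. Under these assumptions the first weight update is ΔW = -η·s²·G·AᵀA, where G is the gradient with respect to W.

```python
import numpy as np


def first_step_delta_w(grad_w, a0, lr, scale):
    """Weight update s * B1 @ A1 after one plain SGD step starting from B0 = 0."""
    # With B0 = 0, dL/dA = scale * B0^T @ G = 0, so A stays at a0.
    # dL/dB = scale * G @ a0^T, so B1 = -lr * scale * G @ a0^T.
    b1 = -lr * scale * (grad_w @ a0.T)
    return scale * (b1 @ a0)


rng = np.random.default_rng(0)
d_out, d_in, rank = 32, 48, 4
grad_w = rng.normal(size=(d_out, d_in))        # stand-in gradient w.r.t. W
a0 = rng.normal(scale=0.1, size=(rank, d_in))  # "Noise" factor


def norm(a, lr, s):
    return np.linalg.norm(first_step_delta_w(grad_w, a, lr, s))


base = norm(a0, 1e-3, 2.0)
ratio_lr = norm(a0, 2e-3, 2.0) / base        # learning rate doubled
ratio_scale = norm(a0, 1e-3, 4.0) / base     # scaling factor doubled
ratio_init = norm(2.0 * a0, 1e-3, 2.0) / base  # initialization magnitude doubled
```

In this setting the update norm grows linearly with the learning rate and quadratically with both the scaling factor and the initialization magnitude, so the three knobs are interchangeable ways of setting the same quantity. Adaptive optimizers such as Adam would change these exponents.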
