OSoRA: Output-Dimension and Singular-Value Initialized Low-Rank Adaptation
Jialong Han, Si Zhang, Ke Zhang
TL;DR
OSoRA introduces an SVD-based low-rank adaptation method for fine-tuning large language models: it updates only the top-$r$ singular values and an output-dimension vector while freezing the corresponding singular vectors. With a trainable parameter budget of $(r+d)$, its parameter efficiency is competitive with VeRA and its parameter count scales far more favorably with rank than LoRA's, while its initialization is informed by the pretrained weights. Empirical results on common sense reasoning and mathematics tasks show that OSoRA attains comparable or superior performance to state-of-the-art PEFT methods, and ablation studies confirm that jointly training $S_r$ and $O$ is crucial for optimal adaptation. The method merges into the base weights without additional inference cost, offering a practical path to fine-tuning very large models under limited compute. Its limitations are the up-front SVD computation and the fixed singular-vector subspace.
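To make the $(r+d)$ budget concrete, the following minimal sketch compares per-matrix trainable parameter counts. The LoRA count uses the standard two-factor formula $r(d_{in} + d_{out})$. The function names and the 4096-dimensional layer are hypothetical, chosen only for illustration, and $d$ is assumed here to be the output dimension.

```python
def lora_trainable_params(d_out: int, d_in: int, r: int) -> int:
    """LoRA trains two factors, B (d_out x r) and A (r x d_in)."""
    return r * (d_in + d_out)


def osora_trainable_params(d_out: int, r: int) -> int:
    """OSoRA budget as stated in the text: r singular values plus a
    length-d vector (d assumed to be the output dimension)."""
    return r + d_out


if __name__ == "__main__":
    d_out = d_in = 4096  # hypothetical layer size, not taken from the paper
    for r in (16, 64, 256):
        lora = lora_trainable_params(d_out, d_in, r)
        osora = osora_trainable_params(d_out, r)
        print(f"r={r:>3}  LoRA={lora:>9,}  OSoRA={osora:>6,}")
```

Under these assumptions the LoRA count is multiplied by the layer width at every increase in rank, whereas the OSoRA count grows by one per added rank.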
Abstract
Fine-tuning Large Language Models (LLMs) has become increasingly challenging due to their massive scale and associated computational costs. Parameter-Efficient Fine-Tuning (PEFT) methods have been proposed as less costly alternatives; however, they still require significant resources. In this paper, we present OSoRA (Output-Dimension and Singular-Value Initialized Low-Rank Adaptation), a novel PEFT method for LLMs. OSoRA extends Low-Rank Adaptation (LoRA) by integrating Singular Value Decomposition (SVD) with learnable scaling vectors in a unified framework. It first performs an SVD of the pre-trained weight matrices, then optimizes the singular values and an output-dimension vector during training, while keeping the corresponding singular vector matrices frozen. OSoRA substantially reduces computational resource requirements by minimizing the number of trainable parameters during fine-tuning. Comprehensive evaluations across mathematical reasoning, common sense reasoning, and other benchmarks demonstrate that OSoRA achieves comparable or superior performance to state-of-the-art methods like LoRA and VeRA, while its trainable parameter count grows only linearly with the rank, even at high ranks. Our ablation studies further confirm that jointly training both the singular values and the output-dimension vector is critical for optimal performance.
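The decomposition described above can be written out as follows. This is a sketch under stated assumptions rather than the paper's exact formulation: the residual term $W_{\mathrm{res}}$, the symbol $k$ for the input dimension, and the choice to apply $O$ as a row-wise scaling $\operatorname{diag}(O)$ are assumptions made here for illustration.

$$
W = U S V^{\top}, \qquad W_{\mathrm{res}} = W - U_r \operatorname{diag}(S_r) V_r^{\top}
$$

$$
W' = W_{\mathrm{res}} + \operatorname{diag}(O)\, U_r \operatorname{diag}(S_r)\, V_r^{\top}, \qquad U_r \in \mathbb{R}^{d \times r},\; V_r \in \mathbb{R}^{k \times r},\; S_r \in \mathbb{R}^{r},\; O \in \mathbb{R}^{d}
$$

Under this reading, only $S_r$ and $O$ receive gradients, which gives $r + d$ trainable values per adapted matrix, and $W'$ is an ordinary $d \times k$ matrix, so it can replace $W$ after training without extra inference cost.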
