OSoRA: Output-Dimension and Singular-Value Initialized Low-Rank Adaptation

Jialong Han, Si Zhang, Ke Zhang

TL;DR

OSoRA introduces an SVD-based low-rank adaptation for fine-tuning large language models that updates only the top-$r$ singular values and an output-dimension vector while freezing the corresponding singular vectors. With a trainable parameter budget of $(r+d)$ per adapted weight matrix, it matches VeRA in parameter efficiency and needs far fewer trainable parameters than LoRA as the rank grows, while benefiting from an informed initialization taken from the pretrained weights. Empirical results on common sense reasoning and mathematics tasks show that OSoRA attains comparable or superior performance to state-of-the-art PEFT methods, and ablation studies confirm that jointly training $S_r$ and $O$ is crucial for optimal adaptation. The adapter merges into the base weights, so it adds no inference cost and offers a practical path to fine-tuning very large models under limited compute. Its limitations are the upfront SVD computation and the fixed subspace spanned by the frozen singular vectors.

Abstract

Fine-tuning Large Language Models (LLMs) has become increasingly challenging due to their massive scale and associated computational costs. Parameter-Efficient Fine-Tuning (PEFT) methodologies have been proposed as cheaper alternatives; however, their implementations still require significant resources. In this paper, we present OSoRA (Output-Dimension and Singular-Value Initialized Low-Rank Adaptation), a novel PEFT method for LLMs. OSoRA extends Low-Rank Adaptation (LoRA) by integrating Singular Value Decomposition (SVD) with learnable scaling vectors in a unified framework. It first performs an SVD of pre-trained weight matrices, then optimizes the top singular values and an output-dimension vector during training, while keeping the corresponding singular vector matrices frozen. OSoRA substantially reduces computational resource requirements by minimizing the number of trainable parameters during fine-tuning. Comprehensive evaluations across mathematical reasoning, common sense reasoning, and other benchmarks demonstrate that OSoRA achieves comparable or superior performance to state-of-the-art methods like LoRA and VeRA, while maintaining linear parameter scaling even as the rank increases to higher dimensions. Our ablation studies further confirm that jointly training both the singular values and the output-dimension vector is critical for optimal performance.
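
The $(r+d)$ budget quoted in the TL;DR can be made concrete with a short calculation. The sketch below assumes a single adapted weight matrix $W_0\in\mathbb{R}^{d\times k}$: LoRA trains $A\in\mathbb{R}^{r\times k}$ and $B\in\mathbb{R}^{d\times r}$, giving $r(d+k)$ trainable parameters, while OSoRA trains $S_r\in\mathbb{R}^{r}$ and $O\in\mathbb{R}^{d}$, giving $r+d$. The function names and the dimension 4096 are illustrative assumptions, not values taken from the paper.

```python
def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Trainable parameters for one LoRA-adapted d x k matrix: A (r x k) plus B (d x r)."""
    return r * (d + k)


def osora_trainable_params(d: int, r: int) -> int:
    """Trainable parameters for one OSoRA-adapted matrix: S_r (r) plus O (d)."""
    return r + d


if __name__ == "__main__":
    d = k = 4096  # hypothetical layer size, chosen only for illustration
    for r in (8, 64, 512):
        print(r, lora_trainable_params(d, k, r), osora_trainable_params(d, r))
```

Under these assumptions each extra unit of rank costs LoRA $d+k$ parameters but costs OSoRA a single parameter, which is why the gap widens quickly as $r$ increases.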

Paper Structure

This paper contains 26 sections, 12 equations, 3 figures, 8 tables.

Figures (3)

  • Figure 1: Parameter count comparison among adaptation methods at varying ranks on the Qwen2-7B model. LoRA's trainable parameter count grows steeply with rank, in proportion to $r(d+k)$ per adapted matrix, whereas both VeRA and OSoRA need only $r+d$ parameters per adapted matrix, so their counts grow far more slowly as the rank increases.
  • Figure 2: Schematic comparison of LoRA (left), VeRA (middle) and OSoRA (right). LoRA adapts pretrained weights $W_0\in\mathbb{R}^{d\times k}$ by training low-rank matrices $A\in\mathbb{R}^{r\times k}$ and $B\in\mathbb{R}^{d\times r}$. VeRA keeps these matrices frozen but introduces learnable scaling vectors $d\in\mathbb{R}^{r}$ and $b\in\mathbb{R}^{d}$. OSoRA applies SVD to decompose $W_0$ into singular vectors $U_r\in\mathbb{R}^{d\times r}$ and $V_r\in\mathbb{R}^{k\times r}$ with corresponding singular values $S_r\in\mathbb{R}^{r}$. During fine-tuning, only $S_r$ and a learnable vector $O\in\mathbb{R}^{d}$, initialized to all ones, are updated, while the singular vector matrices remain fixed.
  • Figure 3: Ablation study on the impact of training different components in OSoRA. The figure compares accuracy on mathematical tasks (MATH and GSM8K) across three variants: standard OSoRA with both $S_r$ and $O$ trained, OSoRA$^*$ with only $O$ trained ($S_r$ fixed), and OSoRA$^{**}$ with only $S_r$ trained ($O$ fixed). Joint training of both components achieves the best performance, while fixing the output-dimension vector $O$ causes the largest drop in accuracy.
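
The OSoRA construction described in the Figure 2 caption can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the captions do not spell out how the adapted term is combined with $W_0$, so the sketch assumes a residual form $W = W_{\text{res}} + \mathrm{diag}(O)\,U_r\,\mathrm{diag}(S_r)\,V_r^{\top}$ with $W_{\text{res}} = W_0 - U_r\,\mathrm{diag}(S_r)\,V_r^{\top}$ computed once at initialization, so that the layer reproduces $W_0$ exactly before training. All function and key names are hypothetical.

```python
import numpy as np


def osora_init(W0: np.ndarray, r: int) -> dict:
    """Split W0 (d x k) into a frozen residual plus a rank-r SVD part (assumed residual form)."""
    U, S, Vt = np.linalg.svd(W0, full_matrices=False)
    U_r, S_r, Vt_r = U[:, :r], S[:r].copy(), Vt[:r, :]
    W_res = W0 - (U_r * S_r) @ Vt_r       # frozen remainder of W0
    O = np.ones(W0.shape[0])              # output-dimension vector, initialized to all ones
    # Only S_r and O would be trained; W_res, U_r and Vt_r stay frozen.
    return {"W_res": W_res, "U_r": U_r, "Vt_r": Vt_r, "S_r": S_r, "O": O}


def osora_merged_weight(p: dict) -> np.ndarray:
    """Merge the adapter into one d x k matrix, so inference needs no extra operations."""
    low_rank = (p["U_r"] * p["S_r"]) @ p["Vt_r"]   # U_r diag(S_r) V_r^T
    return p["W_res"] + p["O"][:, None] * low_rank  # scale each output row by O


def osora_forward(p: dict, x: np.ndarray) -> np.ndarray:
    """Unmerged forward pass for x of shape (batch, k); returns (batch, d)."""
    h = (x @ p["Vt_r"].T) * p["S_r"]      # project onto V_r, scale by singular values
    h = (h @ p["U_r"].T) * p["O"]         # map back with U_r, scale output dimensions
    return x @ p["W_res"].T + h


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W0 = rng.standard_normal((6, 4))
    p = osora_init(W0, r=2)
    print(np.allclose(osora_merged_weight(p), W0))  # adapter is a no-op at initialization
    print(p["S_r"].size + p["O"].size)              # trainable parameters: r + d
```

Because the trainable quantities only rescale fixed directions, the merged matrix has the same shape as $W_0$ and can replace it after training, which matches the claim that the method adds no inference cost.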