Dual Decomposition of Weights and Singular Value Low Rank Adaptation

Jialong Han, Si Zhang, Ke Zhang

TL;DR

DuDe presents a principled PEFT method that couples dual weight decomposition with SVD-based initialization to address training instability and inefficient knowledge transfer in LoRA-like adaptations. By splitting weights into frozen and trainable components and initializing the update with top-SVD factors, DuDe achieves stable gradients and better knowledge preservation, yielding superior performance across commonsense reasoning and domain-knowledge benchmarks, as well as improved robustness to seeds and rank settings. Theoretical gradient analyses and extensive experiments substantiate the approach, showing strong results such as up to 48.35% average MMLU accuracy and 62.53% ±1.59 GSM8K accuracy, highlighting practical impact for efficient and reliable LLM adaptation. Limitations include added initialization overhead and higher memory usage, with future work aiming to extend to more architectures and multi-task scenarios while deepening the theoretical understanding of initialization strategies.

Abstract

Parameter-Efficient Fine-Tuning (PEFT) has emerged as a critical paradigm for adapting Large Language Models (LLMs) to downstream tasks, among which Low-Rank Adaptation (LoRA) represents one of the most widely adopted methodologies. However, existing LoRA-based approaches exhibit two fundamental limitations: unstable training dynamics and inefficient knowledge transfer from pre-trained models, both stemming from random initialization of adapter parameters. To overcome these challenges, we propose DuDe, a novel approach that decomposes weight matrices into magnitude and direction components, employing Singular Value Decomposition (SVD) for principled initialization. Our comprehensive evaluation demonstrates DuDe's superior performance and robustness, achieving up to 48.35% accuracy on MMLU and 62.53% (±1.59) accuracy on GSM8K. Our theoretical analysis and empirical validation collectively demonstrate that DuDe's decomposition strategy enhances optimization stability and better preserves pre-trained representations, particularly for domain-specific tasks requiring specialized knowledge. The combination of robust empirical performance and rigorous theoretical foundations establishes DuDe as a significant contribution to PEFT methodologies for LLMs.

Paper Structure

This paper contains 23 sections, 11 equations, 4 figures, and 7 tables.

Figures (4)

  • Figure 1: Blue parts are frozen components and orange parts are trainable components. (a) LoRA and PiSSA. LoRA initializes $B\in\mathbb{R}^{d\times r}$ to zero and draws $A\in\mathbb{R}^{r\times d}$ from a Kaiming uniform distribution, whereas PiSSA first computes the SVD $W_0=U\Sigma V^\top$, sets $B=U_r\sqrt{\Sigma_r}$ and $A=\sqrt{\Sigma_r}V_r^\top$ (with $U_r$, $\Sigma_r$, $V_r$ the top-$r$ singular vectors and values), and then replaces the frozen weight with the residual $W_0-BA$. (b) DoRA and DuDe, where $m\in\mathbb{R}^{k}$ is the magnitude vector. For the direction matrix, DoRA initializes $B$ and $A$ as in LoRA, while DuDe initializes them as in PiSSA.
  • Figure 2: Comparison of full fine-tuning, DuDe, and other PEFT methods on the Mistral 7B v0.2 model: (a) training loss, (b) gradient norm during training on the MetaMathQA-395K dataset for 3 epochs, and (c) evaluation accuracy on the GSM8K dataset, measured every 200 steps over 3000 total training steps.
  • Figure 3: Performance comparison between LoRA and DuDe on MMLU tasks with varying rank settings. (a) Average accuracy across all MMLU categories shows DuDe consistently outperforming LoRA, especially at larger ranks. (b) Weighted average accuracy demonstrates similar trends, with DuDe maintaining superior performance across all rank configurations.
  • Figure 4: Average accuracy of DuDe and LoRA on MMLU tasks with different seeds.
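
The initialization described in the Figure 1 caption can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation. The function names (`pissa_init`, `dude_like_init`, `dude_like_weight`) are hypothetical, the weight is allowed a general shape $d\times k$, and the magnitude vector $m$ is assumed to start at the column-wise norm of $W_0$ and to rescale the column-normalized direction matrix, following the DoRA convention. The caption itself only states that $m\in\mathbb{R}^{k}$ and that $B$ and $A$ are initialized as in PiSSA.

```python
import numpy as np

def pissa_init(W0, r):
    """PiSSA-style split: rank-r factors from the top-r SVD plus a frozen residual."""
    U, S, Vt = np.linalg.svd(W0, full_matrices=False)
    sqrt_S = np.sqrt(S[:r])
    B = U[:, :r] * sqrt_S            # B = U_r sqrt(Sigma_r), shape (d, r)
    A = sqrt_S[:, None] * Vt[:r, :]  # A = sqrt(Sigma_r) V_r^T, shape (r, k)
    W_res = W0 - B @ A               # frozen residual that replaces W0
    return W_res, B, A

def dude_like_init(W0, r):
    """Direction factors as in PiSSA; magnitude as the column norm of W0.

    The magnitude initialization is an assumption borrowed from DoRA.
    """
    W_res, B, A = pissa_init(W0, r)
    m = np.linalg.norm(W0, axis=0)   # one magnitude per column, shape (k,)
    return W_res, B, A, m

def dude_like_weight(W_res, B, A, m):
    """Recompose the weight: magnitude times the column-normalized direction."""
    V = W_res + B @ A
    return m * (V / np.linalg.norm(V, axis=0, keepdims=True))

# Usage: at initialization the recomposed weight should match W0,
# so the adapted model starts from the pre-trained function.
rng = np.random.default_rng(0)
W0 = rng.standard_normal((8, 6))
W_res, B, A, m = dude_like_init(W0, r=2)
print(np.allclose(dude_like_weight(W_res, B, A, m), W0))
```

Under these assumptions only $m$, $B$, and $A$ would be trained while `W_res` stays frozen. Unlike LoRA's zero-initialized $B$, the trainable factors start from the principal singular directions of $W_0$.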