Optimizing Fine-Tuning through Advanced Initialization Strategies for Low-Rank Adaptation

Yongfu Xue

TL;DR

The paper addresses the initialization bottleneck in LoRA-based parameter-efficient fine-tuning by introducing IniLoRA, which optimizes a low-rank decomposition BA to approximate the original weight matrix W0 and fixes the residual during training. It further explores two initialization variants, IniLoRA-α and IniLoRA-β, to expand the initialization design space. Through weight-approximation experiments and extensive NLU/NLG benchmarks, IniLoRA demonstrates consistent improvements over LoRA and other PEFT methods, with the α and β variants often yielding the best results. The work provides practical insights into how initialization strategy and weight-approximation quality affect convergence, scalability, and robustness in PEFT for large language models.

Abstract

The rapid development of parameter-efficient fine-tuning methods has noticeably improved the efficiency of adapting large language models. Among these, LoRA has gained widespread popularity due to its strong balance of effectiveness and parameter efficiency. However, LoRA relies on initializing two low-rank matrices whose product is zero, which limits its ability to effectively activate and leverage the original model weights, creating a potential bottleneck for optimal performance. To address this limitation, we propose IniLoRA, a novel initialization strategy that initializes the low-rank matrices to closely approximate the original model weights. Experimental results indicate that IniLoRA achieves better performance than LoRA across a range of models and tasks. Additionally, we introduce two variants, IniLoRA-α and IniLoRA-β, both leveraging distinct initialization methods to enhance performance further.
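As a rough illustration of the idea summarized above, the sketch below fits a low-rank product BA to a weight matrix W0 and keeps the residual W0 - BA as a fixed term, so that residual + BA reproduces W0 at the start of fine-tuning. It is not the authors' implementation: the function names (`fit_low_rank`, `inilora_style_init`), the use of plain gradient descent on a squared Frobenius loss, the random starting scale, and the step count and learning rate are all assumptions chosen for a small toy matrix.

```python
import numpy as np


def fit_low_rank(W0, rank, steps=2000, lr=0.05, seed=0):
    """Fit B @ A to W0 by gradient descent on 0.5 * ||B @ A - W0||_F^2.

    Hypothetical helper: the optimizer and hyperparameters are assumptions.
    """
    rng = np.random.default_rng(seed)
    d_out, d_in = W0.shape
    # Small random start for both factors (an assumed choice, not from the paper).
    B = rng.normal(0.0, 0.1, size=(d_out, rank))
    A = rng.normal(0.0, 0.1, size=(rank, d_in))
    for _ in range(steps):
        err = B @ A - W0      # current approximation error
        grad_B = err @ A.T    # gradient of the loss with respect to B
        grad_A = B.T @ err    # gradient of the loss with respect to A
        B -= lr * grad_B
        A -= lr * grad_A
    return B, A


def inilora_style_init(W0, rank):
    """Return (residual, B, A) such that residual + B @ A equals W0."""
    B, A = fit_low_rank(W0, rank)
    residual = W0 - B @ A     # kept fixed during fine-tuning; only B and A would train
    return residual, B, A


# Toy example: a 16x12 "pretrained" weight matrix approximated at rank 4.
rng = np.random.default_rng(42)
W0 = 0.1 * rng.normal(size=(16, 12))
residual, B, A = inilora_style_init(W0, rank=4)
print("relative residual norm:", np.linalg.norm(residual) / np.linalg.norm(W0))
```

For contrast, standard LoRA starts with one factor at zero so that BA = 0 and the frozen term is W0 itself. In this sketch the frozen term is the residual, and the trainable factors already carry a low-rank approximation of the original weights.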

Paper Structure

This paper contains 18 sections, 4 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: Our IniLoRA method and its two variants extend the LoRA framework by introducing an advanced initialization for the low-rank matrices $A$ and $B$.
  • Figure 2: Correlation between performance and the degree of weight approximation in IniLoRA fine-tuning.
  • Figure 3: Impact of increasing Gaussian initialization standard deviation on model performance for LLaMA2-7B.
  • Figure 4: Performance comparison of different weight initialization methods, highlighting the consistent superior performance of Kaiming initialization.
  • Figure 5: Comparison of training loss between IniLoRA and LoRA at various ranks, demonstrating faster convergence and lower final loss for IniLoRA.
  • ...and 1 more figure