$D^2LoRA$: Data-Driven LoRA Initialization for Low Resource Tasks

Javad SeraJ, Mohammad Mahdi Mohajeri, Mohammad Javad Dousti

TL;DR

This work tackles data-scarce fine-tuning of large language models by introducing D2LoRA, a two-phase LoRA initialization that first warms up the LoRA matrices on high-quality general data and then adapts them to the target task. The fine-tuned weights are $W_{\text{final}} = W_{\text{base}} + \Delta W_{\text{LoRA}}$, where $\Delta W_{\text{LoRA}} = \mathbf{A}\mathbf{B}$, and the analysis shows that $\text{Perf}(W_{\text{D2LoRA}}(m,n), t)$ exceeds the performance of vanilla LoRA in low-data regimes, with the gap shrinking as in-domain data increases. Empirically, D2LoRA yields a 1% improvement on GSM8K and about a 2-point ROUGE gain for title generation while reducing training cost and data demands, and it maintains competitive performance without increasing catastrophic forgetting. The approach supports efficient multi-task adaptation under limited data, offering a practical path to scale PEFT methods across diverse tasks with reduced data requirements.
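The two-phase idea can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation: it assumes a frozen linear model $y = xW_{\text{base}}$, a rank-$r$ adapter with $W_{\text{final}} = W_{\text{base}} + \mathbf{A}\mathbf{B}$, plain full-batch gradient descent on mean squared error, and synthetic "general" and "task" datasets whose targets share most of their shift by construction. All function names, shapes, and hyperparameters are hypothetical. Phase 1 fits A and B on the larger general set; phase 2 continues from those matrices on a 16-sample task set, and a vanilla adapter (A random, B zero) trained on the same task data for the same number of steps serves as the baseline.

```python
import random

Matrix = list[list[float]]


def zeros(rows: int, cols: int) -> Matrix:
    return [[0.0] * cols for _ in range(rows)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Naive matrix product; adequate for the tiny shapes used here.
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def add(a: Matrix, b: Matrix, scale: float = 1.0) -> Matrix:
    # Element-wise a + scale * b.
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def merged_weight(w_base: Matrix, a: Matrix, b: Matrix) -> Matrix:
    # W_final = W_base + A B, with A of shape d x r and B of shape r x k.
    return add(w_base, matmul(a, b))


def mse(w: Matrix, xs: Matrix, ys: Matrix) -> float:
    # Half mean squared error of the linear model y = x W.
    preds = matmul(xs, w)
    total = sum((p - t) ** 2 for rp, rt in zip(preds, ys) for p, t in zip(rp, rt))
    return total / (2 * len(xs))


def init_lora(d: int, k: int, r: int, rng: random.Random) -> tuple[Matrix, Matrix]:
    # Vanilla-style init: A small random, B zero, so A B = 0 and W_final = W_base at start.
    a = [[rng.gauss(0.0, 0.5) for _ in range(r)] for _ in range(d)]
    return a, zeros(r, k)


def train_adapter(w_base, a, b, xs, ys, lr=0.05, steps=400):
    # Full-batch gradient descent on A and B only; W_base stays frozen.
    n = len(xs)
    for _ in range(steps):
        err = add(matmul(xs, merged_weight(w_base, a, b)), ys, scale=-1.0)  # X W - Y
        grad_w = [[g / n for g in row] for row in matmul(transpose(xs), err)]
        grad_a = matmul(grad_w, transpose(b))  # dL/dA = dL/dW B^T
        grad_b = matmul(transpose(a), grad_w)  # dL/dB = A^T dL/dW
        a, b = add(a, grad_a, -lr), add(b, grad_b, -lr)
    return a, b


def make_data(rng: random.Random, w_true: Matrix, n: int) -> tuple[Matrix, Matrix]:
    xs = [[rng.gauss(0.0, 1.0) for _ in range(len(w_true))] for _ in range(n)]
    return xs, matmul(xs, w_true)


rng = random.Random(0)
d, k, r = 4, 2, 2
w_base = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(d)]
shared = [[rng.gauss(0.0, 0.5) for _ in range(k)] for _ in range(d)]     # shift both datasets share
task_only = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(d)]  # small task-specific shift
general_x, general_y = make_data(rng, add(w_base, shared), n=200)
task_x, task_y = make_data(rng, add(add(w_base, shared), task_only), n=16)

# Phase 1 (warm-up): fit the adapter on the larger "general" set.
a_warm, b_warm = train_adapter(w_base, *init_lora(d, k, r, rng), general_x, general_y)
# Phase 2: task adaptation starts from the warm-up adapter.
a_d2, b_d2 = train_adapter(w_base, a_warm, b_warm, task_x, task_y, steps=20)
# Baseline: vanilla init, same small task set, same step budget.
a_v, b_v = train_adapter(w_base, *init_lora(d, k, r, rng), task_x, task_y, steps=20)

print("warm-start task loss:", mse(merged_weight(w_base, a_d2, b_d2), task_x, task_y))
print("vanilla task loss:   ", mse(merged_weight(w_base, a_v, b_v), task_x, task_y))
```

In this toy, the warm-started adapter begins phase 2 close to the task solution only because the general and task targets overlap by construction; the sketch says nothing about how large the gain is for real LLMs.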

Abstract

Tuning large language models is essential for optimizing their performance across diverse applications, and it is especially challenging when data is scarce, because LoRA converges more slowly than full fine-tuning. In this paper, we analyze post-training methods, including Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Odds Ratio Preference Optimization (ORPO), for task-specific learning with LoRA. We then introduce $D^2LoRA$, a data-driven approach for initializing the LoRA matrices that improves training efficiency, especially in limited-data settings. Our experiments compare $D^2LoRA$ with vanilla LoRA in terms of performance and catastrophic forgetting under extremely data-constrained conditions. The results show that $D^2LoRA$ achieves a 1% improvement on the GSM8K benchmark and a 2-point improvement in ROUGE score on title generation. $D^2LoRA$ facilitates the adaptation of LLMs to multiple tasks even when task-specific data is scarce, thereby reducing both training expenses and data costs.
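The abstract compares SFT, DPO, and ORPO without spelling out their objectives. As a rough reference, the sketch below computes per-example losses following the commonly published formulations of DPO and ORPO, taking pre-computed log-probabilities as plain floats. The function names and the default `beta` and `lam` values are illustrative assumptions, not the paper's training configuration.

```python
import math


def neg_log_sigmoid(z: float) -> float:
    # -log(sigmoid(z)) computed without overflow for large |z|.
    return math.log1p(math.exp(-z)) if z >= 0 else -z + math.log1p(math.exp(z))


def dpo_loss(pi_chosen: float, pi_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    # Inputs are summed sequence log-probs under the policy (pi_*) and a frozen reference (ref_*).
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return neg_log_sigmoid(beta * margin)


def log_odds(mean_token_logp: float) -> float:
    # log(p / (1 - p)) with p = exp(mean token log-prob); requires mean_token_logp < 0.
    return mean_token_logp - math.log1p(-math.exp(mean_token_logp))


def orpo_loss(chosen_token_logps: list[float], rejected_token_logps: list[float],
              lam: float = 0.1) -> float:
    # ORPO = SFT negative log-likelihood on the chosen answer + lam * odds-ratio penalty.
    mean_c = sum(chosen_token_logps) / len(chosen_token_logps)
    mean_r = sum(rejected_token_logps) / len(rejected_token_logps)
    sft = -mean_c
    odds_ratio_term = neg_log_sigmoid(log_odds(mean_c) - log_odds(mean_r))
    return sft + lam * odds_ratio_term
```

DPO needs log-probabilities from a frozen reference model, whereas ORPO folds the preference signal into the SFT loss without one, which is one practical difference between the two methods.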

Paper Structure

This paper contains 20 sections, 3 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Performance comparison of three training methods (SFT, DPO, ORPO) on the GSM8K benchmark.
  • Figure 2: Comparison of D2LoRA and vanilla LoRA across different training methods (SFT, DPO, ORPO) on the GSM8K benchmark.
  • Figure 3: Impact of task-specific training on general knowledge and reasoning accuracy on the ARC Benchmark. As task-specific training data increases, the original model's general reasoning ability declines.
  • Figure 4: Comparison of accuracy between vanilla LoRA and D2LoRA on the GSM8K benchmark in a data-constrained setting, with training on 100, 200, 500, and 1000 samples.