
Task-Specific Directions: Definition, Exploration, and Utilization in Parameter Efficient Fine-Tuning

Chongjie Si, Zhiyi Shi, Shifan Zhang, Xiaokang Yang, Hanspeter Pfister, Wei Shen

TL;DR

The paper addresses the cost of fully fine-tuning large language models by refining Parameter Efficient Fine-Tuning (PEFT) through a formal notion of task-specific directions (TSDs). It introduces a framework for defining TSDs, analyzes their properties, and develops three methods (LoRA-Dash, LoRA-Init, and LoRA-TSD) that exploit TSDs during LoRA-based fine-tuning. Across commonsense reasoning, natural language understanding, and subject-driven generation, the proposed methods consistently improve performance, often surpassing full fine-tuning under limited parameter budgets and transferring well to other PEFT approaches. The result is a principled, task-aware way to use directional changes in weight space for efficient adaptation of LLMs.

Abstract

Large language models demonstrate impressive performance on downstream tasks, yet they require extensive resource consumption when fully fine-tuning all parameters. To mitigate this, Parameter Efficient Fine-Tuning (PEFT) strategies, such as LoRA, have been developed. In this paper, we delve into the concept of task-specific directions (TSDs), which are critical for transitioning large models from pretrained states to task-specific enhancements in PEFT. We propose a framework to clearly define these directions and explore their properties and practical utilization challenges. We then introduce a novel approach, LoRA-Dash, which aims to maximize the impact of TSDs during the fine-tuning process, thereby enhancing model performance on targeted tasks. Additionally, based on our exploration of TSD, we focus on an important issue in PEFT: the initialization of LoRA. While some works have pointed out the significance of initialization for LoRA's performance and proposed various strategies, these methods are often empirical and not task-specific. To address this issue, we propose LoRA-Init. Starting from TSD, we identify the directions that require the most adjustment during fine-tuning for downstream tasks. By initializing the matrices in LoRA with these directions, LoRA-Init significantly enhances LoRA's performance. Moreover, we can combine LoRA-Dash and LoRA-Init to create the final version of LoRA based on TSDs, which we refer to as LoRA-TSD. Extensive experiments have conclusively demonstrated the effectiveness of these methods, and in-depth analyses further reveal the underlying mechanisms behind their success.
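
For readers unfamiliar with the LoRA baseline that the abstract builds on, the following is a minimal NumPy sketch of a standard LoRA layer, not of the paper's methods. The function name `lora_forward`, the scaling factor `alpha`, and the toy shapes are illustrative assumptions. It uses the common default initialization (random `A`, zero `B`), under which the low-rank update is exactly zero at the start of training; this is the generic, task-agnostic starting point that a task-specific initialization would replace.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16.0):
    """Apply a frozen weight W plus a low-rank update (alpha / r) * B @ A."""
    r = A.shape[0]                      # LoRA rank
    delta_W = (alpha / r) * (B @ A)     # low-rank weight change, shape of W
    return x @ (W + delta_W).T

# Toy shapes (assumed for illustration only).
rng = np.random.default_rng(0)
d_out, d_in, r = 6, 8, 2

W = rng.normal(size=(d_out, d_in))           # frozen pretrained weight
A = rng.normal(scale=0.01, size=(r, d_in))   # trainable, random init
B = np.zeros((d_out, r))                     # trainable, zero init

x = rng.normal(size=(3, d_in))
y0 = lora_forward(x, W, A, B)

# Trainable parameters in the adapter versus the full matrix.
lora_params = A.size + B.size
full_params = W.size
```

Because `B` starts at zero, `y0` equals the output of the frozen layer, and only `A` and `B` (here 28 values instead of 48) would be updated during fine-tuning.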

Paper Structure

This paper contains 38 sections, 10 equations, 9 figures, and 12 tables.

Figures (9)

  • Figure 1: Left: The change rates of $\mathbf{W}$’s core bases, based on $\mathbf{W}^*$, vary significantly across each basis, and the bases with higher change rates tend to be concentrated towards the end. Middle: The change rates for the directions corresponding to the smallest 10 singular values of $\mathbf{W}$. The directions associated with the smallest singular values do not always exhibit the highest change rates. Right: After sorting the change rates from highest to lowest, it is evident that only a few directions have significant change rates, while most exhibit very low change rates. The weights are taken from the 16th layer of LLaMA-7B, and the change rates are scaled (see the supplementary section on LLaMA fine-tuning details).
  • Figure 2: The results when fine-tuning Qwen2.5-7B on a math reasoning task are similar to those in Fig. 1.
  • Figure 3: We track the precision and recall of predicted directions every 100 training steps in the query, key, and value layers of LLaMA-7B during LoRA fine-tuning, analyzing how well the continuously updated $\Delta\mathbf{W}$ captures TSDs. Left: We compute the average precision/recall across all query, key, and value layers, showing the model’s ability to retain task-specific knowledge at each training step. Across various rank settings of LoRA, these precision/recall rates consistently exceed 0.75/0.70, indicating that $\Delta\mathbf{W}$ reliably captures and integrates TSD information. Right: For a rank setting of $r=32$, we compute the average precision/recall across all steps for each query, key, and value layer, revealing their sensitivity to TSDs. The majority of layers maintain an average precision/recall above 0.75/0.70, showing their robustness in capturing TSD information.
  • Figure 4: Frameworks of LoRA-TSD, LoRA-Dash and LoRA-Init.
  • Figure 5: Comparison of images generated by LoRA and LoRA-TSD on the subject-driven generation task. Our method consistently aligns more closely with the subjects in the input images and adheres better to the given prompts than LoRA.
  • ...and 4 more figures
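
The quantities in the captions of Figures 1 and 3 can be illustrated with a small NumPy sketch. It rests on assumptions rather than the paper's exact definitions: the core bases of $\mathbf{W}$ are taken to be the rank-one matrices $\mathbf{u}_i\mathbf{v}_i^\top$ from its SVD, and the change rate of basis $i$ is taken to be $|\mathbf{u}_i^\top\mathbf{W}^*\mathbf{v}_i - \sigma_i| / \sigma_i$. The function names, the `eps` stabilizer, and the toy matrices are hypothetical, and no scaling of the change rates is applied.

```python
import numpy as np

def change_rates(W, W_star, eps=1e-8):
    """Assumed change rate of each core basis u_i v_i^T of W, measured against W_star."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    # Coordinate of W_star on each core basis: u_i^T W_star v_i
    proj = np.einsum("ji,jk,ik->i", U, W_star, Vt)
    return np.abs(proj - S) / (S + eps)

def top_k_directions(rates, k):
    """Indices of the k directions with the highest change rates."""
    return np.argsort(-rates)[:k]

def precision_recall(predicted, true):
    """Set-based precision and recall of predicted direction indices."""
    predicted, true = set(predicted), set(true)
    hits = len(predicted & true)
    return hits / len(predicted), hits / len(true)

# Toy example: a random "pretrained" weight and a hypothetical "fine-tuned"
# weight that changes only along the last core basis (index 4).
rng = np.random.default_rng(0)
W = rng.normal(size=(6, 5))
U, S, Vt = np.linalg.svd(W, full_matrices=False)
W_star = W + 2.0 * S[4] * np.outer(U[:, 4], Vt[4])

rates = change_rates(W, W_star)
true_dirs = top_k_directions(rates, k=1)

# Suppose a partially trained model predicts directions 4 and 1.
precision, recall = precision_recall([4, 1], true_dirs.tolist())
```

In this toy setup only one direction has a non-negligible change rate, loosely mirroring the right panel of Figure 1. The precision/recall helper shows one way to score predicted directions against reference ones, as tracked in Figure 3; how the paper derives the predicted set from $\Delta\mathbf{W}$ is not specified here.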

Theorems & Definitions (4)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4