TASO: Task-Aligned Sparse Optimization for Parameter-Efficient Model Adaptation

Daiye Miao, Yufang Liu, Jie Wang, Changzhi Sun, Yunke Zhang, Demei Yan, Shaokang Dong, Qi Zhang, Yuanbin Wu

TL;DR

TASO tackles the redundancy inherent in LoRA-based parameter-efficient fine-tuning by identifying task-specific core regions via gradient-based parameter importance, then imposing a sparse, task-aligned LoRA structure before fine-tuning. It combines row-wise and column-wise pruning with rank-1 LoRA and introduces a sparsity-aware learning-rate scaling to preserve optimization dynamics under structured sparsity. Importance is measured by $\mathcal{I}_i(\theta) = \left| \theta_i \cdot \frac{\partial \mathcal{L}}{\partial \theta_i} \right|$, the magnitude of each parameter multiplied by the gradient of the loss with respect to it. Empirical results on decoder- and encoder-based models show that TASO matches or exceeds LoRA performance with substantially fewer trainable parameters (e.g., around $2$–$3$ million versus tens of millions), and that it trains faster than iterative pruning baselines. The work also connects to lottery-ticket concepts, demonstrating high sparsity with competitive performance and showing that core regions align with structurally important dimensions, which suggests a scalable, composable approach to cross-task adaptation. Overall, TASO provides a practical path to extreme parameter-efficient tuning, with implications for efficient deployment of large language models.
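The importance metric above can be illustrated with a small sketch. The element-wise score follows the stated formula; everything else is an assumption made for illustration: the toy weight and gradient values are invented, the helper names `importance_scores` and `top_fraction` are hypothetical, and aggregating scores by row and column sums and keeping a top fraction `p` is only one plausible way to turn scores into core rows and columns, not necessarily the rule the authors use.

```python
def importance_scores(weight, grad):
    """Element-wise importance |theta_i * dL/dtheta_i| for a 2-D weight matrix."""
    return [
        [abs(w * g) for w, g in zip(w_row, g_row)]
        for w_row, g_row in zip(weight, grad)
    ]


def top_fraction(scores, p):
    """Indices of the highest-scoring entries, keeping a fraction p (at least one).

    Assumption: a simple top-k rule; the paper's selection rule may differ.
    """
    k = max(1, round(p * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Toy 3x4 weight matrix and an invented gradient of the same shape.
weight = [
    [0.5, -1.0, 0.1, 0.0],
    [2.0, 0.3, -0.2, 1.0],
    [0.1, 0.1, 0.1, 0.1],
]
grad = [
    [0.2, 0.1, -0.5, 3.0],
    [-1.0, 2.0, 0.5, 0.4],
    [0.1, -0.1, 0.2, 0.0],
]

scores = importance_scores(weight, grad)

# Assumption: summarize each row and column by the sum of its scores.
row_scores = [sum(row) for row in scores]
col_scores = [sum(col) for col in zip(*scores)]

core_rows = top_fraction(row_scores, p=0.34)  # keeps row 1
core_cols = top_fraction(col_scores, p=0.5)   # keeps columns 0 and 1
print(core_rows, core_cols)
```

In this toy case the second row dominates because it pairs large weights with large gradients. A large gradient on a zero weight (row 0, column 3) contributes nothing, since the score is a product.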

Abstract

LoRA has become one of the most widely used parameter-efficient fine-tuning methods due to its simplicity and effectiveness. However, numerous studies have shown that LoRA often introduces substantial parameter redundancy, which not only increases the number of trainable parameters but also hinders the effectiveness of fine-tuning. Since identifying redundant parameters in LoRA is inherently difficult, how to eliminate them efficiently and accurately remains a challenging problem. In this paper, we propose TASO, a redundancy reduction method that leverages importance information from the pretrained model's weights to mitigate LoRA redundancy. Specifically, we estimate parameter importance on downstream tasks and identify task-specific core regions based on the distribution of importance scores. The location information of these core regions is then used to determine the sparse structure of LoRA modules, enabling redundancy removal before fine-tuning. Our approach significantly reduces the number of trainable parameters required for task adaptation, while providing a novel task-aligned perspective for LoRA redundancy reduction. Experimental results demonstrate that, with a parameter budget comparable to LoRA with rank $r = 1$, TASO consistently outperforms standard LoRA across multiple tasks, achieving strong fine-tuning performance while effectively eliminating redundant parameters.
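To make the idea of a sparse LoRA structure determined by core-region locations concrete, here is a minimal sketch. It assumes a rank-1 update $\Delta W = b\,a^{\top}$ in which `b` is trainable only on the core rows and `a` only on the core columns. This masking scheme, the function names `sparse_rank1_update` and `trainable_params`, and the toy values are all illustrative assumptions; the abstract does not spell out the exact construction, and the learning-rate scaling is omitted because its form is not given here.

```python
def sparse_rank1_update(b, a, core_rows, core_cols):
    """Return Delta W = b a^T with b restricted to core_rows and a to core_cols.

    Assumption: entries outside the core region are fixed at zero,
    so only the selected entries of b and a would be trained.
    """
    rows, cols = set(core_rows), set(core_cols)
    b_masked = [v if i in rows else 0.0 for i, v in enumerate(b)]
    a_masked = [v if j in cols else 0.0 for j, v in enumerate(a)]
    return [[bi * aj for aj in a_masked] for bi in b_masked]


def trainable_params(core_rows, core_cols):
    """Trainable entries under this scheme: one per kept row plus one per kept column."""
    return len(core_rows) + len(core_cols)


# Toy rank-1 factors for a 3x4 weight matrix (values invented for illustration).
b = [1.0, 2.0, 3.0]        # one entry per output row
a = [0.5, -1.0, 4.0, 2.0]  # one entry per input column

delta = sparse_rank1_update(b, a, core_rows=[1], core_cols=[0, 1])
sparse_count = trainable_params([1], [0, 1])
dense_count = len(b) + len(a)  # unpruned rank-1 LoRA for the same matrix
print(delta[1], sparse_count, dense_count)
```

Under these assumptions the update is nonzero only where a core row meets a core column, and the trainable-parameter count falls from `len(b) + len(a)` to the number of kept rows plus kept columns.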

Paper Structure

This paper contains 30 sections, 11 equations, 4 figures, 6 tables.

Figures (4)

  • Figure 1: Overview of TASO. We compute task-specific importance from the SFT loss to determine core regions, perform structured pruning of LoRA, and scale the learning rate based on the sparsity level.
  • Figure 2: Visualization of task-specific important weights identified by sensitivity analysis. Heatmaps show the top-5% important parameters in the self-attention query matrices of DeBERTa-v3 on RTE, LLaMA3.2 3B on GSM8K, and Qwen2.5 3B on GSM8K. The important weights concentrate in specific rows and columns of the matrices.
  • Figure 3: Left: Accuracy vs. sparsity curve with TASO highlighted. Middle: Visualization of key mask sparsity. Right: LoRA training runtime for TASO vs. IMP on four tasks.
  • Figure 4: Accuracy on the RTE task as a function of the pruning hyperparameter $p$, which indicates the fraction of non-zero values remaining after pruning. The $x$-axis runs from $0.40$ (left) to $0$ (right) to highlight performance under increasing sparsity.