TASO: Task-Aligned Sparse Optimization for Parameter-Efficient Model Adaptation
Daiye Miao, Yufang Liu, Jie Wang, Changzhi Sun, Yunke Zhang, Demei Yan, Shaokang Dong, Qi Zhang, Yuanbin Wu
TL;DR
TASO addresses the parameter redundancy of LoRA-based fine-tuning by identifying task-specific core regions through gradient-based parameter importance, then fixing a sparse, task-aligned LoRA structure before fine-tuning begins. It combines row-wise and column-wise pruning with rank-1 LoRA and introduces a sparsity-aware learning-rate scaling to preserve optimization dynamics under structured sparsity. Importance is measured by $\mathcal{I}_i(\theta) = \left| \theta_i \cdot \frac{\partial \mathcal{L}}{\partial \theta_i} \right|$. On decoder- and encoder-based models, TASO matches or exceeds LoRA performance with substantially fewer trainable parameters (e.g., around $2$–$3$ million vs. tens of millions) and trains faster than iterative pruning baselines. The work also connects to lottery-ticket concepts: it reaches high sparsity with competitive performance and shows that core regions align with structurally important dimensions, which offers a scalable, composable approach to cross-task adaptation. Overall, TASO provides a practical, theory-informed path to extreme parameter-efficient tuning, with implications for efficient deployment of large language models.
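To make the importance metric concrete, the following minimal sketch computes $\mathcal{I}_i(\theta) = |\theta_i \cdot \partial \mathcal{L} / \partial \theta_i|$ elementwise for a small weight matrix and then picks candidate core rows and columns. The summary above does not specify how scores are aggregated or thresholded, so the row/column sums and the top-$k$ selection are assumptions made for illustration, and the function names (`importance`, `top_k_indices`, `core_rows_and_cols`) and the toy values of `W` and `G` are hypothetical.

```python
def importance(weights, grads):
    """Elementwise |theta_i * dL/dtheta_i| for a 2-D weight matrix."""
    return [[abs(w * g) for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(weights, grads)]


def top_k_indices(scores, k):
    """Indices of the k largest scores, returned in ascending index order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def core_rows_and_cols(weights, grads, k_rows, k_cols):
    """Assumed aggregation: sum importance over each row and each column,
    then keep the top-k rows and top-k columns as the core region."""
    imp = importance(weights, grads)
    row_scores = [sum(row) for row in imp]
    col_scores = [sum(col) for col in zip(*imp)]
    return top_k_indices(row_scores, k_rows), top_k_indices(col_scores, k_cols)


# Toy pretrained weights and task gradients (illustrative values only).
W = [[0.5, -1.0, 0.1],
     [2.0, 0.0, -0.3],
     [0.1, 0.2, 0.1]]
G = [[0.2, 0.1, -0.5],
     [-1.0, 0.4, 0.0],
     [0.1, 0.1, 0.1]]

rows, cols = core_rows_and_cols(W, G, k_rows=2, k_cols=1)
print(rows, cols)
```

In practice the gradients would come from a backward pass over downstream-task data rather than a hand-written matrix; the selection logic is the part this sketch is meant to show.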
Abstract
LoRA has become one of the most widely used parameter-efficient fine-tuning methods due to its simplicity and effectiveness. However, numerous studies have shown that LoRA often introduces substantial parameter redundancy, which not only increases the number of trainable parameters but also hinders the effectiveness of fine-tuning. Since identifying redundant parameters in LoRA is inherently difficult, how to eliminate them efficiently and accurately remains a challenging problem. In this paper, we propose TASO, a redundancy reduction method that leverages importance information from the pretrained model's weights to mitigate LoRA redundancy. Specifically, we estimate parameter importance on downstream tasks and identify task-specific core regions based on the distribution of importance scores. The location information of these core regions is then used to determine the sparse structure of LoRA modules, enabling redundancy removal before fine-tuning. Our approach significantly reduces the number of trainable parameters required for task adaptation, while providing a novel task-aligned perspective for LoRA redundancy reduction. Experimental results demonstrate that, with a parameter budget comparable to LoRA with rank $r = 1$, TASO consistently outperforms standard LoRA across multiple tasks, achieving strong fine-tuning performance while effectively eliminating redundant parameters.
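As a rough illustration of how core-region locations could fix a sparse LoRA structure before fine-tuning, the sketch below builds a rank-1 update $\Delta W = b a^\top$ whose entries are nonzero only on selected rows and columns, and counts the trainable parameters that remain. This is one possible reading of combining row-wise and column-wise pruning with rank-1 LoRA, not the paper's confirmed construction; `sparse_rank1_delta`, `trainable_params`, and the toy vectors are hypothetical.

```python
def sparse_rank1_delta(b, a, keep_rows, keep_cols):
    """Build Delta W = b a^T with b zeroed outside keep_rows
    and a zeroed outside keep_cols (assumed masking scheme)."""
    rows, cols = set(keep_rows), set(keep_cols)
    b_masked = [v if i in rows else 0.0 for i, v in enumerate(b)]
    a_masked = [v if j in cols else 0.0 for j, v in enumerate(a)]
    return [[bi * aj for aj in a_masked] for bi in b_masked]


def trainable_params(keep_rows, keep_cols):
    """Only the kept entries of b and a would be trained."""
    return len(set(keep_rows)) + len(set(keep_cols))


# Toy rank-1 factors for a 4 x 3 weight matrix (illustrative values only).
b = [1.0, 2.0, 3.0, 4.0]   # length d_out
a = [0.5, 0.5, 0.5]        # length d_in

delta = sparse_rank1_delta(b, a, keep_rows=[1, 3], keep_cols=[0])
dense_count = len(b) + len(a)                  # dense rank-1 LoRA: d_out + d_in
sparse_count = trainable_params([1, 3], [0])   # after row/column pruning
print(delta, dense_count, sparse_count)
```

Under this scheme the update touches only the intersection of the kept rows and kept columns, so the trainable parameter count falls from $d_{\text{out}} + d_{\text{in}}$ to the number of kept rows plus kept columns.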
