Efficient Model Editing with Task-Localized Sparse Fine-tuning
Leonardo Iurada, Marco Ciccone, Tatiana Tommasi
TL;DR
The paper addresses the challenge of efficiently editing large pre-trained models without incurring heavy linearization costs or risking interference between tasks. It introduces TaLoS, a sparse fine-tuning method that enforces function localization and exploits weight disentanglement by updating only the least-sensitive parameters identified via the diagonal Fisher Information $F_{[j,j]}$. This approach yields a near-linearized training regime and scalable task arithmetic, demonstrated by superior results in Task Addition and Task Negation across vision and language domains, along with structured, hardware-friendly sparsity patterns. The findings suggest practical benefits for deploying adaptable foundation models with modular, conflict-free task vectors, while providing insights into the localization and sparsity structure of transformer parameters, particularly in attention projections.
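The parameter-selection idea described above can be illustrated with a small sketch. It assumes the diagonal Fisher entry $F_{[j,j]}$ is estimated as the mean squared per-example gradient of the log-likelihood for parameter $j$, and that a fixed fraction of the lowest-scoring parameters is kept trainable. The function names, the `keep_fraction` parameter, and the plain SGD step are hypothetical choices for illustration, not the authors' implementation.

```python
# Minimal sketch (illustrative assumptions, not the paper's code):
# score parameters by a diagonal Fisher estimate, then update only
# the lowest-scoring (least-sensitive) ones.

def diagonal_fisher(per_example_grads):
    """Estimate F[j,j] as the mean of squared per-example gradients."""
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    return [sum(g[j] ** 2 for g in per_example_grads) / n for j in range(dim)]

def low_sensitivity_mask(fisher_diag, keep_fraction):
    """Return a 0/1 mask selecting the keep_fraction lowest-Fisher parameters."""
    k = max(1, int(len(fisher_diag) * keep_fraction))
    order = sorted(range(len(fisher_diag)), key=lambda j: fisher_diag[j])
    chosen = set(order[:k])
    return [1 if j in chosen else 0 for j in range(len(fisher_diag))]

def masked_sgd_step(params, grad, mask, lr):
    """Apply a gradient step only where mask == 1; other entries stay frozen."""
    return [p - lr * m * g for p, g, m in zip(params, grad, mask)]

# Toy usage: two per-example gradients over four parameters.
grads = [[1.0, 0.1, 2.0, 0.0],
         [3.0, 0.1, 0.0, 0.0]]
fisher = diagonal_fisher(grads)
mask = low_sensitivity_mask(fisher, keep_fraction=0.5)
updated = masked_sgd_step([1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0], mask, lr=0.5)
```

In this toy case, parameters 1 and 3 have the smallest Fisher scores, so only they receive updates. In a real transformer the scores would come from backpropagated gradients, and the selection could be made per layer or per block to obtain structured sparsity.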
Abstract
Task arithmetic has emerged as a promising approach for editing models by representing task-specific knowledge as composable task vectors. However, existing methods rely on network linearization to derive task vectors, which creates computational bottlenecks during training and inference. Moreover, linearization alone does not ensure weight disentanglement, the key property that enables conflict-free composition of task vectors. To address this, we propose TaLoS, which builds sparse task vectors with minimal interference, without requiring explicit linearization or sharing information across tasks. We find that pre-trained models contain a subset of parameters with consistently low gradient sensitivity across tasks, and that sparsely updating only these parameters promotes weight disentanglement during fine-tuning. Our experiments show that TaLoS improves training and inference efficiency while outperforming current methods in task addition and negation. By enabling modular parameter editing, our approach fosters practical deployment of adaptable foundation models in real-world applications.
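For readers unfamiliar with task arithmetic, the sketch below shows the standard operations the abstract refers to: a task vector is the difference between fine-tuned and pre-trained weights, task addition adds scaled task vectors to the pre-trained weights, and task negation subtracts one. The helper names and the scaling coefficient `alpha` are assumptions made for illustration. The toy vectors are deliberately sparse with non-overlapping supports, to suggest why sparse task vectors can compose without overwriting each other.

```python
# Minimal sketch of task arithmetic on flat weight lists
# (hypothetical helper names; weights are toy values).

def task_vector(pretrained, finetuned):
    """tau = theta_finetuned - theta_pretrained."""
    return [f - p for p, f in zip(pretrained, finetuned)]

def apply_task_vectors(pretrained, vectors, alpha=1.0):
    """Task addition: theta_0 + alpha * sum of task vectors."""
    edited = list(pretrained)
    for tau in vectors:
        edited = [w + alpha * t for w, t in zip(edited, tau)]
    return edited

def negate(tau):
    """Task negation uses the task vector with flipped sign."""
    return [-t for t in tau]

theta_0 = [1.0, 2.0, 3.0, 4.0]
theta_a = [1.5, 2.0, 3.0, 4.0]   # fine-tuned on task A, only index 0 changed
theta_b = [1.0, 2.0, 2.0, 4.0]   # fine-tuned on task B, only index 2 changed

tau_a = task_vector(theta_0, theta_a)
tau_b = task_vector(theta_0, theta_b)

added = apply_task_vectors(theta_0, [tau_a, tau_b])      # both tasks
forgot_a = apply_task_vectors(theta_0, [negate(tau_a)])  # remove task A
```

Because the two toy task vectors touch different coordinates, the merged weights contain each task's update unchanged. With dense, overlapping task vectors the updates would sum on shared coordinates, which is one source of the interference the abstract describes.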
