Stay Unique, Stay Efficient: Preserving Model Personality in Multi-Task Merging
Kuangpu Guo, Yuhe Ding, Jian Liang, Zilei Wang, Ran He
TL;DR
This work tackles the persistent performance degradation of multi-task model merging by showing that parameter conflicts suppress task-specific information, even among similar tasks. It introduces DTS, a lightweight, approximation-based framework that combines singular value decomposition (SVD), four-group thresholding, and per-group scaling to preserve task-specific signals with only 1% extra storage per task. A data-free variant extends DTS to unseen tasks by weighting task-specific information according to semantic similarity between tasks, enabling robust generalization without access to training data. Across vision, NLP, and generative backbones, DTS consistently outperforms state-of-the-art baselines and generalizes strongly to unseen tasks, making it practical for memory-constrained deployments.
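The following is a minimal NumPy sketch of the decomposition, thresholding, and scaling steps summarized above. It is not the authors' implementation (the linked repository has that). It assumes the task-specific information for one layer is a weight-difference matrix `delta`. The four-group rule (the sign of each singular-vector element crossed with whether its magnitude exceeds the vector's mean absolute value) and the use of each group's mean as its scaling factor are hypothetical stand-ins, as are the helper names `truncated_svd`, `four_group_threshold`, `compress`, `reconstruct`, and `storage_fraction`.

```python
import numpy as np


def truncated_svd(delta: np.ndarray, k: int):
    """Keep the top-k singular values/vectors of one layer's task-specific update."""
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    return U[:, :k], S[:k], Vt[:k, :]


def four_group_threshold(vec: np.ndarray):
    """Hypothetical four-group rule: sign (neg/pos) x magnitude (<= or > mean |x|).

    Returns per-element group ids in {0, 1, 2, 3} and one scaling factor per group
    (assumed here to be the mean of the elements assigned to that group).
    """
    thr = np.abs(vec).mean()
    group_ids = 2 * (vec >= 0).astype(np.uint8) + (np.abs(vec) > thr).astype(np.uint8)
    scales = np.zeros(4)
    for g in range(4):
        mask = group_ids == g
        if mask.any():
            scales[g] = vec[mask].mean()
    return group_ids, scales


def compress(delta: np.ndarray, k: int):
    """SVD truncation followed by per-vector four-group thresholding and scaling."""
    U, S, Vt = truncated_svd(delta, k)
    u_packed = [four_group_threshold(U[:, i]) for i in range(k)]
    v_packed = [four_group_threshold(Vt[i, :]) for i in range(k)]
    return u_packed, S, v_packed


def reconstruct(u_packed, S, v_packed):
    """Rebuild an approximation of delta: each element becomes its group's scale."""
    U_hat = np.stack([scales[ids] for ids, scales in u_packed], axis=1)  # (m, k)
    Vt_hat = np.stack([scales[ids] for ids, scales in v_packed], axis=0)  # (k, n)
    return (U_hat * S) @ Vt_hat


def storage_fraction(m: int, n: int, k: int, id_bits: int = 2, float_bits: int = 32):
    """Packed bits (group ids, 4 scales per vector, k singular values) vs a dense m x n update."""
    dense = m * n * float_bits
    packed = k * (m + n) * id_bits + k * (4 + 4 + 1) * float_bits
    return packed / dense


# Toy usage: a synthetic rank-8 update for a 256 x 256 layer.
rng = np.random.default_rng(0)
delta = 1e-2 * rng.standard_normal((256, 8)) @ rng.standard_normal((8, 256))
u_packed, S, v_packed = compress(delta, k=8)
approx = reconstruct(u_packed, S, v_packed)
rel_err = np.linalg.norm(approx - delta) / np.linalg.norm(delta)
print(f"relative reconstruction error: {rel_err:.3f}")
print(f"packed/dense storage, 768x768 layer, k=32: {storage_fraction(768, 768, 32):.4f}")
```

Under this assumed layout, each singular-vector element costs a 2-bit group id plus four shared floats per vector, so storage grows with k(m + n) rather than mn. `storage_fraction` only counts bits for this hypothetical encoding; it illustrates why a small rank budget can stay near the 1% range and does not reproduce the paper's own accounting.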
Abstract
Model merging has emerged as a promising paradigm for enabling multi-task capabilities without additional training. However, existing methods often experience substantial performance degradation compared with individually fine-tuned models, even on similar tasks, underscoring the need to preserve task-specific information. This paper proposes Decomposition, Thresholding, and Scaling (DTS), an approximation-based personalized merging framework that preserves task-specific information with minimal storage overhead. DTS first applies singular value decomposition to the task-specific information and retains only a small subset of singular values and vectors. It then introduces a novel thresholding strategy that partitions singular vector elements into groups and assigns a scaling factor to each group. To enable generalization to unseen tasks, we further extend DTS with a variant that fuses task-specific information in a data-free manner based on the semantic similarity of task characteristics. Extensive experiments demonstrate that DTS consistently outperforms state-of-the-art baselines while requiring only 1% additional storage per task. Furthermore, experiments on unseen tasks show that the DTS variant achieves significantly better generalization performance. Our code is available at https://github.com/krumpguo/DTS.
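The abstract says only that the data-free variant fuses task-specific information according to the semantic similarity of task characteristics. The sketch below fills in that step under explicit assumptions, none of which come from the paper. Each task is represented by an embedding vector (how such embeddings are obtained is not specified here). Similarities are cosine similarities turned into weights by a softmax with a hypothetical `temperature`. The weighted sum of stored task-specific updates is then added to the merged weight. The helper names `similarity_weights` and `fuse_for_unseen` are illustrative, not taken from the DTS codebase.

```python
import numpy as np


def similarity_weights(unseen_emb, seen_embs, temperature=0.1):
    """Softmax over cosine similarities between an unseen task and each seen task.

    `temperature` is a hypothetical knob; lower values concentrate weight on the
    most similar seen task.
    """
    u = unseen_emb / np.linalg.norm(unseen_emb)
    seen = seen_embs / np.linalg.norm(seen_embs, axis=1, keepdims=True)
    logits = (seen @ u) / temperature
    logits -= logits.max()  # subtract the max for numerical stability
    w = np.exp(logits)
    return w / w.sum()


def fuse_for_unseen(merged_weight, task_deltas, weights):
    """Add a similarity-weighted sum of stored task-specific updates to a merged weight."""
    fused_delta = np.tensordot(weights, np.stack(task_deltas), axes=1)
    return merged_weight + fused_delta


# Toy usage: three seen tasks with 16-d embeddings; the unseen task resembles task 1.
rng = np.random.default_rng(1)
seen_embs = rng.standard_normal((3, 16))
unseen_emb = seen_embs[1] + 0.01 * rng.standard_normal(16)
weights = similarity_weights(unseen_emb, seen_embs)
merged = rng.standard_normal((4, 4))
deltas = [rng.standard_normal((4, 4)) for _ in range(3)]
adapted = fuse_for_unseen(merged, deltas, weights)
print(np.round(weights, 3))
```

The softmax keeps the weights nonnegative and summing to one, so an unseen task draws mostly on its closest seen tasks without any training data; a lower temperature pushes this toward using the single most similar task.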
