Neural Parameter Search for Slimmer Fine-Tuned Models and Better Transfer
Guodong Du, Zitao Fang, Jing Li, Junlin Li, Runhua Jiang, Shuyang Yu, Yifei Guo, Yangneng Chen, Sim Kuan Goh, Ho-Kin Tang, Daojing He, Honghai Liu, Min Zhang
TL;DR
This work addresses the redundancy and weak out-of-domain generalization of fine-tuned checkpoints by pruning their task vectors with Neural Parameter Search (NPS). NPS decomposes the fine-tuning delta $\tau = \theta_{ft} - \theta_{pre}$ into low-rank subspace components $\{q_m\}$ and optimizes weights $\{w_m\}$ via CMA-ES to form a reweighted update, yielding $\hat{\theta}_{ft} = \theta_{pre} + \sum_m w_m q_m$, after which a sparsity mask is applied. The authors demonstrate three practical applications: knowledge transfer, knowledge fusion, and knowledge compression. Across vision, NLP, and multimodal benchmarks, NPS mitigates catastrophic forgetting, enables robust multi-task fusion, and substantially reduces storage while preserving near-original performance. Because the search is gradient-free and lightweight, the approach applies broadly, with reported gains of up to $+3.0\%$ in vision fusion and nearly 99% normalized accuracy in compression. Overall, NPS is a simple yet effective tool for slimming fine-tuned models while preserving transferability across tasks and domains.
Abstract
Foundation models and their checkpoints have significantly advanced deep learning, boosting performance across various applications. However, fine-tuned models often struggle outside their specific domains and exhibit considerable redundancy. Recent studies suggest that combining a pruned fine-tuned model with the original pre-trained model can mitigate forgetting, reduce interference when merging model parameters across tasks, and improve compression efficiency. In this context, developing an effective pruning strategy for fine-tuned models is crucial. Leveraging the advantages of the task vector mechanism, we preprocess fine-tuned models by calculating the differences between them and the original model. Recognizing that different task vector subspaces contribute variably to model performance, we introduce a novel method called Neural Parameter Search (NPS-Pruning) for slimming down fine-tuned models. This method enhances pruning efficiency by searching through neural parameters of task vectors within low-rank subspaces. Our method has three key applications: enhancing knowledge transfer through pairwise model interpolation, facilitating effective knowledge fusion via model merging, and enabling the deployment of compressed models that retain near-original performance while significantly reducing storage costs. Extensive experiments across vision, NLP, and multi-modal benchmarks demonstrate the effectiveness and robustness of our approach, resulting in substantial performance gains. The code is publicly available at: https://github.com/duguodong7/NPS-Pruning.
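To make the pipeline described above concrete, the following is a minimal sketch on a single toy weight matrix, not the authors' implementation. Several choices are assumptions made only for illustration: the low-rank components $q_m$ are taken to be rank-one SVD terms of the delta, the mask keeps the largest-magnitude entries, a simple (1+λ) evolution strategy stands in for CMA-ES, and `toy_loss` is a hypothetical objective replacing validation performance. All function names are hypothetical.

```python
import numpy as np

def decompose(delta, rank):
    """Split a 2-D delta into `rank` rank-one SVD components q_m (assumed choice)."""
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    return np.stack([s[m] * np.outer(u[:, m], vt[m]) for m in range(rank)])

def combine(theta_pre, comps, w, keep_ratio):
    """Return theta_pre + mask * sum_m w_m q_m and the mask (top-magnitude entries kept)."""
    update = np.tensordot(w, comps, axes=1)           # weighted sum over components
    k = max(1, int(round(keep_ratio * update.size)))  # number of entries to keep
    thresh = np.sort(np.abs(update), axis=None)[-k]   # k-th largest magnitude
    mask = np.abs(update) >= thresh
    return theta_pre + update * mask, mask

def search_weights(loss_fn, n, iters=200, pop=8, sigma=0.3, seed=0):
    """(1+lambda) evolution strategy: a gradient-free stand-in for CMA-ES."""
    rng = np.random.default_rng(seed)
    best_w = np.ones(n)                               # start from the unweighted delta
    best_loss = loss_fn(best_w)
    for _ in range(iters):
        cands = best_w + sigma * rng.standard_normal((pop, n))
        losses = [loss_fn(c) for c in cands]
        i = int(np.argmin(losses))
        if losses[i] < best_loss:                     # accept only improvements
            best_w, best_loss = cands[i], losses[i]
    return best_w, best_loss

# Toy data: a "pre-trained" matrix and a slightly perturbed "fine-tuned" one.
rng = np.random.default_rng(0)
theta_pre = rng.standard_normal((16, 12))
theta_ft = theta_pre + 0.1 * rng.standard_normal((16, 12))
comps = decompose(theta_ft - theta_pre, rank=4)

def toy_loss(w):
    """Hypothetical objective: distance of the pruned model to the fine-tuned one."""
    theta_hat, _ = combine(theta_pre, comps, w, keep_ratio=0.2)
    return float(np.mean((theta_hat - theta_ft) ** 2))

w_star, loss_star = search_weights(toy_loss, n=4)
theta_hat, mask = combine(theta_pre, comps, w_star, keep_ratio=0.2)
print("searched weights:", np.round(w_star, 3))
print("kept fraction:", mask.mean())
```

In a realistic setting the objective would be evaluated on held-out task data rather than against the fine-tuned weights, and a proper CMA-ES implementation would adapt the search covariance instead of using a fixed step size.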
