NAN: A Training-Free Solution to Coefficient Estimation in Model Merging

Chongjie Si, Kangtao Lv, Jingjing Jiang, Yadao Wang, Yongwei Wang, Xiaokang Yang, Wenbo Su, Bo Zheng, Wei Shen

TL;DR

The paper addresses merging independently fine-tuned models without access to raw data and critiques heuristic strategies for choosing merging coefficients. It recasts model merging as a least-squares problem, derives that the optimal coefficients track each model's information content, and shows that under normalized inputs the merge approximates a sample-size-weighted average. Building on this theory, it introduces NAN, a training-free plug-in that sets merging coefficients from the inverse Frobenius norm of each model's weights and applies to a variety of merging strategies. Experiments across vision, language, and vision-language tasks show consistent gains, supporting NAN's use as a plug-in for modular, data-free multi-task fusion.

Abstract

Model merging offers a training-free alternative to multi-task learning by combining independently fine-tuned models into a unified one without access to raw data. However, existing approaches often rely on heuristics to determine the merging coefficients, limiting their scalability and generality. In this work, we revisit model merging through the lens of least-squares optimization and show that the optimal merging weights should scale with the amount of task-specific information encoded in each model. Based on this insight, we propose NAN, a simple yet effective method that estimates model merging coefficients via the inverse of the parameter norm. NAN is training-free, plug-and-play, and applicable to a wide range of merging strategies. Extensive experiments show that NAN consistently improves the performance of baseline methods.
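
The inverse-norm idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each model is a plain dictionary of flat weight lists, that the norm is taken over all of a model's weights at once, and that the coefficients are normalized to sum to 1. The function names (`frobenius_norm`, `inverse_norm_coefficients`, `merge`) and the toy weights are hypothetical, and the paper may apply the norm to different quantities or use a different scaling.

```python
import math


def frobenius_norm(params):
    """Frobenius norm over every weight in a dict of name -> flat list of floats."""
    return math.sqrt(sum(v * v for values in params.values() for v in values))


def inverse_norm_coefficients(models):
    """Coefficients proportional to 1 / ||W_i||_F.

    Normalizing them to sum to 1 is an assumption of this sketch.
    A model whose weights are all zero would raise ZeroDivisionError.
    """
    inverse_norms = [1.0 / frobenius_norm(m) for m in models]
    total = sum(inverse_norms)
    return [x / total for x in inverse_norms]


def merge(models, coeffs):
    """Coefficient-weighted sum of the models, parameter by parameter."""
    merged = {}
    for name in models[0]:
        size = len(models[0][name])
        merged[name] = [
            sum(c * m[name][i] for c, m in zip(coeffs, models))
            for i in range(size)
        ]
    return merged


# Toy example: model_a has norm 5, model_b has norm 10.
model_a = {"layer.weight": [3.0, 4.0]}
model_b = {"layer.weight": [6.0, 8.0]}

coeffs = inverse_norm_coefficients([model_a, model_b])
merged = merge([model_a, model_b], coeffs)
print(coeffs)
print(merged)
```

In this toy case the model with the smaller norm receives the larger coefficient (about 2/3 versus 1/3), which is the behavior an inverse-norm rule produces by construction.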

Paper Structure

This paper contains 8 sections, 10 equations, 3 tables.