Fine-tuning Aligned Classifiers for Merging Outputs: Towards a Superior Evaluation Protocol in Model Merging
Fanshuang Kong, Richong Zhang, Zhijie Nie, Ziqiao Wang, Qiang Sun
TL;DR
This paper identifies a misalignment between the outputs of merged models and the fine-tuned classifiers used to evaluate them on classification tasks, and shows that the merging outputs still contain the necessary classification information despite the parameter changes. It demonstrates that this misalignment can converge to an orthogonal transformation, which a simple alignment with few parameters can correct, significantly improving evaluation accuracy and merging performance. Building on this, the authors propose FT-Classifier Eval, a protocol that uses few-shot unlabeled data to learn a classifier aligned with the merged outputs without changing the model structure. Across NLP and CV tasks, FT-Classifier Eval yields higher accuracy and more faithful assessments of merging methods than the traditional Current Eval, suggesting a practical path to better evaluation and deployment of merged models.
Abstract
Model merging combines multiple fine-tuned models into a single one via parameter fusion, achieving improvements across many tasks. However, in classification tasks, we find a misalignment between the merging outputs and the fine-tuned classifier, which limits the effectiveness of model merging. In this paper, we first make the following observations: (1) merging outputs exhibit a clustering effect comparable to that of fine-tuned outputs and already contain the necessary classification information; (2) the misalignment between merging outputs and the fine-tuned classifier can converge to an orthogonal transformation, and alleviating this misalignment can significantly enhance the performance of merged models. Based on these observations, we then propose a new protocol, FT-Classifier, which fine-tunes an aligned classifier with few-shot unlabeled samples, enabling better evaluation of merging methods and improved classification performance.
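
To make observation (2) concrete, the following is a minimal synthetic sketch, not the authors' FT-Classifier procedure. It assumes the merged outputs are an exact, noise-free orthogonal transformation of the fine-tuned outputs, and it swaps in the closed-form orthogonal Procrustes solution in place of fine-tuning a classifier. All names (fit_orthogonal_alignment, Z_ft, Z_merged, W, R) and all sizes are hypothetical and chosen only for illustration. No labels are used to fit the alignment, only a few paired outputs of the two models on the same inputs.

```python
import numpy as np


def fit_orthogonal_alignment(merged, finetuned):
    """Orthogonal Procrustes: the orthogonal Q minimizing ||merged @ Q - finetuned||_F."""
    u, _, vt = np.linalg.svd(merged.T @ finetuned)
    return u @ vt


rng = np.random.default_rng(0)
dim, n_classes, n_samples, n_few = 16, 4, 400, 32  # illustrative sizes (assumed)

# Synthetic stand-ins for a fine-tuned classifier head and fine-tuned encoder outputs.
W = rng.normal(size=(dim, n_classes))
Z_ft = rng.normal(size=(n_samples, dim))
# Labels are defined by the fine-tuned pipeline, so it is 100% accurate by construction.
labels = (Z_ft @ W).argmax(axis=1)

# Assumption: merging rotates the representation by an unknown orthogonal matrix R.
R, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
Z_merged = Z_ft @ R

# Evaluating merged outputs directly with the fine-tuned classifier (misaligned).
acc_before = ((Z_merged @ W).argmax(axis=1) == labels).mean()

# Fit the alignment on a few unlabeled samples, then evaluate all samples.
Q = fit_orthogonal_alignment(Z_merged[:n_few], Z_ft[:n_few])
acc_after = ((Z_merged @ Q @ W).argmax(axis=1) == labels).mean()

print(f"accuracy without alignment: {acc_before:.3f}")
print(f"accuracy with alignment:    {acc_after:.3f}")
```

In this idealized setting the merged outputs keep all class information, yet the unaligned classifier reads them incorrectly; a single orthogonal matrix estimated from a handful of unlabeled pairs restores agreement with the fine-tuned classifier. Real merged outputs would only approximately satisfy the orthogonality assumption, so the recovery would be partial rather than exact.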
