Fine-tuning Aligned Classifiers for Merging Outputs: Towards a Superior Evaluation Protocol in Model Merging

Fanshuang Kong, Richong Zhang, Zhijie Nie, Ziqiao Wang, Qiang Sun

TL;DR

This paper identifies a misalignment between the outputs of merged models and the fine-tuned classifiers used to evaluate them on classification tasks, and shows that merging outputs still contain the necessary classification information despite the parameter changes. It demonstrates that this misalignment can converge to an orthogonal transformation, so a simple alignment with few parameters can significantly boost evaluation accuracy and merging performance. The authors therefore propose FT-Classifier Eval, a protocol that uses few-shot unlabeled data to learn an aligned classifier for the merged outputs without changing the model structure. Across NLP and CV tasks, FT-Classifier Eval yields higher accuracy and more faithful assessments of merging methods than the traditional Current Eval, suggesting a practical path to better evaluation and deployment of merged models.

Abstract

Model merging combines multiple fine-tuned models into a single one via parameter fusion, achieving improvements across many tasks. However, in classification tasks, we find a misalignment between merging outputs and the fine-tuned classifier, which limits the effectiveness of model merging. In this paper, we first demonstrate the following observations: (1) merging outputs exhibit a cluster effect comparable to that of fine-tuned outputs and already contain the necessary classification information; (2) the misalignment between merging outputs and the fine-tuned classifier can converge to an orthogonal transformation, and alleviating this misalignment can significantly enhance the performance of merged models. Based on these observations, we then propose a new protocol, FT-Classifier, which fine-tunes an aligned classifier with few-shot unlabeled samples, enabling better evaluation of merging methods and improved classification performance.
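To make observation (2) concrete, the following is a minimal sketch of what it means for two sets of outputs to differ only by an orthogonal transformation. It is not the paper's FT-Classifier protocol: it uses the classical orthogonal Procrustes solution on paired synthetic outputs, and the names `fine_tuned`, `merged`, `true_q`, and `orthogonal_procrustes` are hypothetical stand-ins chosen for illustration.

```python
import numpy as np


def orthogonal_procrustes(source, target):
    """Return the orthogonal Q minimizing ||source @ Q - target||_F."""
    # Classical closed form: SVD of the cross-covariance matrix.
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


rng = np.random.default_rng(0)
n, d = 200, 16

# Synthetic stand-in for fine-tuned model outputs (assumption, not real data).
fine_tuned = rng.normal(size=(n, d))

# Random orthogonal matrix playing the role of the misalignment.
true_q, _ = np.linalg.qr(rng.normal(size=(d, d)))

# Synthetic stand-in for merging outputs: same geometry, rotated/reflected.
merged = fine_tuned @ true_q

# Recover the transformation that maps merged outputs back onto fine-tuned ones.
q = orthogonal_procrustes(merged, fine_tuned)
aligned = merged @ q

print("Q is orthogonal:", np.allclose(q.T @ q, np.eye(d)))
print("alignment error:", np.linalg.norm(aligned - fine_tuned))
```

Because an orthogonal map preserves distances and angles, the cluster structure of the outputs is unchanged, which is consistent with observation (1); only the classifier that reads those outputs needs to be realigned.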

Paper Structure

This paper contains 33 sections, 8 equations, 11 figures, 3 tables.

Figures (11)

  • Figure 1: t-SNE visualization of three representative merging methods on AG News and DBpedia. The blue points represent the sentence embeddings of the $f_t$ model, while the red points represent those of the $f_m$ model. The color shades indicate different labels. The value indicates the classification accuracy of each merging method. For results on other datasets, please refer to Appendix \ref{sec:app:t-sne}.
  • Figure 2: The performance of KNN Eval compared with Current Eval under varying numbers of few-shot examples $k$. Fine-tuned denotes the results obtained using $f_t$, which serve as the evaluation upper bound. The $x$-axis represents $k$, and the $y$-axis represents the average accuracy over all merging tasks. Detailed results for $k=5$ are shown in Table \ref{tab:ft-classifier}. Experimental details and explanations are provided in Section \ref{sec:Experiments}.
  • Figure 3: Illustrations of some variants referred to in this paper.
  • Figure 4: Comparison of different transformations. WA represents Weight Averaging, TA represents Task Arithmetic.
  • Figure 5: The performance of FT-Classifier Eval under varying numbers of few-shot examples $k$. Detailed results for $k=5$ are shown in Table \ref{tab:ft-classifier}.
  • ...and 6 more figures