
SyMerge: From Non-Interference to Synergistic Merging via Single-Layer Adaptation

Aecheon Jung, Seunghwan Lee, Dongyoon Han, Sungeun Hong

TL;DR

This work reframes model merging as a pursuit of cross-task synergy rather than mere non-interference. It introduces SyMerge, a lightweight, test-time adaptive framework that jointly optimizes a single task-specific layer and encoder merging coefficients, guided by expert predictions through self-labeling on unlabeled data. The approach yields state-of-the-art results across vision, dense prediction, and NLP benchmarks and demonstrates that the adapted layer transfers effectively to other merging methods, enhancing functional alignment between tasks. The findings highlight the practical impact of minimal task-specific adaptation for robust, scalable multi-task merging under distribution shifts, while also acknowledging dependence on the quality of expert models.

Abstract

Model merging offers an efficient alternative to multi-task learning by combining independently fine-tuned models, but most prior approaches focus mainly on avoiding task interference. We argue instead that the real potential of merging lies in achieving synergy, where tasks enhance one another. Our intuition comes from a pilot study showing that when a classifier trained on one task is paired with the encoder of another, the resulting cross-task performance strongly predicts merge quality. Moreover, adapting even a single task-specific layer can substantially improve this compatibility, suggesting a simple yet powerful lever for synergy. Building on this insight, we introduce SyMerge, a lightweight framework that jointly optimizes one task-specific layer and merging coefficients. To ensure stability without labels, SyMerge employs a robust self-labeling strategy guided by expert model predictions, avoiding the pitfalls of entropy-based adaptation. This minimalist yet principled design achieves state-of-the-art results across vision, dense prediction, and NLP benchmarks, while also producing adapted layers that transfer effectively to other merging methods. Our code is available at https://aim-skku.github.io/SyMerge/
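The joint optimization described above can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a linear encoder merged in task-arithmetic style (pre-trained weights plus coefficient-weighted task vectors), a linear head standing in for the single task-specific layer, and a cross-entropy loss against the expert's argmax predictions as the self-labeling objective. All names (`merged_encoder`, `self_label_step`, `run_demo`) and hyperparameters are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def merged_encoder(theta_pre, task_vectors, lam):
    # Assumed merge rule: theta = theta_pre + sum_k lam[k] * task_vectors[k]
    return theta_pre + np.tensordot(lam, task_vectors, axes=1)

def self_label_step(x, expert_logits, theta_pre, task_vectors, lam, head, lr=0.5):
    """One gradient step on the merging coefficients and the head,
    supervised only by the expert's predictions on unlabeled inputs x."""
    n = x.shape[0]
    pseudo = expert_logits.argmax(axis=1)           # expert-guided pseudo-labels
    theta = merged_encoder(theta_pre, task_vectors, lam)
    feats = x @ theta                               # toy linear encoder
    probs = softmax(feats @ head)
    loss = -np.log(probs[np.arange(n), pseudo] + 1e-12).mean()

    # Manual backpropagation of the cross-entropy loss.
    g_logits = probs.copy()
    g_logits[np.arange(n), pseudo] -= 1.0
    g_logits /= n
    g_head = feats.T @ g_logits                     # gradient w.r.t. the head
    g_theta = x.T @ (g_logits @ head.T)             # gradient w.r.t. merged weights
    # Chain rule: d loss / d lam[k] = <g_theta, task_vectors[k]>
    g_lam = np.tensordot(task_vectors, g_theta, axes=([1, 2], [0, 1]))
    return lam - lr * g_lam, head - lr * g_head, loss

def run_demo(steps=200, seed=0):
    rng = np.random.default_rng(seed)
    d, h, c, n = 8, 6, 3, 128                       # toy sizes, chosen arbitrarily
    theta_pre = rng.normal(size=(d, h)) / np.sqrt(d)
    task_vectors = 0.1 * rng.normal(size=(2, d, h))
    expert_head = rng.normal(size=(h, c)) / np.sqrt(h)
    x = rng.normal(size=(n, d))                     # unlabeled inputs
    # The expert for this task is the pre-trained encoder plus its own task vector.
    expert_logits = x @ (theta_pre + task_vectors[0]) @ expert_head
    lam = np.full(2, 0.3)                           # initial merging coefficients
    head = expert_head.copy()                       # start from the expert's head
    losses = []
    for _ in range(steps):
        lam, head, loss = self_label_step(
            x, expert_logits, theta_pre, task_vectors, lam, head)
        losses.append(loss)
    return lam, head, losses
```

Calling `run_demo()` returns the adapted coefficients, the adapted head, and the loss history. The point of the sketch is only that both the coefficients and one layer receive gradients from the same expert-derived objective, with no ground-truth labels involved.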

Paper Structure (26 sections, 2 theorems, 3 equations, 24 figures, 14 tables)


Key Result

Proposition 1

Assume cross-task linearity [CTL_ICML2024], so that $f(x;\frac{1}{2}(\theta_i + \theta_j)) \approx \frac{1}{2}f(x;\theta_i)+\frac{1}{2}f(x;\theta_j)$, and suppose the loss function is convex in its output. Then the merged model $f_{\text{merge}}(x)=f(x;\frac{1}{2}(\theta_i+\theta_j))$ has an expected loss bel
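
As a sketch of how the two stated assumptions combine, and not a reproduction of the paper's proof, write the loss as $\ell(\cdot, y)$ for a target $y$ (both symbols are introduced here for illustration). Cross-task linearity replaces the merged output with the average of the two expert outputs, and convexity then gives Jensen's inequality:

$$
\ell\big(f(x;\tfrac{1}{2}(\theta_i+\theta_j)),\, y\big)
\;\approx\;
\ell\big(\tfrac{1}{2}f(x;\theta_i)+\tfrac{1}{2}f(x;\theta_j),\, y\big)
\;\le\;
\tfrac{1}{2}\,\ell\big(f(x;\theta_i),\, y\big)+\tfrac{1}{2}\,\ell\big(f(x;\theta_j),\, y\big).
$$

Taking expectations over $(x, y)$ preserves the inequality, so under these assumptions the expected loss of the averaged model is, up to the linearity approximation, at most the mean of the two individual expected losses.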

Figures (24)

  • Figure 1: Training-free methods collapse under corruption. They trail test-time methods even on clean data and degrade far more under corruption.
  • Figure 2: Cross-task vs. merge performance. A positive correlation is observed across 20 vision tasks, shown with a regression fit and 95% confidence interval.
  • Figure 3: Two-stage pilot study protocol and its results on 8 cross-tasks using ViT-B/32. (a) We first enhance a classifier's functional alignment by training it on representations from a general-purpose merged encoder. We then measure this enhancement by evaluating the trained classifier's cross-task performance when paired with the encoder of a different, individual task. (b) The heatmap shows the accuracy gain (%p) over the baseline across 8 tasks (x-axis) under various merged encoder configurations (y-axis, via merging coefficient). The consistently positive gains (red) demonstrate the protocol's effectiveness in enhancing functional alignment.
  • Figure 4: Spearman correlation of proxy losses with the ground-truth cross-entropy loss. We compare correlation coefficients for Entropy and Ours using merged weights before and after training. A coefficient closer to +1 indicates a more reliable proxy for the true objective.
  • Figure 5: Cross-task transferability check. Classifiers trained with our method replace the original zero-shot classifiers (in a pre-trained model) connected to merged encoders (Task Arithmetic and AdaMerging) for evaluating different tasks. This replacement yields substantial performance gains on both merged and cross-task evaluations without training on target tasks, demonstrating the high transferability and improved functional alignment.
  • ...and 19 more figures
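
The Spearman coefficient used in Figure 4 to judge a proxy loss is the Pearson correlation of the ranks of two sequences. The sketch below is a generic pure-Python version, not the paper's evaluation code; the function names are hypothetical. A proxy loss that orders samples the same way as the ground-truth loss scores +1 even if the relationship is nonlinear.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation of two equal-length sequences with nonzero variance."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def spearman(proxy, target):
    """Spearman rank correlation between a proxy loss and a target loss."""
    return pearson(average_ranks(proxy), average_ranks(target))
```

For example, `spearman(proxy_losses, true_losses)` could be evaluated per batch to check whether a label-free objective tracks the supervised one, where both argument names are placeholders for lists of per-sample losses.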

Theorems & Definitions (3)

  • Proposition 1
  • Proposition 1
  • proof