Parameter-Efficient Interventions for Enhanced Model Merging

Marcin Osial, Daniel Marczak, Bartosz Zieliński

TL;DR

The paper addresses representation bias in post-hoc multi-task model merging by introducing IntervMerge, which injects task-specific refinements across transformer blocks via lightweight adapters $\Phi_b^t$ and a budget-friendly mini-intervention mechanism. Building on a post-merge baseline (e.g., AdaMerging, Surgery), IntervMerge leverages low-rank projections ($W_1$, $W_2$) with rank $r$ to adjust representations at multiple depths, improving stability and cross-task alignment while reducing parameter overhead. The approach is evaluated on eight diverse image classification datasets using ViT backbones, with analyses of token/block placement, intervention rank, and data availability, showing substantial gains over state-of-the-art methods and demonstrating strong parameter efficiency. The results suggest that distributing small, task-specific edits throughout the network yields robust bias mitigation and practical benefits for real-world continual merging scenarios.
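
To make the low-rank idea concrete, here is a minimal sketch of one intervention applied to a single token representation. It assumes a residual form $h' = h + W_2 (W_1 h)$; the summary names $W_1$, $W_2$ and the rank $r$ but does not give the exact formula, so this form, the helper names (`matvec`, `make_intervention`, `intervene`, `num_params`), the zero initialization of $W_2$, and the width `d = 768` are illustrative assumptions rather than the authors' implementation.

```python
import random

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def make_intervention(d, r, seed=0, scale=0.01):
    """Hypothetical parameters for one task and one block.

    W1 has shape (r, d) and projects down to rank r.
    W2 has shape (d, r) and projects back up to width d.
    W2 starts at zero (an assumption), so the intervention is initially a no-op.
    """
    rng = random.Random(seed)
    W1 = [[rng.gauss(0.0, scale) for _ in range(d)] for _ in range(r)]
    W2 = [[0.0] * r for _ in range(d)]
    return W1, W2

def intervene(h, W1, W2):
    """Assumed residual low-rank update: h' = h + W2 (W1 h)."""
    low = matvec(W1, h)      # r-dimensional bottleneck
    delta = matvec(W2, low)  # back to d dimensions
    return [hi + di for hi, di in zip(h, delta)]

def num_params(d, r):
    """Parameters in one intervention: W1 (r*d) plus W2 (d*r), no bias."""
    return 2 * d * r

d, r = 768, 16  # illustrative width and rank, not values taken from the paper
W1, W2 = make_intervention(d, r)
h = [1.0] * d
h_new = intervene(h, W1, W2)
```

Under these assumptions each intervention costs `2 * d * r` parameters, so the total overhead grows with the number of blocks and tasks that receive one, which is why the rank and the placement across blocks matter for the parameter budget.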

Abstract

Model merging combines knowledge from task-specific models into a unified multi-task model to avoid joint training on all task data. However, current methods face challenges due to representation bias, which can interfere with task performance. As a remedy, we propose IntervMerge, a novel approach to multi-task model merging that effectively mitigates representation bias across the model using task-specific interventions. To further enhance its efficiency, we introduce mini-interventions, which modify only part of the representation, thereby reducing the additional parameters without compromising performance. Experimental results demonstrate that IntervMerge consistently outperforms the state-of-the-art approaches using fewer parameters.

Paper Structure

This paper contains 25 sections, 5 equations, 6 figures, 8 tables.

Figures (6)

  • Figure 1: IntervMerge consistently demonstrates superior performance compared to the state-of-the-art Surgery approach in multi-task model merging. This advantage is particularly evident when utilizing our efficient mini-intervention mechanism, which achieves better results than Surgery while employing three times fewer parameters. It is important to note that IntervMerge may use more additional parameters than Surgery for certain ranks, as it is applied across many network blocks.
  • Figure 2: Various solutions of MTL have different issues. Multiple individually trained models (a) require storing and serving separate weights for each task. Traditional model merging (b) schemes combine multiple individual models into one but often lead to performance degradation. Surgery (c) addresses the problem of representational bias but only on the final layer of the encoder. Our IntervMerge (d) aims to overcome those limitations by applying lightweight interventions across the whole network, mitigating interference between tasks.
  • Figure 3: Illustration of the mini-intervention approach, where a specific part $[j\!:\!p]$ of the representation $z_b$ is modified by intervention $\Phi_b^t$ to produce the updated representation $h'_b$.
  • Figure 4: Representations of various methods obtained for the RESISC45 dataset. The gray lines connect individual samples between Surgery and IntervMerge. Consequently, the representations of IntervMerge are closer to those obtained by the task-specific model.
  • Figure 5: Utilizing a stitched network demonstrates that IntervMerge achieves significantly higher accuracy than the Surgery method, reflecting improved consistency with task-specific representations. We averaged the results across eight datasets.
  • ...and 1 more figures
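
The mini-intervention described in the Figure 3 caption can be sketched as follows: only the slice $[j\!:\!p]$ of the representation is passed through the intervention, and the remaining coordinates are copied unchanged. The residual low-rank form of the update, the function names (`matvec`, `mini_intervene`, `mini_num_params`), and the toy values are assumptions chosen for illustration; the source does not specify them.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def mini_intervene(z, j, p, W1, W2):
    """Apply an assumed low-rank update to z[j:p] only.

    W1 has shape (r, p - j) and W2 has shape (p - j, r).
    Coordinates outside [j:p] are returned untouched.
    """
    part = z[j:p]
    delta = matvec(W2, matvec(W1, part))
    updated = [a + b for a, b in zip(part, delta)]
    return z[:j] + updated + z[p:]

def mini_num_params(j, p, r):
    """Parameter count when only p - j coordinates are modified."""
    return 2 * (p - j) * r

# Toy example: width 6, slice [2:4], rank 1 (all values are made up).
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
W1 = [[1.0, 1.0]]      # shape (1, 2)
W2 = [[0.5], [-0.5]]   # shape (2, 1)
h_prime = mini_intervene(z, 2, 4, W1, W2)
```

In this sketch the parameter count scales with the slice width `p - j` instead of the full representation width, which is the mechanism by which a mini-intervention can reduce the additional parameters.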