Parameter-Efficient Interventions for Enhanced Model Merging
Marcin Osial, Daniel Marczak, Bartosz Zieliński
TL;DR
The paper addresses representation bias in post-hoc multi-task model merging by introducing IntervMerge, which injects task-specific refinements across transformer blocks via lightweight adapters \Phi_b^t and a budget-friendly mini-intervention mechanism. Building on a post-merge baseline (e.g., AdaMerging, Surgery), IntervMerge leverages low-rank projections (W_1, W_2) with rank r to adjust representations at multiple depths, improving stability and cross-task alignment while reducing parameter overhead. The approach is evaluated on eight diverse image classification datasets using ViT backbones, with analyses of token/block placement, intervention rank, and data availability, showing substantial gains over state-of-the-art methods and demonstrating strong parameter efficiency. The results suggest that distributing small, task-specific edits throughout the network yields robust bias mitigation and practical benefits for real-world continual merging scenarios.
Abstract
Model merging combines knowledge from task-specific models into a unified multi-task model to avoid joint training on all task data. However, current methods face challenges due to representation bias, which can degrade task performance. As a remedy, we propose IntervMerge, a novel approach to multi-task model merging that effectively mitigates representation bias across the model using task-specific interventions. To further enhance its efficiency, we introduce mini-interventions, which modify only part of the representation, thereby reducing the additional parameters without compromising performance. Experimental results demonstrate that IntervMerge consistently outperforms state-of-the-art approaches while using fewer parameters.
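
To make the idea of a mini-intervention concrete, the following is a minimal sketch and not the authors' implementation. It assumes the intervention is a purely linear low-rank residual built from W_1 and W_2, and that a mini-intervention edits only the first k coordinates of the hidden vector; the function names, the choice of which coordinates to edit, and the absence of a nonlinearity are all illustrative assumptions.

```python
import numpy as np

def low_rank_intervention(h, W1, W2, k=None):
    """Add a low-rank, task-specific correction to a hidden representation.

    h  : (..., d) hidden representation from the merged model
    W1 : (r, k) down-projection (hypothetical shape convention)
    W2 : (k, r) up-projection
    k  : number of leading coordinates to edit; None edits all d coordinates
    """
    h = np.asarray(h, dtype=float)
    k = h.shape[-1] if k is None else k
    out = h.copy()
    # Assumption: only the first k coordinates are modified;
    # the remaining d - k coordinates pass through unchanged.
    out[..., :k] = h[..., :k] + (h[..., :k] @ W1.T) @ W2.T
    return out

def num_intervention_params(k, r):
    """Parameter count of one intervention: W1 has r*k entries, W2 has k*r."""
    return 2 * k * r
```

Under these assumptions, editing k of d coordinates at rank r costs 2kr parameters instead of 2dr, which illustrates how modifying only part of the representation reduces the added parameters.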
