
Resolving Interference (RI): Disentangling Models for Improved Model Merging

Pratik Ramesh, George Stoica, Arun Iyer, Leshem Choshen, Judy Hoffman

Abstract

Model merging has shown that multitask models can be created by directly combining the parameters of models that are each specialized on a task of interest. However, models trained independently on distinct tasks often interfere with one another, which degrades the merged model's performance. We formally define Cross-Task Interference as the drift in the representation of the merged model relative to its constituent models. Reducing this interference is key to improving merging performance. To this end, we propose Resolving Interference (RI), a lightweight adaptation framework that disentangles expert models so that each is functionally orthogonal to the space of other tasks, thereby reducing cross-task interference. RI uses only unlabeled auxiliary data as input (i.e., no task data is needed), so it can be applied in data-scarce scenarios. RI consistently improves the performance of state-of-the-art merging methods by up to 3.8% and generalization to unseen domains by up to 2.3%. We also find RI to be robust to the source of auxiliary input and significantly less sensitive to the tuning of merging hyperparameters. Our codebase is available at: https://github.com/pramesh39/resolving_interference

Paper Structure (30 sections, 4 equations, 5 figures, 9 tables, 1 algorithm)

Figures (5)

  • Figure 1: Resolving Interference (RI) is a lightweight adaptation strategy that mitigates cross-task interference, enhancing the performance of existing model-merging techniques.
  • Figure 2: Resolving Interference (RI) for the dog classifier involves passing unlabeled auxiliary images through (i) the frozen pretrained backbone $f(\theta_{0})$, (ii) the frozen dog classifier $f(\theta_{0}+\tau_{\text{dog}})$, and (iii) a trainable copy $f(\theta_{0}+\tau_{\text{dog}}^{*})$. A twin distillation loss preserves the output distribution of the dog head $h_{\text{dog}}$ while forcing the outputs of all other heads (e.g., $h_{\text{cat}}$) to match those of the pretrained backbone, producing an adapted task vector $\tau_{\text{dog}}^{*}$. The same adaptation strategy is repeated for all expert models.
  • Figure 3: The Twin-Distillation loss (Eq. \ref{eq:ri_loss}) on auxiliary data drops sharply (left), followed by a similar decline in the cross-task interference of the merged model measured on task-specific data (middle), which leads to significant improvement when using existing merging techniques (right).
  • Figure 4: What makes a good source of auxiliary data for RI? RI benefits from a wide range of auxiliary sources, including synthetic and real data, as long as they exhibit sufficient visual diversity. While gains are observed across all sources, datasets that are closer to the target task distribution yield the largest improvements, with task data (oracle) providing the strongest performance.
  • Figure 5: What makes a good source of auxiliary data for RI?
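
The twin distillation described in the Figure 2 caption can be sketched for a single auxiliary input as follows. This is a minimal illustration, not the authors' implementation: the KL divergence as the distillation measure, the equal weighting of the two terms, and all names (`twin_distillation_loss`, the head dictionaries, the example logits) are assumptions made here; the paper's actual loss is the one referenced in the Figure 3 caption.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one head's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(target: Sequence[float], pred: Sequence[float],
                  eps: float = 1e-12) -> float:
    """KL(target || pred) between two discrete distributions."""
    return sum(t * (math.log(t + eps) - math.log(p + eps))
               for t, p in zip(target, pred))


def twin_distillation_loss(own_task: str,
                           pretrained_logits: Dict[str, Sequence[float]],
                           expert_logits: Dict[str, Sequence[float]],
                           adapted_logits: Dict[str, Sequence[float]]) -> float:
    """Hypothetical twin distillation loss for one auxiliary input.

    Each dict maps a head name (e.g. "dog", "cat") to the logits that head
    produces on top of the corresponding model's features.
    """
    loss = 0.0
    for head, student in adapted_logits.items():
        if head == own_task:
            # Own head: keep the frozen expert's output distribution.
            teacher = expert_logits[head]
        else:
            # Other heads: match the frozen pretrained backbone.
            teacher = pretrained_logits[head]
        # Assumption: both terms are weighted equally.
        loss += kl_divergence(softmax(teacher), softmax(student))
    return loss


# Made-up logits for a dog expert evaluated through two heads.
pretrained = {"dog": [0.1, 0.2, 0.3], "cat": [0.5, 0.1, 0.0]}
expert = {"dog": [2.0, 0.1, -1.0], "cat": [1.5, -0.5, 0.2]}

# A fully adapted model behaves like the expert on its own head and like
# the pretrained backbone on every other head.
ideal = {"dog": expert["dog"], "cat": pretrained["cat"]}

loss_unadapted = twin_distillation_loss("dog", pretrained, expert, expert)
loss_ideal = twin_distillation_loss("dog", pretrained, expert, ideal)
```

In this sketch the unadapted expert incurs a positive loss only through the cat head, where its outputs differ from the pretrained backbone, while the ideal adapted model incurs none. In practice the loss would be averaged over batches of auxiliary inputs and minimized with respect to the trainable copy's parameters.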