
Bridging Domains through Subspace-Aware Model Merging

Levy Chaves, Chao Zhou, Rebekka Burkholz, Eduardo Valle, Sandra Avila

TL;DR

This work proposes SCORE (Subspace COnflict-Resolving mErging), a method that alleviates singular subspace conflicts in model merging and consistently outperforms existing model merging approaches in domain generalization settings across a variety of architectures and model scales.

Abstract

Model merging integrates multiple task-specific models into a single consolidated one. Recent research has improved merging performance in in-distribution and multi-task scenarios, but domain generalization in model merging remains underexplored. We investigate how merging models fine-tuned on distinct domains affects generalization to unseen domains. Analyzing parameter competition in the task matrix with singular value decomposition, we show that merging models trained under different distribution shifts induces stronger conflicts between their subspaces than in traditional multi-task settings. To mitigate this issue, we propose SCORE (Subspace COnflict-Resolving mErging), a method designed to alleviate such singular subspace conflicts. SCORE finds a shared orthogonal basis by computing the principal components of the concatenated leading singular vectors of all models. It then projects each task matrix into the shared basis, pruning off-diagonal components to remove conflicting singular directions. On average, SCORE consistently outperforms existing model merging approaches in domain generalization settings across a variety of architectures and model scales, demonstrating its effectiveness and scalability.
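The merging procedure described above can be sketched for a single weight matrix as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the task matrices are assumed to be weight differences (fine-tuned minus pre-trained), the number of leading singular vectors `k` is a hypothetical hyperparameter, the "principal components" are taken as the left singular vectors of the uncentered concatenation, and the pruned projections are simply summed. The name `score_merge` is illustrative only.

```python
import numpy as np


def score_merge(deltas, k):
    """Sketch of subspace-conflict-resolving merging for one weight matrix.

    Assumptions (not taken from the paper): `deltas` is a list of (m, n)
    task matrices (fine-tuned minus pre-trained weights), `k` is the number
    of leading singular vectors kept per task, and the pruned projections
    are summed without any scaling coefficient.
    """
    us, vs = [], []
    for delta in deltas:
        u, _, vt = np.linalg.svd(delta, full_matrices=False)
        us.append(u[:, :k])      # leading left singular vectors, (m, k)
        vs.append(vt[:k, :].T)   # leading right singular vectors, (n, k)

    # Concatenate the leading singular vectors of all tasks.
    u_cat = np.concatenate(us, axis=1)  # (m, T*k)
    v_cat = np.concatenate(vs, axis=1)  # (n, T*k)

    # Shared orthogonal bases U_perp, V_perp: principal directions of the
    # concatenations (assumed here: SVD without mean-centering).
    u_perp = np.linalg.svd(u_cat, full_matrices=False)[0]
    v_perp = np.linalg.svd(v_cat, full_matrices=False)[0]

    merged_core = np.zeros((u_perp.shape[1], v_perp.shape[1]))
    keep_diag = np.eye(u_perp.shape[1], v_perp.shape[1], dtype=bool)
    for delta in deltas:
        core = u_perp.T @ delta @ v_perp               # project into shared basis
        merged_core += np.where(keep_diag, core, 0.0)  # prune off-diagonal terms

    # Map the pruned, summed core back to the original weight space.
    return u_perp @ merged_core @ v_perp.T
```

In this sketch the merged update would then be added to the pre-trained weights, possibly with a scaling coefficient; neither that coefficient nor the choice between summing and averaging is specified by the text above. Because the output depends on each shared basis vector only through its outer product, sign flips returned by the SVD do not change the result.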

Paper Structure

This paper contains 27 sections, 8 equations, 28 figures, 4 tables, 1 algorithm.

Figures (28)

  • Figure 1: Evaluating model merging for domain generalization. We merge models fine-tuned on several domains and evaluate the merged model on an unseen, held-out domain. We repeat this leave-one-out process for each available domain.
  • Figure 2: Pairwise similarity of fine-tuned models measured by the Subspace Alignment Ratio (SAR). (a) Models fine-tuned on the 8 datasets from Ilharco et al. (2023). (b) Models fine-tuned on the 6 domains of DomainNet (Peng et al., 2019). Multi-domain models are much more similar to each other than multi-task models are, and the subspaces they occupy overlap far more, creating opportunities for conflicts.
  • Figure 3: $\Sigma_{score}$ (see Algorithm 1) computed over 6 DomainNet domains across different attention layers of a ViT-B-32. Left: high agreement/low conflict, with a strong main diagonal and a clean off-diagonal. Middle: high agreement/high conflict, with both the main diagonal and the off-diagonal blocks significantly occupied. Right: low agreement/high conflict, with off-diagonal elements dominating and the main diagonal nearly absent, indicating that the shared basis ($U_{\perp}$ and $V_{\perp}$) fails to capture consistent singular directions across domains and that the domains compete intensely for subspaces in the shared representation.
  • Figure 4: Pairwise $\theta_{avg}$ (in radians). Left: models fine-tuned on 8 MTL datasets. Right: models fine-tuned on 6 domains of FedISIC.
  • Figure 5: Per-domain results for ViT-B-32 on the PACS dataset for each model merging method in our study.
  • ...and 23 more figures
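The leave-one-domain-out protocol of Figure 1 can be written as a short loop. This is a generic Python sketch, not the paper's code: `merge` and `evaluate` are hypothetical callables standing in for any merging method (SCORE or a baseline) and any evaluation routine, and `leave_one_domain_out` is an illustrative name.

```python
from typing import Callable, Dict, List, TypeVar

Model = TypeVar("Model")


def leave_one_domain_out(
    finetuned: Dict[str, Model],
    merge: Callable[[List[Model]], Model],
    evaluate: Callable[[Model, str], float],
) -> Dict[str, float]:
    """Merge the models fine-tuned on all but one domain and score the
    merged model on the held-out domain, once per available domain."""
    results: Dict[str, float] = {}
    for held_out in finetuned:
        # Source models: every domain except the one held out for testing.
        sources = [model for domain, model in finetuned.items() if domain != held_out]
        results[held_out] = evaluate(merge(sources), held_out)
    return results
```

For example, with scalar stand-ins for models, averaging as the merge, and an evaluator that returns the merged value, `leave_one_domain_out({"a": 1.0, "b": 2.0, "c": 6.0}, lambda ms: sum(ms) / len(ms), lambda m, d: m)` returns `{'a': 4.0, 'b': 3.5, 'c': 1.5}`. The per-domain scores can then be averaged into a single summary number.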
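Figure 4 reports a pairwise average angle $\theta_{avg}$ between fine-tuned models, but the caption does not define it. The sketch below assumes one plausible reading: the mean principal angle, in radians, between the top-`k` left singular subspaces of two task matrices. The function name `mean_principal_angle` and the parameter `k` are hypothetical.

```python
import numpy as np


def mean_principal_angle(delta_a, delta_b, k):
    """Assumed definition of theta_avg: mean principal angle (radians)
    between the top-k left singular subspaces of two task matrices."""
    ua = np.linalg.svd(delta_a, full_matrices=False)[0][:, :k]
    ub = np.linalg.svd(delta_b, full_matrices=False)[0][:, :k]
    # Cosines of the principal angles are the singular values of Ua^T Ub.
    cosines = np.linalg.svd(ua.T @ ub, compute_uv=False)
    # Clip to guard against round-off pushing a cosine slightly above 1.
    return float(np.mean(np.arccos(np.clip(cosines, -1.0, 1.0))))
```

Under this reading, a value near 0 means two models occupy nearly the same subspace, while $\pi/2$ means their subspaces are orthogonal. For very small angles, `scipy.linalg.subspace_angles` is numerically more accurate than taking `arccos` of the cosines.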