Subspace-Boosted Model Merging

Ronald Skorobogat, Karsten Roth, Mariana-Iuliana Georgescu

TL;DR

The paper identifies rank collapse as a fundamental limitation of Task Arithmetic-based model merging: as more experts are merged, common information increasingly dominates task-specific signals. It introduces Subspace Boosting, which recovers suppressed task directions by raising underutilized singular values in the task-vector space, improving merging performance across vision and language benchmarks (often by more than 10%) and across multiple merging methods. For interpretability, it uses Higher-Order GSVD (HO-GSVD) to project task vectors into a shared subspace, which allows experts to be compared directly via Alignment Matrices and supports principled expert selection. Together, Subspace Boosting and HO-GSVD provide both practical performance gains and a transparent framework for understanding and choosing among merged experts.

Abstract

Model merging enables the combination of multiple specialized expert models into a single model capable of performing multiple tasks. However, merging an increasing number of specialized experts generally yields diminishing returns and smaller overall performance gains. In this work, we empirically and theoretically analyze this limitation, proving that for Task Arithmetic-based methods, as more experts are merged, the common information dominates the task-specific information, leading to inevitable rank collapse. To mitigate this issue, we introduce Subspace Boosting, which operates on the singular value decomposition of the task vector space and maintains task vector ranks. Subspace Boosting raises merging efficacy for up to 20 experts by large margins of more than 10% when evaluated on both vision and language benchmarks. Moreover, we propose employing Higher-Order Generalized Singular Value Decomposition to quantify task similarity, offering a new interpretable perspective on model merging. Code and models are available at https://github.com/ronskoro/Subspace-Boosting.
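
As a rough illustration of the idea described in the abstract, the following Python sketch merges task vectors with Task Arithmetic and then raises the small singular values of the merged update. The clamping rule, the `beta` threshold, the scaling coefficient `alpha`, and the function names are assumptions made for illustration only; the released code at the URL above is the authoritative implementation.

```python
import numpy as np


def task_arithmetic(task_vectors, alpha=1.0):
    """Sum per-task weight deltas (expert minus base) and scale by alpha."""
    return alpha * np.sum(task_vectors, axis=0)


def subspace_boost(delta, beta=0.5):
    """Raise small singular values of `delta` to a floor (assumed rule).

    The floor is the singular value at which the normalized cumulative
    sum of singular values first reaches `beta`. Assumes `delta` is nonzero.
    """
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    cumulative = np.cumsum(s) / np.sum(s)
    # First index whose cumulative mass is >= beta, kept inside the array.
    cutoff = min(int(np.searchsorted(cumulative, beta)), len(s) - 1)
    boosted = np.maximum(s, s[cutoff])  # values above the floor are unchanged
    return (u * boosted) @ vt


# Toy data (hypothetical): eight task vectors sharing a common component.
rng = np.random.default_rng(0)
common = rng.standard_normal((16, 16))
task_vectors = [common + 0.5 * rng.standard_normal((16, 16)) for _ in range(8)]

merged = task_arithmetic(task_vectors, alpha=1.0 / len(task_vectors))
boosted = subspace_boost(merged, beta=0.5)
```

Because only singular values below the floor are changed, the largest singular value is preserved while the tail of the spectrum is lifted, which is one simple way to keep the merged update from concentrating in a few directions.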

Paper Structure

This paper contains 32 sections, 7 theorems, 35 equations, 14 figures, 10 tables, 2 algorithms.

Key Result

Proposition 1

Let ${\Delta}_m$ be the average of $N$ task vectors, decomposed into common and task-specific (noise) components. As $N \to \infty$: The singular values of the task-specific subspace decay at a rate of $\mathcal{O}(1/\sqrt{N})$ compared to the singular values of the common subspace, which remain asy
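
The decay rate stated in Proposition 1 can be checked numerically on a toy model. The sketch below assumes each task vector is a fixed common matrix plus independent Gaussian noise; this noise model, the matrix size, and the function name `averaged_noise_ratio` are assumptions for illustration and are not taken from the paper.

```python
import numpy as np


def averaged_noise_ratio(num_experts, dim=64, seed=0):
    """Top singular value of the averaged noise over that of the common part."""
    rng = np.random.default_rng(seed)
    common = rng.standard_normal((dim, dim))              # shared by all experts
    noise = rng.standard_normal((num_experts, dim, dim))  # task-specific parts
    averaged_noise = noise.mean(axis=0)
    top_noise = np.linalg.norm(averaged_noise, ord=2)     # spectral norm
    top_common = np.linalg.norm(common, ord=2)
    return top_noise / top_common


for n in (4, 16, 64):
    ratio = averaged_noise_ratio(n)
    # If the ratio decays like 1/sqrt(N), ratio * sqrt(N) stays roughly flat.
    print(n, ratio, ratio * np.sqrt(n))
```

Under this toy model, the common component is untouched by averaging, while the averaged noise shrinks with the number of experts, so the rescaled ratio in the last column should stay close to constant.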

Figures (14)

  • Figure 1: Overview of our contributions. (a) Popular merging methods such as Task Arithmetic (TA) [ilharco2023task], TIES [Yadav-NIPS-2023], and Consensus Merging (CSM) [wang2024localizing] suffer from rank collapse, which correlates with low performance. (b) To prevent rank collapse, we introduce Subspace Boosting, which boosts neglected singular values and vastly improves performance. (c) Finally, for interpretability, we use HO-GSVD to transform individual models so that they share the same subspace, enabling direct comparison.
  • Figure 2: Stable rank in merged ViT-B/16 models. (a-c) The stable rank is decomposed across various attention and MLP sublayers of three layer blocks. (d) As more models are merged, the stable rank decreases across a majority of layers, strongly correlating with the performance.
  • Figure 3: Evolution of the Singular Value Distribution. As more experts are merged, higher absolute and relative mass is placed on fewer singular vectors, driving rank collapse. This indicates that information becomes concentrated in fewer dominant dimensions.
  • Figure 4: Higher-Order Generalized SVD (HO-GSVD). Unlike standard Singular Value Decomposition (SVD), which decomposes each matrix individually as $A_i = U_i\Sigma_iV_i^\top$, HO-GSVD decomposes all matrices with a shared right singular subspace $V$.
  • Figure 5: (a) Distribution of Generalized Singular Values across different task vectors and a merged reference of eight task vectors. (b) Model Alignment via HO-GSVD, showcasing how HO-GSVD can be used to contrast expert alignments across a shared decomposition space.
  • ...and 9 more figures
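
To make the shared decomposition in Figure 4 concrete, here is a minimal Python sketch of the classical HO-GSVD construction for matrices with full column rank. Whether the paper uses this exact variant is an assumption, and the function name `ho_gsvd` and the toy matrices are hypothetical.

```python
import numpy as np


def ho_gsvd(matrices):
    """Factor each D_i as U_i @ diag(sigma_i) @ V.T with one shared V.

    Assumes every D_i has the same number of columns and full column rank.
    """
    n = len(matrices)
    grams = [d.T @ d for d in matrices]
    inverses = [np.linalg.inv(a) for a in grams]
    dim = grams[0].shape[0]

    # Average of pairwise quotients A_i A_j^{-1} + A_j A_i^{-1} over i < j.
    s = np.zeros((dim, dim))
    for i in range(n):
        for j in range(i + 1, n):
            s += grams[i] @ inverses[j] + grams[j] @ inverses[i]
    s /= n * (n - 1)

    # Eigenvectors of S form the shared (generally non-orthogonal) basis V.
    _, v = np.linalg.eig(s)
    v = np.real(v)  # eigenvalues are real in theory; drop numerical residue

    us, sigmas = [], []
    for d in matrices:
        b = np.linalg.solve(v, d.T).T      # B_i = D_i V^{-T}
        sigma = np.linalg.norm(b, axis=0)  # generalized singular values
        us.append(b / sigma)               # unit-norm columns of U_i
        sigmas.append(sigma)
    return us, sigmas, v


rng = np.random.default_rng(0)
toy_matrices = [rng.standard_normal((6, 4)) for _ in range(3)]
us, sigmas, v = ho_gsvd(toy_matrices)
```

Comparing `sigmas[i][k]` across matrices indicates how strongly each matrix uses the shared direction `v[:, k]`, which is the kind of cross-expert comparison the figure captions describe.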

Theorems & Definitions (11)

  • Proposition 1: Singular Value Decay of Averaged Task-Specific Information
  • Proposition 2: Asymptotic Stable Rank Collapse
  • Proposition 3: Inherent Limitation of Task Arithmetic-Based Merging
  • Proposition 1: Formal Statement: Singular Value Decay of Averaged Task-Specific Information
  • proof
  • Proposition 2: Formal Statement: Asymptotic Stable Rank Collapse
  • proof
  • Proposition 3: Formal Statement: Inherent Limitation of Task Arithmetic-Based Merging
  • proof
  • Proposition 4: Formal Statement: Signal-to-Noise Trade-off in Subspace Boosting
  • ...and 1 more
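
The stable rank referred to in Proposition 2 and Figure 2 is the squared Frobenius norm divided by the squared spectral norm. The sketch below computes it and shows, on a toy model, how averaging more task vectors lowers it. The rank-one common component, the Gaussian noise, and the `strength` parameter are assumptions for illustration, not the paper's experimental setup.

```python
import numpy as np


def stable_rank(matrix):
    """Squared Frobenius norm divided by squared spectral norm."""
    s = np.linalg.svd(matrix, compute_uv=False)
    return float(np.sum(s ** 2) / s[0] ** 2)


def averaged_task_vector(num_experts, dim=32, strength=10.0, seed=0):
    """Average of task vectors: rank-one common part plus Gaussian noise."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(dim)
    v = rng.standard_normal(dim)
    common = strength * np.outer(u / np.linalg.norm(u), v / np.linalg.norm(v))
    noise = rng.standard_normal((num_experts, dim, dim))
    # The common part is identical in every task vector, so it survives averaging.
    return common + noise.mean(axis=0)


for n in (2, 8, 64):
    print(n, stable_rank(averaged_task_vector(n)))
```

In this toy setting the noise averages out while the common direction does not, so the spectrum of the merged update concentrates on a single singular value and the stable rank falls toward 1.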