When Shared Knowledge Hurts: Spectral Over-Accumulation in Model Merging

Yayuan Li; Ze Peng; Jian Zhang; Jintao Guo; Yue Duan; Yinghuan Shi

TL;DR

The paper investigates spectral over-counting in model merging, where cross-task alignment inflates a few top singular values and overemphasizes shared directions. It introduces Singular Value Calibration (SVC), a training-free, data-free post-processing method that uses a merged column-space basis to quantify subspace overlap and recalibrate singular values without changing directions. SVC computes subspace-wise overlap via projections, derives calibration strengths, and reconstructs a balanced spectrum, yielding consistent gains across vision and language benchmarks and even improving Task Arithmetic by $13.0\%$. The approach is efficient, scalable, and complementary to existing spectral baselines, offering practical improvements for multi-task merging without data or gradient optimization. Overall, SVC provides a principled mechanism to mitigate the detrimental impact of shared knowledge in merged models, enabling more robust and transferable multi-task capabilities.
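The over-counting effect can be reproduced on a toy example. The sketch below is an illustration only and is not taken from the paper: it assumes hypothetical task updates that each contain the same shared rank-1 component plus one orthogonal task-specific rank-1 component. The helper names `make_toy_updates` and `merged_spectrum`, and all scales and sizes, are invented for this example.

```python
import numpy as np

def make_toy_updates(num_tasks=4, dim=16, shared_scale=1.0, specific_scale=1.0, seed=0):
    """Build hypothetical task updates: one shared direction + one private direction each."""
    rng = np.random.default_rng(seed)
    # Orthonormal left/right directions: column 0 is shared, columns 1..K are task-specific.
    left, _ = np.linalg.qr(rng.standard_normal((dim, num_tasks + 1)))
    right, _ = np.linalg.qr(rng.standard_normal((dim, num_tasks + 1)))
    shared = shared_scale * np.outer(left[:, 0], right[:, 0])
    updates = []
    for k in range(num_tasks):
        specific = specific_scale * np.outer(left[:, k + 1], right[:, k + 1])
        updates.append(shared + specific)
    return updates

def merged_spectrum(task_updates):
    """Singular values of the plain sum of task updates (weight-space addition)."""
    merged = np.sum(task_updates, axis=0)
    return np.linalg.svd(merged, compute_uv=False)

updates = make_toy_updates()
single_spectrum = np.linalg.svd(updates[0], compute_uv=False)
spectrum = merged_spectrum(updates)

# Each individual update has two singular values equal to 1.
# After summation the shared direction carries num_tasks * shared_scale (4 here),
# while every task-specific direction keeps its original value of 1.
print("single task:", np.round(single_spectrum[:2], 3))
print("merged     :", np.round(spectrum[:5], 3))
```

In this toy setting the shared direction is counted once per task, so its singular value grows linearly with the number of merged tasks while the task-specific values do not. Real task updates overlap only partially, so the inflation there is less clean than in this construction.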

Abstract

Model merging combines multiple fine-tuned models into a single model by adding their weight updates, providing a lightweight alternative to retraining. Existing methods primarily target resolving conflicts between task updates, leaving the failure mode of over-counting shared knowledge unaddressed. We show that when tasks share aligned spectral directions (i.e., overlapping singular vectors), a simple linear combination repeatedly accumulates these directions, inflating the singular values and biasing the merged model toward shared subspaces. To mitigate this issue, we propose Singular Value Calibration (SVC), a training-free and data-free post-processing method that quantifies subspace overlap and rescales inflated singular values to restore a balanced spectrum. Across vision and language benchmarks, SVC consistently improves strong merging baselines and achieves state-of-the-art performance. Furthermore, by modifying only the singular values, SVC improves the performance of Task Arithmetic by 13.0%. Code is available at: https://github.com/lyymuwu/SVC.
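The three steps described above (merged column-space basis, subspace-wise overlap from projections, singular-value rescaling) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation; the official code is at the URL above. In particular, the overlap ratio used here (task projection energy divided by its inner product with the merged projection), the restriction of strengths to at most 1, and the function name `svc_sketch` are assumptions made for this example and may differ from the paper's equations.

```python
import numpy as np

def svc_sketch(task_updates, eps=1e-12):
    """Rescale singular values of the summed update; singular vectors are left unchanged."""
    merged = np.sum(task_updates, axis=0)
    # Step 1: merged column-space basis from the SVD of the merged update.
    u, sigma, vt = np.linalg.svd(merged, full_matrices=False)
    merged_proj = sigma[:, None] * vt          # row r equals u_r^T @ merged

    gamma_sum = np.zeros_like(sigma)
    gamma_count = np.zeros_like(sigma)
    for delta in task_updates:
        # Step 2: project each task update onto the merged basis, one row per subspace r.
        task_proj = u.T @ delta
        self_energy = np.sum(task_proj ** 2, axis=1)
        overlap = np.sum(task_proj * merged_proj, axis=1)
        # ASSUMPTION: a task contributes to subspace r only if it has energy there.
        valid = (self_energy > eps) & (overlap > eps)
        gamma = np.where(valid, self_energy / np.where(valid, overlap, 1.0), 0.0)
        # ASSUMPTION: only shrink inflated values, never boost (strength <= 1).
        gamma_sum += np.minimum(gamma, 1.0) * valid
        gamma_count += valid

    # Step 3: average strengths across tasks per subspace, then rebuild the update.
    strength = np.where(gamma_count > 0, gamma_sum / np.maximum(gamma_count, 1), 1.0)
    sigma_star = strength * sigma
    calibrated = (u * sigma_star) @ vt
    return calibrated, sigma, sigma_star

# Hypothetical toy data: three tasks share one direction (scale 2.0) and each
# has one private direction (scales 1.5, 1.0, 0.5).
rng = np.random.default_rng(0)
left, _ = np.linalg.qr(rng.standard_normal((12, 4)))
right, _ = np.linalg.qr(rng.standard_normal((12, 4)))
shared = 2.0 * np.outer(left[:, 0], right[:, 0])
toy_updates = [shared + s * np.outer(left[:, k + 1], right[:, k + 1])
               for k, s in enumerate([1.5, 1.0, 0.5])]

calibrated, sigma, sigma_star = svc_sketch(toy_updates)
print("original  :", np.round(sigma[:4], 3))
print("calibrated:", np.round(sigma_star[:4], 3))
```

On this toy input the shared direction's singular value is reduced from three times the per-task scale back to the per-task scale, while the task-specific values are unchanged. No data or gradients are used; only the task updates themselves enter the computation.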

Paper Structure (39 sections, 2 theorems, 24 equations, 6 figures, 13 tables, 1 algorithm)

Introduction
Related Work
Dynamic Model Merging.
Static Model Merging.
Spectral View of Inter-Model Interference
Preliminary
Projections Reveal Spectral Over-Counting
Projecting tasks onto shared space.
Interference as directional projection mismatch.
From interference to singular-value inflation.
Generality of Singular-Value Over-Accumulation
Methodology
Step 1: Merged column-space basis.
Step 2: Subspace-wise overlap from projections.
Step 3: Singular-value calibration and reconstruction.
...and 24 more sections

Key Result

Lemma 3.2

Assume $\Delta \mathbf{W}_{\mathrm{merge}}=\sum_{k=1}^{K}\Delta \mathbf{W}_k$. Fix any task $i$ and subspace $r$, and assume $\|\bm{a}_i^{r}\|_{2}^{2}>0$. Then

Figures (6)

Figure 1: Shared knowledge accumulation in model merging. When merging task matrices ($\Delta \mathbf{W}_i$) from multiple tasks, shared knowledge that aligns across tasks can be over-counted, resulting in singular-value inflation in the merged model's spectrum. This inflation is concentrated in a few top spectral subspaces, causing the merged model to be dominated by shared directions, while task-specific components in the remaining subspaces are suppressed.
Figure 2: Discrepancy between original and calibrated singular values. For weight-space addition, we compare the original singular values $\sigma$ from $\mathrm{SVD}(\Delta \mathbf{W}_{\mathrm{merge}})$ with the calibrated values $\sigma^{\star}$, where $\sigma^{\star}$ is obtained by first computing the task-wise optimal scalings $(\gamma_{i}^{r})^\star$ from the inflated singular-value equation and then averaging them across tasks within each subspace. A clear gap $\Delta=\sigma-\sigma^{\star}$ appears in top spectral subspaces, indicating systematic spectral over-counting and singular-value inflation.
Figure 3: Cross terms concentrate in top spectral subspaces. We visualize $\langle \bm{a}_i^{r},\bm{a}_j^{r}\rangle$ across tasks for small $r$, showing predominantly positive overlap that induces over-counting.
Figure 4: Generality of the singular-value gap. We compare the original singular values $\sigma$ with the calibrated values $\sigma^{\star}$, where $\sigma^{\star}$ applies the subspace-wise average of the task-wise optimal scalings $(\gamma_{i}^{r})^\star$ from the inflated singular-value equation. The gap between $\sigma$ and $\sigma^{\star}$ persists across representative merging methods, indicating that singular-value inflation and overly small singular values coexist.
Figure 5: Effect of hyperparameter $\alpha$ in SVC. When only suppressing over-counting ($\alpha=1$), SVC yields a stable improvement. In contrast, additionally boosting singular values ($\alpha\in(0,1)$) requires caution and can degrade performance as $\alpha$ decreases.
...and 1 more figure

Theorems & Definitions (3)

Remark 3.1: Layer-wise linear view
Lemma 3.2: Cross-term form of projection interference
Theorem 3.3: Projection-optimal calibration and singular-value inflation
