Decom-Renorm-Merge: Model Merging on the Right Space Improves Multitasking
Yuatyong Chaichana, Thanapat Trachu, Peerat Limkonchotiwat, Konpat Preechakul, Tirasan Khandhawit, Ekapol Chuangsuwanich
TL;DR
Finetuned models diverge in how their weights are parameterized, which makes straightforward entry-wise merging unreliable. DRM addresses this with a four-step pipeline: it applies singular value decomposition (SVD) to the concatenated weight deltas to obtain a shared basis, renormalizes each task's components, prunes them, and merges them via sign election and disjoint averaging. The merged delta is then mapped back to the original parameter space. Across vision and language models, including ViT, DeBERTa, T5, and Llama3.1-8B, DRM outperforms several state-of-the-art merging techniques in both full finetuning and LoRA setups, with renormalization identified as the key factor in building a stable joint space. The result is a practical way to build multitask models by fusing existing finetuned checkpoints rather than retraining from scratch.
Abstract
In the era of large-scale training, model merging has evolved into a tool for creating multitasking models efficiently. It fuses the knowledge of multiple models without the heavy computation that traditional multitask learning requires. Existing merging methods often assume that entries at identical positions in weight matrices serve the same function, enabling straightforward entry-wise comparison and merging. However, this assumption overlooks the complexity of finetuned neural networks, where neurons may develop distinct feature compositions, making direct entry-wise merging problematic. We present Decom-Renorm-Merge (DRM), a simple yet effective approach that leverages Singular Value Decomposition to decompose and coordinate weight matrices into an aligned joint space, where entry-wise merging becomes possible. We showcase the effectiveness of DRM across various settings, ranging from smaller encoder-based models such as ViT and DeBERTa, to encoder-decoder models such as T5, to larger decoder-based models such as Llama3.1-8B. Our experimental results show that DRM outperforms several state-of-the-art merging techniques across full finetuning and low-rank adaptation settings. Moreover, our analysis reveals renormalization as the crucial component for creating a robust and even joint space for merging, significantly contributing to the method's performance.
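To make the pipeline summarized above concrete, the following is a minimal NumPy sketch for a single weight matrix, not the authors' implementation. Several details are assumptions made for illustration: the function names `prune_top_k` and `drm_merge_sketch`, the `keep_ratio` parameter, horizontal concatenation of the deltas, pruning applied to the renormalized components, and averaging the per-task scales before reconstruction.

```python
import numpy as np


def prune_top_k(x, keep_ratio):
    """Zero all but the largest-magnitude entries of x (ties may keep a few more)."""
    k = max(1, int(round(keep_ratio * x.size)))
    if k >= x.size:
        return x.copy()
    # k-th largest absolute value acts as the keep threshold
    thresh = np.partition(np.abs(x).ravel(), -k)[-k]
    return np.where(np.abs(x) >= thresh, x, 0.0)


def drm_merge_sketch(deltas, keep_ratio=0.2, eps=1e-12):
    """Merge task deltas (finetuned minus pretrained weights), each of shape (m, n).

    Hypothetical sketch: the scale handling and pruning target are assumptions.
    """
    num_tasks = len(deltas)

    # 1. Decompose: SVD of the concatenated deltas gives a basis U shared by all tasks.
    concat = np.concatenate(deltas, axis=1)                 # (m, T*n)
    U, S, Vt = np.linalg.svd(concat, full_matrices=False)   # U: (m, r), Vt: (r, T*n)
    blocks = np.split(Vt, num_tasks, axis=1)                # T blocks of shape (r, n)

    # 2. Renormalize: give every row of each task block unit norm and
    #    move the removed norm into a per-task scale.
    units, scales = [], []
    for block in blocks:
        norms = np.linalg.norm(block, axis=1)               # (r,)
        units.append(block / np.maximum(norms, eps)[:, None])
        scales.append(S * norms)

    # 3. Prune: keep only the largest-magnitude entries of each task block.
    stacked = np.stack([prune_top_k(u, keep_ratio) for u in units])  # (T, r, n)

    # 4. Merge: elect a sign per entry, then average only the entries that agree.
    elected = np.sign(stacked.sum(axis=0))
    agree = (np.sign(stacked) == elected) & (stacked != 0)
    merged_unit = (stacked * agree).sum(axis=0) / np.maximum(agree.sum(axis=0), 1)

    # Assumption: the merged scale is the mean of the per-task scales.
    merged_scale = np.mean(scales, axis=0)

    # Map back to the original parameter space: U @ diag(scale) @ merged_unit.
    return (U * merged_scale) @ merged_unit
```

Under these assumptions, the returned matrix has the same shape as each input delta and would be added to the pretrained weight to form the merged layer. With `keep_ratio=1.0`, merging a delta with an identical copy of itself returns that delta, which is a useful sanity check on the renormalization step.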
