Merging without Forgetting: Continual Fusion of Task-Specific Models via Optimal Transport
Zecheng Pan, Zhikang Chen, Ding Li, Min Zhang, Sen Cui, Hongshuo Jin, Luqi Tao, Yi Yang, Deheng Ye, Yu Zhang, Tingting Zhu, Tianling Ren
TL;DR
This work tackles the problem of merging fine-tuned task-specific models without accessing historical data or retraining. It introduces Optimal Transport-based Masked Fusion (OTMF), which uses learnable masks guided by Sinkhorn-distance distribution alignment to fuse task vectors while preserving the semantic geometry of each task. The method enables continual fusion with constant memory by reusing the merged model and only updating lightweight masks and, optionally, a classification head, yielding strong accuracy and robustness across vision and language benchmarks. Empirically, OTMF achieves state-of-the-art performance in both accuracy and efficiency, demonstrating practical value for scalable, replay-free multi-task deployment in privacy- or resource-constrained settings.
Abstract
Merging models fine-tuned for different tasks into a single unified model has become an increasingly important direction for building versatile, efficient multi-task systems. Existing approaches predominantly rely on parameter interpolation in weight space, which we show introduces significant distribution shift in the feature space and undermines task-specific knowledge. In this paper, we propose OTMF (Optimal Transport-based Masked Fusion), a novel model-merging framework rooted in optimal transport theory that addresses this distribution shift. Instead of directly aggregating features or weights, OTMF aligns the semantic geometry of task-specific models by discovering common masks applied to task vectors through optimal transport plans. These masks selectively extract transferable, task-agnostic components while preserving the unique structural identity of each task. To ensure scalability in real-world settings, OTMF further supports a continual fusion paradigm that incrementally integrates each new task vector without revisiting previous ones, maintaining a bounded memory footprint and enabling efficient fusion across a growing number of tasks. We conduct comprehensive experiments on multiple vision and language benchmarks, and the results show that OTMF achieves state-of-the-art performance in both accuracy and efficiency. These findings highlight the practical and theoretical value of our approach to model merging.
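To make the two ingredients named above more concrete, the following is a minimal, dependency-free sketch and not the authors' implementation. It assumes uniform weights over features, a squared Euclidean ground cost, and a log-domain Sinkhorn iteration; the names `sinkhorn_distance` and `masked_merge`, the parameters `eps` and `n_iters`, and the additive update rule are all hypothetical choices made for illustration. In a setup like the one described, a mask would presumably be tuned so that features of the merged model stay close, in Sinkhorn distance, to features of each task-specific model.

```python
import math


def _logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def sinkhorn_distance(xs, ys, eps=0.1, n_iters=200):
    """Entropy-regularized OT cost between two feature sets (uniform weights).

    xs, ys: lists of feature vectors (lists of floats) of equal dimension.
    Returns (transport cost under the Sinkhorn plan, plan as a list of rows).
    Assumption: squared Euclidean ground cost; eps and n_iters are illustrative.
    """
    n, m = len(xs), len(ys)
    log_a, log_b = -math.log(n), -math.log(m)
    # Ground cost between every pair of features.
    cost = [[sum((p - q) ** 2 for p, q in zip(x, y)) for y in ys] for x in xs]
    f = [0.0] * n  # dual potential for xs
    g = [0.0] * m  # dual potential for ys
    for _ in range(n_iters):
        # Alternating log-domain updates of the two dual potentials.
        f = [-eps * _logsumexp([(g[j] - cost[i][j]) / eps + log_b
                                for j in range(m)])
             for i in range(n)]
        g = [-eps * _logsumexp([(f[i] - cost[i][j]) / eps + log_a
                                for i in range(n)])
             for j in range(m)]
    # Recover the transport plan from the potentials.
    plan = [[math.exp((f[i] + g[j] - cost[i][j]) / eps + log_a + log_b)
             for j in range(m)]
            for i in range(n)]
    total = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return total, plan


def masked_merge(merged, task_vector, mask):
    """Hypothetical continual step: add a masked task vector to merged weights.

    A task vector is taken here to be (fine-tuned weights - pretrained weights),
    flattened to a list; mask entries in [0, 1] select which components to keep.
    """
    return [w + m_k * t for w, t, m_k in zip(merged, task_vector, mask)]


# Toy usage: identical feature sets are closer than shifted ones.
features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
shifted = [[p + 5.0 for p in x] for x in features]
d_same, _ = sinkhorn_distance(features, features)
d_shift, _ = sinkhorn_distance(features, shifted)
print(d_same < d_shift)
```

Only the current merged weights, the incoming task vector, and its mask appear in `masked_merge`, which mirrors the constant-memory idea of reusing the merged model rather than storing earlier task vectors. A practical version would operate on tensors and learn the mask by gradient descent, which this sketch deliberately leaves out.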
