FW-Merging: Scaling Model Merging with Frank-Wolfe Optimization

Hao Mark Chen; Shell Xu Hu; Wayne Luk; Timothy Hospedales; Hongxiang Fan

TL;DR

This work addresses the challenge of scaling model merging to large pools of open-source fine-tuned checkpoints whose model and task information is partly unknown, while remaining robust to irrelevant models. It reframes merging as constrained optimization over the convex hull of the checkpoints and uses a Frank-Wolfe-style iterative procedure: a linear minimization oracle (LMO) selects the most relevant model, which is then merged through a stable, feasible update. The proposed FW-Merging framework offers design options such as Hard vs Soft FW and Task-wise vs Layer-wise LMO, achieves strong empirical gains on both language and vision tasks, and keeps memory overhead constant as the model pool grows. The results show that FW-Merging can outperform data-free and data-informed baselines as well as traditional MTL approaches, offering a scalable, data-efficient way to merge diverse open-source models for multi-task deployment. The code is open-sourced.

Abstract

Model merging has emerged as a promising approach for multi-task learning (MTL), offering a data-efficient alternative to conventional fine-tuning. However, with the rapid development of the open-source AI ecosystem and the increasing availability of fine-tuned foundation models, existing model merging methods face two key limitations: (i) they are primarily designed for in-house fine-tuned models, making them less adaptable to diverse model sources with partially unknown model and task information; and (ii) they struggle to scale effectively when merging numerous model checkpoints. To address these challenges, we formulate model merging as a constrained optimization problem and introduce a novel approach: Frank-Wolfe Merging (FW-Merging). Inspired by Frank-Wolfe optimization, our approach iteratively selects the most relevant model in the pool to minimize a linear approximation of the objective function and then executes a local merging step similar to the Frank-Wolfe update. The objective function is designed to capture the desired behavior of the target merged model, while the fine-tuned candidate models define the constraint set. More importantly, FW-Merging is orthogonal to existing merging methods and integrates with them to further improve accuracy. Our experiments show that FW-Merging scales across diverse model sources, remaining stable with 16 irrelevant models and improving by 15.3% with 16 relevant models on 20 CV tasks, while maintaining constant memory overhead, unlike the linear overhead of data-informed merging methods. Compared with state-of-the-art approaches, FW-Merging surpasses the data-free merging method by 32.8% and outperforms the data-informed AdaMerging by 8.39% when merging 20 ViT models. Our code is open-sourced at github.com/hmarkc/FW-Merging.
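The select-then-merge loop described in the abstract can be illustrated with a minimal sketch of the classic Frank-Wolfe iteration over the convex hull of a checkpoint pool. This is not the authors' implementation. It assumes each checkpoint is a flat parameter vector, uses a toy quadratic objective in place of a real task loss, and uses the textbook step size 2/(t+2). All names (`frank_wolfe_merge`, `grad_fn`, `target`) are hypothetical.

```python
import numpy as np

def frank_wolfe_merge(checkpoints, grad_fn, theta0, num_iters=50):
    """Toy Frank-Wolfe loop; each row of `checkpoints` is one candidate model."""
    theta = theta0.copy()
    picks = []
    for t in range(num_iters):
        g = grad_fn(theta)
        # Linear minimization oracle: the checkpoint s minimizing <g, s>.
        k = int(np.argmin(checkpoints @ g))
        picks.append(k)
        # Textbook step size (an assumption, not taken from the paper).
        gamma = 2.0 / (t + 2.0)
        # Convex-combination update keeps theta inside the hull.
        theta = (1.0 - gamma) * theta + gamma * checkpoints[k]
    return theta, picks

# Toy pool: four "checkpoints" as unit vectors. The objective only cares
# about the first two, so the last two play the role of irrelevant models.
checkpoints = np.eye(4)
target = 0.5 * checkpoints[0] + 0.5 * checkpoints[1]

def loss(theta):
    return 0.5 * float(np.sum((theta - target) ** 2))

def grad_fn(theta):
    return theta - target

theta0 = checkpoints[3].copy()  # start from an irrelevant checkpoint
merged, picks = frank_wolfe_merge(checkpoints, grad_fn, theta0)
print("initial loss:", loss(theta0))
print("merged loss:", loss(merged))
print("checkpoints selected:", sorted(set(picks)))
```

In this toy setting the oracle only ever selects the two checkpoints that matter to the objective, and the loop stores just the current iterate plus one selected checkpoint at a time. A real model pool would need gradients computed from task data and per-layer or per-task handling, which this sketch leaves out.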
