FusionBench: A Unified Library and Comprehensive Benchmark for Deep Model Fusion
Anke Tang, Li Shen, Yong Luo, Enneng Yang, Han Hu, Lefei Zhang, Bo Du, Dacheng Tao
TL;DR
FusionBench introduces the first unified benchmark and library for deep model fusion, addressing inconsistent evaluations with a modular, Hydra-configured framework built around three components: Algorithm, Model Pool, and Task Pool. It implements a broad taxonomy of fusion methods (ensemble, merging, mixing) and integrates model collections spanning CLIP-ViT, ResNet-50, GPT-2, and Flan-T5, enabling cross-domain multi-task evaluation. Experimental results show that adaptive and MoE-based fusion methods often outperform baselines and pre-trained models, while also exposing generalization and robustness challenges on unseen tasks and corrupted data and pointing to scaling potential for large models and LLMs. The work provides extensive documentation and tutorials and invites community contributions to advance standardized evaluation and development in deep model fusion.
Abstract
Deep model fusion is an emerging technique that unifies the predictions or parameters of several deep neural networks into a single better-performing model in a cost-effective and data-efficient manner. Although a variety of deep model fusion techniques have been introduced, their evaluations tend to be inconsistent and often inadequate to validate their effectiveness and robustness. We present FusionBench, the first benchmark and a unified library designed specifically for deep model fusion. Our benchmark consists of multiple tasks, each with different settings of models and datasets. This variety allows us to compare fusion methods across different scenarios and model scales. Additionally, FusionBench serves as a unified library for easy implementation and testing of new fusion techniques. FusionBench is open source and actively maintained, with community contributions encouraged. Homepage: https://github.com/tanganke/fusion_bench
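To make the distinction between fusing predictions and fusing parameters concrete, the following is a minimal illustrative sketch in plain Python. It is not FusionBench's API: the function names `average_predictions` and `average_parameters`, and the representation of a model as a dictionary of flat float lists, are assumptions chosen only for illustration. It shows the two simplest baselines, uniform ensembling and uniform weight averaging.

```python
from typing import Dict, List

# Hypothetical stand-in for a model's parameters: name -> flat list of floats.
StateDict = Dict[str, List[float]]


def average_predictions(prob_lists: List[List[float]]) -> List[float]:
    """Ensemble-style fusion: average per-class probabilities across models."""
    if not prob_lists:
        raise ValueError("need at least one prediction vector")
    n = len(prob_lists)
    return [sum(values) / n for values in zip(*prob_lists)]


def average_parameters(state_dicts: List[StateDict]) -> StateDict:
    """Merging-style fusion: element-wise average of same-shaped parameters."""
    if not state_dicts:
        raise ValueError("need at least one model")
    keys = state_dicts[0].keys()
    for sd in state_dicts:
        # Parameter averaging only makes sense for identical architectures.
        if sd.keys() != keys:
            raise ValueError("all models must share the same parameter names")
    n = len(state_dicts)
    return {
        k: [sum(values) / n for values in zip(*(sd[k] for sd in state_dicts))]
        for k in keys
    }


if __name__ == "__main__":
    model_a = {"w": [1.0, 2.0]}
    model_b = {"w": [3.0, 4.0]}
    print(average_parameters([model_a, model_b]))
    print(average_predictions([[0.5, 0.5], [1.0, 0.0]]))
```

The ensemble keeps every model at inference time and combines their outputs, whereas the merged model is a single set of weights; this difference in inference cost is one reason the two families are evaluated separately.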
