Navigating the Accuracy-Size Trade-Off with Flexible Model Merging

Akash Dhasade, Divyansh Jhunjhunwala, Milos Vujasinovic, Gauri Joshi, Anne-Marie Kermarrec

TL;DR

The paper addresses the challenge of combining multiple fine-tuned models without access to training data, analyzing the accuracy-size trade-off across the full spectrum of deployed sizes. It introduces FlexMerge, a data-free, block-level merging framework that greedily fuses task-specific blocks and accommodates multiple merging algorithms within a unified workflow. Two findings stand out: modest increases in deployed size can yield large accuracy gains, and algorithm rankings vary with size, which motivates evaluating methods beyond the single-model end state. The framework performs strongly across vision, NLP, and multi-modal benchmarks, with practical benefits in storage, inference, and generalization, and with efficient merging and reconstruction. Overall, FlexMerge opens a design space for scalable, data-free multi-task fusion that adapts to deployment constraints and task counts.

Abstract

Model merging has emerged as an efficient method to combine multiple single-task fine-tuned models. The merged model can enjoy multi-task capabilities without expensive training. While promising, merging into a single model often suffers from an accuracy gap with respect to the fine-tuned models. On the other hand, deploying all individual fine-tuned models incurs high storage costs. We propose FlexMerge, a novel data-free model merging framework that: (a) flexibly generates merged models of varying sizes, spanning the full spectrum from a single merged model to retaining all fine-tuned models; and (b) supports multiple merging algorithms in a unified framework. Using FlexMerge, we systematically characterize the accuracy-size trade-off of different algorithms. Our study reveals two key findings: first, even modestly larger merged models can yield steep accuracy gains (up to 13.5% when just doubling the size); second, algorithm rankings are not consistent as size increases, with some methods overtaking others beyond the one-model regime. These results uncover a new design dimension for model merging: developing and comparing algorithms across the full spectrum of sizes rather than only at the single-model limit. Extensive experiments on vision and NLP benchmarks, with up to 30 tasks, confirm the generality and practicality of FlexMerge.
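The idea of shrinking a set of fine-tuned models block by block until a size budget is met can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes all blocks have equal size, uses cosine similarity between block task vectors to pick the next pair to fuse, and uses plain averaging as a stand-in for the merging algorithm. The names `greedy_block_merge`, `task_vectors`, and `target_size` are hypothetical.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two flat vectors (small epsilon avoids 0/0)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def greedy_block_merge(task_vectors, target_size):
    """Greedily fuse per-task blocks until the deployed size meets a budget.

    task_vectors: array of shape (T, B, D) with T tasks, B blocks per model,
        and D parameters per block (equal block sizes are an assumption).
    target_size: desired size as a multiple of one model, between 1 and T.
    Returns, for each block position, a list of (task_ids, merged_vector).
    """
    T, B, _ = task_vectors.shape
    # Start from the largest configuration: every task keeps its own block.
    groups = [[([t], task_vectors[t, b].copy()) for t in range(T)]
              for b in range(B)]

    def deployed_size():
        # Number of stored blocks, in units of one full model.
        return sum(len(g) for g in groups) / B

    while deployed_size() > target_size:
        best = None  # (similarity, block index, group i, group j)
        for b, g in enumerate(groups):
            for i in range(len(g)):
                for j in range(i + 1, len(g)):
                    s = cosine(g[i][1], g[j][1])
                    if best is None or s > best[0]:
                        best = (s, b, i, j)
        if best is None:  # every block position is already fully merged
            break
        _, b, i, j = best
        ids = groups[b][i][0] + groups[b][j][0]
        # Stand-in merge rule: average the original task vectors of the group.
        merged = task_vectors[ids, b].mean(axis=0)
        rest = [g for k, g in enumerate(groups[b]) if k not in (i, j)]
        groups[b] = rest + [(ids, merged)]
    return groups


rng = np.random.default_rng(0)
example = greedy_block_merge(rng.normal(size=(3, 4, 8)), target_size=1.75)
```

With 3 tasks and 4 blocks, the starting configuration stores 12 blocks (size $3\times$), and each fusion removes one stored block, so a budget of $1.75\times$ is reached after five fusions. Swapping the averaging step for another merging rule would change only the line that computes `merged`.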

Paper Structure

This paper contains 34 sections, 7 equations, 28 figures, 6 tables, 1 algorithm.

Figures (28)

  • Figure 1: (a) Fine-tuned models are sequences of blocks. FlexMerge iteratively merges block pairs until reaching the desired size (size $1.75\times$). (b) Algorithm rankings change as size is increased.
  • Figure 2: FlexMerge enables large accuracy gains when just doubling the deployed model size and attains full accuracy well before the maximum size.
  • Figure 3: Merging $8$ (top) and $30$ (bottom) tasks. The accuracy-size trade-off shows rapid initial gains, followed by gradual improvement, reaching near fine-tuning accuracy well before the maximum size.
  • Figure 4: FlexMerge + TA gains $7.2\%$ for (IA)$^3$ going from $1\times$ to $3\times$ and more than $9\%$ for FFT when just doubling the size from $1\times$ to $2\times$. EMR begins with higher accuracy, yet benefits substantially from increased size.
  • Figure 5: (Left) FlexMerge + TA outperforms Channel Merging + TA across all sizes. (Center, Right) Algorithm rankings shift even at modestly larger sizes, with simpler methods rivaling advanced ones. We show sizes just over Consensus and EMR-Merging's lowest size for a holistic comparison.
  • ...and 23 more figures