Merge to Mix: Mixing Datasets via Model Merging
Zhixu Silvia Tao, Kasper Vinken, Hao-Wei Yeh, Avi Cooper, Xavier Boix
TL;DR
This work addresses how to select dataset mixtures for fine-tuning large models when task-specific data are limited. It introduces Merge to Mix, which uses model merging as a surrogate: averaging models fine-tuned individually on each dataset approximates the performance of a model fine-tuned on the corresponding mixture, which avoids fine-tuning on every candidate mixture. The authors define the notation, formulate mixture selection as a surrogate-based optimization, and give an explicit algorithm that first fine-tunes one model per dataset and then evaluates the merged model for each candidate mixture. Empirically, merged-model performance correlates strongly and positively with the performance of models actually fine-tuned on the mixture, across vision and language tasks; the method outperforms similarity-based baselines and approaches oracle performance. The approach makes mixture selection more scalable and data-efficient, with potential extensions to weighted mixing and hybrid search strategies.
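The surrogate idea summarized above can be written compactly. The following is a sketch under assumed notation, which may differ from the paper's own: $\theta_i$ denotes the parameters obtained by fine-tuning a shared base model on dataset $D_i$ alone, $S$ is a candidate subset of the $N$ available datasets, $\theta^{\mathrm{ft}}(S)$ is the model fine-tuned on the mixture of the datasets in $S$, and $\mathrm{Perf}$ is a hypothetical task-performance measure on the target task.

```latex
% Merged surrogate: uniform average of the individually fine-tuned models in S
\tilde{\theta}(S) = \frac{1}{|S|} \sum_{i \in S} \theta_i ,
\qquad S \subseteq \{1, \dots, N\},\ S \neq \emptyset

% Surrogate assumption: the merged model tracks the mixture-fine-tuned model
\mathrm{Perf}\bigl(\tilde{\theta}(S)\bigr) \;\text{correlates with}\; \mathrm{Perf}\bigl(\theta^{\mathrm{ft}}(S)\bigr)

% Mixture selection using the surrogate instead of fine-tuning on every S
S^{\star} = \arg\max_{S} \; \mathrm{Perf}\bigl(\tilde{\theta}(S)\bigr)
```

Under this reading, only $N$ fine-tuning runs are needed, while each of the $2^N - 1$ candidate mixtures costs one merge and one evaluation.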
Abstract
Mixing datasets for fine-tuning large models (LMs) has become critical for maximizing performance on downstream tasks. However, composing effective dataset mixtures typically relies on heuristics and trial-and-error, often requiring multiple fine-tuning runs to achieve the desired outcome. We propose a novel method, $\textit{Merge to Mix}$, that accelerates composing dataset mixtures through model merging. Model merging is a recent technique that combines the abilities of multiple individually fine-tuned LMs into a single LM by using a few simple arithmetic operations. Our key insight is that merging models individually fine-tuned on each dataset in a mixture can effectively serve as a surrogate for a model fine-tuned on the entire mixture. Merge to Mix leverages this insight to accelerate selecting dataset mixtures without requiring full fine-tuning on each candidate mixture. Our experiments demonstrate that Merge to Mix surpasses state-of-the-art methods in dataset selection for fine-tuning LMs.
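As an illustration of the procedure described above, here is a minimal Python sketch, not the authors' implementation. It assumes uniform parameter averaging as the merge operation and exhaustive enumeration of candidate mixtures; the names `merge_by_averaging`, `select_mixture`, and `toy_score`, the toy parameter dictionaries, and the scoring rule are all hypothetical stand-ins for real fine-tuned models and a real task evaluation.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

# Toy stand-in for a model: parameter name -> flat list of weights (assumption).
Params = Dict[str, List[float]]


def merge_by_averaging(models: Sequence[Params]) -> Params:
    """Average parameters element-wise across models with identical shapes."""
    if not models:
        raise ValueError("need at least one model to merge")
    merged: Params = {}
    for name in models[0]:
        columns = zip(*(m[name] for m in models))
        merged[name] = [sum(col) / len(models) for col in columns]
    return merged


def select_mixture(
    finetuned: Dict[str, Params],
    score: Callable[[Params], float],
) -> Tuple[Tuple[str, ...], float]:
    """Score the merged model of every non-empty subset; return the best one."""
    names = sorted(finetuned)
    best_subset: Tuple[str, ...] = ()
    best_score = float("-inf")
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            merged = merge_by_averaging([finetuned[n] for n in subset])
            s = score(merged)
            if s > best_score:
                best_subset, best_score = subset, s
    return best_subset, best_score


# Hypothetical "fine-tuned" models, one per dataset (toy numbers, not real results).
finetuned = {
    "A": {"w": [1.0, 0.0]},
    "B": {"w": [0.0, 1.0]},
    "C": {"w": [4.0, 4.0]},
}
target = [0.5, 0.5]  # toy optimum standing in for the target task


def toy_score(params: Params) -> float:
    """Higher is better: negative squared distance to the toy optimum."""
    return -sum((x - t) ** 2 for x, t in zip(params["w"], target))


best, value = select_mixture(finetuned, toy_score)
print(best, value)
```

In this toy setup the merged model for datasets A and B lands exactly on the toy optimum, so that pair is selected. In practice `score` would evaluate the merged model on held-out target-task data, and the selected mixture would then be used for one final fine-tuning run.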
