
Merge to Mix: Mixing Datasets via Model Merging

Zhixu Silvia Tao, Kasper Vinken, Hao-Wei Yeh, Avi Cooper, Xavier Boix

TL;DR

This work tackles the challenge of selecting dataset mixtures for fine-tuning large language models when task-specific data are limited. It introduces Merge to Mix, a surrogate approach built on model merging: averaging the models individually fine-tuned on each dataset approximates the performance of a model fine-tuned on the corresponding mixture, which avoids fine-tuning exhaustively on every candidate mixture. The authors formalize the notation, propose a surrogate-based optimization, and provide an explicit algorithm that first fine-tunes one model per dataset and then evaluates the merged models for all candidate mixtures. Empirically, they demonstrate a strong positive correlation between merged-model performance and actual mixture-fine-tuned performance across vision and language tasks, with Merge to Mix outperforming similarity-based baselines and approaching oracle performance. The method advances data-centric fine-tuning by enabling scalable, data-efficient mixture selection, with potential extensions to weighted mixing and hybrid search strategies.
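The two-stage algorithm summarized above (fine-tune once per dataset, then score every candidate mixture through its merged surrogate) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: models are represented as hypothetical dictionaries mapping parameter names to NumPy arrays, merging is a uniform parameter average, the candidate set is assumed to be every non-empty subset of the datasets (which grows as 2^n - 1), and `evaluate` is a hypothetical stand-in for measuring a model on held-out target-task data.

```python
from itertools import combinations
from typing import Callable, Dict, List, Optional, Sequence, Tuple

import numpy as np

# Hypothetical model representation: parameter name -> NumPy array.
Params = Dict[str, np.ndarray]


def average_params(models: Sequence[Params]) -> Params:
    """Merge models sharing one architecture by uniformly averaging each parameter."""
    return {name: np.mean([m[name] for m in models], axis=0) for name in models[0]}


def candidate_mixtures(names: Sequence[str]) -> List[Tuple[str, ...]]:
    """Assumed candidate set: every non-empty subset of the datasets (2^n - 1 of them)."""
    return [mix for r in range(1, len(names) + 1) for mix in combinations(names, r)]


def merge_to_mix_select(
    finetuned: Dict[str, Params],
    evaluate: Callable[[Params], float],
) -> Tuple[Optional[Tuple[str, ...]], Dict[Tuple[str, ...], float]]:
    """Score each candidate mixture by its merged surrogate and return the best one.

    `finetuned` maps dataset name -> model fine-tuned on that dataset alone.
    `evaluate` is a hypothetical stand-in for target-task performance
    (higher is better). No mixture fine-tuning happens inside this loop.
    """
    best_mix: Optional[Tuple[str, ...]] = None
    best_score = -np.inf
    scores: Dict[Tuple[str, ...], float] = {}
    for mix in candidate_mixtures(sorted(finetuned)):
        surrogate = average_params([finetuned[name] for name in mix])
        score = evaluate(surrogate)
        scores[mix] = score
        if score > best_score:
            best_mix, best_score = mix, score
    return best_mix, scores


if __name__ == "__main__":
    # Toy example: three one-tensor "models" and a target that the A+B average hits exactly.
    models = {
        "A": {"w": np.array([1.0, 0.0])},
        "B": {"w": np.array([0.0, 1.0])},
        "C": {"w": np.array([4.0, 4.0])},
    }
    target = np.array([0.5, 0.5])
    best, _ = merge_to_mix_select(models, lambda p: -float(np.linalg.norm(p["w"] - target)))
    print(best)  # ('A', 'B')
```

The selected mixture would then be used for a single fine-tuning run on the union of its datasets. In this sketch each candidate costs only one parameter average and one evaluation, which is where the savings over fine-tuning on every mixture come from.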

Abstract

Mixing datasets for fine-tuning large models (LMs) has become critical for maximizing performance on downstream tasks. However, composing effective dataset mixtures typically relies on heuristics and trial-and-error, often requiring multiple fine-tuning runs to achieve the desired outcome. We propose a novel method, Merge to Mix, that accelerates composing dataset mixtures through model merging. Model merging is a recent technique that combines the abilities of multiple individually fine-tuned LMs into a single LM by using a few simple arithmetic operations. Our key insight is that merging models individually fine-tuned on each dataset in a mixture can effectively serve as a surrogate for a model fine-tuned on the entire mixture. Merge to Mix leverages this insight to accelerate selecting dataset mixtures without requiring full fine-tuning on each candidate mixture. Our experiments demonstrate that Merge to Mix surpasses state-of-the-art methods in dataset selection for fine-tuning LMs.
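The abstract describes merging as "a few simple arithmetic operations." One common way to write such arithmetic, named here as an illustration rather than as the paper's exact formulation, is task arithmetic: add scaled task vectors $\theta_i - \theta_0$ to the base weights $\theta_0$, giving $\theta_{merged} = \theta_0 + \lambda \sum_{i=1}^{k} (\theta_i - \theta_0)$. Setting $\lambda = 1/k$ gives $\theta_0 + \frac{1}{k}\sum_i \theta_i - \theta_0 = \frac{1}{k}\sum_i \theta_i$, which is exactly the plain parameter average that the TL;DR attributes to Merge to Mix. The sketch below checks this identity numerically; the function names and the dict-of-arrays model representation are hypothetical.

```python
from typing import Dict, Sequence

import numpy as np

# Hypothetical model representation: parameter name -> NumPy array.
Params = Dict[str, np.ndarray]


def task_arithmetic_merge(base: Params, finetuned: Sequence[Params], lam: float) -> Params:
    """Return theta_0 + lam * sum_i (theta_i - theta_0), computed per parameter tensor."""
    return {
        name: base[name] + lam * sum(m[name] - base[name] for m in finetuned)
        for name in base
    }


def uniform_average(finetuned: Sequence[Params]) -> Params:
    """Plain parameter average (1/k) * sum_i theta_i over k fine-tuned models."""
    return {name: np.mean([m[name] for m in finetuned], axis=0) for name in finetuned[0]}
```

With `lam = 1 / len(finetuned)` the base weights cancel, so both functions return the same parameters for any base model; other values of `lam` move the merge away from the plain average.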

Paper Structure

This paper contains 21 sections, 13 equations, 7 figures, 8 tables, and 1 algorithm.

Figures (7)

  • Figure 1: Correlation plots between the performance of mixture-fine-tuned models and different metrics on target tasks. The reported correlation is the average of the per-dataset correlations.
  • Figure 2: Individual correlation plots between the performance of merged models and mixture-fine-tuned models for each target image classification task.
  • Figure 3: Individual correlation plots between the performance of merged models and mixture-fine-tuned models for each target language task.
  • Figure 4: Correlation plots between the performance of the mixture-fine-tuned models and the average-of-average cosine similarity metric.
  • Figure 5: Correlation plots between the performance of the mixture-fine-tuned models and the average-of-average $L_2$ score.
  • ...and 2 more figures
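Figure 1's caption reports the average of per-dataset correlations. A minimal sketch of that aggregation follows, assuming Pearson correlation (the caption does not name the coefficient) and assuming each target dataset contributes paired scores over the same candidate mixtures: compute one correlation per target dataset between the surrogate (merged-model) scores and the mixture-fine-tuned scores, then take the mean. The function name and input layout are hypothetical.

```python
from typing import Dict, Sequence, Tuple

import numpy as np


def average_per_dataset_correlation(
    results: Dict[str, Tuple[Sequence[float], Sequence[float]]],
) -> float:
    """Mean over target datasets of Pearson r(surrogate scores, mixture-fine-tuned scores).

    `results` maps a target dataset name to two equal-length score lists,
    aligned so that position j refers to the same candidate mixture in both.
    """
    per_dataset = []
    for surrogate, actual in results.values():
        # np.corrcoef returns a 2x2 matrix; the off-diagonal entry is Pearson r.
        r = np.corrcoef(np.asarray(surrogate, dtype=float), np.asarray(actual, dtype=float))[0, 1]
        per_dataset.append(r)
    return float(np.mean(per_dataset))
```

Averaging per-dataset coefficients, rather than pooling every point into one correlation, keeps differences in overall accuracy between target tasks from inflating or masking the within-task relationship between surrogate and mixture-fine-tuned performance.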