Mixture-of-Skills: Learning to Optimize Data Usage for Fine-Tuning Large Language Models

Minghao Wu, Thuy-Trang Vu, Lizhen Qu, Gholamreza Haffari

TL;DR

MoS addresses heterogeneity and imbalance in fine-tuning data for large language models with a model-agnostic reinforcement learning framework that learns to optimize data usage through a lightweight scorer network. Three rewards (transferability, difficulty, and learning trajectory) drive dynamic dataset reweighting, so that training focuses on the datasets that most improve overall performance, with exponential moving average (EMA) smoothing stabilizing learning. Experiments with three backbones (Qwen1.5-0.5B, Gemma-2B, Llama-3-8B) on MMLU and MT-bench show that MoS outperforms heuristic baselines and speeds up convergence by about 2.2x, with larger models benefiting more. MoSpec extends MoS to task-specific fine-tuning by adjusting the rewards to harness diverse datasets for a specialized objective, demonstrating an effective generalist-to-specialist transition without overfitting to a validation set.
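To make the reweighting loop in the summary above concrete, here is a minimal sketch of how a scorer could turn rewards into dataset sampling probabilities. Everything in it is an illustrative assumption rather than the authors' implementation: the scorer is reduced to one learnable logit per dataset, the update is a plain REINFORCE-style policy-gradient step, the names `DatasetScorer`, `run_demo`, and `toy_rewards` are hypothetical, and the rewards are fixed toy numbers instead of the transferability or difficulty signals used in the paper.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class DatasetScorer:
    """Hypothetical stand-in for the scorer network: one logit per dataset."""

    def __init__(self, num_datasets, lr=0.1, ema_decay=0.9):
        self.logits = [0.0] * num_datasets       # uniform sampling at the start
        self.ema_rewards = [0.0] * num_datasets  # smoothed reward per dataset
        self.lr = lr
        self.ema_decay = ema_decay

    def probabilities(self):
        """Current sampling distribution over datasets."""
        return softmax(self.logits)

    def sample(self, rng):
        """Pick the dataset to draw the next training batch from."""
        probs = self.probabilities()
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]

    def update(self, dataset_idx, reward):
        """Smooth the reward with an EMA, then take a REINFORCE-style step."""
        d = self.ema_decay
        self.ema_rewards[dataset_idx] = (
            d * self.ema_rewards[dataset_idx] + (1.0 - d) * reward
        )
        smoothed = self.ema_rewards[dataset_idx]
        probs = self.probabilities()
        for j in range(len(self.logits)):
            # d log p[dataset_idx] / d logit[j] = 1[j == dataset_idx] - p[j]
            grad = (1.0 if j == dataset_idx else 0.0) - probs[j]
            self.logits[j] += self.lr * smoothed * grad


def run_demo(steps=500, seed=0):
    """Toy loop: dataset 2 is assumed to yield the largest reward."""
    rng = random.Random(seed)
    toy_rewards = [0.1, 0.1, 1.0]  # assumed values, not results from the paper
    scorer = DatasetScorer(num_datasets=len(toy_rewards))
    for _ in range(steps):
        idx = scorer.sample(rng)
        # A real system would train the LLM on a batch from dataset `idx`
        # here and compute the reward from the model's current state.
        reward = toy_rewards[idx] + rng.gauss(0.0, 0.05)
        scorer.update(idx, reward)
    return scorer.probabilities()


if __name__ == "__main__":
    print(run_demo())
```

In this toy setting, a dataset whose smoothed reward is higher than average gains sampling probability in expectation, which is the qualitative behavior the summary describes. The EMA keeps a single noisy reward from swinging the distribution.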

Abstract

Large language models (LLMs) are typically fine-tuned on diverse and extensive datasets sourced from various origins to develop a comprehensive range of skills, such as writing, reasoning, chatting, coding, and more. Each skill has unique characteristics, and these datasets are often heterogeneous and imbalanced, making the fine-tuning process highly challenging. Balancing the development of each skill while ensuring the model maintains its overall performance requires sophisticated techniques and careful dataset curation. In this work, we propose a general, model-agnostic, reinforcement learning framework, Mixture-of-Skills (MoS), that learns to optimize data usage automatically during the fine-tuning process. This framework ensures the optimal comprehensive skill development of LLMs by dynamically adjusting the focus on different datasets based on their current learning state. To validate the effectiveness of MoS, we conduct extensive experiments using three diverse LLM backbones on two widely used benchmarks and demonstrate that MoS substantially enhances model performance. Building on the success of MoS, we propose MoSpec, an adaptation for task-specific fine-tuning, which harnesses the utilities of various datasets for a specific purpose. Our work underlines the significance of dataset rebalancing and presents MoS as a powerful, general solution for optimizing data usage in the fine-tuning of LLMs for various purposes.

Paper Structure

This paper contains 42 sections, 11 equations, 4 figures, 7 tables, and 1 algorithm.

Figures (4)

  • Figure 1: The overview of Mixture-of-Skills. The training collection $D_{\textrm{trn}} = \{D_{\textrm{trn}}^i\}_{i=1}^{N}$ consists of various SFT datasets, with $D_{\textrm{trn}}^{i}$ indicating the $i$-th dataset. Please refer to the method section for more details.
  • Figure 2: Learned dataset distribution given by Llama-3-8B with different variations of MoS. The $x$-axis indicates the training steps, and the $y$-axis indicates the sampling probabilities of datasets.
  • Figure 3: Training loss curves of heuristic baselines and MoS + Diff + EMA.
  • Figure 4: Learned dataset distribution given by Llama-3-8B with MoSpec + CosSim + EMA (left) and MoSpec + Diff + EMA (right).