VersaTune: An Efficient Data Composition Framework for Training Multi-Capability LLMs

Keer Lu, Keshi Zhao, Zhuoran Zhang, Zheng Liang, Da Pan, Shusen Zhang, Xin Wu, Guosheng Dong, Bin Cui, Tengjiao Wang, Wentao Zhang

TL;DR

VersaTune tackles the challenge of training LLMs to master multiple domains without catastrophic forgetting by aligning fine-tuning data with the model's existing domain knowledge distribution. It introduces a two-phase framework: Phase 1 detects domain knowledge distribution via knowledge-consistency training and a domain-probability estimator, and Phase 2 uses dynamic domain weighting guided by learnable potential and forgetting degree to foster balanced multi-domain capabilities while enabling flexible domain expansion. Empirical results show VersaTune achieving a 35.21% improvement over uniform data weighting, with Qwen-2.5-32B + VersaTune surpassing frontier models in several tasks and reducing non-target forgetting by up to 38.77% during expansion. The approach offers an efficient, robust pathway to versatile, domain-aware LLMs suitable for deployment across law, medicine, finance, science, code, and general knowledge tasks.

Abstract

As demonstrated by proprietary Large Language Models (LLMs) such as the GPT and Claude series, LLMs have the potential to achieve remarkable proficiency across a wide range of domains, including law, medicine, finance, science, and code, all within a single model. These capabilities are further augmented during the Supervised Fine-Tuning (SFT) phase. Despite this potential, existing work mainly focuses on domain-specific enhancements during fine-tuning, which risk catastrophic forgetting of knowledge in other domains. In this study, we introduce **VersaTune**, a novel data composition framework designed to enhance LLMs' overall multi-domain capabilities during training. We begin by detecting the distribution of domain-specific knowledge within the base model, then compose training data that aligns with the model's existing knowledge distribution. During the subsequent training process, domain weights are dynamically adjusted based on their learnable potential and forgetting degree. Experimental results indicate that VersaTune is effective in multi-domain fostering, with an improvement of 35.21% in overall multi-ability performance compared to uniform domain weights. Furthermore, we find that Qwen-2.5-32B + VersaTune even surpasses frontier models, including GPT-4o, Claude3.5-Sonnet and DeepSeek-V3, by 0.86%, 4.76% and 4.60%, respectively. Additionally, in scenarios where flexible expansion of a specific domain is required, VersaTune reduces the performance degradation in other domains by 38.77%, while preserving the training efficacy of the target domain.
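The two ideas in the abstract, composing data in proportion to a detected knowledge distribution and then adjusting domain weights from training feedback, can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: `compose_counts`, `reweight`, the `rate` parameter, and in particular the multiplicative update rule are hypothetical and are not taken from the paper, whose exact formulas the abstract does not state. The sketch also assumes that learnable potential and forgetting degree are given as non-negative per-domain scores.

```python
def compose_counts(domain_probs, budget):
    """Split `budget` training examples across domains in proportion to
    `domain_probs` (an assumed, already-estimated knowledge distribution).
    Largest-remainder rounding keeps the counts summing to `budget`."""
    total = sum(domain_probs.values())
    shares = {d: budget * p / total for d, p in domain_probs.items()}
    counts = {d: int(s) for d, s in shares.items()}
    leftover = budget - sum(counts.values())
    # Give the remaining examples to the largest fractional parts.
    by_remainder = sorted(shares, key=lambda d: shares[d] - counts[d], reverse=True)
    for d in by_remainder[:leftover]:
        counts[d] += 1
    return counts


def reweight(weights, potential, forgetting, rate=0.5):
    """Hypothetical update (NOT the paper's rule): scale each domain weight by
    (1 + rate * (potential + forgetting)) and renormalize to sum to 1."""
    raw = {
        d: w * (1.0 + rate * (potential[d] + forgetting[d]))
        for d, w in weights.items()
    }
    norm = sum(raw.values())
    return {d: v / norm for d, v in raw.items()}


if __name__ == "__main__":
    # Illustrative numbers only; they are not results from the paper.
    print(compose_counts({"law": 1, "medicine": 1, "code": 2}, budget=10))
    print(reweight({"law": 0.5, "code": 0.5},
                   potential={"law": 1.0, "code": 0.0},
                   forgetting={"law": 0.0, "code": 0.0}))
```

In this sketch a domain whose score is higher gains weight relative to the others after renormalization; whether the actual method raises or lowers a weight in response to forgetting depends on the algorithms in the paper itself.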

Paper Structure

This paper contains 37 sections, 6 equations, 14 figures, 7 tables, 3 algorithms.

Figures (14)

  • Figure 1: Overview of VersaTune. We begin by probing the knowledge distribution within the base model $M_\theta$, utilizing a proprietary model $M_P$ to estimate the probability of sequences generated by $M_\theta$ belonging to various domains. Throughout the efficient fine-tuning process, we dynamically adjust the data domain ratios in response to $M_\theta$'s real-time performance feedback, with learnable potential and forgetting degree serving as evaluative metrics.
  • Figure 2: Performances of Qwen-2-7B on versatile tasks across different domains for multi-ability fostering.
  • Figure 3: Domain expansion for the medicine domain. We evaluated checkpoints from each epoch. Left (a) presents a grouped stacked bar chart showing the growth or loss of capabilities in non-target domains compared to the pre-fine-tuning state. Within each group, the left, center, and right bars represent: (1) 100% specific domain fine-tuning, (2) domain increase with uniform distribution of the remainder, and (3) VersaTune implementation based on the domain expansion algorithm. Right (b) features a line chart depicting the enhancement of the medicine domain's capabilities.
  • Figure 4: The average scores of models' performances across domains during the domain expansion process, with detailed domain variations provided in the domain expansion ablation figure.
  • Figure 5: Illustration of the LLMs training workflow. In the pretraining phase, raw documents are concatenated into a sequence using special tokens such as <BOS> (Beginning of Sequence) and <EOS> (End of Sequence), thereby endowing the LLM with fundamental language generation capabilities. In the fine-tuning phase, the model's abilities in various domains are further enhanced.
  • ...and 9 more figures