LM-Cocktail: Resilient Tuning of Language Models via Model Merging

Shitao Xiao; Zheng Liu; Peitian Zhang; Xingrun Xing

TL;DR

LM-Cocktail addresses catastrophic forgetting from task-specific fine-tuning by merging the target-tuned model with the pre-trained base model and peer-domain models using weights derived from a small set of few-shot examples. The method computes merging weights via a softmax over target-domain losses and combines models through simple parameter averaging, yielding a resilient-tuned model that preserves general capabilities while maintaining target-task performance. Empirical results on decoder-based LMs (Llama-2-chat-7b) and encoder-based LMs (BGE) across FLAN, MMLU, and MTEB demonstrate strong general-domain resilience with competitive or superior task-specific accuracy, and the approach remains effective even when fine-tuning data is unavailable. The technique is lightweight, training-free beyond the merging step, and broadly applicable across architectures, with open-source code provided.
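The two steps named above (loss-based weights, then parameter averaging) can be sketched in a few lines. This is a minimal illustration, not the released implementation: the function names, the `temperature` parameter, the toy parameter dictionaries, and the choice to apply the softmax to negated losses (so that a lower few-shot loss gives a larger weight) are all assumptions made for the example.

```python
import math

def merging_weights(losses, temperature=1.0):
    """Softmax over negated losses: lower few-shot loss -> larger weight.

    The negation and the temperature are assumptions of this sketch.
    """
    scores = [-loss / temperature for loss in losses]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def merge_parameters(models, weights):
    """Weighted average of parameter dicts sharing the same keys and shapes."""
    merged = {}
    for name in models[0]:
        size = len(models[0][name])
        merged[name] = [
            sum(w * m[name][i] for w, m in zip(weights, models))
            for i in range(size)
        ]
    return merged

# Toy stand-ins for model parameters (hypothetical values).
fine_tuned = {"layer.weight": [1.0, 2.0]}
base = {"layer.weight": [0.0, 0.0]}
peer = {"layer.weight": [2.0, 4.0]}

# Hypothetical losses of each model on the few-shot target examples.
losses = [0.5, 2.0, 1.5]
weights = merging_weights(losses)
merged = merge_parameters([fine_tuned, base, peer], weights)
```

In practice the dictionaries would be full model state dicts with tensor values, but the arithmetic per parameter is the same weighted sum.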

Abstract

Pre-trained language models are continually fine-tuned to better support downstream applications. However, this operation may result in significant performance degeneration on general tasks beyond the targeted domain. To overcome this problem, we propose LM-Cocktail, which enables the fine-tuned model to stay resilient on general tasks. Our method takes the form of model merging, where the fine-tuned language model is merged with the pre-trained base model or with peer models from other domains through a weighted average. Despite its simplicity, LM-Cocktail is surprisingly effective: the resulting model achieves strong empirical performance across the whole scope of general tasks while preserving a superior capacity in its targeted domain. We conduct comprehensive experiments with Llama and BGE models on popular benchmarks, including FLAN, MMLU, and MTEB, whose results validate the efficacy of our proposed method. The code and checkpoints are available at https://github.com/FlagOpen/FlagEmbedding/tree/master/LM_Cocktail.
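As a hedged illustration of the weighted average described in the abstract, the two-model case (fine-tuned model plus base model) can be written as below. The symbols $\theta_{\text{ft}}$, $\theta_{\text{base}}$, and $\theta_{\text{merged}}$ are notation introduced here, and reading $\alpha$ as the weight placed on the fine-tuned model is an assumption:

$$\theta_{\text{merged}} = \alpha\,\theta_{\text{ft}} + (1-\alpha)\,\theta_{\text{base}}, \qquad 0 \le \alpha \le 1$$

Under this reading, $\alpha = 1$ recovers the fine-tuned model and $\alpha = 0$ recovers the base model, with intermediate values trading target-task accuracy against general capability.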

Paper Structure (24 sections, 4 equations, 3 figures, 9 tables)

Introduction
LM-Cocktail
General Paradigm
Variations
Experimental setup
Decoder-based LM
Encoder-based LM
Experimental Results
Overall Comparison
Analysis on decoder-based LM
Analysis on encoder-based LM
LM-Cocktail without Fine-tuning
Analysis on Decoder-based LM
Analysis on Encoder-based LM
Impact of Weight $\alpha$
...and 9 more sections

Figures (3)

Figure 1: The illustration of LM-Cocktail. Fine-tuning for the target task will lead to severe degeneration of LM’s general capabilities beyond the targeted domain. LM-Cocktail can increase accuracy on new target tasks while maintaining its accuracy on other tasks.
Figure 2: Performance with different $\alpha$.
Figure 3: Performance of encoder-based LMs with different merging weights.
