An Emulator for Fine-Tuning Large Language Models using Small Language Models

Eric Mitchell, Rafael Rafailov, Archit Sharma, Chelsea Finn, Christopher D. Manning

TL;DR

The paper introduces emulated fine-tuning (EFT), a framework to decouple the scale of pre-training and fine-tuning in large language models by representing the fine-tuning impact as a behavior delta added to base logits from a possibly different-sized pre-trained model. EFT enables cross-scale sampling (up-scaling and down-scaling), test-time reward interpolation between helpfulness and harmlessness, and practical techniques like speculative decoding to accelerate inference. Empirical results across Llama-1, Llama-2, and Falcon families show pre-training scale mainly boosts factuality while fine-tuning scale boosts helpfulness, with up-scaling delivering substantial factual gains for small fine-tuned models. The work also demonstrates test-time behavior control without retraining and presents conservative decoding strategies to stabilize EFT samples, offering a versatile, computationally efficient path to leveraging large base models with smaller fine-tuned models.

Abstract

Widely used language models (LMs) are typically built by scaling up a two-stage training pipeline: a pre-training stage that uses a very large, diverse dataset of text and a fine-tuning (sometimes, 'alignment') stage that uses targeted examples or other specifications of desired behaviors. While it has been hypothesized that knowledge and skills come from pre-training, and fine-tuning mostly filters this knowledge and skillset, this intuition has not been extensively tested. To aid in doing so, we introduce a novel technique for decoupling the knowledge and skills gained in these two stages, enabling a direct answer to the question, "What would happen if we combined the knowledge learned by a large model during pre-training with the knowledge learned by a small model during fine-tuning (or vice versa)?" Using an RL-based framework derived from recent developments in learning from human preferences, we introduce emulated fine-tuning (EFT), a principled and practical method for sampling from a distribution that approximates (or 'emulates') the result of pre-training and fine-tuning at different scales. Our experiments with EFT show that scaling up fine-tuning tends to improve helpfulness, while scaling up pre-training tends to improve factuality. Beyond decoupling scale, we show that EFT enables test-time adjustment of competing behavioral traits like helpfulness and harmlessness without additional training. Finally, a special case of emulated fine-tuning, which we call LM up-scaling, avoids resource-intensive fine-tuning of large pre-trained models by ensembling them with small fine-tuned models, essentially emulating the result of fine-tuning the large pre-trained model. Up-scaling consistently improves helpfulness and factuality of instruction-following models in the Llama, Llama-2, and Falcon families, without additional hyperparameters or training.
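
The ensembling behind up-scaling can be illustrated with a minimal sketch. It assumes that, at each decoding step, the large pre-trained model's next-token log-probabilities are shifted by the difference between the small fine-tuned and small pre-trained models' log-probabilities and then renormalized. The function names, the three-token vocabulary, and all probabilities below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math

def log_softmax(logits):
    """Normalize a list of logits into log-probabilities."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]

def eft_upscale(large_base_logits, small_ft_logits, small_base_logits):
    """Next-token probabilities for an emulated large fine-tuned model.

    Assumed combination rule (per token):
        log p = log p_large_base + (log p_small_ft - log p_small_base) - log Z
    """
    lb = log_softmax(large_base_logits)
    sf = log_softmax(small_ft_logits)
    sb = log_softmax(small_base_logits)
    combined = [b + (f - s) for b, f, s in zip(lb, sf, sb)]
    return [math.exp(v) for v in log_softmax(combined)]

# Hypothetical toy vocabulary and made-up probabilities for a single step.
vocab = ["Paris", "New York", "1955"]
large_base = [math.log(p) for p in (0.50, 0.10, 0.40)]  # knows the city, ignores intent
small_ft = [math.log(p) for p in (0.45, 0.45, 0.10)]    # follows intent, unsure of city
small_base = [math.log(p) for p in (0.30, 0.30, 0.40)]  # neither

probs = eft_upscale(large_base, small_ft, small_base)
best = vocab[max(range(len(vocab)), key=probs.__getitem__)]
print(dict(zip(vocab, probs)), best)
```

In this toy case the fine-tuning ratio suppresses the year token, while the large base model's preference between the two cities is preserved. Swapping which model is large and which is small would give the down-scaling variant.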

Paper Structure (22 sections, 5 equations, 8 figures, 2 tables)

Figures (8)

  • Figure 1: Emulated fine-tuning (EFT) enables a principled answer to the question: what happens when we combine what is learned from pre-training a model of one size with what is learned from fine-tuning a model of a different size? Conventional models combine the learnings of pre-training and fine-tuning at the same size (A + B, C + D). In contrast, EFT enables choosing these independently, allowing a principled approach to evaluating the result of A + D and C + B.
  • Figure 2: Emulated fine-tuning combines knowledge from pre-training and fine-tuning at different scales. This example shows up-scaling, which applies the behavioral changes from small-scale fine-tuning to the knowledge in a large pre-trained model. The small fine-tuned model (green) understands that the user's query asks about Yo-Yo Ma's place of birth, not year, but does not know the correct city. The small pre-trained model (light blue) neither understands the user's query nor has reliable knowledge, assigning high probability to the (correct) year of birth of Yo-Yo Ma and to both possible places of birth. Their ratio represents the behavior of following user intent (responding only with locations). By reweighting the large base model's factually correct conditional (which fails to follow user intent) using the small-scale behavioral change ratio, we emulate what a large-scale fine-tuned model would have said: a factually correct response that also follows the user's intent.
  • Figure 3: Scaling pre-training alone mostly benefits factuality; scaling up fine-tuning alone mostly benefits helpfulness. The bottom group of bars shows that emulating a large fine-tuned model with a small fine-tuned model and a large base model recovers nearly 70% of the factuality gain that the large fine-tuned model achieves over the small fine-tuned model alone. Normalized improvements are averaged across the Llama-1, Llama-2, and Falcon model families and the Anthropic-HH and ELI5 datasets.
  • Figure 4: Normalized improvements in factuality and helpfulness from emulated fine-tuning for prompts from the Anthropic-HH dialogue dataset. Both helpfulness and factuality scores are normalized between the scores of the small fine-tuned model (0.0) and the large fine-tuned model (1.0). Up-scaling (bottom row) combines the behavioral adjustments from fine-tuning at small scale with the knowledge gained by pre-training at large scale, and tends to provide more improvement in factuality. Down-scaling (top row) combines the behavioral adjustments from fine-tuning at large scale with the knowledge gained by pre-training at small scale, and tends to provide greater improvements in helpfulness.
  • Figure 5: Dynamically adjusting the desired tradeoff between helpfulness and harmlessness without retraining. We use EFT to interpolate between two implicit rewards for helpfulness and harmlessness and plot GPT-4-evaluated helpfulness and fraction of responses that are harmful on Anthropic-HH prompts. Combining reward interpolation with up-scaling enables a Pareto improvement in the frontier, all without fine-tuning. Error bars are one standard error.
  • ...and 3 more figures
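
The reward interpolation described in the Figure 5 caption can be sketched in the same style. This is a minimal illustration under stated assumptions: each fine-tuned model's implicit reward is taken to be its log-probability ratio against a shared pre-trained reference model, and the two rewards are mixed linearly with a hypothetical weight `lam`. The function names, the two-token vocabulary, and the probabilities are invented for the example and do not come from the paper.

```python
import math

def log_softmax(logits):
    """Normalize a list of logits into log-probabilities."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]

def interpolate_eft(base_logits, ref_logits, helpful_logits, harmless_logits, lam):
    """Next-token probabilities under a mixed implicit reward.

    Assumed rule: log p = log p_base
                          + lam * (log p_helpful - log p_ref)
                          + (1 - lam) * (log p_harmless - log p_ref) - log Z
    lam = 1.0 uses only the helpful delta; lam = 0.0 only the harmless delta.
    """
    base = log_softmax(base_logits)
    ref = log_softmax(ref_logits)
    helpful = log_softmax(helpful_logits)
    harmless = log_softmax(harmless_logits)
    combined = [
        b + lam * (h - r) + (1.0 - lam) * (s - r)
        for b, r, h, s in zip(base, ref, helpful, harmless)
    ]
    return [math.exp(v) for v in log_softmax(combined)]

# Hypothetical two-token vocabulary: index 0 = "comply", index 1 = "refuse".
ref = [math.log(p) for p in (0.5, 0.5)]
helpful = [math.log(p) for p in (0.9, 0.1)]
harmless = [math.log(p) for p in (0.2, 0.8)]

# Here the base model equals the reference; a larger base would add up-scaling.
sweep = {lam: interpolate_eft(ref, ref, helpful, harmless, lam)[0]
         for lam in (0.0, 0.5, 1.0)}
print(sweep)
```

Because `lam` is applied only at sampling time, sweeping it traces out a family of behaviors from the same three sets of model outputs, with no retraining.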