An Emulator for Fine-Tuning Large Language Models using Small Language Models
Eric Mitchell, Rafael Rafailov, Archit Sharma, Chelsea Finn, Christopher D. Manning
TL;DR
The paper introduces emulated fine-tuning (EFT), a framework that decouples the scale of pre-training from the scale of fine-tuning in large language models. EFT treats the effect of fine-tuning as a behavior delta (the difference between a fine-tuned model's log-probabilities and those of its pre-trained base) and adds it to the log-probabilities of a pre-trained model that may be of a different size. This enables cross-scale sampling (up-scaling and down-scaling), test-time reward interpolation between helpfulness and harmlessness, and practical techniques such as speculative decoding to accelerate inference. Experiments with the Llama-1, Llama-2, and Falcon families show that scaling pre-training mainly improves factuality, while scaling fine-tuning mainly improves helpfulness; up-scaling yields substantial factuality gains for small fine-tuned models. The work also demonstrates test-time behavior control without retraining and presents conservative decoding strategies that stabilize EFT samples, offering a versatile, computationally efficient way to benefit from large pre-trained models paired with smaller fine-tuned models.
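Written out for a single next token, the behavior-delta idea can be sketched as follows. This assumes all models share one tokenizer and sets the KL-regularization coefficient to 1; the symbols are notation chosen here, not necessarily the paper's: $\pi^{M}_{\text{base}}$ is the pre-trained model at scale M, $\pi^{N}_{\text{ft}}$ is a fine-tuned model at scale N, and $\pi^{N}_{\text{base}}$ is the pre-trained model it was fine-tuned from.

```latex
\log \tilde{\pi}(y_t \mid x, y_{<t})
  = \log \pi^{M}_{\text{base}}(y_t \mid x, y_{<t})
  + \underbrace{\log \pi^{N}_{\text{ft}}(y_t \mid x, y_{<t})
  - \log \pi^{N}_{\text{base}}(y_t \mid x, y_{<t})}_{\text{behavior delta}}
  - \log Z(x, y_{<t}),
\qquad
Z(x, y_{<t}) = \sum_{v} \pi^{M}_{\text{base}}(v \mid x, y_{<t})
  \frac{\pi^{N}_{\text{ft}}(v \mid x, y_{<t})}{\pi^{N}_{\text{base}}(v \mid x, y_{<t})}
```

Under this reading, M > N corresponds to up-scaling (large pre-trained model, small fine-tuned model) and M < N to down-scaling.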
Abstract
Widely used language models (LMs) are typically built by scaling up a two-stage training pipeline: a pre-training stage that uses a very large, diverse dataset of text and a fine-tuning (sometimes, 'alignment') stage that uses targeted examples or other specifications of desired behaviors. While it has been hypothesized that knowledge and skills come from pre-training, and fine-tuning mostly filters this knowledge and skillset, this intuition has not been extensively tested. To aid in doing so, we introduce a novel technique for decoupling the knowledge and skills gained in these two stages, enabling a direct answer to the question, "What would happen if we combined the knowledge learned by a large model during pre-training with the knowledge learned by a small model during fine-tuning (or vice versa)?" Using an RL-based framework derived from recent developments in learning from human preferences, we introduce emulated fine-tuning (EFT), a principled and practical method for sampling from a distribution that approximates (or 'emulates') the result of pre-training and fine-tuning at different scales. Our experiments with EFT show that scaling up fine-tuning tends to improve helpfulness, while scaling up pre-training tends to improve factuality. Beyond decoupling scale, we show that EFT enables test-time adjustment of competing behavioral traits like helpfulness and harmlessness without additional training. Finally, a special case of emulated fine-tuning, which we call LM up-scaling, avoids resource-intensive fine-tuning of large pre-trained models by ensembling them with small fine-tuned models, essentially emulating the result of fine-tuning the large pre-trained model. Up-scaling consistently improves helpfulness and factuality of instruction-following models in the Llama, Llama-2, and Falcon families, without additional hyperparameters or training.
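A minimal sketch of one EFT sampling step, assuming each model's next-token log-probabilities over a shared vocabulary are already available as plain dictionaries. The names `to_logs`, `log_softmax`, and `eft_next_token` and the toy three-token distributions are hypothetical illustrations, not the authors' code. With a single fine-tuned model and weight 1.0 the function performs up-scaling; with two fine-tuned models (e.g. a helpful and a harmless one sharing the same pre-trained reference) and weights `[lam, 1 - lam]` it interpolates their behavior deltas at test time. The paper's conservative decoding strategies and speculative decoding are not modeled here.

```python
import math
from typing import Dict, List

# Hypothetical representation: token -> log-probability over a shared vocabulary.
Dist = Dict[str, float]


def to_logs(probs: Dict[str, float]) -> Dist:
    """Convert a toy probability table into log-probabilities."""
    return {tok: math.log(p) for tok, p in probs.items()}


def log_softmax(scores: Dict[str, float]) -> Dist:
    """Renormalize unnormalized log-scores into log-probabilities (stable log-sum-exp)."""
    m = max(scores.values())
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return {tok: s - log_z for tok, s in scores.items()}


def eft_next_token(base: Dist, fine_tuned: List[Dist], reference: Dist,
                   weights: List[float]) -> Dist:
    """Per-token EFT sketch: base log-probs plus weighted behavior deltas, renormalized.

    base       -- next-token log-probs from the (e.g. large) pre-trained model
    fine_tuned -- next-token log-probs from one or more (e.g. small) fine-tuned models
    reference  -- next-token log-probs from the pre-trained model the fine-tuned ones started from
    weights    -- one weight per fine-tuned model; [1.0] is plain up-scaling,
                  [lam, 1 - lam] interpolates two behavior deltas at test time
    """
    if len(fine_tuned) != len(weights):
        raise ValueError("need exactly one weight per fine-tuned model")
    scores = {}
    for tok, base_lp in base.items():
        # Behavior delta: weighted log-ratio of each fine-tuned model to its reference.
        delta = sum(w * (ft[tok] - reference[tok]) for w, ft in zip(weights, fine_tuned))
        scores[tok] = base_lp + delta
    return log_softmax(scores)


if __name__ == "__main__":
    large_base = to_logs({"a": 0.5, "b": 0.3, "c": 0.2})
    small_ft = to_logs({"a": 0.2, "b": 0.6, "c": 0.2})
    small_base = to_logs({"a": 0.4, "b": 0.4, "c": 0.2})
    upscaled = eft_next_token(large_base, [small_ft], small_base, [1.0])
    print({t: round(math.exp(lp), 4) for t, lp in upscaled.items()})
    # prints {'a': 0.2778, 'b': 0.5, 'c': 0.2222}
```

In the toy example, the small fine-tuned model raises the relative probability of "b" by a factor of 1.5 over its own base, and that ratio is transferred onto the large base model's distribution before renormalizing. Sampling from the returned distribution token by token, and sweeping `lam` when two deltas are combined, is the kind of test-time control the abstract describes.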
