Overtrained Language Models Are Harder to Fine-Tune
Jacob Mitchell Springer, Sachin Goyal, Kaiyue Wen, Tanishq Kumar, Xiang Yue, Sadhika Malladi, Graham Neubig, Aditi Raghunathan
TL;DR
This work challenges the assumption that more pre-training data always improves downstream performance. It identifies catastrophic overtraining, in which extended pre-training degrades results after post-training. The evidence is twofold: extensive empirical studies showing real-world degradation after instruction tuning and multimodal fine-tuning, and controlled linear-model analyses showing that sensitivity to parameter updates grows progressively. The authors formalize the phenomenon in a two-layer linear transfer-learning framework, proving that inflection points exist and that degradation is inevitable without regularization. They also show that learning-rate strategies and regularization can delay the effect but do not always eliminate it. The findings call for a reevaluation of pre-training strategies and highlight the need to balance base-model gains against downstream adaptability in practical deployments.
Abstract
Large language models are pre-trained on ever-growing token budgets under the assumption that better pre-training performance translates to improved downstream models. In this work, we challenge this assumption and show that extended pre-training can make models harder to fine-tune, leading to degraded final performance. We term this phenomenon catastrophic overtraining. For example, the instruction-tuned OLMo-1B model pre-trained on 3T tokens performs over 2% worse on multiple standard LLM benchmarks than its counterpart pre-trained on 2.3T tokens. Through controlled experiments and theoretical analysis, we show that catastrophic overtraining arises from a systematic increase in the broad sensitivity of pre-trained parameters to modifications, including but not limited to fine-tuning. Our findings call for a critical reassessment of pre-training design that considers the downstream adaptability of the model.
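The notion of parameter sensitivity mentioned above can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the authors' experimental setup. It uses a toy linear regression model in NumPy, treats random Gaussian noise as the modification (one possible choice among many), and defines sensitivity as the average increase in mean squared error after perturbing the weights. The names `mse_loss` and `perturbation_sensitivity`, and the synthetic data, are hypothetical.

```python
import numpy as np


def mse_loss(w, X, y):
    """Mean squared error of the linear predictor X @ w."""
    residual = X @ w - y
    return float(np.mean(residual ** 2))


def perturbation_sensitivity(w, X, y, sigma, n_trials=200, seed=0):
    """Average increase in loss after adding N(0, sigma^2) noise to w.

    Assumption: Gaussian noise stands in for a generic parameter
    modification; fine-tuning updates would be another kind.
    """
    rng = np.random.default_rng(seed)
    base = mse_loss(w, X, y)
    increases = []
    for _ in range(n_trials):
        noise = rng.normal(0.0, sigma, size=w.shape)
        increases.append(mse_loss(w + noise, X, y) - base)
    return float(np.mean(increases))


# Synthetic regression problem (hypothetical data, for illustration only).
data_rng = np.random.default_rng(0)
X = data_rng.normal(size=(256, 8))
w_true = data_rng.normal(size=8)
y = X @ w_true + 0.1 * data_rng.normal(size=256)

# Least-squares fit plays the role of the "trained" parameters.
w_fit = np.linalg.lstsq(X, y, rcond=None)[0]

small = perturbation_sensitivity(w_fit, X, y, sigma=0.1)
large = perturbation_sensitivity(w_fit, X, y, sigma=0.2)
print(f"sigma=0.1: {small:.4f}  sigma=0.2: {large:.4f}")
```

To study a trend like the one described in the abstract, one would presumably apply a measurement of this kind to checkpoints saved at increasing pre-training token budgets and compare the resulting values at a fixed perturbation scale. In this toy setting the loss is quadratic around the least-squares solution, so doubling `sigma` roughly quadruples the measured sensitivity.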
