U-shaped and Inverted-U Scaling behind Emergent Abilities of Large Language Models
Tung-Yu Wu, Pei-Yu Lo
TL;DR
This work analyzes emergent abilities in large language models by grouping questions by difficulty and observing distinct scaling trends: inverted-U scaling followed by steady improvement for easy questions, and U-shaped scaling for hard ones. The authors introduce the Target-Conditioned (TC) Brier Score to measure performance continuously and define an emergence threshold $T$ at which sharp improvements occur. They propose Slice-and-Sandwich, a pipeline that fits separate easy and hard scaling trends below $T$, averages them, and maps the forecast back to traditional accuracy metrics to predict post-threshold performance. Across multiple benchmarks, this approach forecasts emergent behavior more accurately than sigmoid-based baselines and offers an explainable framework for anticipating sharp performance increases. The method is intended for monitoring and forecasting LLM capabilities, with potential applications in deployment planning and safety assessment.
Abstract
Large language models (LLMs) have been shown to exhibit emergent abilities in some downstream tasks, where model performance stagnates at first and then improves sharply and unpredictably with scale beyond a threshold. In this work, we investigate the phenomenon by grouping questions based on difficulty level and provide a possible explanation for emergent abilities. Specifically, we observe U-shaped scaling for hard questions and inverted-U scaling followed by steady improvement for easy questions. The two scaling patterns initially offset each other, causing stagnant overall performance. The performance starts to soar when the scaling pattern of easy questions reverts from inverse to standard scaling, leading to emergent abilities. Based on this finding, we propose a simple yet effective pipeline, called Slice-and-Sandwich, to predict the emergence threshold and model performance beyond the threshold. Our code is publicly available at https://github.com/tony10101105/ExpEmergence.
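The idea of fitting the two difficulty groups separately below the threshold and then combining their forecasts can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). The synthetic curves, the polynomial degrees, the equal-weight average, the threshold value, and all function names below are assumptions chosen only for illustration, and the step that maps the forecast back to accuracy is omitted.

```python
import numpy as np


def fit_and_forecast(x_train, y_train, x_all, degree):
    """Fit a polynomial on the pre-threshold points and evaluate it everywhere."""
    coeffs = np.polyfit(x_train, y_train, degree)
    return np.polyval(coeffs, x_all)


def slice_and_sandwich_sketch(x, easy, hard, threshold,
                              easy_degree=3, hard_degree=2):
    """Hypothetical sketch: fit each difficulty group below `threshold`,
    extrapolate both fits over all of `x`, and average the two forecasts.
    The degrees and the equal weighting are illustrative assumptions."""
    below = x < threshold
    easy_hat = fit_and_forecast(x[below], easy[below], x, easy_degree)
    hard_hat = fit_and_forecast(x[below], hard[below], x, hard_degree)
    return 0.5 * (easy_hat + hard_hat)


# Synthetic, made-up scores (higher is better) on a log-scale axis.
x = np.linspace(0.0, 4.0, 41)

# Easy group: rises, dips, then improves again (inverted-U, then recovery).
easy = 0.6 + 0.04 * (x**3 / 3 - 1.75 * x**2 + 2.5 * x)

# Hard group: U-shaped, first declining and then improving.
hard = 0.3 + 0.05 * (x - 1.5) ** 2

# Assumed emergence threshold on the same axis; only points below it are fitted.
forecast = slice_and_sandwich_sketch(x, easy, hard, threshold=3.0)
overall = 0.5 * (easy + hard)

print("max abs forecast error:", float(np.max(np.abs(forecast - overall))))
```

Because the synthetic curves are exact polynomials, the extrapolation recovers the post-threshold trend almost perfectly here. Real benchmark scores are noisy, so forecasts on them would not match this closely.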
