Emergent Abilities in Large Language Models: A Survey
Leonardo Berti, Flavio Giorgi, Gjergji Kasneci
TL;DR
This survey analyzes emergent abilities in large language models, tracing how scale, training dynamics, and prompting shape abrupt, task-specific capabilities whose existence and predictability are debated. It synthesizes evidence on metrics, loss dynamics, quantization, task complexity, and implicit representations, while documenting the rise of Large Reasoning Models and LLM-powered agents. The authors propose a taxonomy to organize origins, manifestations, and mitigation strategies, and highlight significant safety and governance implications of emergent, potentially deceptive or manipulative behaviors. They argue for more robust evaluation frameworks and targeted research into prediction, mitigation, and responsible deployment of increasingly capable AI systems.
Abstract
Large Language Models (LLMs) are leading a new technological revolution as one of the most promising research streams toward artificial general intelligence. The scaling of these models, accomplished by increasing the number of parameters and the size of the training datasets, has been linked to various so-called emergent abilities that were previously unobserved. These emergent abilities, ranging from advanced reasoning and in-context learning to coding and problem-solving, have sparked an intense scientific debate: Are they truly emergent, or do they simply depend on external factors, such as training dynamics, the type of problems, or the chosen metric? What underlying mechanism causes them? Despite their transformative potential, emergent abilities remain poorly understood, leading to misconceptions about their definition, nature, predictability, and implications. In this work, we shed light on emergent abilities through a comprehensive review of the phenomenon, addressing both its scientific underpinnings and its real-world consequences. We first critically analyze existing definitions, exposing inconsistencies in how emergent abilities are conceptualized. We then explore the conditions under which these abilities appear, evaluating the role of scaling laws, task complexity, pre-training loss, quantization, and prompting strategies. Our review extends beyond traditional LLMs to include Large Reasoning Models (LRMs), which leverage reinforcement learning and inference-time search to amplify reasoning and self-reflection. However, emergence is not inherently positive. As AI systems gain autonomous reasoning capabilities, they also develop harmful behaviors, including deception, manipulation, and reward hacking. We highlight growing concerns about safety and governance, emphasizing the need for better evaluation frameworks and regulatory oversight.
