Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space
Core Francisco Park, Maya Okawa, Andrew Lee, Hidenori Tanaka, Ekdeep Singh Lubana
TL;DR
The paper introduces the concept space framework for studying how generative models learn to identify and manipulate abstract concepts. It defines the concept signal σ_i = |∂G(z)/∂z_i|, the sensitivity of the data-generating process G to changes in the i-th concept variable z_i, and shows that this quantity controls how quickly each concept is learned. Experiments with diffusion models on synthetic toy datasets show that stronger concept signals speed up concept learning and shape the trajectory of generalization: an initial memorization phase gives way to out-of-distribution generalization after a sudden transition point, which marks the emergence of a hidden capability. Latent interventions can elicit these capabilities before naive prompting succeeds, and underspecification can delay or entangle concept learning, a pattern that also appears on realistic data such as CelebA. The authors present the framework as a lens for benchmarking, understanding, and potentially guiding training to uncover latent generalization abilities, and they note its limitations and the need to extend it to more complex concept structures and real-world tasks.
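To make the concept signal definition concrete, the following is a minimal sketch that estimates σ_i numerically for a toy data-generating process. Everything in it is an illustrative assumption rather than the authors' code: the function `toy_generator`, its two concepts (color and size), the scale parameters, and the choice to read |·| as a Euclidean norm over the output vector.

```python
import numpy as np

def toy_generator(z, color_scale=2.0, size_scale=0.5):
    """Hypothetical data-generating process G: concept vector z -> observation.

    z[0] is a "color" concept and z[1] is a "size" concept (assumed names).
    Each concept moves one output coordinate, scaled by its own factor.
    """
    color, size = z
    return np.array([color_scale * color, size_scale * size])

def concept_signal(G, z, i, eps=1e-5):
    """Central finite-difference estimate of sigma_i = |dG(z)/dz_i|.

    The absolute value is interpreted here as the Euclidean norm of the
    partial-derivative vector (an assumption made for this sketch).
    """
    z = np.asarray(z, dtype=float)
    step = np.zeros_like(z)
    step[i] = eps
    grad_i = (G(z + step) - G(z - step)) / (2 * eps)
    return float(np.linalg.norm(grad_i))

# Evaluate the signal of each concept at an arbitrary point in concept space.
z0 = np.array([0.3, 0.7])
signals = [concept_signal(toy_generator, z0, i) for i in range(len(z0))]
print(signals)
```

In this toy setup the color concept has a larger signal than the size concept because the observation changes more per unit change in color. Under the paper's claim, the concept with the larger signal would be expected to be learned first.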
Abstract
Modern generative models demonstrate impressive capabilities, likely stemming from an ability to identify and manipulate abstract concepts underlying their training data. However, fundamental questions remain: what determines the concepts a model learns, the order in which it learns them, and its ability to manipulate those concepts? To address these questions, we propose analyzing a model's learning dynamics via a framework we call the concept space, where each axis represents an independent concept underlying the data generating process. By characterizing learning dynamics in this space, we identify how the speed at which a concept is learned, and hence the order of concept learning, is controlled by properties of the data we term concept signal. Further, we observe moments of sudden turns in the direction of a model's learning dynamics in concept space. Surprisingly, these points precisely correspond to the emergence of hidden capabilities, i.e., where latent interventions show the model possesses the capability to manipulate a concept, but these capabilities cannot yet be elicited via naive input prompting. While our results focus on synthetically defined toy datasets, we hypothesize a general claim on emergence of hidden capabilities may hold: generative models possess latent capabilities that emerge suddenly and consistently during training, though a model might not exhibit these capabilities under naive input prompting.
