Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space

Core Francisco Park; Maya Okawa; Andrew Lee; Hidenori Tanaka; Ekdeep Singh Lubana

TL;DR

The paper introduces the concept space framework to study how generative models acquire and manipulate abstract concepts, defining concept signal σ_i = |∂G(z)/∂z_i| as the driver of learning speed and revealing sudden transitions that mark the emergence of hidden capabilities. Through synthetic toy datasets and diffusion models, it demonstrates that stronger concept signals accelerate concept learning and shape generalization trajectories, with an initial memorization phase giving way to true out-of-distribution competence after a transition point. It further shows that latent interventions can elicit these hidden capabilities before naive prompting succeeds, and that underspecification can delay or entangle concept learning, a pattern that also appears in realistic data like CelebA. The framework offers a lens for benchmarking, understanding, and potentially guiding training to uncover latent generalization abilities, while acknowledging limitations and directions for extension to more complex concept structures and real-world tasks.
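The concept signal σ_i = |∂G(z)/∂z_i| can be estimated numerically for any data-generating process that can be evaluated at nearby concept values. The sketch below is a minimal illustration, not the paper's implementation. It assumes a hypothetical toy generator `toy_generator` that renders a soft-edged disc from two concept values (size and color intensity), and it assumes that |·| is taken as the Euclidean norm over pixels. The derivative is approximated with central finite differences. The names `toy_generator`, `concept_signal`, `res`, and `eps` are illustrative and do not come from the paper.

```python
import numpy as np


def toy_generator(z, res=32):
    """Hypothetical data-generating process G(z) (an assumption for illustration).

    z[0]: disc radius as a fraction of the image width (the "size" concept).
    z[1]: pixel intensity of the disc (the "color" concept).
    Returns a (res, res) grayscale image.
    """
    size, color = z
    ys, xs = np.mgrid[0:res, 0:res]
    center = (res - 1) / 2.0
    dist = np.sqrt((xs - center) ** 2 + (ys - center) ** 2)
    # Soft (sigmoid) edge so that the image varies smoothly with size.
    mask = 1.0 / (1.0 + np.exp(dist - size * res))
    return color * mask


def concept_signal(G, z, eps=1e-3):
    """Estimate sigma_i = |dG(z)/dz_i| for every concept axis i.

    Uses central finite differences and the Euclidean norm over all
    output entries (the choice of norm is an assumption).
    """
    z = np.asarray(z, dtype=float)
    signals = np.zeros(z.size)
    for i in range(z.size):
        step = np.zeros(z.size)
        step[i] = eps
        derivative = (np.asarray(G(z + step)) - np.asarray(G(z - step))) / (2 * eps)
        signals[i] = np.linalg.norm(derivative)
    return signals


if __name__ == "__main__":
    size_signal, color_signal = concept_signal(toy_generator, [0.25, 0.8])
    print(f"size signal:  {size_signal:.3f}")
    print(f"color signal: {color_signal:.3f}")
```

Under this reading, a concept whose change moves the pixels more has a larger signal, which is the quantity the summary above links to faster learning. Rescaling a concept axis (for example, widening the color separation between classes) changes its signal proportionally.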

Abstract

Modern generative models demonstrate impressive capabilities, likely stemming from an ability to identify and manipulate abstract concepts underlying their training data. However, fundamental questions remain: what determines the concepts a model learns, the order in which it learns them, and its ability to manipulate those concepts? To address these questions, we propose analyzing a model's learning dynamics via a framework we call the concept space, where each axis represents an independent concept underlying the data generating process. By characterizing learning dynamics in this space, we identify how the speed at which a concept is learned, and hence the order of concept learning, is controlled by properties of the data we term concept signal. Further, we observe moments of sudden turns in the direction of a model's learning dynamics in concept space. Surprisingly, these points precisely correspond to the emergence of hidden capabilities, i.e., where latent interventions show the model possesses the capability to manipulate a concept, but these capabilities cannot yet be elicited via naive input prompting. While our results focus on synthetically defined toy datasets, we hypothesize a general claim on emergence of hidden capabilities may hold: generative models possess latent capabilities that emerge suddenly and consistently during training, though a model might not exhibit these capabilities under naive input prompting.

Paper Structure (34 sections, 2 equations, 23 figures)

Introduction
Related Work
Concept Space: A Framework for Analyzing Concept Learning Dynamics
Experimental and Evaluation Setup
Evaluation Metric.
Learning Dynamics in Concept Space
Concept Signal Determines Learning Speed
Concept Signal Governs Generalization Dynamics
Towards a Landscape Theory of Learning Dynamics
Sudden Transitions in Concept Learning Dynamics
Additional Results on Realistic Data
Effect of Underspecification on Learning Dynamics in Concept Space
Influence of Underspecification on Emergence of Hidden Capabilities.
Discussion
Concept learning vs. grokking
...and 19 more sections

Figures (23)

Figure 1: Concept Learning Geometry underlies emergence. (a) Top: A multimodal model learns to generate the concepts in the order of “astronaut”, “horse”, and finally “riding” as it scales up (adapted from Yu et al. [yu2022scaling]). Middle: “blue square apple” is generated in the order of “apple”, “blue”, and “square” (adapted from Li et al. [li2024scalability]). Bottom: Despite its simplicity, our model trained on synthetic data shows concept learning dynamics where it first learns “shape” and then “color”. (b) Concept space is an abstract coordinate space where individual axes correspond to different concepts and a given point corresponds to a “concept class”, i.e., a predefined collection of concepts (e.g., large blue circles in the bottom left corner). Traversal along the axes of the concept space yields a change in a specific property of the sample (e.g., going from a large blue circle to a large red circle along the object color axis). Trajectories show a model's dynamics in concept space for learning to generate classes shown in-distribution (blue nodes) versus out-of-distribution (pink / red nodes). As we show, dynamics in concept space are highly interpretable, enabling precise comments on which concepts the model learns first, why, and what order it follows. (c) Measuring how accurately a model generates samples from a given concept class, showing an order of concept learning: first background color is learned, then size, and then object color.
Figure 2: Concept spaces with different concept values see different concept signal. Consider a concept space comprised of concepts size and color. (Left) The color separation between the classes is stronger than the size separation, resulting in a stronger concept signal in the color dimension. (Right) The size separation between the classes is stronger, thus resulting in a stronger signal for size.
Figure 3: Concept signal determines learning speed. The speed of concept learning, measured as the inverse of the learning time in gradient steps, as the separation in color (left) and size (right) between different classes increases. Concept learning is faster when the pixel differences among concept classes, and hence among concepts, are larger.
Figure 4: Concept signal governs generalization dynamics. (a) Learning dynamics in the concept space for in-distribution concept class 00 (bottom left). (b) Learning dynamics for out-of-distribution (OOD) concept class 11 (top right). We plot the accuracy for color on the x-axis and size on the y-axis. The [0,1) normalized color concept signal level is color-coded. Two trajectories for 01 and 10 are shown to illustrate concept memorization. See App. \ref{sec:error_quant} for uncertainties.
Figure 5: A Phenomenological Model of Learning Dynamics in Concept Space. Using Eq. \ref{eq:lagrangian}, we simulate the learning trajectory for concept class 00 in panel (a) and the OOD class 11 in panel (b). Initially, target values are set at $(0, 1)$ or $(1, 0)$ based on the concept signal strengths for color or size, respectively. As the model progressively learns each concept, the target values gradually shift towards $(1, 1)$. This simple toy model accurately reproduces the observed curves in Fig. \ref{fig:learning_times}(c), which arise from concept memorization.
...and 18 more figures

Theorems & Definitions (3)

Definition 1
Definition 2
Definition 3
