Swing-by Dynamics in Concept Learning and Compositional Generalization
Yongyi Yang, Core Francisco Park, Ekdeep Singh Lubana, Maya Okawa, Wei Hu, Hidenori Tanaka
TL;DR
This work addresses how diffusion models learn compositional concepts and generalize to unseen, out-of-distribution compositions. It introduces the Structured Identity Mapping (SIM) task as a tractable abstraction for studying concept-learning dynamics: a regression task in which a model learns the identity mapping on Gaussian-cluster data. Through theoretical analysis of one-layer and symmetric two-layer linear models, it explains the order in which concepts are learned, the terminal slow-down of learning, and a novel phenomenon termed Swing-by Dynamics, connecting the stage-wise evolution of the model Jacobian to these observed phenomena. The authors validate key predictions by training diffusion models on concept-space tasks, showing non-monotonic loss trajectories and exponential deceleration of concept-space learning, thereby bridging theory and practice in compositional generalization. Overall, SIM provides a mechanistic lens on how modern generative models acquire and manipulate concepts, with implications for designing models that generalize compositionally to novel data.
Abstract
Prior work has shown that text-conditioned diffusion models can learn to identify and manipulate primitive concepts underlying a compositional data-generating process, enabling generalization to entirely novel, out-of-distribution compositions. Beyond performance evaluations, these studies develop a rich empirical phenomenology of learning dynamics, showing that models generalize sequentially, respecting the compositional hierarchy of the data-generating process. Moreover, concept-centric structures within the data significantly influence the speed at which a model learns to manipulate a concept. In this paper, we aim to better characterize these empirical results from a theoretical standpoint. Specifically, we propose an abstraction of the compositional generalization problem studied in prior work by introducing a structured identity mapping (SIM) task, in which a model is trained to learn the identity mapping on a Gaussian mixture with structurally organized centroids. We mathematically analyze the learning dynamics of neural networks trained on this SIM task and show that, despite its simplicity, SIM's learning dynamics capture and help explain key empirical observations on compositional generalization with diffusion models identified in prior work. Our theory also offers several new insights; for example, we identify a novel mechanism for non-monotonic learning dynamics of the test loss in early phases of training. We validate our new predictions by training a text-conditioned diffusion model, bridging our simplified framework and complex generative models. Overall, this work establishes the SIM task as a meaningful theoretical abstraction of concept learning dynamics in modern generative models.
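
As a rough illustration of the SIM setup described above, the following sketch samples a Gaussian mixture whose centroids are organized along concept axes and trains a one-layer linear model to reproduce its input. It is a minimal sketch under stated assumptions, not the paper's experimental code: the placement of one centroid per coordinate axis, the values of `mus` and `sigmas`, the zero initialization, the step size, the choice of out-of-distribution test point, and the helper names `make_sim_data` and `train_identity` are all hypothetical choices made for illustration.

```python
import numpy as np


def make_sim_data(mus, sigmas, n_per_cluster, rng):
    """Sample one Gaussian cluster per concept axis (assumed layout).

    Cluster p is centred at mus[p] * e_p with isotropic std sigmas[p].
    """
    dim = len(mus)
    clusters = []
    for p in range(dim):
        center = np.zeros(dim)
        center[p] = mus[p]
        noise = sigmas[p] * rng.standard_normal((n_per_cluster, dim))
        clusters.append(center + noise)
    return np.concatenate(clusters, axis=0)


def train_identity(X, steps, lr):
    """Gradient descent on L(W) = (1/2n) * sum_i ||W x_i - x_i||^2, W_0 = 0."""
    n, dim = X.shape
    second_moment = X.T @ X / n  # empirical second-moment matrix of the data
    W = np.zeros((dim, dim))
    history = []
    for _ in range(steps):
        # Gradient of L with respect to W is (W - I) @ second_moment.
        grad = (W - np.eye(dim)) @ second_moment
        W = W - lr * grad
        history.append(W.copy())
    return W, history


rng = np.random.default_rng(0)
mus = [2.0, 1.0]      # assumed centroid distances along each concept axis
sigmas = [0.5, 0.5]   # assumed per-cluster standard deviations
X = make_sim_data(mus, sigmas, n_per_cluster=500, rng=rng)
W, history = train_identity(X, steps=500, lr=0.1)

# Assumed OOD test point: both concepts active at once, which is not the
# centre of any training cluster.
x_ood = np.array(mus)
ood_error = [float(np.sum((Wt @ x_ood - x_ood) ** 2)) for Wt in history]

print("diagonal of W after 5 steps:", np.diag(history[4]))
print("final OOD squared error:", ood_error[-1])
```

In this sketch the axis with the larger centroid distance is fit faster in the early steps, which is one simple way that data structure can affect the order and speed of learning. It does not attempt to reproduce the symmetric two-layer analysis or the Swing-by behaviour.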
