Embodied World Models Emerge from Navigational Task in Open-Ended Environments
Li Jin, Liu Jia
TL;DR
The paper investigates whether embodied sensorimotor interaction suffices for the spontaneous emergence of compact world models in artificial agents. By training a gated-recurrent agent in thousands of open-ended 10×10 mazes with sparse rewards and partial observation, the authors cast the closed agent–environment loop as a Hybrid Dynamical System (HDS) and demonstrate stable limit-cycle strategies. They introduce a Ridge Representation that maps entire trajectories into fixed-size behavioral images, and use Canonical Correlation Analysis (CCA) to reveal a high-dimensional linear alignment between neural activations and Ridge-based behavioral geometry. Causal interventions confirm the importance of the most highly correlated neural dimensions. Collectively, the work provides mechanistic evidence that embodied interaction can produce interpretable, transferable spatial representations, and offers a principled toolkit (HDS, Ridge, CCA, and cyclic stimulation) for diagnosing embodied intelligence in navigation agents.
Abstract
Spatial reasoning in partially observable environments has often been approached through passive predictive models, yet theories of embodied cognition suggest that genuinely useful representations arise only when perception is tightly coupled to action. Here we ask whether a recurrent agent, trained solely by sparse rewards to solve procedurally generated planar mazes, can autonomously internalize metric concepts such as direction, distance and obstacle layout. After training, the agent consistently produces near-optimal paths in unseen mazes, behavior that hints at an underlying spatial model. To probe this possibility, we cast the closed agent-environment loop as a hybrid dynamical system, identify stable limit cycles in its state space, and characterize behavior with a Ridge Representation that embeds whole trajectories into a common metric space. Canonical correlation analysis exposes a robust linear alignment between neural and behavioral manifolds, while targeted perturbations of the most informative neural dimensions sharply degrade navigation performance. Taken together, these dynamical, representational, and causal signatures show that sustained sensorimotor interaction is sufficient for the spontaneous emergence of compact, embodied world models, providing a principled path toward interpretable and transferable navigation policies.
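To make the canonical correlation step concrete, the following is a minimal sketch of how a linear alignment between two sets of features can be measured. It is not the authors' code: the data are synthetic, the matrix names `neural` and `behavior` and all dimensions are hypothetical stand-ins for recurrent activations and Ridge-based behavioral features, and the QR-plus-SVD formulation is one standard way to compute canonical correlations that assumes both centered matrices have full column rank.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Canonical correlations between two feature matrices (rows are samples).

    Assumes X and Y have the same number of rows and full column rank
    after centering.
    """
    # Center each feature so correlations are not driven by means.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centered data.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the canonical correlations,
    # returned in descending order.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)


# Synthetic illustration (hypothetical sizes): a 3-dimensional latent
# variable drives both a 16-dimensional "neural" matrix and an
# 8-dimensional "behavior" matrix, each with small additive noise.
rng = np.random.default_rng(0)
n_samples, n_latent = 500, 3
latent = rng.normal(size=(n_samples, n_latent))
neural = latent @ rng.normal(size=(n_latent, 16)) + 0.1 * rng.normal(size=(n_samples, 16))
behavior = latent @ rng.normal(size=(n_latent, 8)) + 0.1 * rng.normal(size=(n_samples, 8))

rho = canonical_correlations(neural, behavior)
print(np.round(rho, 3))
```

In this toy setting the leading canonical correlations reflect the shared latent structure, while the trailing ones reflect only chance alignment. A perturbation analysis in the spirit of the abstract would then target the neural directions associated with the largest correlations, though the specific intervention procedure is not described in this section.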
