Building spatial world models from sparse transitional episodic memories
Zizhan He, Maxime Daigle, Pouya Bashivan
TL;DR
This work asks how a neural model can rapidly construct a spatial world model from sparse episodic memories. The Episodic Spatial World Model (ESWM) is meta-trained across diverse environments to infer unseen transitions from a minimal set of one-step memories, and its latent space closely mirrors the geometry of the environment. ESWM enables near-optimal exploration and navigation in novel spaces, adapts quickly to structural changes through memory edits, and supports planning via imagination and latent-heuristic search. Because the approach decouples memory from reasoning, it allows flexible inference and remains robust in obstacle-rich environments with limited data. These properties are relevant to data-efficient autonomous navigation and cognitively inspired spatial reasoning.
Abstract
Many animals possess a remarkable capacity to rapidly construct flexible mental models of their environments. These world models are crucial for ethologically relevant behaviors such as navigation, exploration, and planning. The ability to form episodic memories and make inferences based on these sparse experiences is believed to underpin the efficiency and adaptability of these models in the brain. Here, we ask: Can a neural network learn to construct a spatial model of its surroundings from sparse and disjoint episodic memories? We formulate the problem in a simulated world and propose a novel framework, the Episodic Spatial World Model (ESWM), as a potential answer. We show that ESWM is highly sample-efficient, requiring minimal observations to construct a robust representation of the environment. It is also inherently adaptive, allowing for rapid updates when the environment changes. In addition, we demonstrate that ESWM readily enables near-optimal strategies for exploring novel environments and navigating between arbitrary points, all without the need for additional training.
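To make the idea of a memory bank of sparse one-step transitions concrete, the following toy sketch stores transitions as (source, action, destination) records on a small grid, plans a route between two points, and then adapts to a new obstacle by editing the memory rather than retraining. This is not the ESWM implementation: ESWM uses a learned network to infer unseen transitions, whereas this sketch substitutes a plain breadth-first search over the stored transitions only. The names `Transition`, `plan`, and `block_cell`, as well as the grid coordinates and action labels, are hypothetical and chosen purely for illustration.

```python
from collections import deque
from typing import List, NamedTuple, Optional, Tuple

Cell = Tuple[int, int]


class Transition(NamedTuple):
    """One episodic memory: a single observed step (hypothetical format)."""
    source: Cell
    action: str
    destination: Cell


def plan(memory: List[Transition], start: Cell, goal: Cell) -> Optional[List[str]]:
    """Breadth-first search over stored transitions only.

    Stand-in for learned inference: it cannot predict transitions
    that are absent from memory, unlike the model described above.
    """
    graph = {}
    for t in memory:
        graph.setdefault(t.source, []).append((t.action, t.destination))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for action, nxt in graph.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [action]))
    return None  # goal unreachable from the stored memories


def block_cell(memory: List[Transition], cell: Cell) -> List[Transition]:
    """Memory edit: drop every transition that enters or leaves `cell`."""
    return [t for t in memory if cell not in (t.source, t.destination)]


# Four one-step memories covering a 2x2 grid (assumed example data).
memory = [
    Transition((0, 0), "E", (1, 0)),
    Transition((1, 0), "N", (1, 1)),
    Transition((0, 0), "N", (0, 1)),
    Transition((0, 1), "E", (1, 1)),
]

route_before = plan(memory, (0, 0), (1, 1))
# An obstacle appears at (1, 0); only the memory changes, not the planner.
edited = block_cell(memory, (1, 0))
route_after = plan(edited, (0, 0), (1, 1))
print(route_before, route_after)
```

In this sketch the planner is untouched when the environment changes; only the memory list is edited, which loosely mirrors the separation of memory and reasoning described above.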
