Shared Spatial Memory Through Predictive Coding

Zhengru Fang, Yu Guo, Jingjing Wang, Yuang Zhang, Haonan An, Yinhai Wang, Yuguang Fang

TL;DR

The work tackles how multiple agents build and maintain a shared spatial memory under partial observability and strict bandwidth constraints. It introduces a unified predictive coding framework with three levels: an internal grid-cell–like metric for self-localization, bandwidth-aware social communication via a variational information bottleneck, and a hierarchical policy (HRL-ICM) that actively explores to minimize joint uncertainty. Empirically, grid-cell–like representations spontaneously emerge from self-motion prediction, social place cells arise as partner-sensitive encoders, and bandwidth-efficient communication supports robust cooperative navigation on Memory-Maze under severe bandwidth reductions (e.g., 73.5% success at 128 bits/step down to 64.4% at 4 bits/step, outperforming full-broadcast baselines). The results offer a principled, biologically plausible mechanism by which predictive loss drives the emergence of shared spatial memory and collective intelligence in multi-agent systems.
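As a rough illustration of the variational information bottleneck mentioned above, the sketch below computes a loss of the generic form "prediction error plus $\beta$ times a compression penalty". It assumes a diagonal-Gaussian message encoder and a standard-normal prior, which is the textbook parametrization and not necessarily the one used in the paper; the function names `gaussian_kl_nats`, `vib_loss`, and `nats_to_bits` are hypothetical.

```python
import math


def gaussian_kl_nats(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), in nats.

    Assumption: diagonal-Gaussian encoder and standard-normal prior.
    Averaged over inputs, this term upper-bounds the information the
    message carries about the sender's observation.
    """
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def vib_loss(prediction_error, mu, log_var, beta):
    """Generic VIB objective: task error plus beta-weighted compression cost."""
    return prediction_error + beta * gaussian_kl_nats(mu, log_var)


def nats_to_bits(nats):
    """Convert an information quantity from nats to bits."""
    return nats / math.log(2.0)


if __name__ == "__main__":
    mu, log_var = [1.0, -0.5], [0.0, -1.0]
    rate = gaussian_kl_nats(mu, log_var)
    print("rate (bits):", nats_to_bits(rate))
    print("loss at beta=0.1:", vib_loss(0.8, mu, log_var, beta=0.1))
```

A larger `beta` makes the compression term dominate, so an encoder trained under this objective sends shorter messages at the cost of predictive accuracy; a smaller `beta` does the opposite.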

Abstract

Constructing a consistent shared spatial memory is a critical challenge in multi-agent systems, where partial observability and limited bandwidth often lead to catastrophic failures in coordination. We introduce a multi-agent predictive coding framework that formulates coordination as the minimization of mutual uncertainty among agents. Through an information bottleneck objective, this framework prompts agents to learn not only who and what to communicate but also when. At the foundation of this framework lies a grid-cell-like metric as internal spatial coding for self-localization, emerging spontaneously from self-supervised motion prediction. Building upon this internal spatial code, agents gradually develop a bandwidth-efficient communication mechanism and specialized neural populations that encode partners' locations, an artificial analogue of hippocampal social place cells (SPCs). These social representations are further utilized by a hierarchical reinforcement learning policy that actively explores to reduce joint uncertainty. On the Memory-Maze benchmark, our approach shows exceptional resilience to bandwidth constraints: success degrades gracefully from 73.5% to 64.4% as bandwidth shrinks from 128 to 4 bits/step, whereas a full-broadcast baseline collapses from 67.6% to 28.6%. Our findings establish a theoretically principled and biologically plausible basis for how complex social representations emerge from a unified predictive drive, leading to collective intelligence.
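To make the bandwidth comparison concrete, the short calculation below turns the four success rates quoted in the abstract into absolute and relative drops. It uses only the numbers stated above; the helper name `degradation` is hypothetical.

```python
def degradation(start_pct, end_pct):
    """Return (absolute drop in percentage points, relative drop in percent)."""
    absolute = start_pct - end_pct
    relative = 100.0 * absolute / start_pct
    return absolute, relative


# Success rates at 128 bits/step and at 4 bits/step, as quoted in the abstract.
ours = degradation(73.5, 64.4)
full_broadcast = degradation(67.6, 28.6)
bandwidth_factor = 128 // 4  # bandwidth shrinks by this factor

print(f"bandwidth reduced {bandwidth_factor}x")
print(f"proposed method: -{ours[0]:.1f} points ({ours[1]:.1f}% relative)")
print(f"full broadcast:  -{full_broadcast[0]:.1f} points ({full_broadcast[1]:.1f}% relative)")
# prints:
# bandwidth reduced 32x
# proposed method: -9.1 points (12.4% relative)
# full broadcast:  -39.0 points (57.7% relative)
```

Under a 32-fold bandwidth cut, the proposed method loses roughly an eighth of its success rate, while the full-broadcast baseline loses more than half.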

Paper Structure

This paper contains 63 sections, 15 theorems, 75 equations, 17 figures, 13 tables.

Key Result

Lemma 1

The rotation matrices satisfy: (i) composition law $R(\alpha)R(\gamma) = R(\alpha + \gamma)$, (ii) identity $R(0) = I_2$, (iii) inverse $R(\alpha)^{-1} = R(-\alpha) = R(\alpha)^\top$, and (iv) orthogonality $R(\alpha)^\top R(\alpha) = I_2$, implying norm preservation $\|R(\alpha)\mathbf{v}\| = \|\mathbf{v}\|$.
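The properties in Lemma 1 can be checked numerically. The sketch below assumes the standard counterclockwise 2D rotation matrix $R(\alpha) = \begin{pmatrix}\cos\alpha & -\sin\alpha\\ \sin\alpha & \cos\alpha\end{pmatrix}$ (the paper's Definition 2 is not reproduced in this overview, so this form is an assumption) and uses hypothetical helper names throughout.

```python
import math

IDENTITY = ((1.0, 0.0), (0.0, 1.0))


def rotation(alpha):
    """2x2 counterclockwise rotation matrix R(alpha) as nested tuples."""
    c, s = math.cos(alpha), math.sin(alpha)
    return ((c, -s), (s, c))


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def transpose(a):
    return tuple(tuple(a[j][i] for j in range(2)) for i in range(2))


def apply(a, v):
    """Matrix-vector product for a 2x2 matrix and a 2-vector."""
    return tuple(sum(a[i][k] * v[k] for k in range(2)) for i in range(2))


def close(a, b, tol=1e-9):
    return all(abs(a[i][j] - b[i][j]) <= tol for i in range(2) for j in range(2))


def check_lemma(alpha, gamma, v, tol=1e-9):
    """Evaluate properties (i)-(iv) and norm preservation for given inputs."""
    r_a, r_g = rotation(alpha), rotation(gamma)
    return {
        "composition": close(matmul(r_a, r_g), rotation(alpha + gamma), tol),
        "identity": close(rotation(0.0), IDENTITY, tol),
        "inverse": close(matmul(r_a, rotation(-alpha)), IDENTITY, tol)
        and close(rotation(-alpha), transpose(r_a), tol),
        "orthogonality": close(matmul(transpose(r_a), r_a), IDENTITY, tol),
        "norm_preserved": abs(math.hypot(*apply(r_a, v)) - math.hypot(*v)) <= tol,
    }


if __name__ == "__main__":
    print(check_lemma(alpha=0.7, gamma=-1.9, v=(3.0, 4.0)))
```

A numerical check over a few angles is not a proof, but it is a quick way to catch sign or transpose mistakes when implementing pose updates that rely on these identities.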

Figures (17)

  • Figure 1: Overview of the predictive coding framework for sharing spatial memory. a, The multi-agent cooperative navigation task. Multiple agents, each with egocentric vision input, explore a 3D environment to find a hidden target. They coordinate by building and sharing a 2D bird's-eye-view (BEV) map via learned, emergent symbols. b, The single-agent spatial memory module. This module consists of two streams. The left stream, a Grid Cell Network, functions as an LSTM-based path integrator that processes the agent's motion state to estimate its pose. Its bottleneck layer spontaneously develops hexagonal activation patterns, mimicking biological grid cells. The right stream uses a Transformer-based network to generate a BEV map from visual inputs. The pose information from the path integrator is then used to accurately register the BEV map, constructing a coherent spatial memory of the maze layout. c, The agent's decision-making process via shared spatial memory. This process is divided into communication and action decisions. The communication decision is managed by an information bottleneck that adaptively adjusts data compression. Crucially, as this process must account for social peers, the network architecture gives rise to emergent social place cell-like activations. The action decision is handled by a hierarchical framework where a meta controller, trained with a multi-agent proximal policy optimization (MAPPO) algorithm and guided by an enhanced intrinsic curiosity module (ICM), directs a low-level planner to navigate toward regions that reduce the uncertainty of spatial memory.
  • Figure 2: Grid-cell-like representations enhance robust global BEV mapping. a, 2D spatial firing rate maps of learned grid-like representations. Each panel shows activity of a single unit in the path integrator LSTM network, labeled by artificial neuron index (#) and gridness score ($G_{60}$, range: -2 to +2; higher values indicate stronger hexagonal symmetry). Heat maps represent normalized firing rates across the 2D environment. b, Spatial autocorrelograms (SACs) of representative grid-like units reveal hexagonal symmetry. Each circular plot shows the autocorrelation structure of the corresponding unit's firing pattern from panel a, with the six-fold rotational symmetry characteristic of biological grid cells. Unit labels and $G_{60}$ scores match panel a. c, The convergence of loss functions during grid cell network training. d, As training proceeds, top unit gridness ($G_{60}$) increases while path integration error decreases, demonstrating the co-emergence of structured representations and predictive accuracy. e, Ablation study shows the full model achieves lower trajectory error and higher prediction confidence than the variant without the grid scaffold. f, Comparison of BEV maps reconstructed by different configurations.
  • Figure 3: An efficient, structured, and intelligent communication mechanism emerges from a predictive objective. a, Intelligent communication strategies emerge, with message frequency (heatmaps) concentrated at critical decision points like coordination hubs or dead ends, demonstrating strategic triggering. b, The emergent communication mechanism is highly bandwidth-efficient, consistently requiring the lowest communication overhead across diverse maze types when compared to full and periodic broadcast baselines. c, The communication mechanism is theoretically controllable via the information bottleneck's $\beta$ coefficient, which enables a principled trade-off between message compression (compression ratio) and predictive utility (reconstruction accuracy). d, An emergent symbolic vocabulary is grounded in strategic contexts. A t-SNE visualization reveals distinct symbol clusters corresponding to high-level navigational situations, such as encountering a "Three-way Fork" or discovering the "Target". e, Communication causally influences decision-making. In a controlled scenario where one agent faces a choice between an unexplored path (A) and a known one (B), communication from its partner allows it to identify Path A as the more informative route. f, The behavioral impact is statistically significant. Violin plots quantifying choices at two-way forks show a significantly higher probability of selecting the unexplored path with communication ("w Comm.") compared to the no-communication baseline ("No Comm."). Internal box plots indicate the median (center line) and interquartile range (white box); whiskers denote 1.5$\times$IQR. ($^{***}p < 0.001$, two-sided $t$-test; $n = 150$ per condition).
  • Figure 4: Predictive learning forges a functionally specialized social place code. a, Model architecture. Observer ($S_1$) and partner ($S_2$) states are processed through a bottleneck layer and relational head. The network is trained by back-propagating predictive error from self, partner, and social outputs. b, Functionally distinct neuron types. Three artificial neuron types in the relational head: Pure Place Cells (self-location encoding, Unit 40), Pure SPCs (partner-location encoding, Units 29, 36), and Special SPCs (mixed selectivity, Unit 8). Heatmaps show normalized firing rates. c, Population code for inter-agent distance. Top panels show 2D maps and 1D tuning curves for neurons selective for close-, mid-, and far-distances. Bottom right, a heatmap of all distance-tuned neurons reveals a "tiling" of the distance space. d, Quantitative functional dissociation. Top, bar plots show high mutual information (MI) with self-position for Place Cells and with partner-position for SPCs. Bottom, a scatter plot of self MI vs. partner MI reveals specialized cell clusters. e, Causal necessity of SPCs demonstrated via in-silico lesioning. Left, targeted lesioning of relational-layer distance-selective SPCs ($n=20$ units) specifically impairs distance prediction compared to the intact model and random lesion controls (two-tailed unpaired $t$-test; $n=50$ test episodes per condition). Right, targeted lesioning of LSTM-layer SPCs impairs position prediction. Violin plots show full data distribution; *$p<0.05$, **$p<0.01$, ***$p<0.001$. f, Co-evolution of performance and specialization. Left, trajectory predictions are accurate for the trained network but poor for untrained or SPC-lesioned networks. Right, validation loss decreases over training epochs as the proportion of specialized Place Cells and SPCs increases.
  • Figure 5: HRL-ICM framework achieves superior and robust cooperative performance. a, Architecture of the hierarchical reinforcement learning with intrinsic curiosity module (HRL-ICM). The ICM embodies the Level 3 predictive objective: it generates an intrinsic reward based on the agent's inability to predict the consequences of its actions. This "prediction error" signal guides the high-level Meta Controller to select goals that maximally reduce uncertainty, which are then executed by a Low-level Planner. b, Superior success rates and efficiency across 10,000 random mazes. c, High performance maintained in a central coordination maze. d, Robustness demonstrated in a deceptive maze with numerous dead ends. e, Ablation analysis confirms that each predictive component (grid-cells, social cells, communication) is critical for performance, with communication being indispensable. f, The framework scales effectively as agent count increases, outperforming baseline strategies that suffer from performance degradation. g, Exceptional bandwidth robustness: our method's success rate degrades only slightly as bandwidth shrinks, while the "Full Broadcast" baseline's performance collapses.
  • ...and 12 more figures
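The gridness score $G_{60}$ quoted in the Figure 2 caption, with its range of -2 to +2, can be illustrated with the definition commonly used in the grid-cell literature: correlate a unit's spatial autocorrelogram (SAC) with rotated copies of itself, then subtract the largest correlation at 30°, 90°, and 150° from the smallest correlation at 60° and 120°. Whether the paper uses exactly this variant is an assumption; the function name and the input format (a mapping from rotation angle to Pearson correlation) are hypothetical.

```python
ON_PEAK_ANGLES = (60, 120)       # rotations at which a hexagonal pattern realigns
OFF_PEAK_ANGLES = (30, 90, 150)  # rotations at which a hexagonal pattern misaligns


def gridness_score(rotation_corr):
    """Gridness from SAC rotation correlations.

    rotation_corr maps a rotation angle in degrees to the Pearson correlation
    between the SAC and its copy rotated by that angle. Because each
    correlation lies in [-1, 1], the score lies in [-2, 2].
    """
    for angle in ON_PEAK_ANGLES + OFF_PEAK_ANGLES:
        if not -1.0 <= rotation_corr[angle] <= 1.0:
            raise ValueError(f"correlation at {angle} degrees is outside [-1, 1]")
    on_peak = min(rotation_corr[a] for a in ON_PEAK_ANGLES)
    off_peak = max(rotation_corr[a] for a in OFF_PEAK_ANGLES)
    return on_peak - off_peak


if __name__ == "__main__":
    # Idealized hexagonal unit: realigns at 60/120, anti-correlates elsewhere.
    hexagonal = {30: -1.0, 60: 1.0, 90: -1.0, 120: 1.0, 150: -1.0}
    # Idealized square-lattice unit: realigns at 90 instead.
    square = {30: -0.2, 60: -0.5, 90: 1.0, 120: -0.5, 150: -0.2}
    print(gridness_score(hexagonal))  # 2.0
    print(gridness_score(square))     # -1.5
```

Computing the rotation correlations themselves requires rotating the 2D autocorrelogram, usually within an annulus around the central peak; that step is omitted here to keep the sketch dependency-free.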

Theorems & Definitions (40)

  • Definition 1: Pose and state space
  • Definition 2: Rotation Matrix
  • Lemma 1: Rotation matrix properties
  • proof
  • Definition 3: Path integration dynamics [gao2021path_supp]
  • Remark: Position encoding vs. full pose
  • Definition 4: Prediction objective
  • Definition 5: Rigid body transformation
  • Proposition 1: Equivariance of physical dynamics
  • proof
  • ...and 30 more