Egocentric Visual Navigation through Hippocampal Sequences

Xiao-Xiong Lin, Yuk Hoi Yiu, Christian Leibold

TL;DR

The paper investigates how hippocampal theta sequences could emerge from intrinsic CA3 circuitry when driven by highly sparse dentate gyrus inputs. It introduces a minimal, biologically inspired model: a sparsified DG projection feeding a fixed CA3 sequence generator that acts as a shift-register reservoir, coupled to an actor–critic RL learner for egocentric navigation in a vision-based maze. Under sparse input, this CA3-based reservoir outperforms comparable LSTM cores, producing place-field–like tuning, orthogonalized DG representations, and distance-dependent population kernels, while performance degrades for dense inputs where LSTMs excel. The work provides a mechanistic account of hippocampal sequences with tangible RL implications, predicting how sequence length, sparsity, and remapping shape navigation and neural representations in the brain and offering a scalable inductive bias for reinforcement learning in sparse sensory regimes.

Abstract

Sequential activation of place-tuned neurons in an animal during navigation is typically interpreted as reflecting the sequence of input from adjacent positions along the trajectory. More recent theories about such place cells suggest sequences arise from abstract cognitive objectives like planning. Here, we propose a mechanistic and parsimonious interpretation to complement these ideas: hippocampal sequences arise from intrinsic recurrent circuitry that propagates activity without readily available input, acting as a temporal memory buffer for extremely sparse inputs. We implement a minimal sequence generator inspired by neurobiology and pair it with an actor-critic learner for egocentric visual navigation. Our agent reliably solves a continuous maze without explicit geometric cues, with performance depending on the length of the recurrent sequence. Crucially, the model outperforms LSTM cores under sparse input conditions (16 channels, ~2.5% activity), but not under dense input, revealing a strong interaction between representational sparsity and memory architecture. In contrast to LSTM agents, hidden sequence units develop localized place fields, distance-dependent spatial kernels, and task-dependent remapping, while inputs orthogonalize and spatial information increases across layers. These phenomena align with neurobiological data and are causal to performance. Together, our results show that sparse input synergizes with sequence-generating dynamics, providing both a mechanistic account of place cell sequences in the mammalian hippocampus and a simple inductive bias for reinforcement learning based on sparse egocentric inputs in navigation tasks.
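The "temporal memory buffer" idea can be illustrated with a toy shift register. This is a minimal sketch, not the authors' implementation: the function names `step` and `run` are hypothetical, the buffer is binary, and the repetition number and multiple feature channels of the full model are omitted. It only shows how a single sparse input event keeps propagating through a fixed chain of units after the input has gone silent.

```python
def step(state, pulse):
    """Advance a shift-register buffer by one time step.

    state: list of 0/1 activities, one per sequence unit.
    pulse: 0 or 1, the sparse input arriving at this step.
    """
    # Every unit copies its predecessor; the first unit takes the new input.
    return [pulse] + state[:-1]


def run(pulses, length):
    """Return the buffer state after each input in `pulses`."""
    state = [0] * length
    history = []
    for pulse in pulses:
        state = step(state, pulse)
        history.append(state)
    return history


# One input event followed by silence: activity keeps moving down the chain.
history = run([1, 0, 0, 0, 0, 0], length=4)
for t, state in enumerate(history):
    print(t, state)
```

In this sketch the position of the active unit encodes the time elapsed since the input event, and the event is forgotten once it leaves the last unit, which is one way to see why the length of the sequence would limit how far back a downstream learner can look.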

Paper Structure

This paper contains 35 sections, 6 equations, 21 figures, 3 tables.

Figures (21)

  • Figure 1: Theta sequences. A Illustration of theta sequences observed in rodent hippocampus. In each theta cycle, R=3 neurons are activated and the activation propagates in a sequence of $\ell$=5 neurons over L=3 theta cycles (cf. Eq. \ref{eq:S-and-J}). B Current baseline understanding of the theta sequences, driven by sequential inputs despite recurrent connections in CA3. Left: spatial tunings of the sequentially activated cells. C Intrinsic theta sequence hypothesis, a parsimonious account where the recurrent connections support generating long horizon sequential activity without sequential external inputs.
  • Figure 2: Model summary. A Virtual environments ($19\times19$ tiles) were constructed with walls randomly placed on 15% of the tiles. Wall layouts were kept fixed across repeated trials, with an invisible goal (red star) near the bottom right unless otherwise mentioned. At the start of each episode, the agent was placed at a random location at least 5 tiles away from the goal. B The agent receives first-person visual input that is processed by a visual encoder (ResNet, matching the SOTA in the DeepMind Lab environment espeholt2018impala, pretrained and fixed in our experiments; cf. \ref{tab:hipposlam-arch}). C The encoder output was linearly transformed to F features and then sparsified using batch normalization and high thresholding ($\tau=2.43$), such that the percentage of activation ($\sim2.5\%$) matches the sparse activity of neurons in the hippocampal DG that project to the CA3 area. The modules in white are hard-coded or fixed during training. The modules in blue (fully connected layers, FC) are trained.
  • Figure 3: Training performance. A Agents with different sequence lengths L and repetition numbers R. Lines and shaded areas are mean and s.e.m. across 6 random seeds. Len: number of frames to reach the goal location. Stable success: the rate of having 100 consecutive successful episodes. Metrics were Gaussian smoothed, $\sigma=6\times10^6$ frames. B The best performing agent with L=64 and R=8 across all seeds was tested for transfer learning. New Rew.: new reward location at the lower left corner. new map: a randomly generated map with the same statistics. block path: new walls are added to block paths to the reward, while the previous blocking walls are removed. C and D Performance of agents with different recurrent modules and inputs. C: sparse input. D: dense input, where batch normalization and high thresholding were removed. CA3: our CA3 model with L=64 and R=8. RandRNN: randomly initialized fixed RNN of the same state size. SSM_LegS: fixed SSM HiPPO-LegS from gu2020hippo with the same state size. LSTM: trainable LSTM with matching number of parameters.
  • Figure 4: Evolution of occupancy over the course of learning. Color represents a normalized measure of the time the agent spent at a location. Mean head directions at each location are shown as arrows. From epoch 5 onwards, the agent prefers to reach the bottom right corner (independent of the random starting location) and proceed to the goal location (red star) from there.
  • Figure 5: Left: spatial tuning of DG and CA3 units from epoch 6. Pixel coordinates correspond to the environment (black crosses mark walls). Each row shows the CA3 units ordered by their positions in the activity sequence. We selected 4 out of the 16 feature sequences for visualization. Spatial Information (SI): bits per time step. Right: spatial tuning of randomly selected Decoder layer 1 units from epoch 6.
  • ...and 16 more figures
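
The sparsification step described in the Figure 2 caption (batch normalization followed by a high threshold, $\tau=2.43$) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name `sparsify` is hypothetical, the output is assumed to be binary, the normalization has no learned scale or shift, and the input is synthetic Gaussian noise standing in for the linearly transformed encoder output with 16 feature channels.

```python
import random
import statistics


def sparsify(batch, tau=2.43):
    """Normalize each feature over the batch, then threshold at tau.

    batch: list of samples, each a list of F feature values.
    Returns binary activity of the same shape (assumed output format).
    """
    columns = list(zip(*batch))
    means = [statistics.fmean(col) for col in columns]
    # Guard against constant features, whose standard deviation is zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        [1 if (x - m) / s > tau else 0 for x, m, s in zip(row, means, stds)]
        for row in batch
    ]


rng = random.Random(0)
# Synthetic stand-in for the linear projection of the encoder output.
batch = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(2000)]
active = sparsify(batch)
fraction = sum(map(sum, active)) / (len(active) * len(active[0]))
print(f"active fraction: {fraction:.4f}")
```

The active fraction produced by a fixed threshold depends on the distribution of the normalized features, so this synthetic example is not expected to reproduce the $\sim2.5\%$ reported in the caption.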