Visual Place Cell Encoding: A Computational Model for Spatial Representation and Cognitive Mapping
Chance J. Hamilton, Alfredo Weitzenfeld
TL;DR
The paper investigates whether visual appearance alone can yield hippocampal-like spatial coding by introducing Visual Place Cell Encoding (VPCE), a framework that forms place-field–like units through appearance-based clustering of image features and activates them with a radial-basis function. VPCE constructs a place-cell ensemble by extracting features from POV images (via ResNet50 and handcrafted descriptors), clustering with K-Means to define centroids as receptive-field centers, and computing activations A_i(f) = exp(-||f - c_i||^2/(2 alpha_i^2)) with alpha_i = max_{f in Cluster i} ||f - c_i||, followed by min-max normalization. Experiments show that clustering quality improves with more clusters, especially for multimodal features, and that VPCE activations reflect spatial proximity and orientation, differentiate across barriers, and adapt to environmental changes without online learning. The findings support appearance-driven cognitive mapping and offer a computationally tractable platform for studying visually grounded spatial representations in robotics and neuroscience.
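The activation rule above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: random vectors stand in for the ResNet50 and handcrafted image features, a plain Lloyd's-style K-Means replaces whatever K-Means implementation the paper used, min-max normalization is assumed to be applied across the ensemble for each input, and the helper names (`kmeans`, `fit_widths`, `activations`) and the small width floor are hypothetical.

```python
import numpy as np


def kmeans(features, k, n_iter=50, seed=0):
    """Plain Lloyd's K-Means (illustrative stand-in); returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct data points.
    centroids = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for i in range(k):
            members = features[labels == i]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[i] = members.mean(axis=0)
    # Final assignment so labels agree with the returned centroids.
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dists.argmin(axis=1)


def fit_widths(features, centroids, labels, floor=1e-6):
    """alpha_i = max over cluster i of ||f - c_i||, floored to avoid zero width (assumption)."""
    widths = np.full(len(centroids), floor)
    for i, c in enumerate(centroids):
        members = features[labels == i]
        if len(members) > 0:
            widths[i] = max(np.linalg.norm(members - c, axis=1).max(), floor)
    return widths


def activations(f, centroids, widths):
    """A_i(f) = exp(-||f - c_i||^2 / (2 alpha_i^2)), then min-max normalized across cells."""
    sq_dist = np.sum((centroids - f) ** 2, axis=1)
    raw = np.exp(-sq_dist / (2.0 * widths ** 2))
    span = raw.max() - raw.min()
    if span == 0.0:  # degenerate case: all cells respond identically
        return np.zeros_like(raw)
    return (raw - raw.min()) / span


# Placeholder data: 200 random 16-D vectors standing in for image features.
features = np.random.default_rng(1).normal(size=(200, 16))
centroids, labels = kmeans(features, k=8)
widths = fit_widths(features, centroids, labels)
acts = activations(features[0], centroids, widths)
```

Here `acts` holds one normalized activation per visual place cell for a single input feature vector. Because each width is fixed from the training clusters, a new image can be encoded without further learning, which is consistent with the claim that the model adapts to environmental changes without online updates.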
Abstract
This paper presents the Visual Place Cell Encoding (VPCE) model, a biologically inspired computational framework for simulating place-cell-like activation using visual input. Drawing on evidence that visual landmarks play a central role in spatial encoding, the VPCE model activates visual place cells by clustering high-dimensional appearance features extracted from images captured by a robot-mounted camera. Each cluster center defines a receptive field, and activation is computed from visual similarity using a radial basis function. We evaluate whether the resulting activation patterns correlate with key properties of biological place cells, including spatial proximity, orientation alignment, and boundary differentiation. Experiments demonstrate that VPCE can distinguish between visually similar yet spatially distinct locations and adapt to environmental changes such as the insertion or removal of walls. These results suggest that structured visual input, even in the absence of motion cues or reward-driven learning, is sufficient to generate place-cell-like spatial representations and support biologically inspired cognitive mapping.
