Visual Place Cell Encoding: A Computational Model for Spatial Representation and Cognitive Mapping

Chance J. Hamilton, Alfredo Weitzenfeld

TL;DR

The paper investigates whether visual appearance alone can yield hippocampal-like spatial coding by introducing Visual Place Cell Encoding (VPCE), a framework that forms place-field–like units through appearance-based clustering of image features and activates them with a radial-basis function. VPCE constructs a place-cell ensemble by extracting features from POV images (via ResNet50 and handcrafted descriptors), clustering with K-Means to define centroids as receptive-field centers, and computing activations A_i(f) = exp(-||f - c_i||^2/(2 alpha_i^2)) with alpha_i = max_{f in Cluster i} ||f - c_i||, followed by min-max normalization. Experiments show that clustering quality improves with more clusters, especially for multimodal features, and that VPCE activations reflect spatial proximity and orientation, differentiate across barriers, and adapt to environmental changes without online learning. The findings support appearance-driven cognitive mapping and offer a computationally tractable platform for studying visually grounded spatial representations in robotics and neuroscience.
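The activation rule summarized above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the centroids and cluster labels have already been produced by a clustering step such as K-Means, that distances are Euclidean, and that min-max normalization is applied across the ensemble for a single observation (the summary does not say over which axis the normalization runs). The function names `cluster_spreads` and `vpce_activations` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_spreads(features, labels, centroids):
    """alpha_i = max over features f assigned to cluster i of ||f - c_i||."""
    spreads = [0.0] * len(centroids)
    for f, k in zip(features, labels):
        spreads[k] = max(spreads[k], euclidean(f, centroids[k]))
    return spreads


def vpce_activations(f, centroids, spreads, eps=1e-12):
    """A_i(f) = exp(-||f - c_i||^2 / (2 * alpha_i^2)), then min-max scaled.

    Assumption: normalization is taken across all place cells for this one
    observation. `eps` guards against a zero spread (single-point cluster).
    """
    raw = [
        math.exp(-euclidean(f, c) ** 2 / (2.0 * max(a, eps) ** 2))
        for c, a in zip(centroids, spreads)
    ]
    lo, hi = min(raw), max(raw)
    if hi - lo < eps:
        # Degenerate case: every cell responds equally; leave values unscaled.
        return raw
    return [(r - lo) / (hi - lo) for r in raw]


# Hypothetical toy data: two clusters in a 2-D feature space.
centroids = [[0.0, 0.0], [10.0, 0.0]]
features = [[0.0, 1.0], [1.0, 0.0], [10.0, 2.0], [9.0, 0.0]]
labels = [0, 0, 1, 1]

spreads = cluster_spreads(features, labels, centroids)
activations = vpce_activations([0.0, 0.0], centroids, spreads)
```

Because each spread is the largest intra-cluster distance, a feature on the boundary of its own cluster still receives a raw activation of exp(-1/2), so every training sample activates its own cell well above zero before normalization.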

Abstract

This paper presents the Visual Place Cell Encoding (VPCE) model, a biologically inspired computational framework for simulating place cell-like activation using visual input. Drawing on evidence that visual landmarks play a central role in spatial encoding, the proposed VPCE model activates visual place cells by clustering high-dimensional appearance features extracted from images captured by a robot-mounted camera. Each cluster center defines a receptive field, and activation is computed based on visual similarity using a radial basis function. We evaluate whether the resulting activation patterns correlate with key properties of biological place cells, including spatial proximity, orientation alignment, and boundary differentiation. Experiments demonstrate that the VPCE can distinguish between visually similar yet spatially distinct locations and adapt to environment changes such as the insertion or removal of walls. These results suggest that structured visual input, even in the absence of motion cues or reward-driven learning, is sufficient to generate place-cell-like spatial representations and support biologically inspired cognitive mapping.

Paper Structure

This paper contains 49 sections, 9 equations, 11 figures, 2 tables, 1 algorithm.

Figures (11)

  • Figure 1: Illustration of the Visual Place Cell Encoding (VPCE) model. Feature vectors derived from visual observations are embedded in a high-dimensional feature space and clustered. Each cluster centroid represents a place field center. When a new observation is processed, the VPCE computes distances to all centroids, and activation levels are determined using a radial basis function. Feature vectors that are close in the feature space produce higher activations, while dissimilar observations result in weaker responses.
  • Figure 2: Overview of the Visual Place Cell Encoding (VPCE) model. (a) Construction of the visual place cell ensemble: POV images collected from a navigating agent are processed through a feature extraction pipeline combining ResNet50 and low-level descriptors (HOG, color histograms, spatial histograms). Extracted feature vectors are clustered to define centroids and intra-cluster spreads that specify visual place fields. (b) Place cell activation during deployment: A new image is processed through the same pipeline and compared to learned centroids. Place cell activations are computed using radial basis functions centered at each centroid and scaled by intra-cluster spread, producing a graded activation pattern based on visual similarity.
  • Figure 3: Simulation environments utilized in this study, each containing eight unique landmarks (colored cylinders). The Husarion ROSbot equipped with LiDAR and RGB-D camera sensors is shown within each environment.
  • Figure 4: Similarity metrics for VPCE activation patterns across four spatial groupings. Each row represents a different group of five data points selected based on spatial arrangement and orientation. The leftmost column shows the spatial positions and headings of the data points within a bounded environment. The three similarity matrices (cosine similarity, Pearson correlation, and Euclidean distance) show pairwise comparisons of the VPCE activation patterns. Groups composed of spatially proximal and similarly oriented data points (e.g., Group 1) exhibit high cosine similarity and low Euclidean distances between activation patterns, while groups with greater spatial separation or divergent orientations (e.g., Groups 2–4) show reduced similarity and increased distance, highlighting the spatial sensitivity of the VPCE representation.
  • Figure 5: Example of testing points used for the spatial differentiation experiment. Colored points indicate sampling positions on each side of the wall. Red lines represent the physical barriers.
  • ...and 6 more figures
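
The three pairwise comparisons named in the Figure 4 caption (cosine similarity, Pearson correlation, and Euclidean distance between activation patterns) could be computed roughly as follows. This is an illustrative sketch only: the activation vectors are made-up values, the helper names are hypothetical, and the code assumes each pattern is non-zero and non-constant so that the norms in the denominators are positive.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two activation patterns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson r, computed as the cosine similarity of mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])


def euclidean_distance(a, b):
    """Straight-line distance between two activation patterns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise(patterns, metric):
    """Square matrix of metric(p, q) for every pair of patterns."""
    return [[metric(p, q) for q in patterns] for p in patterns]


# Hypothetical activation patterns for three observations over three cells.
patterns = [
    [1.0, 0.0, 0.5],
    [1.0, 0.0, 0.5],  # same pattern as the first observation
    [0.0, 1.0, 0.5],  # a visually different observation
]

cos_matrix = pairwise(patterns, cosine_similarity)
pearson_matrix = pairwise(patterns, pearson_correlation)
dist_matrix = pairwise(patterns, euclidean_distance)
```

In this toy case the first two observations are identical, so their cosine similarity is 1 and their distance is 0, while the third observation is anti-correlated with the first, mirroring the qualitative contrast the caption describes between Group 1 and the more dispersed groups.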