Visuospatial navigation from the bottom-up: without vestibular integration, distance prediction, or maps

Patrick Govoni; Pawel Romanczuk

TL;DR

The study demonstrates that simple, vision-based route planning can solve a classic navigation task without cognitive maps, vestibular input, or distance prediction, by employing a minimal, feedforward perception-action loop evolved under constraints. It reveals three distinct route-based strategies (indirect sequential, biased diffusive, and direct pathing) whose prevalence depends on visual resolution and the presence of distance cues, with elliptical decision manifolds guiding angle-based turns. These findings challenge the necessity of map-based representations for navigation, suggesting a robust bottom-up framework that could generalize across species and inform energy-efficient robotics. The work highlights neural activations aligned with goal-directed views rather than explicit spatial maps, prompting a shift toward egocentric, episodic perspectives in understanding navigation.

Abstract

Navigation is believed to be controlled by at least two partially dissociable systems in the brain. The cognitive map informs an organism of its location and bearing, updated by integrating vestibular self-motion or predicting distances to landmarks. Route-based navigation, on the other hand, directly evaluates sequential movement decisions from immediate percepts. Here we demonstrate the sufficiency of visual route-based decision-making in a classic open field navigation task often assumed to require a cognitive map. Three distinct strategies emerge to robustly navigate to a hidden goal, each conferring contextual tradeoffs analyzed at both neural and behavioral scales, and each qualitatively aligning with behavior observed across the biological spectrum. We propose reframing navigation from the bottom-up, through an egocentric episodic perspective without assuming online access to computationally expensive top-down representations, to better explain behavior under energetic or attentional constraints.

Paper Structure (6 sections, 4 equations, 14 figures, 1 table)

Introduction
Results
Discussion
Materials and methods
Acknowledgements
Supporting information

Figures (14)

Figure 1: Agent perception-action loop flow. Clockwise from bottom left: visual encoding, information processing, action conversion, environment update. Visual encoding consists of identifying the walls corresponding to the retinal angles of a raycast ($\upsilon$ rays between the field of view limits $-\theta$ and $\theta$) for minimal angle-only vision, then adding distance information according to scaling factor $\sigma$. Visual information passes through a convolutional neural network, a perceptron, a linear output layer, and hyperbolic tangent transformations to directly represent turning angle, as well as speed via a linear function, which updates agent position and orientation for the next timestep (single update shown in environment and black point on linear function).
Figure 2: Three navigational classes, movement behavior & correlations. Top row: global movement behavior of three individual evolutionary runs or agents, with angle-based vision for A/B and with added distance scaling for C ($\sigma = 1$), each with an 8 ray visual resolution ($\upsilon = 8$). Solid lines: single agent trajectories from unique initial positions and orientations, temporally colored, density and darkness reflect common routes. Dotted lines: 4 example individual trajectories. Black circle: patch location. Bottom row: movement correlations. Top left: spatial heading profile with respect to initial heading and distance from patch center, inset shows same 4 example trajectories in above plots. Top right: polar histogram of relative orientation for timesteps when agent is over 100 grid units away from patch center, with frequency proportional to area and color of bin. Bottom left: temporal persistence of initial heading angle, marked by decorrelation time (50% threshold, red dashed line), sinusoidal shape reflecting correlated oscillations. Bottom right: directedness heatmap, an information-theoretic measure calculated for each spatial bin in the environment. See \ref{fig:trajs_ext} and \ref{fig:corrs_ext} Figs for other individuals.
Figure 3: Distinct approaches to represent neural data. Top: neural activity data gathered with uniform spatial and orientational occupancy. Bottom: agents initialized as above, data gathered over 500 timesteps and binned in an equivalent manner. Left: spatial selectivity regardless of orientation, normalized from zero to maximum activation. Middle eight: spatial selectivity with respect to orientation, illustrated by agent directions. Right: neural tuning curves with respect to egocentric angle from the patch. A-C & D-F relate to the three agents and navigational classes of Fig \ref{fig:trajs_corrs}: indirect sequential (IS), biased diffusive (BD), and direct pathing (DP).
Figure 4: Classification & relative evolvability and fitnesses. A: Navigational algorithmic classes, separated by decorrelation time and directedness, top: colors correspond to distance scaling factor ($\sigma$) and visual resolution ($\upsilon$), bottom: colors correspond to algorithmic classes (IS: indirect sequential, BD: biased diffusive, DP: direct pathing), including hybrids as combined half-circles. B: Angle-based strategy bifurcation, left: relative rates of either angle-based class evolving under different visual resolutions, right: relative fitnesses, revealing optima at $\upsilon = 8$ for IS and $\upsilon = 16$ for BD. C: Distance-based phase transition, similar plots as in B, varying distance scaling factors with $\upsilon = 8$. Scatter plots on right include violins for sample sizes greater than 5.
Figure S1: Evolutionary performance with differing parameters. Left: median performance of 40 training runs, dashed lines indicate median of validation tests. Right: validation test distribution (remaining distance to patch not included), lines indicate median, same color as legend in left plot. Runs used in top figure reflect data used in main text (perfect: theoretical lower bound). Unless otherwise noted, parameters are as follows: Distance scaling factor ($\sigma$): 0; Visual resolution ($\upsilon$): 8; Field of vision: 0.4; CNN output size: 4; Multilayer perceptron (MLP) size: 1x2.
...and 9 more figures
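
The perception-action loop described in the Figure 1 caption can be illustrated with a minimal sketch. This is not the authors' implementation. The square arena and its size, the wall ordering, the form of the distance scaling, the single linear readout standing in for the CNN and perceptron stages, and the linear speed rule are all assumptions made here for illustration, as are the names `raycast`, `encode`, `act`, and `step`.

```python
import math
import random

ARENA = 100.0  # side length of a square arena (hypothetical value)

def raycast(x, y, heading, n_rays=8, theta=0.4 * math.pi):
    """Return one (wall index, distance) pair per ray, with rays spread
    evenly between -theta and +theta around the heading.
    Wall indices (assumed ordering): 0=east, 1=north, 2=west, 3=south."""
    hits = []
    for i in range(n_rays):
        angle = heading - theta + 2 * theta * i / (n_rays - 1)
        dx, dy = math.cos(angle), math.sin(angle)
        # Distance along the ray to each wall; walls behind the ray are infinite.
        t = [
            (ARENA - x) / dx if dx > 1e-12 else math.inf,
            (ARENA - y) / dy if dy > 1e-12 else math.inf,
            -x / dx if dx < -1e-12 else math.inf,
            -y / dy if dy < -1e-12 else math.inf,
        ]
        wall = min(range(4), key=t.__getitem__)
        hits.append((wall, t[wall]))
    return hits

def encode(hits, sigma=0.0):
    """One-hot wall identity per ray. With sigma = 0 this is angle-only
    vision; sigma > 0 dims distant walls (assumed form of the scaling)."""
    d_max = ARENA * math.sqrt(2)
    return [[(1.0 - sigma * d / d_max) if w == k else 0.0 for k in range(4)]
            for w, d in hits]

def act(visual, weights, max_turn=math.pi / 2, max_speed=5.0):
    """Stand-in for the network: one linear readout squashed by tanh.
    Speed falls linearly with turn magnitude (assumed rule)."""
    drive = sum(w * v for wr, vr in zip(weights, visual) for w, v in zip(wr, vr))
    turn = math.tanh(drive)  # in [-1, 1]
    speed = max_speed * (1.0 - abs(turn))
    return turn * max_turn, speed

def step(state, weights, sigma=0.0):
    """One loop iteration: see, decide, move, clip to the arena."""
    x, y, heading = state
    d_heading, speed = act(encode(raycast(x, y, heading), sigma), weights)
    heading += d_heading
    x = min(max(x + speed * math.cos(heading), 0.0), ARENA)
    y = min(max(y + speed * math.sin(heading), 0.0), ARENA)
    return (x, y, heading)

# Usage: random (unevolved) weights, agent starting at the arena center.
rng = random.Random(0)
weights = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(8)]
state = (50.0, 50.0, 0.0)
for _ in range(20):
    state = step(state, weights)
```

The sketch keeps the loop strictly feedforward: each action depends only on the current view, with no stored position, path integration, or map. In the paper the weights are evolved rather than drawn at random, so the trajectory produced here carries no behavioral meaning.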
