Interpretable Brain-Inspired Representations Improve RL Performance on Visual Navigation Tasks

Moritz Lange, Raphael C. Engelhardt, Wolfgang Konen, Laurenz Wiskott

TL;DR

The paper addresses the challenge of visual navigation by introducing hierarchical slow feature analysis (hSFA) to extract interpretable location and heading representations directly from visual input. It evaluates hSFA against CNN and PCA baselines by integrating the features into PPO-based RL agents across four Miniworld environments, showing that hSFA can yield robust localization cues and improve navigation efficiency in certain tasks (notably StarMazeArm) while exposing limitations related to symmetries and data coverage. The study highlights the slowness prior as a powerful inductive bias for localization, discusses training and integration constraints, and argues for future work on online end-to-end training, planning integration, and transferability of learned representations. Overall, the work demonstrates neuroscience-inspired representations that enhance explainability and potentially guide the development of more robust, interpretable RL agents for visual navigation.

Abstract

Visual navigation requires a whole range of capabilities. A crucial one of these is the ability of an agent to determine its own location and heading in an environment. Prior works commonly assume this information as given, or use methods which lack a suitable inductive bias and accumulate error over time. In this work, we show how the method of slow feature analysis (SFA), inspired by neuroscience research, overcomes both limitations by generating interpretable representations of visual data that encode location and heading of an agent. We employ SFA in a modern reinforcement learning context, analyse and compare representations and illustrate where hierarchical SFA can outperform other feature extractors on navigation tasks.
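
The slowness principle behind SFA can be illustrated with a minimal linear sketch. This is not the hierarchical model evaluated in the paper: it is plain linear SFA in NumPy, and the function name `linear_sfa` and the synthetic test signal are assumptions made for illustration. The idea is to whiten the input and then pick the directions along which the signal changes least from one time step to the next.

```python
import numpy as np


def linear_sfa(x, n_features):
    """Linear SFA sketch (hypothetical helper, not the paper's hSFA).

    x: array of shape (time, dims), ordered in time.
    Returns (mean, W) so that (x - mean) @ W has unit variance per column,
    with columns ordered from slowest to faster.
    """
    mean = x.mean(axis=0)
    xc = x - mean

    # Step 1: whiten the data, dropping directions with (near-)zero variance.
    eigval, eigvec = np.linalg.eigh(np.cov(xc, rowvar=False))
    keep = eigval > 1e-10 * eigval.max()
    whitener = eigvec[:, keep] / np.sqrt(eigval[keep])
    z = xc @ whitener

    # Step 2: in whitened space, find directions whose temporal
    # differences have the smallest variance (eigh sorts ascending).
    dz = np.diff(z, axis=0)
    _, dvec = np.linalg.eigh(dz.T @ dz / len(dz))

    return mean, whitener @ dvec[:, :n_features]


# Synthetic demo: a slow and a fast sine, linearly mixed into 5 channels.
t = np.linspace(0, 2 * np.pi, 2000)
slow, fast = np.sin(t), np.sin(37 * t)
mixing = np.random.default_rng(0).normal(size=(2, 5))
x = np.column_stack([slow, fast]) @ mixing

mean, W = linear_sfa(x, n_features=1)
y = (x - mean) @ W  # slowest extracted feature, shape (2000, 1)
print(abs(np.corrcoef(y[:, 0], slow)[0, 1]))
```

In this toy setting the slowest feature recovers the slow sine up to sign and scale. With a random agent's camera images as input, the slowly varying quantities are expected to be the agent's position and heading, which is the inductive bias the abstract refers to.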

Paper Structure

This paper contains 36 sections, 2 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Illustration of the architecture of a hierarchical slow feature analysis model. The input image is perceived in patches by receptive fields with certain strides. These patches are stacked and passed as batches through an hSFA layer. This happens repeatedly until the last layer produces an output with multiple channels (features), but no width and height.
  • Figure 2: Analysis of hSFA representations in different environments (top view). Panels (a), (b), and (c) show activations of the first 6 hSFA feature dimensions for different positions and orientations in the room. The points are generated by a random agent moving for 80,000 steps without reset. Colors fade from deep red for large positive values, through white for zero, to deep blue for large negative values. Panel (b) additionally shows the 4th feature of WallGap for separate agent headings.
  • Figure 3: Reconstruction of heading angles. The angle is reconstructed from its sine and cosine, each predicted by a linear model trained on all 32 hSFA features. Points are drawn with high transparency so that density is visible. Points appear in the top-left and bottom-right corners because heading is circular.
  • Figure 4: Performance of agents with various feature extractors on the different Miniworld environments. Shaded areas indicate the minimum and maximum of five agents trained with different random seeds. Curves have been smoothed slightly for clearer presentation.
  • Figure 5: Example observations rendered from the different environments. In the StarMaze observation, a red target cube is visible. The renderings contain no illumination-based shading, so the differing wall colors in FourColoredRooms come from differently colored textures, not from lighting.
  • ...and 4 more figures
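
The heading reconstruction described in the Figure 3 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the 32 "features" here are synthetic stand-ins (random linear mixtures of sin and cos of the heading plus noise) and not real hSFA outputs, and the helper names `fit_heading_decoder` and `decode_heading` are hypothetical. Two least-squares linear models predict the sine and cosine of the heading, and `arctan2` combines them into an angle.

```python
import numpy as np


def fit_heading_decoder(features, heading_rad):
    """Fit two linear models (with bias): features -> sin and cos of heading."""
    X = np.column_stack([features, np.ones(len(features))])
    targets = np.column_stack([np.sin(heading_rad), np.cos(heading_rad)])
    coef, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return coef  # shape (n_features + 1, 2): column 0 = sin, column 1 = cos


def decode_heading(features, coef):
    """Combine predicted sine and cosine into an angle in [0, 2*pi)."""
    X = np.column_stack([features, np.ones(len(features))])
    sin_hat, cos_hat = (X @ coef).T
    return np.arctan2(sin_hat, cos_hat) % (2 * np.pi)


# Synthetic stand-in for 32 feature dimensions (assumption, not hSFA output).
rng = np.random.default_rng(0)
heading = rng.uniform(0, 2 * np.pi, size=1000)
basis = np.column_stack([np.sin(heading), np.cos(heading)])
features = basis @ rng.normal(size=(2, 32)) + 0.05 * rng.normal(size=(1000, 32))

coef = fit_heading_decoder(features, heading)
decoded = decode_heading(features, coef)

# Circular error: a true heading near 0 decoded as slightly below 2*pi counts
# as a small error, even though it would land in a corner of a scatter plot.
circular_error = np.abs(np.angle(np.exp(1j * (decoded - heading))))
print(circular_error.mean())
```

Predicting sine and cosine separately avoids regressing directly onto an angle, which would have a discontinuity at the wrap-around point. The same wrap-around is why a scatter plot of true versus reconstructed heading shows points in opposite corners.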