A Role of Environmental Complexity on Representation Learning in Deep Reinforcement Learning Agents

Andrew Liu, Alla Borisyuk

TL;DR

This work investigates how environmental complexity, tuned via the open-shortcut probability $p$, shapes representation learning in deep reinforcement learning agents performing a navigation task inspired by the Dual Solutions Paradigm (DSP). The authors train PPO-based agents in a partially observable maze with a removable pink landmark wall, and analyze both node- and population-level representations in the recurrent layer using spatial heatmaps, landmark sensitivity, clustering, and trajectory-separation metrics. They find that spatial place-like encodings emerge early and stabilize, while landmark representations depend on cue exposure and on integration of the cue into navigation planning. Furthermore, population-level activity encodes planned trajectories, and this encoding correlates with shortcut usage even when policy performance stabilizes. The study advances methods for probing neural representations in networks and offers insights transferable to biological navigation, including potential DSP task refinements to boost shortcut adoption and a framework for population-based analysis of neural activity.

Abstract

We developed a simulated environment to train deep reinforcement learning agents on a shortcut usage navigation task, motivated by the Dual Solutions Paradigm test used for human navigators. We manipulated the frequency with which agents were exposed to a shortcut and a navigation cue, to investigate how these factors influence shortcut usage development. We find that all agents rapidly achieve optimal performance in closed shortcut trials once initial learning starts. However, navigation speed and use of the shortcut when it is open develop faster in agents with higher shortcut exposure. Analysis of the agents' artificial neural network activity revealed that frequent presentation of a cue initially resulted in better encoding of the cue in the activity of individual nodes, compared to agents that encountered the cue less often. However, stronger cue representations were ultimately formed through the use of the cue in the context of navigation planning, rather than simply through exposure. We found that in all agents, spatial representations develop early in training and subsequently stabilize before navigation strategies fully develop, suggesting that having spatially consistent activations is necessary for basic navigation, but insufficient for advanced strategies. Further, using new analysis techniques, we found that the planned trajectory rather than the agent's immediate location is encoded in the agent's networks. Moreover, the encoding is represented at the population rather than the individual node level. These techniques could have broader applications in studying neural activity across populations of neurons or network nodes beyond individual activity patterns.

Paper Structure (22 sections, 17 figures)

Figures (17)

  • Figure 1: Maps of A. Padua (Veneto, Italy) and B. Salt Lake City (Utah, USA), taken from Google Maps.
  • Figure 2: A and B. Depictions of the simulated shortcut navigation environment with A. shortcut closed and B. shortcut opened. All walls are colored white except for the shortcut wall when closed, which is pink. A gray box in the very top-right corner represents the navigation target. The agent is depicted as a yellow triangle, with white lines extending outward representing its sight lines. C. A schematic of the neural network that parameterizes the agents. Each block represents a network layer of 64 nodes, and the arrows represent the activations of one layer being passed to the next in a fully-connected weighted sum. The block with a circled arrow is a recurrent layer. Observations are input ($o_t$ on the left of the schematic), and the network splits into actor ($\pi$) and critic ($V$) branches.
  • Figure 3: Examples of shifting learning curves, for agents trained with $p=0.1$. Each line represents one agent. A. Original learning curves showing mean episode length for an agent. Dots depict where the mean first falls below 180. B. Learning curves shifted such that they start at the dots from A.
  • Figure 4: (A-C) Performance curves for agents trained with different probabilities of encountering an open shortcut ($p$). A. Mean episode length on closed shortcut episodes. B. Mean episode length on open shortcut episodes. C. Mean shortcut use rate on open shortcut episodes. D. Shortcut use rate compared to episode length on open shortcut episodes, with each point representing one agent from one checkpoint.
  • Figure 5: Examples of spatial heatmaps generated from agents at different points in training and taken from different $p$ training environments. Each heatmap depicts a single node and how it typically activated based on the agent's location. Heatmaps have zero mean and unit variance, with red and blue indicating where a node had higher and lower than average activation respectively. Drawings of the goal and walls are also shown in each plot for reference.
  • ...and 12 more figures
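
Two of the procedures described in the captions above are simple enough to sketch in code: shifting a learning curve so that it starts where the mean episode length first falls below 180 (Figure 3), and standardizing a node's spatial heatmap to zero mean and unit variance (Figure 5). The following is a minimal illustration only, not the authors' implementation. The function names `shift_learning_curve` and `standardize_heatmap` are hypothetical, the data are made up, and the sketch assumes that every heatmap cell has a value and that the population standard deviation is used for normalization.

```python
from statistics import mean, pstdev


def shift_learning_curve(episode_lengths, threshold=180):
    """Return the part of the curve starting where it first falls below `threshold`.

    `episode_lengths` is a list of mean episode lengths, one per checkpoint.
    An empty list is returned if the curve never drops below the threshold.
    """
    for i, value in enumerate(episode_lengths):
        if value < threshold:
            return episode_lengths[i:]
    return []


def standardize_heatmap(heatmap):
    """Z-score a 2D heatmap (list of rows) to zero mean and unit variance.

    Assumption: every cell holds a number (no unvisited cells), and the
    population standard deviation is used.
    """
    flat = [v for row in heatmap for v in row]
    mu = mean(flat)
    sigma = pstdev(flat)
    if sigma == 0:
        # A constant map has no spatial structure; return all zeros.
        return [[0.0 for _ in row] for row in heatmap]
    return [[(v - mu) / sigma for v in row] for row in heatmap]


if __name__ == "__main__":
    # Made-up learning curve: the mean first falls below 180 at index 3.
    curve = [200, 195, 181, 179, 150, 120]
    print(shift_learning_curve(curve))

    # Made-up 2x2 activation map for a single node.
    z = standardize_heatmap([[1.0, 2.0], [3.0, 4.0]])
    print(z)
```

After standardization, positive cells correspond to locations where the node is more active than its average (shown in red in Figure 5) and negative cells to locations where it is less active (shown in blue).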