A Role of Environmental Complexity on Representation Learning in Deep Reinforcement Learning Agents
Andrew Liu, Alla Borisyuk
TL;DR
This work investigates how environmental complexity, tuned via the open-shortcut probability $p$, shapes representation learning in deep reinforcement learning agents performing a navigation task inspired by the Dual Solutions Paradigm (DSP). The authors train PPO-based agents in a partially observable maze with a removable pink landmark wall and analyze both node- and population-level representations in the recurrent layer using spatial heatmaps, landmark sensitivity, clustering, and trajectory-separation metrics. They find that place-like spatial encodings emerge early in training and then stabilize, whereas landmark representations depend both on exposure to the cue and on its integration into navigation planning. Furthermore, population-level representations encode planned trajectories in a way that correlates with shortcut usage, even after policy performance has stabilized. The study advances methods for probing representations in artificial neural networks and offers insights that may transfer to biological navigation, including possible refinements of the DSP task to increase shortcut adoption and a framework for population-based analysis of neural activity.
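Spatial heatmaps of the kind mentioned above are most commonly occupancy-normalized rate maps: the mean activation of one unit in each spatial bin. A minimal sketch with NumPy follows, assuming the agent's position has already been discretized into maze grid cells. The function name `spatial_heatmap` and its arguments are hypothetical and not taken from the authors' code; their binning and normalization may differ.

```python
import numpy as np

def spatial_heatmap(positions, activations, grid_shape):
    """Mean activation of one recurrent unit in each maze cell (hypothetical helper).

    positions:   (T, 2) integer (row, col) grid cells visited; assumes a
                 discretized maze, so continuous positions must be binned first.
    activations: (T,) activation of the unit at each time step.
    grid_shape:  (n_rows, n_cols) of the maze grid.
    Cells that were never visited are returned as NaN.
    """
    positions = np.asarray(positions, dtype=int)
    activations = np.asarray(activations, dtype=float)
    sums = np.zeros(grid_shape)
    counts = np.zeros(grid_shape)
    # np.add.at accumulates correctly even when the same cell is visited repeatedly
    np.add.at(sums, (positions[:, 0], positions[:, 1]), activations)
    np.add.at(counts, (positions[:, 0], positions[:, 1]), 1)
    heatmap = np.full(grid_shape, np.nan)
    visited = counts > 0
    heatmap[visited] = sums[visited] / counts[visited]
    return heatmap
```

For example, `spatial_heatmap([[0, 0], [0, 0], [1, 2]], [1.0, 3.0, 5.0], (2, 3))` averages the two visits to cell (0, 0) and leaves the four unvisited cells as NaN. Dividing by visit counts keeps frequently visited corridors from looking "hotter" simply because the agent spends more time there.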
Abstract
We developed a simulated environment to train deep reinforcement learning agents on a shortcut-usage navigation task, motivated by the Dual Solutions Paradigm (DSP) test used with human navigators. We manipulated how often agents were exposed to a shortcut and to a navigation cue to investigate how these factors influence the development of shortcut usage. We find that, once initial learning begins, all agents rapidly achieve optimal performance in trials where the shortcut is closed. However, in trials where the shortcut is open, navigation speed and shortcut usage improve faster in agents with higher shortcut exposure. Analysis of activity in the agents' artificial neural networks revealed that frequent presentation of the cue initially produced better encoding of the cue in the activity of individual nodes than in agents that encountered the cue less often. However, stronger cue representations were ultimately formed through use of the cue in the context of navigation planning, rather than through exposure alone. In all agents, spatial representations developed early in training and stabilized before navigation strategies had fully developed, suggesting that spatially consistent activations are necessary for basic navigation but insufficient for advanced strategies. Further, using new analysis techniques, we found that the agents' networks encode the planned trajectory rather than the agent's immediate location, and that this encoding is present at the population level rather than in individual nodes. These techniques could have broader applications in studying activity across populations of neurons or network nodes, beyond individual activity patterns.
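One way to make the contrast between population-level and single-node encoding concrete is to ask how well the best individual unit, versus a population readout axis, separates visits to the same location that lead to different routes. The sketch below is a generic stand-in, not the authors' trajectory-separation metric. It assumes recurrent-layer vectors recorded at one maze location and labeled by whether the agent then took the shortcut; it fits a centroid-difference axis on half of the visits and scores both readouts with d' on the held-out half. All names (`dprime`, `population_vs_unit_separation`) are hypothetical.

```python
import numpy as np

def dprime(x_a, x_b):
    """Separation of two 1-D samples: |difference of means| / pooled std."""
    pooled = np.sqrt((x_a.var() + x_b.var()) / 2) + 1e-12
    return abs(x_a.mean() - x_b.mean()) / pooled

def population_vs_unit_separation(activity, labels, seed=0):
    """Compare route separation in the best single unit vs. a population axis.

    activity: (N, D) recurrent-layer vectors (N visits, D units), all recorded
              at the same maze location (an assumption of this sketch).
    labels:   (N,) booleans, True if the agent then took the shortcut.
    Returns (best_unit_dprime, population_dprime), both scored on held-out visits.
    """
    activity = np.asarray(activity, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(labels))
    train, test = idx[: len(idx) // 2], idx[len(idx) // 2 :]
    x_tr, y_tr = activity[train], labels[train]
    x_te, y_te = activity[test], labels[test]

    # Population readout: unit vector along the difference of class centroids,
    # fitted on training visits only so the held-out score is not inflated.
    axis = x_tr[y_tr].mean(axis=0) - x_tr[~y_tr].mean(axis=0)
    axis /= np.linalg.norm(axis) + 1e-12
    proj = x_te @ axis
    population = dprime(proj[y_te], proj[~y_te])

    # Single-unit readout: unit chosen on training visits, scored on held-out visits.
    unit_scores = [dprime(x_tr[y_tr, j], x_tr[~y_tr, j]) for j in range(activity.shape[1])]
    best = int(np.argmax(unit_scores))
    single = dprime(x_te[y_te, best], x_te[~y_te, best])
    return single, population
```

With synthetic data in which every unit carries the same weak route signal, for example `activity = rng.normal(size=(400, 50)) + 0.3 * labels[:, None]`, the population score typically far exceeds the best single-unit score. This happens because the weak per-unit signals add up along the readout axis while independent noise does not. Fitting the axis and choosing the best unit on separate visits from those used for scoring keeps either readout from looking better merely through overfitting.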
