Spatial-Aware Decision-Making with Ring Attractors in Reinforcement Learning Systems

Marcos Negre Saura, Richard Allmendinger, Wei Pan, Theodore Papamarkou

TL;DR

The paper addresses efficient, uncertainty-aware action selection in reinforcement learning for spatially structured tasks. It introduces ring attractors (RA) that encode the action space on a circular topology. They are implemented in two ways: as an exogenous model built on a continuous-time recurrent neural network (CTRNN), and as a deep learning (DL) RA module integrated into deep reinforcement learning (DRL) agents, with uncertainty carried through a Bayesian linear layer. Key contributions include a novel spatial-action policy mechanism, intrinsic uncertainty handling, and a reusable DL component. The method reports a 53% performance increase over selected baselines on the Atari 100k benchmark, with the largest gains in spatial games. The results suggest that explicit spatial action encoding and attractor-based temporal filtering improve learning speed and robustness, with potential extensions to multi-agent and safe RL domains.

Abstract

Ring attractors, mathematical models inspired by neural circuit dynamics, provide a biologically plausible mechanism to improve learning speed and accuracy in Reinforcement Learning (RL). Serving as specialized brain-inspired structures that encode spatial information and uncertainty, ring attractors explicitly encode the action space, facilitate the organization of neural activity, and enable the distribution of spatial representations across the neural network in the context of Deep Reinforcement Learning (DRL). These structures also provide temporal filtering that stabilizes action selection during exploration, for example, by preserving the continuity between rotation angles in robotic control or adjacency between tactical moves in game-like environments. The application of ring attractors in the action selection process involves mapping actions to specific locations on the ring and decoding the selected action based on neural activity. We investigate the application of ring attractors by both building an exogenous model and integrating them as part of DRL agents. Our approach significantly improves state-of-the-art performance on the Atari 100k benchmark, achieving a 53% increase in performance over selected baselines.
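
The abstract's description of mapping actions to ring locations and decoding the selected action from neural activity can be illustrated with a small static sketch. This is not the paper's CTRNN or DL module. It assumes evenly spaced action angles, one Gaussian bump per action scaled by a non-negative value estimate, and a peak-based decoder. The function names, the neuron count of 64, and the default width `sigma = pi/6` are illustrative choices made here.

```python
import math

TWO_PI = 2.0 * math.pi

def action_angles(n_actions):
    """Place n_actions evenly on the ring, as angles in [0, 2*pi). (Assumed layout.)"""
    return [TWO_PI * a / n_actions for a in range(n_actions)]

def circular_distance(a, b):
    """Shortest angular distance between two angles on the ring."""
    d = abs(a - b) % TWO_PI
    return min(d, TWO_PI - d)

def ring_activity(q_values, sigma=math.pi / 6, n_neurons=64):
    """Sum one Gaussian bump per action, scaled by its (non-negative) value estimate."""
    angles = action_angles(len(q_values))
    activity = []
    for i in range(n_neurons):
        theta = TWO_PI * i / n_neurons  # preferred angle of neuron i
        total = 0.0
        for q, alpha in zip(q_values, angles):
            d = circular_distance(theta, alpha)
            total += q * math.exp(-d * d / (2.0 * sigma * sigma))
        activity.append(total)
    return activity

def decode_action(activity, n_actions):
    """Return the action whose ring location is closest to the most active neuron."""
    n_neurons = len(activity)
    peak = max(range(n_neurons), key=lambda i: activity[i])
    peak_angle = TWO_PI * peak / n_neurons
    angles = action_angles(n_actions)
    return min(range(n_actions),
               key=lambda a: circular_distance(peak_angle, angles[a]))

if __name__ == "__main__":
    q = [0.1, 0.2, 1.0, 0.2]  # hypothetical value estimates for 4 actions
    print(decode_action(ring_activity(q), len(q)))
```

In this simplified picture, a wider `sigma` lets neighbouring actions reinforce each other on the ring, while a narrow one keeps their bumps nearly independent. The temporal filtering mentioned in the abstract would additionally require recurrent dynamics, which this sketch omits.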

Paper Structure

This paper contains 44 sections, 30 equations, 10 figures, 6 tables, and 1 algorithm.

Figures (10)

  • Figure 1: Ring attractor Touretzky representation: circular arrangement of excitatory neurons (N0-N7) around a central inhibitory neuron. Four input signals are shown as colored gradients, and the overall activation is depicted by the red outline. The figure includes connection weights and input signal parameters, illustrating the final state of the ring attractor dynamics from Eq. \ref{eq:Activation_to_action_translation}.
  • Figure 2: Learning speed comparison. Above: OpenAI Gym Super Mario Bros environment \cite{gym-super-mario-bros} with a discrete action space. Below: OpenAI highway \cite{highway-env} with a continuous 1-D circular variable. The plot shows cumulative reward over 1 million frames for three models: standard BDQN; BDQNRA, which uses the ring attractor behavior policy from Section \ref{section:RA_policy} with the action-value pair variance held constant at $\sigma_a=\frac{\pi}{6}$, a fixed value that enables smooth action transitions while preventing interference with opposing actions; and BDQNRA-UA, which combines RA with Uncertainty Awareness (UA) by using the uncertainty quantification model from Section \ref{section:UQ_methodology} to set the variance of the action-value pairs. Curves show mean episodic returns averaged over 10 seeds.
  • Figure 3: Performance comparison: DDQNRA vs standard DDQN \cite{double_DQN} in two environments. Above: OpenAI Super Mario Bros \cite{gym-super-mario-bros}, demonstrating adaptability to complex, game-like scenarios. Below: OpenAI highway \cite{highway-env}, showing learning speed in spatial navigation tasks. Curves show mean episodic returns averaged over 10 seeds.
  • Figure 4: Performance comparison: PPO-RA vs standard PPO in the OpenAI Gym Super Mario Bros environment \cite{gym-super-mario-bros} with a discrete action space, demonstrating adaptability to complex, game-like scenarios. The plot shows PPO-RA (red) consistently outperforming standard PPO (blue), with shaded regions representing the standard deviation across 10 seeds.
  • Figure 5: Ablation study comparing BDQN variants in OpenAI Gym Super Mario Bros \cite{gym-super-mario-bros}. The plot shows cumulative reward over 1 million frames for three models: standard BDQN \cite{BDQN}; BDQNRA-UA, which combines RA with Uncertainty Awareness (UA) by implementing both the ring attractor behavior policy from Section \ref{section:RA_policy} and the uncertainty quantification model from Section \ref{section:UQ_methodology}; and BDQNRA-RM, which applies the same components as BDQNRA-UA but distributes the action space randomly across the ring in each experiment. Curves show mean episodic returns averaged over 10 seeds.
  • ...and 5 more figures