Bellman Memory Units: A neuromorphic framework for synaptic reinforcement learning with an evolving network topology

Shreyan Banerjee, Aasifa Rounak, Vikram Pakrashi

TL;DR

Addresses the challenge of gradient-free online learning and hardware scalability in neuromorphic control. Proposes a synaptic Q-learning algorithm that embeds Bellman updates at the synapse and lets the network topology evolve during training, instantiated as Bellman Memory Units in the Nengo NEF framework and on Intel's Loihi chip. Demonstrates CartPole control with topology growth and on-chip learning, where updates follow the Bellman rule $Q(s,a) \leftarrow Q(s,a) + \alpha\left(-Q(s,a) + r + \gamma V(s')\right)$, with $V(s') = \max_{a'} Q(s',a')$. The results indicate reduced memory and network requirements, potential for compact neuromorphic accelerators, and the ability to adapt to unseen scenarios.
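As a point of reference, the Bellman rule quoted above can be sketched as a plain tabular update. This is a minimal illustration, not the paper's synaptic or spiking implementation: the function name `bellman_update`, the dictionary-based table, and the values of `alpha` and `gamma` are all assumptions chosen for the example.

```python
from collections import defaultdict


def bellman_update(q, state, action, reward, next_state, actions,
                   alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning step in place and return the new Q(s, a).

    alpha (learning rate) and gamma (discount) are illustrative defaults,
    not values taken from the paper.
    """
    # V(s') = max over a' of Q(s', a')
    v_next = max(q[(next_state, a)] for a in actions)
    # Q(s, a) <- Q(s, a) + alpha * (-Q(s, a) + r + gamma * V(s'))
    q[(state, action)] += alpha * (-q[(state, action)] + reward + gamma * v_next)
    return q[(state, action)]


# Hypothetical two-action problem (e.g. push left / push right).
actions = (0, 1)
q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0

first = bellman_update(q_table, state=0, action=1, reward=1.0,
                       next_state=1, actions=actions)

# Pretend the successor state has already accumulated some value.
q_table[(1, 0)] = 2.0
second = bellman_update(q_table, state=0, action=1, reward=1.0,
                        next_state=1, actions=actions)
```

Storing entries lazily in a `defaultdict` means the table only grows for state-action pairs that are actually visited, which loosely mirrors the idea of a topology that expands during training; the correspondence is an analogy only.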

Abstract

The application of neuromorphic edge devices to control is limited by constraints on gradient-free online learning and by the scalability of the hardware across control problems. This paper introduces a synaptic Q-learning algorithm for control of the classical CartPole system, in which the Bellman equations are incorporated at the synaptic level. This formulation enables the iterative evolution of the network topology, represented as a directed graph, throughout the training process. This is followed by a similar approach called neuromorphic Bellman Memory Units (BMUs), which are implemented with the Neural Engineering Framework on Intel's Loihi neuromorphic chip. Topology evolution, in conjunction with mixed-signal computation, allows the number of neurons and synapses to be optimized, which could be used to design spike-based reinforcement learning accelerators. The proposed architecture can potentially reduce on-board resource utilization, aiding the manufacture of compact application-specific neuromorphic ICs. Moreover, the on-chip learning introduced in this work and implemented on a neuromorphic chip can enable adaptation to unseen control scenarios.

Paper Structure

This paper contains 17 sections, 5 equations, 13 figures, 2 tables, 2 algorithms.

Figures (13)

  • Figure 1: A control system block diagram with a synaptic RL-based controller in the loop. The simulator contains the cartpole model, where $M$ is the mass of the cart, $m$ is the mass of the pole, and $l$ is the length of the pole.
  • Figure 2: Flow diagram showing the discretization of the cartpole state space.
  • Figure 3: An example network state for training an arbitrary synaptic Q-learning model. The spike propagation is shown for a particular forward pass of the network. $r_i$ is the reward received for the $i$-th observation and $V_i$ is the magnitude of the value function for the $i$-th neuron.
  • Figure 4: Evolution of the network architecture in Nengo. The blue dots represent nodes and ensembles, and the gray lines represent the connections between them.
  • Figure 5: Evolution of the network architecture on the Loihi board using online learning. The blue dots represent nodes and ensembles, and the gray lines represent the connections between them.
  • ...and 8 more figures
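
Figure 2 refers to a discretization of the cartpole state space, but the scheme itself is not reproduced here. The sketch below shows one common way such a discretization can be done, by binning each of the four state variables against fixed edges. The bin edges, the variable names, and the helper `discretize` are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right

# Hypothetical bin edges for each state variable (illustrative assumptions,
# not the paper's values). Order matches (x, x_dot, theta, theta_dot).
BIN_EDGES = {
    "cart_position": [-1.2, 0.0, 1.2],
    "cart_velocity": [-0.5, 0.5],
    "pole_angle": [-0.1, -0.02, 0.0, 0.02, 0.1],
    "pole_angular_velocity": [-0.5, 0.5],
}


def discretize(observation):
    """Map a continuous (x, x_dot, theta, theta_dot) tuple to bin indices.

    Each index counts how many edges are <= the value, so a variable with
    n edges yields an index in the range 0..n.
    """
    return tuple(
        bisect_right(edges, value)
        for value, edges in zip(observation, BIN_EDGES.values())
    )


centre = discretize((0.0, 0.0, 0.0, 0.0))
far_left = discretize((-2.0, -1.0, -0.2, -1.0))
far_right = discretize((2.0, 1.0, 0.2, 1.0))
```

The resulting tuple of indices can serve directly as a key into a tabular value store, so each distinct tuple corresponds to one discrete state.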