Bellman Memory Units: A neuromorphic framework for synaptic reinforcement learning with an evolving network topology
Shreyan Banerjee, Aasifa Rounak, Vikram Pakrashi
TL;DR
Addresses the challenges of gradient-free online learning and hardware scalability in neuromorphic control. Proposes a synaptic Q-learning algorithm that embeds Bellman updates at the synapse and lets the network topology evolve during training, instantiated as Bellman Memory Units (BMUs) using the Neural Engineering Framework (NEF) in Nengo and on Intel's Loihi chip. Demonstrates CartPole control with topology growth and on-chip learning, where updates follow the Bellman rule $Q(s,a) \leftarrow Q(s,a) + \alpha\left(-Q(s,a) + r + \gamma V(s')\right)$, with $V(s') = \max_{a'} Q(s',a')$. The results indicate reduced memory and network requirements and potential for compact neuromorphic accelerators that can adapt to unseen scenarios.
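The Bellman rule quoted above can be illustrated with a minimal tabular sketch in plain Python. This is not the authors' synaptic or spiking implementation: the function name `bellman_update`, the dictionary-based table, and the values of `alpha` and `gamma` are assumptions chosen only to show the arithmetic of a single update.

```python
from collections import defaultdict


def bellman_update(q, state, action, reward, next_state, actions,
                   alpha=0.1, gamma=0.99):
    """Apply one update Q(s,a) <- Q(s,a) + alpha * (-Q(s,a) + r + gamma * V(s')).

    `q` maps (state, action) pairs to values; alpha and gamma are
    illustrative defaults, not values taken from the paper.
    """
    # V(s') = max over a' of Q(s', a')
    v_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (-q[(state, action)] + reward + gamma * v_next)
    return q[(state, action)]


# Hypothetical two-action problem (e.g. push left / push right).
actions = (0, 1)
q = defaultdict(float)  # unseen (state, action) pairs start at 0.0

first = bellman_update(q, state=0, action=1, reward=1.0,
                       next_state=1, actions=actions)

# Give the successor state a non-zero value and update the same pair again.
q[(1, 0)] = 0.5
second = bellman_update(q, state=0, action=1, reward=1.0,
                        next_state=1, actions=actions)
```

With all values initialised to zero, the first update moves $Q(s,a)$ a fraction $\alpha$ of the way toward the reward; the second update additionally bootstraps from the value of the successor state.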
Abstract
The application of neuromorphic edge devices to control is limited by the constraints of gradient-free online learning and by the scalability of the hardware across control problems. This paper introduces a synaptic Q-learning algorithm for control of the classical CartPole, in which the Bellman equations are incorporated at the synaptic level. This formulation enables the network topology, represented as a directed graph, to evolve iteratively throughout training. A similar approach, neuromorphic Bellman Memory Units (BMUs), is then implemented with the Neural Engineering Framework on Intel's Loihi neuromorphic chip. Topology evolution, in conjunction with mixed-signal computation, allows the number of neurons and synapses to be optimized, which could inform the design of spike-based reinforcement learning accelerators. The proposed architecture can potentially reduce on-board resource utilization, aiding the manufacture of compact application-specific neuromorphic ICs. Moreover, the on-chip learning introduced in this work and implemented on a neuromorphic chip can enable adaptation to unseen control scenarios.
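One way to picture a topology that evolves during training is a directed graph whose state-to-action edges are created only when a state is first encountered, with each edge weight playing the role of a Q-value. The sketch below is a loose illustration under that assumption and is not the paper's algorithm: the class `GrowingQGraph`, its methods, the string state labels, and the hyperparameters are all hypothetical.

```python
class GrowingQGraph:
    """Directed graph of (state -> action) edges that grows on demand.

    Each edge weight stores a Q-value. All names and defaults here are
    illustrative assumptions, not taken from the paper.
    """

    def __init__(self, actions, alpha=0.1, gamma=0.99):
        self.actions = tuple(actions)
        self.alpha = alpha
        self.gamma = gamma
        self.edges = {}  # (state, action) -> edge weight (Q-value)

    def _ensure_state(self, state):
        # Grow the graph: add one outgoing edge per action for a new state.
        for a in self.actions:
            self.edges.setdefault((state, a), 0.0)

    def value(self, state):
        # V(s) = max over a of Q(s, a); visiting a state creates its edges.
        self._ensure_state(state)
        return max(self.edges[(state, a)] for a in self.actions)

    def update(self, state, action, reward, next_state):
        # Bellman update applied to the weight of a single edge.
        self._ensure_state(state)
        v_next = self.value(next_state)
        q = self.edges[(state, action)]
        self.edges[(state, action)] = q + self.alpha * (
            -q + reward + self.gamma * v_next
        )

    def num_states(self):
        return len({s for s, _ in self.edges})


graph = GrowingQGraph(actions=(0, 1))
graph.update(state="s0", action=0, reward=1.0, next_state="s1")
graph.update(state="s1", action=1, reward=1.0, next_state="s0")
```

In this toy version the graph only ever contains edges for states that were actually visited, which is the sense in which memory use tracks experience rather than the size of a pre-allocated table.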
