Fully Spiking Actor Network with Intra-layer Connections for Reinforcement Learning
Ding Chen, Peixi Peng, Tiejun Huang, Yonghong Tian
TL;DR
The paper tackles energy-efficient reinforcement learning for continuous control by introducing ILC-SAN, a fully spiking actor network that decodes spike trains via membrane voltage readouts from non-spiking neurons, avoiding floating-point decoders. It combines a population encoder and a backbone SNN with intra-layer connections to enhance action decoding, and integrates with TD3 to train the actor while using a deep critic for guidance. Key contributions include the membrane voltage decoding scheme, the intra-layer connection mechanism, and extensive demonstrations on OpenAI Gym tasks showing superior performance and improved energy efficiency over state-of-the-art spike-based actors. The work advances neuromorphic RL by enabling end-to-end spike-based control suitable for deployment on neuromorphic hardware, with practical impact on energy-conscious robotics and real-time, on-device learning.
Abstract
With the help of specialized neuromorphic hardware, spiking neural networks (SNNs) are expected to realize artificial intelligence (AI) with lower energy consumption. Combining SNNs with deep reinforcement learning (DRL) offers a promising energy-efficient approach to realistic control tasks. In this paper, we focus on tasks in which the agent must learn multi-dimensional deterministic policies for control, a setting that is very common in real scenarios. Recently, the surrogate gradient method has been used to train multi-layer SNNs, allowing them to achieve performance comparable to that of the corresponding deep networks on this task. Most existing spike-based RL methods take the firing rate as the output of the SNN and convert it into a representation of the continuous action space (i.e., the deterministic policy) through a fully-connected (FC) layer. However, because the firing rate is a fractional value, the FC layer requires floating-point matrix operations, which prevents the whole SNN from being deployed directly on neuromorphic hardware. To develop a fully spiking actor network without any floating-point matrix operations, we draw inspiration from the non-spiking interneurons found in insects and use the membrane voltage of non-spiking neurons to represent the action. Before the non-spiking neurons, multiple neuron populations are introduced, each decoding one dimension of the action. Since each population decodes a single action dimension, we argue that the neurons within each population should be connected in both the time domain and the space domain. Hence, intra-layer connections are used in the output populations to enhance the representation capacity. Finally, we propose a fully spiking actor network with intra-layer connections (ILC-SAN).
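To make the membrane-voltage decoding idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simple leaky integrator as the non-spiking neuron, and it assumes the action is read from the voltage at the final time step. The function name `decode_action` and the parameters `weights`, `bias`, and `decay` are hypothetical and chosen only for illustration. Because the incoming spikes are binary, the synaptic input reduces to adding up the weights of the neurons that fired, so no multiplication between a fractional firing rate and a weight matrix is needed.

```python
def decode_action(spike_train, weights, bias=0.0, decay=0.5):
    """Decode one action dimension from a population's spike train.

    spike_train: list of T time steps, each a list of N binary spikes (0 or 1).
    weights:     list of N synaptic weights into the non-spiking neuron.
    bias, decay: hypothetical parameters of the leaky integrator.

    The non-spiking neuron never fires and is never reset; its membrane
    voltage simply integrates the weighted input spikes over time.
    """
    v = 0.0  # membrane voltage of the non-spiking neuron
    for spikes in spike_train:
        # Binary spikes select which weights to accumulate (additions only).
        synaptic_input = sum(w for w, s in zip(weights, spikes) if s)
        # Leaky integration: the old voltage decays, new input is added.
        v = decay * v + synaptic_input + bias
    # Assumption: the voltage at the last time step is taken as the action.
    return v


# One population of three neurons observed for two time steps.
population_spikes = [[1, 0, 1], [0, 1, 0]]
action = decode_action(population_spikes, weights=[0.5, -0.25, 0.25])
print(action)  # 0.125
```

In a multi-dimensional policy, one such readout would be applied per output population, giving one voltage per action dimension. Other readouts of the voltage trace (for example its maximum or mean over time) are equally plausible under this sketch.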
