Fully Spiking Actor Network with Intra-layer Connections for Reinforcement Learning

Ding Chen; Peixi Peng; Tiejun Huang; Yonghong Tian

TL;DR

The paper tackles energy-efficient reinforcement learning for continuous control by introducing ILC-SAN, a fully spiking actor network that decodes spike trains via membrane voltage readouts from non-spiking neurons, avoiding floating-point decoders. It combines a population encoder and a backbone SNN with intra-layer connections to enhance action decoding, and integrates with TD3 to train the actor while using a deep critic for guidance. Key contributions include the membrane voltage decoding scheme, the intra-layer connection mechanism, and extensive demonstrations on OpenAI Gym tasks showing superior performance and improved energy efficiency over state-of-the-art spike-based actors. The work advances neuromorphic RL by enabling end-to-end spike-based control suitable for deployment on neuromorphic hardware, with practical impact on energy-conscious robotics and real-time, on-device learning.

Abstract

With the help of special neuromorphic hardware, spiking neural networks (SNNs) are expected to realize artificial intelligence (AI) with less energy consumption. Combining SNNs with deep reinforcement learning (DRL) offers a promising energy-efficient approach to realistic control tasks. In this paper, we focus on tasks in which the agent needs to learn multi-dimensional deterministic policies for control, which are very common in real scenarios. Recently, the surrogate gradient method has been used to train multi-layer SNNs, allowing them to achieve performance comparable to that of the corresponding deep networks on such tasks. Most existing spike-based RL methods take the firing rate as the output of the SNN and convert it into a continuous action (i.e., the deterministic policy) through a fully-connected (FC) layer. However, because the firing rate is a fractional value, the FC layer requires floating-point matrix operations, so the whole SNN cannot be deployed directly on neuromorphic hardware. To develop a fully spiking actor network without any floating-point matrix operations, we draw inspiration from the non-spiking interneurons found in insects and use the membrane voltage of non-spiking neurons to represent the action. Before the non-spiking neurons, multiple neuron populations are introduced, each decoding a different action dimension. Since each population decodes one action dimension, we argue that the neurons in each population should be connected in both the time domain and the space domain. Hence, intra-layer connections are used in the output populations to enhance representation capacity. Finally, we propose a fully spiking actor network with intra-layer connections (ILC-SAN).
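To make the membrane-voltage readout concrete, here is a minimal Python sketch of a single non-spiking neuron decoding one action dimension from the spike trains of one output population. It is an illustration only, not the authors' implementation: the function name `decode_action`, the leaky-integrator update, the `leak` value, and the choice to read the voltage at the final time step are all assumptions. NumPy floats stand in for whatever number format neuromorphic hardware would use.

```python
import numpy as np

def decode_action(spikes, weights, bias=0.0, leak=0.5):
    """Read one action value from the membrane voltage of a non-spiking neuron.

    spikes  : (T, N) binary array, spike trains of one output population
    weights : (N,) synaptic weights into the non-spiking neuron (hypothetical)
    leak    : decay factor of the membrane voltage (assumed value)
    """
    v = 0.0
    for s_t in spikes:
        # Spikes are binary, so the input current is just a sum of the
        # weights of the neurons that fired: additions only, no matrix product.
        current = weights[s_t.astype(bool)].sum() + bias
        # Leaky integration with no threshold and no reset: the neuron never fires.
        v = leak * v + current
    # Assumption: the voltage after the last time step represents the action.
    return v

spikes = np.array([[1, 0], [0, 1], [1, 1]])
weights = np.array([0.5, -0.25])
action = decode_action(spikes, weights)
```

For an M-dimensional action, the same readout would be applied once per output population, giving one voltage per action dimension.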

Paper Structure (26 sections, 11 equations, 6 figures, 9 tables, 1 algorithm)

This paper contains 26 sections, 11 equations, 6 figures, 9 tables, 1 algorithm.

Introduction
Related Work
Reward-based Learning by Three-factor Learning Rules
ANN to SNN Conversion for RL
RL Methods using Spike-based BP
Non-spiking Neurons
Intra-layer Connections
Method
Spiking Neural Model
Non-spiking Neural Model
Membrane Voltage Coding
Fully Spiking Actor Network with Intra-layer Connections
Population Encoder
Current-based LIF Neurons
Backbone SNN
...and 11 more sections

Figures (6)

Figure 1: The correspondence diagram between our method and the sensory motor neuron pathway.
Figure 2: The overall framework of the proposed ILC-SAN. The state is transformed into spike-trains by the population encoder. Each state dimension $s_n$ is encoded by the corresponding input population, which consists of learnable Gaussian receptive fields and Integrate-and-Fire (IF) neurons. Each neuron in the input population has a different Gaussian kernel ($\mu, \sigma$). Through these Gaussian kernels, $s_n$ is first encoded into the stimulation strength for each neuron in the input population, and then transformed into the spike-trains using deterministic encoding. After that, the spike-trains are transmitted through the backbone SNN to the population decoder. The spiking neurons in the last layer of the backbone SNN are evenly divided into $M$ output populations, and the intra-layer connections are applied in each output population. Each output population has a corresponding population decoder, where the spike-trains are first integrated into a single non-spiking neuron and then decoded into the corresponding action dimension using membrane voltage coding.
Figure 3: The general discrete neural model. (Top) Spiking neural model. (Bottom) Non-spiking neural model.
Figure 4: Eight continuous control tasks from OpenAI Gym.
Figure 5: The comparison of average rewards for PopSAN and ILC-SAN using $E_{pop\_det}$ over 10 random seeds. The shaded area represents half the value of the standard deviation, and the curves are smoothed for clarity.
...and 1 more figure
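
The population encoder described in the Figure 2 caption can be sketched in Python as follows. This is a hedged illustration under stated assumptions rather than the paper's code: the function name `population_encode`, the unit firing threshold, the subtract-on-fire reset, and the number of time steps are hypothetical choices. Only the overall pipeline (Gaussian receptive fields producing a stimulation strength, then Integrate-and-Fire neurons producing deterministic spike trains) comes from the caption.

```python
import numpy as np

def population_encode(s, mu, sigma, timesteps=5, threshold=1.0):
    """Encode one state dimension s into spike trains of an input population.

    mu, sigma : (N,) centers and widths of the Gaussian receptive fields
                (learnable in the paper; fixed arrays here)
    Returns a (timesteps, N) binary spike array.
    """
    # Stimulation strength of each neuron from its Gaussian kernel.
    stim = np.exp(-0.5 * ((s - mu) / sigma) ** 2)
    v = np.zeros_like(stim)
    spikes = np.zeros((timesteps, stim.size), dtype=np.int8)
    for t in range(timesteps):
        # Integrate-and-Fire: accumulate the constant stimulation, no leak.
        v += stim
        fired = v >= threshold
        spikes[t] = fired
        # Assumed reset scheme: subtract the threshold from neurons that fired.
        v[fired] -= threshold
    return spikes

mu = np.array([0.0, 10.0])
sigma = np.array([1.0, 1.0])
trains = population_encode(0.0, mu, sigma, timesteps=4)
```

In this example the neuron centered on the input value fires at every step, while the neuron whose receptive field is far from the input stays silent, so the firing pattern across the population carries the state value.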
