A Low Latency Adaptive Coding Spiking Framework for Deep Reinforcement Learning

Lang Qin; Rui Yan; Huajin Tang

TL;DR

The paper addresses high latency and limited versatility in spiking reinforcement learning (SRL) by introducing the adaptive coding spike framework (ACSF), which uses learnable matrices to encode and decode spikes through Spike Encoders/Decoders. By employing iterative LIF neurons and surrogate-gradient training, ACSF supports both online and offline DRL with a directly trained SNN, achieving ultra-low latency (as low as $0.8\%$ of prior SRL methods) and up to $5\times$ energy efficiency over DNNs. Empirical results on Atari and MuJoCo show ACSF either matching or surpassing baselines while significantly reducing latency, and ablations demonstrate the value of adaptive coders. Overall, ACSF broadens SRL applicability, enables efficient neuromorphic deployment, and provides a unified, end-to-end framework for low-latency reinforcement learning.

Abstract

In recent years, spiking neural networks (SNNs) have been used in reinforcement learning (RL) because of their low power consumption and event-driven operation. However, spiking reinforcement learning (SRL), which relies on fixed coding methods, still suffers from high latency and poor versatility. In this paper, we use learnable matrix multiplication to encode and decode spikes, which makes the coders more flexible and thereby reduces latency. We also train the SNNs directly and use two different structures for online and offline RL algorithms, which gives our model a wider range of applications. Extensive experiments show that our method achieves optimal performance with ultra-low latency (as low as 0.8% of other SRL methods) and excellent energy efficiency (up to 5× that of DNNs) across different algorithms and environments.
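The learnable matrix-multiplication coders mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the shapes, the names `enc_weights`, `dec_weights`, `encode`, and `decode`, and the random stand-in for the SNN output are all assumptions chosen for illustration. The sketch only shows how one matrix can expand a state along the time dimension and another can compress a spike train back into real-valued outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 4          # number of simulation time steps (assumed value)
state_dim = 3  # size of the raw state vector (assumed value)

# Hypothetical learnable parameters; during training they would be
# updated by gradient descent together with the SNN weights.
enc_weights = rng.normal(size=(T, 1))  # expands a state along the time axis
dec_weights = rng.normal(size=(1, T))  # compresses a spike train along the time axis


def encode(state, enc_weights):
    """Expand a raw state (state_dim,) into a temporal state (T, state_dim)."""
    return enc_weights @ state[np.newaxis, :]


def decode(spikes, dec_weights):
    """Compress a spike train (T, n_out) into real-valued outputs (n_out,)."""
    return (dec_weights @ spikes)[0]


state = np.array([0.5, -1.0, 2.0])
temporal_state = encode(state, enc_weights)  # shape (T, state_dim)

# Stand-in for the SNN output: a random binary spike train, 2 output neurons.
spikes = (rng.random((T, 2)) < 0.5).astype(float)
outputs = decode(spikes, dec_weights)        # shape (2,)
```

Because both coders are plain matrix products, they are differentiable and could in principle be trained end to end with the network, which is the property the abstract credits for the reduced latency.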

Paper Structure (26 sections, 21 equations, 6 figures, 4 tables)


Introduction
Related Works
Reward-based Local Learning
Convert DNNs to SNNs for RL
Hybrid Framework of SNNs and DNNs
Directly Trained SNNs for RL
Methods
DRL Algorithms
DQN
DDPG
BCQ
Behavioral Cloning
Iterative LIF Model
Adaptive Coders
Spike Encoder
...and 11 more sections

Figures (6)

Figure 1: Online and offline SRL frameworks. Environments generally contain elements such as states ($S$), rewards ($R$), and state transition probabilities ($P_{ss'}^a$). The state ($S$) is transmitted to the SNNs through the encoders. Actions ($a$) and value functions are expressed by the decoders. The environment accepts the action ($a$) and gives the next state ($S'$) based on the transition probability ($P_{ss'}^a$). (A) In online SRL, the SNN-based policy $\pi$ interacts directly with the environment. (B) In offline SRL, the SNN-based policy interacts with a dataset $\mathcal{D}$ that was collected by the behavior policy $\pi_{\beta}$. The behavior policy is usually provided by experienced humans or a well-trained agent.
Figure 2: The overall structure and workflow of the ACSF. The encoder transforms the raw state $S$ into the temporal state $S^\tau$, which is then fed into SNNs. The output spike trains generated by SNNs are decoded into values or actions by different decoders. Both the spike encoder and the decoder use learnable matrix multiplication to expand or compress inputs in the time dimension. Deep SNNs are trained directly using surrogate gradients.
Figure 3: Spatiotemporal dynamics of neurons. The solid arrows and the dotted arrows indicate the directions of spatial and temporal feedforward, respectively. The input signal changes the membrane potential ($V^l_t$) and fires the output spikes ($O^l_t$) through the processes of charging, discharging, and resetting.
Figure 4: Atari games and MuJoCo environments in OpenAI Gym. (A) Screenshots from various Atari games. The agent must stay alive and earn as much reward as possible. (B) MuJoCo robot control tasks, in which robots of different shapes must walk forward as fast as possible.
Figure 5: Learning curves for DQN and ACSF. During the training process, the performance of ACSF meets or exceeds that of the DQN algorithm. The learning curves have been smoothed for aesthetics.
...and 1 more figures
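
The neuron dynamics summarized in the Figure 3 caption (charging, discharging, resetting) can be sketched in a few lines of NumPy. This is a generic iterative LIF layer, not the paper's exact formulation: the decay factor, the threshold, the hard reset to zero, and the rectangular surrogate gradient are all assumptions, and the names `lif_forward` and `surrogate_grad` are hypothetical.

```python
import numpy as np


def lif_forward(inputs, decay=0.5, v_th=1.0):
    """Run one layer of LIF neurons over T time steps.

    inputs: array of shape (T, n) holding the input current per step.
    Returns (spikes, potentials), both of shape (T, n); potentials are
    recorded after charging and before the reset.
    """
    T, n = inputs.shape
    v = np.zeros(n)
    spikes = np.zeros((T, n))
    potentials = np.zeros((T, n))
    for t in range(T):
        v = decay * v + inputs[t]      # charging: leaky integration of input
        o = (v >= v_th).astype(float)  # discharging: spike when threshold is reached
        potentials[t] = v
        spikes[t] = o
        v = v * (1.0 - o)              # resetting: hard reset to zero after a spike
    return spikes, potentials


def surrogate_grad(v, v_th=1.0, width=1.0):
    """Rectangular surrogate for the derivative of the spike function (assumed form)."""
    return (np.abs(v - v_th) < width / 2).astype(float) / width


# A constant input of 0.6 makes the neuron fire on the third step.
spikes, potentials = lif_forward(np.full((4, 1), 0.6))
```

The step function that produces spikes has zero gradient almost everywhere, so direct training replaces its derivative with a smooth or windowed stand-in such as `surrogate_grad` during backpropagation.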
