A Low Latency Adaptive Coding Spiking Framework for Deep Reinforcement Learning

Lang Qin, Rui Yan, Huajin Tang

TL;DR

The paper addresses high latency and limited versatility in spiking reinforcement learning (SRL) by introducing the adaptive coding spike framework (ACSF), in which Spike Encoders and Decoders use learnable matrices to encode and decode spikes. With iterative LIF neurons and surrogate-gradient training, ACSF supports both online and offline DRL with a directly trained SNN, reaching latency as low as $0.8\%$ of that of prior SRL methods and up to $5\times$ the energy efficiency of DNNs. Empirical results on Atari and MuJoCo show ACSF matching or surpassing baselines while significantly reducing latency, and ablations demonstrate the value of the adaptive coders. Overall, ACSF broadens SRL applicability, enables efficient neuromorphic deployment, and provides a unified, end-to-end framework for low-latency reinforcement learning.

Abstract

In recent years, spiking neural networks (SNNs) have been used in reinforcement learning (RL) due to their low power consumption and event-driven features. However, spiking reinforcement learning (SRL), which suffers from fixed coding methods, still faces the problems of high latency and poor versatility. In this paper, we use learnable matrix multiplication to encode and decode spikes, improving the flexibility of the coders and thus reducing latency. Meanwhile, we train the SNNs using the direct training method and use two different structures for online and offline RL algorithms, which gives our model a wider range of applications. Extensive experiments show that our method achieves optimal performance with ultra-low latency (as low as 0.8% of that of other SRL methods) and excellent energy efficiency (up to 5× that of DNNs) across different algorithms and environments.
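
The idea of encoding and decoding with learnable matrix multiplication can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `encode` and `decode`, the per-time-step weight vectors `w_enc` and `w_dec`, and the toy spike train are all assumptions made for illustration. The encoder expands a state into `T` time steps, and the decoder compresses a spike train of `T` steps back into one value per output neuron.

```python
import random

def encode(state, w_enc):
    """Expand a state of length d into T steps: S_tau[t][i] = w_enc[t] * state[i]."""
    return [[w_t * s for s in state] for w_t in w_enc]

def decode(spikes, w_dec):
    """Compress a T x n spike train into n values: out[j] = sum_t w_dec[t] * spikes[t][j]."""
    n = len(spikes[0])
    return [sum(w_dec[t] * spikes[t][j] for t in range(len(spikes))) for j in range(n)]

T = 4                                              # number of simulation time steps (assumed)
rng = random.Random(0)
w_enc = [rng.uniform(0.5, 1.5) for _ in range(T)]  # hypothetical learnable encoder weights
w_dec = [1.0 / T] * T                              # hypothetical decoder weights, initialised as a time average

state = [0.2, -1.0, 0.7]                           # made-up raw state S
temporal_state = encode(state, w_enc)              # temporal state, shape T x 3

spikes = [[1, 0], [0, 0], [1, 1], [1, 0]]          # made-up SNN output spikes, shape T x 2
values = decode(spikes, w_dec)
print(values)
```

With the uniform `w_dec` used here, the decoder is just an average of spikes over time; in a trained model the weights would presumably move away from that initialisation, which is where the added flexibility over a fixed coding scheme would come from.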

Paper Structure

This paper contains 26 sections, 21 equations, 6 figures, and 4 tables.

Figures (6)

  • Figure 1: Online and offline SRL frameworks. Environments generally contain elements such as states ($S$), rewards ($R$), and state transition probabilities ($P_{ss'}^a$). The state ($S$) is transmitted to the SNNs through the encoders. The action ($a$) and value functions are expressed by the decoders. The environment accepts the action ($a$) and gives the next state ($S'$) based on the transition probability ($P_{ss'}^a$). (A) In online SRL, the SNN-based policy $\pi$ interacts directly with the environment. (B) In offline SRL, the SNN-based policy interacts with a dataset $\mathcal{D}$ that was collected by the behavior policy $\pi_{\beta}$. The behavior policy usually comes from experienced humans or a well-trained agent.
  • Figure 2: The overall structure and workflow of the ACSF. The encoder transforms the raw state $S$ into the temporal state $S^\tau$, which is then fed into SNNs. The output spike trains generated by SNNs are decoded into values or actions by different decoders. Both the spike encoder and the decoder use learnable matrix multiplication to expand or compress inputs in the time dimension. Deep SNNs are trained directly using surrogate gradients.
  • Figure 3: Spatiotemporal dynamics of neurons. The solid arrows and the dotted arrows indicate the directions of spatial and temporal feedforward, respectively. The input signal changes the membrane potential ($V^l_t$) and fires the output spikes ($O^l_t$) through the processes of charging, discharging, and resetting.
  • Figure 4: Atari games and MuJoCo environments in OpenAI Gym. (A) Screenshots from various Atari games. The agent needs to stay alive and earn as much reward as possible. (B) MuJoCo robot control tasks, in which robots of different shapes must walk forward as fast as possible.
  • Figure 5: Learning curves for DQN and ACSF. During the training process, the performance of ACSF meets or exceeds that of the DQN algorithm. The learning curves have been smoothed for aesthetics.
  • ...and 1 more figures
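
The charging, discharging, and resetting cycle described in the Figure 3 caption can be sketched for a single neuron as follows. This is a generic iterative LIF update, not necessarily the exact equations used in the paper: the decay factor, threshold, hard reset, and the rectangular surrogate gradient are assumed, commonly used choices, and the names `lif_step`, `run_lif`, and `surrogate_grad` are hypothetical.

```python
def lif_step(v_prev, x, decay=0.5, v_th=1.0, v_reset=0.0):
    """One time step of a single LIF neuron: charge, fire, reset."""
    v = decay * v_prev + x          # charging: leaky integration of the input
    spike = 1 if v >= v_th else 0   # discharging: emit a spike when the threshold is reached
    if spike:
        v = v_reset                 # resetting: hard reset after a spike (assumed)
    return v, spike

def run_lif(inputs, **kwargs):
    """Iterate the neuron over a sequence of inputs and collect the output spikes."""
    v, out = 0.0, []
    for x in inputs:
        v, s = lif_step(v, x, **kwargs)
        out.append(s)
    return out

def surrogate_grad(v, v_th=1.0, width=1.0):
    """Rectangular surrogate for d(spike)/d(v): 1/width inside a window around v_th, else 0."""
    return 1.0 / width if abs(v - v_th) < width / 2 else 0.0

spike_train = run_lif([0.6, 0.6, 0.6, 0.6, 0.6])
print(spike_train)
```

The spike function itself is a step whose derivative is zero almost everywhere, so direct training replaces that derivative in the backward pass with a smooth or windowed stand-in such as `surrogate_grad`; the forward pass is left unchanged.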