Trapezoidal Gradient Descent for Effective Reinforcement Learning in Spiking Networks

Yuhao Pan; Xiucheng Wang; Nan Cheng; Qi Qiu

TL;DR

The paper tackles the high energy cost of reinforcement learning by using Spiking Neural Networks (SNNs) as energy-efficient actors within an Actor-Critic TD3 framework. It introduces a trapezoidal surrogate gradient that stands in for the gradient of the non-differentiable spiking function during training, combining the stability of rectangular surrogates with the adaptability of triangular ones. Empirical results on HalfCheetah-v3 show that the trapezoidal gradient improves convergence speed, final performance, and training stability compared with rectangular and triangular surrogates for both the Pop-SAN and MDC-SAN variants. The work advances energy-efficient RL by enabling robust, responsive SNN-based policies, and points to future work on optimizing SNN connectivity and dynamics for practical deployments.

Abstract

With the rapid development of artificial intelligence, reinforcement learning has achieved continual breakthroughs in both theory and practice. However, traditional reinforcement learning algorithms often consume a great deal of energy while interacting with the environment. Spiking Neural Networks (SNNs), which combine low energy consumption with performance comparable to deep neural networks, have attracted widespread attention. To reduce the energy consumption of reinforcement learning in practical applications, researchers have proposed the Pop-SAN and MDC-SAN algorithms. These algorithms, however, use rectangular functions to approximate the gradient of the spiking network during training, which results in low sensitivity and leaves room to improve how effectively the SNN is trained. We therefore propose a trapezoidal approximate gradient to replace the spiking network's gradient, which preserves the original stable learning state while improving the model's adaptability and response sensitivity under varying signal dynamics. Simulation results show that the improved algorithm, which uses the trapezoidal approximate gradient, converges faster and performs better than the original algorithm, and trains stably.
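To make the contrast between the three surrogate shapes concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: this summary does not give the exact parametrization, so the function names, the threshold `v_th`, the outer half-width `a`, the flat-top half-width `b`, and the unit peak height are all hypothetical choices.

```python
# Illustrative surrogate (approximate) gradients for a spiking neuron.
# All names and default values below are assumptions made for this sketch;
# they are not taken from the paper.

def rectangular_grad(v, v_th=0.5, a=0.5):
    """Constant gradient inside the window |v - v_th| < a, zero outside."""
    return 1.0 if abs(v - v_th) < a else 0.0


def triangular_grad(v, v_th=0.5, a=0.5):
    """Peaks at the threshold and decays linearly to zero at distance a."""
    return max(0.0, 1.0 - abs(v - v_th) / a)


def trapezoidal_grad(v, v_th=0.5, a=0.5, b=0.25):
    """Flat top of half-width b, then linear slopes reaching zero at distance a."""
    if not 0.0 <= b < a:
        raise ValueError("expected 0 <= b < a")
    d = abs(v - v_th)
    if d <= b:
        return 1.0                # plateau: behaves like the rectangular surrogate
    if d < a:
        return (a - d) / (a - b)  # slope: behaves like the triangular surrogate
    return 0.0                    # outside the window: no gradient


if __name__ == "__main__":
    # Compare the three shapes at a few membrane potentials.
    for v in (0.5, 0.7, 0.875, 1.1):
        print(v, rectangular_grad(v), triangular_grad(v), trapezoidal_grad(v))
```

In this sketch the trapezoid reduces to the triangle when `b = 0` and approaches the rectangle as `b` approaches `a`, which is one way to read the claim that it combines the properties of the two. In a training loop, a function like this would scale the upstream gradient in the backward pass of the spike function.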

Paper Structure (14 sections, 15 equations, 4 figures)

Introduction
SAN Algorithm Framework Overview
Introduction to the Reinforcement Learning Framework
SNN Playing the Role of the Actor Network
Encoder Module
Transmission Module
Decoder Module
Approximate Gradients to Replace the Spiking Network
Introduction to the Trapezoidal Approximate Gradient Function
SNNs Parameters Update Process
Performance Evaluation
Simulation Settings
Performance Comparison
Conclusion

Figures (4)

Figure 1: Introducing SAN algorithm framework into TD3 algorithm model based on AC framework, with the spiking actor network replacing the deep actor network.
Figure 2: Illustration of various approximate gradient functions: rectangular, triangular, and trapezoidal
Figure 3: The impact of different approximate gradient methods on Pop-SAN performance
Figure 4: The impact of different approximate gradient methods on MDC-SAN performance
