Trapezoidal Gradient Descent for Effective Reinforcement Learning in Spiking Networks

Yuhao Pan, Xiucheng Wang, Nan Cheng, Qi Qiu

TL;DR

The paper tackles the high energy cost of reinforcement learning by leveraging Spiking Neural Networks (SNNs) as energy-efficient actors within an Actor-Critic TD3 framework. It introduces a trapezoidal surrogate gradient to replace the non-differentiable spiking activity, combining the stability of rectangular surrogates with the adaptability of triangular ones. Empirical results in HalfCheetah-v3 show that the trapezoidal gradient improves convergence speed, final performance, and training stability compared with rectangular and triangular surrogates across Pop-SAN and MDC-SAN variants. This work advances energy-efficient RL by enabling robust, responsive SNN-based policies, and points to future directions in optimizing SNN connectivity and dynamics for practical deployments.

Abstract

With the rapid development of artificial intelligence, reinforcement learning has achieved continual breakthroughs in both theory and practice. However, traditional reinforcement learning algorithms often consume considerable energy while interacting with the environment. Spiking Neural Networks (SNNs), which combine low energy consumption with performance comparable to deep neural networks, have attracted widespread attention. To reduce the energy consumption of reinforcement learning in practical applications, researchers have successively proposed the Pop-SAN and MDC-SAN algorithms. However, these algorithms use a rectangular function to approximate the gradient of the non-differentiable spike function during training, which results in low sensitivity and leaves room for improvement in SNN training effectiveness. We therefore propose a trapezoidal approximate gradient in its place, which preserves the original stable learning state while enhancing the model's adaptability and response sensitivity under varying signal dynamics. Simulation results show that the improved algorithm, trained with the trapezoidal approximate gradient, converges faster and performs better than the original algorithm, and demonstrates good training stability.
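
As a rough illustration of the three approximate gradient shapes discussed here, the sketch below evaluates a rectangular, a triangular, and a trapezoidal pseudo-gradient around a firing threshold. This summary does not give the paper's exact parameterization, so the threshold `v_th`, the half-width `width`, and the flat-top half-width `flat` are assumed values chosen only for illustration, and the function names are hypothetical.

```python
import numpy as np

# All parameter defaults below are illustrative assumptions, not values
# taken from the paper.

def rectangular_grad(v, v_th=0.5, width=0.5):
    """Constant pseudo-gradient inside a window around the threshold, zero outside."""
    return (np.abs(v - v_th) < width).astype(float)

def triangular_grad(v, v_th=0.5, width=0.5):
    """Pseudo-gradient that peaks at the threshold and decays linearly to zero at `width`."""
    return np.maximum(0.0, 1.0 - np.abs(v - v_th) / width)

def trapezoidal_grad(v, v_th=0.5, flat=0.25, width=0.5):
    """Flat top of half-width `flat`, then linear ramps that reach zero at `width`.

    Requires width > flat.
    """
    d = np.abs(v - v_th)
    ramp = (width - d) / (width - flat)
    return np.clip(ramp, 0.0, 1.0)

if __name__ == "__main__":
    # Membrane potentials at increasing distance from the threshold.
    v = np.array([0.5, 0.7, 0.9, 1.1])
    print("rectangular:", rectangular_grad(v))
    print("triangular: ", triangular_grad(v))
    print("trapezoidal:", trapezoidal_grad(v))
```

Under these assumptions the trapezoid matches the rectangle near the threshold (a constant gradient of 1) and behaves like the triangle toward the edges (a gradient that shrinks with distance), which is one way to read the claim that it combines the stability of the former with the sensitivity of the latter.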

Paper Structure

This paper contains 14 sections, 15 equations, and 4 figures.

Figures (4)

  • Figure 1: The SAN framework integrated into the TD3 model, which is based on the AC framework, with the spiking actor network replacing the deep actor network.
  • Figure 2: Illustration of the approximate gradient functions: rectangular, triangular, and trapezoidal.
  • Figure 3: The impact of different approximate gradient methods on Pop-SAN performance.
  • Figure 4: The impact of different approximate gradient methods on MDC-SAN performance.