Proxy Target: Bridging the Gap Between Discrete Spiking Neural Networks and Continuous Control

Zijie Xu, Tong Bu, Zecheng Hao, Jianhao Ding, Zhaofei Yu

TL;DR

The paper addresses the instability of RL with discrete SNNs caused by the mismatch with continuous target-network updates. It introduces a proxy target framework that uses a differentiable proxy actor during training to enable smooth target updates, while preserving SNN energy efficiency during deployment. The approach achieves stable learning, faster convergence, and up to 32% higher average performance across multiple spiking neuron models and continuous control tasks, with simple LIF neurons sometimes surpassing ANN baselines. This work demonstrates the value of SNN-tailored RL algorithms and points to practical, energy-efficient neuromorphic controllers for edge devices.

Abstract

Spiking Neural Networks (SNNs) offer low-latency and energy-efficient decision making on neuromorphic hardware, making them attractive for Reinforcement Learning (RL) in resource-constrained edge devices. However, most RL algorithms for continuous control are designed for Artificial Neural Networks (ANNs), particularly the target network soft update mechanism, which conflicts with the discrete and non-differentiable dynamics of spiking neurons. We show that this mismatch destabilizes SNN training and degrades performance. To bridge the gap between discrete SNNs and continuous-control algorithms, we propose a novel proxy target framework. The proxy network introduces continuous and differentiable dynamics that enable smooth target updates, stabilizing the learning process. Since the proxy operates only during training, the deployed SNN remains fully energy-efficient with no additional inference overhead. Extensive experiments on continuous control benchmarks demonstrate that our framework consistently improves stability and achieves up to $32\%$ higher performance across various spiking neuron models. Notably, to the best of our knowledge, this is the first approach that enables SNNs with simple Leaky Integrate and Fire (LIF) neurons to surpass their ANN counterparts in continuous control. This work highlights the importance of SNN-tailored RL algorithms and paves the way for neuromorphic agents that combine high performance with low power consumption. Code is available at https://github.com/xuzijie32/Proxy-Target.
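The mismatch described above can be illustrated with a toy sketch. It applies the standard soft update (Polyak averaging, $\theta' \leftarrow \tau\theta + (1-\tau)\theta'$) to the weights of two single-unit "target networks": a continuous tanh unit and a simplified integrate-and-fire unit read out as a firing rate. The neuron model, the weights, the input, and the value of `tau` are all assumptions chosen for illustration and are not taken from the paper; the point is only that the same small parameter change moves a continuous output slightly, while a rate-coded spiking output either stays put or jumps by a whole spike.

```python
import math

def soft_update(target, online, tau):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * o + (1.0 - tau) * t for o, t in zip(online, target)]

def ann_output(weights, x):
    """Continuous unit: tanh of a weighted sum."""
    return math.tanh(sum(w * xi for w, xi in zip(weights, x)))

def spiking_output(weights, x, threshold=1.0, steps=4):
    """Toy integrate-and-fire unit (an assumption, not the paper's neuron model).

    A constant input current is integrated for `steps` time steps with a hard
    reset after each spike; the output is the firing rate, a multiple of 1/steps.
    """
    current = sum(w * xi for w, xi in zip(weights, x))
    v, spikes = 0.0, 0
    for _ in range(steps):
        v += current
        if v >= threshold:
            spikes += 1
            v = 0.0  # hard reset after a spike
    return spikes / steps

def output_changes(output_fn, target, online, x, tau, n_updates):
    """Absolute change in the target's output after each soft update."""
    changes = []
    prev = output_fn(target, x)
    for _ in range(n_updates):
        target = soft_update(target, online, tau)
        cur = output_fn(target, x)
        changes.append(abs(cur - prev))
        prev = cur
    return changes

# Hypothetical weights and input, chosen only to make the effect visible.
x = [1.0, 1.0]
target0 = [0.1, 0.1]
online = [0.9, 0.9]

ann_steps = output_changes(ann_output, target0, online, x, tau=0.05, n_updates=200)
snn_steps = output_changes(spiking_output, target0, online, x, tau=0.05, n_updates=200)

print("largest per-update change, tanh unit:   ", max(ann_steps))
print("largest per-update change, spiking unit:", max(snn_steps))
```

In this sketch every change in the spiking output is a multiple of `1 / steps`, so it cannot be made arbitrarily small by lowering `tau`; lowering `tau` only makes the jumps rarer.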

Paper Structure

This paper contains 51 sections, 2 theorems, 23 equations, 8 figures, 11 tables, 3 algorithms.

Key Result

Theorem 1

Let the proxy network $\pi_{\phi'}^{\text{Proxy}}$ be updated by minimizing the loss $L_{\text{proxy}}$ in Eq. eq:BC. During each update, as the proxy learning rate $lr_{\text{proxy}} \to 0$, the output change satisfies where $\phi'_{\text{old}}$ and $\phi'_{\text{new}}$ denote parameters before and after the update, respectively. Hence, minimizing $L_{\text{proxy}}$ ensures sufficiently small an
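The qualitative content of this statement, that a smaller proxy learning rate yields a smaller change in the proxy's output per update, can be checked numerically on a toy problem. The sketch below is not the paper's algorithm: the loss in Eq. eq:BC is not reproduced in this excerpt, so a mean squared imitation error between the proxy's output and the online actor's actions is assumed, and the single tanh unit, the states, the actions, and the weights are hypothetical.

```python
import math

def proxy_output(w, s):
    """Differentiable proxy: a single tanh unit (assumed architecture)."""
    return math.tanh(sum(wi * si for wi, si in zip(w, s)))

def imitation_step(w, states, actions, lr):
    """One gradient-descent step on an assumed mean squared imitation loss."""
    n = len(states)
    grad = [0.0] * len(w)
    for s, a in zip(states, actions):
        p = proxy_output(w, s)
        coeff = 2.0 * (p - a) * (1.0 - p * p) / n  # chain rule through tanh
        for j, sj in enumerate(s):
            grad[j] += coeff * sj
    return [wj - lr * gj for wj, gj in zip(w, grad)]

def max_output_change(w_old, w_new, states):
    """Largest absolute output change over the batch of states."""
    return max(abs(proxy_output(w_new, s) - proxy_output(w_old, s)) for s in states)

# Hypothetical batch: states and the online spiking actor's actions on them.
states = [[1.0, 0.5], [-0.5, 1.0], [0.3, -0.8]]
online_actions = [0.5, -0.5, 0.0]
w_old = [0.2, -0.1]

learning_rates = [1e-1, 1e-2, 1e-3]
changes = []
for lr in learning_rates:
    w_new = imitation_step(w_old, states, online_actions, lr)
    changes.append(max_output_change(w_old, w_new, states))
    print(f"lr_proxy={lr:g}: max output change = {changes[-1]:.6f}")
```

Because the proxy is continuous in its parameters, shrinking the learning rate shrinks the output change, which is the property a thresholded spiking target lacks under direct soft updates.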

Figures (8)

  • Figure 1: Overview of the training framework and performance comparison. (a)-(c) show different training paradigms: (a) the Actor-Critic framework for ANNs, (b) the Actor-Critic framework for SNNs, (c) the proposed proxy target framework for SNNs. (d) Performance ratio of SNNs relative to ANNs across five random seeds and five environments. The middle orange line denotes the median, the box spans from the first to the third quartile, and the whiskers extend to the farthest data point within $1.5$ times the inter-quartile range from the box.
  • Figure 2: Effects of different target network update mechanisms. (a)-(c) show output trajectories of different target networks during updates, where each line denotes a normalized output dimension within $(-1,1)$. (a) ANN target network exhibits smooth transitions; (b) SNN target network produces discrete and irregular output jumps; (c) the proposed proxy target achieves continuous and stable transitions. (d) Mean squared error between target and online networks during training in the InvertedDoublePendulum-v4 environment.
  • Figure 3: Architecture of the proposed proxy network and the spiking actor network. The proxy actor is updated implicitly by imitating the behavior of the online spiking actor network, ensuring stable and accurate target updates.
  • Figure 4: Continuous control tasks of the MuJoCo environments on OpenAI Gymnasium. (a) InvertedDoublePendulum-v4, (b) Ant-v4, (c) HalfCheetah-v4, (d) Hopper-v4, (e) Walker2d-v4.
  • Figure 5: Learning curves of the proxy target framework (PT) and the vanilla Actor-Critic framework with the LIF neuron, the CLIF neuron and the DN. AR denotes average returns, and TS denotes training steps. The shaded region represents half a standard deviation over 5 different seeds. Curves are uniformly smoothed for visual clarity.
  • ...and 3 more figures

Theorems & Definitions (3)

  • Theorem 1
  • Theorem 1
  • Proof 1