
CaRe-BN: Precise Moving Statistics for Stabilizing Spiking Neural Networks in Reinforcement Learning

Zijie Xu, Xinyu Shi, Yiting Dong, Zihan Huang, Zhaofei Yu

TL;DR

Confidence-adaptive and Re-calibration Batch Normalization (CaRe-BN) is proposed. It introduces a confidence-guided adaptive update strategy for BN statistics and a re-calibration mechanism that aligns distributions. Together, these stabilize SNN optimization without disrupting the RL training process.

Abstract

Spiking Neural Networks (SNNs) offer low-latency and energy-efficient decision-making on neuromorphic hardware by mimicking the event-driven dynamics of biological neurons. However, the discrete and non-differentiable nature of spikes leads to unstable gradient propagation in directly trained SNNs, making Batch Normalization (BN) an important component for stabilizing training. In online Reinforcement Learning (RL), imprecise BN statistics hinder exploitation, resulting in slower convergence and suboptimal policies. While Artificial Neural Networks (ANNs) can often omit BN, SNNs critically depend on it, limiting the adoption of SNNs for energy-efficient control on resource-constrained devices. To overcome this, we propose Confidence-adaptive and Re-calibration Batch Normalization (CaRe-BN), which introduces (i) a confidence-guided adaptive update strategy for BN statistics and (ii) a re-calibration mechanism to align distributions. By providing more accurate normalization, CaRe-BN stabilizes SNN optimization without disrupting the RL training process. Importantly, CaRe-BN does not alter inference, thus preserving the energy efficiency of SNNs in deployment. Extensive experiments on both discrete and continuous control benchmarks demonstrate that CaRe-BN improves SNN performance by up to $22.6\%$ across different spiking neuron models and RL algorithms. Remarkably, SNNs equipped with CaRe-BN even surpass their ANN counterparts by $5.9\%$. These results highlight a new direction for BN techniques tailored to RL, paving the way for neuromorphic agents that are both efficient and high-performing. Code is available at https://github.com/xuzijie32/CaRe-BN.
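The abstract states that CaRe-BN changes only how BN statistics are estimated during training, not how the network runs at inference. A standard reason a BN layer adds no deployment cost is that, once its statistics are frozen, it is a per-channel affine map that can be folded into the preceding layer's weights. The minimal NumPy sketch below shows this folding, assuming a fully connected layer followed by BN. `fold_bn_into_linear` is a hypothetical helper written for illustration, not part of the CaRe-BN repository.

```python
import numpy as np


def fold_bn_into_linear(W, b, gamma, beta, running_mean, running_var, eps=1e-5):
    """Fold an inference-mode BatchNorm into the preceding linear layer.

    At inference, BN computes gamma * (z - mean) / sqrt(var + eps) + beta,
    which is affine in z = W @ x + b, so it can be merged into (W, b).
    """
    scale = gamma / np.sqrt(running_var + eps)  # per-output-channel scale
    W_folded = W * scale[:, None]               # scale each output row of W
    b_folded = (b - running_mean) * scale + beta
    return W_folded, b_folded


rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
b = rng.normal(size=4)
gamma = rng.uniform(0.5, 1.5, size=4)
beta = rng.normal(size=4)
mean = rng.normal(size=4)
var = rng.uniform(0.5, 2.0, size=4)
x = rng.normal(size=(10, 3))

# Reference path: linear layer followed by inference-mode BN.
z = x @ W.T + b
y_ref = gamma * (z - mean) / np.sqrt(var + 1e-5) + beta

# Folded path: a single linear layer with adjusted weights and bias.
Wf, bf = fold_bn_into_linear(W, b, gamma, beta, mean, var)
y_folded = x @ Wf.T + bf
print(np.allclose(y_ref, y_folded))  # True
```

Because only the frozen `running_mean` and `running_var` enter the folded weights, a better estimator of those statistics during training changes their values but not the shape or cost of the deployed layer.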

Paper Structure

This paper contains 51 sections, 1 theorem, 24 equations, 12 figures, 21 tables, 2 algorithms.

Key Result

Theorem 1

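Let $(\mu_i,\sigma_i^2)$ and $(\hat{\mu}_{i\mid i-1},\hat{\sigma}^2_{i\mid i-1})$ be two unbiased estimators of the population parameters $(\mu_i^*,{\sigma_i^*}^2)$. Treating them as random variables, the optimal linear estimator is

$$\hat{\mu}_{i\mid i} = \hat{\mu}_{i\mid i-1} + K^\mu_i\,\bigl(\mu_i - \hat{\mu}_{i\mid i-1}\bigr), \qquad K^\mu_i = \frac{\mathbb{D}(\hat{\mu}_{i\mid i-1})}{\mathbb{D}(\hat{\mu}_{i\mid i-1}) + \mathbb{D}(\mu_i)},$$

$$\hat{\sigma}^2_{i\mid i} = \hat{\sigma}^2_{i\mid i-1} + K^\sigma_i\,\bigl(\sigma^2_i - \hat{\sigma}^2_{i\mid i-1}\bigr), \qquad K^\sigma_i = \frac{\mathbb{D}(\hat{\sigma}^2_{i\mid i-1})}{\mathbb{D}(\hat{\sigma}^2_{i\mid i-1}) + \mathbb{D}(\sigma^2_i)},$$

where $K^\mu_i$ and $K^\sigma_i$ are confidence-guided adaptive weights and $\mathbb{D}(\cdot)$ denotes the generalized variance.

Footnote: The confidence is defined as the inverse of …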
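Theorem 1 combines a running prediction with the statistic of the current batch. For two unbiased estimates that are uncorrelated, the minimum-variance linear combination is classic inverse-variance weighting, which has the same form as a scalar Kalman-filter update. The NumPy sketch below illustrates that weighting, assuming the two estimates are uncorrelated and their variances are known exactly. `fuse` is a hypothetical helper, and the way CaRe-BN actually estimates these variances is not reproduced here.

```python
import numpy as np


def fuse(prior, d_prior, obs, d_obs):
    """Minimum-variance linear fusion of two unbiased, uncorrelated estimates.

    The gain K = d_prior / (d_prior + d_obs) trusts the new observation more
    when the prior is uncertain and less when the observation is noisy.
    """
    K = d_prior / (d_prior + d_obs)
    fused = prior + K * (obs - prior)  # equivalently (1 - K) * prior + K * obs
    d_fused = (1.0 - K) * d_prior      # equals d_prior * d_obs / (d_prior + d_obs)
    return fused, d_fused, K


# Monte Carlo check: fuse a noisy "prior" and a noisy "observation" of mu_true.
rng = np.random.default_rng(0)
mu_true, d_prior, d_obs, n = 2.0, 0.5, 0.1, 200_000
prior = mu_true + rng.normal(scale=np.sqrt(d_prior), size=n)
obs = mu_true + rng.normal(scale=np.sqrt(d_obs), size=n)
fused, d_fused, K = fuse(prior, d_prior, obs, d_obs)

print(K)                     # about 0.833: the less noisy observation dominates
print(d_fused, fused.var())  # analytic vs. empirical variance, both about 0.083
```

The fused variance is smaller than either input variance, which is why a confidence-weighted update can track BN statistics more precisely than a fixed-momentum moving average.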

Figures (12)

  • Figure 1: Real and estimated input activation distributions in BN layers. Between consecutive gradient updates, the distributions change rapidly in (a) and (c) but remain stable in (b) and (d).
  • Figure 2: The statistics estimation scheme of CaRe-BN. In this framework, Ca-BN is applied at every update step, while Re-BN is performed periodically. $\Delta^2$ denotes the squared error, Var the variance computed by Eq. (var-approx), EMA the exponential moving average in Eq. (D-update), and CA-EMA the confidence-adaptive update defined in Eqs. (mean_update) and (var_update).
  • Figure 3: Wasserstein distance between estimated BN statistics and the true distribution across layers, measured with CLIF neurons and the TD3 algorithm in the InvertedDoublePendulum-v4 environment. Shaded areas denote half a standard deviation over five runs. Curves are uniformly smoothed for visual clarity.
  • Figure 4: Exploration returns of BN and CaRe-BN with CLIF neurons and the TD3 algorithm across five MuJoCo tasks. Shaded areas represent half a standard deviation across five random seeds. Curves are uniformly smoothed for visual clarity.
  • Figure 5: Learning curves of SNN-based agents in continuous control trained with TD3 (top) and DDPG (bottom). DDPG curves for Ant-v4 are omitted because DDPG diverges in that environment for both ANN and SNN agents. Shaded areas represent half a standard deviation across five random seeds. Curves are uniformly smoothed for visual clarity.
  • ...and 7 more figures
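The scheme in Figure 2 (a confidence-adaptive update at every step plus periodic re-calibration) can be sketched as follows. This is a hypothetical NumPy illustration, not the CaRe-BN implementation: the class name, the EMA rate, the standard-error approximations for the batch statistics, and the use of an EMA of squared errors as the running estimate's uncertainty are all assumptions loosely modeled on the caption's $\Delta^2$, Var, EMA, and CA-EMA blocks.

```python
import numpy as np


class CaReStatsSketch:
    """Hypothetical sketch of confidence-adaptive BN statistics plus re-calibration.

    Assumptions (illustrative, not taken from the CaRe-BN code):
      * uncertainty of a batch mean ~ batch_var / n (standard error of the mean);
      * uncertainty of a batch variance ~ 2 * batch_var**2 / (n - 1) (Gaussian data);
      * uncertainty of the running estimates ~ EMA of squared errors against the
        batch statistics, loosely mirroring the Delta^2 -> EMA path in Figure 2.
    """

    def __init__(self, num_features, ema=0.1, eps=1e-12):
        self.mean = np.zeros(num_features)
        self.var = np.ones(num_features)
        self.d_mean = np.ones(num_features)  # uncertainty of the running mean
        self.d_var = np.ones(num_features)   # uncertainty of the running variance
        self.ema = ema
        self.eps = eps

    def ca_update(self, batch):
        """Confidence-adaptive update, applied at every training step."""
        n = batch.shape[0]
        b_mean = batch.mean(axis=0)
        b_var = batch.var(axis=0, ddof=1)
        d_obs_mean = b_var / n
        d_obs_var = 2.0 * b_var ** 2 / (n - 1)

        # Track how far the running estimates are from the new batch statistics.
        self.d_mean = (1 - self.ema) * self.d_mean + self.ema * (b_mean - self.mean) ** 2
        self.d_var = (1 - self.ema) * self.d_var + self.ema * (b_var - self.var) ** 2

        # Confidence-guided gains: trust the batch more when the estimate is uncertain.
        k_mu = self.d_mean / (self.d_mean + d_obs_mean + self.eps)
        k_sigma = self.d_var / (self.d_var + d_obs_var + self.eps)
        self.mean = self.mean + k_mu * (b_mean - self.mean)
        self.var = self.var + k_sigma * (b_var - self.var)
        return k_mu, k_sigma

    def recalibrate(self, samples):
        """Periodic re-calibration: reset estimates to statistics of a large sample."""
        n = samples.shape[0]
        self.mean = samples.mean(axis=0)
        self.var = samples.var(axis=0, ddof=1)
        # A large sample gives precise statistics, so shrink the uncertainties too.
        self.d_mean = self.var / n
        self.d_var = 2.0 * self.var ** 2 / (n - 1)


rng = np.random.default_rng(0)
true_mean = np.array([1.0, -2.0, 0.5])
stats = CaReStatsSketch(num_features=3)
for step in range(1, 501):
    batch = rng.normal(loc=true_mean, scale=1.0, size=(64, 3))
    k_mu, k_sigma = stats.ca_update(batch)
    if step % 100 == 0:  # hypothetical re-calibration period
        big = rng.normal(loc=true_mean, scale=1.0, size=(10_000, 3))
        stats.recalibrate(big)

print(stats.mean)  # close to true_mean after the final re-calibration
```

In an RL loop, `ca_update` would run on each training batch and `recalibrate` would run occasionally on a larger sample (for instance, states drawn from a replay buffer). The 100-step period and 10,000-sample size above are arbitrary choices for illustration.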

Theorems & Definitions (2)

  • Theorem 1
  • Proof 1