
Scaling SNNs Trained Using Equilibrium Propagation to Convolutional Architectures

Jiaqi Lin, Malyaban Bal, Abhronil Sengupta

TL;DR

Problem: Scale Equilibrium Propagation (EP) to convolutional spiking convergent RNNs while preserving biological plausibility. Approach: formulate EP for convolutional SNNs within a Hopfield energy framework, introduce a Sigma-Delta spiking module and a spiking convolutional layer, diagnose the gradient misalignment induced by maximum pooling, and replace it with average pooling paired with nearest-neighbor upsampling. Key theoretical result: the EP gradient estimator is $\Delta w = \frac{1}{2\beta} \left( \frac{\partial E(\xi^{\beta})}{\partial w} - \frac{\partial E(\xi^{-\beta})}{\partial w} \right)$, where the negatively nudged phase ($-\beta$) cancels estimator bias and the estimate converges toward the BPTT gradient as $\beta \to 0$. Experimental contribution: with a 2C2FC convolutional spiking network, EP reaches test errors of $0.97\%$ on MNIST and $8.89\%$ on FashionMNIST, which is competitive with BPTT at lower memory cost and supports the approach for on-chip training. Significance: demonstrates scalable, biologically plausible training of convolutional SNNs using EP, bridging spiking and non-spiking convergent networks and enabling practical energy-efficient learning on neuromorphic hardware.
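To make the estimator above concrete, here is a minimal sketch on a one-neuron toy model. Everything in it is an assumption chosen for illustration rather than taken from the paper: the quadratic energy `E(s, w) = 0.5*s**2 - w*x*s`, the cost `C(s, y) = 0.5*(s - y)**2`, and the helper names. The toy is chosen because its nudged equilibrium has a closed form, so the symmetric (positive and negative nudge) estimate can be compared with a one-sided estimate and with the exact loss gradient.

```python
# Toy illustration only: the energy, cost, and function names are assumptions,
# not the paper's model.

def equilibrium(w, x, y, beta):
    """Closed-form minimizer of F(s) = E(s, w) + beta * C(s, y).

    With E = 0.5*s**2 - w*x*s and C = 0.5*(s - y)**2, setting
    dF/ds = s - w*x + beta*(s - y) = 0 gives the state below.
    """
    return (w * x + beta * y) / (1.0 + beta)


def dE_dw(s, x):
    """Partial derivative of the toy energy with respect to w at state s."""
    return -x * s


def ep_symmetric(w, x, y, beta):
    """Two-sided estimate using the +beta and -beta nudged equilibria."""
    s_pos = equilibrium(w, x, y, beta)
    s_neg = equilibrium(w, x, y, -beta)
    return (dE_dw(s_pos, x) - dE_dw(s_neg, x)) / (2.0 * beta)


def ep_one_sided(w, x, y, beta):
    """One-sided estimate using the free (beta = 0) and +beta equilibria."""
    s_free = equilibrium(w, x, y, 0.0)
    s_pos = equilibrium(w, x, y, beta)
    return (dE_dw(s_pos, x) - dE_dw(s_free, x)) / beta


def true_gradient(w, x, y):
    """Exact dL/dw for L = C(s_free, y), where s_free = w*x in this toy."""
    return (w * x - y) * x


w, x, y = 0.5, 2.0, 3.0
reference = true_gradient(w, x, y)
for beta in (0.2, 0.1, 0.01):
    print(beta, ep_one_sided(w, x, y, beta), ep_symmetric(w, x, y, beta), reference)
```

In this toy the one-sided estimate equals the exact gradient divided by $(1 + \beta)$, while the symmetric estimate equals it divided by $(1 - \beta^2)$, so the error drops from first order to second order in $\beta$. Whether the quantity is applied to the weights with a plus or minus sign depends on the convention used for $\Delta w$, which this sketch does not try to settle.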

Abstract

Equilibrium Propagation (EP) is a biologically plausible local learning algorithm initially developed for convergent recurrent neural networks (RNNs), where weight updates rely solely on the states of the connected neurons across two phases. The gradient calculations in EP have been shown to approximate the gradients computed by Backpropagation Through Time (BPTT) when an infinitesimally small nudge factor is used. This property makes EP a powerful candidate for training Spiking Neural Networks (SNNs), which are commonly trained by BPTT. However, in the spiking domain, previous studies on EP have been limited to architectures involving a few linear layers. In this work, we provide, for the first time, a formulation for training convolutional spiking convergent RNNs using EP, bridging the gap between spiking and non-spiking convergent RNNs. We demonstrate that for spiking convergent RNNs, there is a mismatch between maximum pooling and its inverse operation, leading to inaccurate gradient estimation in EP. Substituting maximum pooling with average pooling resolves this issue and enables accurate gradient estimation for spiking convergent RNNs. We also highlight the memory efficiency of EP compared to BPTT. In the regime of SNNs trained by EP, our experimental results indicate state-of-the-art performance on the MNIST and FashionMNIST datasets, with test errors of 0.97% and 8.89%, respectively. These results are comparable to those of convergent RNNs and SNNs trained by BPTT. These findings underscore EP as an optimal choice for on-chip training and a biologically plausible method for computing error gradients.
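The pooling substitution mentioned in the abstract can be illustrated with a small sketch. The code below is not the paper's implementation: the function names, the 2x2 window, and the test arrays are assumptions for illustration. It shows two properties of average pooling paired with nearest-neighbor upsampling: the pair is a fixed linear map that does not depend on the input (unlike max unpooling, which reuses the argmax positions recorded in the forward pass), and upsampling is the transpose of average pooling up to the window-size factor of 4.

```python
# Illustrative sketch: names, window size, and data are assumptions.

def avg_pool2x2(x):
    """Average-pool a 2D list with a 2x2 window and stride 2 (even sizes)."""
    rows, cols = len(x), len(x[0])
    return [
        [
            (x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
            for j in range(0, cols, 2)
        ]
        for i in range(0, rows, 2)
    ]


def upsample_nearest2x(y):
    """Nearest-neighbor upsampling: copy each value into a 2x2 block."""
    out = []
    for row in y:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def inner(a, b):
    """Elementwise inner product of two equally shaped 2D lists."""
    return sum(u * v for ra, rb in zip(a, b) for u, v in zip(ra, rb))


x = [[1.0, 2.0, 0.0, 4.0],
     [3.0, 6.0, 8.0, 0.0],
     [5.0, 1.0, 2.0, 2.0],
     [1.0, 1.0, 6.0, 2.0]]
y = [[1.0, -2.0],
     [0.5, 3.0]]

pooled = avg_pool2x2(x)
roundtrip = avg_pool2x2(upsample_nearest2x(y))

# Adjoint check: <P x, y> equals <x, U y> / 4 for the 2x2 window.
lhs = inner(pooled, y)
rhs = inner(x, upsample_nearest2x(y)) / 4.0
print(pooled, roundtrip, lhs, rhs)
```

Pooling after upsampling returns the original small map exactly, and the two inner products agree, which is the sense in which the forward and inverse operators stay matched regardless of the activations passing through them.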

Paper Structure

This paper contains 9 sections, 16 equations, 3 figures, and 2 tables.

Figures (3)

  • Figure 1: Information transmission and retention between two consecutive neurons over two time steps. Brown represents information at $t$, and purple represents information at $t+1$.
  • Figure 2: Summed activations of the middle three convolutional layers (32-64-128 channels) along the forward and backward routes during the nudge phase of a five-layer convolutional architecture. Activations are averaged over 2000 random training samples from the MNIST dataset and summed across the spatial dimensions of each convolutional layer. (A) Convergent non-spiking RNNs with maximum pooling and unpooling operations; (B) SNNs with maximum pooling and unpooling operations; (C) SNNs with average pooling and its inverse operator.
  • Figure 3: Mean and standard deviation (Std Dev) of the activations $X_{\text{forward}}$, $Y_{\text{forward}}$, $X_{\text{backward}}$, and $Y_{\text{backward}}$ of the middle three convolutional layers (32-64-128 channels) during the nudge phase of a five-layer convolutional architecture with maximum pooling and unpooling operations. The mean and standard deviation are calculated over 2000 samples from the MNIST dataset. (A) Mean activation of convergent RNNs; (B) Mean activation of SNNs; (C) Standard deviation of convergent RNNs' activations; (D) Standard deviation of SNNs' activations.