
The Role of Temporal Hierarchy in Spiking Neural Networks

Filippo Moro, Pau Vilimelis Aceituno, Laura Kriener, Melika Payvand

TL;DR

Spiking Neural Networks (SNNs) can leverage time as an additional axis of computation. This work proposes a temporal hierarchy across hidden layers, organizing processing speeds through a layer-dependent time constant $\tau(l)$ and through the structure of temporal convolutions, and demonstrates that this inductive bias improves accuracy on temporally structured tasks, with gains of up to about $4\%$ on SHD and competitive results on SSC. Importantly, the hierarchy can emerge naturally when time constants are learned through gradient-based optimization, and a kernel-size/dilation hierarchy in temporal convolutions yields strong performance with fewer parameters. The findings have practical hardware implications, suggesting substantial model-size reductions (e.g., up to $6\times$ smaller) while maintaining accuracy, and point to extensions toward richer neuron models and more complex temporal data.
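A layer-dependent time constant $\tau(l)$ can be illustrated with a small discrete-time leaky integrate-and-fire (LIF) network. The sketch below is a minimal NumPy illustration under stated assumptions, not the authors' implementation: it assumes the LIF leak is discretized as $u_t = \beta u_{t-1} + I_t$ with $\beta = \exp(-\Delta t/\tau)$ (a common simplification that absorbs input scaling into the weights), a reset to zero after each spike, random Gaussian weights, and time constants that grow linearly with depth. The helpers `layer_taus` and `lif_layer` and all numeric values are hypothetical.

```python
import numpy as np

def layer_taus(n_layers, tau_first_ms, delta_tau_ms):
    """Hypothetical helper: time constants growing linearly from the first to
    the last hidden layer. delta_tau_ms > 0 gives a fast-to-slow hierarchy;
    delta_tau_ms = 0 gives homogeneous time constants."""
    return tau_first_ms + delta_tau_ms * np.linspace(0.0, 1.0, n_layers)

def lif_layer(inputs, weights, tau_ms, dt_ms=1.0, threshold=1.0):
    """Simulate one LIF layer (assumed discretization, see lead-in).
    inputs: (T, n_in) spike array; returns a (T, n_out) array of 0/1 spikes."""
    beta = np.exp(-dt_ms / tau_ms)  # per-step membrane decay factor
    n_steps = inputs.shape[0]
    u = np.zeros(weights.shape[1])
    spikes = np.zeros((n_steps, weights.shape[1]))
    for t in range(n_steps):
        u = beta * u + inputs[t] @ weights  # leak, then integrate input current
        fired = u >= threshold
        spikes[t] = fired
        u = np.where(fired, 0.0, u)  # reset neurons that spiked
    return spikes

rng = np.random.default_rng(0)
T, n_in, n_hidden = 200, 20, 32
x = (rng.random((T, n_in)) < 0.05).astype(float)  # sparse random input spikes

taus = layer_taus(n_layers=3, tau_first_ms=10.0, delta_tau_ms=150.0)
h = x
for tau in taus:
    w = rng.normal(0.0, 0.5, size=(h.shape[1], n_hidden))
    h = lif_layer(h, w, tau)
    print(f"tau = {tau:6.1f} ms, mean firing rate = {h.mean():.3f}")
```

With `delta_tau_ms=0.0` the same code gives the homogeneous baseline; a positive value makes deeper layers integrate over longer windows, which is the fast-to-slow ordering described for the hidden layers.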

Abstract

Spiking Neural Networks (SNNs) have the potential for rich spatio-temporal signal processing because they exploit both spatial and temporal parameters. Temporal dynamics, such as the time constants of synapses and neurons and delays, have recently been shown to offer computational benefits: they help reduce the overall number of parameters required in the network and increase the accuracy of SNNs on temporal tasks. Optimizing such temporal parameters, for example through gradient descent, gives rise to a temporal architecture for different problems. As in machine learning more broadly, architectural biases can be applied to reduce the cost of optimization, in this case in the temporal domain. Such inductive biases in temporal parameters have been found in neuroscience studies, which highlight a hierarchy of temporal structure and input representation across the layers of the cortex. Motivated by this, we propose to impose a hierarchy of temporal representation in the hidden layers of SNNs and show that this inductive bias improves their performance. We demonstrate the positive effects of temporal hierarchy in the time constants of feed-forward SNNs applied to temporal tasks (Multi-Time-Scale XOR and Keyword Spotting), with a benefit of up to 4.1% in classification accuracy. Moreover, we show that such architectural biases, i.e. a hierarchy of time constants, naturally emerge when the time constants are optimized through gradient descent from homogeneous initial values. We further pursue this proposal in temporal convolutional SNNs by introducing the hierarchical bias in the size and dilation of temporal kernels, which yields competitive results on popular temporal spike-based datasets.
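The kernel-size and dilation hierarchies can be made concrete through the temporal receptive field of a stack of causal 1D convolutions: with kernel sizes $k_l$ and dilations $d_l$, the stack sees $1 + \sum_l (k_l - 1)\, d_l$ time steps into the past. The sketch below is an illustration under assumptions, not the paper's architecture: it uses single-channel NumPy convolutions with all-ones kernels purely to trace which past inputs reach the output, and the three configurations and helper names are hypothetical.

```python
import numpy as np

def causal_dilated_conv1d(x, kernel, dilation):
    """Single-channel causal convolution: y[t] = sum_j kernel[j] * x[t - j*dilation].
    x is left-padded with zeros, so no output depends on future inputs."""
    k = len(kernel)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    y = np.zeros_like(x, dtype=float)
    for t in range(len(x)):
        for j in range(k):
            y[t] += kernel[j] * xp[pad + t - j * dilation]
    return y

def receptive_field(kernel_sizes, dilations):
    """Closed-form receptive field (in time steps) of stacked causal convolutions."""
    return 1 + sum((k - 1) * d for k, d in zip(kernel_sizes, dilations))

def empirical_receptive_field(kernel_sizes, dilations, length=64):
    """Push a unit impulse through the stack and find how far it propagates."""
    x = np.zeros(length)
    x[0] = 1.0
    for k, d in zip(kernel_sizes, dilations):
        x = causal_dilated_conv1d(x, np.ones(k), d)
    # latest time step still influenced by the impulse at t = 0, plus one
    return int(np.flatnonzero(x)[-1]) + 1

configs = {
    "homogeneous (k=3, d=1)": ([3, 3, 3], [1, 1, 1]),
    "kernel-size hierarchy": ([3, 5, 7], [1, 1, 1]),
    "dilation hierarchy": ([3, 3, 3], [1, 2, 4]),
}
for name, (ks, ds) in configs.items():
    print(f"{name:24s} RF = {receptive_field(ks, ds):2d} "
          f"(impulse check: {empirical_receptive_field(ks, ds)}), "
          f"kernel weights per channel pair = {sum(ks)}")
```

In this toy setting the dilation hierarchy reaches the widest receptive field (15 steps) with the same number of kernel weights as the homogeneous stack (9), while the kernel-size hierarchy widens the window (13 steps) at the cost of more weights (15). This is only receptive-field arithmetic; it says nothing about the accuracies reported in the paper.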

Paper Structure (11 sections, 8 equations, 9 figures, 4 tables)


Figures (9)

  • Figure 1: Temporal hierarchy in Spiking Neural Networks (SNNs). a) Multi-layer SNN with $\mathcal{N}$ hidden layers, each colored differently to highlight the difference in the temporal processing speed of their neurons, from fast in the first hidden layers to slow in the deeper layers. b) The time constant $\mathbf{\tau}$ determines the speed of the neuronal dynamics in Leaky-Integrate-and-Fire (LIF) spiking neurons. The differential equation describes the dynamics of the membrane voltage $u(t)$, which evolves with time constant $\mathbf{\tau}$. c) Temporal processing expressed in temporal convolutions by varying the weight kernel $W_k$, which affects the state of the neurons $u(t)$ over time.
  • Figure 2: Temporal hierarchy through the time constants of SNNs. a) Sketch of the Multi-Time-Scale XOR (MTS-XOR) task. b) Classification accuracy on MTS-XOR as a function of the homogeneous time constant through the hidden layers. c) Classification accuracy when a temporal hierarchy of varying amplitude ($\color{ForestGreen}\Delta \tau$) is introduced in a multi-layer SNN. Positive $\color{ForestGreen}\Delta \tau$ improves classification accuracy. d) Sample from the Spiking Heidelberg Digits (SHD) dataset. e) The time-constant hierarchy is parametrized by a tanh function with steepness ($\color{Purple}s$) and centering ($\color{blue}c$) parameters. f) The effect of these parameters is tested on the SHD task with ${\color{ForestGreen}\Delta \tau} = +150\,\mathrm{ms}$. In all cases, hierarchy has a positive effect compared to the reference performance (${\color{ForestGreen}\Delta \tau} = 0\,\mathrm{ms}$). g) Temporal hierarchy is applied to networks of varying size: an SNN with 32 hidden neurons per layer and temporal hierarchy performs as well as a larger SNN with 128 neurons per layer without temporal hierarchy. Error bars show the quartiles over 10 trials in all plots.
  • Figure 3: Optimizing the time constants in SNNs. a) Multi-layer SNNs are initialized with the same distribution of time constants in every hidden layer, and each time constant is optimized individually. b) Probability density function of the optimized time constants in a 3-hidden-layer SNN solving the MTS-XOR task. The median time constant grows from layer to layer, indicating the formation of a hierarchy. c) Mean time constant in a 5-hidden-layer SNN solving the SHD task. The mean grows with depth, indicating a hierarchy through the network. Results are averaged over 5 trials in both b) and c).
  • Figure 4: Temporal hierarchy in convolutional Spiking Neural Networks. a) Kernel size can vary in 1D causal convolutions; increasing the kernel size across hidden layers forms a hierarchy of temporal representation. b) Effect of the kernel-size hierarchy on the SHD task. A positive hierarchy increases classification accuracy. c) Dilation in causal temporal convolutions controls the size of the receptive field; a temporal hierarchy can be constructed by progressively enlarging the dilation across the hidden layers. d) Effect of the dilation hierarchy on the SHD task. Increasing the dilation through the hidden layers improves the performance of the network. Error bars show the quartiles over 10 trials.
  • Figure S1: a) Accuracy of SNNs with a positive ($\Delta \tau = +500\,\mathrm{ms}$) and a negative ($\Delta \tau = -500\,\mathrm{ms}$) hierarchy of time constants on the MTS-XOR task, as a function of the background noise rate. Shaded areas show the quartiles over 5 trials. b) Test accuracy on the SHD task for a network with homogeneous time constants through the layers. c) Hierarchy shapes of the $\tanh$ function that maximize classification accuracy. Error bars show the quartiles over 5 trials in b) and c). d) The $\tanh$ function parametrized by the 5 combinations of steepness and centering that yield the best results in Figure 2f.
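Figure 2e describes a tanh-shaped hierarchy with steepness $s$ and centering $c$, but the exact formula is not given here. A minimal sketch, assuming depth normalized to $x \in [0, 1]$ and the curve $\tanh(s(x - c))$ rescaled so that the first hidden layer gets $\tau_0$ and the last gets $\tau_0 + \Delta\tau$; the function name `tanh_tau_profile` and all values are hypothetical, and the paper's own parametrization may differ.

```python
import numpy as np

def tanh_tau_profile(n_layers, tau0_ms, delta_tau_ms, s, c):
    """Hypothetical tanh-shaped hierarchy of time constants over hidden layers.

    Depth is normalized to x in [0, 1]; tanh(s * (x - c)) is rescaled so the
    first layer gets tau0_ms and the last gets tau0_ms + delta_tau_ms.
    s (steepness) sets how abruptly tau changes with depth; c (centering) sets
    where along the depth the change happens."""
    if n_layers < 2 or s == 0:
        raise ValueError("need at least 2 layers and a nonzero steepness s")
    x = np.linspace(0.0, 1.0, n_layers)
    f = np.tanh(s * (x - c))
    f_norm = (f - f[0]) / (f[-1] - f[0])  # map the curve onto [0, 1]
    return tau0_ms + delta_tau_ms * f_norm

for s, c in [(0.1, 0.5), (10.0, 0.5), (10.0, 0.2), (10.0, 0.8)]:
    taus = tanh_tau_profile(n_layers=5, tau0_ms=20.0, delta_tau_ms=150.0, s=s, c=c)
    print(f"s = {s:4.1f}, c = {c:.1f}: " + ", ".join(f"{t:6.1f}" for t in taus))
```

Under this assumed form, a small $s$ recovers an almost linear ramp, a large $s$ approaches a step from fast to slow layers, and $c$ moves that step earlier or later in depth; a negative $\Delta\tau$ produces the reversed (slow-to-fast) hierarchy compared in Figure S1a.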