Beyond Rate Coding: Surrogate Gradients Enable Spike Timing Learning in Spiking Neural Networks

Ziqiao Yu, Pengfei Sun, Danyal Akarca, Dan F. M. Goodman

TL;DR

This work asks whether SNNs trained with surrogate gradient descent exploit spike timing beyond firing rate, using synthetic timing benchmarks and rate-normalized speech datasets. It shows that surrogate gradient descent can learn fine-grained timing features such as interspike intervals (ISIs), cross-channel ISIs (CCISIs), and coincidences, and that adding trainable axonal delays improves learning further, especially at long timescales. In the more realistic datasets (SHD/SSC), timing information persists even after rate normalization, and delay-based networks are more sensitive to temporal order and cross-channel cues, underscoring the value of temporal coding in SNNs. The authors also release timing-focused datasets to support further study of temporal spike coding in neuromorphic computing.

Abstract

The surrogate gradient descent algorithm enabled spiking neural networks to be trained to carry out challenging sensory processing tasks, an important step in understanding how spikes contribute to neural computations. However, it is unclear to what extent these algorithms fully explore the space of possible spiking solutions to problems. We investigated whether spiking networks trained with surrogate gradient descent can learn to make use of information that is encoded only in the timing and not the rate of spikes. We constructed synthetic datasets with a range of types of spike timing information (interspike intervals, spatio-temporal spike patterns or polychrony, and coincidence codes). We find that surrogate gradient descent training can extract all of these types of information. In more realistic speech-based datasets, both timing and rate information are present. We therefore constructed variants of these datasets in which all rate information is removed, and find that surrogate gradient descent can still perform well. We tested all networks both with and without trainable axonal delays. We find that delays can give a significant increase in performance, particularly for more challenging tasks. To determine what types of spike timing information are being used by the networks trained on the speech-based tasks, we test these networks on time-reversed spikes, a manipulation that perturbs spatio-temporal spike patterns but leaves interspike intervals and coincidence information unchanged. We find that when axonal delays are not used, networks perform well under time reversal, whereas networks trained with delays perform poorly. This suggests that spiking neural networks with delays are better able to exploit temporal structure. To facilitate further studies of temporal coding, we have released our modified speech-based datasets.
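The time-reversal test described above can be illustrated with a small sketch. This is not the authors' code: the helper names `time_reverse` and `isis`, the window length, and the toy spike times are all assumptions chosen for illustration. It shows that mirroring spike times in a fixed window keeps each channel's set of interspike intervals and any cross-channel coincidences, while flipping the order in which channels fire.

```python
import numpy as np

def time_reverse(spike_times, duration):
    """Mirror spike times within [0, duration] (hypothetical helper)."""
    return np.sort(duration - np.asarray(spike_times, dtype=float))

def isis(spike_times):
    """Interspike intervals of a single channel."""
    return np.diff(np.sort(spike_times))

duration = 100.0                      # assumed window length in ms
a = np.array([10.0, 15.0, 40.0])      # toy spike times, channel A
b = np.array([15.0, 60.0])            # toy spike times, channel B

a_rev = time_reverse(a, duration)
b_rev = time_reverse(b, duration)

# The set of ISIs is unchanged (only their order within the train flips).
same_isis = np.allclose(np.sort(isis(a)), np.sort(isis(a_rev)))

# Spikes that coincided across channels still coincide after reversal.
coincident_before = np.intersect1d(a, b)
coincident_after = np.intersect1d(a_rev, b_rev)

# Temporal order across channels is flipped: A leads B before, B leads A after.
a_leads_before = a[0] < b[0]
a_leads_after = a_rev[0] < b_rev[0]
```

A network that relies only on ISIs or coincidences should therefore be largely unaffected by this transformation, whereas one that relies on the order of events across channels should not.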

Paper Structure

This paper contains 25 sections, 7 equations, 7 figures.

Figures (7)

  • Figure 1: ISI and CCISI-based datasets. (a) Shared feature space spanned by firing rate and ISI/CCISI interval. A full decision boundary (black) separates classes more effectively than a rate-only threshold (gray). (b) Example ISI spike trains with fixed intervals (top) and their randomized versions with $f=0.5$ (bottom). (c) Example CCISI spike trains with fixed cross-neuron delays (top) and their randomized counterparts with $f=0.5$ (bottom). (d) Test accuracy under increasing spike timing perturbation $f$ (evaluated at maximum interval of 50 ms), comparing models with a learnable membrane constant $\tau$ and those augmented with delays. (e) Accuracy across increasing maximum ISI/CCISI intervals.
  • Figure 2: Coincidence-based dataset. (a) Spike count distributions for ON/OFF windows at $\lambda=0$ and $\lambda=1$. As $\lambda$ increases, ON and OFF distributions converge. (b) Spike raster plots for each class under low ($\lambda=0$) and moderate ($\lambda=0.5$) synchrony overlap. Red lines mark group boundaries. Light gray and dark gray bands indicate ON and OFF neuron groups, respectively. (c) Test accuracy across varying synchrony overlap $\lambda$, comparing models with a learnable $\tau$ versus $\tau$+delay.
  • Figure 3: Example spike rasters of auditory datasets SHD (top) and SSC (bottom) across the Whole $\rightarrow$ Part $\rightarrow$ Norm normalization stages. Neuron selection and spike count normalization progressively remove rate-based information while retaining temporal structure. Whole (left) is the original dataset. Part (middle) removes neurons whose spike counts are sometimes very low. Norm (right) randomly selects a fixed number of spikes (different for each neuron) leaving only spike timing information and no rate information.
  • Figure 4: Comparison of feedforward SNN architectures trained without delays (SGD, top) and with delays (SGD-delay, bottom). Both models adopt surrogate gradient descent methods. The top row shows a model with no delay components. The bottom row inserts learnable axonal delays between layers. Dashed outlines are used purely as a visual aid to highlight the architectural comparison between the two variants.
  • Figure 5: Test accuracy under spike timing perturbation $f$ on the SHD (left) and SSC (right) spiking speech datasets for models with delays (solid lines) and without delays (dashed lines). Performance is shown for the original dataset (whole=blue), for a subset of neurons selected for having a minimum number of spikes (part=green), and for a variant where spike count information has been fully removed (norm=red). Because the norm variant contains no spike count information, the accuracy of an MLP trained on spike counts (grey dashed) corresponds to chance level (5% for SHD, 2.8% for SSC).
  • ...and 2 more figures
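
The Norm stage described in the Figure 3 caption (keeping a fixed number of spikes per neuron so that counts carry no class information) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `normalize_counts`, the data layout (one array of spike times per neuron per sample), and the choice of the per-neuron minimum count as the target are all hypothetical.

```python
import numpy as np

def normalize_counts(samples, targets, rng):
    """Randomly keep exactly targets[n] spikes of neuron n in every sample.

    samples: list of samples, each a list of per-neuron spike-time arrays.
    targets: fixed spike count per neuron (may differ between neurons).
    Hypothetical helper; assumes every neuron has at least its target count.
    """
    normed = []
    for sample in samples:
        kept = []
        for times, k in zip(sample, targets):
            times = np.asarray(times, dtype=float)
            if times.size < k:
                raise ValueError("neuron has fewer spikes than its target count")
            # Subsample without replacement, then restore temporal order.
            idx = rng.choice(times.size, size=k, replace=False)
            kept.append(np.sort(times[idx]))
        normed.append(kept)
    return normed

rng = np.random.default_rng(0)
samples = [  # toy data: 2 samples x 2 neurons
    [np.array([1.0, 4.0, 9.0, 12.0]), np.array([2.0, 3.0])],
    [np.array([5.0, 6.0]), np.array([1.0, 7.0, 8.0])],
]
# Assumption: use each neuron's minimum count across samples as its target.
targets = [min(len(s[n]) for s in samples) for n in range(2)]
normed = normalize_counts(samples, targets, rng)
counts = [[len(t) for t in s] for s in normed]
```

After this step every sample has the same spike count for a given neuron, so a classifier that sees only counts receives identical input for every class. The requirement that each neuron has enough spikes is one reason a neuron-selection step such as Part would need to come first.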