Spiking Neural Networks for event-based action recognition: A new task to understand their advantage

Alex Vicente-Sola, Davide L. Manna, Paul Kirkland, Gaetano Di Caterina, Trevor Bihl

TL;DR

This work investigates how Spiking Neural Networks (SNNs) can perform temporal feature extraction without recurrent connections. It introduces DVS-Gesture-Chain (DVS-GC), a task that requires understanding the order of events in real neuromorphic data. The study compares feed-forward SNNs, recurrent SNNs, and time-dependent weighting and normalization schemes against non-spiking baselines. It shows that the widely used DVS Gesture benchmark can be solved without temporal feature extraction when its events are accumulated into frames, whereas DVS-GC requires perceiving the order in which events happen. Feed-forward SNNs meet this requirement through the internal state of their neurons, with the leakage rate and "hard reset" mechanisms playing important roles. Recurrent SNNs achieve results comparable to LSTM with fewer parameters, and time-dependent weights and normalization provide order understanding through temporal attention. Together, the findings clarify the complementary roles of internal state, temporal attention, and normalization in temporal processing.

Abstract

Spiking Neural Networks (SNNs) are characterised by their unique temporal dynamics, but the properties and advantages of such computations are still not well understood. To provide answers, in this work we demonstrate how spiking neurons can enable temporal feature extraction in feed-forward neural networks without the need for recurrent synapses, and how recurrent SNNs can achieve results comparable to LSTM with a smaller number of parameters. This shows how their bio-inspired computing principles can be successfully exploited beyond energy efficiency gains and evidences their differences with respect to conventional artificial neural networks. These results are obtained through a new task, DVS-Gesture-Chain (DVS-GC), which allows, for the first time, the evaluation of the perception of temporal dependencies in a real event-based action recognition dataset. Our study shows that the widely used DVS Gesture benchmark can be solved by networks without temporal feature extraction when its events are accumulated in frames, unlike the new DVS-GC, which demands an understanding of the order in which events happen. Furthermore, this setup allowed us to reveal the role of the leakage rate in spiking neurons for temporal processing tasks and demonstrated the benefits of "hard reset" mechanisms. Additionally, we show how time-dependent weights and normalization can lead to understanding order by means of temporal attention.
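To make the abstract's argument concrete, the following toy sketch contrasts frame accumulation with a single feed-forward leaky integrate-and-fire (LIF) neuron. It is an illustration only, not the paper's model: the function name `lif_forward`, the multiplicative leak formulation, and all numeric values (`leak`, `threshold`, the two input sequences) are assumptions chosen for the example.

```python
import numpy as np

def lif_forward(inputs, leak=0.5, threshold=0.95, v_reset=0.0):
    """Run one feed-forward LIF neuron over a 1-D input sequence.

    Assumed (illustrative) dynamics: the membrane voltage decays by the
    factor `leak` at every time-step, integrates the input, and is set
    back to `v_reset` after a spike (a "hard reset").
    """
    v = 0.0
    spikes = []
    for x in inputs:
        v = leak * v + x              # leaky integration of the input
        if v >= threshold:            # spiking function: threshold crossing
            spikes.append(1)
            v = v_reset               # hard reset of the membrane voltage
        else:
            spikes.append(0)
    return np.array(spikes)

# Two hypothetical input sequences with the same values in opposite order.
weak_then_strong = np.array([0.25, 0.875])
strong_then_weak = np.array([0.875, 0.25])

# Accumulating over time discards the order: both sums are identical.
print(weak_then_strong.sum(), strong_then_weak.sum())

# With leakage, the neuron's response depends on the order of the inputs.
print(lif_forward(weak_then_strong), lif_forward(strong_then_weak))

# Without leakage (leak=1.0) the two orders give the same spike train here.
print(lif_forward(weak_then_strong, leak=1.0),
      lif_forward(strong_then_weak, leak=1.0))
```

In this toy setting, the accumulated input is the same for both orders, while the leaky neuron spikes only when the strong input arrives last. Setting `leak=1.0` removes that distinction, which is one simple way to see why the leakage rate can matter for order-dependent tasks.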

Paper Structure (25 sections, 12 equations, 4 figures, 6 tables)

Figures (4)

  • Figure 1: Example gesture chain with variable $F_g$ duration ($\alpha_1=0.2$ and $\alpha_2=1$). The coloured underscores represent, for each gesture in the chain, the temporal window in which it could appear given the values of $\alpha_1$ and $\alpha_2$. This illustrates why the gesture transition is not predictable and why most time-steps have no guarantee of belonging to a certain position in the chain.
  • Figure 2: Value of the time weight in TW-ANN. Dotted lines highlight the gesture transition zone. The last two layers of the graph correspond to the layers in the residual connection downsampling. Trained on the 81-p DVS-GC.
  • Figure 3: (A): Diagram of a layer of LIF neurons. $layer$ are the synaptic weights, $SF$ is the spiking function, and $V_{res}$ is the voltage reset value. Gray lines show the architecture with recurrent connections; without them, the architecture is feed-forward. (B): LSTM diagram. $C_t$ is the cell state, $h_t$ is the hidden state and output, and the yellow $tanh$ is a layer of synaptic weights with hyperbolic tangent activation. $\sigma$ stands for the gating layer with sigmoid activation.
  • Figure 4: (A): Average value across channels of the bias weight in the BNTT layers of ANN-BNTT. (B): Center of mass in the time dimension (60 time-steps) of the bias weight of the BNTT layers in ANN-BNTT. The last two layers of all graphs correspond to the layers in the residual connection downsampling. Trained on the 81-class DVS-GC.
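
The "center of mass in the time dimension" mentioned in the Figure 4 caption can be read as a weighted mean of the time-step index. The sketch below shows one plausible way to compute it. The exact definition used in the paper is not given here, so the function name `temporal_center_of_mass`, the use of absolute values as the "mass", and the zero-based time index are all assumptions.

```python
import numpy as np

def temporal_center_of_mass(weights):
    """Weighted mean of the time index along the last axis.

    Assumptions (not taken from the paper): the "mass" at each time-step is
    the absolute value of the weight, and time-steps are indexed from 0.
    """
    mass = np.abs(np.asarray(weights, dtype=float))
    t = np.arange(mass.shape[-1])
    return (mass * t).sum(axis=-1) / mass.sum(axis=-1)

# Hypothetical per-time-step weights over 60 time-steps.
uniform = np.ones(60)                 # equal emphasis on every time-step
late = np.zeros(60)
late[-1] = 1.0                        # all emphasis on the last time-step

print(temporal_center_of_mass(uniform))  # midpoint of the time axis
print(temporal_center_of_mass(late))     # index of the last time-step
```

Under this reading, a value near the middle of the 60 time-steps would indicate no temporal preference, while a value shifted toward either end would indicate that a layer weights that part of the sequence more heavily.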