Event-based Optical Flow on Neuromorphic Processor: ANN vs. SNN Comparison based on Activation Sparsification

Yingfu Xu, Guangzhi Tang, Amirreza Yousefzadeh, Guido de Croon, Manolis Sifalakis

TL;DR

This work addresses the lack of a fair comparison between ANNs and SNNs for event-based optical flow by training both networks with activation sparsification and running them on the same neuromorphic processor, SENECA. The authors use trainable FATReLU thresholds and sparsification regularizers to bring activation and spike densities down to roughly 5% without sacrificing accuracy, so that the two networks can be compared like for like on hardware. In the hardware experiments, the SNN needs about 62.5% of the ANN's time and 75.2% of its energy. The authors attribute this to the SNN's lower pixel-wise spike density (43.5% vs. 66.5%), which requires fewer memory accesses for neuron states. The results indicate that, on a shared hardware platform, an SNN can outperform an ANN in time and energy on a regression task in event-based vision, aided by the event-driven processing and data reuse strategies of the neuromorphic processor.

Abstract

Spiking neural networks (SNNs) for event-based optical flow are claimed to be computationally more efficient than their artificial neural network (ANN) counterparts, but a fair comparison is missing in the literature. In this work, we propose an event-based optical flow solution based on activation sparsification and a neuromorphic processor, SENECA. SENECA has an event-driven processing mechanism that can exploit the sparsity in ANN activations and SNN spikes to accelerate the inference of both types of neural networks. The ANN and the SNN used for comparison have similarly low activation/spike density (~5%) thanks to our novel sparsification-aware training. In the hardware-in-loop experiments designed to deduce the average time and energy consumption, the SNN consumes 44.9 ms and 927.0 microjoules, which are 62.5% and 75.2% of the ANN's consumption, respectively. We find that the SNN's higher efficiency is attributable to its lower pixel-wise spike density (43.5% vs. 66.5%), which requires fewer memory access operations for neuron states.
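The abstract reports only the SNN's absolute consumption and its ratio to the ANN's. As a small worked example, the ANN's implied consumption can be recovered by dividing the SNN figures by those ratios. The variable names are illustrative, and the derived ANN values are approximations computed here from rounded percentages, not numbers quoted from the paper.

```python
# SNN figures and SNN/ANN ratios as reported in the abstract.
snn_time_ms = 44.9
snn_energy_uj = 927.0
time_ratio = 0.625    # SNN time as a fraction of ANN time
energy_ratio = 0.752  # SNN energy as a fraction of ANN energy

# Implied ANN figures (derived here, not quoted from the paper).
ann_time_ms = snn_time_ms / time_ratio
ann_energy_uj = snn_energy_uj / energy_ratio

print(f"implied ANN time:   {ann_time_ms:.1f} ms")
print(f"implied ANN energy: {ann_energy_uj:.1f} uJ")
```

Under these assumptions the ANN would take roughly 71.8 ms and 1232.7 microjoules per inference, so the SNN saves on the order of 27 ms and 306 microjoules.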

Paper Structure (25 sections, 2 equations, 12 figures, 4 tables)

Figures (12)

  • Figure 1: FireNet network architecture used in this work for event-based optical flow prediction. As stated at the top left of this figure, for the ANN, the output tensors of the conv layers are input to the FATReLU activation function. For the SNN, the spike tensors are integrated into the membrane potential of the LIF neurons as synaptic input currents. When the input tensor of a conv layer is sparse (blue arrow), a considerable number of pixel locations of the output tensor are not updated by the input tensor. A green arrow indicates such a sparsely updated tensor. In hagenaars2021self, the flow prediction layer has the TanH activation function. We replace it with Softsign due to its more efficient implementation on SENECA.
  • Figure 2: The correlation between prediction accuracy and ac./sp. density. The accuracy metric is the AEE over the whole test set. The shown neuron-wise and pixel-wise density ($x$-axis) is the average over all layers and all testing samples. The networks shown in both subplots are the same. "vol." in the legend means that the networks are trained with the $L_1$ regularizer for neuron membrane voltage. "vol.&thre." means the sparsification loss involves both the $L_1$ voltage regularizer and the $L_2$ regularizer that encourages the ANN's FATReLU thresholds or the SNN's firing thresholds to grow. There is no network whose neuron density is between 6% and 10%, so the $x$-axis of the left subplot is truncated.
  • Figure 3: Time and energy cost of the controlled experiments. In the bottom two subplots, 151 pixels have at least one activation or spike (ac./sp.).
  • Figure 4: Layer-wise distribution of pixel density (left) and distribution of the numbers of ac./sp. per pixel (right) based on the data logged in testing.
  • Figure 5: Visualization of event-based optical flow prediction on two testing event frames. The first column shows the accumulated camera spikes (event frames) that are the input to the networks. Events within a time bin of 12.5 milliseconds are accumulated in the image frame. Events in red or green capture brightness changes in the two polarities (turning brighter and turning darker). The 2nd to 8th columns show the ac./sp. density of each pixel. For layers 1 to 7, the ac./sp. tensors have 32 channels. If a pixel has no non-zero ac./sp., its color is black. If 8 or more channels have a non-zero ac./sp. at a pixel, its color is white. The larger the number of ac./sp., the brighter the pixel. The second column from the right shows the optical flow prediction from the network. Only pixels that have at least one input event have an optical flow prediction, which is a 2-dimensional vector in the image plane, encoded by colors as shown in the color wheel figure. The column on the right shows the dense ground truth optical flow measured by other sensors, provided by the testing dataset zhu2018multivehicle.
  • ...and 7 more figures
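
The captions above refer to FATReLU thresholds and to neuron-wise versus pixel-wise density. The sketch below illustrates both on a toy tensor. It assumes that FATReLU passes a value through unchanged when it reaches the threshold and outputs zero otherwise, that neuron-wise density is the fraction of non-zero entries, and that pixel-wise density is the fraction of pixel locations with at least one non-zero channel. These definitions are inferred from the captions rather than taken from the paper's text, and the function names, tensor values, and threshold are hypothetical.

```python
def fatrelu(x, threshold):
    """Assumed FATReLU: keep x if it reaches the threshold, else output 0."""
    return x if x >= threshold else 0.0


def densities(tensor):
    """Return (neuron-wise, pixel-wise) density of a tensor[c][y][x]."""
    channels = len(tensor)
    height = len(tensor[0])
    width = len(tensor[0][0])
    nonzero_neurons = 0
    active_pixels = 0
    for y in range(height):
        for x in range(width):
            # Number of channels with a non-zero value at this pixel.
            count = sum(1 for c in range(channels) if tensor[c][y][x] != 0.0)
            nonzero_neurons += count
            if count > 0:
                active_pixels += 1
    total_neurons = channels * height * width
    return nonzero_neurons / total_neurons, active_pixels / (height * width)


# Toy conv output: 2 channels, 2x2 pixels (made-up values).
pre_activation = [
    [[0.9, 0.1], [0.0, 0.3]],
    [[0.7, 0.2], [0.0, 0.6]],
]
threshold = 0.5  # hypothetical fixed value; the paper trains its thresholds

activated = [
    [[fatrelu(v, threshold) for v in row] for row in channel]
    for channel in pre_activation
]
neuron_density, pixel_density = densities(activated)
print(neuron_density, pixel_density)
```

In this toy case three of eight neurons remain active (37.5%) while two of four pixels are active (50%), which shows how two tensors with the same neuron-wise density can still differ in pixel-wise density depending on how the non-zero values are spread across pixel locations.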