Ultra-low-power Image Classification on Neuromorphic Hardware

Gregor Lenz; Garrick Orchard; Sadique Sheik

TL;DR

The paper tackles energy-efficient image classification by converting trained ANNs to SNNs using time-to-first-spike (TTFS) temporal coding. It introduces Quartz, a TTFS-based ANN-to-SNN conversion method that adds two simple synapses per neuron to stabilize spike timing, which makes deployment on the Loihi neuromorphic chip straightforward. Simulation results on MNIST, CIFAR10, and ImageNet show competitive accuracy with drastically fewer spikes and operations, while Loihi experiments demonstrate favorable latency and energy efficiency, with substantial improvements in the Energy-Delay Product relative to rate-coded approaches. The work provides practical considerations for normalization, zero-encoding, and hardware mapping, and releases open-source code for reproducibility and further exploration.

Abstract

Spiking neural networks (SNNs) promise ultra-low-power applications by exploiting temporal and spatial sparsity. The number of binary activations, called spikes, is proportional to the power consumed when executed on neuromorphic hardware. Training such SNNs using backpropagation through time for vision tasks that rely mainly on spatial features is computationally costly. Training a stateless artificial neural network (ANN) and then converting its weights to an SNN is a straightforward alternative when it comes to image recognition datasets. Most conversion methods rely on rate coding in the SNN to represent ANN activation, which uses an enormous number of spikes and, therefore, energy to encode information. Recently, temporal conversion methods have shown promising results requiring significantly fewer spikes per neuron, but they sometimes require complex neuron models. We propose a temporal ANN-to-SNN conversion method, which we call Quartz, that is based on the time to first spike (TTFS). Quartz achieves high classification accuracy and can be easily implemented on neuromorphic hardware while using the least amount of synaptic operations and memory accesses. Compared to previous temporal conversion methods, it incurs a cost of two additional synapses per neuron; such synapses are readily available on neuromorphic hardware. We benchmark Quartz on MNIST, CIFAR10, and ImageNet in simulation to show the benefits of our method and follow up with an implementation on Loihi, a neuromorphic chip by Intel. We provide evidence that temporal coding has advantages in terms of power consumption, throughput, and latency for similar classification accuracy. Our code and models are publicly available.

Paper Structure (17 sections, 9 equations, 7 figures, 3 tables, 1 algorithm)

Introduction
Time to First Spike Conversion
Results in Simulation
MNIST
CIFAR10
ImageNet
Counting the Number of Operations and Estimating Power Consumption
Results on Loihi
Classification Accuracy
Power Measurements
Discussion
Methods
Bias-corrected Activation Normalization
Effect of Normalization Percentage on Accuracy
Author contributions
...and 2 more sections

Figures (7)

Figure 1: Quartz conversion scheme shown for 2 ANN units output1, output2 that have been converted to spiking neurons. Left: Connection architecture. The normalized ANN weights $w_{\{1,2\}}$ can be used as is to connect convolutional or fully connected layers. Inputs $x_{\{1,2\}}$ are encoded using latency coding in Eq. \ref{eq:encoding}. The rectifier injects a large current with $\beta \gg \sum w$ to force a neuron to spike if it has not yet done so by the last time step of a layer. The neuron output1 computes $\hat{y}_1 = \max(0, w_1 x_1 + w_2 x_2)$, whereas output2 computes $\hat{y}_2 = \max(0, w_2 x_1 + w_1 x_2)$. Right: Chronogram of the same network, with example inputs $x_1 = 0.75, x_2 = 0.25$ and weights $w_1=1, w_2=-1$. As soon as input spikes arrive at the output neurons, $i(t)$ increases according to the input weights (not shown) and $u(t)$ (shown in red) starts to ramp up. After $T_\text{max}$ time steps, the encoding phase is completed and $u(t)$ now represents the value that the neuron is supposed to output. For the next $T_\text{max}$ time steps, we decode that membrane potential into a spike time. The readout neuron ensures that all input currents for a neuron are balanced by injecting a current that is the negative sum of all inputs plus a constant. Whereas output1 outputs the expected $1\times0.75 - 1\times0.25=0.5$, output2 with $1\times0.25 - 1\times0.75=-0.5$ is forced to spike early by injecting a high current at time step $2 T_\text{max}$. This spike coincides with the readout of the next layer (not shown here), where their effects will cancel out because the output is $0$. For this diagram $T_\text{max}$ is assumed to be large such that transmission delays are negligible.
Figure 2: Classification accuracy error of SNNs converted using Quartz as a function of $T_\text{max}$ time steps per layer. Choosing a larger number of time steps will reduce quantization error but increase latency. The x-axis is plotted logarithmically, with $T_\text{max}$ ranging from $1$ to $100$.
Figure 3: Classification accuracy error in percent over the number of operations per image for three different datasets. The floating point operations in the pre-trained ANNs are counted using Meta's fvcore tool. Operations in SNNs are counted according to Equation \ref{eq:ops-count} and are additions only. By exploiting sparsity in the activation, we can drastically reduce the number of overall operations needed, and observe an error/operation trade-off depending on the number of time steps per layer chosen ($T_\text{max}$).
Figure 4: Weighing the number of addition or MAC operations per frame by their respective energy cost, we observe a significant reduction in dynamic energy for Quartz SNNs. Much like in Figure \ref{fig:n_ops}, we observe a trade-off between the classification accuracy error and dynamic energy, chosen through the number of time steps per layer.
Figure 5: Energy-delay product (EDP) normalized to GPU usage. For exact numbers, see Table \ref{tab:power-measurements}.
...and 2 more figures
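
The worked example in the Figure 1 caption can be traced numerically. The following is a minimal sketch, not the authors' implementation: it assumes the latency code has the form $t = T_\text{max}(1 - x)$ (the referenced encoding equation is not reproduced here), treats time as continuous, ignores transmission delays, and assumes the readout leaves a net unit current during the decoding phase with a firing threshold of $T_\text{max}$. The names `encode`, `output_spike_time`, `decode`, and the value `T_MAX = 64` are hypothetical.

```python
T_MAX = 64  # hypothetical number of time steps per phase


def encode(x, t_max=T_MAX):
    """Assumed latency code: larger values spike earlier, t = t_max * (1 - x)."""
    return t_max * (1.0 - x)


def output_spike_time(spike_times, weights, t_max=T_MAX):
    """Spike time of one output neuron, measured from the start of its layer."""
    # Encoding phase: from its spike time until t_max, each input adds a
    # constant current w, so the membrane potential ramps with slope w.
    u = sum(w * (t_max - t) for t, w in zip(spike_times, weights))
    # Decoding phase (assumption): the readout balances the input currents and
    # leaves a net unit current, so u ramps with slope 1 up to threshold t_max.
    t_out = t_max + (t_max - u)
    # Rectifier: a neuron that has not fired by 2 * t_max is forced to spike.
    return min(t_out, 2 * t_max)


def decode(t_out, t_max=T_MAX):
    """Invert the assumed latency code for a spike in the decoding phase."""
    return 1.0 - (t_out - t_max) / t_max


# Example inputs and weights from the Figure 1 caption.
x1, x2 = 0.75, 0.25
w1, w2 = 1.0, -1.0
t_in = [encode(x1), encode(x2)]

y1 = decode(output_spike_time(t_in, [w1, w2]))  # max(0, w1*x1 + w2*x2)
y2 = decode(output_spike_time(t_in, [w2, w1]))  # max(0, w2*x1 + w1*x2)
print(y1, y2)  # 0.5 0.0
```

Under these assumptions, output1 decodes to $0.5$, and output2, whose weighted sum is $-0.5$, is clamped to $0$ by the forced spike at $2 T_\text{max}$, matching the values stated in the caption. The sketch also assumes normalized activations that do not exceed $1$, so no output neuron reaches threshold before the decoding phase begins.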
