A 71.2-$μ$W Speech Recognition Accelerator with Recurrent Spiking Neural Network

Chih-Chyau Yang, Tian-Sheuan Chang

TL;DR

The paper targets speech recognition on edge devices with a recurrent spiking neural network (RSNN) that has two recurrent layers, one fully connected layer, and one or two time steps, designed for ultra-low-power operation. Algorithm-hardware co-optimization (parallel time-step execution, merged spike computation, mixed-level pruning, zero-skipping, and 4-bit quantization) shrinks the 2.79-MB baseline model to 0.1 MB and cuts computation by 90.49% to 13.86 MMAC/s, enabling real-time processing at 100 kHz while consuming 71.2 $μ$W. The hardware uses reconfigurable zero-skipping with input broadcasting and two PE sets to maximize weight reuse, delivering 28.41 TOPS/W and 1903.11 GOPS/mm$^2$ at 500 MHz. Results on the TIMIT dataset show minimal accuracy loss from compression and substantial reductions in latency and energy, and the design outperforms state-of-the-art designs in power and area efficiency. The work demonstrates a viable path to ultra-low-power edge ASR through tightly integrated RSNN models and hardware that exploit spike sparsity and time-step parallelism.

Abstract

This paper introduces a 71.2-$μ$W speech recognition accelerator designed for real-time applications on edge devices, emphasizing an ultra-low-power design. Through algorithm and hardware co-optimizations, we propose a compact recurrent spiking neural network with two recurrent layers, one fully connected layer, and a low number of time steps (1 or 2). The 2.79-MB model undergoes pruning and 4-bit fixed-point quantization, shrinking it by 96.42% to 0.1 MB. On the hardware front, we take advantage of mixed-level pruning, zero-skipping, and merged spike techniques, reducing complexity by 90.49% to 13.86 MMAC/s. The parallel time-step execution addresses inter-time-step data dependencies and enables weight buffer power savings through weight sharing. Capitalizing on the sparse spike activity, an input broadcasting scheme eliminates zero computations, further saving power. Implemented on the TSMC 28-nm process, the design operates in real time at 100 kHz, consuming 71.2 $μ$W, surpassing state-of-the-art designs. At 500 MHz, it has 28.41 TOPS/W and 1903.11 GOPS/mm$^2$ in energy and area efficiency, respectively.
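
To make the zero-skipping idea in the abstract concrete, the following is a minimal software sketch, not the authors' hardware design. It assumes binary input spikes and signed 4-bit weights stored as integers; the function names, layer sizes, and 10% spike rate are hypothetical choices for illustration. Because each spike is 0 or 1, the multiply-accumulate reduces to adding the weight rows of the inputs that fired, and rows for silent inputs are never touched.

```python
import numpy as np

def dense_accumulate(spikes, weights):
    """Reference result: full matrix-vector product over every input."""
    return spikes @ weights

def zero_skipping_accumulate(spikes, weights):
    """Accumulate only the weight rows whose input spike is 1."""
    active = np.flatnonzero(spikes)              # indices of inputs that fired
    acc = np.zeros(weights.shape[1], dtype=np.int32)
    for i in active:
        acc += weights[i]                        # binary spike: add, no multiply
    return acc, len(active)

# Hypothetical sizes and sparsity, chosen only for this example.
rng = np.random.default_rng(0)
spikes = (rng.random(64) < 0.1).astype(np.int32)             # sparse binary spikes
weights = rng.integers(-8, 8, size=(64, 16), dtype=np.int32)  # signed 4-bit range

acc, rows_read = zero_skipping_accumulate(spikes, weights)
print(f"weight rows read: {rows_read} of {len(spikes)}")
```

The skipped version reads one weight row per active spike instead of one per input, which is the source of the savings when spike activity is sparse.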

Paper Structure

This paper contains 24 sections, 3 equations, 20 figures, 3 tables.

Figures (20)

  • Figure 1: The proposed RSNN spanning two time steps
  • Figure 2: Computation complexity and weight size of the proposed RSNN model
  • Figure 3: Data dependencies across time steps and network layers. Note that * indicates that the value of $\text{Membrane}[ts-1]$ is adjusted by $\beta$ and $\text{Spike}[ts-1]$.
  • Figure 4: System architecture
  • Figure 5: Reconfigurable zero-skipping: (a) type-A for the input features; (b) type-B for the single time step; (c) type-C: two time steps for the FC layer; (d) type-D: two time steps for the recurrent layer.
  • ...and 15 more figures
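
The data dependency noted in the Figure 3 caption, where $\text{Membrane}[ts-1]$ is adjusted by $\beta$ and $\text{Spike}[ts-1]$, can be illustrated with a generic leaky integrate-and-fire update. This is a sketch under stated assumptions rather than the paper's exact neuron model: the reset-by-subtraction rule, the threshold of 1.0, the value of `beta`, and the input values are all hypothetical.

```python
import numpy as np

def lif_step(membrane_prev, spike_prev, current, beta=0.9, threshold=1.0):
    """One time step of a leaky integrate-and-fire neuron (assumed form)."""
    # The previous membrane decays by beta; a spike at ts-1 subtracts the threshold.
    membrane = beta * membrane_prev + current - spike_prev * threshold
    spike = (membrane >= threshold).astype(membrane.dtype)
    return membrane, spike

# Two time steps for four neurons, with made-up input currents.
inputs = np.array([[0.6, 1.2, 0.0, 0.3],
                   [0.6, 0.1, 0.0, 0.9]])
membrane = np.zeros(4)
spike = np.zeros(4)
for ts in range(2):
    # Step ts depends on both the membrane and the spike from step ts-1.
    membrane, spike = lif_step(membrane, spike, inputs[ts])
```

Because each step consumes the membrane and spike of the previous step, the two time steps cannot simply be evaluated independently, which is the dependency the parallel time-step execution has to address.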