DeltaKWS: A 65nm 36nJ/Decision Bio-inspired Temporal-Sparsity-Aware Digital Keyword Spotting IC with 0.6V Near-Threshold SRAM

Qinyu Chen, Kwantae Kim, Chang Gao, Sheng Zhou, Taekwang Jang, Tobi Delbruck, Shih-Chii Liu

TL;DR

This paper introduces DeltaKWS, which is, to the best of the authors' knowledge, the first $Δ$RNN-enabled, fine-grained, temporal-sparsity-aware keyword spotting (KWS) integrated circuit (IC) designed for voice-controlled devices.

Abstract

This paper introduces DeltaKWS, to the best of our knowledge the first $Δ$RNN-enabled, fine-grained, temporal-sparsity-aware keyword spotting (KWS) IC for voice-controlled devices. The 65 nm prototype chip combines several techniques to improve performance, area efficiency, and power efficiency: 1) a bio-inspired delta-gated recurrent neural network ($Δ$RNN) classifier that exploits the temporal similarity between neighboring feature vectors extracted from input frames and between successive network hidden states, eliminating unnecessary operations and memory accesses; 2) an infinite impulse response band-pass filter (IIR BPF)-based feature extractor (FEx) that uses mixed-precision quantization, a low-cost computing structure, and channel selection; 3) a 24 kB, 0.6 V near-$V_\text{TH}$ weight SRAM that achieves 6.6X lower read power than the foundry-provided SRAM. Chip measurements show that DeltaKWS achieves 11-class and 12-class Google Speech Commands Dataset (GSCD) accuracies of 90.5% and 89.5%, respectively, with an energy consumption of 36 nJ/decision in a 65 nm CMOS process. At 87% temporal sparsity, computing latency and energy per inference are reduced by 2.4X and 3.4X, respectively. The IIR BPF-based FEx, $Δ$RNN accelerator, and 24 kB near-$V_\text{TH}$ SRAM blocks occupy 0.084 mm$^{2}$, 0.319 mm$^{2}$, and 0.381 mm$^{2}$, respectively (0.78 mm$^{2}$ in total).
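
The delta-gating idea in item 1 can be illustrated with a small pure-Python sketch of the generic delta-network update. It is an illustration under assumptions, not the chip's datapath: the threshold `theta`, the helper names `delta_encode` and `delta_matvec`, and the toy matrix and frames are all hypothetical. Each input element is compared with the last value that was actually emitted; only changes whose magnitude exceeds `theta` are propagated, and the matrix-vector product is updated incrementally using only the weight columns of those elements. Skipped elements therefore cost neither a multiply-accumulate (MAC) nor a weight read.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]        # row-major: W[i][j]
Deltas = List[Tuple[int, float]]  # sparse (column index, delta value) pairs


def delta_encode(x: Vector, x_ref: Vector, theta: float) -> Tuple[Deltas, Vector]:
    """Emit only elements whose change since the last emitted value exceeds theta.

    The reference (last emitted value) is updated only for emitted elements,
    so small changes accumulate until they cross the threshold."""
    deltas: Deltas = []
    new_ref = list(x_ref)
    for j, (xj, rj) in enumerate(zip(x, x_ref)):
        d = xj - rj
        if abs(d) > theta:
            deltas.append((j, d))
            new_ref[j] = xj
    return deltas, new_ref


def delta_matvec(W: Matrix, deltas: Deltas, acc: Vector) -> Tuple[Vector, int]:
    """Update acc += W @ delta, reading only the weight columns of nonzero deltas.

    Returns the updated accumulator and the number of MACs performed."""
    out = list(acc)
    macs = 0
    for j, d in deltas:
        for i, row in enumerate(W):
            out[i] += row[j] * d
            macs += 1
    return out, macs


if __name__ == "__main__":
    W = [[1.0, 2.0, 0.5],
         [0.0, -1.0, 3.0]]
    frames = [[1.0, 1.0, 1.0], [1.05, 1.0, 2.0], [1.05, 0.98, 2.0]]
    theta = 0.1
    x_ref = [0.0] * 3
    acc = [0.0] * 2
    dense_macs = len(W) * len(W[0])
    for t, x in enumerate(frames):
        deltas, x_ref = delta_encode(x, x_ref, theta)
        acc, macs = delta_matvec(W, deltas, acc)
        print(f"frame {t}: {macs}/{dense_macs} MACs, acc = {acc}")
```

In this toy run the first frame needs all 6 MACs, the second only 2 (only the third input changes by more than 0.1), and the third none. The accumulator always equals `W` times the last emitted input values (here [4.0, 5.0] after the second frame), so skipping sub-threshold changes trades a small approximation error, set by `theta`, for fewer operations and weight reads.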

Paper Structure

This paper contains 13 sections, 13 figures, and 2 tables.

Figures (13)

  • Figure 1: Overall architecture of proposed temporally-sparse $\Delta$RNN KWS IC with IIR BPF-based time-domain FEx and near-$V_\text{TH}$ weight SRAM.
  • Figure 2: Concept of the $\Delta$ network: (a) $\Delta$ neurons along the time axis. The bottom row shows the temporal differences of neuron states across frames in a GRU network, which drive the state changes. The middle row shows the resulting neuron states across frames, indicating which neurons are activated (dark blue) or inactive (light blue) based on these temporal differences. The top row gives a detailed view: a neuron is activated only when its temporal difference exceeds the defined threshold. (b) The $\Delta$GRU structure: 10 input channels feed a $\Delta$RNN layer of 64 neurons, followed by a fully connected layer that classifies inputs into 12 command categories: ‘Silence,’ ‘Unknown,’ ‘Down,’ ‘Go,’ ‘Left,’ ‘No,’ ‘Off,’ ‘On,’ ‘Right,’ ‘Stop,’ ‘Up,’ and ‘Yes.’ In this work, we also evaluate 11-class accuracy [tan2023], which excludes the ‘Unknown’ category.
  • Figure 3: The architecture of the $\Delta$RNN accelerator. The 24 kB weight memory provides 16-bit words, each storing two 8-bit $\Delta$RNN weights.
  • Figure 4: The architecture of the serial time-domain IIR BPF-based FEx.
  • Figure 5: Low-arithmetic-complexity structure of the 4th-order IIR BPF.
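
Figure 3 states that each 16-bit weight-SRAM word holds two 8-bit $\Delta$RNN weights. The sketch below shows one way such packing could work and checks whether the Figure 2 network would fit in 24 kB. Everything beyond those two facts is an assumption, not taken from the paper: weights are signed two's-complement int8, the first weight occupies the low byte, 1 kB = 1024 bytes, and the count covers only the weight matrices of a standard three-gate GRU plus the fully connected layer (biases excluded). The helper names are hypothetical.

```python
from typing import Tuple

WORD_BITS = 16
SRAM_BYTES = 24 * 1024  # assumption: "24 kB" read as 24 x 1024 bytes


def pack_weights(w_lo: int, w_hi: int) -> int:
    """Pack two signed int8 weights into one 16-bit word (w_lo in bits 7:0)."""
    for w in (w_lo, w_hi):
        if not -128 <= w <= 127:
            raise ValueError(f"weight {w} does not fit in int8")
    return ((w_hi & 0xFF) << 8) | (w_lo & 0xFF)


def unpack_weights(word: int) -> Tuple[int, int]:
    """Split a 16-bit word back into two sign-extended int8 weights."""
    def to_int8(byte: int) -> int:
        return byte - 256 if byte & 0x80 else byte
    return to_int8(word & 0xFF), to_int8((word >> 8) & 0xFF)


def gru_fc_weight_count(n_in: int, n_hidden: int, n_classes: int) -> int:
    """Weights (biases excluded) of a standard 3-gate GRU layer plus an FC classifier."""
    gru = 3 * (n_hidden * n_in + n_hidden * n_hidden)  # input + recurrent matrices per gate
    fc = n_classes * n_hidden
    return gru + fc


if __name__ == "__main__":
    word = pack_weights(-3, 100)
    print(hex(word), unpack_weights(word))
    slots = 2 * (SRAM_BYTES * 8 // WORD_BITS)
    needed = gru_fc_weight_count(n_in=10, n_hidden=64, n_classes=12)
    print(f"int8 slots: {slots}, weights needed: {needed}")
```

Under these assumptions the array offers 12,288 words, i.e. 24,576 int8 slots, while the 10-input, 64-neuron, 12-class network needs 3 × (64 × 10 + 64 × 64) + 12 × 64 = 14,976 weights. Reading two weights per access is also what lets one word fetch feed two MACs.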
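
The Figure 5 caption does not spell out the filter structure, so the following is only a generic sketch, not the chip's circuit: a 4th-order band-pass built as a cascade of two identical 2nd-order sections with coefficients from the RBJ audio-EQ-cookbook band-pass (0 dB peak gain). The sample rate (16 kHz), center frequency (1 kHz), Q = 4, and all function names are hypothetical. It shows one common way to lower arithmetic cost: because this band-pass numerator is g(1 − z⁻²), each section needs 3 multiplies per sample instead of the 5 of a general biquad.

```python
import math
from typing import List, Tuple

Section = Tuple[float, float, float]  # (g, a1, a2), with a0 normalized to 1


def bandpass_section_coeffs(f0: float, q: float, fs: float) -> Section:
    """2nd-order band-pass with 0 dB gain at f0 (RBJ audio-EQ-cookbook form).

    Its transfer function is g * (1 - z^-2) / (1 + a1 z^-1 + a2 z^-2),
    so b1 = 0 and b2 = -b0: one feed-forward multiply instead of three."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    return alpha / a0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0


def run_section(x: List[float], coeffs: Section) -> List[float]:
    """Direct-form-I recursion using 3 multiplies per sample."""
    g, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    y: List[float] = []
    for xn in x:
        yn = g * (xn - x2) - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def bpf4(x: List[float], f0: float, q: float, fs: float) -> List[float]:
    """4th-order band-pass: two identical 2nd-order sections in cascade."""
    c = bandpass_section_coeffs(f0, q, fs)
    return run_section(run_section(x, c), c)


def rms(x: List[float]) -> float:
    return math.sqrt(sum(v * v for v in x) / len(x))


if __name__ == "__main__":
    fs, f0, q, n = 16000.0, 1000.0, 4.0, 1600
    for f in (250.0, 1000.0, 4000.0):
        tone = [math.sin(2.0 * math.pi * f * i / fs) for i in range(n)]
        y = bpf4(tone, f0, q, fs)
        print(f"{f:6.0f} Hz -> steady-state RMS {rms(y[n // 2:]):.4f}")
```

A unit-amplitude tone at the center frequency passes with an RMS close to 0.707, while tones two octaves away are strongly attenuated. A filter-bank FEx would run one such band-pass per channel and reduce each output to an energy feature; how DeltaKWS quantizes and selects those channels is described in the paper, not here.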