ASAP-FE: Energy-Efficient Feature Extraction Enabling Multi-Channel Keyword Spotting on Edge Processors

Jongin Choi, Jina Park, Woojoo Lee, Jae-Jin Lee, Massoud Pedram

TL;DR

Multi-channel keyword spotting on edge devices suffers from high compute and energy demands. The authors propose ASAP-FE, a front-end architecture that combines half-overlapped IIR framing, sparsity-aware data reduction, and dynamic parallel processing to enable real-time processing across many channels. The approach achieves about a 62.7% reduction in FE workload with less than 1% accuracy loss across three KWS models, and identifies an energy-optimal configuration (15 parallel filters) for up to 25 channels. FPGA prototyping and 45 nm synthesis demonstrate real-time capability up to 32 channels under tight power budgets, highlighting ASAP-FE’s practical impact for edge AI in IoT and surveillance.

Abstract

Multi-channel keyword spotting (KWS) has become crucial for voice-based applications in edge environments. However, its substantial computational and energy requirements pose significant challenges. We introduce ASAP-FE (Agile Sparsity-Aware Parallelized-Feature Extractor), a hardware-oriented front-end designed to address these challenges. Our framework incorporates three key innovations: (1) Half-overlapped Infinite Impulse Response (IIR) Framing: This reduces redundant data by approximately 25% while maintaining essential phoneme transition cues. (2) Sparsity-aware Data Reduction: We exploit frame-level sparsity to achieve an additional 50% data reduction by combining frame skipping with stride-based filtering. (3) Dynamic Parallel Processing: We introduce a parameterizable filter cluster and a priority-based scheduling algorithm that allows parallel execution of IIR filtering tasks, reducing latency and optimizing energy efficiency. ASAP-FE is implemented with various filter cluster sizes on edge processors, with functionality verified on FPGA prototypes and designs synthesized at 45 nm. Experimental results using TC-ResNet8, DS-CNN, and KWT-1 demonstrate that ASAP-FE reduces the average workload by 62.73% while supporting real-time processing for up to 32 channels. Compared to a conventional fully overlapped baseline, ASAP-FE achieves less than a 1% accuracy drop (e.g., 96.22% vs. 97.13% for DS-CNN), which is well within acceptable limits for edge AI. By adjusting the number of filter modules, our design optimizes the trade-off between performance and energy, with 15 parallel filters providing optimal performance for up to 25 channels. Overall, ASAP-FE offers a practical and efficient solution for multi-channel KWS on energy-constrained edge devices.
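
As a rough sanity check on the figures above, the sketch below composes the two stated data reductions (about 25% from half-overlapped framing, then an additional 50% from sparsity-aware reduction). It assumes the second reduction applies multiplicatively to the data remaining after the first; the abstract does not state this explicitly, and the function name `combined_reduction` is hypothetical.

```python
def combined_reduction(stage_reductions):
    """Return the overall fraction of data removed by successive stages.

    Assumption (not stated in the paper): each stage removes its fraction
    of whatever data the previous stage left behind.
    """
    remaining = 1.0
    for reduction in stage_reductions:
        remaining *= 1.0 - reduction
    return 1.0 - remaining


# Approximate per-stage reductions quoted in the abstract.
total = combined_reduction([0.25, 0.50])
print(f"{total:.2%}")  # 62.50%
```

Under that assumption the nominal combined reduction is 62.5%, which is close to the reported 62.73% average workload reduction. The small gap is plausible given that both per-stage figures are approximate.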

Paper Structure

This paper contains 10 sections, 5 equations, 11 figures, 3 tables.

Figures (11)

  • Figure 1: (a) Mono-channel IIR-FBank: only one audio stream per $T_{audio}$, and (b) Shorter FE for multi-channel KWS: multiple channels within the same $T_{audio}$.
  • Figure 2: Conceptual overview of the proposed ASAP-FE.
  • Figure 3: Examples of streaming and framing methods.
  • Figure 4: Impact of different framing methods on KWS accuracy across various filter bank configurations.
  • Figure 5: Effect of the frame skipping threshold $th_1$ on KWS accuracy and remaining frames.
  • ...and 6 more figures