Hyperdimensional Intelligent Sensing for Efficient Real-Time Audio Processing on Extreme Edge

Sanggeon Yun, Ryozo Masukawa, Hanning Chen, SungHeon Jeong, Wenjun Huang, Arghavan Rezvani, Minhyoung Na, Yoshiki Yamaguchi, Mohsen Imani

TL;DR

This work tackles the energy and bandwidth challenges of real-time audio sensing at the extreme edge by proposing a near-sensor framework that fuses FFT-based feature extraction, lightweight CNNs, and Hyperdimensional Computing (HDC) to enable online learning and selective data transmission. The approach is designed for ASIC-friendly deployment, colocated with microphones, and leverages a sparse selective strategy to forward only audio-of-interest to the cloud; 8-bit quantized Edge TPU implementations illustrate substantial energy advantages over conventional CPUs/GPUs, with energy savings up to $82.1\%$ and quality loss of $1.39\%$ in ROC analyses, aided by $D=10{,}000$-dimensional hypervectors and a threshold $T_{score}$. Key contributions include the first near-sensor energy-efficient framework for audio-of-interest detection, a ROC-based trade-off analysis, and hardware-level gains via ASIC acceleration. The results indicate a practical path toward scalable, low-energy edge audio sensing for applications like gunshot detection and urban sound monitoring, reducing cloud processing needs while maintaining robust performance, thanks to rapid online adaptation enabled by HDC.

Abstract

The escalating challenges of managing vast sensor-generated data, particularly in audio applications, necessitate innovative solutions. Current systems face significant computational and storage demands, especially in real-time applications like gunshot detection systems (GSDS), and the proliferation of edge sensors exacerbates these issues. This paper proposes a groundbreaking approach with a near-sensor model tailored for intelligent audio-sensing frameworks. Utilizing a Fast Fourier Transform (FFT) module, convolutional neural network (CNN) layers, and HyperDimensional Computing (HDC), our model excels in low-energy, rapid inference, and online learning. It is highly adaptable for efficient ASIC design implementation, offering superior energy efficiency compared to conventional embedded CPUs or GPUs, and is compatible with the trend of shrinking microphone sensor sizes. Comprehensive evaluations at both software and hardware levels underscore the model's efficacy. Software assessments through detailed ROC curve analysis revealed a delicate balance between energy conservation and quality loss, achieving up to 82.1% energy savings with only 1.39% quality loss. Hardware evaluations highlight the model's commendable energy efficiency when implemented via ASIC design, especially with the Google Edge TPU, showcasing its superiority over prevalent embedded CPUs and GPUs.
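The gating idea described above (encode features into a hypervector, score against a class hypervector, transmit only above a threshold) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the random bipolar projection encoder, the cosine-similarity score, and every function name (`make_projection`, `encode`, `bundle`, `should_transmit`) are hypothetical, and the input `features` stands in for the output of the FFT and CNN stages, which are not modelled here.

```python
import math
import random

D = 10_000  # hypervector dimensionality quoted in the summary above


def make_projection(n_features, dim=D, seed=0):
    """Random bipolar projection matrix (an assumed encoder, not the paper's)."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n_features)] for _ in range(dim)]


def encode(features, projection):
    """Map a feature vector to a bipolar hypervector via sign(P @ x)."""
    return [1 if sum(w * x for w, x in zip(row, features)) >= 0 else -1
            for row in projection]


def bundle(hypervectors):
    """Element-wise sum of encoded samples, giving one class hypervector."""
    return [sum(column) for column in zip(*hypervectors)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def should_transmit(features, projection, class_hv, t_score):
    """Forward a segment only if its similarity score reaches t_score."""
    return cosine(encode(features, projection), class_hv) >= t_score
```

In use, a class hypervector would be built once with `bundle` from encoded audio-of-interest examples, and `should_transmit` would then be called per segment; raising `t_score` forwards fewer segments at the cost of missing more events.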

Paper Structure

This paper contains 15 sections, 8 figures.

Figures (8)

  • Figure 1: Overview of our Hyperdimensional Intelligent Sensing pipeline. The "sparse selective strategy" is applied at the near-sensor stage, where only audio segments identified as audio-of-interest are transmitted to the cloud.
  • Figure 2: Overview of our audio detection framework for Hyperdimensional Intelligent Sensing. Audio detection training consists of three phases: (a) offline learning, (b) deployment of the offline-trained near-sensor model, and (c) online learning based on a costly machine learning model. After the CNN layers are trained for feature extraction, HDC encoding transforms the extracted features into hypervectors, forming class hypervectors without any traditional MLP layers or activation functions.
  • Figure 3: Performance analysis by model size with hypervector dimensionality $D=10{,}000$. Left: Receiver Operating Characteristic (ROC) curves for varying numbers of feature extraction layers. Right: Area Under the Curve (AUC) for the same range of feature extraction layers.
  • Figure 4: Test F1 score comparison between HDC with online learning and an MLP layer, which does not readily support online learning.
  • Figure 5: Estimated energy consumption for different near-sensor model sizes.
  • ...and 3 more figures
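
The ROC and AUC analyses named in the Figure 3 caption follow a standard recipe: sweep a decision threshold over the detector's scores, record the false-positive and true-positive rates at each setting, and integrate the resulting curve. The sketch below shows that generic computation only. The function names and the toy scores are illustrative assumptions, and how the paper maps an operating point to energy savings or quality loss is not reproduced here.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs from sweeping a threshold over the scores.

    labels are 1 for audio-of-interest and 0 otherwise; both classes
    must be present. A segment counts as detected when score >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data, purely illustrative.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
area = auc(curve)
```

Each point on `curve` corresponds to one candidate threshold, so choosing a threshold amounts to choosing how many non-target segments are forwarded for a given detection rate.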