Hyperdimensional Intelligent Sensing for Efficient Real-Time Audio Processing on Extreme Edge
Sanggeon Yun, Ryozo Masukawa, Hanning Chen, SungHeon Jeong, Wenjun Huang, Arghavan Rezvani, Minhyoung Na, Yoshiki Yamaguchi, Mohsen Imani
TL;DR
This work tackles the energy and bandwidth challenges of real-time audio sensing at the extreme edge by proposing a near-sensor framework that combines FFT-based feature extraction, lightweight CNN layers, and Hyperdimensional Computing (HDC) to enable online learning and selective data transmission. The model is designed for ASIC-friendly deployment colocated with the microphone, and it uses a sparse selective strategy, built on $D=10{,}000$-dimensional hypervectors and a threshold $T_{score}$, to forward only audio-of-interest to the cloud. ROC analyses show energy savings of up to $82.1\%$ at a quality loss of $1.39\%$, and 8-bit quantized Edge TPU implementations show substantial energy advantages over conventional CPUs and GPUs. Key contributions include the first near-sensor energy-efficient framework for audio-of-interest detection, a ROC-based trade-off analysis, and hardware-level gains via ASIC acceleration. The results indicate a practical path toward scalable, low-energy edge audio sensing for applications such as gunshot detection and urban sound monitoring: the rapid online adaptation enabled by HDC reduces cloud processing needs while maintaining robust performance.
Abstract
The escalating challenges of managing vast amounts of sensor-generated data, particularly in audio applications, necessitate innovative solutions. Current systems face significant computational and storage demands, especially in real-time applications such as gunshot detection systems (GSDS), and the proliferation of edge sensors exacerbates these issues. This paper proposes a near-sensor model tailored for intelligent audio-sensing frameworks. By combining a Fast Fourier Transform (FFT) module, convolutional neural network (CNN) layers, and HyperDimensional Computing (HDC), our model supports low-energy operation, rapid inference, and online learning. It is well suited to efficient ASIC implementation, offering superior energy efficiency compared to conventional embedded CPUs or GPUs, and is compatible with the trend of shrinking microphone sensor sizes. Comprehensive evaluations at both the software and hardware levels underscore the model's efficacy. Software assessments through detailed ROC curve analysis characterize the trade-off between energy savings and quality loss, achieving up to 82.1% energy savings with only 1.39% quality loss. Hardware evaluations highlight the model's energy efficiency when implemented as an ASIC design, especially with the Google Edge TPU, showing its advantage over prevalent embedded CPUs and GPUs.
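To make the selective-transmission idea concrete, the following is a minimal sketch of an HDC scoring stage, not the authors' implementation. It assumes the FFT and CNN stages have already produced a small feature vector, and it swaps in a common HDC recipe (random bipolar projection, bundling by addition, cosine similarity) that the summary does not confirm. The names `InterestDetector`, `make_projection`, `encode`, and `should_transmit`, the 4-element feature vectors, and the threshold value 0.5 are all hypothetical; only $D=10{,}000$ and the existence of a threshold $T_{score}$ come from the text.

```python
import math
import random

D = 10_000      # hypervector dimensionality, as stated in the summary
T_SCORE = 0.5   # hypothetical threshold value, chosen only for illustration


def make_projection(num_features, dim=D, seed=0):
    """Random bipolar (+1/-1) projection matrix: one row per hypervector element."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(num_features)] for _ in range(dim)]


def encode(features, projection):
    """Map a feature vector to a bipolar hypervector via sign(P @ x)."""
    return [1 if sum(w * x for w, x in zip(row, features)) >= 0 else -1
            for row in projection]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InterestDetector:
    """Assumed design: a single audio-of-interest prototype hypervector."""

    def __init__(self, num_features, threshold=T_SCORE):
        self.projection = make_projection(num_features)
        self.prototype = [0] * D
        self.threshold = threshold

    def learn(self, features):
        """Online update: bundle (add) the encoded example into the prototype."""
        hv = encode(features, self.projection)
        self.prototype = [p + h for p, h in zip(self.prototype, hv)]

    def score(self, features):
        """Similarity of a new segment's hypervector to the learned prototype."""
        return cosine(encode(features, self.projection), self.prototype)

    def should_transmit(self, features):
        """Forward the segment to the cloud only if its score reaches the threshold."""
        return self.score(features) >= self.threshold


detector = InterestDetector(num_features=4)
detector.learn([1.0, 0.0, 0.0, 0.0])                      # one audio-of-interest example
print(detector.should_transmit([1.0, 0.0, 0.0, 0.0]))     # same pattern: forwarded
print(detector.should_transmit([0.0, 1.0, 0.0, 0.0]))     # unrelated pattern: dropped
```

In this sketch, raising the threshold drops more segments at the sensor (saving transmission and cloud energy) at the cost of missing more audio-of-interest, which is the kind of trade-off a ROC analysis sweeps over.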
