Low-power SNN-based audio source localisation using a Hilbert Transform spike encoding scheme
Saeid Haghighatshoar, Dylan R Muir
TL;DR
This work addresses low-power direction-of-arrival (DoA) estimation for wideband audio on sensor arrays. Instead of dense narrowband filtering, it uses a Hilbert-transform-based beamforming framework that exploits the phase of the analytic signal. It introduces an online Short-Time Hilbert Transform (STHT) and a Robust Zero-Crossing Conjugate (RZCC) spike encoding to realise real-time, energy-efficient DoA estimation in spiking neural networks (SNNs), and proves an equivalence between real-valued and complex-valued beamformers that simplifies hardware implementation. The approach achieves high DoA accuracy on noisy wideband signals and speech, reaches state-of-the-art accuracy among SNN methods, and is deployed on ultra-low-power SNN hardware (Xylo) with power consumption in the milliwatt range. Compared with MUSIC, it achieves competitive accuracy at a lower computational cost because it avoids per-band filterbanks, which makes ultra-low-power audio localisation practical for IoT devices. The results suggest a co-design path in which Hilbert-transform-based DSP and SNN hardware combine to deliver accurate, energy-efficient audio localisation for diverse microphone geometries.
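This summary does not specify the STHT or RZCC algorithms. As a generic illustration only, the sketch below shows two ingredients they build on. The first is a causal (streamable) analytic signal, obtained from a Hamming-windowed FIR approximation of the Hilbert transformer. The second is an event channel that emits a spike at each upward zero crossing, so that spike times carry phase. This is an assumption-laden stand-in, not the authors' STHT or RZCC. The function names (`fir_hilbert`, `causal_analytic`, `upward_zero_crossings`), the tap count, and the test tone are all hypothetical.

```python
import numpy as np


def fir_hilbert(num_taps: int = 63) -> np.ndarray:
    """Hamming-windowed FIR approximation of the ideal Hilbert transformer.

    Ideal response: h[n] = 2 / (pi * n) for odd n, 0 for even n.
    An odd tap count gives an integer group delay of num_taps // 2 samples.
    """
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd")
    n = np.arange(num_taps) - num_taps // 2
    h = np.zeros(num_taps)
    odd = n % 2 != 0
    h[odd] = 2.0 / (np.pi * n[odd])
    return h * np.hamming(num_taps)


def causal_analytic(x: np.ndarray, num_taps: int = 63) -> np.ndarray:
    """Causal analytic signal x + j*H{x}, delayed by num_taps // 2 samples."""
    delay = num_taps // 2
    imag = np.convolve(x, fir_hilbert(num_taps))[: len(x)]  # causal FIR output
    real = np.concatenate([np.zeros(delay), x[: len(x) - delay]])  # match FIR delay
    return real + 1j * imag


def upward_zero_crossings(v: np.ndarray) -> np.ndarray:
    """Sample indices where v goes from negative to non-negative (one spike each)."""
    return np.flatnonzero((v[:-1] < 0) & (v[1:] >= 0)) + 1


# Hypothetical usage: a 2 kHz tone sampled at 16 kHz.
fs, f0 = 16_000, 2_000
t = np.arange(4_000) / fs
x = np.cos(2 * np.pi * f0 * t + 0.3)
a = causal_analytic(x)
spikes_re = upward_zero_crossings(a.real)  # spike channel for the real part
spikes_im = upward_zero_crossings(a.imag)  # spike channel for the imaginary part
```

For a pure tone, each channel fires once per period at a fixed phase. The imaginary-part channel is offset by a quarter period from the real-part channel, so the pair of spike trains encodes the analytic-signal phase. The FIR delay (`num_taps // 2` samples) is the price of causality.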
Abstract
Sound source localisation is used in many consumer devices to isolate audio from individual speakers and to reject noise. Localisation is frequently accomplished by "beamforming", which combines phase-shifted audio streams to increase the power received from chosen source directions, under a known microphone array geometry. Dense band-pass filters are often needed to obtain narrowband signal components from wideband audio. These approaches achieve high accuracy, but narrowband beamforming is computationally demanding and not ideal for low-power IoT devices. We demonstrate a novel method for sound source localisation on arbitrary microphone arrays, designed for efficient implementation in ultra-low-power spiking neural networks (SNNs). We use a Hilbert transform to avoid dense band-pass filters, and introduce a new event-based encoding method that captures the phase of the complex analytic signal. Our approach achieves state-of-the-art accuracy for SNN methods, comparable with traditional non-SNN super-resolution beamforming. We deploy our method to low-power SNN inference hardware, with much lower power consumption than super-resolution methods. We demonstrate that signal processing approaches co-designed with spiking neural network implementations can achieve much better power efficiency. Our new Hilbert-transform-based method for beamforming can also improve the efficiency of traditional DSP-based signal processing.
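The beamforming idea described above can be illustrated with a minimal sketch under several stated assumptions: a single narrowband tone at a known centre frequency `f0`, a uniform linear array, and plane-wave propagation. It is textbook phase-shift beamforming applied to analytic signals, not the authors' wideband Hilbert beamformer. The analytic signal is computed once over the full band with the FFT, using the same construction as `scipy.signal.hilbert`, so no band-pass filterbank is involved. Each microphone stream is then phase-shifted and summed, and the output power is scanned over candidate directions. All function names (`analytic_signal`, `steering_delays`, `beam_power`) and parameter values are hypothetical.

```python
import numpy as np

C = 343.0  # assumed speed of sound in air, m/s


def analytic_signal(x: np.ndarray) -> np.ndarray:
    """Analytic signal x + j*H{x} along the last axis via the FFT
    (keep DC/Nyquist, double positive frequencies, zero negative ones)."""
    n = x.shape[-1]
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1 : n // 2] = 2.0
    else:
        gain[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x, axis=-1) * gain, axis=-1)


def steering_delays(theta_deg, num_mics: int, spacing: float) -> np.ndarray:
    """Plane-wave arrival delays (s) at each mic of a uniform linear array,
    relative to mic 0; theta is measured from broadside."""
    return np.arange(num_mics) * spacing * np.sin(np.deg2rad(theta_deg)) / C


def beam_power(xa: np.ndarray, f0: float, angles_deg, spacing: float) -> np.ndarray:
    """Phase-shift beamformer power on analytic signals xa (mics x samples)."""
    num_mics = xa.shape[0]
    powers = []
    for theta in angles_deg:
        # Undo each mic's phase lag exp(-j*2*pi*f0*tau_m) so the streams add coherently.
        weights = np.exp(1j * 2 * np.pi * f0 * steering_delays(theta, num_mics, spacing))
        y = weights @ xa
        powers.append(np.mean(np.abs(y) ** 2))
    return np.array(powers)


# Hypothetical usage: 8 mics, 5 cm spacing (below half a wavelength at 3 kHz).
fs, f0 = 16_000, 3_000
num_mics, spacing, true_deg = 8, 0.05, 30.0
t = np.arange(fs) / fs  # one second of audio
tau = steering_delays(true_deg, num_mics, spacing)
rng = np.random.default_rng(0)
x = np.cos(2 * np.pi * f0 * (t[None, :] - tau[:, None]))
x += 0.1 * rng.standard_normal(x.shape)
angles = np.arange(-90, 91)
p = beam_power(analytic_signal(x), f0, angles, spacing)
estimate = angles[np.argmax(p)]
```

Because the analytic signal of a tone is a complex exponential, a single complex weight per microphone aligns the phases. A real-valued signal would need the extra quadrature component to achieve the same alignment. For genuinely wideband audio, a single-frequency phase shift is only an approximation, which is the problem the abstract's method is designed to address.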
