Sound event localization and classification using WASN in Outdoor Environment
Dongzhe Zhang, Jianfeng Chen, Jisheng Bai, Mou Wang, Dongyuan Shi, Qixiang Niu, Alberto Bernardini
TL;DR
The paper tackles outdoor sound event localization and classification (SELC) by combining a multi-array WASN with a multitask CNN-Transformer model that fuses Soundmap, GTGram, and array coordinate features. It introduces novel Soundmap and GTGram representations and a joint loss to estimate location and class simultaneously, achieving state-of-the-art SELC performance in simulated and real-world outdoor environments. The approach remains robust across varying noise levels and array configurations, and supports practical edge-computing deployment with synchronized timing. This work advances scalable, accurate, and robust outdoor acoustic sensing for applications such as wildlife monitoring and public safety.
Abstract
Deep learning-based sound event localization and classification is an emerging research area within wireless acoustic sensor networks. However, current methods for sound event localization and classification typically rely on a single microphone array, which makes them susceptible to signal attenuation and environmental noise and limits their monitoring range. Moreover, methods that use multiple microphone arrays often focus solely on source localization and neglect sound event classification. In this paper, we propose a deep learning-based method that employs multiple features and attention mechanisms to estimate the location and class of a sound source. We introduce a Soundmap feature to capture spatial information across multiple frequency bands. We also use the Gammatone filter to generate acoustic features better suited to outdoor environments. Furthermore, we integrate attention mechanisms to learn channel-wise relationships and temporal dependencies within the acoustic features. To evaluate the proposed method, we conduct experiments on simulated datasets with different noise levels and monitoring-area sizes, as well as different array and source positions. The experimental results demonstrate that the proposed method outperforms state-of-the-art methods in both sound event classification and sound source localization. We also provide further analysis to explain the observed errors.
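To make the Gammatone-based feature more concrete, the following is a minimal sketch of a generic Gammatone spectrogram, not the paper's exact GTGram construction, which is not specified in this summary. It assumes the standard Glasberg-Moore ERB formulas and a 4th-order Gammatone impulse response. The function names, band count, frequency range, and frame and hop sizes are all hypothetical choices made for illustration.

```python
import numpy as np

def hz_to_erb_rate(f_hz):
    # Glasberg-Moore ERB-rate scale (number of ERBs below f_hz).
    return 21.4 * np.log10(1.0 + 0.00437 * np.asarray(f_hz, dtype=float))

def erb_rate_to_hz(erb):
    # Inverse of hz_to_erb_rate.
    return (10.0 ** (np.asarray(erb, dtype=float) / 21.4) - 1.0) / 0.00437

def erb_bandwidth(f_hz):
    # Equivalent rectangular bandwidth in Hz at frequency f_hz.
    return 24.7 * (4.37 * np.asarray(f_hz, dtype=float) / 1000.0 + 1.0)

def gammatone_center_freqs(f_min, f_max, n_bands):
    # Center frequencies spaced uniformly on the ERB-rate scale.
    erb_points = np.linspace(hz_to_erb_rate(f_min), hz_to_erb_rate(f_max), n_bands)
    return erb_rate_to_hz(erb_points)

def gammatone_impulse_response(fc, fs, duration=0.05, order=4, b=1.019):
    # g(t) = t^(order-1) * exp(-2*pi*b*ERB(fc)*t) * cos(2*pi*fc*t), peak-normalized.
    t = np.arange(int(duration * fs)) / fs
    envelope = t ** (order - 1) * np.exp(-2.0 * np.pi * b * erb_bandwidth(fc) * t)
    ir = envelope * np.cos(2.0 * np.pi * fc * t)
    return ir / np.max(np.abs(ir))

def gammatone_spectrogram(signal, fs, n_bands=32, f_min=50.0, f_max=7000.0,
                          frame=1024, hop=512):
    # Assumed (hypothetical) defaults; f_max must stay below fs / 2.
    signal = np.asarray(signal, dtype=float)
    freqs = gammatone_center_freqs(f_min, f_max, n_bands)
    n_frames = 1 + (len(signal) - frame) // hop
    # Sample indices of every frame, shape (n_frames, frame).
    idx = (np.arange(n_frames) * hop)[:, None] + np.arange(frame)[None, :]
    out = np.empty((n_bands, n_frames))
    for i, fc in enumerate(freqs):
        ir = gammatone_impulse_response(fc, fs)
        # Causal filtering: keep the first len(signal) samples of the full convolution.
        band = np.convolve(signal, ir)[: len(signal)]
        energy = np.mean(band[idx] ** 2, axis=1)
        out[i] = 10.0 * np.log10(energy + 1e-10)  # log energy in dB
    return freqs, out
```

Calling `gammatone_spectrogram(x, fs)` on a mono signal `x` returns the band center frequencies and a `(n_bands, n_frames)` array of log energies. In a multi-array setting, one such map per channel could be stacked as input channels for a network, although how the paper combines them with the Soundmap feature is not described here.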
