Robust DOA estimation using deep acoustic imaging
Adrian S. Roman, Iran R. Roman, Juan P. Bello
TL;DR
The paper addresses direction of arrival estimation (DoAE) using spherical intensity maps (SIMs) as input. It tackles the limitation of low-resolution microphone arrays by introducing a complex-valued, DBPN-based method that upsamples recordings from 4 to 32 channels. It benchmarks SIM-based DoAE with DeepWave, adding a K-means post-processing pipeline and GRU-enhanced end-to-end models trained with the ADPIT loss, and validates on the LOCATA and STARSS23 datasets. The results show that 32-channel SIM-based DoAE can achieve competitive localization performance (e.g., $LE = 14.8^\circ$ and $LR = 99.20$ on LOCATA). GRU variants improve LE further, at some cost in LR, and upsampled 4-channel inputs remain promising when integrated into a DeepWave-like pipeline. Overall, the work demonstrates the relevance and practicality of acoustic imaging for DoAE and provides a scalable path to using SIMs on datasets with varying microphone counts.
Abstract
Direction of arrival estimation (DoAE) aims at tracking a sound in azimuth and elevation. Recent advancements include data-driven models with inputs derived from ambisonics intensity vectors or correlations between channels in a microphone array. A spherical intensity map (SIM), or acoustic image, is an alternative input representation that remains underexplored. SIMs benefit from high-resolution microphone arrays, yet most DoAE datasets use low-resolution ones. Therefore, we first propose a super-resolution method to upsample recordings from low-resolution microphone arrays. Next, we benchmark DoAE models that use SIMs as input. We arrive at a model that uses SIMs for DoAE and outperforms a baseline and a state-of-the-art model. Our study highlights the relevance of acoustic imaging for DoAE tasks.
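
Since DoAE tracks a source in azimuth and elevation, a localization error such as the LE quoted in the TL;DR is naturally expressed as an angle between an estimated and a reference direction. The following is a minimal sketch of one common way to compute such an angle, assuming both directions are given as (azimuth, elevation) in degrees and that the error is the great-circle angle between them. The function name `angular_error_deg` and the axis convention are illustrative assumptions, not taken from the paper, whose exact metric definition may differ.

```python
import math


def angular_error_deg(az_est, el_est, az_ref, el_ref):
    """Great-circle angle in degrees between two directions.

    Assumption: azimuth and elevation are in degrees, with elevation
    measured from the horizontal plane (hypothetical convention).
    """
    def to_unit(az, el):
        # Convert (azimuth, elevation) to a Cartesian unit vector.
        az, el = math.radians(az), math.radians(el)
        return (
            math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el),
        )

    u = to_unit(az_est, el_est)
    v = to_unit(az_ref, el_ref)
    dot = sum(a * b for a, b in zip(u, v))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    dot = max(-1.0, min(1.0, dot))
    return math.degrees(math.acos(dot))
```

Working with unit vectors rather than subtracting angles directly avoids wrap-around problems at azimuth ±180° and keeps the error well behaved near the poles, where azimuth differences carry little meaning.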
