Robust DOA estimation using deep acoustic imaging

Adrian S. Roman, Iran R. Roman, Juan P. Bello

TL;DR

The paper addresses DoAE using SIMs (spherical intensity maps) as input and tackles the limitation of low-resolution microphone arrays by introducing a complex-valued DBPN-based upsampling from 4ch to 32ch. It benchmarks SIM-based DoAE with DeepWave, incorporating a K-means post-processing pipeline and GRU-enhanced end-to-end models with ADPIT loss, validating on LOCATA and STARSS23 datasets. The results show that 32ch SIM-based DoAE can achieve competitive localization performance (e.g., $LE = 14.8^\circ$ and $LR = 99.20$ on LOCATA), while GRU variants improve LE further, albeit with trade-offs in LR; upsampling 4ch inputs maintains potential when integrated into a DeepWave-like pipeline. Overall, the work demonstrates the relevance and practicality of acoustic imaging for DoAE and provides a scalable path to leverage SIMs on datasets with varying microphone counts.
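To make the LE figure above concrete, here is a minimal sketch of how an angular localization error between an estimated and a reference direction could be computed. It assumes LE is the great-circle angle between the two directions, which is a common convention in DoA evaluation but is not spelled out in this summary. The function name `angular_error_deg` and the azimuth/elevation-in-degrees convention are illustrative assumptions, not the authors' code.

```python
import math


def angular_error_deg(az1, el1, az2, el2):
    """Great-circle angle (degrees) between two directions.

    Each direction is given as azimuth and elevation in degrees.
    Hypothetical helper for illustration; not taken from the paper.
    """
    a1, e1, a2, e2 = map(math.radians, (az1, el1, az2, el2))
    # Cosine of the angle between the two unit vectors on the sphere.
    cos_angle = (math.sin(e1) * math.sin(e2)
                 + math.cos(e1) * math.cos(e2) * math.cos(a1 - a2))
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle))


# Example: an estimate at (azimuth 30, elevation 10) against a reference at (azimuth 45, elevation 0).
error = angular_error_deg(30.0, 10.0, 45.0, 0.0)
```

Averaging such per-detection errors over matched estimates would give a single LE-style number; how detections are matched to references (and hence how LR is counted) depends on the evaluation protocol and is not covered by this sketch.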

Abstract

Direction of arrival estimation (DoAE) aims at tracking a sound in azimuth and elevation. Recent advancements include data-driven models with inputs derived from ambisonics intensity vectors or correlations between channels in a microphone array. A spherical intensity map (SIM), or acoustic image, is an alternative input representation that remains underexplored. SIMs benefit from high-resolution microphone arrays, yet most DoAE datasets use low-resolution ones. Therefore, we first propose a super-resolution method to upsample low-resolution microphones. Next, we benchmark DoAE models that use SIMs as input. We arrive at a model that uses SIMs for DoAE and outperforms a baseline and a state-of-the-art model. Our study highlights the relevance of acoustic imaging for DoAE tasks.

Paper Structure (13 sections, 1 equation, 3 figures, 2 tables)

Figures (3)

  • Figure 1: The acoustic image by DeepWave (top) and DASB (bottom) for two sound sources. Dots denote ground truth.
  • Figure 2: Signal pipeline in the models we study.
  • Figure 3: CDBPN upsampling of a single reverberant white noise source directly facing the front of the microphone.