Maximum Likelihood Estimation of the Direction of Sound In A Reverberant Noisy Environment

Mohamed F. Mansour

TL;DR

The paper tackles direction-of-arrival estimation for a single sound source in reverberant and noisy environments, focusing on embedded hardware constraints. It introduces a physics-based Acoustic Wave Decomposition (AWD) that maps microphone-array observations ${\mathbf{p}}(\omega; t)$ to directional components and uses a maximum-likelihood criterion that fuses delay-based and energy-based cues to estimate the azimuth $\hat{\phi}$. Key contributions include the combination of time-delay and energy likelihoods derived from AWD, geometry-agnostic localization via a device acoustic dictionary, and the explicit modeling of surface scattering to mitigate spatial aliasing; the method shows robustness across room conditions and array geometries with reduced dependence on denoising. Empirical results on two array configurations with about 55k utterances demonstrate about $6^{\circ}$ MAE at high SNR and clear improvements over SRP-PHAT and a DNN baseline, particularly in high-error regimes, indicating strong practical potential for embedded DoA systems.

Abstract

We describe a new method for estimating the direction of sound in a reverberant environment from basic principles of sound propagation. The method utilizes SNR-adaptive features from time-delay and energy of the directional components after acoustic wave decomposition of the observed sound field to estimate the line-of-sight direction under noisy and reverberant conditions. The effectiveness of the approach is established with measured data of different microphone array configurations under various usage scenarios.
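As a rough illustration of the fusion idea summarized above, the sketch below combines two per-direction log-likelihood curves over an azimuth grid and takes the maximizer as the estimate. It is a toy under stated assumptions, not the paper's estimator: the function name `fuse_and_estimate`, the fixed scalar `weight_delay` (standing in for whatever SNR-adaptive weighting the paper uses), and the synthetic Gaussian-shaped likelihoods are all hypothetical.

```python
import numpy as np


def wrapped_diff_deg(a, b):
    """Signed angular difference a - b, wrapped into [-180, 180) degrees."""
    return (np.asarray(a, dtype=float) - b + 180.0) % 360.0 - 180.0


def fuse_and_estimate(azimuths_deg, loglik_delay, loglik_energy, weight_delay=0.5):
    """Return the grid azimuth that maximizes a weighted sum of two log-likelihoods.

    weight_delay is a hypothetical scalar in [0, 1]; the paper's actual
    SNR-adaptive weighting is not reproduced here.
    """
    azimuths_deg = np.asarray(azimuths_deg, dtype=float)
    total = (weight_delay * np.asarray(loglik_delay, dtype=float)
             + (1.0 - weight_delay) * np.asarray(loglik_energy, dtype=float))
    return float(azimuths_deg[int(np.argmax(total))])


# Synthetic example: the delay cue peaks at 90 degrees (broad),
# the energy cue peaks at 100 degrees (narrow).
grid = np.arange(0.0, 360.0, 5.0)
ll_delay = -0.5 * (wrapped_diff_deg(grid, 90.0) / 20.0) ** 2
ll_energy = -0.5 * (wrapped_diff_deg(grid, 100.0) / 10.0) ** 2

phi_hat = fuse_and_estimate(grid, ll_delay, ll_energy, weight_delay=0.5)
```

With equal weights the narrower (more confident) cue dominates, so the fused estimate lands closer to its peak; setting `weight_delay` to 1.0 or 0.0 recovers each cue's own maximizer.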

Paper Structure

This paper contains 10 sections, 13 equations, 2 figures.

Figures (2)

  • Figure 1: Mean absolute error (MAE), in degrees, of the proposed algorithm for microphone arrays of size $8$ and $4$
  • Figure 2: Cumulative distribution function of the absolute error for the proposed algorithm vs. (a) SRP-PHAT with 8 microphones, (b) CRNN-SSL with 4 microphones
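The two figures report a mean absolute error in degrees and a cumulative distribution of the absolute error. A minimal sketch of how such metrics are commonly computed for azimuth estimates follows. It assumes angles in degrees and wraps differences so that 350° versus 10° counts as a 20° error; the function names and sample values are illustrative and are not taken from the paper.

```python
import numpy as np


def angular_abs_error_deg(estimate_deg, truth_deg):
    """Absolute azimuth error in degrees, wrapped into [0, 180]."""
    diff = (np.asarray(estimate_deg, dtype=float)
            - np.asarray(truth_deg, dtype=float) + 180.0) % 360.0 - 180.0
    return np.abs(diff)


def empirical_cdf(errors_deg, thresholds_deg):
    """Fraction of errors at or below each threshold."""
    errors = np.asarray(errors_deg, dtype=float)
    return np.array([(errors <= t).mean() for t in thresholds_deg])


# Made-up estimates and ground-truth azimuths, for illustration only.
est = [350.0, 10.0, 95.0, 180.0]
ref = [10.0, 350.0, 90.0, 170.0]

err = angular_abs_error_deg(est, ref)         # per-utterance errors: 20, 20, 5, 10
mae = err.mean()                              # mean absolute error: 13.75
cdf = empirical_cdf(err, [5.0, 10.0, 20.0])   # 0.25, 0.5, 1.0
```

Without the wrap-around step, the first two samples would each be scored as a 340° error, which would dominate the MAE and distort the tail of the error distribution.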