Maximum Likelihood Estimation of the Direction of Sound In A Reverberant Noisy Environment
Mohamed F. Mansour
TL;DR
The paper addresses direction-of-arrival (DoA) estimation for a single sound source in reverberant, noisy environments under embedded-hardware constraints. It introduces a physics-based Acoustic Wave Decomposition (AWD) that maps the microphone-array observations ${\mathbf{p}}(\omega; t)$ to directional components, then applies a maximum-likelihood criterion that fuses delay-based and energy-based cues to estimate the azimuth $\hat{\phi}$. The key contributions are the combination of time-delay and energy likelihoods derived from AWD, geometry-agnostic localization via a device acoustic dictionary, and explicit modeling of surface scattering to mitigate spatial aliasing. The method is reported to be robust across room conditions and array geometries, with reduced dependence on denoising. On two array configurations and roughly 55k utterances, it achieves a mean absolute error (MAE) of about $6^{\circ}$ at high SNR and improves on SRP-PHAT and a DNN baseline, particularly in high-error regimes, which suggests it is practical for embedded DoA systems.
Abstract
We describe a new method for estimating the direction of sound in a reverberant environment from basic principles of sound propagation. The method applies acoustic wave decomposition to the observed sound field and uses SNR-adaptive features, derived from the time delay and energy of the resulting directional components, to estimate the line-of-sight direction under noisy and reverberant conditions. The effectiveness of the approach is established with measured data from different microphone array configurations under various usage scenarios.
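
To make the idea of fusing delay-based and energy-based evidence concrete, the following is a minimal sketch, not the paper's AWD-based likelihoods. It assumes two toy per-azimuth log-likelihood curves (Gaussian in circular distance) and a hypothetical scalar `delay_weight` standing in for SNR-adaptive weighting; the function names, the 1-degree grid, and all numeric values are illustrative assumptions.

```python
import numpy as np

def circular_diff_deg(a, b):
    """Signed difference a - b in degrees, wrapped to [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def gaussian_loglik(grid_deg, peak_deg, sigma_deg):
    """Toy log-likelihood over azimuth: Gaussian in circular distance (assumption)."""
    d = circular_diff_deg(grid_deg, peak_deg)
    return -0.5 * (d / sigma_deg) ** 2

def fuse_and_estimate(grid_deg, delay_ll, energy_ll, delay_weight):
    """Weighted sum of two log-likelihood curves; returns the maximizing azimuth."""
    total = delay_weight * delay_ll + (1.0 - delay_weight) * energy_ll
    return float(grid_deg[np.argmax(total)])

# Candidate azimuths on a 1-degree grid (illustrative resolution).
grid = np.arange(0.0, 360.0, 1.0)

# Hypothetical cues: a sharp delay-based peak and a broad energy-based peak.
delay_ll = gaussian_loglik(grid, peak_deg=92.0, sigma_deg=5.0)
energy_ll = gaussian_loglik(grid, peak_deg=80.0, sigma_deg=30.0)

# A higher delay_weight trusts the delay cue more (e.g. at high SNR, by assumption).
phi_hat = fuse_and_estimate(grid, delay_ll, energy_ll, delay_weight=0.7)
print(f"estimated azimuth: {phi_hat:.1f} degrees")
```

In this toy setup the sharper cue dominates the fused maximum, which is the qualitative behavior one would expect from a likelihood-based combination; how the actual weights and likelihoods are derived is specified by the method itself, not by this sketch.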
