Maximum Likelihood Estimation of the Direction of Sound In A Reverberant Noisy Environment

Mohamed F. Mansour

TL;DR

The paper tackles direction-of-arrival (DoA) estimation for a single sound source in reverberant, noisy environments, with a focus on embedded hardware constraints. It introduces a physics-based Acoustic Wave Decomposition (AWD) that maps microphone-array observations ${\mathbf{p}}(\omega; t)$ to directional components, and it estimates the azimuth $\hat{\phi}$ with a maximum-likelihood criterion that fuses delay-based and energy-based cues. Key contributions are the combination of time-delay and energy likelihoods derived from AWD, geometry-agnostic localization via a device acoustic dictionary, and explicit modeling of surface scattering to mitigate spatial aliasing. The method is reported to be robust across room conditions and array geometries, with reduced dependence on denoising. Experiments on two array configurations with roughly 55k utterances show a mean absolute error (MAE) of about $6^{\circ}$ at high SNR and clear improvements over SRP-PHAT and a DNN baseline, particularly in high-error regimes, indicating strong practical potential for embedded DoA systems.
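As a small illustration of the azimuth MAE quoted above, the sketch below computes a mean absolute angular error with wrap-around, so that an estimate of $359^{\circ}$ against a true azimuth of $1^{\circ}$ counts as a $2^{\circ}$ error rather than $358^{\circ}$. The summary does not state how the paper computes its MAE, so the wrap-around convention is an assumption, and the function names `angular_error_deg` and `mean_absolute_error_deg` are hypothetical.

```python
def angular_error_deg(estimate_deg, truth_deg):
    """Smallest absolute difference between two azimuths, in degrees (0 to 180).

    Assumption: errors wrap around the circle; the source does not specify this.
    """
    diff = (estimate_deg - truth_deg) % 360.0
    return min(diff, 360.0 - diff)


def mean_absolute_error_deg(estimates_deg, truths_deg):
    """Average the wrapped angular errors over paired estimates and ground truths."""
    if len(estimates_deg) != len(truths_deg) or len(estimates_deg) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [angular_error_deg(e, t) for e, t in zip(estimates_deg, truths_deg)]
    return sum(errors) / len(errors)


# Illustrative values only; these are not data from the paper.
example_mae = mean_absolute_error_deg([359.0, 10.0, 180.0], [1.0, 4.0, 170.0])
```

Wrapping the difference before averaging keeps estimates near the $0^{\circ}$/$360^{\circ}$ boundary from inflating the reported error.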

Abstract

We describe a new method for estimating the direction of sound in a reverberant environment from basic principles of sound propagation. The method utilizes SNR-adaptive features from time-delay and energy of the directional components after acoustic wave decomposition of the observed sound field to estimate the line-of-sight direction under noisy and reverberant conditions. The effectiveness of the approach is established with measured data of different microphone array configurations under various usage scenarios.
