Estimation of binary time-frequency masks from ambient noise
José Luis Romero, Michael Speckbacher
TL;DR
This work formalizes the intuition that a binary time-frequency mask $\Omega$ can be recovered from ambient noise by inspecting the average spectrogram of $K$ observations of filtered noise. The authors introduce a practical estimator based on the lower quantile of the averaged spectrogram and prove that, provided $\Omega$ has finite perimeter and is sufficiently large, the recovered mask $\widehat{\Omega}$ matches $\Omega$ up to a boundary layer with high probability, independently of the noise variance. They extend the results to real white noise and provide an expectation bound showing that the reconstruction error scales with the boundary length $|\partial\Omega|$ and shrinks as $K$ grows. The analysis hinges on the spectral properties of the time-frequency localization operator $H_\Omega$, concentration of measure for quadratic forms, and reproducing-kernel techniques to control uniform deviations, and it yields practical guidance for choosing $K$ and the windows in ambient-noise scenarios.
Abstract
We investigate the retrieval of a binary time-frequency mask from a few observations of filtered white ambient noise. Confirming household wisdom in acoustic modeling, we show that this is possible by inspecting the average spectrogram of ambient noise. Specifically, we show that the lower quantile of the average of $\mathcal{O}(\log(|\Omega|/\varepsilon))$ masked spectrograms is enough to identify a rather general mask $\Omega$ with confidence at least $1-\varepsilon$, up to shape details concentrated near the boundary of $\Omega$. As an application, the expected measure of the estimation error is dominated by the perimeter of the time-frequency mask. The estimator requires no knowledge of the noise variance and only a very qualitative profile of the filtering window, not exact knowledge of it.
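The idea can be illustrated with a small discrete simulation. The following is a minimal sketch under several assumptions that are not taken from the paper: a finite circular STFT on $N = 64$ samples, the same Gaussian window for filtering and analysis, complex white noise, and a simple scale-invariant threshold (a fixed fraction of the median of the above-average spectrogram values) standing in for the quantile rule that the paper analyzes. All function names and parameter values are hypothetical.

```python
import numpy as np

def make_shifts(g):
    """Stack all circular shifts of the window: shifts[n, t] = g[t - n]."""
    return np.stack([np.roll(g, n) for n in range(len(g))])

def stft(f, shifts):
    """Full-redundancy circular STFT: C[n, m] = sum_t f[t] conj(g[t-n]) e^{-2 pi i m t / N}."""
    return np.fft.fft(f[None, :] * np.conj(shifts), axis=1)

def stft_adjoint(C, shifts):
    """Adjoint of stft, scaled so that stft_adjoint(stft(f)) reproduces f."""
    energy = np.sum(np.abs(shifts[0]) ** 2)
    return (shifts * np.fft.ifft(C, axis=1)).sum(axis=0) / energy

def average_spectrogram(mask, g, K, sigma, seed):
    """Average the spectrograms of K masked (filtered) white-noise realizations."""
    N = len(g)
    shifts = make_shifts(g)
    rng = np.random.default_rng(seed)
    rho = np.zeros((N, N))
    for _ in range(K):
        noise = sigma * (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
        # Apply the time-frequency mask: analyze, multiply by the mask, synthesize.
        filtered = stft_adjoint(mask * stft(noise, shifts), shifts)
        C = stft(filtered, shifts)
        rho += C.real ** 2 + C.imag ** 2
    return rho / K

def estimate_mask(rho, fraction=0.4):
    """Illustrative heuristic (an assumption, not the paper's rule):
    estimate the plateau level from the above-average values and
    keep the points exceeding a fixed fraction of it."""
    plateau = np.median(rho[rho >= rho.mean()])
    return rho >= fraction * plateau

N = 64
t = np.arange(N)
d = np.minimum(t, N - t)                 # circular distance to 0
g = np.exp(-np.pi * d ** 2 / N)          # periodized Gaussian window
mask = np.zeros((N, N), dtype=bool)      # indexed as [time, frequency]
mask[16:48, 16:48] = True                # a square mask in the time-frequency grid

rho = average_spectrogram(mask, g, K=50, sigma=1.0, seed=0)
est = estimate_mask(rho)
print("misclassified grid points:", int(np.sum(est != mask)), "of", N * N)
```

Because the threshold is a fixed fraction of a level read off the data, rescaling the noise by any `sigma` rescales `rho` and the threshold together, which mirrors the claim that the noise variance need not be known. In this sketch the misclassified points are expected to cluster near the edges of the square, in line with the boundary-layer statement above.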
