Is Phase Really Needed for Weakly-Supervised Dereverberation ?

Marius Rodrigues, Louis Bahrman, Roland Badeau, Gaël Richard

TL;DR

This paper addresses whether the reverberant phase is informative in weakly supervised dereverberation. Using Statistical Wave Field Theory, it shows that late reverberation induces uniform, white-like phase perturbations in the frequency domain, implying that the wet phase carries little useful information. The authors validate this with a weakly supervised training setup, showing that phase-invariant losses improve dereverberation performance, particularly in SRMR and SISDR, and that reconstructing the phase from the reverberant phase can hinder results. The work suggests designing phase-invariant dereverberation systems and opens avenues for combining such approaches with phase retrieval as a subsequent step, enhancing robustness when dry-phase information is unavailable.
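
The phase claim above can be illustrated numerically. The sketch below is a hedged Monte Carlo check under a simplified, frequency-independent Polack model, which is an assumption: the paper's generalized model lets its parameters vary with frequency. Each RIR is white Gaussian noise under an exponential envelope; `polack_rirs` and `circular_moments` are hypothetical helpers, and `alpha` is an arbitrary per-sample decay rate. At a fixed frequency bin, the phase of H across realizations should look uniform, so both circular moments stay near zero. The DC bin is real-valued, so its phase is confined to {0, π} and its second moment is close to 1.

```python
import numpy as np

def polack_rirs(n_rirs, length, alpha, sigma=1.0, seed=0):
    """Draw RIRs h[n] = sigma * b[n] * exp(-alpha * n), with b[n] i.i.d. N(0, 1).

    Simplified, frequency-independent Polack model (assumption): alpha is a
    per-sample decay rate and sigma a constant noise level.
    """
    rng = np.random.default_rng(seed)
    envelope = np.exp(-alpha * np.arange(length))
    return sigma * rng.standard_normal((n_rirs, length)) * envelope

def circular_moments(theta):
    """Return |E[exp(1j*theta)]| and |E[exp(2j*theta)]|; both are ~0 for uniform phases."""
    return abs(np.mean(np.exp(1j * theta))), abs(np.mean(np.exp(2j * theta)))

# 2000 independent realizations, 1024 taps each (arbitrary illustration values).
rirs = polack_rirs(n_rirs=2000, length=1024, alpha=0.01)
H = np.fft.rfft(rirs, axis=1)  # one row of Fourier coefficients per RIR

# Look at the phase distribution across realizations at a few fixed bins.
for k in (0, 200, 400):
    m1, m2 = circular_moments(np.angle(H[:, k]))
    print(f"bin {k}: |m1|={m1:.3f}, |m2|={m2:.3f}")
```

The second moment is included because an elongated (non-circular) but centered cloud of coefficients has a first moment near zero while its phase is still far from uniform.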

Abstract

In unsupervised or weakly-supervised approaches for speech dereverberation, the target clean (dry) signals are considered to be unknown during training. In that context, evaluating to what extent information can be retrieved from the sole knowledge of reverberant (wet) speech becomes critical. This work investigates the role of the reverberant (wet) phase in the time-frequency domain. Based on Statistical Wave Field Theory, we show that late reverberation perturbs phase components with white, uniformly distributed noise, except at low frequencies. Consequently, the wet phase carries limited useful information and is not essential for weakly supervised dereverberation. To validate this finding, we train dereverberation models under a recent weak supervision framework and demonstrate that performance can be significantly improved by excluding the reverberant phase from the loss function.
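
What "excluding the reverberant phase from the loss function" means can be made concrete with a minimal sketch on complex STFT arrays. `complex_spectral_loss` and `magnitude_spectral_loss` are hypothetical names, and the plain mean-squared error is an assumption; this is not the loss used in the paper or in its weak-supervision framework. Replacing every phase with uniform noise leaves the magnitude loss at (numerically) zero, whereas the phase-sensitive loss penalizes it heavily.

```python
import numpy as np

def complex_spectral_loss(est_spec, ref_spec):
    """Phase-sensitive: mean squared error between complex STFT coefficients."""
    return float(np.mean(np.abs(est_spec - ref_spec) ** 2))

def magnitude_spectral_loss(est_spec, ref_spec):
    """Phase-invariant: mean squared error between STFT magnitudes only."""
    return float(np.mean((np.abs(est_spec) - np.abs(ref_spec)) ** 2))

rng = np.random.default_rng(0)
# Hypothetical "wet" spectrogram: 257 frequency bins x 100 frames of random values.
wet = rng.standard_normal((257, 100)) + 1j * rng.standard_normal((257, 100))
# Same magnitudes, phases replaced by independent uniform noise.
random_phase = np.exp(2j * np.pi * rng.random(wet.shape))
est = np.abs(wet) * random_phase

print(complex_spectral_loss(est, wet))    # large: the phase mismatch is penalized
print(magnitude_spectral_loss(est, wet))  # close to 0: phase is ignored
```

One way to read the abstract's argument: when the only available target is wet, a phase-sensitive loss rewards reproducing the wet phase, which the paper argues is close to noise, while the magnitude-only variant removes that incentive.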

Paper Structure

This paper contains 11 sections, 1 theorem, 11 equations, 1 figure, 1 table.

Key Result

Proposition 1

Formally, if the RIR $h$ is drawn at random according to the generalized Polack model described in the paper's statistical-model section, with parameters $\alpha(f)$ and $B(f)$, and $H$ denotes its Fourier transform, then, asymptotically as $f\to+\infty$,

Figures (1)

  • Figure 1: Distribution of the Fourier coefficients of a synthetic RIR in the complex plane, at three different frequencies. To parametrize the RIR, $\alpha(f)$ and $B(f)$ are autoregressive (AR) profiles of order 8 whose poles are randomly chosen in the unit disk. Here, their means over $f$ are respectively $\bar{\alpha} \simeq 0.0113~\mathrm{s}^{-1}$ (corresponding to a reverberation time of approximately 82 ms) and $\bar{B} \simeq 0.0029$.
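
The caption does not spell out how an "AR profile of order 8 with random poles" is built. A plausible reading, sketched below as an assumption only, takes the profile to be the power-spectrum shape 1/|A(e^{jω})|² of an all-pole filter whose 8 poles form 4 conjugate pairs drawn uniformly in the unit disk, then rescales it to a prescribed mean over frequency. `random_ar_profile` is a hypothetical helper; the squared-magnitude form and the mean normalization are not taken from the paper. The target means reuse the caption's values; their units are not interpreted here.

```python
import numpy as np

def random_ar_profile(n_freqs, order=8, target_mean=1.0, seed=0):
    """Hypothetical AR spectral profile sampled on n_freqs points of [0, pi].

    Poles come in conjugate pairs so the AR coefficients are real; the radius
    sqrt(U) with U ~ Uniform[0, 1) gives a uniform density over the disk area.
    """
    if order % 2:
        raise ValueError("order must be even (conjugate pole pairs)")
    rng = np.random.default_rng(seed)
    n_pairs = order // 2
    radius = np.sqrt(rng.random(n_pairs))      # strictly inside the unit disk
    angle = np.pi * rng.random(n_pairs)        # upper half-plane
    upper = radius * np.exp(1j * angle)
    poles = np.concatenate([upper, upper.conj()])
    a = np.real(np.poly(poles))                # A(z) = sum_k a[k] z^{-k}, a[0] = 1
    omega = np.linspace(0.0, np.pi, n_freqs)
    A = np.exp(-1j * np.outer(omega, np.arange(order + 1))) @ a
    profile = 1.0 / np.abs(A) ** 2             # AR power-spectrum shape
    return profile * (target_mean / profile.mean())  # assumed normalization

# Frequency-dependent parameter profiles with the caption's mean values.
alpha_f = random_ar_profile(513, target_mean=0.0113, seed=1)
B_f = random_ar_profile(513, target_mean=0.0029, seed=2)
```

Drawing poles strictly inside the unit disk keeps the all-pole filter stable, so the profile stays finite and positive at every frequency.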

Theorems & Definitions (1)

  • Proposition 1