An Explicit Consistency-Preserving Loss Function for Phase Reconstruction and Speech Enhancement
Pin-Jui Ku, Chun-Wei Ho, Hao Yen, Sabato Marco Siniscalchi, Chin-Hui Lee
TL;DR
The paper addresses phase reconstruction (PR) and speech enhancement (SE) by introducing an explicit consistency-preserving loss that enforces magnitude–phase consistency via the STFT-consistency condition, rather than forcing the model to estimate the original phase. The core idea is to train models to output complex spectrograms that are consistent, i.e., that are the STFT of some real signal. The loss $L_{EC}$ operates on complex spectrograms and does not depend on the ground-truth phase. Across PR and SE tasks on VB-DMD and WSJ0-CHiME3 with TF-GridNet, the proposed loss yields results that are competitive with or better than conventional phase losses, particularly in challenging, low-SNR conditions. By prioritizing internal consistency over exact phase recovery, the approach broadens the solution space for phase estimation and can improve speech quality without requiring precise reconstruction of the original phase.
Abstract
In this work, we propose a novel consistency-preserving loss function for recovering the phase information in the context of phase reconstruction (PR) and speech enhancement (SE). Unlike conventional techniques that directly estimate the phase using a deep model, our idea is to exploit ad-hoc constraints to directly generate a consistent pair of magnitude and phase. Specifically, the proposed loss forces a set of complex numbers to be a consistent short-time Fourier transform (STFT) representation, i.e., to be the spectrogram of a real signal. Our approach thus avoids the difficulty of estimating the original phase, which is highly unstructured and sensitive to time shifts. The influence of the proposed loss is first assessed on a PR task, demonstrating experimentally that our approach is viable. Next, we show its effectiveness on an SE task, using both the VB-DMD and WSJ0-CHiME3 data sets. On VB-DMD, our approach is competitive with conventional solutions. On the challenging WSJ0-CHiME3 set, the proposed framework compares favourably with techniques that explicitly estimate the phase.
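The consistency condition described in the abstract can be illustrated with a small NumPy sketch. The exact form of $L_{EC}$ is not given in this summary, so the code below is an assumption: it uses the common projection residual, the mean squared difference between a complex spectrogram $X$ and $\mathrm{STFT}(\mathrm{iSTFT}(X))$, which is zero exactly when $X$ is the STFT of a real signal. The function names (`stft`, `istft`, `consistency_loss`), the sqrt-Hann window, and the frame and hop sizes are illustrative choices, not the authors' settings.

```python
import numpy as np

def _window(n_fft):
    # Periodic sqrt-Hann window (an illustrative choice, not from the paper).
    return np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft))

def stft(x, n_fft=64, hop=32):
    """Return the one-sided STFT of a real signal, shape (frames, n_fft // 2 + 1)."""
    win = _window(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[t * hop : t * hop + n_fft] * win for t in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)

def istft(X, n_fft=64, hop=32):
    """Least-squares inverse STFT via windowed overlap-add."""
    win = _window(n_fft)
    frames = np.fft.irfft(X, n=n_fft, axis=-1) * win
    length = n_fft + hop * (X.shape[0] - 1)
    x = np.zeros(length)
    norm = np.zeros(length)
    for t, frame in enumerate(frames):
        x[t * hop : t * hop + n_fft] += frame
        norm[t * hop : t * hop + n_fft] += win ** 2
    # Samples with zero window weight carry no information; avoid dividing by zero.
    return x / np.where(norm > 1e-8, norm, 1.0)

def consistency_loss(X, n_fft=64, hop=32):
    """Assumed stand-in for L_EC: mean squared residual of X after projecting
    it onto the set of consistent spectrograms. No reference phase is used."""
    projected = stft(istft(X, n_fft, hop), n_fft, hop)
    return float(np.mean(np.abs(X - projected) ** 2))

# Demo: a true STFT is consistent; the same magnitudes with random phase are not.
rng = np.random.default_rng(0)
signal = rng.standard_normal(64 + 32 * 9)  # length that tiles exactly into 10 frames
X = stft(signal)
X_random = np.abs(X) * np.exp(1j * rng.uniform(-np.pi, np.pi, X.shape))

loss_consistent = consistency_loss(X)
loss_random = consistency_loss(X_random)
print("consistent spectrogram:", loss_consistent)
print("random-phase spectrogram:", loss_random)
```

The loss never touches a ground-truth phase: it penalizes only the mismatch between a spectrogram and its own projection, which is why many different phases (for example, those of time-shifted signals) can all reach zero loss. In a training setup, the same residual would be computed with a differentiable STFT so that gradients reach the model.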
