Comparative Analysis of Discriminative Deep Learning-Based Noise Reduction Methods in Low SNR Scenarios
Shrishti Saha Shetu, Emanuël A. P. Habets, Andreas Brendel
TL;DR
The paper addresses discriminative deep learning-based noise reduction in very low SNR environments. Using DNS Challenge data, it systematically compares training data regimes, loss functions, target estimation strategies, processing paradigms (masking, mapping, and deep filtering), and model capacity. It finds that losses that emphasize different signal aspects, notably MS and MT, outperform SI-SDR and JL, and that direct speech estimation generally outperforms indirect estimation. DCCRN performs best among the compared models, owing to its extensive temporal context. However, the gains diminish at SNRs below $-10$ dB, where speech is heavily masked by noise. This reveals a limitation of current discriminative methods and points to generative approaches as a future direction for the most challenging cases. Overall, the study offers practical guidance for selecting training data, targets, losses, and architectures for low-SNR speech enhancement.
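SI-SDR, one of the losses compared, is the scale-invariant signal-to-distortion ratio: with reference s and estimate s_hat, the reference is first rescaled by alpha = <s_hat, s> / ||s||^2, and SI-SDR = 10*log10(||alpha*s||^2 / ||s_hat - alpha*s||^2). The minimal pure-Python sketch below computes this value and the negative-SI-SDR loss that is typically minimized during training. It assumes equal-length 1-D signals and zero-mean normalization; the function names and the eps constant are illustrative assumptions, not the paper's code.

```python
import math


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between two equal-length 1-D signals."""
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    # Remove the mean so that a DC offset does not affect the score.
    mean_est = sum(estimate) / len(estimate)
    mean_ref = sum(reference) / len(reference)
    est = [x - mean_est for x in estimate]
    ref = [x - mean_ref for x in reference]
    # Optimal scaling alpha that projects the estimate onto the reference.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    alpha = dot / ref_energy
    target = [alpha * r for r in ref]
    # Everything not explained by the scaled reference counts as distortion.
    residual = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(d * d for d in residual)
    return 10 * math.log10((target_energy + eps) / (residual_energy + eps))


def si_sdr_loss(estimate, reference):
    """Negative SI-SDR, so that minimizing the loss maximizes SI-SDR."""
    return -si_sdr(estimate, reference)
```

Because the reference is rescaled by the optimal alpha, multiplying the estimate by a constant leaves the score essentially unchanged, so SI-SDR does not penalize output gain errors.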
Abstract
In this study, we conduct a comparative analysis of deep learning-based noise reduction methods in low signal-to-noise ratio (SNR) scenarios. Our investigation focuses on five key aspects: the impact of training data, the influence of various loss functions, the effectiveness of direct and indirect speech estimation, the efficacy of masking, mapping, and deep filtering, and the effect of model capacity on noise reduction performance and speech quality. Through comprehensive experimentation, we provide insights into the strengths, weaknesses, and applicability of these methods in low SNR environments. The findings are intended to help researchers and practitioners select suitable techniques for their specific low SNR noise reduction applications.
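To make the three processing paradigms concrete, the sketch below shows how each one turns a network output into a speech estimate in the complex STFT domain. It assumes spectra stored as nested lists indexed [frame][frequency_bin]; the function names are hypothetical, and the deep filter is a simplified, time-only variant. Deep filtering formulations in the literature may also span neighbouring frequency bins, so this is an illustration of the idea rather than the paper's implementation.

```python
from typing import List

Spectrum = List[List[complex]]  # indexed as [frame][frequency_bin]


def apply_mask(noisy: Spectrum, mask: Spectrum) -> Spectrum:
    """Masking: an element-wise (possibly complex) gain on each T-F bin."""
    return [[m * x for m, x in zip(mask_row, noisy_row)]
            for mask_row, noisy_row in zip(mask, noisy)]


def apply_mapping(network_output: Spectrum) -> Spectrum:
    """Mapping: the network output is taken directly as the speech estimate."""
    return [list(row) for row in network_output]


def apply_deep_filter(noisy: Spectrum,
                      filters: List[List[List[complex]]]) -> Spectrum:
    """Deep filtering (time-only sketch): a complex FIR filter per T-F bin.

    filters[t][f][k] weights noisy[t - k][f], i.e. the current frame and
    the K - 1 preceding frames of the same frequency bin.
    """
    out: Spectrum = []
    for t in range(len(noisy)):
        row = []
        for f in range(len(noisy[0])):
            acc = 0j
            for k, h in enumerate(filters[t][f]):
                if t - k >= 0:  # frames before the signal start count as zero
                    acc += h * noisy[t - k][f]
            row.append(acc)
        out.append(row)
    return out
```

With a single filter tap, deep filtering reduces to complex masking. The extra taps let the model combine several noisy frames, which helps most when the current bin alone carries little usable speech.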
