Comparative Analysis Of Discriminative Deep Learning-Based Noise Reduction Methods In Low SNR Scenarios

Shrishti Saha Shetu, Emanuël A. P. Habets, Andreas Brendel

TL;DR

The paper addresses the challenge of discriminative deep learning–based noise reduction in very low SNR environments by conducting a comprehensive comparison across training data regimes, loss functions, target estimation strategies, processing paradigms (masking, mapping, and deep filtering), and model capacity using the DNS Challenge data. It finds that loss functions focusing on different signal aspects, notably MS and MT, outperform SI-SDR and JL, and that direct speech estimation generally yields better results than indirect estimation, with DCCRN providing the strongest performance due to extensive temporal context. However, improvements degrade at SNRs below $-10$ dB when speech is heavily masked, indicating limitations of current discriminative methods and suggesting generative approaches as a future direction. The study provides practical guidance for selecting data, targets, losses, and architectures in low-SNR speech enhancement and highlights the need for generative techniques to handle the most challenging cases.

Abstract

In this study, we conduct a comparative analysis of deep learning-based noise reduction methods in low signal-to-noise ratio (SNR) scenarios. Our investigation primarily focuses on five key aspects: the impact of training data, the influence of various loss functions, the effectiveness of direct and indirect speech estimation techniques, the efficacy of masking, mapping, and deep filtering methodologies, and the effect of different model capacities on noise reduction performance and speech quality. Through comprehensive experimentation, we provide insights into the strengths, weaknesses, and applicability of these methods in low SNR environments. The findings derived from our analysis are intended to assist both researchers and practitioners in selecting better techniques tailored to their specific applications within the domain of low SNR noise reduction.

Paper Structure

This paper contains 8 sections, 6 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: PESQ and SI-SDR improvement for the DCCRN model trained on both high (dotted line) and low (solid line) SNR datasets, employing various loss functions. Please note that the legends are labeled according to the model's loss function and training dataset.
  • Figure 2: PESQ improvement for the DCCRN model trained with the low SNR training dataset for indirect (dotted line) and direct (solid line) speech estimation with mapping, CRM and DF approaches.
  • Figure 3: PESQ improvement for different SOTA models trained with the low SNR training dataset for direct speech estimation using the CRM masking method (models are ordered by ascending MACs).
  • Figure 4: Example of (a) a clean speech signal, (b) the same signal masked by strong noise at $-14$ dB SNR, and (c, d) estimated clean speech signals.
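
To make the SNR levels and the SI-SDR metric referenced in the figure captions more concrete, the following is a minimal NumPy sketch, not the authors' evaluation code. It assumes the common definition of SI-SDR (zero-mean signals, optimally scaled reference) and uses a synthetic tone and white noise as stand-ins for real speech and noise recordings. The function names `mix_at_snr` and `si_sdr`, the 16 kHz sampling rate, and the test signals are illustrative assumptions.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Choose gain g so that 10*log10(speech_power / (g**2 * noise_power)) == snr_db.
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB, assuming the common zero-mean definition."""
    estimate = estimate - np.mean(estimate)
    reference = reference - np.mean(reference)
    # Project the estimate onto the reference to obtain the scaled target component.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    residual = estimate - target
    return 10.0 * np.log10((np.sum(target ** 2) + eps) / (np.sum(residual ** 2) + eps))


# Synthetic stand-ins (assumption): a 220 Hz tone as "speech", white noise as "noise".
fs = 16000
rng = np.random.default_rng(0)
speech = np.sin(2.0 * np.pi * 220.0 * np.arange(fs) / fs)
noise = rng.standard_normal(fs)

# Mix at -14 dB SNR, the level shown in Figure 4.
noisy = mix_at_snr(speech, noise, snr_db=-14.0)
print(f"SI-SDR of the unprocessed mixture: {si_sdr(noisy, speech):.2f} dB")
```

An SI-SDR improvement as plotted in Figure 1 would then be the difference between `si_sdr(enhanced, speech)` and `si_sdr(noisy, speech)`, where `enhanced` is the output of a noise reduction model.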