Deep Filter Estimation from Inter-Frame Correlations for Monaural Speech Dereverberation

Ui-Hyeop Shin, Jun Hyung Kim, Jangyeon Kim, Wooseok Kim, Hyung-Min Park

Abstract

Speech dereverberation in distant-microphone scenarios remains challenging due to the high correlation between reverberation and target signals, often leading to poor generalization in real-world environments. We propose IF-CorrNet, a correlation-to-filter architecture designed for robustness against acoustic variability. Unlike conventional black-box mapping methods that directly estimate complex spectra, IF-CorrNet explicitly exploits inter-frame STFT correlations to estimate multi-frame deep filters for each time-frequency bin. By shifting the learning objective from direct mapping to filter estimation, the network effectively constrains the solution space, which simplifies the training process and mitigates overfitting to synthetic data. Experimental results on the REVERB Challenge dataset demonstrate that IF-CorrNet achieves a substantial gain in the SRMR metric on RealData, confirming its robustness in suppressing reverberation and noise in practical, non-synthetic environments.
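The correlation-to-filter idea described in the abstract can be illustrated with a minimal NumPy sketch. It rests on assumptions that this overview does not confirm: the inter-frame correlation is taken to be the product of the current STFT bin with the conjugate of each of the previous $L-1$ frames at the same frequency, and the multi-frame filter is taken to be a causal weighted sum over the same $L$ frames. The function names `inter_frame_correlations` and `apply_multiframe_filter` are hypothetical, and the filter taps are set by hand here, whereas in IF-CorrNet they would be estimated by the network.

```python
import numpy as np


def inter_frame_correlations(X, L):
    """Assumed feature: C[t, f, l] = X[t, f] * conj(X[t - l, f]) for l = 0..L-1.

    X is a complex STFT of shape (T, F). Lags that reach before the first
    frame are left at zero.
    """
    T, F = X.shape
    C = np.zeros((T, F, L), dtype=complex)
    for l in range(L):
        C[l:, :, l] = X[l:] * np.conj(X[:T - l])
    return C


def apply_multiframe_filter(X, W):
    """Assumed filter: Y[t, f] = sum_l W[t, f, l] * X[t - l, f].

    W holds L complex taps per time-frequency bin, shape (T, F, L).
    Frames before the start of the signal are treated as zero.
    """
    T, F, L = W.shape
    Y = np.zeros((T, F), dtype=complex)
    for l in range(L):
        Y[l:] += W[l:, :, l] * X[:T - l]
    return Y


# Toy input: random complex "STFT" with 6 frames and 4 frequency bins.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4)) + 1j * rng.standard_normal((6, 4))

# Correlation features with L = 3 taps (the network input in this sketch).
C = inter_frame_correlations(X, L=3)

# Hand-set filter that passes only the current frame (identity filter).
W = np.zeros((6, 4, 3), dtype=complex)
W[..., 0] = 1.0
Y = apply_multiframe_filter(X, W)
```

With the identity filter, `Y` equals `X`. A trained model would instead output a different set of $L$ taps for every time-frequency bin, so the output is always a linear combination of observed frames rather than a freely generated spectrum, which is the sense in which filter estimation constrains the solution space.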

Paper Structure (15 sections, 4 equations, 3 figures, 4 tables)

Figures (3)

  • Figure 1: Overall architecture of IF-CorrNet. Inter-frame correlations are processed by an input layer, followed by frequency and time modules, to estimate multi-frame filters.
  • Figure 2: (a) PESQ and (b) SRMR results on the REVERB Challenge dataset as a function of the number of taps $L$.
  • Figure 3: Spectrograms of a sample RealData utterance from the REVERB Challenge dataset: (a) input, and outputs of (b) IF-Corr + MF-Filter, (c) SF-Raw + MF-Filter, and (d) SF-Raw + SF-Mask.