Rec-RIR: Monaural Blind Room Impulse Response Identification via DNN-based Reverberant Speech Reconstruction in STFT Domain

Pengyu Wang, Xiaofei Li

TL;DR

Rec-RIR tackles monaural blind room impulse response identification by leveraging the convolutive transfer function (CTF) approximation in the STFT domain. It introduces an end-to-end DNN with cross-band and narrow-band blocks that maps the input reverberant spectrum to a fixed-length CTF filter, trained through reconstruction of noise-free reverberant spectra and aided by auxiliary losses. A pseudo intrusive measurement process then converts the estimated CTF into a time-domain RIR, enabling direct comparison with intrusive measurements. Experimental results demonstrate state-of-the-art accuracy in RIR estimation and acoustic-parameter metrics (RT60, DRR, C50), with robust performance on long recordings and open-source code available for reproducibility.

Abstract

This paper presents Rec-RIR for monaural blind room impulse response (RIR) identification. Rec-RIR is developed based on the convolutive transfer function (CTF) approximation, which models the reverberation effect within narrow-band filter banks in the short-time Fourier transform (STFT) domain. Specifically, we propose a deep neural network (DNN) with cross-band and narrow-band blocks to estimate the CTF filter. The DNN is trained by reconstructing the noise-free reverberant speech spectra. This objective enables stable and straightforward supervised training. Subsequently, a pseudo intrusive measurement process is employed to convert the CTF filter estimate into an RIR by simulating a common intrusive RIR measurement procedure. Experimental results demonstrate that Rec-RIR achieves state-of-the-art performance in both RIR identification and acoustic parameter estimation. Open-source code is available online at https://github.com/Audio-WestlakeU/Rec-RIR.
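To make the CTF approximation concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the commonly used formulation in which each frequency band of the reverberant STFT is a short convolution along the frame axis, X(f, t) ≈ Σ_l H(f, l) S(f, t − l). The function name `ctf_reconstruct`, the array shapes, and the filter length are illustrative assumptions and are not taken from the Rec-RIR repository.

```python
import numpy as np


def ctf_reconstruct(source_stft, ctf_filter):
    """Apply a per-band CTF filter along the frame axis (illustrative sketch).

    source_stft: complex array of shape (F, T), STFT of the dry source signal
    ctf_filter:  complex array of shape (F, L), one L-tap filter per frequency band
    returns:     complex array of shape (F, T), approximate reverberant STFT
    """
    num_bands, num_frames = source_stft.shape
    num_taps = ctf_filter.shape[1]
    reverberant = np.zeros((num_bands, num_frames), dtype=complex)
    for tap in range(min(num_taps, num_frames)):
        # X(f, t) += H(f, tap) * S(f, t - tap); frames before t = 0 are treated as zero.
        reverberant[:, tap:] += (
            ctf_filter[:, tap:tap + 1] * source_stft[:, :num_frames - tap]
        )
    return reverberant


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical sizes: 257 frequency bands, 100 frames, 8 CTF taps.
    dry = rng.standard_normal((257, 100)) + 1j * rng.standard_normal((257, 100))
    ctf = rng.standard_normal((257, 8)) + 1j * rng.standard_normal((257, 8))
    print(ctf_reconstruct(dry, ctf).shape)
```

In this picture, a reconstruction-based training objective would compare such a filtered spectrum against the noise-free reverberant spectrum, so an error in the estimated filter shows up directly as a reconstruction error.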

Paper Structure

This paper contains 15 sections, 7 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Workflow of Rec-RIR.
  • Figure 2: Architecture of the proposed network.
  • Figure 3: Example of ground truth and estimated RIRs.
  • Figure 4: 2-D projections of CTF embeddings on SimACE.