Rec-RIR: Monaural Blind Room Impulse Response Identification via DNN-based Reverberant Speech Reconstruction in STFT Domain
Pengyu Wang, Xiaofei Li
TL;DR
Rec-RIR tackles monaural blind room impulse response identification by leveraging the convolutive transfer function (CTF) approximation in the STFT domain. It introduces an end-to-end DNN with cross-band and narrow-band blocks that maps the input reverberant spectrum to a fixed-length CTF filter, trained through reconstruction of noise-free reverberant spectra and aided by auxiliary losses. A pseudo intrusive measurement process then converts the estimated CTF into a time-domain RIR, enabling direct comparison with intrusive measurements. Experimental results demonstrate state-of-the-art accuracy in RIR estimation and acoustic-parameter metrics (RT60, DRR, C50), with robust performance on long recordings and open-source code available for reproducibility.
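For readers unfamiliar with the acoustic parameters named above, the following is a minimal sketch of how RT60, DRR, and C50 are commonly computed from a time-domain RIR. It uses textbook definitions, not necessarily the evaluation code of the paper. The function names, the 2.5 ms direct-path half-window, and the -5 dB to -25 dB fitting range for the decay curve are all assumptions made for illustration.

```python
import numpy as np

def rt60_schroeder(rir, fs, fit_db=(-5.0, -25.0)):
    """Estimate RT60 (seconds) from the Schroeder energy decay curve.

    The fit range is an assumed choice; the slope is extrapolated to 60 dB.
    """
    energy = np.asarray(rir, dtype=float) ** 2
    # Backward integration: energy remaining from each sample to the end.
    edc = np.cumsum(energy[::-1])[::-1]
    edc = np.maximum(edc, np.finfo(float).tiny)  # avoid log10(0)
    edc_db = 10.0 * np.log10(edc / edc[0])
    t = np.arange(len(energy)) / fs
    mask = (edc_db <= fit_db[0]) & (edc_db >= fit_db[1])
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)  # dB per second
    return -60.0 / slope

def drr_db(rir, fs, half_window_s=0.0025):
    """Direct-to-reverberant ratio in dB.

    The direct path is taken as an assumed +/- 2.5 ms window around the peak.
    """
    energy = np.asarray(rir, dtype=float) ** 2
    peak = int(np.argmax(energy))
    half = int(round(half_window_s * fs))
    lo, hi = max(0, peak - half), min(len(energy), peak + half + 1)
    direct = energy[lo:hi].sum()
    reverberant = energy.sum() - direct
    return 10.0 * np.log10(direct / reverberant)

def c50_db(rir, fs):
    """Clarity index in dB: energy up to 50 ms after the peak over later energy."""
    energy = np.asarray(rir, dtype=float) ** 2
    peak = int(np.argmax(energy))
    split = peak + int(round(0.050 * fs))
    early = energy[:split].sum()
    late = energy[split:].sum()
    return 10.0 * np.log10(early / late)
```

Applying the same three functions to an estimated RIR and to a measured reference RIR gives parameter pairs that can be compared directly.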
Abstract
This paper presents Rec-RIR, a method for monaural blind room impulse response (RIR) identification. Rec-RIR is built on the convolutive transfer function (CTF) approximation, which models the reverberation effect within narrow-band filter banks in the short-time Fourier transform (STFT) domain. Specifically, we propose a deep neural network (DNN) with cross-band and narrow-band blocks to estimate the CTF filter. The DNN is trained by reconstructing the noise-free reverberant speech spectra, an objective that enables stable and straightforward supervised training. Subsequently, a pseudo intrusive measurement process converts the CTF filter estimate into an RIR by simulating a common intrusive RIR measurement procedure. Experimental results demonstrate that Rec-RIR achieves state-of-the-art performance in both RIR identification and acoustic parameter estimation. The open-source code is available at https://github.com/Audio-WestlakeU/Rec-RIR.
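The CTF approximation referred to in the abstract can be illustrated with a short sketch. Under this model, the reverberant STFT coefficient in each frequency band is approximated as a convolution, along the frame axis, of the source STFT with a short band-wise filter. The function name `ctf_apply` and the array shapes below are illustrative assumptions and are not taken from the Rec-RIR code base.

```python
import numpy as np

def ctf_apply(source_stft, ctf):
    """Apply a band-wise CTF filter to a source STFT.

    source_stft: complex array of shape (n_freq, n_frames)
    ctf:         complex array of shape (n_freq, n_taps)
    Returns the modeled reverberant STFT, shape (n_freq, n_frames).
    """
    n_freq, n_frames = source_stft.shape
    n_taps = ctf.shape[1]
    out = np.zeros((n_freq, n_frames), dtype=complex)
    # Taps beyond the number of frames cannot contribute to the output.
    for tap in range(min(n_taps, n_frames)):
        # Output frame t receives tap `tap` times source frame t - tap,
        # independently in every frequency band.
        out[:, tap:] += ctf[:, tap:tap + 1] * source_stft[:, :n_frames - tap]
    return out
```

In a reconstruction-style objective such as the one described above, a filter estimate would be applied in this way and the result compared with the noise-free reverberant spectrum; the exact loss used by the authors is not specified in this section.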
