Blind Estimation of Sub-band Acoustic Parameters from Ambisonics Recordings using Spectro-Spatial Covariance Features

Hanyu Meng; Jeroen Breebaart; Jeremy Stoddard; Vidhyasaharan Sethu; Eliathamby Ambikairajah

TL;DR

The paper tackles blind, frequency-dependent estimation of room acoustic parameters from first-order Ambisonics (FOA) recordings. It proposes the Spectro-Spatial Covariance Vector (SSCV), a feature that encodes temporal, spectral, and inter-channel information. A new back-end, FOA-Conv3D, applies 3D convolutions to SSCV features to jointly capture time, frequency, and spatial cues. Across 10 frequency bands and for all three parameters ($T_{60}$, DRR, and $C_{50}$), SSCV-based models achieve lower errors than single-channel approaches, and FOA-Conv3D further lowers errors and explains a higher proportion of variance than CNN and CRNN back-ends. Evaluations on Spatial Librispeech-Lite show significant gains from spatial information and from recurrent architectures, establishing a new state of the art for FOA-based blind estimation of frequency-dependent acoustic parameters. By providing robust multi-band estimates from FOA inputs, the work supports more faithful, dynamic spatial audio rendering for VR/AR, and it suggests extending the approach to room geometry and source orientation in future work.
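The three targets are standard room-acoustic measures defined on a room impulse response (RIR): $T_{60}$ is extrapolated from the slope of the Schroeder energy decay curve and $C_{50}$ is the early (first 50 ms) to late energy ratio, both as in ISO 3382-1, while DRR is the usual direct-to-reverberant energy ratio. The sketch below shows how such labels could be computed from a broadband RIR; the paper's own labelling procedure is not described here. The 2.5 ms direct-sound window, the T20-style -5 to -25 dB fitting range, and the function names are illustrative assumptions. Sub-band values would apply the same functions to band-pass-filtered RIRs, which is omitted.

```python
import numpy as np

# Illustrative sketch: window length, fit range and names are assumptions,
# not the paper's settings.


def schroeder_edc_db(rir):
    """Energy decay curve in dB (0 dB at t = 0) via Schroeder backward integration."""
    energy = np.asarray(rir, dtype=float) ** 2
    edc = np.cumsum(energy[::-1])[::-1]            # energy remaining from each sample onward
    return 10.0 * np.log10(edc / edc[0] + 1e-12)   # small floor avoids log(0)


def t60_from_rir(rir, fs, start_db=-5.0, end_db=-25.0):
    """T60 extrapolated from a line fitted to the EDC between start_db and end_db."""
    edc = schroeder_edc_db(rir)
    t = np.arange(edc.size) / fs
    mask = (edc <= start_db) & (edc >= end_db)
    slope, _ = np.polyfit(t[mask], edc[mask], 1)   # decay rate in dB per second (negative)
    return -60.0 / slope


def drr_db(rir, fs, direct_ms=2.5):
    """Energy within +/- direct_ms of the strongest peak vs. all energy after that window."""
    energy = np.asarray(rir, dtype=float) ** 2
    peak = int(np.argmax(energy))
    w = int(round(direct_ms * 1e-3 * fs))
    direct = energy[max(0, peak - w): peak + w + 1].sum()
    reverberant = energy[peak + w + 1:].sum()
    return 10.0 * np.log10(direct / reverberant)


def c50_db(rir, fs):
    """Energy in the first 50 ms after the direct peak vs. energy after that."""
    energy = np.asarray(rir, dtype=float) ** 2
    peak = int(np.argmax(energy))
    split = peak + int(round(0.050 * fs))
    return 10.0 * np.log10(energy[peak:split].sum() / energy[split:].sum())


fs = 16000
rng = np.random.default_rng(0)
t = np.arange(int(1.5 * fs)) / fs
# Exponentially decaying noise whose energy drops by 60 dB in 0.5 s.
decay = rng.standard_normal(t.size) * np.exp(-np.log(1000.0) / 0.5 * t)
print(round(t60_from_rir(decay, fs), 2))  # close to 0.5

# Direct impulse plus a single reflection 1000 samples (62.5 ms) later.
sparse = np.zeros(2000)
sparse[0], sparse[1000] = 1.0, 0.5
print(round(drr_db(sparse, fs), 2), round(c50_db(sparse, fs), 2))  # 6.02 6.02
```

For the sparse example both ratios equal 10·log10(1/0.25) ≈ 6.02 dB, because the reflection falls outside both the direct-sound window and the 50 ms early window.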

Abstract

Estimating frequency-varying acoustic parameters is essential for enhancing immersive perception in realistic spatial audio creation. In this paper, we propose a unified framework that blindly estimates reverberation time (T60), direct-to-reverberant ratio (DRR), and clarity (C50) across 10 frequency bands using first-order Ambisonics (FOA) speech recordings as inputs. The proposed framework utilizes a novel feature named Spectro-Spatial Covariance Vector (SSCV), which efficiently represents the temporal, spectral, and spatial information of the FOA signal. Our models significantly outperform existing single-channel methods that use only spectral information, reducing estimation errors by more than half for all three acoustic parameters. Additionally, we introduce FOA-Conv3D, a novel back-end network that effectively utilizes the SSCV feature with a 3D convolutional encoder. FOA-Conv3D outperforms the convolutional neural network (CNN) and convolutional recurrent neural network (CRNN) back-ends, achieving lower estimation errors and accounting for a higher proportion of variance (PoV) for all three acoustic parameters.
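The summary does not spell out how the SSCV is computed. As a rough illustration only, the sketch below builds a generic spectro-spatial covariance feature. It takes a Hann-windowed STFT of the four FOA channels (W, X, Y, Z) and averages a 4×4 inter-channel covariance matrix per frequency bin over short blocks of frames. The unique real-valued entries of each matrix are then stacked into a vector. The STFT size, hop, block length, and the function names `stft` and `spectro_spatial_covariance` are hypothetical choices, not the paper's settings.

```python
import numpy as np

# Hypothetical sketch of a spectro-spatial covariance feature; parameters and
# layout are assumptions, not the SSCV definition from the paper.


def stft(x, n_fft=512, hop=256):
    """Hann-windowed STFT of a (channels, samples) array -> (channels, frames, bins)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (x.shape[1] - n_fft) // hop
    frames = np.stack([x[:, i * hop:i * hop + n_fft] * win for i in range(n_frames)], axis=1)
    return np.fft.rfft(frames, axis=-1)


def spectro_spatial_covariance(foa, n_fft=512, hop=256, block=8):
    """Covariance feature for a 4-channel FOA signal, shape (blocks, bins, 16).

    Per block of `block` STFT frames and per frequency bin, the 4x4 covariance
    mean(S_i * conj(S_j)) is reduced to its 16 unique real values: the real
    parts of the 10 upper-triangle entries (diagonal included) followed by the
    imaginary parts of the 6 strictly upper entries.
    """
    spec = stft(np.asarray(foa, dtype=float), n_fft, hop)   # (4, T, F) complex
    n_ch, n_t, n_f = spec.shape
    n_blocks = n_t // block
    blocks = spec[:, :n_blocks * block].reshape(n_ch, n_blocks, block, n_f)
    # cov[b, f, i, j] = mean over the block's frames of S_i * conj(S_j)
    cov = np.einsum("ibtf,jbtf->bfij", blocks, blocks.conj()) / block
    rows, cols = np.triu_indices(n_ch)          # 10 entries including the diagonal
    srows, scols = np.triu_indices(n_ch, k=1)   # 6 strictly off-diagonal entries
    return np.concatenate(
        [cov[..., rows, cols].real, cov[..., srows, scols].imag], axis=-1
    )


rng = np.random.default_rng(0)
foa = rng.standard_normal((4, 16000))  # 1 s of noise at 16 kHz standing in for FOA speech
print(spectro_spatial_covariance(foa).shape)  # (7, 257, 16)
```

The result is a (time, frequency, spatial) array, the kind of three-dimensional input a 3D-convolutional encoder could slide over. How FOA-Conv3D actually arranges its input is not described here.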

