Reverberation-based Features for Sound Event Localization and Detection with Distance Estimation

Davide Berghi; Philip J. B. Jackson

TL;DR

We address distance estimation in 3D SELD by introducing reverberation-based input features that capture distance cues. Two feature families are proposed: DRR-based representations computed from the direct and reverberant energy, and an autocorrelation-based measure (stpACC) of early reflections, such as those from the floor. Both are designed to be concatenated with standard SELD input features. Pre-training on synthetic data and applying data augmentation improve distance accuracy and overall SELD performance. The autocorrelation-based features yield the largest gains on STARSS23, reducing the relative distance error (RDE) by over 3 percentage points and improving the overall SELD score. The approach shows that incorporating reverberation cues into the input features can enhance 3D SELD when distance labels are available, offering a path toward more accurate spatial audio understanding in real rooms.
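Why DRR carries a distance cue: direct-path energy falls roughly with the inverse square of source distance, while the diffuse reverberant energy in a room stays roughly constant, so the direct-to-reverberant ratio tends to drop as a source moves away. The sketch below illustrates that quantity on a known single-channel impulse response. It is not the authors' DRR feature extraction (their code is in the repository linked in the abstract and may differ); the function name `drr_db`, the ±2.5 ms direct-path window, and the toy responses are assumptions chosen for illustration.

```python
import math


def drr_db(rir, fs, half_window_ms=2.5):
    """Direct-to-reverberant ratio (in dB) of a single-channel impulse response.

    Hypothetical helper: the direct path is taken as a window of
    +/- half_window_ms around the strongest sample (an assumed convention);
    all remaining energy is counted as reverberant.
    """
    if not rir:
        raise ValueError("impulse response is empty")
    # Locate the direct-path arrival as the largest-magnitude sample.
    peak = max(range(len(rir)), key=lambda n: abs(rir[n]))
    half = int(round(half_window_ms * 1e-3 * fs))
    lo, hi = max(0, peak - half), min(len(rir), peak + half + 1)
    direct = sum(x * x for x in rir[lo:hi])
    reverberant = sum(x * x for x in rir[:lo]) + sum(x * x for x in rir[hi:])
    if reverberant == 0.0:
        return math.inf  # anechoic response: no reverberant energy at all
    return 10.0 * math.log10(direct / reverberant)


if __name__ == "__main__":
    fs = 1000  # toy sampling rate so that sample indices stay readable
    near = [0.0] * 100
    near[10], near[50], near[80] = 1.0, 0.5, 0.5  # strong direct path
    far = [0.0] * 100
    far[10], far[50], far[80] = 0.5, 0.5, 0.5  # weaker direct path, same reflections
    print(f"near: {drr_db(near, fs):.2f} dB, far: {drr_db(far, fs):.2f} dB")
```

Running the script prints a higher DRR for `near` than for `far`, mirroring the expected drop in DRR with distance. At inference time a SELD model only sees recordings, not impulse responses, so any practical DRR feature has to be estimated from the signals themselves.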

Abstract

Sound event localization and detection (SELD) involves predicting active sound event classes over time while estimating their positions. The localization subtask in SELD is usually treated as a direction-of-arrival (DOA) estimation problem, ignoring source distance. Only recently has SELD been extended to 3D by incorporating distance estimation, enabling the prediction of sound event positions in 3D space (3D SELD). However, existing methods lack input features designed for distance estimation. We argue that reverberation encodes valuable information for this task. This paper introduces two novel feature formats for 3D SELD based on reverberation: one using the direct-to-reverberant ratio (DRR) and another leveraging signal autocorrelation to provide the model with insights into early reflections. Pre-training on synthetic data improves both the relative distance error (RDE) and the overall SELD score, with the autocorrelation-based features reducing RDE by over 3 percentage points on the STARSS23 dataset. The code to extract the features is available at github.com/dberghi/SELD-distance-features.
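Why autocorrelation exposes early reflections: if a recording contains the source plus a copy delayed by τ samples, its autocorrelation shows a peak at lag τ, so reflection delays become visible as peaks in the lag profile. The sketch below demonstrates this on a toy signal. It is a generic illustration, not the stpACC feature as defined by the authors (see their repository for that); the helper names, the white-noise source, the 0.6 reflection gain, and the lag range are all assumptions.

```python
import random


def normalized_autocorr(frame, max_lag):
    """Autocorrelation r[k] / r[0] of one frame for lags k = 0..max_lag."""
    if max_lag >= len(frame):
        raise ValueError("max_lag must be shorter than the frame")
    r0 = sum(x * x for x in frame)
    if r0 == 0.0:
        return [0.0] * (max_lag + 1)  # silent frame: no structure to report
    return [
        sum(frame[n] * frame[n + k] for n in range(len(frame) - k)) / r0
        for k in range(max_lag + 1)
    ]


def strongest_reflection_lag(frame, min_lag, max_lag):
    """Hypothetical helper: lag (samples) of the largest peak in [min_lag, max_lag]."""
    acf = normalized_autocorr(frame, max_lag)
    return max(range(min_lag, max_lag + 1), key=lambda k: acf[k])


def toy_signal_with_reflection(length, delay, gain, seed=0):
    """White noise plus one delayed, attenuated copy (a toy single reflection)."""
    rng = random.Random(seed)
    dry = [rng.gauss(0.0, 1.0) for _ in range(length)]
    return [dry[n] + (gain * dry[n - delay] if n >= delay else 0.0) for n in range(length)]


if __name__ == "__main__":
    fs = 16000  # assumed sampling rate
    wet = toy_signal_with_reflection(4000, delay=40, gain=0.6)  # 2.5 ms toy delay
    lag = strongest_reflection_lag(wet, min_lag=1, max_lag=200)
    print(f"reflection lag: {lag} samples ({1000 * lag / fs:.2f} ms)")
```

White noise is used only to keep the reflection peak isolated. Real sources are not white, so their own autocorrelation overlaps these peaks, and a room produces many overlapping reflections rather than one.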
