Spatial Scaper: A Library to Simulate and Augment Soundscapes for Sound Event Localization and Detection in Realistic Rooms

Iran R. Roman; Christopher Ick; Sivan Ding; Adrian S. Roman; Brian McFee; Juan P. Bello

TL;DR

This work tackles data scarcity and limited acoustic diversity in sound event localization and detection (SELD) by introducing SpatialScaper, a library that enables parametric simulation of virtual rooms and moving sound sources, as well as augmentation of existing SELD datasets. The approach leverages both real and synthetic room impulse responses (RIRs) and an ambisonics-enabled pipeline to generate varied SELD data at scale. Case studies show that increasing acoustic diversity in training data reduces localization error (LE) and improves overall robustness, with augmentations such as channel swapping further enhancing performance. SpatialScaper thus offers a practical, open-source tool for building robust SELD models for real-world acoustic environments.

Abstract

Sound event localization and detection (SELD) is an important task in machine listening. Major advancements rely on simulated data with sound events in specific rooms and strong spatio-temporal labels. SELD data is simulated by convolving spatially localized room impulse responses (RIRs) with sound waveforms to place sound events in a soundscape. However, RIRs require manual collection in specific rooms. We present SpatialScaper, a library for SELD data simulation and augmentation. Compared to existing tools, SpatialScaper emulates virtual rooms via parameters such as size and wall absorption. This allows for parameterized placement (including movement) of foreground and background sound sources. SpatialScaper also includes data augmentation pipelines that can be applied to existing SELD data. As a case study, we use SpatialScaper to add rooms to the DCASE SELD data. Training a model with our data led to progressive performance improvements as a direct function of acoustic diversity. These results show that SpatialScaper is valuable for training robust SELD models.
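The convolution step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the SpatialScaper API: the function names `spatialize` and `place_event`, the toy 4-channel RIR, and the sample rate are all assumptions made for illustration. A mono event is convolved with each channel of a multichannel RIR and then mixed into a soundscape buffer at a chosen onset.

```python
import numpy as np


def spatialize(mono, rir):
    """Convolve a mono waveform with each channel of a multichannel RIR.

    mono: array of shape (num_samples,)
    rir:  array of shape (num_channels, rir_length)
    Returns an array of shape (num_channels, num_samples + rir_length - 1).
    """
    return np.stack([np.convolve(mono, channel) for channel in rir])


def place_event(scape, event, onset):
    """Add a spatialized event to the soundscape at sample index `onset`.

    The event is truncated if it would run past the end of the buffer.
    The buffer is modified in place and also returned.
    """
    end = min(scape.shape[1], onset + event.shape[1])
    scape[:, onset:end] += event[:, : end - onset]
    return scape


# Toy data (hypothetical): a noise burst and a 4-channel RIR made of a
# direct path with per-channel gains plus a single echo 40 samples later.
sr = 8000
rng = np.random.default_rng(0)
mono = rng.standard_normal(sr // 4)
rir = np.zeros((4, 64))
rir[:, 0] = [1.0, 0.5, -0.5, 0.25]
rir[:, 40] = 0.3

event = spatialize(mono, rir)
scape = np.zeros((4, sr))
place_event(scape, event, onset=1000)
```

In a real pipeline the RIR would come from a measured or simulated room and would differ for every source position, which is what ties each event to a spatial label. A moving source would need a sequence of RIRs along its trajectory rather than the single one used here.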

Paper Structure (15 sections, 4 figures, 1 table)

This paper contains 15 sections, 4 figures, 1 table.

Introduction
Related work
Spatial Scaper
Instantiating a room scape
Adding background noise
Spatializing target events
Triggering the room scape generation
Augmenting existing SELD recordings
Case study: improving SELDnet
Model, training procedure, and metrics
Exp 1: Adding acoustic diversity to the training data
Exp 2: Replicating "DCASE" with augmentations
Results
Conclusions
Acknowledgements

Figures (4)

Figure 1: SpatialScaper data generation pipeline.
Figure 2: Instantiating a soundscape using a virtual room, microphone, background noise, and a moving foreground event.
Figure 3: Using SpatialScaper to augment a SELD dataset via the augmentations recently proposed by Wang et al. [wang2023four].
Figure 4: Performance on the test split (STARSS23 "dev-test-sony") as a function of adding rooms (i.e. increasing acoustic diversity) to the training split.
