The impact of non-target events in synthetic soundscapes for sound event detection

Francesca Ronchini, Romain Serizel, Nicolas Turpault, Samuele Cornell

TL;DR

The paper investigates how non-target events in synthetic soundscapes affect sound event detection (SED) performance within the DCASE 2021 Task 4 framework. Using DESED-derived synthetic subsets with and without non-target events, and varying the target-to-non-target signal-to-noise ratio (TNTSNR), the study compares training/validation configurations and examines how well synthetic training data matches recorded soundscapes. Key findings show that including non-target events in only one phase (training or validation) can improve detection under certain metrics, and that tuning the TNTSNR (notably to 15 dB) can better align synthetic training data with real-world recordings. Evaluation on clips that contain only non-target events reveals potential class-wise false alarms and confusion. The work highlights the importance of dataset-generation strategies for synthetic SED and points to future research on per-class distributions and on the acoustic similarity between target and non-target events, with the aim of further closing the gap between synthetic and recorded data.

Abstract

The Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge 2021 Task 4 uses a heterogeneous dataset that includes both recorded and synthetic soundscapes. Until recently, only target sound events were considered when synthesizing the soundscapes. However, recorded soundscapes often contain a substantial number of non-target events that may affect performance. In this paper, we focus on the impact of these non-target events in the synthetic soundscapes. First, we investigate to what extent using non-target events in either the training or the validation phase (or in neither) helps the system to correctly detect target events. Second, we analyze to what extent adjusting the signal-to-noise ratio between target and non-target events at training improves sound event detection performance. The results show that using both target and non-target events in only one of the phases (validation or training) helps the system to properly detect sound events, outperforming the baseline (which uses non-target events in both phases). The paper also reports the results of a preliminary study on evaluating the system on clips that contain only non-target events. This opens questions for future work on the non-target subset and on the acoustic similarity between target and non-target events, which might confuse the system.
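To make the notion of a signal-to-noise ratio between target and non-target events concrete, the following is a minimal sketch of how a non-target event could be rescaled to sit at a chosen level below a target event. It is not the authors' generation pipeline: it assumes the ratio is defined as 10·log10 of the mean power of the target over the mean power of the non-target event, and the function names (`mean_power`, `scale_to_tntsnr`, `snr_db`) and the toy signals are hypothetical.

```python
import math


def mean_power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def snr_db(target, non_target):
    """Power ratio target / non-target, in dB (assumed definition)."""
    return 10 * math.log10(mean_power(target) / mean_power(non_target))


def scale_to_tntsnr(target, non_target, tntsnr_db):
    """Return a scaled copy of `non_target` such that
    snr_db(target, scaled) equals `tntsnr_db`."""
    p_target = mean_power(target)
    p_non_target = mean_power(non_target)
    if p_target == 0 or p_non_target == 0:
        raise ValueError("both signals must have non-zero power")
    # Power the non-target event must have for the requested ratio.
    desired_power = p_target / (10 ** (tntsnr_db / 10))
    # Amplitude gain is the square root of the power gain.
    gain = math.sqrt(desired_power / p_non_target)
    return [gain * x for x in non_target]


# Toy signals standing in for a target and a non-target event.
target = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
non_target = [0.8 * math.sin(2 * math.pi * 13 * n / 100) for n in range(100)]

# Place the non-target event 15 dB below the target event.
scaled = scale_to_tntsnr(target, non_target, 15.0)
mixture = [t + s for t, s in zip(target, scaled)]
```

A larger value of `tntsnr_db` makes the non-target event quieter relative to the target, so a value of 0 dB corresponds to equal mean power for both events.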

Paper Structure

This paper contains 15 sections, 7 tables.