The First Environmental Sound Deepfake Detection Challenge: Benchmarking Robustness, Evaluation, and Insights

Han Yin; Yang Xiao; Rohan Kumar Das; Jisheng Bai; Ting Dang

TL;DR

This paper presents the task formulation, dataset construction, evaluation protocols, baseline systems, and key insights from the first edition of the ESDD challenge, and analyzes common architectural choices and training strategies among top-performing systems.

Abstract

Recent progress in audio generation has made it increasingly easy to create highly realistic environmental soundscapes, which can be misused to produce deceptive content, such as fake alarms, gunshots, and crowd sounds, raising concerns for public safety and trust. While deepfake detection for speech and singing voice has been extensively studied, environmental sound deepfake detection (ESDD) remains underexplored. To advance ESDD, the first edition of the ESDD challenge was launched, attracting 97 registered teams and receiving 1,748 valid submissions. This paper presents the task formulation, dataset construction, evaluation protocols, baseline systems, and key insights from the challenge results. Furthermore, we analyze common architectural choices and training strategies among top-performing systems. Finally, we discuss potential future research directions for ESDD, outlining key opportunities and open problems to guide subsequent studies in this field.

Paper Structure (14 sections, 2 figures, 5 tables)

This paper contains 14 sections, 2 figures, 5 tables.

Introduction
Task, Database and Challenge
ESDD: Task at a Glance
EnvSDD Database
ESDD Challenge
Track 1: ESDD in Unseen Generators
Track 2: Black-Box Low-Resource Data
Evaluation Metric and Challenge Baselines
Challenge Results and Findings
Overall Results and Analyses
Performance on Different Generators
Insights and Future Directions
Conclusions
Generative AI Use Disclosure

Figures (2)

Figure 1: Overview of the two tracks of the ESDD challenge.
Figure 2: Deepfake detection performance in EER (%) of different systems across various audio generators in Track 1 and Track 2.
