Performance and energy balance: a comprehensive study of state-of-the-art sound event detection systems
Francesca Ronchini, Romain Serizel
TL;DR
This paper examines the energy footprint and environmental impact of deep learning-based sound event detection (SED) by analyzing DCASE Task 4 submissions from 2022 and 2023. It standardizes energy reporting, counts multiply-accumulate operations (MACs) with THOP, and introduces a hardware-aware energy-weighted PSDS, $\text{EW-PSDS} = \text{PSDS} \cdot \frac{\text{kWh}_{\text{baseline}}}{\text{kWh}_{\text{submission}}}$, to balance performance against energy. The findings show that energy consumption and model complexity do not always track performance: ensembles can raise detection performance at a higher energy cost, while thresholding can reduce the footprint with limited PSDS loss. The work advocates multi-metric, task-aware energy evaluation to guide the sustainable design of SED systems and reduce their environmental impact in practical deployments.
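To make the EW-PSDS relation concrete, the following is a minimal sketch in Python. The function name `ew_psds`, the system names, and all numeric values are hypothetical and chosen only for illustration; they are not results from the paper. The sketch also assumes that the baseline and submission energies are measured on the same hardware, so that their ratio is meaningful.

```python
def ew_psds(psds: float, kwh_baseline: float, kwh_submission: float) -> float:
    """Scale PSDS by the baseline-to-submission energy ratio (EW-PSDS)."""
    if kwh_baseline <= 0 or kwh_submission <= 0:
        raise ValueError("energy values must be positive")
    return psds * kwh_baseline / kwh_submission


# Hypothetical values for illustration only (not taken from the paper).
# Assumption: baseline and submission energies come from the same hardware.
baseline_kwh = 2.0
systems = {
    "single_model": (0.50, 2.0),  # (PSDS, kWh consumed by the submission)
    "ensemble": (0.55, 8.0),
}

for name, (psds, kwh) in systems.items():
    print(f"{name}: PSDS={psds:.2f}, EW-PSDS={ew_psds(psds, baseline_kwh, kwh):.4f}")
```

With these illustrative numbers, the ensemble has the higher PSDS but uses four times the baseline energy, so its EW-PSDS falls below that of the single model. A submission that consumes exactly the baseline energy keeps its PSDS unchanged.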
Abstract
In recent years, deep learning systems have shown a concerning trend toward increased complexity and higher energy consumption. As researchers in this domain and organizers of one of the Detection and Classification of Acoustic Scenes and Events (DCASE) challenge tasks, we recognize the importance of addressing the environmental impact of data-driven sound event detection (SED) systems. In this paper, we present an analysis of SED systems based on the challenge submissions. It includes a comparison across the past two years and a detailed analysis of this year's SED systems. Through this research, we aim to explore how SED systems evolve from year to year and what that implies for their energy efficiency.
