Performance and energy balance: a comprehensive study of state-of-the-art sound event detection systems

Francesca Ronchini, Romain Serizel

TL;DR

The paper examines the energy footprint and environmental impact of deep learning-based sound event detection (SED) by analyzing DCASE Task 4 submissions from 2022 and 2023. It standardizes energy reporting, measures model complexity in multiply-accumulate operations (MACs) computed with THOP, and introduces a hardware-aware energy-weighted metric, EW-PSDS, defined as $\mathrm{EW\text{-}PSDS} = \mathrm{PSDS} \cdot \frac{\mathrm{kWh}_{\mathrm{baseline}}}{\mathrm{kWh}_{\mathrm{submission}}}$, to balance performance against energy. The findings show that energy consumption and model complexity do not always track performance: ensembles can raise accuracy at a higher energy cost, while thresholding can reduce the footprint with limited PSDS loss. The work advocates multi-metric, task-aware energy evaluation to guide the sustainable design of SED systems and to reduce their environmental impact in practical deployments.
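The EW-PSDS relation above can be illustrated with a minimal Python sketch. The function name `ew_psds`, the system names, and all numeric values are hypothetical and chosen only for illustration; they are not taken from the paper. The sketch assumes that both energy figures are reported in kWh and that the baseline energy is measured on the same hardware as the submission, which is presumably what makes the metric hardware-aware.

```python
def ew_psds(psds: float, kwh_submission: float, kwh_baseline: float) -> float:
    """Scale PSDS by the baseline-to-submission energy ratio (EW-PSDS)."""
    if kwh_submission <= 0 or kwh_baseline <= 0:
        raise ValueError("energy values must be positive")
    return psds * (kwh_baseline / kwh_submission)


# Hypothetical values for illustration only (not results from the paper).
baseline_kwh = 1.0
systems = {
    # name: (PSDS, energy consumed in kWh)
    "system_a": (0.50, 2.0),  # higher PSDS, twice the baseline energy
    "system_b": (0.45, 0.5),  # lower PSDS, half the baseline energy
}

scores = {
    name: ew_psds(psds, kwh, baseline_kwh)
    for name, (psds, kwh) in systems.items()
}

for name, score in scores.items():
    print(f"{name}: EW-PSDS = {score:.3f}")
```

Under these assumed numbers, the system with the lower raw PSDS ends up with the higher EW-PSDS because it uses less energy than the baseline, which is the trade-off the metric is meant to expose.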

Abstract

In recent years, deep learning systems have shown a concerning trend toward increased complexity and higher energy consumption. As researchers in this domain and organizers of one of the Detection and Classification of Acoustic Scenes and Events (DCASE) challenge tasks, we recognize the importance of addressing the environmental impact of data-driven sound event detection (SED) systems. In this paper, we present an analysis of SED systems based on the challenge submissions. It includes a comparison across the past two years and a detailed analysis of this year's SED systems. Through this research, we aim to explore how SED systems are evolving from year to year and what this implies for their energy efficiency.

Paper Structure (9 sections, 1 equation, 9 figures, 2 tables)

Figures (9)

  • Figure 1: Relation between system complexity and energy consumption at training for 2023 entries, compared with the two baseline systems.
  • Figure 2: Relation between system complexity and energy consumption at test for 2023 entries, compared with the two baseline systems.
  • Figure 3: PSDS_1 and energy consumption at training for the best-performing 2023 systems, compared with the two baseline systems.
  • Figure 4: PSDS_1 and energy consumption at test for the best-performing 2023 systems, compared with the two baseline systems.
  • Figure 5: Relation between MACs and energy consumption at training for 2023 entries, compared with the two baseline systems.
  • ...and 4 more figures