From Computation to Consumption: Exploring the Compute-Energy Link for Training and Testing Neural Networks for SED Systems
Constance Douwes, Romain Serizel
TL;DR
The paper investigates how compute cost relates to energy consumption when training and testing neural networks for sound event detection (SED), focusing on MLP, CNN, RNN, and CRNN architectures evaluated on an audio tagging task built on the DESED dataset. Forward FLOPs are measured with a profiler, and backward FLOPs are approximated as twice the forward FLOPs (a 2:1 backward-to-forward ratio). Energy is then related to this compute count, with GPU energy tracked by CodeCarbon and GPU/memory utilization tracked by Nvidia SMI. The results show architecture-dependent relationships: energy generally scales with FLOPs, but the strength and shape of this relation differ between the MLP/RNN and the CNN/CRNN architectures, and training often consumes more energy than testing due to memory traffic. A key finding is the strong correlation between energy and GPU utilization in both phases, which suggests GPU load as a practical predictor of energy and supports the development of architecture-aware, hardware-normalized energy indicators for greener AI in audio processing.
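The compute accounting described above can be sketched as follows. This is a minimal illustration, assuming the per-sample forward FLOPs are already available from a profiler. The function names `training_flops` and `inference_flops`, and the choice to scale by dataset size and number of epochs, are hypothetical and not taken from the authors' code.

```python
def training_flops(forward_flops_per_sample: float,
                   n_samples: int,
                   n_epochs: int,
                   backward_ratio: float = 2.0) -> float:
    """Estimate total training FLOPs (hypothetical helper).

    Assumes the backward pass costs `backward_ratio` times the forward pass
    (2.0 reproduces the 2:1 backward-to-forward approximation), so one
    training step costs (1 + backward_ratio) forward passes per sample.
    """
    per_sample = forward_flops_per_sample * (1.0 + backward_ratio)
    return per_sample * n_samples * n_epochs


def inference_flops(forward_flops_per_sample: float, n_samples: int) -> float:
    """Estimate total testing FLOPs: testing runs only the forward pass."""
    return forward_flops_per_sample * n_samples
```

For example, a model with 1e6 forward FLOPs per sample trained for 10 epochs on 1000 samples gives `training_flops(1e6, 1000, 10)`, i.e. 3e10 FLOPs, whereas testing on the same 1000 samples gives 1e9 FLOPs. Keeping `backward_ratio` as a parameter makes the 2:1 assumption explicit and easy to vary.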
Abstract
The massive use of machine learning models, particularly neural networks, has raised serious concerns about their environmental impact. Over the last few years, the computing costs associated with training and deploying these systems have exploded. It is therefore crucial to understand their energy requirements so that energy can be better integrated into model evaluation, which has so far focused mainly on performance. In this paper, we study several neural network architectures that are key components of sound event detection systems, using an audio tagging task as an example. We measure the energy consumption of training and testing small to large architectures and establish complex relationships between energy consumption, the number of floating-point operations, the number of parameters, and GPU/memory utilization.
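One common way to quantify how strongly energy tracks a candidate predictor such as FLOPs or GPU utilization is the Pearson correlation coefficient. The sketch below assumes that per-run energy and per-run utilization values have already been collected (for instance from CodeCarbon and Nvidia SMI logs). `pearson_r` is a hypothetical helper, and this sketch does not claim to reproduce the exact statistical procedure used in the paper.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length series (hypothetical helper).

    x could hold per-run GPU utilization (or FLOPs) and y per-run energy;
    a value near 1 means energy rises almost linearly with the predictor.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Covariance and variances, without the common 1/n factor, which cancels.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return s_xy / math.sqrt(s_xx * s_yy)
```

Pearson's r captures only linear association. Where the energy-compute relation is monotonic but curved, as the architecture-dependent shapes mentioned above suggest, a rank-based measure such as Spearman's coefficient may be a better fit.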
