Uncertainty Estimation in the Real World: A Study on Music Emotion Recognition
Karn N. Watcharasupat, Yiwei Ding, T. Aleksandra Ma, Pavan Seshadri, Alexander Lerch
TL;DR
This work investigates how to estimate not only the central tendency but also the uncertainty of subjective emotional responses in music emotion recognition (MER). Using models that output a Gaussian distribution over valence and arousal on the DEAM dataset, it compares methods that require empirical uncertainty estimates during training, namely mean squared error (MSE) and Kullback-Leibler divergence (KLD) losses, with methods that do not: a negative log-likelihood (NLL) loss, multiple random seeds, and Monte Carlo (MC) Dropout. The key finding is that while the mean responses can be predicted, the models fail to reliably quantify inter-rater uncertainty. This highlights a fundamental challenge in modeling subjectivity in regression tasks. The results underscore the need for new uncertainty estimation approaches and richer data to capture the inherent variability of human emotional responses to music, with implications for building trustworthy MER systems.
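The TL;DR contrasts losses that need an empirical spread of ratings with one that does not. Below is a minimal sketch, assuming the model outputs a per-dimension (diagonal) Gaussian with mean `mu` and standard deviation `sigma` for valence and arousal, and that `m` and `s` are the mean and standard deviation of the annotators' ratings. The function names, the KL direction (empirical distribution first), and applying MSE to both the mean and the standard deviation are illustrative assumptions, not necessarily the paper's exact formulations.

```python
import numpy as np

def gaussian_nll(y, mu, sigma):
    """Mean negative log-likelihood of ratings y under N(mu, sigma^2).

    Needs only the ratings themselves, not an empirical spread.
    """
    y, mu, sigma = (np.asarray(a, dtype=float) for a in (y, mu, sigma))
    nll = 0.5 * np.log(2.0 * np.pi * sigma**2) + (y - mu) ** 2 / (2.0 * sigma**2)
    return float(nll.mean())

def gaussian_kld(m, s, mu, sigma):
    """Mean KL(N(m, s^2) || N(mu, sigma^2)) over all elements.

    (m, s): empirical mean and std of the annotators' ratings (assumed inputs).
    (mu, sigma): the model's predicted mean and std.
    """
    m, s, mu, sigma = (np.asarray(a, dtype=float) for a in (m, s, mu, sigma))
    kl = np.log(sigma / s) + (s**2 + (m - mu) ** 2) / (2.0 * sigma**2) - 0.5
    return float(kl.mean())

def moment_mse(m, s, mu, sigma):
    """MSE on the means plus MSE on the standard deviations (assumed form)."""
    m, s, mu, sigma = (np.asarray(a, dtype=float) for a in (m, s, mu, sigma))
    return float(np.mean((m - mu) ** 2) + np.mean((s - sigma) ** 2))

# Hypothetical batch: 3 clips x (valence, arousal)
m = np.array([[0.2, -0.1], [0.5, 0.4], [-0.3, 0.0]])   # empirical mean rating
s = np.array([[0.1, 0.2], [0.15, 0.1], [0.2, 0.25]])   # empirical std of ratings
mu = m + 0.05      # hypothetical predicted mean
sigma = s * 1.5    # hypothetical predicted std
print(gaussian_kld(m, s, mu, sigma), moment_mse(m, s, mu, sigma))
```

Only `gaussian_nll` can be computed from individual ratings alone, which is why NLL falls in the group of methods that do not require empirical uncertainty during training.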
Abstract
Data annotations for subjective tasks can vary between individuals. This is particularly true for annotations of emotional responses to musical stimuli. While older music emotion recognition (MER) systems frequently addressed this uncertainty through probabilistic modeling, modern systems based on neural networks tend to ignore this variability and focus only on predicting the central tendency of human subjective responses. In this work, we explore several methods for estimating not only the central tendency of the subjective responses to a musical stimulus but also the uncertainty associated with these responses. In particular, we investigate probabilistic loss functions and inference-time random sampling. Experimental results indicate that while modeling the central tendency is achievable, modeling the uncertainty in subjective responses proves significantly more challenging with currently available approaches, even when empirical estimates of the variation in responses are available.
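Inference-time random sampling can be illustrated with Monte Carlo (MC) Dropout: dropout stays active at prediction time, the same input is passed through the network many times, and the spread of the outputs is read as an uncertainty estimate. The sketch below uses a hypothetical, untrained two-layer NumPy network with random weights purely to show the mechanics; the helper names `mlp_forward` and `mc_dropout_predict`, the layer sizes, the dropout rate, and the number of samples are all assumptions rather than the paper's configuration.

```python
import numpy as np

def mlp_forward(x, w1, b1, w2, b2, p_drop, rng):
    """One forward pass of a two-layer MLP with dropout on the hidden layer.

    Dropout is deliberately left on at inference (MC Dropout), so calls
    with p_drop > 0 return different outputs each time.
    """
    h = np.maximum(0.0, x @ w1 + b1)  # ReLU hidden layer
    if p_drop > 0:
        keep = rng.random(h.shape) >= p_drop  # keep each unit with prob 1 - p_drop
        h = h * keep / (1.0 - p_drop)  # inverted-dropout rescaling
    return h @ w2 + b2  # linear head: (valence, arousal)

def mc_dropout_predict(x, params, p_drop=0.2, n_samples=50, seed=0):
    """Return the mean and std of n_samples stochastic forward passes."""
    rng = np.random.default_rng(seed)
    samples = np.stack(
        [mlp_forward(x, *params, p_drop, rng) for _ in range(n_samples)]
    )  # shape: (n_samples, batch, 2)
    return samples.mean(axis=0), samples.std(axis=0)

# Hypothetical untrained network: 16 input features -> 32 hidden -> 2 outputs
init = np.random.default_rng(1)
params = (init.normal(size=(16, 32)), np.zeros(32),
          init.normal(size=(32, 2)), np.zeros(2))
x = init.normal(size=(4, 16))  # features for 4 hypothetical clips
mean, std = mc_dropout_predict(x, params)
print(mean.shape, std.shape)
```

Training with several random seeds and taking the spread of the resulting models' predictions follows the same aggregation pattern, with sampled dropout masks replaced by independently trained networks. In either case the spread reflects the model's own variability, which need not match the disagreement between annotators.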
