Uncertainty Estimation in the Real World: A Study on Music Emotion Recognition

Karn N. Watcharasupat, Yiwei Ding, T. Aleksandra Ma, Pavan Seshadri, Alexander Lerch

TL;DR

This work investigates how to estimate not only the central tendency but also the uncertainty of subjective emotional responses to music in music emotion recognition (MER). Using a Gaussian-output framework with valence-arousal targets on the DEAM dataset, it compares methods that require empirical uncertainty estimates during training (mean squared error, MSE; Kullback-Leibler divergence, KLD) with methods that do not (negative log-likelihood, NLL; random seeds; Monte Carlo (MC) Dropout). The key finding is that while the means can be predicted, the models fail to reliably quantify inter-rater uncertainty, which highlights a fundamental challenge in modeling subjectivity in regression tasks. The results point to the need for new uncertainty estimation approaches and richer data to capture the inherent variability of human emotional responses to music, with implications for building trustworthy MER systems.
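
To make the distinction between the loss families concrete, the following is a minimal sketch of per-sample losses for a model that outputs a univariate Gaussian (mean and standard deviation) for one emotion dimension. The function names, the direction of the KL divergence, and the exact form of the MSE term are assumptions made for illustration; they are not taken from the paper.

```python
import math

def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of a single rating y under N(mu, sigma^2).
    Needs only individual ratings, not an empirical SD."""
    return 0.5 * math.log(2.0 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2.0 * sigma ** 2)

def gaussian_kld(mu_p, sigma_p, mu_q, sigma_q):
    """KL(P || Q) for P = N(mu_p, sigma_p^2) and Q = N(mu_q, sigma_q^2).
    Here P is assumed to be the empirical distribution and Q the prediction."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
            - 0.5)

def mse_mean_sd(mu_emp, sigma_emp, mu_pred, sigma_pred):
    """Squared error on both the mean and the SD (assumed form).
    Like the KLD, it needs the empirical SD as a training target."""
    return (mu_emp - mu_pred) ** 2 + (sigma_emp - sigma_pred) ** 2

# Hypothetical empirical and predicted statistics for one clip.
print(gaussian_nll(0.1, 0.0, 0.3))
print(gaussian_kld(0.2, 0.3, 0.0, 0.3))
print(mse_mean_sd(0.2, 0.3, 0.0, 0.3))
```

The KLD and MSE variants take an empirical standard deviation as input, whereas the NLL is computed from observed ratings alone, which mirrors the split described above.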

Abstract

Any data annotation for subjective tasks shows potential variations between individuals. This is particularly true for annotations of emotional responses to musical stimuli. While older approaches to music emotion recognition systems frequently addressed this uncertainty problem through probabilistic modeling, modern systems based on neural networks tend to ignore the variability and focus only on predicting central tendencies of human subjective responses. In this work, we explore several methods for estimating not only the central tendencies of the subjective responses to a musical stimulus, but also for estimating the uncertainty associated with these responses. In particular, we investigate probabilistic loss functions and inference-time random sampling. Experimental results indicate that while the modeling of the central tendencies is achievable, modeling of the uncertainty in subjective responses proves significantly more challenging with currently available approaches even when empirical estimates of variations in the responses are available.
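
As an illustration of inference-time random sampling, the sketch below applies MC Dropout to a toy linear regressor: dropout stays active at inference, the forward pass is repeated, and the mean and standard deviation of the sampled outputs serve as the prediction and its uncertainty estimate. The model, weights, dropout rate, and function names are all hypothetical; the networks used in the paper are not reproduced here.

```python
import random
import statistics

def dropout_forward(features, weights, bias, drop_prob, rng):
    """One stochastic forward pass with inverted dropout on the inputs."""
    keep = 1.0 - drop_prob
    total = bias
    for x, w in zip(features, weights):
        if rng.random() < keep:
            # Unit is kept; rescale so the expected output is unchanged.
            total += w * x / keep
    return total

def mc_dropout_predict(features, weights, bias, drop_prob=0.2, n_samples=200, seed=0):
    """Return the mean and sample SD over repeated stochastic passes."""
    rng = random.Random(seed)
    samples = [dropout_forward(features, weights, bias, drop_prob, rng)
               for _ in range(n_samples)]
    return statistics.mean(samples), statistics.stdev(samples)

# Hypothetical input features and weights for one emotion dimension.
mean, sd = mc_dropout_predict([0.5, -1.0, 2.0], [0.3, 0.1, 0.2], bias=0.05)
print(mean, sd)
```

The spread obtained this way comes from randomness in the model, so it does not have to match the spread of human ratings.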

Paper Structure (21 sections, 11 equations, 5 figures, 3 tables)

Figures (5)

  • Figure 1: Illustration of different ways of uncertainty estimation.
  • Figure 2: Distribution of mean and SD of valence ratings.
  • Figure 3: Distribution of mean and SD of arousal ratings.
  • Figure 4: Empirical and corresponding predicted means of arousal and valence. For single-model methods, the visualization was derived from the outputs of one model realization (seed 41).
  • Figure 5: Empirical and corresponding predicted standard deviations of arousal and valence. For single-model methods, the visualization was derived from the outputs of one model realization (seed 41).
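
The empirical means and standard deviations referred to in the captions above are per-stimulus statistics over raters. A minimal sketch of how they could be computed follows; the data layout, song identifiers, and rating values are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption.

```python
import statistics

# Hypothetical ratings: song id -> list of (valence, arousal) pairs, one per rater.
ratings = {
    "song_a": [(0.2, 0.6), (0.4, 0.5), (0.1, 0.7)],
    "song_b": [(-0.5, -0.2), (0.3, 0.1), (-0.1, -0.4)],
}

def empirical_stats(pairs):
    """Per-song mean and sample SD across raters for each emotion dimension."""
    valence = [v for v, _ in pairs]
    arousal = [a for _, a in pairs]
    return {
        "valence": (statistics.mean(valence), statistics.stdev(valence)),
        "arousal": (statistics.mean(arousal), statistics.stdev(arousal)),
    }

stats = {song: empirical_stats(pairs) for song, pairs in ratings.items()}
for song, s in stats.items():
    print(song, s)
```

In this toy data, the raters disagree more on the valence of `song_b` than of `song_a`, so its empirical SD is larger.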