Joint Learning of Emotions in Music and Generalized Sounds
Federico Simonetta, Francesca Certo, Stavros Ntalampiras
TL;DR
This work addresses cross-domain emotion recognition by investigating whether generalized sounds and music share a common emotional space along the arousal and valence axes. It builds a shared feature space using 6375 openSMILE ComParE features extracted from IADS-E and PMEmo, and evaluates ElasticNet, SVR, and AutoML within a dataset-augmentation framework controlled by $k$ and $p$ in the mixed dataset $k \times |\text{IADS-E}| + p \times |\text{PMEmo}|$. The results show that augmentation improves both arousal and valence predictions, with AutoML achieving state-of-the-art performance and arousal benefiting more from cross-domain transfer ($R^2 > 0.15$) than valence. The findings highlight the effectiveness of non-linear models in a shared affective space and suggest that the approach could be extended to additional data classes for diverse affective tasks, offering a simple yet powerful route to enhance AER/MER systems.
Abstract
In this study, we aim to determine if generalized sounds and music can share a common emotional space, improving predictions of emotion in terms of arousal and valence. We propose the use of multiple datasets as a multi-domain learning technique. Our approach involves creating a common space encompassing features that characterize both generalized sounds and music, as they can evoke emotions in a similar manner. To achieve this, we utilized two publicly available datasets, namely IADS-E and PMEmo, following a standardized experimental protocol. We employed a wide variety of features that capture diverse aspects of the audio structure including key parameters of spectrum, energy, and voicing. Subsequently, we performed joint learning on the common feature space, leveraging heterogeneous model architectures. Interestingly, this synergistic scheme outperforms the state-of-the-art in both sound and music emotion prediction. The code enabling full replication of the presented experimental pipeline is available at https://github.com/LIMUNIMI/MusicSoundEmotions.
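The mixing scheme $k \times |\text{IADS-E}| + p \times |\text{PMEmo}|$ can be illustrated with a minimal sketch. It assumes that $k$ and $p$ are sampling fractions in $[0, 1]$ and that samples are drawn without replacement; neither detail is stated above. The function name `sample_mixed_indices` and the toy dataset sizes are hypothetical and are not taken from the released code.

```python
import random


def sample_mixed_indices(n_sound, n_music, k, p, seed=0):
    """Pick row indices for a mixed training set.

    Assumption: k and p are fractions in [0, 1] of the sound and music
    datasets, sampled without replacement.
    """
    if not (0.0 <= k <= 1.0 and 0.0 <= p <= 1.0):
        raise ValueError("k and p must lie in [0, 1]")
    rng = random.Random(seed)
    # round() with a single argument returns an int
    idx_sound = sorted(rng.sample(range(n_sound), round(k * n_sound)))
    idx_music = sorted(rng.sample(range(n_music), round(p * n_music)))
    return idx_sound, idx_music


# Toy sizes standing in for the two datasets (not the real dataset sizes).
idx_sound, idx_music = sample_mixed_indices(40, 60, k=1.0, p=0.5)
print(len(idx_sound), len(idx_music))  # 40 30
```

The returned indices can be used to select rows from the two feature matrices, which share the same columns, before concatenating them and fitting any regressor on arousal or valence.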
