nEMO: Dataset of Emotional Speech in Polish
Iwona Christop
TL;DR
nEMO addresses the scarcity of Polish emotional speech data for speech emotion recognition (SER) by presenting a simulated (acted) corpus in which nine native Polish speakers portray six emotional states: anger, fear, happiness, sadness, surprise, and neutral. The dataset comprises 4,481 recordings totaling over 3 hours, built on 90 sentences chosen to cover uncommon Polish phonemes, and is accompanied by transcriptions and speaker metadata. A baseline evaluation with three classical classifiers on MFCC features reached 83.95% accuracy with Random Forest, indicating that the emotions are clearly discriminable while highlighting difficulties with the surprise category. The dataset is freely available under CC BY-NC-SA 4.0 on Hugging Face and GitHub, so researchers can use it to train and evaluate Polish SER, automatic speech recognition (ASR), and text-to-speech (TTS) systems. The work fills a gap in Slavic-language resources and supports cross-linguistic emotion recognition research.
Abstract
Speech emotion recognition has become increasingly important in recent years due to its potential applications in healthcare, customer service, and personalization of dialogue systems. However, a major issue in this field is the lack of datasets that adequately represent basic emotional states across various language families. As datasets covering Slavic languages are rare, there is a need to address this research gap. This paper presents the development of nEMO, a novel corpus of emotional speech in Polish. The dataset comprises over 3 hours of samples recorded with the participation of nine actors portraying six emotional states: anger, fear, happiness, sadness, surprise, and a neutral state. The text material used was carefully selected to represent the phonetics of the Polish language adequately. The corpus is freely available under the terms of a Creative Commons license (CC BY-NC-SA 4.0).
