nEMO: Dataset of Emotional Speech in Polish

Iwona Christop

TL;DR

nEMO addresses the scarcity of Polish emotional speech data for speech emotion recognition (SER) with a simulated (acted) corpus covering six emotional states, recorded by nine native Polish speakers. The dataset comprises 4,481 recordings totaling over 3 hours; its linguistic content consists of 90 sentences chosen to include uncommon Polish phonemes, and it is accompanied by transcriptions and speaker metadata. Three classical classifiers (SVM, Logistic Regression, and Random Forest) were evaluated on MFCC features, reaching 83.95% accuracy with Random Forest. The results indicate that the emotions are clearly separable, although the surprise category remains difficult. The dataset is freely available under CC BY-NC-SA 4.0 on Hugging Face and GitHub, so researchers can train and evaluate SER, automatic speech recognition (ASR), and text-to-speech (TTS) systems for Polish. The work fills a gap in Slavic-language resources and supports cross-linguistic emotion recognition research.

Abstract

Speech emotion recognition has become increasingly important in recent years due to its potential applications in healthcare, customer service, and personalization of dialogue systems. However, a major issue in this field is the lack of datasets that adequately represent basic emotional states across various language families. As datasets covering Slavic languages are rare, there is a need to address this research gap. This paper presents the development of nEMO, a novel corpus of emotional speech in Polish. The dataset comprises over 3 hours of samples recorded with the participation of nine actors portraying six emotional states: anger, fear, happiness, sadness, surprise, and a neutral state. The text material used was carefully selected to represent the phonetics of the Polish language adequately. The corpus is freely available under the terms of a Creative Commons license (CC BY-NC-SA 4.0).

Paper Structure (12 sections, 2 figures, 3 tables)

Figures (2)

  • Figure 1: Performance metrics of SVM, Logistic Regression, and Random Forest classifiers on the nEMO dataset.
  • Figure 2: Comparison of confusion matrices for SVM, Logistic Regression, and Random Forest classifiers on the nEMO dataset.
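The two figures are related: the performance metrics in Figure 1 can in principle be derived from confusion matrices like those in Figure 2. The following is a minimal sketch of that derivation, not the author's evaluation code. The function name `metrics_from_confusion`, the label order, and the counts in `TOY_MATRIX` are assumptions made up for illustration; the counts are not results from the paper.

```python
# Emotion labels as listed in the abstract; the order here is an arbitrary choice.
EMOTIONS = ["anger", "fear", "happiness", "sadness", "surprise", "neutral"]


def metrics_from_confusion(matrix, labels):
    """Derive accuracy and per-class precision/recall/F1 from a confusion matrix.

    matrix[i][j] counts samples whose true class is labels[i] and whose
    predicted class is labels[j].
    """
    n = len(labels)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))

    per_class = {}
    for i, label in enumerate(labels):
        true_count = sum(matrix[i])                       # row sum: true samples
        pred_count = sum(matrix[r][i] for r in range(n))  # column sum: predictions
        recall = matrix[i][i] / true_count if true_count else 0.0
        precision = matrix[i][i] / pred_count if pred_count else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}

    accuracy = correct / total if total else 0.0
    return accuracy, per_class


# Toy counts invented for illustration only (NOT results from the paper).
TOY_MATRIX = [
    [9, 0, 1, 0, 0, 0],
    [0, 8, 0, 1, 1, 0],
    [1, 0, 8, 0, 1, 0],
    [0, 1, 0, 9, 0, 0],
    [0, 1, 2, 0, 7, 0],
    [0, 0, 0, 1, 0, 9],
]

accuracy, per_class = metrics_from_confusion(TOY_MATRIX, EMOTIONS)
print(f"accuracy: {accuracy:.2%}")
for label, m in per_class.items():
    print(f"{label:<10} precision={m['precision']:.2f} "
          f"recall={m['recall']:.2f} f1={m['f1']:.2f}")
```

A low recall in one row, as for the surprise row of the toy matrix, is the kind of pattern the TL;DR refers to when it notes difficulty with the surprise category.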