GeeSanBhava: Sentiment Tagged Sinhala Music Video Comment Data Set

Yomal De Mel, Nisansa de Silva

TL;DR

This work tackles emotion recognition in Sinhala song comments by introducing GeeSanBhava, a large dataset of $63{,}471$ YouTube comments annotated in Russell's Valence-Arousal space by three raters, achieving Fleiss's kappa of $84.96\%$. It analyzes the relationship between comment emotions and song emotions using cosine similarity and standardized emotion vectors, and demonstrates a zero-shot Sinhala YouTube classification pipeline built on Sinhala News baselines with multiple embeddings; the best three-layer MLP reaches a ROC-AUC of $0.887$. The study contributes a valuable resource for Sinhala NLP and music emotion recognition, highlights biases in user-generated content, and establishes practical baselines for cross-domain emotion mapping and classification in a low-resource language. These results offer a solid foundation for future multi-class emotion modeling and advanced deep learning architectures in Sinhala MIR and affective computing.
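The comparison of comment emotions with song emotions via cosine similarity can be illustrated with a small sketch. It is not the authors' implementation: the four-category vectors, the example counts, and the reading of "standardized" as a z-score across categories are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def standardize(values: Sequence[float]) -> List[float]:
    """Z-score a vector across its categories (assumed meaning of 'standardized')."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    if variance == 0:
        raise ValueError("cannot standardize a constant vector")
    std = math.sqrt(variance)
    return [(v - mean) / std for v in values]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Hypothetical data: comment counts per emotion category for one song,
# and a hypothetical song-level emotion profile over the same categories.
comment_counts = [120, 30, 40, 10]
song_profile = [0.6, 0.1, 0.2, 0.1]

similarity = cosine_similarity(standardize(comment_counts), standardize(song_profile))
print(f"comment/song similarity: {similarity:.3f}")
```

Standardizing before taking the cosine removes the effect of differing scales (raw counts versus proportions), so the score reflects only whether the two profiles rise and fall over the same categories.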

Abstract

This study introduces GeeSanBhava, a high-quality data set of Sinhala song comments extracted from YouTube and manually tagged using Russell's Valence-Arousal model by three independent human annotators. The annotators achieved substantial inter-annotator agreement (Fleiss' kappa = 84.96%). The analysis revealed distinct emotional profiles for different songs, highlighting the importance of comment-based emotion mapping. The study also addressed the challenges of comparing comment-based and song-based emotions, mitigating biases inherent in user-generated content. Several machine learning and deep learning models were pre-trained on a related large data set of Sinhala News comments in order to report zero-shot results on the Sinhala YouTube comment data set. An optimized Multi-Layer Perceptron model, after extensive hyperparameter tuning, achieved a ROC-AUC score of 0.887. The model is a three-layer MLP with layers of 256, 128, and 64 neurons. This research contributes a valuable annotated data set and provides insights for future work in Sinhala Natural Language Processing and music emotion recognition.
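For readers unfamiliar with the agreement statistic quoted above, the following is a minimal sketch of the standard Fleiss' kappa formula. The four-category layout and the toy ratings are hypothetical and are not taken from the data set; the only detail borrowed from the abstract is that each comment is labeled by three annotators.

```python
from typing import List


def fleiss_kappa(counts: List[List[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned item i to category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two raters per item are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number of raters")
    n_categories = len(counts[0])

    # Observed agreement: mean over items of the fraction of agreeing rater pairs.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions.
    total = n_items * n_raters
    proportions = [
        sum(row[j] for row in counts) / total for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1.0:
        return 1.0  # all ratings fall in one category; treated here as full agreement
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy table: 4 comments, 3 annotators, 4 emotion categories.
toy_counts = [
    [3, 0, 0, 0],
    [0, 3, 0, 0],
    [2, 1, 0, 0],
    [0, 0, 3, 0],
]
kappa = fleiss_kappa(toy_counts)
print(f"Fleiss' kappa: {kappa:.4f}")
```

The statistic is a fraction; multiplying it by 100 gives the percentage form in which the abstract reports it.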

Paper Structure

This paper contains 13 sections, 1 equation, 2 figures, 6 tables.

Figures (2)

  • Figure 1: Frequency distribution of emotions annotated by three independent judges. Each row represents a song, with the columns showing the emotional frequency derived from comments for each annotator.
  • Figure 2: Emotion visualization for Sinhala songs based on comment analysis and annotation.