EmoGRACE: Aspect-based emotion analysis for social media data
Christina Zorenböhmer, Sebastian Schmidt, Bernd Resch
TL;DR
This work tackles the lack of labeled data for aspect-based emotion analysis (ABEA) on social media by constructing the first English ABEA training set (2,621 Tweets) using group annotation and majority voting. It repurposes and fine-tunes the GRACE aspect-based sentiment analysis (ABSA) model for joint aspect term extraction (ATE) and aspect emotion classification (AEC). Performance plateaus at an average F1 of 50.8% across configurations, with 70.1% for ATE alone and 46.9% for joint extraction. The study identifies data scarcity and higher task complexity as the main bottlenecks, noting overfitting risks and limited generalization despite hyperparameter tuning. It suggests data augmentation and generative foundation models as promising directions, and points to domain-specific applications such as disaster response or public-health monitoring as ways to put ABEA into practice.
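The summary mentions majority voting over group annotations without spelling out the rule. The sketch below shows one common form of it for a single aspect term: the label is accepted only if it receives a strict majority of annotator votes, otherwise the aspect is left unresolved. The strict-majority threshold and the "unresolved" outcome (Python `None`) are assumptions for illustration; the paper's exact tie-handling policy is not stated here. The label set follows the five classes listed in the abstract, where `"None"` is the no-emotion class (a string), distinct from Python's `None`.

```python
from collections import Counter
from typing import Optional, Sequence

# Emotion classes from the abstract; "None" is the no-emotion class label.
LABELS = {"Anger", "Sadness", "Happiness", "Fear", "None"}


def majority_vote(votes: Sequence[str]) -> Optional[str]:
    """Aggregate annotator labels for one aspect term.

    Returns the label chosen by more than half of the annotators, or
    Python None if no label reaches a strict majority (assumed policy).
    """
    for vote in votes:
        if vote not in LABELS:
            raise ValueError(f"unknown label: {vote!r}")
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    # Strict majority: the winning label needs more than half of all votes.
    return label if count > len(votes) / 2 else None
```

For example, three annotators voting `["Anger", "Anger", "Fear"]` yield `"Anger"`, whereas a split such as `["Anger", "Fear"]` yields `None` and would need discussion or re-annotation under this assumed policy.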
Abstract
While sentiment analysis has advanced from the sentence level to the aspect level, i.e., the identification of concrete terms related to a sentiment, the equivalent field of Aspect-based Emotion Analysis (ABEA) faces dataset bottlenecks and the greater complexity of emotion classes compared with binary sentiments. This paper addresses these gaps by generating a first ABEA training dataset, consisting of 2,621 English Tweets, and fine-tuning a BERT-based model for the ABEA sub-tasks of Aspect Term Extraction (ATE) and Aspect Emotion Classification (AEC). The dataset annotation process was based on the hierarchical emotion theory by Shaver et al. [1] and used group annotation and majority voting strategies to improve label consistency. The resulting dataset contains aspect-level emotion labels for Anger, Sadness, Happiness, Fear, and a None class. Using the new ABEA training dataset, the state-of-the-art Aspect-based Sentiment Analysis (ABSA) model GRACE by Luo et al. [2] was fine-tuned for ABEA. The results showed a performance plateau at an F1-score of 70.1% for ATE and 46.9% for joint ATE and AEC extraction. The main factors limiting model performance were the small training dataset combined with the increased task complexity, which caused the model to overfit and limited its ability to generalize to new data.
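The gap between the ATE score (70.1%) and the joint ATE and AEC score (46.9%) is easier to read with a concrete scoring scheme in mind. The sketch below assumes exact-match evaluation over hypothetical `(tweet_id, start, end, emotion)` records, with `start`/`end` as token offsets: for ATE only the aspect span has to match, while for the joint task the span and the emotion label must both match. This record format and the exact-match rule are illustrative assumptions, not the paper's documented evaluation code.

```python
from typing import Iterable, Set, Tuple

# Hypothetical record: (tweet_id, start_token, end_token, emotion_label)
Record = Tuple[str, int, int, str]


def f1(gold: Set[tuple], pred: Set[tuple]) -> float:
    """Exact-match F1 between two sets of items."""
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def ate_and_joint_f1(gold: Iterable[Record], pred: Iterable[Record]) -> Tuple[float, float]:
    gold_set, pred_set = set(gold), set(pred)
    # ATE: compare aspect spans only; the emotion label is dropped.
    ate = f1({g[:3] for g in gold_set}, {p[:3] for p in pred_set})
    # Joint ATE + AEC: span and emotion label must both match.
    joint = f1(gold_set, pred_set)
    return ate, joint


gold = {("t1", 0, 1, "Anger"), ("t1", 5, 7, "Fear"), ("t2", 2, 3, "Happiness")}
pred = {("t1", 0, 1, "Anger"), ("t1", 5, 7, "Sadness"), ("t2", 4, 5, "Happiness")}
ate, joint = ate_and_joint_f1(gold, pred)
print(f"ATE F1 = {ate:.3f}, joint F1 = {joint:.3f}")
```

In this toy example two of three spans are found (ATE F1 of about 0.667), but one of them carries the wrong emotion, so only one prediction counts for the joint task (joint F1 of about 0.333). Because every joint match is also a span match, errors in emotion classification can only pull the joint score below the ATE score, which is consistent with the pattern reported above.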
