Large Language Models on Fine-grained Emotion Detection Dataset with Data Augmentation and Transfer Learning
Kaipeng Wang, Zhi Jing, Yongye Su, Yikun Han
TL;DR
The paper tackles fine-grained emotion detection on GoEmotions, addressing dataset challenges such as imbalance and bias, and evaluating multiple approaches from strong baselines to advanced data augmentation and cross-domain transfer. It reproduces baseline BERT results across three taxonomies, compares RoBERTa, and demonstrates that ProtAugment combined with CARER-based transfer yields the strongest gains, while traditional LLMs like GPT-4 struggle in zero-shot settings due to hallucination and mislabeling. The work provides concrete evidence that targeted data augmentation and cross-domain transfer can significantly improve macro-F1 scores on GoEmotions, and it highlights the limitations of current LLMs for fine-grained emotion labeling. Overall, the study offers practical pathways to improve emotion detection in NLP and points to a need for broader surveys across emotion datasets to synthesize methods and performances.
Abstract
This paper investigates ways to improve classification performance on GoEmotions, a large, manually annotated dataset for emotion detection in text. Its primary goal is to address the difficulty of detecting subtle emotions in text, a complex problem in Natural Language Processing (NLP) with significant practical applications. The findings offer insights into these challenges and suggest directions for future research, including a possible survey paper that synthesizes methods and performances across the datasets in this domain.
