From Text to Emotion: Unveiling the Emotion Annotation Capabilities of LLMs

Minxue Niu; Mimansa Jaiswal; Emily Mower Provost

TL;DR

Common metrics that use aggregated human annotations as ground truth can underestimate the performance of GPT-4, and a human evaluation experiment reveals a consistent preference for GPT-4 annotations over human annotations across multiple datasets and evaluators.

Abstract

Training emotion recognition models has relied heavily on human-annotated data, which present diversity, quality, and cost challenges. In this paper, we explore the potential of Large Language Models (LLMs), specifically GPT-4, in automating or assisting emotion annotation. We compare GPT-4 with supervised models and/or humans in three aspects: agreement with human annotations, alignment with human perception, and impact on model training. We find that common metrics that use aggregated human annotations as ground truth can underestimate the performance of GPT-4, and our human evaluation experiment reveals a consistent preference for GPT-4 annotations over human annotations across multiple datasets and evaluators. Further, we investigate the impact of using GPT-4 as an annotation filtering process to improve model training. Together, our findings highlight the great potential of LLMs in emotion annotation tasks and underscore the need for refined evaluation methodologies.

Paper Structure (12 sections, 4 figures, 3 tables)

Introduction
Related Work
Data
Methods
GPT-4 Prompting
Automatic Evaluation Metrics
Supervised model: Finetuned BERT
Human Evaluation
Results
GPT-4 Zero-shot Performance
Impact on Model Training
Discussion, Limitations and Conclusion

Figures (4)

Figure 1: Disagreements between human and GPT annotations on ISEAR.
Figure 2: Human vs. GPT-4 classification.
Figure 3: GPT-4 classification vs. generation.
Figure 4: Human preference ratio comparing human annotations, GPT-4 classification annotations and GPT-4 generation annotations on emotion classification tasks.
