
"Only ChatGPT gets me": An Empirical Analysis of GPT versus other Large Language Models for Emotion Detection in Text

Florian Lecourt, Madalina Croitoru, Konstantin Todorov

TL;DR

The study evaluates how well large language models detect emotions expressed in text by comparing GPT-family models and other LLMs against a state-of-the-art baseline on the GoEmotions dataset, using the macro-averaged F1 score ($F1_{macro}$). Prompt engineering markedly improves ChatGPT’s emotion-detection performance, but GPT models generally do not surpass specialized classifiers such as BERT-based SOTA models. The results show that GPT-4o offers only marginal gains over GPT-3.5-Turbo, while very large models such as Llama-3-70b can approach, but not exceed, GPT-derived performance; dictionary-based corrections do not improve results. The work highlights the need for semantically aware metrics and multi-dataset validation to better capture nuanced emotion detection in AI systems intended for empathetic human–computer interaction.
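For readers unfamiliar with the metric: $F1_{macro}$ computes F1 separately for each emotion class and then takes the unweighted mean, $F1_{macro} = \frac{1}{K}\sum_{k=1}^{K} \frac{2\,TP_k}{2\,TP_k + FP_k + FN_k}$, so rare emotions count as much as frequent ones. The sketch below illustrates this for multi-label data in the style of GoEmotions, where one text may carry several labels. It is a minimal illustration, not the authors' evaluation code; the three labels and the toy gold/predicted sets are hypothetical, and it assumes that a class with no true positives, false positives, or false negatives scores 0.0 (library defaults may handle this case differently).

```python
from collections.abc import Sequence


def per_class_f1(gold: Sequence[set[str]], pred: Sequence[set[str]], label: str) -> float:
    """F1 for one label, treating each text as a binary yes/no decision for that label."""
    tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
    fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
    fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
    denom = 2 * tp + fp + fn
    # Assumption: a label absent from both gold and predictions scores 0.0.
    return 2 * tp / denom if denom else 0.0


def macro_f1(gold: Sequence[set[str]], pred: Sequence[set[str]], labels: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 scores: every emotion class counts equally."""
    scores = [per_class_f1(gold, pred, label) for label in labels]
    return sum(scores) / len(scores)


# Hypothetical toy data: a 3-label subset, four texts, multi-label gold annotations.
labels = ["joy", "anger", "neutral"]
gold = [{"joy"}, {"anger"}, {"neutral"}, {"joy", "neutral"}]
pred = [{"joy"}, {"neutral"}, {"neutral"}, {"joy"}]

# Per-class F1: joy = 1.0, anger = 0.0, neutral = 0.5, so the macro mean is 0.5.
print(macro_f1(gold, pred, labels))  # → 0.5
```

Because the missed "anger" class contributes a full 0.0 to the average, a model that handles frequent emotions well but ignores rare ones is penalized more under $F1_{macro}$ than under a micro-averaged score.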

Abstract

This work investigates the capabilities of large language models (LLMs) in detecting and understanding human emotions through text. Drawing upon emotion models from psychology, we adopt an interdisciplinary perspective that integrates insights from the computational and affective sciences. The main goal is to assess how accurately LLMs can identify emotions expressed in textual interactions and to compare different models on this specific task. This research contributes to broader efforts to enhance human-computer interaction, making artificial intelligence technologies more responsive and sensitive to users' emotional nuances. By comparing LLMs with a state-of-the-art model on the GoEmotions dataset, we aim to gauge their effectiveness as systems for emotion analysis, paving the way for potential applications in fields that require a nuanced understanding of human language.

Paper Structure

This paper contains 15 sections, 9 figures, 6 tables.

Figures (9)

  • Figure 1: Graphical representation of the Lövheim model [LovheimCubeEmotions2024].
  • Figure 2: 2D representation of the Plutchik model [Plutchik_wheel_2024].
  • Figure 3: 3D representation of the Plutchik model [article].
  • Figure 4: Example prompt [kocon_chatgpt_2023].
  • Figure 5: Evaluation flowchart
  • ...and 4 more figures