AER-LLM: Ambiguity-aware Emotion Recognition Leveraging Large Language Models

Xin Hong; Yuan Gong; Vidhyasaharan Sethu; Ting Dang

TL;DR

The paper tackles ambiguity in emotion labeling by leveraging large language models (LLMs) to predict full emotion distributions $\hat{p}(x)$ rather than single labels, comparing them to ground-truth distributions $p(x)$ inferred from multiple annotators. It introduces zero-shot and few-shot prompting augmented with contextual dialogue history and, in some cases, speech features represented textually, to enhance in-context learning. Across MSP-Podcast, IEMOCAP, and GoEmotions, the approach yields significant improvements in uncertainty-calibrated metrics (e.g., Jensen-Shannon divergence, Bhattacharyya coefficient, $R^2$, and calibration error) and shows clear benefits from including context windows of size $M$ (optimally around 10–20) and multimodal prompts. The findings indicate that LLMs are more effective for less ambiguous emotions and offer a pathway toward more natural, emotion-aware conversational AI with practical implications for adaptive communication strategies.
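To make the comparison between $\hat{p}(x)$ and $p(x)$ concrete, the following is a minimal sketch, assuming that the ground-truth distribution is the normalized vote count over annotators and that Jensen-Shannon divergence is computed with base-2 logarithms. The function names and the four-class label set are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def label_distribution(votes, classes):
    """Turn a list of annotator votes into a probability vector over `classes`.

    Assumption: each annotator contributes one equally weighted vote.
    """
    counts = Counter(votes)
    total = len(votes)
    return [counts[c] / total for c in classes]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint support)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with a zero probability in `a` contribute nothing to the sum.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def bhattacharyya_coefficient(p, q):
    """Overlap between two distributions (1 = identical, 0 = disjoint support)."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


# Hypothetical label set and annotator votes for one utterance.
classes = ["angry", "happy", "neutral", "sad"]
p_true = label_distribution(["happy", "happy", "neutral", "happy"], classes)
p_pred = [0.05, 0.60, 0.30, 0.05]  # stand-in for an LLM-predicted distribution

print(js_divergence(p_true, p_pred))
print(bhattacharyya_coefficient(p_true, p_pred))
```

Lower Jensen-Shannon divergence and a higher Bhattacharyya coefficient both indicate that the predicted distribution sits closer to the annotators' distribution.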

Abstract

Recent advancements in Large Language Models (LLMs) have demonstrated great success in many Natural Language Processing (NLP) tasks. In addition to their cognitive intelligence, exploring their capabilities in emotional intelligence is also crucial, as it enables more natural and empathetic conversational AI. Recent studies have shown LLMs' capability in recognizing emotions, but they often focus on single emotion labels and overlook the complex and ambiguous nature of human emotions. This study is the first to address this gap by exploring the potential of LLMs in recognizing ambiguous emotions, leveraging their strong generalization capabilities and in-context learning. We design zero-shot and few-shot prompting and incorporate past dialogue as context information for ambiguous emotion recognition. Experiments conducted using three datasets indicate significant potential for LLMs in recognizing ambiguous emotions, and highlight the substantial benefits of including context information. Furthermore, our findings indicate that LLMs demonstrate a high degree of effectiveness in recognizing less ambiguous emotions and exhibit potential for identifying more ambiguous emotions, paralleling human perceptual capabilities.
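The prompting setup described in the abstract (zero-shot or few-shot, with past dialogue as context) can be sketched roughly as follows. This is only an illustration under stated assumptions: the prompt wording, the function name `build_prompt`, and its parameters are hypothetical and are not the prompts used in the paper. The `context_window` argument plays the role of the number of preceding utterances supplied as context.

```python
def build_prompt(dialogue, target_index, classes, context_window=0, examples=()):
    """Assemble a plain-text prompt asking for an emotion distribution.

    All wording here is hypothetical; it is not the prompt used in the paper.
    """
    lines = [
        "Estimate how likely each emotion is for the target utterance.",
        "Emotions: " + ", ".join(classes) + ".",
        "Reply with one probability per emotion; the probabilities must sum to 1.",
    ]
    # Few-shot: each example pairs an utterance with an annotator-derived distribution.
    # With no examples, the prompt is zero-shot.
    for text, dist in examples:
        formatted = ", ".join(f"{c}: {p:.2f}" for c, p in zip(classes, dist))
        lines.append(f'Example utterance: "{text}" -> {formatted}')
    # Context: the utterances immediately preceding the target
    # (fewer than `context_window` near the start of the dialogue).
    start = max(0, target_index - context_window)
    history = dialogue[start:target_index]
    if history:
        lines.append("Dialogue history:")
        lines.extend(f"- {u}" for u in history)
    lines.append(f'Target utterance: "{dialogue[target_index]}"')
    return "\n".join(lines)


# Hypothetical dialogue; the last utterance is the one to be labeled.
dialogue = [
    "I finally heard back about the job.",
    "And? What did they say?",
    "They went with someone else.",
    "Well, at least now I know.",
]
print(build_prompt(dialogue, 3, ["angry", "happy", "neutral", "sad"], context_window=2))
```

Setting `context_window=0` and leaving `examples` empty gives a context-free zero-shot prompt, which is a convenient reference point when measuring how much the dialogue history helps.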

Paper Structure (19 sections, 4 equations, 4 figures, 5 tables)

Introduction
Related work
Ambiguous emotion recognition via LLMs
System overview
Prompt design
Zero-shot and few-shot prompting
Prompt with speech features
Context-aware recognition
Experimental setup and results
Experimental setup
Dataset
Models
Evaluations
Baseline descriptions
Performance on ambiguity-aware prediction
...and 4 more sections

Figures (4)

Figure 1: System overview. $A_i, i \in [1, N]$ represents the $i^{th}$ annotator, and $L$ represents the evaluation metrics, including both ambiguity-centric and accuracy-centric metrics.
Figure 2: Performance comparison with increasing context windows using text and speech in MSP-Podcast.
Figure 3: Performance comparison among different levels of ambiguity in MSP-Podcast. A small entropy indicates less ambiguous emotion.
Figure 4: Comparison of W-F1 across five entropy groups with context window = 0 and 30 using MSP-Podcast.
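Figures 3 and 4 group utterances by the entropy of their annotator label distribution, with lower entropy meaning less ambiguity. The sketch below shows one way such a grouping could be computed. It assumes Shannon entropy in bits and five equal-width bins over the range from 0 to $\log_2 K$ for $K$ emotion classes. The binning rule and the function names are assumptions for illustration; the captions do not specify how the paper defines its groups.

```python
import math


def entropy(p):
    """Shannon entropy in bits of a probability vector; 0 means full agreement."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def entropy_group(p, n_groups=5):
    """Map a distribution to a group index in [0, n_groups - 1].

    Assumption: equal-width bins over [0, log2(K)], where K = len(p).
    Group 0 holds the least ambiguous utterances.
    """
    max_entropy = math.log2(len(p))
    index = int(entropy(p) / max_entropy * n_groups)
    # The uniform distribution lands exactly on the upper edge; keep it in the last bin.
    return min(index, n_groups - 1)


# Hypothetical four-class distributions, from unanimous to maximally split.
for dist in ([1.0, 0.0, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0], [0.25, 0.25, 0.25, 0.25]):
    print(dist, entropy(dist), entropy_group(dist))
```

Quantile-based bins would be an equally reasonable choice and would give groups of similar size, at the cost of bin edges that depend on the dataset.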
