Language Models (Mostly) Do Not Consider Emotion Triggers When Predicting Emotion

Smriti Singh; Cornelia Caragea; Junyi Jessy Li

TL;DR

This work asks whether human-annotated emotion triggers meaningfully inform emotion prediction. It introduces EmoTrigger, a linguist-annotated dataset of 900 short social-media texts drawn from three emotion corpora. It benchmarks large language models (GPT-4, Llama2-Chat, Alpaca) and fine-tuned transformers (EmoBERTA) against unsupervised baselines (EmoLex, TopicRank) to assess trigger identification and, via SHAP, feature salience. The key finding is that triggers are largely not salient features for most models, whereas keyphrases align much more closely with what the models treat as salient; GPT-4 is the notable exception in its ability to identify triggers. This suggests that current open-source models rely more on topical cues than on genuine emotion-trigger reasoning, highlighting a gap between human appraisal-driven emotion understanding and contemporary NLP models. The EmoTrigger dataset provides a foundation for further research into trigger-aware, interpretable emotion models and their alignment with psychological theories of appraisal.

Abstract

Situations and events evoke emotions in humans, but to what extent do they inform the predictions of emotion detection models? This work investigates how well human-annotated emotion triggers correlate with the features that models deem salient when predicting emotions. First, we introduce a novel dataset, EmoTrigger, consisting of 900 social media posts sourced from three different datasets; these were annotated by experts for emotion triggers with high agreement. Using EmoTrigger, we evaluate the ability of large language models (LLMs) to identify emotion triggers, and conduct a comparative analysis of the features that LLMs and fine-tuned models consider important for these tasks. Our analysis reveals that emotion triggers are largely not considered salient features by emotion prediction models; instead, there is an intricate interplay between various features and the task of emotion detection.
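To make the comparison between annotated triggers and model-salient features concrete, the following is a minimal sketch of one way such an overlap could be measured. It is not the paper's metric: the function name `top_k_overlap`, the choice of top-k ranking by absolute score, and the toy tokens and scores are all assumptions made for illustration. The scores stand in for per-token attributions of the kind an attribution method such as SHAP would produce.

```python
def top_k_overlap(tokens, scores, trigger_tokens, k=3):
    """Fraction of the k tokens with the largest |score| that also occur in the trigger.

    Hypothetical helper for illustration; not the measure used in the paper.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not tokens or k <= 0:
        return 0.0
    # Rank token positions by absolute attribution, most salient first.
    ranked = sorted(range(len(tokens)), key=lambda i: abs(scores[i]), reverse=True)
    top = ranked[:k]
    trigger = {t.lower() for t in trigger_tokens}
    hits = sum(1 for i in top if tokens[i].lower() in trigger)
    return hits / len(top)


# Toy, made-up example: the scores are invented, not model output.
tokens = ["my", "flight", "got", "cancelled", "again"]
scores = [0.01, 0.30, 0.02, 0.45, 0.05]
trigger_tokens = ["flight", "got", "cancelled"]

overlap = top_k_overlap(tokens, scores, trigger_tokens, k=2)
print(overlap)
```

A low value on real attributions would indicate that the tokens a model leans on mostly fall outside the annotated trigger span, which is the pattern the abstract reports for most models.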

Paper Structure (29 sections, 5 figures, 10 tables)

Introduction
The EmoTrigger Dataset
Source Datasets
Annotation
Study Design
LLMs
Fine-tuned Transformers
Unsupervised Comparators
Results
Are LLMs capable of detecting emotions and identifying their triggers?
To what extent do emotion prediction models rely on features that reflect emotion triggers?
How often do the triggers overlap with keyphrases or emotion words?
Qualitative Analysis
Trigger identification
LLM's "attribution" of its own predictions
...and 14 more sections

Figures (5)

Figure 1: Examples from EmoTrigger. Triggers highlighted with the same color as the respective emotions.
Figure 2: Examples of trigger identification. Gold label triggers are highlighted, keyphrases are underlined and EmoLex words are in italics. The emotions are in subscript to the triggers that caused them.
Figure 3: Examples of how the Alpaca-specific prompt allows the model to interact with the text in a more coherent manner.
Figure 4: Examples of emotion detection. The gold label emotions are indicated in square-brackets.
Figure 5: Tabular comparison between features taken into consideration by various models for emotion detection. Gold label emotions are presented in square brackets. Triggers are highlighted, keyphrases are underlined and EmoLex words are italicized.
