Emotion is Not Just a Label: Latent Emotional Factors in LLM Processing

Benjamin Reichman, Adar Avasian, Samuel Webster, Larry Heck

TL;DR

An emotional regularization framework is proposed that constrains emotion-conditioned representational drift during training. It improves reading comprehension on datasets both with and without emotional variation, yielding consistent gains under distribution shift and in-domain improvements on several benchmarks.

Abstract

Large language models are routinely deployed on text that varies widely in emotional tone, yet their reasoning behavior is typically evaluated without accounting for emotion as a source of representational variation. Prior work has largely treated emotion as a prediction target, for example in sentiment analysis or emotion classification. In contrast, we study emotion as a latent factor that shapes how models attend to and reason over text. We analyze how emotional tone systematically alters attention geometry in transformer models, showing that metrics such as locality, center-of-mass distance, and entropy vary across emotions and correlate with downstream question-answering performance. To facilitate controlled study of these effects, we introduce Affect-Uniform ReAding QA (AURA-QA), a question-answering dataset with emotionally balanced, human-authored context passages. Finally, we propose an emotional regularization framework that constrains emotion-conditioned representational drift during training. Experiments across multiple QA benchmarks show that this approach improves reading comprehension on datasets both with and without emotional variation, yielding consistent gains under distribution shift and in-domain improvements on several benchmarks.
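The abstract names three attention-geometry metrics but does not define them here. The sketch below shows one plausible way to compute them for a single head's row-stochastic attention matrix (row i is query i's distribution over keys). The function names, the `window` parameter, and the exact definitions (mean Shannon entropy per query, mean distance between a query's position and its attention centroid, mean share of attention mass within a fixed window of the query) are assumptions of this sketch, not the paper's formulas.

```python
import math
from typing import Sequence

# Hypothetical metric definitions; the paper's exact formulas may differ.
# attn[i][j] is the weight query i places on key j; each row sums to 1.
Matrix = Sequence[Sequence[float]]


def attention_entropy(attn: Matrix) -> float:
    """Mean Shannon entropy (in nats) of each query's attention distribution."""
    total = 0.0
    for row in attn:
        # Skip zero weights, since p * log(p) -> 0 as p -> 0.
        total += -sum(p * math.log(p) for p in row if p > 0)
    return total / len(attn)


def center_of_mass_distance(attn: Matrix) -> float:
    """Mean |i - sum_j attn[i][j] * j|: distance from each query to its attention centroid."""
    dists = []
    for i, row in enumerate(attn):
        centroid = sum(p * j for j, p in enumerate(row))
        dists.append(abs(i - centroid))
    return sum(dists) / len(dists)


def locality(attn: Matrix, window: int = 2) -> float:
    """Mean fraction of each query's attention mass on keys within `window` positions."""
    fractions = []
    for i, row in enumerate(attn):
        fractions.append(sum(p for j, p in enumerate(row) if abs(i - j) <= window))
    return sum(fractions) / len(fractions)


if __name__ == "__main__":
    # Toy causal head: each query attends uniformly to itself and all earlier tokens.
    n = 6
    causal_uniform = [[1 / (i + 1) if j <= i else 0.0 for j in range(n)] for i in range(n)]
    print(
        attention_entropy(causal_uniform),
        center_of_mass_distance(causal_uniform),
        locality(causal_uniform, window=1),
    )
```

Under these definitions, a head that spreads attention uniformly over all earlier tokens scores higher entropy, a larger centroid distance, and lower locality than a head that attends mostly to nearby positions, which is the kind of contrast the per-emotion comparisons rely on.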

Paper Structure (34 sections, 7 equations, 12 figures, 8 tables)

Figures (12)

  • Figure 1: Distribution of emotions in a web corpus (RefinedWeb; Penedo et al., 2023).
  • Figure 2: Emotion distributions for TweetQA and FriendsQA.
  • Figure 3: Disparity in performance across emotions for LLaMA-3.1-8B on AURA-QA.
  • Figure 4: Emotion-specific differences in attention geometry across features. The heatmap reports one-vs-rest Cohen’s d effect sizes, comparing each emotion to all others for each attention feature. Colors indicate the direction and magnitude of deviation relative to the global distribution. Features are ordered by across-emotion variance, highlighting attention dimensions most sensitive to emotional tone. Sarcasm is omitted for clarity due to its extreme divergence.
  • Figure 5: Differences in the attentional pattern between emotions. For each query token, attention differences across keys are standardized by subtracting the row mean and dividing by the row standard deviation, highlighting relative redistribution of attention rather than absolute magnitude changes.
  • ...and 7 more figures
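The captions of Figures 4 and 5 describe two standard normalizations: one-vs-rest Cohen's d per attention feature, and row-wise z-scoring of attention-difference matrices. A minimal sketch of both follows, assuming per-passage feature values grouped by emotion label. The captions do not say how the standard deviation is pooled for d or whether the row standard deviation in Figure 5 is the population or sample version, so the pooled-sample-variance form of d, the population row standard deviation, and all function names are assumptions.

```python
import math
import statistics
from typing import Dict, List, Sequence

# Hypothetical helpers illustrating the normalizations in the Figure 4 and 5 captions.


def cohens_d(group: Sequence[float], rest: Sequence[float]) -> float:
    """Cohen's d using the pooled sample standard deviation (each group needs >= 2 values)."""
    n1, n2 = len(group), len(rest)
    pooled_var = (
        (n1 - 1) * statistics.variance(group) + (n2 - 1) * statistics.variance(rest)
    ) / (n1 + n2 - 2)
    return (statistics.mean(group) - statistics.mean(rest)) / math.sqrt(pooled_var)


def one_vs_rest_d(feature_by_emotion: Dict[str, List[float]]) -> Dict[str, float]:
    """For one attention feature, compare each emotion against all other emotions pooled."""
    out = {}
    for emotion, values in feature_by_emotion.items():
        rest = [
            v
            for other, other_values in feature_by_emotion.items()
            if other != emotion
            for v in other_values
        ]
        out[emotion] = cohens_d(values, rest)
    return out


def standardize_rows(diff: Sequence[Sequence[float]]) -> List[List[float]]:
    """Z-score each query row of an attention-difference matrix (population std).

    Constant rows have no spread, so they map to all zeros instead of dividing by 0.
    """
    result = []
    for row in diff:
        mu = statistics.fmean(row)
        sd = statistics.pstdev(row)
        result.append([(x - mu) / sd if sd > 0 else 0.0 for x in row])
    return result
```

A positive d means the emotion pushes that feature above the pooled remaining emotions, and a negative d pushes it below. Because each row of the difference matrix is z-scored independently, the resulting heatmap shows where a query's attention shifts across keys rather than how large the shift is in absolute terms.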