Decoding Emotion in the Deep: A Systematic Study of How LLMs Represent, Retain, and Express Emotion

Jingxiang Zhang, Lujia Zhong

TL;DR

This paper investigates whether modern LLMs encode a structured internal geometry of emotion. The authors build a large, emotion-balanced Reddit corpus (~400k utterances) covering seven categories (Ekman's six basic emotions plus neutral) and apply lightweight probes to frozen Qwen3 and LLaMA models. The probes show that emotional representations emerge early, peak in the middle layers, and become more separable with model scale. System prompts can steer the expressed emotion, and the initial emotional signal remains detectable for hundreds of tokens. The study contributes an open-source probing toolkit and dataset, delivering the first large-scale, layer-wise map of emotion in contemporary LLMs and offering guidance for transparency, alignment, and safety in affective AI. The findings matter in practice for designing controllable, auditable AI that understands or generates emotion with predictable internal dynamics.

Abstract

Large Language Models (LLMs) are increasingly expected to navigate the nuances of human emotion. While research confirms that LLMs can simulate emotional intelligence, their internal emotional mechanisms remain largely unexplored. This paper investigates the latent emotional representations within modern LLMs by asking: how, where, and for how long is emotion encoded in their neural architecture? To address this, we introduce a novel, large-scale Reddit corpus of approximately 400,000 utterances, balanced across seven basic emotions through a multi-stage process of classification, rewriting, and synthetic generation. Using this dataset, we employ lightweight "probes" to read out information from the hidden layers of various Qwen3 and LLaMA models without altering their parameters. Our findings reveal that LLMs develop a surprisingly well-defined internal geometry of emotion, which sharpens with model scale and significantly outperforms zero-shot prompting. We demonstrate that this emotional signal is not a final-layer phenomenon but emerges early and peaks mid-network. Furthermore, the internal states are both malleable (they can be influenced by simple system prompts) and persistent, as the initial emotional tone remains detectable for hundreds of subsequent tokens. We contribute our dataset, an open-source probing toolkit, and a detailed map of the emotional landscape within LLMs, offering crucial insights for developing more transparent and aligned AI systems. The code and dataset are open-sourced.

Paper Structure

This paper contains 30 sections, 7 figures, 4 tables.

Figures (7)

  • Figure 1: 2-D KDE contours of the six Ekman emotions + neutral, showing clear separation in Qwen3-8B's final-layer space. For each emotion, the outer contour is drawn at 25% and the inner contour at 50% of the peak KDE value.
  • Figure 2: Top: Example prompt templates for our three core emotion‑processing tasks. Bottom: Percentage distribution of sentence lengths by emotion category for the three data sources (Natural, Rewritten, Synthetic).
  • Figure 3: The probing architecture. An input utterance is passed through the frozen LLM. At a selected layer $\ell$, a representation vector is extracted (e.g., from the final token's hidden state). This vector is then fed into a lightweight, two-layer MLP probe trained to classify the emotion.
  • Figure 4: KDE contour plots and corresponding confusion matrices for the final-layer emotion probes, with one column per model. The top row shows the KDE contours at 25% (outer) and 50% (inner) of the peak density for each emotion, and the bottom row shows the confusion matrix. As the model scale increases, clusters become tighter and more separable, and the confusion matrices grow more diagonally dominant.
  • Figure 5: Layer-wise emergence of separable emotion clusters in Qwen3-4B. 2-D KDE maps of probe outputs at layers 9 (25%), 18 (50%), 27 (75%), and 36 (100%).
  • ...and 2 more figures
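
The probing setup described in the Figure 3 and Figure 5 captions can be illustrated with a minimal, dependency-free sketch. It is not the authors' toolkit: the hidden states are random stand-ins for what a frozen LLM would produce, the probe weights are untrained, and the ReLU activation, the probe width, the label order, the helper names (`layer_at_depth`, `final_token_state`, `make_probe`, `probe_forward`), and the convention that index 0 holds the embedding output are all assumptions made for illustration. The sketch only shows the data flow: pick a layer by relative depth, take the final token's hidden state, and pass it through a two-layer MLP that outputs a distribution over the seven emotion classes.

```python
import math
import random

# Six Ekman emotions plus neutral; the label names and order are assumptions.
EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise", "neutral"]


def layer_at_depth(num_layers, fraction):
    """Map a relative depth in (0, 1] to a 1-based layer index."""
    return max(1, round(num_layers * fraction))


def final_token_state(hidden_states, layer):
    """Return the last token's vector at the given layer.

    hidden_states is indexed as [layer][token][dim]; index 0 is assumed
    to hold the embedding output, so layer k sits at index k.
    """
    return hidden_states[layer][-1]


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_probe(in_dim, hidden_dim, num_classes, rng):
    """Create an untrained two-layer MLP probe with small random weights."""
    def init(rows, cols):
        scale = 1.0 / math.sqrt(cols)
        return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

    return {
        "w1": init(hidden_dim, in_dim), "b1": [0.0] * hidden_dim,
        "w2": init(num_classes, hidden_dim), "b2": [0.0] * num_classes,
    }


def probe_forward(probe, features):
    """Linear -> ReLU -> Linear -> softmax (ReLU is an assumed activation)."""
    hidden = [max(0.0, v) for v in linear(features, probe["w1"], probe["b1"])]
    return softmax(linear(hidden, probe["w2"], probe["b2"]))


rng = random.Random(0)
NUM_LAYERS, NUM_TOKENS, HIDDEN_DIM = 36, 5, 16  # toy sizes; 36 layers as in Figure 5

# Random stand-in for the frozen model's hidden states (embeddings + 36 layers).
hidden_states = [
    [[rng.gauss(0.0, 1.0) for _ in range(HIDDEN_DIM)] for _ in range(NUM_TOKENS)]
    for _ in range(NUM_LAYERS + 1)
]

layer = layer_at_depth(NUM_LAYERS, 0.5)           # mid-network layer
features = final_token_state(hidden_states, layer)
probe = make_probe(HIDDEN_DIM, 32, len(EMOTIONS), rng)
probs = probe_forward(probe, features)
predicted = EMOTIONS[max(range(len(probs)), key=probs.__getitem__)]
```

In a real experiment the probe's weights would be trained on labelled utterances while the LLM's parameters stay fixed, and the same procedure would be repeated at each depth of interest, for example the 25%, 50%, 75%, and 100% depths listed in the Figure 5 caption.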