Decoding Emotion in the Deep: A Systematic Study of How LLMs Represent, Retain, and Express Emotion
Jingxiang Zhang, Lujia Zhong
TL;DR
This paper investigates whether modern LLMs encode a structured internal geometry of emotion. It builds a large, emotion-balanced Reddit corpus (~400k utterances) spanning Ekman’s seven categories and trains lightweight probes on the hidden states of frozen Qwen3 and LLaMA models. The results show three things. Emotional representations emerge early, peak in the middle layers, and become more separable with model scale. System prompts can steer the expressed emotion. Emotional signals persist for hundreds of tokens. The study contributes an open-source probing toolkit and dataset, delivers the first large-scale, layer-wise map of emotion in contemporary LLMs, and offers guidance for transparency, alignment, and safety in affective AI. In practice, the findings support the design of controllable, auditable AI systems whose internal dynamics stay predictable when they understand or generate emotion.
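The layer-wise probing idea summarized above can be illustrated with a minimal sketch. It assumes that hidden states have already been extracted from a frozen model (one vector per utterance per layer) and that the probe is a linear softmax classifier. The paper's exact probe architecture and training setup are not given here, so the helper names (`train_softmax_probe`, `layerwise_sweep`, `make_toy_layers`) and the toy data are hypothetical. The code is pure Python for self-containment.

```python
import math
import random

# Hypothetical sketch: one linear softmax probe per layer, trained on hidden
# states assumed to have been extracted from a frozen LLM beforehand.


def train_softmax_probe(X, y, n_classes, epochs=50, lr=0.05, seed=0):
    """Fit multinomial logistic regression with plain SGD; returns (W, b)."""
    rng = random.Random(seed)
    dim = len(X[0])
    W = [[0.0] * dim for _ in range(n_classes)]
    b = [0.0] * n_classes
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x = X[i]
            logits = [sum(w * v for w, v in zip(W[c], x)) + b[c] for c in range(n_classes)]
            top = max(logits)  # subtract the max for numerical stability
            exps = [math.exp(z - top) for z in logits]
            total = sum(exps)
            for c in range(n_classes):
                # d(cross-entropy)/d(logit_c) = p_c - 1[c == y_i]
                g = exps[c] / total - (1.0 if c == y[i] else 0.0)
                for j in range(dim):
                    W[c][j] -= lr * g * x[j]
                b[c] -= lr * g
    return W, b


def predict(W, b, x):
    """Return the class with the highest linear score."""
    scores = [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]
    return max(range(len(scores)), key=scores.__getitem__)


def layerwise_sweep(states_by_layer, labels, n_classes, train_frac=0.8):
    """Train one probe per layer and return held-out accuracy for each layer."""
    n_train = int(len(labels) * train_frac)
    results = {}
    for layer, X in states_by_layer.items():
        W, b = train_softmax_probe(X[:n_train], labels[:n_train], n_classes)
        held_out = zip(X[n_train:], labels[n_train:])
        correct = sum(predict(W, b, x) == t for x, t in held_out)
        results[layer] = correct / (len(labels) - n_train)
    return results


def make_toy_layers(n=300, n_classes=3, seed=0):
    """Synthetic stand-in for extracted hidden states (NOT real model activations)."""
    rng = random.Random(seed)
    labels = [i % n_classes for i in range(n)]
    # One "layer" whose features ignore the label, and one that encodes it linearly.
    noise = [[rng.gauss(0, 1) for _ in range(n_classes)] for _ in labels]
    separable = [[3.0 * (c == t) + rng.gauss(0, 0.3) for c in range(n_classes)] for t in labels]
    return {"noise": noise, "separable": separable}, labels
```

`make_toy_layers` only generates one clearly uninformative feature set and one clearly separable feature set to exercise the code. It says nothing about where emotion peaks in a real model. At the corpus scale described here, one would more likely use numpy or scikit-learn and a proper validation split.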
Abstract
Large Language Models (LLMs) are increasingly expected to navigate the nuances of human emotion. While prior research shows that LLMs can simulate emotional intelligence, their internal emotional mechanisms remain largely unexplored. This paper investigates the latent emotional representations within modern LLMs by asking how, where, and for how long emotion is encoded in their neural architecture. To address this, we introduce a novel, large-scale Reddit corpus of approximately 400,000 utterances, balanced across seven basic emotions through a multi-stage process of classification, rewriting, and synthetic generation. Using this dataset, we train lightweight "probes" that read out information from the hidden layers of various Qwen3 and LLaMA models without altering their parameters. Our findings reveal that LLMs develop a surprisingly well-defined internal geometry of emotion that sharpens with model scale, and that probes reading this geometry significantly outperform zero-shot prompting. We show that this emotional signal is not a final-layer phenomenon: it emerges early and peaks mid-network. Furthermore, the internal states are both malleable, since simple system prompts can influence them, and persistent, since the initial emotional tone remains detectable for hundreds of subsequent tokens. We contribute our dataset and an open-source probing toolkit, both publicly released, together with a detailed map of the emotional landscape within LLMs, offering crucial insights for developing more transparent and aligned AI systems.
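Persistence can be operationalized in several ways. The sketch below assumes one simple definition: the number of consecutive generated tokens for which a linear probe, trained on the prompt's emotion, still predicts that emotion. The probe weights, the hidden-state extraction, and the helper names (`probe_predict`, `persistence_length`, `toy_decaying_states`) are hypothetical, and the paper's actual measurement may differ.

```python
# Hypothetical sketch: count how long a trained probe keeps predicting the
# emotion of an initial prefix as generation continues. Assumes one hidden-state
# vector per generated token, all taken from the same layer.


def probe_predict(W, b, x):
    """Argmax class of a linear probe with weights W (classes x dim) and bias b."""
    scores = [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]
    return max(range(len(scores)), key=scores.__getitem__)


def persistence_length(token_states, W, b, target):
    """Number of consecutive tokens, from the first, still classified as `target`."""
    length = 0
    for x in token_states:
        if probe_predict(W, b, x) != target:
            break  # the run ends at the first token assigned to another emotion
        length += 1
    return length


def toy_decaying_states(n_tokens=40, decay=0.9, baseline=0.2):
    """Toy states whose 'emotion' feature decays geometrically (NOT real activations)."""
    return [[decay ** t, baseline] for t in range(n_tokens)]
```

Usage: with `W = [[1.0, 0.0], [0.0, 1.0]]` and `b = [0.0, 0.0]`, `persistence_length(toy_decaying_states(), W, b, target=0)` returns 16, because the decaying feature 0.9^t stays above the 0.2 baseline only for t = 0 through 15. A looser definition, such as a majority vote over a sliding window, would tolerate noisy single-token flips at the cost of sensitivity.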
