Don't Get Too Excited -- Eliciting Emotions in LLMs
Gino Franco Fazzi, Julie Skoven Hinge, Stefan Heinrich, Paolo Burelli
TL;DR
The paper examines how well open-weight LLMs express and maintain target emotional states in dialogue, described in the Valence-Arousal ($V$, $A$) space. It introduces a two-stage methodology that combines LLM-based sentiment analysis with multiturn, persona-driven conversations, and evaluates outputs with a Self-Assessment Manikin (SAM)-based mapping of $V$ and $A$ to 25 states. After assessing 12 open-weight LLMs, it focuses on four top performers and examines how the partner's emotion and conditioned personality prompts affect VA trajectories across 20-turn interactions, revealing notable inter-model variability and convergence dynamics. The findings indicate that although some models can reproduce targeted VA values, responses to extreme emotional prompts often normalize toward neutral states, and conversational dynamics can moderate affective outputs. These insights bear on the development of emotionally intelligent AI and highlight the need for improved affect modeling, human-in-the-loop evaluation, and multimodal approaches that capture richer emotional cues in real-world interactions.
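To make the 25-state VA mapping and the drift toward neutral more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that the 25 states come from rating $V$ and $A$ on a 5-point scale each (the summary only gives the total), and that the grid midpoint (3, 3) stands for the neutral state. All names (`va_states`, `distance_to_neutral`, `drifts_to_neutral`) are hypothetical.

```python
from itertools import product

# Assumption: each dimension is rated 1..5, giving 5 x 5 = 25 states.
# The summary only states "25 states"; the 5-point scale is inferred.
LEVELS = range(1, 6)
NEUTRAL = (3, 3)  # assumed midpoint of the grid, used as "neutral"


def va_states():
    """Return all (valence, arousal) pairs on the assumed 5x5 grid."""
    return list(product(LEVELS, LEVELS))


def distance_to_neutral(state):
    """Euclidean distance from a (V, A) state to the assumed midpoint."""
    v, a = state
    return ((v - NEUTRAL[0]) ** 2 + (a - NEUTRAL[1]) ** 2) ** 0.5


def drifts_to_neutral(trajectory):
    """True if the final state is closer to neutral than the first one."""
    return distance_to_neutral(trajectory[-1]) < distance_to_neutral(trajectory[0])
```

For example, a made-up trajectory that starts at the extreme state (5, 5) and ends at (3, 4) would count as drifting toward neutral under this definition. A real analysis would run over all 20 turns and likely use a more careful convergence measure.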
Abstract
This paper investigates the challenges of affect control in large language models (LLMs), focusing on their ability to express appropriate emotional states during extended dialogues. We evaluate state-of-the-art open-weight LLMs to assess their affective expressive range in terms of arousal and valence. Our study employs a novel methodology combining LLM-based sentiment analysis with multiturn dialogue simulations between LLMs. We quantify the models' capacity to express a wide spectrum of emotions and how these emotions fluctuate during interactions. Our findings reveal significant variations among LLMs in their ability to maintain consistent affect, with some models demonstrating more stable emotional trajectories than others. Furthermore, we identify key challenges in affect control, including difficulties in producing and maintaining extreme emotional states and limitations in adapting affect to changing conversational contexts. These findings have important implications for the development of more emotionally intelligent AI systems and highlight the need for improved affect modelling in LLMs.
