AI with Emotions: Exploring Emotional Expressions in Large Language Models

Shin-nosuke Ishikawa, Atsushi Yoshino

TL;DR

This work investigates whether contemporary LLMs can express controllable emotions when their outputs are conditioned on states from Russell's Circumplex model of arousal and valence. It prompts the models with 12 evenly spaced arousal-valence states (unit-length vectors with $\|v\|=1$) and evaluates responses from multiple closed and open models using an independent GoEmotions-based sentiment classifier, mapping each output to a $(\text{valence}, \text{arousal})$ point and measuring its cosine similarity to the specified state. Results show generally positive alignment, with GPT-4, GPT-4 Turbo, and Llama3 70B Instruct achieving the strongest consistency across questions, while GPT-3.5 Turbo lags; some open-model responses occasionally violate the role-play instructions. The study demonstrates the feasibility of emotion-controlled text generation for emotion-aware AI agents, outlines practical applications, and highlights the need to study emotional dynamics, cultural variability, and ethical considerations in deployment.

Abstract

The human-level performance of Large Language Models (LLMs) across various tasks has raised expectations for the potential of Artificial Intelligence (AI) to possess emotions someday. To explore the capability of current LLMs to express emotions in their outputs, we conducted an experiment using several LLMs (OpenAI GPT, Google Gemini, Meta Llama3, and Cohere Command R+) to role-play as agents answering questions with specified emotional states. We defined the emotional states using Russell's Circumplex model, a well-established framework that characterizes emotions along the sleepy-activated (arousal) and pleasure-displeasure (valence) axes. We chose this model for its simplicity, utilizing two continuous parameters, which allows for better controllability in applications involving continuous changes in emotional states. The responses generated were evaluated using a sentiment analysis model, independent of the LLMs, trained on the GoEmotions dataset. The evaluation showed that the emotional states of the generated answers were consistent with the specifications, demonstrating the LLMs' capability for emotional expression. This indicates the potential for LLM-based AI agents to simulate emotions, opening up a wide range of applications for emotion-based interactions, such as advisors or consultants who can provide advice or opinions with a personal touch.
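
The setup summarized above (12 evenly spaced unit-length states, scored by cosine similarity) can be illustrated with a minimal sketch. It is not the authors' code: the function names, the starting angle of 0 degrees, and the convention valence = cos(theta), arousal = sin(theta) are assumptions, chosen because they reproduce the state quoted in the Figure 3 caption (arousal 0.866, valence -0.5, at 120 degrees). How the paper treats a zero-length vector such as the "neutral" label is not stated here, so returning 0.0 in that case is also an assumption.

```python
import math


def emotion_states(n=12):
    """Return n (valence, arousal) unit vectors evenly spaced on the circle.

    Assumption: angles start at 0 degrees and step by 360/n degrees,
    with valence = cos(theta) and arousal = sin(theta).
    """
    states = []
    for k in range(n):
        theta = 2.0 * math.pi * k / n
        states.append((math.cos(theta), math.sin(theta)))
    return states


def cosine_similarity(u, v):
    """Cosine similarity between two 2-D (valence, arousal) vectors."""
    norm = math.hypot(u[0], u[1]) * math.hypot(v[0], v[1])
    if norm == 0.0:
        # Assumption: a zero-length vector (e.g. a neutral prediction at
        # the origin) has no direction, so report 0.0 instead of failing.
        return 0.0
    return (u[0] * v[0] + u[1] * v[1]) / norm


if __name__ == "__main__":
    states = emotion_states()
    specified = states[4]   # 120 degrees: valence -0.5, arousal ~0.866
    opposite = states[10]   # 300 degrees: valence 0.5, arousal ~-0.866
    print(round(cosine_similarity(specified, specified), 3))
    print(round(cosine_similarity(specified, opposite), 3))
```

A similarity near 1 means the evaluated emotion points in the same direction as the specified one, a value near 0 means it is a quarter turn away, and a value near -1 means the opposite emotion was expressed.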

Paper Structure

This paper contains 15 sections, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Input prompt for text generation with a specified emotion expression in the presented experiment. The specified arousal and valence values are filled in during the experiment.
  • Figure 2: Mapping of the GoEmotions labels in the arousal--valence space, as detailed in Table \ref{tab_word_correspondence}. All labels are positioned at a distance of 1 from the origin, with the exception of the "neutral" label. Refer to the text for more details.
  • Figure 3: Examples of answers generated with specified emotional states: (1) GPT-4 with arousal: 0.866, valence: -0.5 for question 1, (2) GPT-3.5 Turbo with the same state for question 1, (3) GPT-4 with the opposite state, arousal: -0.866, valence: 0.5, for question 1, and (4) GPT-4 with arousal: 0.866, valence: -0.5 for question 4.
  • Figure 4: Correlation of emotional states in radial coordinates in the arousal--valence space between the state specified in the input prompt and the evaluated state of the output. The thick solid black lines indicate identical angles (i.e., perfectly reproduced emotional states), while the gray solid and dashed lines represent deviations of $\pm 180^\circ$ and $\pm 90^\circ$, respectively.
  • Figure 5: Histogram of the cosine similarities between correct and predicted labels in the arousal--valence space. The histogram peaks at 1.0, indicating that a significant number of the texts are classified correctly.
  • ...and 3 more figures
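
The angle comparison described in the Figure 4 caption can be sketched as follows. This is an illustrative assumption rather than the paper's procedure: the helper names are hypothetical, and the angle is taken as atan2(arousal, valence), so that valence lies along 0 degrees and arousal along 90 degrees. The deviation is wrapped into the range from -180 (inclusive) to 180 (exclusive) degrees, so a perfectly reproduced state gives 0 and an opposite state gives a magnitude of 180.

```python
import math


def angle_deg(valence, arousal):
    """Angle of a (valence, arousal) point in degrees, in [0, 360).

    Assumption: valence is the horizontal axis and arousal the vertical one.
    """
    return math.degrees(math.atan2(arousal, valence)) % 360.0


def angular_deviation(specified, evaluated):
    """Signed angle from the specified state to the evaluated state.

    Both arguments are (valence, arousal) pairs. The result is wrapped
    into [-180, 180) degrees.
    """
    diff = angle_deg(*evaluated) - angle_deg(*specified)
    return (diff + 180.0) % 360.0 - 180.0


if __name__ == "__main__":
    specified = (-0.5, 0.866)   # state quoted in the Figure 3 caption
    evaluated = (0.5, -0.866)   # the opposite state from the same caption
    print(round(angle_deg(*specified), 1))
    print(round(abs(angular_deviation(specified, evaluated)), 1))
```

Wrapping the difference keeps states on either side of the 0/360 degree boundary close together, for example 350 degrees and 10 degrees differ by 20 degrees rather than 340.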