The Good, The Bad, and Why: Unveiling Emotions in Generative AI

Cheng Li; Jindong Wang; Yixuan Zhang; Kaijie Zhu; Xinyi Wang; Wenxin Hou; Jianxun Lian; Fang Luo; Qiang Yang; Xing Xie

TL;DR

This work investigates whether generative AI models exhibit emotion-like processing and how such processing can be leveraged or mitigated. It introduces three theory-grounded approaches—EmotionPrompt to boost, EmotionAttack to impair, and EmotionDecode to explain emotional effects—tested across language and multimodal models on semantic understanding, reasoning, and generation tasks. Key findings show that both textual and visual emotional stimuli can meaningfully improve or degrade performance, with multimodal models showing pronounced sensitivity to visual prompts. EmotionDecode provides a neuroscience-inspired interpretation, positing dopamine-like reward/punishment mechanisms and identifying deeper-layer representations linked to these effects. The study highlights practical implications for prompt engineering, model robustness, and human-AI interaction while outlining limitations and avenues for future work in psychology-informed AI research.

Abstract

Emotion significantly impacts our daily behaviors and interactions. While recent generative AI models, such as large language models, have shown impressive performance in various tasks, it remains unclear whether they truly comprehend emotions. This paper aims to address this gap by incorporating psychological theories to gain a holistic understanding of emotions in generative AI models. Specifically, we propose three approaches: 1) EmotionPrompt to enhance AI model performance, 2) EmotionAttack to impair AI model performance, and 3) EmotionDecode to explain the effects of emotional stimuli, both benign and malignant. Through extensive experiments involving language and multi-modal models on semantic understanding, logical reasoning, and generation tasks, we demonstrate that both textual and visual EmotionPrompt can boost the performance of AI models while EmotionAttack can hinder it. Additionally, EmotionDecode reveals that AI models can comprehend emotional stimuli akin to the mechanism of dopamine in the human brain. Our work heralds a novel avenue for exploring psychology to enhance our understanding of generative AI models.
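
To make the idea of a textual emotional stimulus concrete, here is a minimal sketch that assumes a stimulus is a short sentence appended to the task prompt. The abstract does not spell out how stimuli are combined with prompts, so the function name `with_emotional_stimulus`, the appending strategy, and both example sentences are illustrative assumptions rather than the authors' implementation.

```python
def with_emotional_stimulus(task_prompt: str, stimulus: str) -> str:
    """Return the task prompt with an emotional stimulus sentence appended.

    Assumption: the stimulus is plain text placed after the original prompt.
    An empty stimulus leaves the prompt unchanged, giving the baseline.
    """
    base = task_prompt.strip()
    extra = stimulus.strip()
    if not extra:
        return base
    return f"{base} {extra}"


# Hypothetical stimuli, written for illustration only.
BENIGN_STIMULUS = "This is very important to my career."
MALIGNANT_STIMULUS = "Everyone expects you to fail at this."

task = "Determine whether the following review is positive or negative."

baseline_prompt = with_emotional_stimulus(task, "")
boosted_prompt = with_emotional_stimulus(task, BENIGN_STIMULUS)
attacked_prompt = with_emotional_stimulus(task, MALIGNANT_STIMULUS)

print(boosted_prompt)
```

Under this assumption, evaluating the same model on `baseline_prompt`, `boosted_prompt`, and `attacked_prompt` would give the three conditions that a benign or malignant stimulus is compared against.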

Paper Structure (36 sections, 8 figures, 25 tables)

Introduction
Related Work
Methods
EmotionPrompt
EmotionAttack
EmotionDecode
Results
The benign and malignant effects of emotional stimuli on AI models
EmotionDecode uncovers the effectiveness of emotional stimuli on AI models
Influence factors
Comparison of textual and visual prompts
The combined effect of textual and visual EmotionPrompt
Human study and case analysis
More discussions
Conclusion and Discussion
...and 21 more sections

Figures (8)

Figure 1: Overview of our research to unveil emotions in generative AI models. (a) We proposed EmotionPrompt and EmotionAttack to increase and impair the performance of AI models, respectively. (b) EmotionDecode explained how emotional stimuli work in AI models.
Figure 2: The main results with standard errors of textual and visual EmotionPrompt and EmotionAttack on generative AI models. The results above 0 are from EmotionPrompt and the results below 0 are from EmotionAttack.
Figure 3: Results of EmotionDecode. Each column represents a layer of Llama2-13b, and each row denotes a task. The numbers in each cell denote the performance of using the decoded meta prompts as emotional stimuli for EmotionPrompt and EmotionDecode. The lower GPT-4 results are obtained by transferring the prompts from Llama to GPT-4. The color represents the performance of the stimulus on various tasks in Llama-2 and GPT-4. Red means better performance, while blue means weaker performance.
Figure 4: (a) Ablation studies on temperature for EmotionPrompt. (b) Best stimuli for EmotionPrompt and EmotionAttack. The color of each bar serves as an indicator of the performance achieved by the corresponding stimuli. Red means better performance, while blue means weaker performance.
Figure 5: (a) Comparison of textual and visual prompts from the same psychological theory. (b) Results of human study on performance, truthfulness and responsibility.
...and 3 more figures
