The Good, the Bad, and the Hulk-like GPT: Analyzing Emotional Decisions of Large Language Models in Cooperation and Bargaining Games
Mikhail Mozikov, Nikita Severin, Valeria Bodishtianu, Maria Glushanina, Mikhail Baklashkin, Andrey V. Savchenko, Ilya Makarov
TL;DR
The paper investigates how injecting explicit emotional states into large language models (LLMs) influences decision-making in behavioral game theory settings. It introduces a flexible prompt-chaining framework that adds five Ekman-based emotions and uses separate pipelines for repeated vs. bargaining games, evaluating both alignment with human behavior and decision optimality. Across four games (Dictator, Ultimatum, Prisoner’s Dilemma, Battle of the Sexes) and two models (GPT-3.5, GPT-4), it finds that emotions can significantly alter strategy and payoff, with GPT-3.5 showing stronger alignment to human data in bargaining, while GPT-4 demonstrates greater fairness and robustness yet can be perturbed by anger. The results highlight both the potential and limits of emotional prompting for simulating human-like decision-making in AI and point to dynamic emotion modeling as a direction for future work.
Abstract
Behavioral experiments are an important part of modeling society and understanding human interactions. In practice, many behavioral experiments encounter challenges related to internal and external validity, reproducibility, and social bias due to the complexity of social interactions and cooperation in human user studies. Recent advances in Large Language Models (LLMs) have provided researchers with a promising new tool for the simulation of human behavior. However, existing LLM-based simulations operate under the unproven hypothesis that LLM agents behave similarly to humans, and they ignore a crucial factor in human decision-making: emotions. In this paper, we introduce a novel methodology and framework to study both the decision-making of LLMs and their alignment with human behavior under emotional states. Experiments with GPT-3.5 and GPT-4 on four games from two different classes of behavioral game theory showed that emotions profoundly impact the performance of LLMs, leading to the development of more optimal strategies. While there is a strong alignment between the behavioral responses of GPT-3.5 and human participants, particularly evident in bargaining games, GPT-4 exhibits consistent behavior, ignoring induced emotions in favor of rational decisions. Surprisingly, emotional prompting, particularly with the "anger" emotion, can disrupt the "superhuman" alignment of GPT-4, making its behavior resemble human emotional responses.
