StressPrompt: Does Stress Impact Large Language Models and Human Performance Similarly?
Guobin Shen, Dongcheng Zhao, Aorigele Bao, Xiang He, Yiting Dong, Yi Zeng
TL;DR
The paper investigates whether stress affects Large Language Models (LLMs) the way it affects humans. It introduces StressPrompt, a dataset of 100 psychology-grounded prompts annotated for stress level, and evaluates LLMs under these prompts across multiple benchmarks. Alongside prompt engineering, a Representation Engineering–inspired Stress Scanner quantifies how the prompts change the models' hidden states and relates those changes to task performance. The results are interpreted through the Yerkes-Dodson law, which holds that moderate arousal optimizes performance. The key findings are that LLMs generally peak at moderate stress levels for reasoning, instruction following, and emotional intelligence, while high stress degrades certain faculties such as bias detection; deeper layers exhibit stronger neural signatures of stress, especially at the last token. The work offers practical guidance for designing robust AI systems that maintain high performance in real-world, stress-prone environments, and it contributes a framework for studying human-like cognitive dynamics in artificial agents.
Abstract
Human beings often experience stress, which can significantly influence their performance. This study explores whether Large Language Models (LLMs) exhibit stress responses similar to those of humans and whether their performance fluctuates under different stress-inducing prompts. To investigate this, we developed a novel set of prompts, termed StressPrompt, designed to induce varying levels of stress. These prompts were derived from established psychological frameworks and carefully calibrated based on ratings from human participants. We then applied these prompts to several LLMs to assess their responses across a range of tasks, including instruction-following, complex reasoning, and emotional intelligence. The findings suggest that LLMs, like humans, perform optimally under moderate stress, consistent with the Yerkes-Dodson law. Notably, their performance declines under both low- and high-stress conditions. Our analysis further revealed that these StressPrompts significantly alter the internal states of LLMs, leading to changes in their neural representations that mirror human responses to stress. This research provides critical insights into the operational robustness and flexibility of LLMs, demonstrating the importance of designing AI systems capable of maintaining high performance in real-world scenarios where stress is prevalent, such as in customer service, healthcare, and emergency response contexts. Moreover, this study contributes to the broader AI research community by offering a new perspective on how LLMs handle different scenarios and their similarities to human cognition.
