Emulating Aggregate Human Choice Behavior and Biases with GPT Conversational Agents
Stephen Pilli, Vivek Nallur
TL;DR
This work investigates whether large language models can emulate biased human decision-making at the individual level within conversational contexts. Using a human experiment (N=1100) across three classic decision scenarios and two levels of prior-dialogue complexity, the study benchmarks status quo bias in chatbot-assisted decisions. It then tests whether LLMs (GPT-4 and GPT-5), conditioned on participants' demographic cues and prior dialogue transcripts, can predict human biases. Results show robust status quo effects in humans for Budget Allocation and College Jobs, and modest amplification under cognitive load. LLM agents replicate these effects inconsistently, depending on the prompting strategy; HL3 prompts yield stronger but sometimes misleading bias alignment. The findings demonstrate both the potential and the limitations of LLM-based behavioral simulation for bias-aware interactive systems, and they emphasize the need for careful prompt design and validation when modeling individual-level decision dynamics.
Abstract
Cognitive biases often shape human decisions. While large language models (LLMs) have been shown to reproduce well-known biases, a more critical question is whether LLMs can predict biases at the individual level and emulate the dynamics of biased human behavior when contextual factors, such as cognitive load, interact with these biases. We adapted three well-established decision scenarios to a conversational setting and conducted a human experiment (N=1100). Participants engaged with a chatbot that facilitated decision-making through either simple or complex dialogues. Results revealed robust biases. To evaluate how LLMs emulate human decision-making under similar interactive conditions, we used participant demographics and dialogue transcripts to simulate these conditions with LLMs based on GPT-4 and GPT-5. The LLMs reproduced human biases with precision. We also found notable differences between models in how closely they aligned with human behavior. These results have important implications for designing and evaluating adaptive, bias-aware LLM-based AI systems in interactive contexts.
