Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs
Xuhui Zhou, Zhe Su, Tiwalayo Eisape, Hyunwoo Kim, Maarten Sap
TL;DR
This paper interrogates the realism of LLM-based social simulations by contrasting Script (omniscient) and Agents (information-asymmetric) modes within a Sotopia-inspired framework. It shows that Script mode substantially inflates goal success and naturalness compared with Agents mode, revealing a core challenge: the information asymmetry inherent in realistic human interactions. The authors further test learning from Script-generated data via finetuning and find selective improvements, accompanied by biases that degrade generalization to real-world settings. They propose reporting standards in the form of a Simulation Card and outline avenues for improving realism, such as modeling theory of mind and external context rather than relying on omniscient access. Overall, the work clarifies the limits of current simulation paradigms and provides practical guidelines for more credible training and evaluation of AI agents in social tasks.
Abstract
Recent advances in large language models (LLMs) have enabled richer social simulations, allowing for the study of various social phenomena. However, most recent work has adopted an omniscient perspective on these simulations (e.g., a single LLM generating all interlocutors), which is fundamentally at odds with the non-omniscient, information-asymmetric interactions that involve humans and AI agents in the real world. To examine these differences, we develop an evaluation framework to simulate social interactions with LLMs in various settings (omniscient, non-omniscient). Our experiments show that LLMs perform better in unrealistic, omniscient simulation settings but struggle in ones that more accurately reflect real-world conditions with information asymmetry. Our findings indicate that addressing information asymmetry remains a fundamental challenge for LLM-based agents.
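The difference between the omniscient Script mode and the non-omniscient Agents mode comes down to what each generator is allowed to see. The Python sketch below is a minimal illustration under stated assumptions: the `Scenario` class, the two prompt-building functions, and the roommates example are hypothetical and are not taken from the paper or the Sotopia codebase. No LLM is called; the sketch only assembles the prompts, showing that a Script-mode generator receives every character's private goal while each Agents-mode prompt contains only that character's own goal.

```python
from dataclasses import dataclass


# Hypothetical scenario structure (illustrative only, not Sotopia's API):
# a shared setting plus one private social goal per character.
@dataclass(frozen=True)
class Scenario:
    setting: str
    goals: dict[str, str]  # character name -> private goal


def script_mode_prompt(scenario: Scenario) -> str:
    """Omniscient (Script) mode: one generator sees every character's goal
    and writes the whole conversation."""
    lines = [f"Setting: {scenario.setting}"]
    for name, goal in scenario.goals.items():
        lines.append(f"{name}'s goal: {goal}")
    lines.append("Write the full conversation between all characters.")
    return "\n".join(lines)


def agents_mode_prompts(scenario: Scenario) -> dict[str, str]:
    """Non-omniscient (Agents) mode: each character is driven by a separate
    prompt that includes only its own goal (information asymmetry)."""
    prompts = {}
    for name, goal in scenario.goals.items():
        prompts[name] = (
            f"Setting: {scenario.setting}\n"
            f"You are {name}. Your goal: {goal}\n"
            "You do not know the other character's goal. Write your next turn."
        )
    return prompts


# Usage: build both kinds of prompts for one hypothetical scenario.
scenario = Scenario(
    setting="Two roommates discuss who takes the larger bedroom.",
    goals={"Alex": "Get the larger bedroom.", "Sam": "Pay less rent."},
)
print(script_mode_prompt(scenario))

# Check that no Agents-mode prompt leaks another character's goal.
for name, prompt in agents_mode_prompts(scenario).items():
    others = [g for n, g in scenario.goals.items() if n != name]
    assert all(g not in prompt for g in others)
```

Because the other character's goal never appears in an Agents-mode prompt, the asymmetry is enforced structurally rather than by asking the model to ignore information it has already seen, which is the kind of hidden information the Script mode removes.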
