Can LLMs Replace Economic Choice Prediction Labs? The Case of Language-based Persuasion Games
Eilam Shapira, Omer Madmon, Roi Reichart, Moshe Tennenholtz
TL;DR
The paper demonstrates that large language models can generate synthetic data for training predictors of human choices in language-based persuasion games, and that these predictors sometimes outperform ones trained on actual human data when sample sizes are large enough. It further shows that fine-tuning LLMs on human data can yield both direct predictors and high-quality data generators, with a calibration–accuracy trade-off that can be mitigated by the Double Use of human data for Augmented Learning (DUAL). Crucially, the study identifies interaction history as a central driver of human decision-making in repeated interactions: history-based patterns enable more accurate predictions than sentiment cues alone. The findings suggest a scalable, data-efficient path for modeling human decision-making in linguistically rich economic settings, with implications for synthetic data generation, model calibration, and ethical considerations in AI-driven behavioral research.
Abstract
Human choice prediction in economic contexts is crucial for applications in marketing, finance, public policy, and more. This task, however, is often constrained by the difficulty of acquiring human choice data. With most experimental economics studies focusing on simple choice settings, the AI community has explored whether LLMs can substitute for humans in these predictions and examined more complex experimental economics settings. However, a key question remains: can LLMs generate training data for human choice prediction? We explore this in language-based persuasion games, a complex economic setting involving natural language in strategic interactions. Our experiments show that models trained on LLM-generated data can effectively predict human behavior in these games and even outperform models trained on actual human data. Beyond data generation, we investigate the dual role of LLMs as both data generators and predictors, presenting a comprehensive empirical study of the effectiveness of using LLMs for data generation, human choice prediction, or both. We then use our choice prediction framework to analyze how strategic factors shape decision-making, showing that interaction history (rather than linguistic sentiment alone) plays a key role in predicting human decision-making in repeated interactions. In particular, when LLMs capture history-dependent decision patterns similarly to humans, their predictive success improves substantially. Finally, we demonstrate the robustness of our findings across alternative persuasion-game settings, highlighting the broader potential of using LLM-generated data to model human decision-making.
