Evaluating the Use of Large Language Models as Synthetic Social Agents in Social Science Research
Emma Rose Madden
TL;DR
The paper argues that although Large Language Models (LLMs) can serve as high-capacity pattern matchers for quasi-predictive interpolation in social science, they are not principled Bayesian reasoners and do not provide true probabilistic inferences about populations. It proposes a reframed usage with explicit scope conditions and practical guardrails (independent draws, preregistered human baselines, reliability-aware validation, and subgroup calibration) that supports prototyping and forecasting while avoiding overinterpretation. The analysis highlights phenomena such as introspective hallucination and order-dependence (CID violations), and stresses the need for stress-testing, transparency, and domain-specific calibration rather than universal applicability. Overall, the work outlines a pragmatic, cautionary path for using synthetic LLM agents that prioritizes calibration, reproducibility, and bounded generalization in social science research.
Abstract
Large Language Models (LLMs) are increasingly used as synthetic agents in social science, in applications ranging from augmenting survey responses to powering multi-agent simulations. This paper outlines cautions to observe when interpreting LLM outputs and proposes a pragmatic reframing for the social sciences: LLMs should be used as high-capacity pattern matchers for quasi-predictive interpolation under explicit scope conditions, not as substitutes for probabilistic inference. Practical guardrails, such as independent draws, preregistered human baselines, reliability-aware validation, and subgroup calibration, are introduced so that researchers can engage in useful prototyping and forecasting while avoiding category errors.
