LLM Social Simulations Are a Promising Research Method

Jacy Reese Anthis, Ryan Liu, Sean M. Richardson, Austin C. Kozlowski, Bernard Koch, James Evans, Erik Brynjolfsson, Michael Bernstein

TL;DR

This position paper argues that large language models can prospectively simulate human research subjects to augment social science data, addressing issues of sampling, cost, and bias. It identifies five tractable challenges (diversity, bias, sycophancy, alienness, and generalization) and outlines promising directions for overcoming them (prompting, distribution elicitation, steering vectors, and training/tuning), supported by recent empirical work. The authors advocate cautious, exploratory use now, paired with conceptual modeling and iterative evaluation to improve generalizability as AI capabilities advance. They also discuss applications, ethical considerations, and the need for evaluation science to validate the promise of LLM-based social simulations.

Abstract

Accurate and verifiable large language model (LLM) simulations of human research subjects promise an accessible data source for understanding human behavior and training new AI systems. However, results to date have been limited, and few social scientists have adopted this method. In this position paper, we argue that the promise of LLM social simulations can be achieved by addressing five tractable challenges. We ground our argument in a review of empirical comparisons between LLMs and human research subjects, commentaries on the topic, and related work. We identify promising directions, including context-rich prompting and fine-tuning with social science datasets. We believe that LLM social simulations can already be used for pilot and exploratory studies, and more widespread use may soon be possible with rapidly advancing LLM capabilities. Researchers should prioritize developing conceptual models and iterative evaluations to make the best use of new AI systems.

Paper Structure

This paper contains 33 sections, 1 figure, and 1 table.

Figures (1)

  • Figure 1: Six applications of LLM social simulations. The most difficult applications are complete studies that are human-possible (HP), where it would be possible to use human subjects, or human-impossible (HI), such as large-scale policy experiments.