
LLM-Based Social Simulations Require a Boundary

Zengqing Wu, Run Peng, Takayuki Ito, Makoto Onizuka, Chuan Xiao

TL;DR

The paper argues that LLM-based social simulations must operate within clear boundaries because models tend to produce a homogeneous "average persona" that suppresses behavioral diversity. It introduces a variance–mean framework for assessing alignment at both the individual and collective levels. It also reviews 21 recent studies and finds that mean alignment is commonly tested while variance is often neglected, and that many results show lower heterogeneity than human populations. The authors propose practical guidelines: tailor validation depth to the heterogeneity demands of the research question, explicitly report variance, and constrain claims to collective-level qualitative patterns when variance is insufficient. They advocate a boundary-aware approach to using LLMs in social science, aiming to improve rigor and to ensure that AI-driven simulations yield genuine explanatory insights rather than artifacts. Overall, the work delimits the appropriate use of LLM-based simulations while outlining concrete steps to improve their reliability and relevance for social science research.

Abstract

This position paper argues that LLM-based social simulations require clear boundaries to make meaningful contributions to social science. While Large Language Models (LLMs) offer promising capabilities for simulating human behavior, their tendency to produce homogeneous outputs, acting as an "average persona", fundamentally limits their ability to capture the behavioral diversity essential for complex social dynamics. We examine why heterogeneity matters for social simulations and how current LLMs fall short, analyzing the relationship between mean alignment and variance in LLM-generated behaviors. Through a systematic review of representative studies, we find that validation practices often fail to match the heterogeneity requirements of research questions: while most papers include ground truth comparisons, fewer than half explicitly assess behavioral variance, and most that do report lower variance than human populations. We propose that researchers should: (1) match validation depth to the heterogeneity demands of their research questions, (2) explicitly report variance alongside mean alignment, and (3) constrain claims to collective-level qualitative patterns when variance is insufficient. Rather than dismissing LLM-based simulation, we advocate for a boundary-aware approach that ensures these methods contribute genuine insights to social science.
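As a minimal sketch of guideline (2), reporting variance alongside mean alignment, the helper below compares one behavioral measure between an LLM-generated sample and a human sample. It reports the gap in means and the ratio of variances, where a ratio well below 1 suggests reduced heterogeneity. The function name `alignment_report`, its output keys, and the use of a simple variance ratio are illustrative assumptions, not the metrics defined in the paper.

```python
from statistics import mean, pvariance


def alignment_report(llm_samples, human_samples):
    """Compare an LLM sample and a human sample on one numeric behavior.

    Hypothetical helper: the metric choices (mean gap, variance ratio) are
    illustrative assumptions, not the paper's own definitions.
    """
    llm_mean, human_mean = mean(llm_samples), mean(human_samples)
    llm_var, human_var = pvariance(llm_samples), pvariance(human_samples)
    return {
        # Mean alignment: how far the LLM's average sits from the human average.
        "mean_gap": llm_mean - human_mean,
        # Variance alignment: values well below 1 indicate the LLM sample is
        # less heterogeneous than the human one (the "average persona" effect).
        "variance_ratio": llm_var / human_var if human_var > 0 else float("nan"),
    }


# Toy, made-up numbers for illustration only (not data from the paper).
llm = [50, 50, 50, 33, 66]
human = [10, 50, 90, 33, 66]
print(alignment_report(llm, human))
```

In this toy case both samples have the same mean, so a mean-only check would report perfect alignment, while the variance ratio (about 0.15) exposes the much narrower spread of the LLM sample.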

Paper Structure

This paper contains 49 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: Overview of our claims. We treat social simulation as a means to advance social science (e.g., by explaining social patterns) rather than as an attempt at "perfect" replication of real-world societies. We further examine possible simulation scenarios (e.g., aligned or misaligned means and variances) and advocate a stronger emphasis on qualitative analysis of collective patterns.
  • Figure 2: Distribution of numbers chosen by GPT-4 (blue) vs. humans (red) in the KBC (Keynesian Beauty Contest) game, adapted from wu2024shall. The LLM reproduces the peak values (33, 50, 66) seen in human choices, indicating an aligned mean. However, non-peak values occur markedly less often than among humans, indicating low variance.
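The pattern in Figure 2 (matching peaks, thin tails) can be quantified in several ways. The sketch below uses two simple descriptive measures: the share of choices that fall outside the peak values named in the caption, and the Shannon entropy of the choice distribution. Lower values of either measure indicate more concentrated, less diverse behavior. Both the helper names and the choice of these two measures are hypothetical illustrations, not the analysis performed in the paper, and the sample lists are invented toy data.

```python
from collections import Counter
from math import log2

# Peak values named in the Figure 2 caption.
PEAKS = {33, 50, 66}


def non_peak_share(choices, peaks=PEAKS):
    """Fraction of choices that are not one of the peak values (hypothetical helper)."""
    return sum(1 for c in choices if c not in peaks) / len(choices)


def choice_entropy(choices):
    """Shannon entropy (in bits) of the empirical choice distribution."""
    n = len(choices)
    counts = Counter(choices)
    return -sum((k / n) * log2(k / n) for k in counts.values())


# Invented toy samples, for illustration only (not the KBC data).
llm_choices = [33, 33, 50, 50, 66, 66, 50, 33]
human_choices = [33, 50, 66, 22, 0, 45, 70, 100]

for label, sample in [("LLM", llm_choices), ("human", human_choices)]:
    print(label, non_peak_share(sample), round(choice_entropy(sample), 3))
```

Here the toy LLM sample places every choice on a peak (non-peak share 0.0, entropy about 1.56 bits), while the toy human sample spreads over eight distinct values (non-peak share 0.625, entropy 3.0 bits). Such measures complement a variance ratio because they capture how probability mass is distributed rather than only how far it spreads.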