Simulating Society Requires Simulating Thought
Chance Jiajie Li, Jiayi Wu, Zhenze Mo, Ao Qu, Yuhan Tang, Kaiya Ivy Zhao, Yulu Gan, Jie Fan, Jiangbo Yu, Jinhua Zhao, Paul Liang, Luis Alonso, Kent Larson
TL;DR
The paper argues that simulating complex societies with large language models requires cognitively grounded reasoning rather than surface-level plausibility. It introduces GenMinds, a framework for structured belief representations built from causal motifs and belief graphs, and RECAP, a benchmark that assesses reasoning fidelity through traceability, demographic grounding, and intervention consistency. By shifting from output-centric prompting to a cognition-centric paradigm grounded in causal, compositional, and revisable reasoning, the work offers a path toward agents that simulate how people think, not just what they say. The framework aims to enable transparent diagnostics, pluralistic modeling of public reasoning, and principled evaluation for high-stakes social simulations, and the paper outlines open challenges and next steps for the field.
Abstract
Simulating society with large language models (LLMs), we argue, requires more than generating plausible behavior; it demands cognitively grounded reasoning that is structured, revisable, and traceable. LLM-based agents are increasingly used to emulate individual and group behavior, primarily through prompting and supervised fine-tuning. Yet current simulations remain grounded in a behaviorist "demographics in, behavior out" paradigm, focusing on surface-level plausibility. As a result, they often lack internal coherence, causal reasoning, and belief traceability, making them unreliable for modeling how people reason, deliberate, and respond to interventions. To address this, we present a conceptual modeling paradigm, Generative Minds (GenMinds), which draws from cognitive science to support structured belief representations in generative agents. To evaluate such agents, we introduce the RECAP (REconstructing CAusal Paths) framework, a benchmark designed to assess reasoning fidelity via causal traceability, demographic grounding, and intervention consistency. These contributions advance a broader shift: from surface-level mimicry to generative agents that simulate thought, not just language, for social simulations.
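To make the idea of a structured, revisable, and traceable belief representation concrete, the following is a minimal illustrative sketch and not the authors' implementation. The `BeliefGraph` class, its method names, the signed edge weights, the rule of multiplying weights along a path, and the congestion-fee example are all assumptions introduced here for illustration. It shows a stance being traced back to explicit causal paths and then recomputed after a single belief is revised.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class BeliefGraph:
    """Hypothetical belief graph: edges[cause][effect] is a signed strength in [-1, 1]."""
    edges: dict[str, dict[str, float]] = field(default_factory=dict)

    def add_motif(self, cause: str, effect: str, weight: float) -> None:
        # A "motif" here is simply one directed cause -> effect belief (assumed encoding).
        self.edges.setdefault(cause, {})[effect] = weight

    def trace(self, source: str, target: str, path: list[str] | None = None) -> list[list[str]]:
        # Enumerate every acyclic causal path from source to target (depth-first).
        path = (path or []) + [source]
        if source == target:
            return [path]
        paths: list[list[str]] = []
        for nxt in self.edges.get(source, {}):
            if nxt not in path:  # skip nodes already on the path to avoid cycles
                paths.extend(self.trace(nxt, target, path))
        return paths

    def path_effect(self, path: list[str]) -> float:
        # Assumed aggregation rule: multiply signed strengths along the path.
        effect = 1.0
        for cause, result in zip(path, path[1:]):
            effect *= self.edges[cause][result]
        return effect

    def total_effect(self, source: str, target: str) -> float:
        # Sum the contributions of all traced paths.
        return sum(self.path_effect(p) for p in self.trace(source, target))


# Hypothetical agent beliefs about a congestion fee.
g = BeliefGraph()
g.add_motif("congestion_fee", "traffic", -0.5)
g.add_motif("traffic", "commute_time", 0.5)
g.add_motif("commute_time", "policy_support", -0.5)
g.add_motif("congestion_fee", "cost_of_living", 0.5)
g.add_motif("cost_of_living", "policy_support", -0.5)

paths = g.trace("congestion_fee", "policy_support")
before = g.total_effect("congestion_fee", "policy_support")

# Intervention: the agent learns of a rebate and revises one belief.
g.add_motif("congestion_fee", "cost_of_living", 0.0)
after = g.total_effect("congestion_fee", "policy_support")

for p in paths:
    print(" -> ".join(p))
print(before, after)
```

In this toy setup, the stance is the sum of two explicit paths, so the shift after the revision can be attributed to the single edited edge. That is the kind of traceability and intervention consistency the abstract describes, under the simplifying assumptions stated above.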
