Simulating Society Requires Simulating Thought

Chance Jiajie Li, Jiayi Wu, Zhenze Mo, Ao Qu, Yuhan Tang, Kaiya Ivy Zhao, Yulu Gan, Jie Fan, Jiangbo Yu, Jinhua Zhao, Paul Liang, Luis Alonso, Kent Larson

TL;DR

The paper argues that simulating complex societies with large language models requires cognitively grounded reasoning rather than surface-level plausibility. It introduces GenMinds, a framework for structured belief representations built from causal motifs and belief graphs, and RECAP, a benchmark that assesses reasoning fidelity through traceability, demographic grounding, and intervention consistency. By shifting from output-centric prompting to a cognition-centric paradigm grounded in causal, compositional, and revisable reasoning, the work offers a path toward agents that simulate how people think, not just what they say. The proposed framework aims to enable transparent diagnostics, pluralistic modeling of public reasoning, and principled evaluation for high-stakes social simulations, and it outlines open challenges and next steps for the field.

Abstract

Simulating society with large language models (LLMs), we argue, requires more than generating plausible behavior; it demands cognitively grounded reasoning that is structured, revisable, and traceable. LLM-based agents are increasingly used to emulate individual and group behavior, primarily through prompting and supervised fine-tuning. Yet current simulations remain grounded in a behaviorist "demographics in, behavior out" paradigm, focusing on surface-level plausibility. As a result, they often lack internal coherence, causal reasoning, and belief traceability, making them unreliable for modeling how people reason, deliberate, and respond to interventions. To address this, we present a conceptual modeling paradigm, Generative Minds (GenMinds), which draws from cognitive science to support structured belief representations in generative agents. To evaluate such agents, we introduce the RECAP (REconstructing CAusal Paths) framework, a benchmark designed to assess reasoning fidelity via causal traceability, demographic grounding, and intervention consistency. These contributions advance a broader shift: from surface-level mimicry to generative agents that simulate thought, not just language, for social simulations.

Paper Structure

This paper contains 37 sections, 2 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: From surface imitation to cognitively grounded social simulation. Current LLM-based simulations (top left) capture only surface opinions, shaped by demographics or language patterns, while the deeper belief formation processes remain unmodeled (bottom left, beneath the waterline). This yields population-level simulations that are flattened and stereotyped, reflecting aggregated personas rather than genuine diversity. In contrast, cognitively grounded reasoning (bottom right) models the latent belief dynamics behind individual decisions, producing collective patterns that are heterogeneous, interpretable, and causally faithful.
  • Figure 2: Motif-based belief graph and intervention. Natural language responses are parsed into motif-level causal links, forming a personalized belief graph. A simulated intervention on Transparency propagates downstream updates, shown as highlighted nodes and edges.
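
The Figure 2 caption describes three steps: causal links are extracted from responses, the links form a personalized belief graph, and an intervention on Transparency propagates to downstream nodes. The following is a minimal Python sketch of that idea, not the paper's implementation. Everything beyond the node name Transparency is an assumption made for illustration: the other node names, the signed edge weights, the linear additive update rule, and the requirement that the graph be acyclic.

```python
from collections import deque

# Hypothetical motif-level causal links: (cause, effect, signed strength).
# Only "Transparency" appears in the source; the rest is illustrative.
LINKS = [
    ("Transparency", "Trust", 0.8),
    ("Trust", "Support", 0.5),
    ("Cost", "Support", -0.6),
]


def build_graph(links):
    """Turn (cause, effect, weight) triples into an adjacency mapping."""
    graph = {}
    for cause, effect, weight in links:
        graph.setdefault(cause, []).append((effect, weight))
        graph.setdefault(effect, [])  # make sure leaf nodes exist as keys
    return graph


def intervene(graph, beliefs, node, new_value):
    """Set `node` to `new_value` and push the change downstream.

    Assumed update rule: each child shifts by weight * parent's change.
    Assumes the graph is acyclic; a cycle would make this loop forever.
    Returns the updated beliefs and the set of nodes that were touched.
    """
    updated = dict(beliefs)
    delta = new_value - updated[node]
    updated[node] = new_value
    touched = {node}
    queue = deque([(node, delta)])
    while queue:
        current, change = queue.popleft()
        for child, weight in graph[current]:
            child_change = weight * change
            updated[child] += child_change
            touched.add(child)
            queue.append((child, child_change))
    return updated, touched


graph = build_graph(LINKS)
beliefs = {name: 0.0 for name in graph}  # neutral starting beliefs
after, touched = intervene(graph, beliefs, "Transparency", 1.0)
```

In this sketch, `touched` plays the role of the highlighted nodes in the figure: it records which beliefs changed as a result of the intervention, so the path from cause to effect stays inspectable. Nodes that are not downstream of the intervened node, such as the hypothetical `Cost`, keep their original values.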