LLM-based Human Simulations Have Not Yet Been Reliable

Qian Wang, Jiaying Wu, Zichen Jiang, Zhenheng Tang, Bingqiao Luo, Nuo Chen, Wei Chen, Bingsheng He

TL;DR

The paper argues that current LLM-based human simulations do not yet reliably represent real human behavior, owing to intrinsic model biases and flaws in simulation design. It formalizes a general framework for such simulations, reviews applications in social, economic, policy, and psychological domains, and identifies core weaknesses in cognition, memory, and validation. It then proposes a systematic solution framework built on enriched data foundations, improved LLM capabilities, and rigorous multi-level validation, together with an algorithm that operationalizes the framework. The work discusses practical implications for research and applications and outlines a pathway toward more credible, human-aligned simulations. Overall, it calls for a shift from ad-hoc performance toward verifiable reliability in LLM-driven human simulations.

Abstract

Large Language Models (LLMs) are increasingly employed for simulating human behaviors across diverse domains. However, our position is that current LLM-based human simulations remain insufficiently reliable, as evidenced by significant discrepancies between their outcomes and authentic human actions. Our investigation begins with a systematic review of LLM-based human simulations in social, economic, policy, and psychological contexts, identifying their common frameworks, recent advances, and persistent limitations. This review reveals that such discrepancies primarily stem from inherent limitations of LLMs and flaws in simulation design, both of which are examined in detail. Building on these insights, we propose a systematic solution framework that emphasizes enriching data foundations, advancing LLM capabilities, and ensuring robust simulation design to enhance reliability. Finally, we introduce a structured algorithm that operationalizes the proposed framework, aiming to guide credible and human-aligned LLM-based simulations. To facilitate further research, we provide a curated list of related literature and resources at https://github.com/Persdre/awesome-llm-human-simulation.

Paper Structure

This paper contains 21 sections, 2 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Flow of This Position Paper. We start by reviewing the current LLM-based human simulations, and then identify the causes of the gaps between simulation outputs and real-world human behavior. Finally, we propose targeted solutions for advancing the reliability of LLM-based human simulation.
  • Figure 2: Overview of the Proposed Solution Framework. It details three core components: (a) Enriched Data Foundations, (b) Improved LLM Capabilities, and (c) Trustworthy Simulation Design through Robust Validation.