Scaling Law in LLM Simulated Personality: More Detailed and Realistic Persona Profile Is All You Need
Yuqi Bai, Tianyu Huang, Kun Sun, Yuting Chen
TL;DR
This work addresses the challenge of using large language models to simulate human personality for social experiments. It introduces a world-model framework that grounds virtual personas in census-driven skeletons and then generates richly detailed profiles, enabling end-to-end personality testing with the Big Five. By replacing confirmatory factor analysis (CFA) with engineering analytics, the study demonstrates that persona detail robustly improves stability, identifiability, and population-level realism, revealing a Scaling Law: More Detailed and Realistic Persona Profiles Are All You Need. These findings offer a practical path toward scalable, ethics-conscious social simulations using LLMs, with implications for methodology and policy in AI-enabled social science research.
Abstract
This research examines the use of large language models (LLMs) to simulate social experiments, focusing on their ability to emulate human personality in virtual persona role-playing. It develops an end-to-end evaluation framework that combines individual-level analysis of stability and identifiability with a population-level analysis, termed progressive personality curves, to examine the veracity and consistency of LLMs in simulating human personality. Methodologically, it proposes important modifications to traditional psychometric approaches, namely confirmatory factor analysis (CFA) and construct validity, which are unable to capture improvement trends in LLMs at their current low level of simulation and may therefore lead to premature rejection or methodological misalignment. The main contributions are: (1) a systematic framework for evaluating LLM virtual personality; (2) an empirical demonstration of the critical role of persona detail in personality simulation quality; and (3) the identification of marginal utility effects of persona profiles, in particular a Scaling Law in LLM personality simulation. Together, these offer operational evaluation metrics and a theoretical foundation for applying large language models in social science experiments.
