Generative Personality Simulation via Theory-Informed Structured Interview
Pengda Wang, Huiqi Zou, Han Jiang, Hanjie Chen, Tianjun Sun, Xiaoyuan Yi, Ziang Xiao, Frederick L. Oswald
TL;DR
The paper addresses the scarcity of heterogeneous, human-like data in LLM-based psychometrics by introducing PSI, a theory-informed structured interview framework that elicits personality-relevant narratives. PSI is evaluated with a measurement-theory pipeline covering reliability, validity, and latent structure assessed via confirmatory factor analysis (CFA), across three experiments that compare LLM-simulated data with human data. The authors release a dataset of 357 PSI transcripts and show that PSI improves heterogeneity, preserves latent structure, and predicts personality-related behaviors more faithfully than prior methods. The approach offers a scalable, interpretable path to generating psychometrically grounded, human-like data for AI-assisted social science research.
Abstract
Despite their potential as human proxies, LLMs often fail to generate heterogeneous data with human-like diversity, which diminishes their value for advancing social science research. To address this gap, we propose a novel method that incorporates psychological insights into LLM simulation through the Personality Structured Interview (PSI). PSI leverages psychometric scale-development procedures to capture personality-related linguistic information from a formal psychological perspective. To systematically evaluate simulation fidelity, we develop a measurement-theory-grounded evaluation procedure that treats personality as a latent construct and evaluates reliability, structural validity, and external validity. Results from three experiments demonstrate that PSI effectively improves human-like heterogeneity in LLM-simulated personality data and predicts personality-related behavioral outcomes. We further offer a theoretical framework for designing theory-informed structured interviews to enhance the reliability and effectiveness of LLMs in simulating human-like data for broader psychometric research.
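
As an illustration of the reliability step in such an evaluation procedure, the sketch below computes Cronbach's alpha, a standard internal-consistency coefficient, for one personality scale. The abstract does not state which reliability coefficient the authors use, so the choice of alpha, the function name `cronbach_alpha`, and the input layout (one row of item scores per simulated or human respondent) are assumptions made for illustration only.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for one scale.

    responses: list of respondents, each a list of k numeric item scores.
    This layout is a hypothetical choice for illustration; it is not taken
    from the paper.
    """
    n = len(responses)
    k = len(responses[0])
    if n < 2 or k < 2:
        raise ValueError("need at least 2 respondents and 2 items")
    if any(len(row) != k for row in responses):
        raise ValueError("every respondent must answer all k items")

    # Sample variance of each item across respondents.
    item_var_sum = sum(variance(col) for col in zip(*responses))
    # Sample variance of each respondent's total scale score.
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        # All respondents have identical totals, i.e. no heterogeneity,
        # so alpha is undefined.
        raise ValueError("total scores have zero variance")

    return k / (k - 1) * (1 - item_var_sum / total_var)
```

Under these assumptions, the same function could be applied to the human responses and to the LLM-simulated responses for a given scale, and the two coefficients compared. The zero-variance guard reflects the failure mode the abstract describes: fully homogeneous simulated data leaves the coefficient undefined.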
