Quantifying the Persona Effect in LLM Simulations
Tiancheng Hu, Nigel Collier
TL;DR
This work quantifies how much of the annotation variance in subjective NLP tasks is explained by persona variables, and examines whether prompting LLMs with persona information improves prediction of human annotations. Under a mixed-effects framework applied to multiple datasets, persona variables explain only a small fraction of the variance, while text effects dominate. In zero-shot experiments, persona prompting yields modest but statistically significant improvements, with larger gains on samples where many annotators disagree but their disagreements are small. A case study with ANES data reveals a linear relation between target and predicted explainable variance: large models can approximate a substantial share of the ground-truth variance, but struggle when baseline predictability is low, which limits what current persona prompting can do for simulating diverse perspectives. The findings call for cautious deployment of LLM-based simulations, for dataset designs in which persona variables carry more predictive power, and for thorough validation of fidelity to human perspectives across domains and languages.
Abstract
Large language models (LLMs) have shown remarkable promise in simulating human language and behavior. This study investigates how integrating persona variables (demographic, social, and behavioral factors) impacts LLMs' ability to simulate diverse perspectives. We find that persona variables account for less than 10% of the variance in annotations in existing subjective NLP datasets. Nonetheless, incorporating persona variables via prompting in LLMs provides modest but statistically significant improvements. Persona prompting is most effective in samples where many annotators disagree, but their disagreements are relatively minor. Notably, we find a linear relationship in our setting: the stronger the correlation between persona variables and human annotations, the more accurate the LLM predictions are using persona prompting. In a zero-shot setting, a powerful 70B model with persona prompting captures 81% of the annotation variance achievable by linear regression trained on ground truth annotations. However, for most subjective NLP datasets, where persona variables have limited explanatory power, the benefits of persona prompting are limited.
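To make the idea of "variance explained by persona variables" concrete, the sketch below computes the share of rating variance attributable to a grouping factor as the ratio of between-group to total sum of squares. This is a deliberately simplified stand-in for the mixed-effects analysis described above, not the authors' implementation. The function name `variance_share`, the column layout, and the six ratings are all hypothetical and synthetic, chosen only so the arithmetic is easy to follow.

```python
from collections import defaultdict
from statistics import mean


def variance_share(labels, groups):
    """Fraction of label variance explained by group means.

    Equals between-group sum of squares divided by total sum of squares
    (the R^2 of a regression on one categorical predictor).
    """
    grand_mean = mean(labels)
    total_ss = sum((y - grand_mean) ** 2 for y in labels)
    if total_ss == 0:
        return 0.0  # no variance to explain
    by_group = defaultdict(list)
    for y, g in zip(labels, groups):
        by_group[g].append(y)
    between_ss = sum(
        len(ys) * (mean(ys) - grand_mean) ** 2 for ys in by_group.values()
    )
    return between_ss / total_ss


# Synthetic toy annotations (NOT from any real dataset):
# three texts, each rated by one annotator from persona "A" and one from "B".
rows = [
    ("t1", "A", 1), ("t1", "B", 2),
    ("t2", "A", 3), ("t2", "B", 4),
    ("t3", "A", 5), ("t3", "B", 6),
]
text_ids = [r[0] for r in rows]
personas = [r[1] for r in rows]
ratings = [r[2] for r in rows]

text_share = variance_share(ratings, text_ids)
persona_share = variance_share(ratings, personas)

print(f"share explained by text:    {text_share:.3f}")
print(f"share explained by persona: {persona_share:.3f}")
```

In this balanced toy design the two shares add up to one, and the text factor accounts for most of the variance. Real annotation data are unbalanced and noisy, which is one reason a mixed-effects model is a better fit than this simple decomposition.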
