Quantifying the Persona Effect in LLM Simulations

Tiancheng Hu, Nigel Collier

TL;DR

This work quantifies how much of the annotation variance in subjective NLP tasks persona variables account for, and examines whether prompting LLMs with persona information improves prediction of human annotations. In a mixed-effects analysis across multiple datasets, persona variables explain only a small fraction of the variance, while text effects dominate. In zero-shot experiments, persona prompting yields modest but statistically significant improvements, with larger gains on samples where many annotators disagree but the spread of their ratings is small. A case study with ANES data reveals a linear relationship between target and predicted explainable variance: large models can approximate a substantial share of the ground-truth variance, but struggle when baseline predictability is low, which highlights the limits of current persona prompting for simulating diverse perspectives. The findings call for cautious deployment of LLM-based simulations, dataset designs in which persona variables carry more predictive power, and thorough validation of fidelity to human perspectives across domains and languages.
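The difference between how many annotators disagree and how far apart their ratings are can be made concrete with two summary statistics. The sketch below is illustrative only: it assumes Shannon entropy over label counts and population standard deviation as the two measures, and the ratings are hypothetical toy values, not data from the paper.

```python
import math
from collections import Counter
from statistics import pstdev


def label_entropy(ratings):
    """Shannon entropy (in bits) of the empirical label distribution."""
    counts = Counter(ratings)
    total = len(ratings)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical ratings from six annotators on a 1-5 scale.
split_but_close = [3, 3, 4, 4, 3, 4]   # annotators split evenly, one point apart
one_far_outlier = [1, 1, 1, 1, 1, 5]   # near consensus, one distant rating

entropy_close = label_entropy(split_but_close)
std_close = pstdev(split_but_close)
entropy_outlier = label_entropy(one_far_outlier)
std_outlier = pstdev(one_far_outlier)

print(f"split but close: entropy={entropy_close:.2f}, std={std_close:.2f}")
print(f"one far outlier: entropy={entropy_outlier:.2f}, std={std_outlier:.2f}")
```

In this toy case the first sample has the higher entropy but the lower standard deviation, which is the kind of sample where the summary reports the largest gains from persona prompting.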

Abstract

Large language models (LLMs) have shown remarkable promise in simulating human language and behavior. This study investigates how integrating persona variables (demographic, social, and behavioral factors) impacts LLMs' ability to simulate diverse perspectives. We find that persona variables account for less than 10% of the variance in annotations in existing subjective NLP datasets. Nonetheless, incorporating persona variables via prompting in LLMs provides modest but statistically significant improvements. Persona prompting is most effective in samples where many annotators disagree, but their disagreements are relatively minor. Notably, we find a linear relationship in our setting: the stronger the correlation between persona variables and human annotations, the more accurate the LLM predictions are using persona prompting. In a zero-shot setting, a powerful 70B model with persona prompting captures 81% of the annotation variance achievable by linear regression trained on ground truth annotations. However, for most subjective NLP datasets, where persona variables have limited explanatory power, the benefits of persona prompting are limited.
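One way to read the "share of achievable variance" figure is as a ratio of two $R^2$ values. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes the target $R^2$ comes from an ordinary least-squares fit of a single numeric persona variable to the human annotations, that the predicted $R^2$ scores LLM outputs directly against those annotations, and all data values are hypothetical.

```python
from statistics import mean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_simple_ols(x, y):
    """Least-squares slope and intercept for a single predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical toy data: one persona variable, human ratings, LLM predictions.
persona = [1, 2, 3, 4, 5, 6]
human = [1.0, 2.0, 2.0, 4.0, 4.0, 5.0]
llm = [1.5, 2.0, 3.0, 3.0, 4.0, 4.5]

# Target R^2: regression fitted on the ground-truth annotations.
slope, intercept = fit_simple_ols(persona, human)
target_r2 = r_squared(human, [slope * x + intercept for x in persona])

# Predicted R^2: zero-shot predictions scored against the same annotations.
predicted_r2 = r_squared(human, llm)

share = predicted_r2 / target_r2
print(f"target={target_r2:.3f} predicted={predicted_r2:.3f} share={share:.2f}")
```

When the target $R^2$ is small, as the abstract reports for most subjective NLP datasets, there is little explainable variance for the ratio's numerator to capture in the first place.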

Paper Structure

This paper contains 28 sections, 4 figures, and 6 tables.

Figures (4)

  • Figure 1: Illustration of persona prompting. We prepend the persona information of an annotator before the text sample and task description to investigate the capacity of LLMs to simulate diverse perspectives in subjective NLP tasks.
  • Figure 2: Mean improvement in MAE with persona prompting across four 70B models, for annotations grouped by low/high entropy and low/high standard deviation. Darker colors denote larger improvements in prediction.
  • Figure 3: Comparison of predicted $R^2$ and target $R^2$. Each point in the X-Y plane represents an experimental result with persona prompting, where the x-coordinate signifies the target $R^2$ and the y-coordinate denotes the predicted $R^2$. We then fit a linear regression line and also plot the maximum linear regression model performance line $y=x$ in the same figure.
  • Figure 4: Comparison of predicted $R^2$ and target $R^2$. Each point in the X-Y plane represents an experimental result with persona prompting. We then fit a linear regression line and also plot the maximum linear regression model performance line $y=x$ in the same figure.
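
The prompt layout described in the Figure 1 caption (persona information first, then the text sample, then the task description) can be sketched as follows. This is a hypothetical template for illustration: the function name `build_persona_prompt`, the field labels, and the example wording are assumptions, not the prompt used in the paper.

```python
def build_persona_prompt(persona, text, task):
    """Prepend persona attributes to the text sample and task description."""
    # One "key: value" line per persona attribute, in insertion order.
    persona_block = "\n".join(f"{key}: {value}" for key, value in persona.items())
    return f"{persona_block}\n\nText: {text}\n\n{task}"


# Hypothetical annotator persona and task wording.
persona = {"Age": "35-44", "Gender": "Woman", "Education": "Bachelor's degree"}
text = "That was a terrible idea."
task = "On a scale from 1 to 5, how offensive is this text? Answer with one number."

prompt = build_persona_prompt(persona, text, task)
baseline = f"Text: {text}\n\n{task}"  # the same prompt without persona information
print(prompt)
```

Comparing model predictions for `prompt` and `baseline` on the same sample is one simple way to isolate the contribution of the persona information.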