Assessing the Reliability of Persona-Conditioned LLMs as Synthetic Survey Respondents

Erika Elizabeth Taday Morocho; Lorenzo Cima; Tiziano Fagni; Marco Avvenuti; Stefano Cresci

TL;DR

Persona prompting does not yield a clear aggregate improvement in survey alignment and, in many cases, significantly degrades performance, which highlights a key adverse impact of current persona-based simulation practices.

Abstract

Using persona-conditioned LLMs as synthetic survey respondents has become a common practice in computational social science and agent-based simulations. Yet it remains unclear whether multi-attribute persona prompting improves LLM reliability or instead introduces distortions. Here we contribute to this assessment by leveraging a large dataset of U.S. microdata from the World Values Survey. Concretely, we evaluate two open-weight chat models and a random-guesser baseline across more than 70K respondent-item instances. We find that persona prompting does not yield a clear aggregate improvement in survey alignment and, in many cases, significantly degrades performance. Persona effects are highly heterogeneous: most items exhibit minimal change, while a small subset of questions and underrepresented subgroups experience disproportionate distortions. Our findings highlight a key adverse impact of current persona-based simulation practices: demographic conditioning can redistribute error in ways that undermine subgroup fidelity and risk misleading downstream analyses.
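
The abstract compares both models against a random-guesser baseline. As a hedged illustration (not the paper's implementation), assume the baseline picks uniformly among an item's k valid answer options and that agreement is scored as an exact match. Its expected hit rate on that item is then 1/k, and over many respondent-item instances its average approaches the mean of 1/k. The function names and toy data below are hypothetical.

```python
import random
from statistics import mean


def random_guess(options, rng):
    """Hypothetical random-guesser baseline: pick one valid answer option uniformly."""
    return rng.choice(options)


def expected_exact_match(option_counts):
    """Analytic expected exact-match rate: the average of 1/k over instances (assumed scoring)."""
    return mean(1.0 / k for k in option_counts)


def simulate_exact_match(instances, seed=0):
    """Empirical exact-match rate of the random guesser.

    instances: list of (valid_options, observed_answer) pairs, one per respondent-item instance.
    """
    rng = random.Random(seed)
    hits = [random_guess(options, rng) == observed for options, observed in instances]
    return sum(hits) / len(hits)


# Toy data: 1000 four-option items and 1000 two-option items.
instances = [([1, 2, 3, 4], 2)] * 1000 + [([1, 2], 1)] * 1000
print(expected_exact_match([4] * 1000 + [2] * 1000))  # 0.375
print(simulate_exact_match(instances, seed=1))        # close to 0.375
```

A baseline defined this way gives a floor that depends only on each item's answer format, which is why it is useful to report alongside the vanilla and persona-based model variants.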

Paper Structure (19 sections, 3 equations, 1 figure, 5 tables)

Introduction
Related Work
LLMs as Synthetic Survey Respondents
LLM Personas, Personality, and Alignment
LLM Agents and Social Simulations
LLM Data for Computational Social Science
Data and Methods
Dataset and Preprocessing
Survey data
Persona attributes
Evaluation Setup
Prompting conditions and matched comparison
Decoding protocol
Response parsing and quality control
Evaluation metrics
...and 4 more sections

Figures (1)

Figure 1: Detailed results for the Llama-2-13B (top row) and Qwen3-4B (bottom row) models. For each model, we report a comparison of hard similarity (HS, blue-colored, left panel) and soft similarity (SS, red-colored, central panel) scores obtained for each question using the vanilla (V, x-axis) and persona-based (PB, y-axis) versions of the models. Additionally, for each model, the right panel shows the item-wise differences between the PB and V model variants, in terms of HS (blue-colored, top) and SS (red-colored, bottom). Positive differences ($>0$) indicate that PB outperforms V for the corresponding item and metric.
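
The item-wise comparison in Figure 1 can be sketched as follows. The exact HS and SS formulas are not reproduced on this page, so the definitions here are assumptions for illustration only. HS is taken as the share of respondents whose model answer exactly matches their survey answer on an item. SS is taken as 1 minus the absolute distance between the two answers, normalized by the width of the item's ordinal scale. The function names and record layout are hypothetical.

```python
from collections import defaultdict


def item_scores(records, scale_ranges):
    """Per-item (HS, SS) under assumed definitions.

    records: iterable of (item_id, predicted, observed) integer answers.
    scale_ranges: item_id -> (min_value, max_value) of the item's ordinal scale.
    """
    hard = defaultdict(list)
    soft = defaultdict(list)
    for item, pred, obs in records:
        low, high = scale_ranges[item]
        span = high - low
        # Assumed hard similarity: 1 for an exact match, else 0.
        hard[item].append(1.0 if pred == obs else 0.0)
        # Assumed soft similarity: 1 minus the normalized absolute distance on the scale;
        # fall back to exact match for a degenerate single-value scale.
        soft[item].append(1.0 - abs(pred - obs) / span if span else float(pred == obs))
    return {
        item: (sum(hard[item]) / len(hard[item]), sum(soft[item]) / len(soft[item]))
        for item in hard
    }


def pb_minus_v(pb_scores, v_scores):
    """Item-wise (HS, SS) differences PB - V; positive values mean PB scores higher."""
    return {
        item: (pb_scores[item][0] - v_scores[item][0], pb_scores[item][1] - v_scores[item][1])
        for item in pb_scores.keys() & v_scores.keys()
    }


# Toy example with one hypothetical item on a 1-4 scale.
ranges = {"Q1": (1, 4)}
v = item_scores([("Q1", 2, 2), ("Q1", 4, 1)], ranges)   # vanilla answers
pb = item_scores([("Q1", 2, 2), ("Q1", 2, 1)], ranges)  # persona-based answers
print(pb_minus_v(pb, v))  # HS difference 0.0, SS difference about 0.33
```

In this toy case both variants get one exact match, so HS does not change, but the persona-based variant's miss lands closer on the scale, so SS rises. This is the kind of divergence between the two metrics that the separate HS and SS panels are meant to expose.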