Whose Personae? Synthetic Persona Experiments in LLM Research and Pathways to Transparency

Jan Batzner, Volker Stocker, Bingjun Tang, Anusha Natarajan, Qinhao Chen, Stefan Schmid, Gjergji Kasneci

TL;DR

The paper investigates how synthetic personae are used in LLM alignment research and finds pervasive gaps in task specification, population representativeness, and ecological validity. It conducts a structured review of 63 studies from 2023–2025 and develops a Persona Transparency Checklist organized around six dimensions to improve rigor and reproducibility. The authors provide six concrete recommendations for clearer task definition, explicit population targeting, empirical grounding of data, ecological validity considerations, full reproducibility, and acknowledgment of author context. Collectively, the work advances transparency and generalizability in persona-based evaluations for high-stakes LLM applications.

Abstract

Synthetic personae experiments have become a prominent method in Large Language Model alignment research, yet the representativeness and ecological validity of these personae vary considerably between studies. Through a review of 63 peer-reviewed studies published between 2023 and 2025 in leading NLP and AI venues, we reveal a critical gap: task and population of interest are often underspecified in persona-based experiments, despite personalization being fundamentally dependent on these criteria. Our analysis shows substantial differences in user representation, with most studies focusing on limited sociodemographic attributes and only 35% discussing the representativeness of their LLM personae. Based on our findings, we introduce a persona transparency checklist that emphasizes representative sampling, explicit grounding in empirical data, and enhanced ecological validity. Our work provides both a comprehensive assessment of current practices and practical guidelines to improve the rigor and ecological validity of persona-based evaluations in language model alignment research.

Paper Structure

This paper contains 32 sections, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Differences in Synthetic Persona Construction. Demonstrated on an adapted example from hu2024quantifying, a study included in our review corpus.
  • Figure 2: Persona Transparency Checklist.
  • Figure 3: Global Author Location Distribution: The most common countries of author university affiliation in our corpus are the USA (102, 34%), China (54, 18%), South Korea (52, 17%), India (23, 8%), Singapore (17, 6%), and Japan (15, 5%), among others.
  • Figure 4: Pathways Toward Enhanced Transparency: Recommendations for Synthetic Persona Construction.