German General Personas: A Survey-Derived Persona Prompt Collection for Population-Aligned LLM Studies
Jens Rupprecht, Leon Fröhling, Claudia Wagner, Markus Strohmaier
TL;DR
This work introduces German General Personas (GGP), a survey-derived, population-aligned collection of 5,246 persona prompts grounded in the German General Social Survey (ALLBUS). Each persona combines a fixed core demographic block with a data-driven TOP-k attribute set, so that GGP can be plugged into prompts for diverse LLMs to simulate population response distributions across 27 outcome variables. Across a range of models and data regimes, GGP-based prompts often outperform baseline distribution-prediction methods, especially in data-scarce settings, and show limited sensitivity to the representativity of the personas used. The study highlights the utility of principled attribute selection and prompt-based population alignment for scalable, interpretable LLM-based social simulations, while noting ethical and methodological caveats. Overall, GGP provides a practical, empirically grounded resource for research on population-aligned prompting in NLP and computational social science (CSS).
Abstract
The use of Large Language Models (LLMs) for simulating human perspectives via persona prompting is gaining traction in computational social science. However, well-curated, empirically grounded persona collections remain scarce, limiting the accuracy and representativeness of such simulations. Here we introduce the German General Personas (GGP) collection, a comprehensive and representative persona prompt collection built from the German General Social Survey (ALLBUS). The GGP persona prompts are designed to be easily plugged into prompts for all types of LLMs and tasks, steering models to generate responses aligned with the underlying German population. We evaluate GGP by prompting various LLMs to simulate survey response distributions across diverse topics, demonstrating that GGP-guided LLMs outperform state-of-the-art classifiers, particularly under data scarcity. Furthermore, we analyze how the representativity of the personas and the selection of attributes within persona prompts affect alignment with population responses. Our findings suggest that GGP is a potentially valuable resource for research on LLM-based social simulations, enabling more systematic exploration of population-aligned persona prompting in NLP and social science research.
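
To make the persona construction summarized above more concrete, the following is a minimal sketch of how a fixed core demographic block might be combined with a top-k attribute set, and how simulated answers might be aggregated into a response distribution. It is an illustration under stated assumptions, not the authors' implementation: the attribute names, the prompt wording, the attribute ranking, and the helper functions `build_persona_prompt` and `response_distribution` are all hypothetical and do not come from the paper or from ALLBUS.

```python
from collections import Counter

# Hypothetical core attributes; the actual GGP core block is not specified here.
CORE_ATTRIBUTES = ["age", "gender", "federal_state"]


def build_persona_prompt(respondent, ranked_attributes, k):
    """Combine the fixed core block with the k highest-ranked extra attributes.

    `ranked_attributes` is assumed to be ordered from most to least
    informative for the outcome of interest (the ranking method itself
    is outside the scope of this sketch).
    """
    core = [(name, respondent[name]) for name in CORE_ATTRIBUTES]
    # Skip attributes already in the core block or missing for this respondent,
    # then keep only the first k of the remaining ranked attributes.
    extra = [
        (name, respondent[name])
        for name in ranked_attributes
        if name not in CORE_ATTRIBUTES and name in respondent
    ][:k]
    lines = [f"- {name}: {value}" for name, value in core + extra]
    # Illustrative wording only; not the prompt template used in the paper.
    header = "You are a person living in Germany with the following profile:"
    return header + "\n" + "\n".join(lines)


def response_distribution(answers):
    """Turn a list of simulated answers into relative frequencies."""
    counts = Counter(answers)
    total = sum(counts.values())
    return {option: count / total for option, count in counts.items()}


# Made-up respondent record and attribute ranking, for illustration only.
respondent = {
    "age": 42,
    "gender": "female",
    "federal_state": "Bavaria",
    "education": "university degree",
    "religion": "none",
    "income": "middle",
}
ranked = ["education", "age", "income", "religion"]
prompt = build_persona_prompt(respondent, ranked, k=2)
distribution = response_distribution(["yes", "yes", "no", "yes"])
```

In this sketch, one prompt would be built per persona, each prompt would be sent to an LLM together with a survey question, and the collected answers would be aggregated with `response_distribution` for comparison against the observed survey distribution.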
