Survey Response Generation: Generating Closed-Ended Survey Responses In-Silico with Large Language Models

Georg Ahnert; Anna-Carolina Haensch; Barbara Plank; Markus Strohmaier

TL;DR

This work addresses the lack of standardization in simulating closed-ended survey responses with LLMs by systematically comparing eight Survey Response Generation Methods across four political-attitude datasets and ten open-weight LLMs. It evaluates both individual-level and subpopulation-level alignment using macro F1-scores and distributional distances, revealing substantial differences across methods. The findings show that Restricted Generation Methods, especially Restricted Choice, yield the best overall alignment and are more computationally efficient than Open Generation approaches, while Token Probability-Based Methods perform poorly, and reasoning-based outputs do not reliably improve results. The study provides practical guidelines for selecting SRG methods in in-silico surveys and discusses limitations, generalizability, and ethical considerations for future work.

Abstract

Many in-silico simulations of human survey responses with large language models (LLMs) focus on generating closed-ended survey responses, whereas LLMs are typically trained to generate open-ended text instead. Previous research has used a diverse range of methods for generating closed-ended survey responses with LLMs, and a standard practice remains to be identified. In this paper, we systematically investigate the impact that various Survey Response Generation Methods have on predicted survey responses. We present the results of 32 million simulated survey responses across 8 Survey Response Generation Methods, 4 political attitude surveys, and 10 open-weight language models. We find significant differences between the Survey Response Generation Methods in both individual-level and subpopulation-level alignment. Our results show that Restricted Generation Methods perform best overall, and that reasoning output does not consistently improve alignment. Our work underlines the significant impact that Survey Response Generation Methods have on simulated survey responses, and we develop practical recommendations on the application of Survey Response Generation Methods.
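
As a rough illustration of the two alignment levels mentioned above, the following Python sketch computes a macro F1-score (individual level) and a distributional distance (subpopulation level) for a toy set of answers. It is a minimal sketch, not the paper's evaluation code: the function names, the answer labels, and the toy data are hypothetical, and total variation distance is an assumed stand-in, since the text here does not say which distributional distance is used.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-option F1 over all options seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def total_variation(y_true, y_pred):
    """Half the L1 distance between the two empirical answer distributions.

    Assumption: chosen only for illustration; the source does not name
    the distributional distance it uses.
    """
    p, q = Counter(y_true), Counter(y_pred)
    options = set(p) | set(q)
    return 0.5 * sum(
        abs(p[o] / len(y_true) - q[o] / len(y_pred)) for o in options
    )


# Hypothetical toy data: one closed-ended item, four respondents.
human = ["agree", "agree", "disagree", "neutral"]
simulated = ["agree", "disagree", "disagree", "neutral"]

print(f"macro F1: {macro_f1(human, simulated):.3f}")
print(f"total variation distance: {total_variation(human, simulated):.3f}")
```

The two numbers can diverge: a simulation may reproduce the overall answer distribution of a subpopulation well while still assigning the wrong answer to many individual respondents, which is why both levels are reported separately.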
