QSTN: A Modular Framework for Robust Questionnaire Inference with Large Language Models
Maximilian Kreutner, Jens Rupprecht, Georg Ahnert, Ahmed Salem, Markus Strohmaier
TL;DR
QSTN tackles robustness issues in questionnaire-style prompting for large language models by providing a modular, open-source framework to systematically vary questionnaire presentation, prompt perturbations, and response generation methods. It supports both local and remote inference alongside a no-code UI, enabling scalable in-silico surveys and reproducible analyses across datasets and models. Key findings show that questionnaire presentation and response-generation strategy substantially influence alignment with human answers and can reduce computational costs, with battery-style presentation often yielding the best subpopulation alignment and restricted generation offering efficiency gains. The work advances reproducibility in LLM-based questionnaire research and broadens applicability to data annotation, psychometrics, and persona studies.
Abstract
We introduce QSTN, an open-source Python framework for systematically generating responses from questionnaire-style prompts to support in-silico surveys and annotation tasks with large language models (LLMs). QSTN enables robust evaluation of questionnaire presentation, prompt perturbations, and response generation methods. Our extensive evaluation (more than 40 million survey responses) shows that question structure and response generation methods have a significant impact on the alignment of generated survey responses with human answers, and that these responses can be obtained for a fraction of the compute cost. In addition, we offer a no-code user interface that allows researchers to set up robust experiments with LLMs without coding knowledge. We hope that QSTN will support the reproducibility and reliability of LLM-based research in the future.
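To make the notion of questionnaire presentation concrete, the following minimal sketch contrasts two ways of presenting the same items to a model: one prompt per question (sequential) and one prompt containing all questions (battery-style). It also shows a simple prompt perturbation, reversing the option order. This is an illustration only and does not use or reproduce QSTN's API; the names `Item`, `format_options`, `render_sequential`, and `render_battery`, as well as the prompt wording, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One questionnaire item with a closed set of answer options (hypothetical type)."""
    text: str
    options: tuple


def format_options(options, reverse=False):
    """Render options as a lettered list; reverse=True is a simple order perturbation."""
    ordered = tuple(reversed(options)) if reverse else tuple(options)
    return "\n".join(f"{chr(ord('A') + i)}. {opt}" for i, opt in enumerate(ordered))


def render_sequential(items, reverse=False):
    """Sequential presentation: one prompt per item, each asked in isolation."""
    return [
        f"{item.text}\n{format_options(item.options, reverse)}\nAnswer:"
        for item in items
    ]


def render_battery(items, reverse=False):
    """Battery presentation: all items shown together in a single prompt."""
    blocks = [
        f"Q{n}. {item.text}\n{format_options(item.options, reverse)}"
        for n, item in enumerate(items, start=1)
    ]
    return "\n\n".join(blocks) + "\n\nAnswer each question with one letter."


# Example items (invented for illustration, not taken from any survey dataset).
items = [
    Item("How much do you trust national news?", ("A lot", "Somewhat", "Not at all")),
    Item("Do you follow local politics?", ("Yes", "No")),
]

sequential_prompts = render_sequential(items)        # two separate prompts
battery_prompt = render_battery(items)               # one combined prompt
perturbed_prompt = render_battery(items, reverse=True)  # same items, options reversed
```

Under this sketch, sequential presentation issues one model call per item, while battery presentation issues a single call for the whole questionnaire. Sending each variant to the same model and comparing the resulting answer distributions is one way to probe how sensitive the responses are to presentation.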
